Задать вопрос

Как ускорить ORDER BY в Postgres?

Приветствую. Подскажите, пожалуйста, есть такая ситуация.
Таблица дерева каталога.
e20b3a1fb3fc20782168cfdaa3f7d2b536603606
Таблица соответствий
f2e687765aa089f2719ce10fbb043721aeff1fc4

Есть запрос, который строит это дерево в табличном виде

SELECT DISTINCT t0.groupname, t1.groupname,  t2.groupname
FROM grouptree as t0 
left join models as m on m.id = t0.idmodel 
left join grouptree as t1 on t1.parent = t0.groupno AND t1.idmodel = t0.idmodel
left join grouptree as t2 on t2.parent = t1.groupno AND t2.idmodel = t1.idmodel
where t0.parent = 0 AND m.typeauto = 0


В условиях мы выбираем первый уровень дерева и отношение к типу автомобилей.
Вот план запроса
"HashAggregate  (cost=144714.35..145034.98 rows=32063 width=117) (actual time=1209.785..1211.691 rows=10716 loops=1)"
"  ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=28.075..840.266 rows=1391881 loops=1)"
"        ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=28.066..171.952 rows=264099 loops=1)"
"              ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=28.057..69.569 rows=30423 loops=1)"
"                    Hash Cond: (t0.idmodel = m.id)"
"                    ->  Bitmap Heap Scan on grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=20.854..35.009 rows=34960 loops=1)"
"                          Recheck Cond: (parent = 0)"
"                          ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=15.255..15.255 rows=34960 loops=1)"
"                                Index Cond: (parent = 0)"
"                    ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.171..7.171 rows=29715 loops=1)"
"                          Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                          ->  Seq Scan on models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.004..4.488 rows=29715 loops=1)"
"                                Filter: (typeauto = 0)"
"                                Rows Removed by Filter: 2798"
"              ->  Index Scan using "2-parent-to-idmodel" on grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                    Index Cond: ((parent = t0.groupno) AND (idmodel = t0.idmodel))"
"        ->  Index Scan using "2-parent-to-idmodel" on grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"              Index Cond: ((parent = t1.groupno) AND (idmodel = t1.idmodel))"
"Total runtime: 1212.145 ms"


Все хорошо и работает как нужно. Но хотелось бы отсортировать данные.
Добавляю всего 1 строку в конец
...
order by t0.groupname ASC


Картина в корне меняется. И время выполнения запроса увеличивается в 20 раз.
"Unique  (cost=146873.58..147194.21 rows=32063 width=117) (actual time=21456.087..21864.988 rows=10716 loops=1)"
"  Output: t0.groupname, t1.groupname, t2.groupname"
"  Buffers: shared hit=1240970"
"  ->  Sort  (cost=146873.58..146953.73 rows=32063 width=117) (actual time=21456.085..21594.183 rows=1391881 loops=1)"
"        Output: t0.groupname, t1.groupname, t2.groupname"
"        Sort Key: t0.groupname, t1.groupname, t2.groupname"
"        Sort Method: quicksort  Memory: 312756kB"
"        Buffers: shared hit=1240970"
"        ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=25.376..867.222 rows=1391881 loops=1)"
"              Output: t0.groupname, t1.groupname, t2.groupname"
"              Buffers: shared hit=1240970"
"              ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=25.366..172.208 rows=264099 loops=1)"
"                    Output: t0.groupname, t1.groupname, t1.idmodel, t1.groupno"
"                    Buffers: shared hit=162206"
"                    ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=25.357..66.067 rows=30423 loops=1)"
"                          Output: t0.groupname, t0.idmodel, t0.groupno"
"                          Hash Cond: (t0.idmodel = m.id)"
"                          Buffers: shared hit=21731"
"                          ->  Bitmap Heap Scan on public.grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=17.457..31.343 rows=34960 loops=1)"
"                                Output: t0.idmodel, t0.groupno, t0.parent, t0.groupname, t0.groupnameen, t0.pictureindex, t0.mark, t0.sortorder"
"                                Recheck Cond: (t0.parent = 0)"
"                                Buffers: shared hit=20791"
"                                ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=14.128..14.128 rows=34960 loops=1)"
"                                      Index Cond: (t0.parent = 0)"
"                                      Buffers: shared hit=98"
"                          ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.868..7.868 rows=29715 loops=1)"
"                                Output: m.id"
"                                Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                                Buffers: shared hit=940"
"                                ->  Seq Scan on public.models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.003..5.048 rows=29715 loops=1)"
"                                      Output: m.id"
"                                      Filter: (m.typeauto = 0)"
"                                      Rows Removed by Filter: 2798"
"                                      Buffers: shared hit=940"
"                    ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                          Output: t1.idmodel, t1.groupno, t1.parent, t1.groupname, t1.groupnameen, t1.pictureindex, t1.mark, t1.sortorder"
"                          Index Cond: ((t1.parent = t0.groupno) AND (t1.idmodel = t0.idmodel))"
"                          Buffers: shared hit=140475"
"              ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"                    Output: t2.idmodel, t2.groupno, t2.parent, t2.groupname, t2.groupnameen, t2.pictureindex, t2.mark, t2.sortorder"
"                    Index Cond: ((t2.parent = t1.groupno) AND (t2.idmodel = t1.idmodel))"
"                    Buffers: shared hit=1078764"
"Total runtime: 21879.380 ms"


Видно из профайлера что на сортировке он застревает. Причем сортирует сразу по 3 полям, хотя указано только 1. Подскажите, что я делаю не так ?
  • Вопрос задан
  • 3319 просмотров
Подписаться 3 Оценить Комментировать
Пригласить эксперта
Ответы на вопрос 2
@whats Автор вопроса
work_mem - 1GB Сортируется в памяти. Какие еще параметры нужно назвать для информативности ?

Кстати если сделать вот так :
select * from (SELECT DISTINCT t0.groupname as s1, t1.groupname,  t2.groupname
FROM grouptree as t0 
left join models as m on m.id = t0.idmodel 
left join grouptree as t1 on t1.parent = t0.groupno AND t1.idmodel = t0.idmodel
left join grouptree as t2 on t2.parent = t1.groupno AND t2.idmodel = t1.idmodel
where t0.parent = 0 AND m.typeauto = 0) as t
order by t.s1 ASC

Что, я считаю, в корне не верно, то запрос отработает за секунды.
"Sort  (cost=147755.31..147835.47 rows=32063 width=117) (actual time=1241.491..1241.930 rows=10716 loops=1)"
"  Output: t0.groupname, t1.groupname, t2.groupname"
"  Sort Key: t0.groupname"
"  Sort Method: quicksort  Memory: 2612kB"
"  Buffers: shared hit=1240970"
"  ->  HashAggregate  (cost=144714.35..145034.98 rows=32063 width=117) (actual time=1228.137..1229.846 rows=10716 loops=1)"
"        Output: t0.groupname, t1.groupname, t2.groupname"
"        Buffers: shared hit=1240970"
"        ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=29.937..852.961 rows=1391881 loops=1)"
"              Output: t0.groupname, t1.groupname, t2.groupname"
"              Buffers: shared hit=1240970"
"              ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=29.927..172.365 rows=264099 loops=1)"
"                    Output: t0.groupname, t1.groupname, t1.idmodel, t1.groupno"
"                    Buffers: shared hit=162206"
"                    ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=29.915..67.004 rows=30423 loops=1)"
"                          Output: t0.groupname, t0.idmodel, t0.groupno"
"                          Hash Cond: (t0.idmodel = m.id)"
"                          Buffers: shared hit=21731"
"                          ->  Bitmap Heap Scan on public.grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=22.059..35.273 rows=34960 loops=1)"
"                                Output: t0.idmodel, t0.groupno, t0.parent, t0.groupname, t0.groupnameen, t0.pictureindex, t0.mark, t0.sortorder"
"                                Recheck Cond: (t0.parent = 0)"
"                                Buffers: shared hit=20791"
"                                ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=14.175..14.175 rows=34960 loops=1)"
"                                      Index Cond: (t0.parent = 0)"
"                                      Buffers: shared hit=98"
"                          ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.824..7.824 rows=29715 loops=1)"
"                                Output: m.id"
"                                Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                                Buffers: shared hit=940"
"                                ->  Seq Scan on public.models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.003..5.017 rows=29715 loops=1)"
"                                      Output: m.id"
"                                      Filter: (m.typeauto = 0)"
"                                      Rows Removed by Filter: 2798"
"                                      Buffers: shared hit=940"
"                    ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                          Output: t1.idmodel, t1.groupno, t1.parent, t1.groupname, t1.groupnameen, t1.pictureindex, t1.mark, t1.sortorder"
"                          Index Cond: ((t1.parent = t0.groupno) AND (t1.idmodel = t0.idmodel))"
"                          Buffers: shared hit=140475"
"              ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"                    Output: t2.idmodel, t2.groupno, t2.parent, t2.groupname, t2.groupnameen, t2.pictureindex, t2.mark, t2.sortorder"
"                    Index Cond: ((t2.parent = t1.groupno) AND (t2.idmodel = t1.idmodel))"
"                    Buffers: shared hit=1078764"
"Total runtime: 1242.392 ms"


Как правильно отсортировать ?
Ответ написан
Комментировать
lesovsky
@lesovsky
System engineer and PostgreSQL DBA
вот ответ от более опытных коллег (я еще сам учусь))):
конкретно тот запрос не ускоряется ни как... проблему при order by снимать через WITH или offset 0 в подзапросе и только после накладыванием сортировки.
т.е. там сама постановка задачи с 1.4M строк в сортировке и 10k строк на выходе - не рабочая

вот как-то так, совсем безрадостно ((
Ответ написан
Комментировать
Ваш ответ на вопрос

Войдите, чтобы написать ответ

Похожие вопросы