PostgreSQL 13: 支持增量排序(Incremental Sorting)

PostgreSQL 13 版本的一个重要特性是支持增量排序(Incremental Sorting),加速数据排序,例如以下SQL:

SELECT * FROM t ORDER BY a,b LIMIT 10;

如果在字段a上建立了索引,由于索引是排序的,查询结果集的a字段是已排序的,这种场景下,PostgreSQL 13 的增量排序可以发挥重要作用,大幅加速查询,因为ORDER BY a,b中的字段a是已排序好的,只需要在此基础上对字段b进行批量排序即可。

关于增量排序,手册说明如下。

手册说明

PostgreSQL 13 adds incremental sorting, which accelerates sorting data when data that is sorted from earlier parts of a query are already sorted

Implement incremental sorting (James Coleman, Alexander Korotkov, Tomas Vondra)

If a result is already sorted by several leading keys, this allows for batch sorting of additional trailing keys because the previous keys are already equal. This is controlled by enable_incrementalsort.

Enable_incrementalsort

PostgreSQL 13 版本的Planner Method Configuration新增了enable_incrementalsort参数控制是否开启增量排序,此参数默认开启。

测试准备

计划在PostgreSQL 13 演示增量排序,创建测试表,并插入测试数据,如下:

1
2
3
4
CREATE TABLE t_is(a int4,b int4,ctime timestamp(6) without time zone);
INSERT INTO t_is(a,b,ctime) SELECT n,round(random()*100000000), clock_timestamp() FROM generate_series(1,1000000) n;
INSERT INTO t_is(a,b,ctime) SELECT n,round(random()*100000000), clock_timestamp() FROM generate_series(1,1000000) n;
CREATE INDEX idx_t_is_a ON t_is USING BTREE(a);

查看测试数据,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
postgres=# SELECT * FROM t_is ORDER BY a,b LIMIT 10;
a | b | ctime
---+----------+----------------------------
1 | 60379526 | 2020-07-21 16:28:42.034869
1 | 73197294 | 2020-07-21 16:28:45.496297
2 | 943408 | 2020-07-21 16:28:45.496346
2 | 27584454 | 2020-07-21 16:28:42.036121
3 | 31616182 | 2020-07-21 16:28:45.496348
3 | 88997913 | 2020-07-21 16:28:42.036134
4 | 21557231 | 2020-07-21 16:28:45.49635
4 | 23206459 | 2020-07-21 16:28:42.036136
5 | 13268559 | 2020-07-21 16:28:45.496351
5 | 33672766 | 2020-07-21 16:28:42.036137
(10 rows)

PostgreSQL 13 测试

当开启enable_incrementalsort时,执行以下SQL,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
postgres=# SHOW enable_incrementalsort ;
enable_incrementalsort
------------------------
on
(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM t_is ORDER BY a,b LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.51..1.16 rows=10 width=16) (actual time=0.042..0.044 rows=10 loops=1)
-> Incremental Sort (cost=0.51..130115.72 rows=2000000 width=16) (actual time=0.041..0.042 rows=10 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
-> Index Scan using idx_t_is_a on t_is (cost=0.43..58848.31 rows=2000000 width=16) (actual time=0.021..0.027 rows=11 loops=1)
Planning Time: 0.106 ms
Execution Time: 0.064 ms
(8 rows)

执行计划中重点关注Presorted Key: aIncremental Sort两行,SQL执行时间为 0.064ms。

当关闭enable_incrementalsort时,执行以下SQL,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
postgres=# set enable_incrementalsort=off ;
SET

postgres=# EXPLAIN ANALYZE SELECT * FROM t_is ORDER BY a,b LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=38152.38..38153.55 rows=10 width=16) (actual time=157.108..157.112 rows=10 loops=1)
-> Gather Merge (cost=38152.38..232610.33 rows=1666666 width=16) (actual time=157.106..159.975 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=37152.36..39235.69 rows=833333 width=16) (actual time=145.993..145.993 rows=10 loops=3)
Sort Key: a, b
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on t_is (cost=0.00..19144.33 rows=833333 width=16) (actual time=0.067..68.815 rows=666667 loops=3)
Planning Time: 0.144 ms
Execution Time: 160.027 ms
(12 rows)

执行计划中没有Incremental Sort相关信息,SQL执行时间上升为 160.027ms,性能相比开启增量排序时差了好几个数量级。

PostgreSQL 12 测试

PostgreSQL 12 也创建以上测试表并加载数据,执行以下SQL,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
postgres=# EXPLAIN ANALYZE SELECT * FROM t_is ORDER BY a,b LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=38152.38..38153.55 rows=10 width=16) (actual time=144.892..144.896 rows=10 loops=1)
-> Gather Merge (cost=38152.38..232610.33 rows=1666666 width=16) (actual time=144.891..147.211 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=37152.36..39235.69 rows=833333 width=16) (actual time=141.592..141.593 rows=8 loops=3)
Sort Key: a, b
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on t_is (cost=0.00..19144.33 rows=833333 width=16) (actual time=0.016..72.129 rows=666667 loops=3)
Planning Time: 0.087 ms
Execution Time: 147.247 ms
(12 rows)

执行计划和 PostgreSQL 13 版本关闭enable_incrementalsort时一样。

总结

本文演示了 PostgreSQL 13 版本的增量排序,在本文的测试场景中,开启增量排序比不开启增量排序的SQL执行性能上升了好几个数量级,但并不能说明绝大多数开启了增量排序的SQL性能要比不开启性能高出好几个数量级,个人推测与排序的字段组合及排序字段数据分布也有关系,本文不做进一步的分析,有兴趣的朋友可构造测试案例进一步测试。

参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%