PostgreSQL 12: CTE 支持 Inlined With Queries

PostgreSQL 的 CTE( common table expressions ) 支持较复杂的查询,比如递归查询等场景, 12 版本之前 CTE 的 WITH 语句都是直接物化的,也就是说 WITH 语句执行一次并保持到一个类似的临时表中,供 WITH 语句外层的SQL引用,当 INSERT/UPDATE/DELETE 做CTE的 WITH 语句时是非常恰当的。

PostgreSQL 12 版本的一个重要特性是 CTE 支持 Inlined WITH queries,也就是说当 SELECT 做为CTE 的 WITH 语句时,支持将 WITH 语句中的查询条件下推到外层SQL中,从而提升 CTE 语句性能。

发行说明

Allow common table expressions (CTE) to be inlined in later parts of the query (Andreas Karlsson, Andrew Gierth, David Fetter, Tom Lane)

Specifically, CTEs are inlined if they are not recursive and are referenced only once later in the query. Inlining can be prevented by specifying MATERIALIZED, and forced by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query.

根据手册说明, CTE 的 Inlined WITH Queries 需要满足以下条件:

  • Are not recursive,非递归
  • Are referenced only once later in the query,外层查询仅调用一次

补丁说明

Tom Lane 大神提交的补丁,如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
commit: 608b167f9f9c4553c35bb1ec0eab9ddae643989b
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Sat, 16 Feb 2019 16:11:12 -0500
Allow user control of CTE materialization, and change the default behavior.

Historically we haveve always materialized the full output of a CTE query,
treating WITH as an optimization fence (so that, for example, restrictions
from the outer query cannot be pushed into it). This is appropriate when
the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE
query is non-recursive and side-effect-free, there iss no hazard of changing
the query results by pushing restrictions down.

Another argument for materialization is that it can avoid duplicate
computation of an expensive WITH query --- but that only applies if
the WITH query is called more than once in the outer query. Even then
it could still be a net loss, if each call has restrictions that
would allow just a small part of the WITH query to be computed.

Hence, let s change the behavior for WITH queries that are non-recursive
and side-effect-free. By default, we will inline them into the outer
query (removing the optimization fence) if they are called just once.
If they are called more than once, we will keep the old behavior by
default, but the user can override this and force inlining by specifying
NOT MATERIALIZED. Lastly, the user can force the old behavior by
specifying MATERIALIZED; this would mainly be useful when the query had
deliberately been employing WITH as an optimization fence to prevent a
poor choice of plan.

Andreas Karlsson, Andrew Gierth, David Fetter

Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

根据手册说明和补丁说明不一定能很好理解,下面通过实验演示。

环境准备

测试环境为1台8核16G虚拟机,创建测试表并插入 500 万数据,如下:

1
2
3
CREATE TABLE t (id int4, name text);
INSERT INTO t(id,name) SELECT n, 'francs_' || n FROM generate_series(1,5000000) n;
ALTER TABLE t ADD PRIMARY KEY (id);

构造一条 SELECT 做为 CTE 的 WITH 语句,如下:

1
2
3
4
WITH x AS (
SELECT * FROM t
)
SELECT * FROM x where id=1;

PostgreSQL 11 测试

11 版本测试如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
postgres=# EXPLAIN ANALYZE WITH x AS (
SELECT * FROM t
)
SELECT * FROM x where id=1;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
CTE Scan on x (cost=81838.00..194338.00 rows=25000 width=36) (actual time=0.057..1557.140 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 4999999
CTE x
-> Seq Scan on t (cost=0.00..81838.00 rows=5000000 width=36) (actual time=0.050..525.329 rows=5000000 loops=1
)
Planning Time: 0.128 ms
Execution Time: 1591.979 ms
(7 rows)

Time: 1592.534 ms (00:01.593)

从执行计划看出分两步走,首先是在表 t 上做全表扫描,之后进行 CTE Scan 并过滤 id=1 的记录,以上并没有走索引,执行时间为 1592.534 ms,性能很底。

PostgreSQL 12 测试

12 版本测试如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
postgres=# EXPLAIN ANALYZE WITH x AS (
SELECT * FROM t
)
SELECT * FROM x where id=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using t_pkey on t (cost=0.43..8.45 rows=1 width=36) (actual time=0.021..0.022 rows=1 loops=1)
Index Cond: (id = 1)
Planning Time: 0.087 ms
Execution Time: 0.043 ms
(4 rows)

Time: 0.469 ms

从执行计划看出,将 WITH 语句中的条件( id=1 ) 下推到了外层SQL中,这样直接走了 Index Scan,执行时间降为 0.469 ms,大辐提升了 CTE 语句性能。

用户可设置是否使用 MATERIALIZED

12 版本用户可在 CTE 语句中控制是否使用物化(MATERIALIZED)。

以下 CTE 语句设置使用 MATERIALIZED,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
postgres=# EXPLAIN ANALYZE WITH x AS MATERIALIZED (
SELECT * FROM t
)
SELECT * FROM x where id=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
CTE Scan on x (cost=81837.99..194337.97 rows=25000 width=36) (actual time=0.023..1607.012 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 4999999
CTE x
-> Seq Scan on t (cost=0.00..81837.99 rows=4999999 width=18) (actual time=0.015..556.583 rows=5000000 loops=1)
Planning Time: 0.245 ms
Execution Time: 1640.071 ms
(7 rows)

Time: 1641.072 ms (00:01.641)

以下 CTE 语句设置不使用 MATERIALIZED,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
postgres=# EXPLAIN ANALYZE WITH x AS NOT MATERIALIZED (
SELECT * FROM t
)
SELECT * FROM x where id=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using t_pkey on t (cost=0.43..8.45 rows=1 width=18) (actual time=0.072..0.074 rows=1 loops=1)
Index Cond: (id = 1)
Planning Time: 0.134 ms
Execution Time: 0.099 ms
(4 rows)

Time: 0.801 ms

总结

PostgreSQL 12 版本的 CTE 支持 Inlined WITH Queries 特性,由于 WITH 查询语句的条件可以外推到外层查询,避免中间结果数据产生,同时使用相关索引,从而大辐提升 CTE 性能。

参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%