PostgreSQL11: 分区表增加哈希分区

PostgreSQL 11 Beta3 版本已经发布,最近忙于工作和家里事情没有研究,后面抽时间补上。

PostgreSQL 11 的一个重量级新特性为分区表得到较大增强,例如支持哈希分区(HASH)表,因此 PostgreSQL 支持范围分区(RANGE)、列表分区(LIST)、>哈希分区(HASH)三种分区方式,本文简单演示下哈希分区表。

Hash Partitioning

The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

Hash分区表的分区定义包含两个属性,如下:

  • modulus: 指Hash分区个数。
  • remainder: 指Hash分区键取模余。

创建分区表语法

1
2
3
4
5
CREATE TABLE table_name (  ...  )
[ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) }
CREATE TABLE table_name
PARTITION OF parent_table [ (
) ] FOR VALUES partition_bound_spec

创建数据生成函数

为了便于生成测试数据,创建以下两个函数用来随机生成指定长度的字符串,创建 random_range(int4, int4) 函数如下:

1
2
3
4
5
6
CREATE OR REPLACE FUNCTION random_range(int4, int4)
RETURNS int4
LANGUAGE SQL
AS $$
SELECT ($1 + FLOOR(($2 - $1 + 1) * random() ))::int4;
$$;

接着创建random_text_simple(length int4)函数,此函数会调用random_range(int4, int4)函数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
CREATE OR REPLACE FUNCTION random_text_simple(length int4)
RETURNS text
LANGUAGE PLPGSQL
AS $$
DECLARE
possible_chars text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output text := '';
i int4;
pos int4;
BEGIN

FOR i IN 1..length LOOP
pos := random_range(1, length(possible_chars));
output := output || substr(possible_chars, pos, 1);
END LOOP;

RETURN output;
END;
$$;

random_text_simple(length int4)函数可以随机生成指定长度字符串,如下随机生成含三位字符的字符串。

1
2
3
4
5
mydb=> SELECT random_text_simple(3);
random_text_simple
--------------------
LL9
(1 row)

随机生成含六位字符的字符串,如下所示:

1
2
3
4
5
mydb=> SELECT random_text_simple(6);
random_text_simple
--------------------
B81BPW
(1 row)

后面会用到这个函数生成测试数据。

创建哈希分区父表

1
2
3
4
CREATE TABLE student (
stuname text ,
ctime timestamp(6) without time zone
) PARTITION BY HASH(stuname);

创建索引

1
CREATE INDEX idx_stuendt_stuname on student using btree(stuname);

创建子表

1
2
3
4
CREATE TABLE student_p0 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 0);
CREATE TABLE student_p1 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 1);
CREATE TABLE student_p2 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 2);
CREATE TABLE student_p3 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 3);

查看分区表定义

1
2
3
4
5
6
7
8
9
10
11
12
13
francs=> \d+ student
Table "francs.student"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------+--------------------------------+-----------+----------+---------+----------+--------------+-------------
stuname | text | | | | extended | |
ctime | timestamp(6) without time zone | | | | plain | |
Partition key: HASH (stuname)
Indexes:
"idx_stuendt_stuname" btree (stuname)
Partitions: student_p0 FOR VALUES WITH (modulus 4, remainder 0),
student_p1 FOR VALUES WITH (modulus 4, remainder 1),
student_p2 FOR VALUES WITH (modulus 4, remainder 2),
student_p3 FOR VALUES WITH (modulus 4, remainder 3)

从以上看出表 student 和它的四个分区。

插入测试数据

使用之前创建的函数 random_text_simple() 生成100万测试数据,如下。

1
INSERT INTO student(stuname,ctime) SELECT random_text_simple(6),clock_timestamp() FROM generate_series(1,1000000);

查看分区表数据

表数据如下

1
2
3
4
5
6
7
francs=> SELECT * FROM student LIMIT 3;
stuname | ctime
---------+---------------------
4JJOPN | 2018-09-20 10:45:06
NHQONC | 2018-09-20 10:45:06
8V5BGH | 2018-09-20 10:45:06
(3 rows)

统计分区数据量

1
2
3
4
5
6
7
8
francs=> SELECT tableoid::regclass,count(*) from student group by 1 order by 1;
tableoid | count
------------+--------
student_p0 | 250510
student_p1 | 249448
student_p2 | 249620
student_p3 | 250422
(4 rows)

可见数据均匀分布到了四个分区。

根据分区键查询

1
2
3
4
5
6
7
8
9
10
11
francs=> EXPLAIN ANALYZE SELECT * FROM student WHERE stuname='3LXBEV';
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------

Append (cost=0.42..8.44 rows=1 width=15) (actual time=0.017..0.018 rows=1 loops=1)
-> Index Scan using student_p3_stuname_idx on student_p3 (cost=0.42..8.44 rows=1 width=15) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (stuname = '3LXBEV'::text)
Planning Time: 0.198 ms
Execution Time: 0.042 ms
(5 rows)

根据分区键stuname查询仅扫描分区 student_p3,并走了索引。

根据非分区键查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
francs=> EXPLAIN ANALYZE SELECT * FROM student WHERE ctime='2018-09-20 10:53:55.48392';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..13761.36 rows=4 width=15) (actual time=37.891..39.183 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..12760.96 rows=4 width=15) (actual time=23.753..35.006 rows=0 loops=3)
-> Parallel Seq Scan on student_p0 (cost=0.00..3196.99 rows=1 width=15) (actual time=0.014..28.550 rows=1 loops=1)
Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
Rows Removed by Filter: 250509
-> Parallel Seq Scan on student_p3 (cost=0.00..3195.34 rows=1 width=15) (actual time=29.543..29.543 rows=0 loops=1)
Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
Rows Removed by Filter: 250422
-> Parallel Seq Scan on student_p2 (cost=0.00..3185.44 rows=1 width=15) (actual time=8.260..8.260 rows=0 loops=3)
Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
Rows Removed by Filter: 83207
-> Parallel Seq Scan on student_p1 (cost=0.00..3183.18 rows=1 width=15) (actual time=22.135..22.135 rows=0 loops=1)
Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
Rows Removed by Filter: 249448
Planning Time: 0.183 ms
Execution Time: 39.219 ms
(18 rows)

根据非分区键ctime查询扫描了分区表所有分区。

总结

本文演示了 PostgreSQL 哈希分区表的创建、测试数据的生成导入和查询计划,后面博客演示分区表增强的其它方面。

参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%