PostgreSQL9.6:新增 Bloom 索引支持任意列组合查询

PostgreSQL 9.6 版本新增 bloom 索引类型,支持任意列的组合查询,我们通过例子演示,假如有以下表结构

测试准备

创建测试表

1
2
3
4
5
6
7
8
create table test_bloom (
i1 int4,
i2 int4,
i3 int4,
i4 int4,
i5 int4,
i6 int4
);

如果应用场景涉及以下查询 SQL,我们可能需要在表 test_bloom 创建四个索引,分别为 index1(i1),index2(i2),index3(i3,i4),index4(i5,i6),一张表上的索引越多插入维护成本越大。

1
2
3
4
select * from test_bloom where i1=xxx;
select * from test_bloom where i2=xxx;
select * from test_bloom where i3=xxx and i4=xxx;
select * from test_bloom where i5=xxx and i6=xxx;

9.6 版本开始支持 bloom 索引,这时只需要创建一个索引就可以了,接着测试。

插入测试数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
insert into test_bloom(i1,i2,i3,i4,i5,i6)
select
random()*1000000,
random()*1000000,
random()*1000000,
random()*1000000,
random()*1000000,
random()*1000000
from generate_series(1,2000000);

francs=> select * from test_bloom limit 3;
i1 | i2 | i3 | i4 | i5 | i6
--------+--------+--------+--------+--------+--------
315075 | 990584 | 21876 | 778755 | 163148 | 70877
851679 | 329673 | 86038 | 650005 | 37895 | 298422
184483 | 460258 | 237388 | 302736 | 294007 | 366822
(3 rows)

创建 Bloom 模块

1
2
francs=# create extension bloom;
CREATE EXTENSION

创建 bloom 索引

1
2
francs=> create index bloom_idx1 on test_bloom using bloom (i1,i2,i3,i4,i5,i6);
CREATE INDEX

备注:在(i1,i2,i3,i4,i5,i6)列上创建了组合 bloom 索引。

查询测试

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
francs=> set enable_seqscan=off;
SET

francs=> explain analyze select * from test_bloom where i1=315075 and i2=990584;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_bloom (cost=35692.00..35696.02 rows=1 width=24) (actual time=46.595..52.600 rows=1 loops=1)
Recheck Cond: ((i1 = 315075) AND (i2 = 990584))
Rows Removed by Index Recheck: 4355
Heap Blocks: exact=3657
-> Bitmap Index Scan on bloom_idx1 (cost=0.00..35692.00 rows=1 width=0) (actual time=45.709..45.709 rows=4356 loops=1)
Index Cond: ((i1 = 315075) AND (i2 = 990584))
Planning time: 0.190 ms
Execution time: 52.657 ms
(8 rows)

备注:根据 i1,i2 字段查询,执行时间需要 52 ms 左右,速度有些慢。

1
2
3
4
5
6
7
8
9
10
11
12
francs=> explain analyze select * from test_bloom where i3=21876 and i5=163148;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_bloom (cost=35692.00..35696.02 rows=1 width=24) (actual time=43.754..44.412 rows=1 loops=1)
Recheck Cond: ((i3 = 21876) AND (i5 = 163148))
Rows Removed by Index Recheck: 438
Heap Blocks: exact=428
-> Bitmap Index Scan on bloom_idx1 (cost=0.00..35692.00 rows=1 width=0) (actual time=43.674..43.674 rows=439 loops=1)
Index Cond: ((i3 = 21876) AND (i5 = 163148))
Planning time: 0.156 ms
Execution time: 44.460 ms
(8 rows)

备注:根据 i3,i5 字段查询,执行时间需要 44 ms 左右,速度依然有些慢。

1
2
3
4
5
6
7
8
9
10
11
francs=> create index btree_i3_i5 on test_bloom using btree (i3,i5);
CREATE INDEX

francs=> explain analyze select * from test_bloom where i3=21876 and i5=163148;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using btree_i3_i5 on test_bloom (cost=0.43..8.45 rows=1 width=24) (actual time=0.033..0.035 rows=1 loops=1)
Index Cond: ((i3 = 21876) AND (i5 = 163148))
Planning time: 0.181 ms
Execution time: 0.078 ms
(4 rows)

备注:在 i3,i5 列上创建 btree 组合索引,执行时间只需 0.078 ms ,从以上测试来看 bloom 索引的查询速度比 btree 慢很多。

Bloom 索引选项

bloom indexes accept the following parameters in the WITH clause.

  • length
    Length of signature in uint16 type values
  • col1 ― col16
    Number of bits for corresponding column

备注:这两参数依然不太明白具体含义,其中 length 以 16为单位, 默认值为5,个人推测根据表的数据分布调整这两个参数也许能够提高 bloom 索引查询效率。

举例如下:

1
create index bloom_idx2 on test_bloom using bloom (i1,i2,i3,i4,i5,i6) with ( length=30,col1=5,col2=5,col3=5,col4=5,col5=5,col6=5);

参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%