PostgreSQL10:全文检索支持 JSON 和 JSONB

Add full text search support for JSON and JSONB (Dmitry Dolgov)
This is accessed via ts_headline() and to_tsvector.

PostgreSQ10 全文检索开始支持 JSON 和 JSONB 数据类型,to_tsvector 函数中的输入参数可以看到支持 JSON 和 JSONB。

一、to_tsvector 函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
PostgreSQL  10beta1
francs=> \df *to_tsvector*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-------------------+------------------+---------------------+--------
pg_catalog | array_to_tsvector | tsvector | text[] | normal
pg_catalog | to_tsvector | tsvector | json| normal
pg_catalog | to_tsvector | tsvector | jsonb | normal
pg_catalog | to_tsvector | tsvector | regconfig, json | normal
pg_catalog | to_tsvector | tsvector | regconfig, jsonb| normal
pg_catalog | to_tsvector | tsvector | regconfig, text | normal
pg_catalog | to_tsvector | tsvector | text| normal
(7 rows)

PostgreSQL 9.6.3
des=> \df *to_tsvector*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-------------------+------------------+---------------------+--------
pg_catalog | array_to_tsvector | tsvector | text[] | normal
pg_catalog | to_tsvector | tsvector | regconfig, text | normal
pg_catalog | to_tsvector | tsvector | text| normal
(3 rows)

备注:接下来在10版本做个测试,验证JSON是否支持全文检索。

二、Json 全文检索测试

创建 random_range ()函数

1
2
3
4
5
6
CREATE OR REPLACE FUNCTION random_range(INTEGER, INTEGER)
RETURNS INTEGER
LANGUAGE SQL
AS $$
SELECT ($1 + FLOOR(($2 - $1 + 1) * random() ))::INTEGER;
$$;

创建 random_text_simple()函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
  CREATE OR REPLACE FUNCTION random_text_simple(length INTEGER)
RETURNS TEXT
LANGUAGE PLPGSQL
AS $$
DECLARE
possible_chars TEXT := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output TEXT := '';
i INT4;
pos INT4;
BEGIN

FOR i IN 1..length LOOP
pos := random_range(1, length(possible_chars));
output := output || substr(possible_chars, pos, 1);
END LOOP;

RETURN output;
END;
$$;

备注:random_text_simple(integer) 函数随机返回指定长度字符串,用来生产测试数据,示例如下:
创建测试函数

1
2
3
4
5
francs=>  select random_text_simple(6);
random_text_simple
--------------------
T87GL1
(1 row)

创建测试表并生成测试数据

1
2
3
4
5
6
7
8
9
10
11
12
13
create table user_ini(id int4 ,user_id int8, user_name character varying(64),create_time timestamp(6)  with time zone default clock_timestamp());
insert into user_ini(id,user_id,user_name) select r,round(random()*1000000), random_text_simple(6) from generate_series(1,1000000) as r;

create table tbl_user_json(id serial, user_info json);
insert into tbl_user_json(user_info) select row_to_json(user_ini) from user_ini;

francs=> select * from tbl_user_json limit 3;
id | user_info
----------------------------------------------------------------------------------------
6000001 | {"id":1,"user_id":999960,"user_name":"GO9H59","create_time":"2017-06-11T17:48:57.178684+08:00"}
6000002 | {"id":2,"user_id":915581,"user_name":"7HZEMH","create_time":"2017-06-11T17:48:57.178881+08:00"}
6000003 | {"id":3,"user_id":68310,"user_name":"L1P1OU","create_time":"2017-06-11T17:48:57.178897+08:00"}
(3 rows)

JSON 数据全文检索测试

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
francs=>  select  *  from tbl_user_json where to_tsvector('english',user_info)  @@ to_tsquery('english','7HZEMH');
id | user_info
---------+-------------------------------------------------------------------------------------------------
6000002 | {"id":2,"user_id":915581,"user_name":"7HZEMH","create_time":"2017-06-11T17:48:57.178881+08:00"}
(1 row)

francs=> explain analyze select * from tbl_user_json where to_tsvector('english',user_info) @@ to_tsquery('english','7HZEMH');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..94627.42 rows=5000 width=104) (actual time=0.863..2964.579 rows=1 loops=1)
Workers Planned: 3
Workers Launched: 2
-> Parallel Seq Scan on tbl_user_json(cost=0.00..93127.42 rows=1613 width=104) (actual time=1972.931..2960.835 rows=0 loops=3)
Filter: (to_tsvector('english'::regconfig, user_info) @@ '''7hzemh'''::tsquery)
Rows Removed by Filter: 333333
Planning time: 0.117 ms
Execution ti
me: 2966.479 ms
(8 rows)

备注:从上面示例看出, JSON 数据支持全文检索操作,只是速度慢,花了近3秒,接下来创建索引。

创建 gin 索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
francs=> create index idx_gin_tbl_user_json_user_info on tbl_user_json using gin(to_tsvector('english',user_info));
CREATE INDEX

francs=> explain analyze select * from tbl_user_json where to_tsvector('english',user_info) @@ to_tsquery('english','7HZEMH');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_user_json(cost=48.75..6471.39 rows=5000 width=104) (actual time=0.027..0.027 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, user_info) @@ '''7hzemh'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_gin_tbl_user_json_user_info(cost=0.00..47.50 rows=5000 width=0) (actual time=0.019..0.019 rows=
1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, user_info) @@ '''7hzemh'''::tsquery)
Planning time: 0.127 ms
Execution time: 0.058 ms

备注:创建 gin 索引后,走了 Bitmap Index Scan,执行时间下降到 0.058 ms。

三、总结

今天只测试了全文检索对 JSON 数据类型的支持,JSONB 数据类型还没有测试,有兴趣的朋友可测试下,根据 to_tsvector 的定义,可以看到对 JSONB 是支持的;从10版本开始,PostgreSQL 对 JSON 、JSONB 的支持得到了增强。

四、参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%