Patroni + Etcd 实现高可用之一: 安装部署

PostgreSQL 至今都没有特别流行的高可用方案,之前介绍过两种高可用方案,如下:

以上两种方案都能实现基本的 PostgreSQL 高可用,当主库宕机时,备库能正常接管,但存在一些不足。

Pgpool-II方案

  • 故障切换时并不会判断备库与主库存在的数据差异,这块逻辑需要自定义。
  • 性能下降明显,根据之前的只读场景测试,大概降幅有20%,详见Pgpool-II 流复制模式压力测试
  • 自动化程度不高,例如备库实例挂掉后不会自动拉起,主备切换后,需人工介入将老的主库进行角色转换。

Keepalived方案

  • 程序移值性不好,不方便部署和安装。
  • 自动化程度不高,例如备库实例挂掉后不会自动拉起,主备切换后,需人工介入将老的主库进行角色转换。

Patroni + Etcd 方案优点

最近在在看基于 Patroni + Etcd 的高可用方案,此方案使用Patroni管理本地库,并结合Etcd作为数据存储和主节点选举,具有以下优势:

  • 健壮性: 使用分布式key-value数据库作为数据存储,主节点故障时进行主节点重新选举,具有很强的健壮性。
  • 支持多种复制方式: 基于内置流复制,支持同步流复制、异步流复制、级联复制。
  • 支持主备延迟设置: 可以设置备库延迟主库WAL的字节数,当备库延迟大于指定值时不做故障切换。
  • 自动化程度高: 1)支持自动化初始PostgreSQL实例并部署流复制; 2)当备库实例关闭后,支持自动拉起; 3)当主库实例关闭后,首先会尝试自动拉起; 4)支持switchover命令,能自动将老的主库进行角色转换。
  • 避免脑裂: 数据库信息记录到 ETCD 中,通过优化部署策略(多机房部署、增加实例数)可以避免脑裂。

Patroni 简介

Patroni基于Python开发的模板,结合DCS(例如 ZooKeeper, etcd, Consul )可以定制PostgreSQL高可用方案。

Patroni并不是一套拿来即用的PostgreSQL高可用组件,涉及较多的配置和定制工作。

Patroni接管PostgreSQL数据库的启停,同时监控本地的PostgreSQL数据库,并将本地的PostgreSQL数据库信息写入DCS。

Patroni的主备端是通过是否能获得 leader key 来控制的,获取到了leader key的Patroni为主节点,其它的为备节点。

Etcd 简介

Etcd是一款基于Raft算法和协议开发的分布式 key-value 数据库,基于Go语言编写,Patroni监控本地的PostgreSQL状态,并将相关信息写入Etcd,每个Patroni都能读写Etcd上的key,从而获取外地PostgreSQL数据库信息。

当Etcd的leader节点不可用时,Etcd会一致性的选择一个合适的节点作为主节点,新的Etcd主节点将获取leader key,因此建议Etcd集群为三个以上且为奇数的节点,不建议部署在同一个机房,有条件话尽量部署在三个机房。

一个标准的3节点etcd集群,最大容许1个节点故障。

部署规划

为了演示方便,使用笔记本上的三台虚机作为演示环境,软件版本规划如下:

操作系统: CentOS Linux 7.7
数据库: PostgreSQL 12.2
Python: Python 3.8.2
Etcd: etcd-v3.4.7
Patroni: patroni 1.6.5

部署规划如下:

主机 IP 组件 备注
ydtf01 192.168.2.11 PostgreSQL、Patroni、Etcd 主节点
ydtf02 192.168.2.12 PostgreSQL、Patroni、Etcd 备节点1
ydtf03 192.168.2.13 PostgreSQL、Patroni、Etcd 备节点2

环境准备

虽然Patroni支持自动化初始化PostgreSQL数据库并部署流复制,这两块工作个人建议还是手工来做比较好。

部署一主两从PostgreSQL流复制,主机ydtf01部署主库,ydtf02、ydtf03部署从库,流复制的部署可参考 PostgreSQL:使用 pg_basebackup 搭建流复制环境

为了演示方便,关闭ydtf01、ydtf02、ydtf03主机上的防火墙,如下:

1
2
# systemctl stop firewalld.service
# systemctl disable firewalld.service

由于本文的部分工具需要通 epel源安装,安装 epel 源,如下:

1
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

安装python3,如下:

1
2
3
4
# wget -c https://www.python.org/ftp/python/3.8.2/Python-3.8.2.tar.xz
# ./configure
# make
# make install

创建软链接,如下:

1
2
rm -f /usr/bin/python
ln -s /usr/local/bin/python3 /usr/bin/python

python3安装后’yum’命令执行会报错,需要修改以下配置:

1
2
/usr/bin/yum: 将文件第一行改为/usr/bin/python2.7。(2.7.x也改为2.7)
/usr/libexec/urlgrabber-ext-down: 将文件第一行改为/usr/bin/python2.7

Etcd安装

三台主机上下载并安装ETCD,如下:

1
2
3
4
# wget -c https://github.com/etcd-io/etcd/releases/download/v3.4.7/etcd-v3.4.7-linux-amd64.tar.gz
# tar xvf etcd-v3.4.7-linux-amd64.tar.gz
# mv etcd-v3.4.7-linux-amd64 etcd-v3.4.7
# mv etcd-v3.4.7 /opt

ydtf01 主机创建 start_etcd1.sh 启动脚本,如下:

1
2
3
4
5
6
7
8
9
etcd --name etcd_ydtf01 \
--initial-advertise-peer-urls http://192.168.2.11:2380 \
--listen-peer-urls http://192.168.2.11:2380 \
--listen-client-urls http://192.168.2.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.2.11:2379 \
--initial-cluster-token etcd-cluster-ydtf \
--initial-cluster etcd_ydtf01=http://192.168.2.11:2380,etcd_ydtf02=http://192.168.2.12:2380,etcd_ydtf03=http://192.168.2.13:2380 \
--initial-cluster-state new \
--enable-v2

ydtf02 主机创建 start_etcd2.sh 启动脚本,如下:

1
2
3
4
5
6
7
8
9
etcd --name etcd_ydtf02 \
--initial-advertise-peer-urls http://192.168.2.12:2380 \
--listen-peer-urls http://192.168.2.12:2380 \
--listen-client-urls http://192.168.2.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.2.12:2379 \
--initial-cluster-token etcd-cluster-ydtf \
--initial-cluster etcd_ydtf01=http://192.168.2.11:2380,etcd_ydtf02=http://192.168.2.12:2380,etcd_ydtf03=http://192.168.2.13:2380 \
--initial-cluster-state new \
--enable-v2

ydtf03 主机创建 start_etcd3.sh 启动脚本,如下:

1
2
3
4
5
6
7
8
9
etcd --name etcd_ydtf03 \
--initial-advertise-peer-urls http://192.168.2.13:2380 \
--listen-peer-urls http://192.168.2.13:2380 \
--listen-client-urls http://192.168.2.13:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.2.13:2379 \
--initial-cluster-token etcd-cluster-ydtf \
--initial-cluster etcd_ydtf01=http://192.168.2.11:2380,etcd_ydtf02=http://192.168.2.12:2380,etcd_ydtf03=http://192.168.2.13:2380 \
--initial-cluster-state new \
--enable-v2

启动etcd,如下:

1
2
3
# ./start_etcd1.sh > start_etcd1.log 2>&1 &
# ./start_etcd2.sh > start_etcd2.log 2>&1 &
# ./start_etcd3.sh > start_etcd2.log 2>&1 &

Patroni 安装

检查并安装依赖包,如下:

1
2
# cd /opt/soft_bak/patroni-master/
# pip3 install patroni[etcd] -i https://mirrors.aliyun.com/pypi/simple/

安装 Patroni,如下:

1
2
# pip3 install psycopg2-binary -i https://mirrors.aliyun.com/pypi/simple/
# pip3 install patroni -i https://mirrors.aliyun.com/pypi/simple/

ydtf01主机配置pg_ydtf01.yml,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
scope: pg_ydtf
namespace: /service/
name: pg_ydtf01

restapi:
listen: 192.168.2.11:8008
connect_address: 192.168.2.11:8008

etcd:
#Provide host to do the initial discovery of the cluster topology:
host: 192.168.2.11:2379

bootstrap:
# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
# and all other cluster members will use it as a `global configuration`
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
#standby_cluster:
#host: 127.0.0.1
#port: 1111
#primary_slot_name: patroni
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: locical
hot_standby: "on"
wal_keep_segments: 128
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
archive_mode: "on"
# primary_conninfo: 'host=192.168.2.11 port=1921 user=repuser'
hot_standby: on
archive_timeout: 1800s

postgresql:
listen: 0.0.0.0:1921
connect_address: 192.168.2.11:1921
data_dir: /pgdata/pg12/pg_root
bin_dir: /opt/pgsql/bin
config_dir: /pgdata/pg12/pg_root
pgpass: /home/pg12/patroni/.pgpass
authentication:
replication:
username: repuser
password: xxx
superuser:
username: postgres
password: xxx
rewind: # Has no effect on postgres 10 and lower
username: rewind_user
password: xxx

tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false

包括全局参数、restapi模块参数、etcd模块参数、bootstrap启动参数、postgresql模块参数,主要参数解释如下:

  • scope: 标记cluster名称,同 postgresql.conf 的 cluster_name 参数,二级目录名: /<namespace>/<scope>/config
  • namespace: 一级目录名: /<namespace>/<scope>/config
  • name: patroni节点名称。

更多参数解释详见: YAML Configuration Settings

ydtf02、ydtf03的参数大部分和ydtf01相同,仅需修改全局参数name、restapi模块的listen和connect_address参数、etcd模块的host参数,以及postgresql模块的connect_address参数。

三台主机分别启动 patroni ,如下:

1
2
3
$patroni pg_ydtf01.yml> pg_ydtf01.log 2>&1 &
$patroni pg_ydtf02.yml> pg_ydtf02.log 2>&1 &
$patroni pg_ydtf03.yml> pg_ydtf03.log 2>&1 &

查看 patroni 集群,如下:

1
2
3
4
5
6
7
8
[pg12@ydtf01 patroni]$ patronictl -c /home/pg12/patroni/pg_ydtf01.yml list
+ Cluster: pg_ydtf (6823331807018309697) +---------+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+-----------+-------------------+--------+---------+----+-----------+-----------------+
| pg_ydtf01 | 192.168.2.11:1921 | Leader | running | 22 | | * |
| pg_ydtf02 | 192.168.2.12:1921 | | running | 22 | 0 | |
| pg_ydtf03 | 192.168.2.13:1921 | | running | 22 | 0 | * |
+-----------+-------------------+--------+---------+----+-----------+-----------------+

查看 Patroni 状态

查看 cluster 状态,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
[pg12@ydtf01 patroni]$ curl -s "http://192.168.2.11:8008/cluster" | jq .
{
"members": [
{
"name": "pg_ydtf01",
"role": "replica",
"state": "running",
"api_url": "http://192.168.2.11:8008/patroni",
"host": "192.168.2.11",
"port": 1921,
"timeline": 17,
"lag": 0
},
{
"name": "pg_ydtf02",
"role": "leader",
"state": "running",
"api_url": "http://192.168.2.12:8008/patroni",
"host": "192.168.2.12",
"port": 1921,
"timeline": 17
}
]
}

查看 patroni 节点状态,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
[pg12@ydtf01 patroni]$ curl -s "http://192.168.2.11:8008/patroni" | jq .
{
"state": "running",
"postmaster_start_time": "2020-05-25 19:36:09.182 CST",
"role": "replica",
"server_version": 120002,
"cluster_unlocked": false,
"xlog": {
"received_location": 335544320,
"replayed_location": 335550208,
"replayed_timestamp": null,
"paused": false
},
"timeline": 17,
"database_system_identifier": "6823331807018309697",
"patroni": {
"version": "1.6.5",
"scope": "pg_ydtf"
}
}

更多查询命令参考 Patroni REST API

常见错误

patroni 安装报错

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[root@ydtf04 ~]# pip3 install patroni[etcd] -i https://mirrors.aliyun.com/pypi/simple/
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting patroni[etcd]
Using cached https://mirrors.aliyun.com/pypi/packages/25/01/e4656c541ac648a530fc1b6094324969f9f2ed8d7005ad0fa2598cbf1199/patroni-1.6.5-py3-none-any.whl
Collecting psutil>=2.0.0 (from patroni[etcd])
Downloading https://mirrors.aliyun.com/pypi/packages/c4/b8/3512f0e93e0db23a71d82485ba256071ebef99b227351f0f5540f744af41/psutil-5.7.0.tar.gz (449kB)
|████████████████████████████████| 450kB 10.2MB/s
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.8 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-at3fg3sc/psutil/setup.py'"'"'; __file__='"'"'/tmp/pip-install-at3fg3sc/psutil/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /tmp/pip-install-at3fg3sc/psutil/
Complete output (11 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 20, in <module>
from setuptools.dist import Distribution, Feature
File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 35, in <module>
from setuptools import windows_support
File "/usr/local/lib/python3.8/site-packages/setuptools/windows_support.py", line 2, in <module>
import ctypes
File "/usr/local/lib/python3.8/ctypes/__init__.py", line 7, in <module>
from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 19.2.3, however version 20.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

解决方法: 安装 libffi-devel包,之后重新编译安装 python3.8 ,这个错误困扰了我好几天,感谢 AntdDB的孙郁帮助才解决此问题。

patroni 运行报错之一

patroni 启动报错,如下:

1
2
3
4
[pg12@ydtf02 patroni]$ patroni pg_ydtf02.yml
2020-04-29 08:25:22,183 ERROR: Failed to get list of machines from http://192.168.2.12:2379/v2: EtcdException('Bad response : 404 page not found\n')
2020-04-29 08:25:22,184 INFO: waiting on etcd
2020-04-29 08:25:27,193 ERROR: Failed to get list of machines from http://192.168.2.12:2379/v2: EtcdException('Bad response : 404 page not found\n')

Etcd3.4的默认API为v3版本,Patroni现阶段不支持 v3 API,因此在三台主机的etcd启动脚本里加上--enable-v2属性即可,参考https://dba.stackexchange.com/questions/256317/patroni-with-etcd-starting-up

patroni 运行报错之二

patroni 启动报错,如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
[pg12@ydtf01 patroni]$ patroni pg_ydtf01.yml
2020-05-05 22:22:26,559 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-05-05 22:22:26,567 INFO: establishing a new patroni connection to the postgres cluster
2020-05-05 22:22:27,226 INFO: establishing a new patroni connection to the postgres cluster
2020-05-05 22:22:27,230 WARNING: Retry got exception: 'connection problems'
192.168.2.11:1921 - accepting connections
Traceback (most recent call last):
File "/usr/local/bin/patroni", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/patroni/__init__.py", line 235, in main
return patroni_main()
File "/usr/local/lib/python3.8/site-packages/patroni/__init__.py", line 197, in patroni_main
patroni = Patroni(conf)
File "/usr/local/lib/python3.8/site-packages/patroni/__init__.py", line 36, in __init__
self.postgresql = Postgresql(self.config['postgresql'])
File "/usr/local/lib/python3.8/site-packages/patroni/postgresql/__init__.py", line 99, in __init__
self.set_role('master' if self.is_leader() else 'replica')
File "/usr/local/lib/python3.8/site-packages/patroni/postgresql/__init__.py", line 305, in is_leader
return bool(self._cluster_info_state_get('timeline'))
File "/usr/local/lib/python3.8/site-packages/patroni/postgresql/__init__.py", line 300, in _cluster_info_state_get
raise PostgresConnectionException(self._cluster_info_state['error'])
patroni.exceptions.PostgresConnectionException: "'Too many retry attempts'"

解决方法: 修改pg_ydtf01.yml文件的postgresql 模块参数 listen: 0.0.0.0:1921。

参考

最后推荐和张文升共同编写的《PostgreSQL实战》,本书基于PostgreSQL 10 编写,共18章,重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等,涵盖大量实战用例!

购买链接:https://item.jd.com/12405774.html

PostgreSQL实战
感谢支持!
0%