mysql group replication实践记录: 步骤, 问题和注意事项
- 依赖: group-replication 需要gtid支持,多主复制基于gtid格式的binlog
- 特性: 多主模式:
- 限制: group-replication 只适合单机房高速局域网部署
- 限制: auto increment 默认是7, 集群建立起来之后不能改
- 限制: 默认要设置为read-only
- 限制: 失联的节点不会自动加回到group里.
- 限制: 2个成员里kill 1个member不能被自动处理, 因为2 成员中1个member不能独立行程多数派,整个group会卡主,不接受任何写入
- 限制: 配置: 必须使用hostname
- 操作: my.cnf 中关于group replication的配置
- 操作: 初始化mysql user 或 password时必须禁止binlog, 否则互相复制时会出现binlog冲突
- 操作: 部署mysql group replication:
- 操作: 对group成员的管理
- 操作: trouble shooting: 都挂了之后的启动
- 参考:
Mysql group replication 提供了比binlog replication更强的一致性集群解决方案. 最近在项目中尝试了1下, 有好处也有小坑, 记录1下.
依赖: group-replication 需要gtid支持,多主复制基于gtid格式的binlog
In Group Replication, state transfers are fully based on binary logs with GTID positions.
特性: 多主模式:
mysql group replicaiton 支持两种模式: 单主和多主, 区别在于是否只允许primary 支持写入, 还是每个成员都可以写入.
-
不支持并发多个库上建立db/table:
都可以创建成功, 然后error.log里报binlog apply冲突, 导致group复制全部终止.不知道怎么恢复.
-
支持并发多个库上插入数据:
并发向多个member上写, 只有1个实例成功,在写入阶段就可以检测出冲突, binlog正常复制.
限制: group-replication 只适合单机房高速局域网部署
The MySQL Group Replication documentation isn’t very optimistic on WAN support, claiming that both “Low latency, high bandwidth network connections are a requirement” and “Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.”
限制: auto increment 默认是7, 集群建立起来之后不能改
http://mysqlhighavailability.com/mysql-group-replication-auto-increment-configuration-handling/
限制: 默认要设置为read-only
配置文件my.cnf里加配置:
super-read-only = ‘on’; # 防止刚刚启动的实例误写入没在group里的mysql.
在mysql group建立起来之后, 会自动清除这个标记.
限制: 失联的节点不会自动加回到group里.
在测试过程中, 一个节点失联后, primary会把它从组成员列表中去除. 必须手动加入.
Separated nodes that lose the quorum will be expelled from the cluster, and won’t join back automatically once the network connection is restored. In its error log we can see:
需要手动加回:
mysql> START GROUP_REPLICATION;
ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (5.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.96 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 | 3306 | ONLINE |
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d | 3306 | ONLINE |
| group_replication_applier | ae148d90-d6da-11e6-897e-0242ac130003 | 0af7a73f4d6b | 3306 | ONLINE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec
这里会有个问题: 失联节点还可以提供读操作
Moreover, in Group Replication a partitioned node keeps serving dirty reads as if nothing happened.
限制: 2个成员里kill 1个member不能被自动处理, 因为2 成员中1个member不能独立行程多数派,整个group会卡主,不接受任何写入
遇到mysql不能自动调整membership的情况,需要人为介入, 通过
SET GLOBAL group_replication_force_members=”127.0.0.1:10000,127.0.0.1:10001”;
强制把集群修改为摸个设置.来解决group内不能形成多数派的情况.
force_member之后, 其他的member启动后可以自动加入(如果启动的member上没有group中不包含的事务的话)
限制: 配置: 必须使用hostname
在my.cnf 中,必须加入 一个配置: report-host, 让mysql使用这个值作为能找到它的hostname, group-replication只使用hostname来找组成员.
如果不指定这个值, 默认的hostname如localhost会被使用,导致group成员互相找不到彼此.
把这个值设置成ip就可以了
[mysqld]
# let master use this address instead of
# default hostname, which might not be
# resolvable by master
report-host = your_ip
操作: my.cnf 中关于group replication的配置
# Configuring group_replication_start_on_boot instructs the plugin to not start
# operations automatically when the server starts. This is important when setting
# up Group Replication as it ensures you can configure the server before manually
# starting the plugin. Once the member is configured you can set
# group_replication_start_on_boot to on so that Group Replication starts
# automatically upon server boot.
# It shoudl be off when setting up a group.
# After that it should be `on` to let
# mysql auto starts group replicaton
# when restarted.
loose-group_replication_start_on_boot = "on"
loose-group_replication_local_address = "my_ip:0123"
loose-group_replication_group_seeds = "my_ip:0123,other_ip:0123"
# "on" for one server deploy only.
loose-group_replication_bootstrap_group = off
# group replication requires this
# if using multi worker to apply binlog
slave-preserve-commit-order = 1
# use multi primary mode
loose-group-replication-single-primary-mode = 0
# to prevent write on non-group-member
super-read-only = "on"
操作: 初始化mysql user 或 password时必须禁止binlog, 否则互相复制时会出现binlog冲突
SET SQL_LOG_BIN=0;
SET PASSWORD FOR "root"@"localhost" = "password";
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;'
操作: 部署mysql group replication:
配置binlog复制的channel(所有实例)
mysql -e 'CHANGE MASTER TO
MASTER_USER="replicator"
, MASTER_PASSWORD="replicator_password"
FOR CHANNEL "group_replication_recovery"'
检查group replication插件是否已经安装,没有则安装(只需执行一次)(所有实例)
mysql -e 'SELECT "ok" FROM mysql.plugin WHERE name = "group_replication"' \
| grep ok \
|| mysql -e 'INSTALL PLUGIN group_replication SONAME "group_replication.so"'
检查是否bootstrap过了(在一个实例上)
mysql -e 'SELECT "booted" FROM performance_schema.replication_group_members WHERE MEMBER_ID != "";' \
| grep booted
# bootstrap group replication, 如果没有启动过的话(在一个实例上)
mysql -e 'SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;'
检查group replicaiton 状态
mysql -e 'SELECT "started" FROM performance_schema.replication_connection_status WHERE SERVICE_STATE = "ON";' \
| grep started
其他成员加入(第一个实例之外)
mysql -e 'START GROUP_REPLICATION;'
操作: 对group成员的管理
- 通过配置来指定要加入的group的节点都有哪些, 原则上每个成员都使用相同的组配置:
loose-group_replication_group_seeds = "172.18.5.50:24999,172.18.5.135:24999,172.18.5.55:24999"
这些节点只要有1个能够联系到就能正确加入到group中.
加入之后的group membership(运行时的)可能和这个列表不同. mysql会动态调整
这个配置只影响
start group_replication
时去哪找集群去加入. -
mysql会动态的调整group membership,如:
- 发现一个member联系不到;
- member被正常kill掉;
- 网络中断等…
能被处理的membership变化, mysql会自动调整, error.log里会看到以下信息:
Plugin group_replication reported: 'Group membership changed to 172.18.5.55:3306 on view 15168862705781240:4.'
譬如 3个成员的group里1个member被kill掉.
用
SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member';
可以看到group从3个成员变成2个.
操作: trouble shooting: 都挂了之后的启动
group-replication-force-members 在所有机器都挂了重启时用不了.
报错是:
[ERROR] Plugin group_replication reported: 'group_replication_force_members must be empty on group start. Current value: '172.18.5.55:24999''
第1个机器起来时,无法加入到一个现有的group, 拿不到group members列表. group_replication不能正常启动.
这时, group-replication-force-members 也无法设置.
多数派都挂了后, 需要重新走1下bootstrap 流程:
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
参考:
battle-for-synchronous-replication-in-mysql-galera-vs-group-replication
Archive
- 15 Nov 2020 slimarray: gzip的压缩率, 即时访问
- 28 Oct 2020 200行代码实现基于paxos的kv存储
- 18 Oct 2020 后分布式时代: 多数派读写的'少数派'实现
- 20 Dec 2019 Art of Pull Requests(翻译)
- 21 Nov 2019 掐指算算: 你的CDN多花了几百万?
- 19 Nov 2019 一年的素描练习
- 30 Oct 2019 互联网中对象访问频率的91分布
- 09 Jan 2019 哄好面试官系列-1: 比较2个python dict(多级)是否相同
- 04 Nov 2018 存储中的文件合并策略优化
- 27 Sep 2018 软件工程是个面包机
- 26 Aug 2018 程序员必须知道的事情, 一般人我不告诉他
- 16 Aug 2018 cgexec 无法继承 LD_PRELOAD 环境变量
- 04 Aug 2018 mysql group replication实践记录: 步骤, 问题和注意事项
- 13 Feb 2018 枚举所有整勾股数
- 03 Feb 2018 ansible中的include, include_tasks 和 import_tasks 的差别
- 20 Nov 2017 python 并发subprocess.Popen的坑
- 05 Aug 2017 程序员必读: 摸清hash表的脾性
- 06 May 2017 python 进程内存增长问题, 解决方法和工具
- 01 Feb 2017 xp的分布式系统系列教程之: Erasure-Code: 工作原理, 数学解释, 实践和分析.
- 01 Feb 2017 xp的分布式系统系列教程之: Erasure-Code: 工作原理, 数学解释, 实践和分析.
- 11 Nov 2015 可靠分布式系统基础 Paxos 的直观解释
- 28 Jul 2015 socket关闭: close()和shutdown()的差异
- 17 May 2015 随手改变世界之 git-auto-squash
- 17 Feb 2015 Numbers Programmers Should Know About Hash
- 11 Feb 2015 Vim-tabbar: Simple, stupid and fast tab-bar for VIM
- 24 Jul 2014 1% 慢请求优化
- 31 Jan 2014 Some useful resources
- 31 Jan 2014 jobq.py -- Queue processing engine