본문 바로가기

Database/MySQL

keepalived를 이용한 mysql failover

개요

위 그림에서 SQLNode1, SQLNode2 에 keepalived 를 설치한다.

keepalived 에는 다음과 같은 동작을 기대한다.

  • SQLNode1 과 SQLNode2 의 mysql daemon 의 health check 를 한다.
  • SQLNode1 (master) 의 health check 실패시 VIP 로 들어오는 요청을 SQLNode2(backup) 로 보낸다.
  • SQLNode2(backup) 으로 요청이 들어 오는 경우 다시 VIP 로 redirect 한다.
    • 스케줄링에 의해 backup 쪽으로 요청이 오면 이것을 master 로 돌리기 위해
    • master 에만 전달하는게 안되는거 같음( 옵션확인 필요)

설치

yum 을 이용하여 keepalived 패키지를 설치한다.

[root@sql1 ~]# yum install keepalived
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirror.kakao.com
 * extras: mirror.kakao.com
 * updates: ftp.kaist.ac.kr
Resolving Dependencies
--> Running transaction check
---> Package keepalived.x86_64 0:1.3.5-8.el7_6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================
 Package                 Arch                Version                       Repository            Size
======================================================================================================
Installing:
 keepalived              x86_64              1.3.5-8.el7_6                 updates              329 k

Transaction Summary
======================================================================================================
Install  1 Package

Total download size: 329 k
Installed size: 1.0 M
Is this ok [y/d/N]: y
Downloading packages:
keepalived-1.3.5-8.el7_6.x86_64.rpm                                            | 329 kB  00:00:00    
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : keepalived-1.3.5-8.el7_6.x86_64                                                    1/1
  Verifying  : keepalived-1.3.5-8.el7_6.x86_64                                                    1/1

Installed:
  keepalived.x86_64 0:1.3.5-8.el7_6                                                                  

Complete!
[root@sql1 ~]#

keepalived.conf 설정

/etc/keepalived/keepalived.conf 를 설정한다.
아래는 SQLNode1 의 설정이다.

global_defs {
   notification_email {
   }
   router_id LVS_DEVEL
   script_user root
}

vrrp_instance VI_1 {
    state BACKUP
    interface enp0s8
        lvs_sync_daemon_interface enp0s8
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
                192.168.56.100 label enp0s8:1
    }
        notify_master "/etc/keepalived/iptables.sh 192.168.56.100 master"
        notify_backup "/etc/keepalived/iptables.sh 192.168.56.100 backup"
}

virtual_server 192.168.56.100 3306 {
        delay_loop 6
        lb_algo rr
        lb_kind DR
        protocol TCP
    real_server 192.168.56.3 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/mysql-check.sh 192.168.56.3"
            misc_timeout 5
        }      
    }
    real_server 192.168.56.4 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/mysql-check.sh 192.168.56.4"
            misc_timeout 5
        }      
    }
}

아래는 SQLNode2 의 설정

global_defs {
   notification_email {
   }
   router_id LVS_DEVEL
   script_user root
}

vrrp_instance VI_1 {
    state BACKUP
    interface enp0s8
    lvs_sync_daemon_interface enp0s8
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.100 label enp0s8:1
    }
    notify_master "/etc/keepalived/iptables.sh 192.168.56.100 master"
    notify_backup "/etc/keepalived/iptables.sh 192.168.56.100 backup"
}

virtual_server 192.168.56.100 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    protocol TCP
    real_server 192.168.56.3 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/mysql-check.sh 192.168.56.3"
            misc_timeout 5
        }      
    }

    real_server 192.168.56.4 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/mysql-check.sh 192.168.56.4"
            misc_timeout 5
        }      
    }
}

마스터로 사용할 vrrp instance 의 priority 를 더 높게 설정하고 나머지는 같다.
각 옵션에 대한 내용은 keepalived 가 채택이 되었을때 ...

iptables.sh 스크립트

아래 스크립트는 자신이 backup 상태라면 요청을 master 에게 redirect 하기 위한 설정이다.
아마도 keepalived 의 스케줄링 정책상 Active-Stanby 로는 설정이 안되기에 사용하는 편법 같다.
테스트 결과 예상과는 다르게 backup 쪽으로도 호출이 되는데 iptables 부터 다시 확인이 필요하다.

#!/bin/bash

case $2 in

backup)

/sbin/iptables -t nat -A PREROUTING -d $1 -p tcp -j REDIRECT

;;

master)

/sbin/iptables -t nat -D PREROUTING -d $1 -p tcp -j REDIRECT

;;

esac

mysql-check.sh 스크립트

아래 스크립트는 mysql health check 용이다.

#!/bin/bash

/home/chlee/mysqlc/bin/mysql --host=$1 --user=hcheck --password=password -Nse "select 1 from dual"

실행하기 전에 health check user 를 생성해야 한다.

mysql> grant select on *.* to 'hcheck'@'%' identified by 'password';

Query OK, 0 rows affected (0.00 sec)

keepalived 실행하기

root 권한으로 keepalived 를 실행한다.

[root@sql1 keepalived]# systemctl restart keepalived

/vat/log/messages 에서 로그를 확인한다.

Mar 26 17:36:55 sql1 Keepalived[20457]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Mar 26 17:36:55 sql1 Keepalived[20457]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 26 17:36:55 sql1 systemd: PID file /var/run/keepalived.pid not readable (yet?) after start.
Mar 26 17:36:55 sql1 Keepalived[20459]: Starting Healthcheck child process, pid=20460
Mar 26 17:36:55 sql1 systemd: Started LVS and VRRP High Availability Monitor.
Mar 26 17:36:55 sql1 Keepalived[20459]: Starting VRRP child process, pid=20461
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: remove_script 1 0
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Setting uid.gid
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Leaving misc_end_handler
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: remove_script 1 0
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Setting uid.gid
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Leaving misc_end_handler
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: SECURITY VIOLATION - check scripts are being executed but script_security not enabled.
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Activating healthchecker for service [192.168.56.100]:3306
Mar 26 17:36:55 sql1 Keepalived_healthcheckers[20460]: Activating healthchecker for service [192.168.56.100]:3306
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: Registering Kernel netlink reflector
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: Registering Kernel netlink command channel
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: Registering gratuitous ARP shared channel
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: (VI_1): Specifying lvs_sync_daemon_interface against a vrrp is deprecated.
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]:        Please use global lvs_sync_daemon
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) removing protocol VIPs.
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: Using LinkWatch kernel netlink reflector...
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar 26 17:36:55 sql1 Keepalived_vrrp[20461]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Mar 26 17:36:55 sql1 kernel: IPVS: sync thread started: state = BACKUP, mcast_ifn = enp0s8, syncid = 255, id = 0
Mar 26 17:36:56 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) forcing a new MASTER election
Mar 26 17:36:57 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 26 17:36:58 sql1 kernel: IPVS: stopping backup sync thread 20462 ...
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:36:58 sql1 avahi-daemon[2771]: Registering new address record for 192.168.56.100 on enp0s8.IPv4.
Mar 26 17:36:58 sql1 kernel: IPVS: sync thread started: state = MASTER, mcast_ifn = enp0s8, syncid = 255, id = 0
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s8 for 192.168.56.100
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100
Mar 26 17:37:03 sql1 Keepalived_vrrp[20461]: Sending gratuitous ARP on enp0s8 for 192.168.56.100

테스트할때 주의점은 keepalived 가 서비스 데몬과 같은 machine 에 설치된 경우
test client 는 반드시 다른 machine 에서 테스트를 해야 한다.
VIP 설정과 스케줄링에 의해 패킷이 꼬일 수 있다.

Mar 26 16:45:55 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:45:55 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:45:56 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:45:56 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:45:58 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:45:58 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:45:58 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:45:58 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:46:00 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:46:00 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:46:01 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:46:01 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:46:02 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8
Mar 26 16:46:02 sql1 kernel: ll header: 00000000: 08 00 27 e5 8c eb 08 00 27 7f 33 24 08 00 ..'.....'.3$..
Mar 26 16:46:06 sql1 kernel: IPv4: martian source 192.168.56.100 from 192.168.56.100, on dev enp0s8

failover 테스트

테스트 프로그램을 통해 failover 수행을 확인한다.
아래는 1나의의 클라이언트에서 commit interval 1 로 100000 건을 insert 동작을 수행한다.
테스트 프로그램을 VIP (192.168.56.100) 만을 보고 있다.

[chlee@base single_dml]$ ./run.sh mytest 1 1 100000 1

UNIT=100000

mytest 1 0 1 100000 1

중간에 sqlnode1 을 셧다운 한다.

[chlee@sql1 mysqlc]$ mysql.sh stop

Enter password:

[chlee@sql1 mysqlc]$ 2019-03-27T05:53:56.456575Z mysqld_safe mysqld from pid file /home/chlee/mysqlc/data/sql1.pid ended

테스트 프로그램의 로그를 통해 중간에 connection lost 오류를 확인하고 재접속을 시도 한다.

[2019-03-27 14:59:16]: [0]Start Insert...[1]
[2019-03-27 14:59:32]: (STMT)ERROR 2013 (HY000): Lost connection to MySQL server during query
[2019-03-27 14:59:32]:
[Reconnect]
[2019-03-27 14:59:33]: [0]Start Insert...[22622]
[2019-03-27 15:00:40]: [0]Finished!

다음은 sqlnode1 의 shutdown 에 대해 keepalived 의 로그이다.

# /var/log/message 의 해당 로그이다.
# mysql health check 실패시 해당 process 의 exit 로그 (3,4 중 하나라도 실패하면 발생한다. )
Mar 27 14:59:38 sql1 Keepalived_healthcheckers[20460]: pid 14417 exited with status 1 

# health check 가 실패한 mysql 의 정보
Mar 27 14:59:38 sql1 Keepalived_healthcheckers[20460]: Misc check to [192.168.56.3] for [/etc/keepalived/mysql-check.sh 192.168.56.3] failed.

# 3번서버( sqlnode1 ) 을 virtual server 100 번에서 제외시킴
Mar 27 14:59:38 sql1 Keepalived_healthcheckers[20460]: Removing service [192.168.56.3]:3306 from VS [192.168.56.100]:3306





# backup 쪽에도 같은 로그가 있다.

Mar 27 14:59:33 sql2 Keepalived_healthcheckers[20944]: pid 15229 exited with status 1
Mar 27 14:59:33 sql2 Keepalived_healthcheckers[20944]: Misc check to [192.168.56.3] for [/etc/keepalived/mysql-check.sh 192.168.56.3] failed.
Mar 27 14:59:33 sql2 Keepalived_healthcheckers[20944]: Removing service [192.168.56.3]:3306 from VS [192.168.56.100]:3306

Reference

iptables

iptables -I INPUT -p vrrp -j ACCEPT
iptables -I OUTPUT -p vrrp -j ACCEPT
iptables -I INPUT -p tcp --dport 3306 -j ACCEPT
iptables -I INPUT -p tcp --dport 1186 -j ACCEPT
iptables -t nat -A PREROUTING -d 192.168.56.100 -p tcp -j REDIRECT