DRBD UpToDate/DUnknown Failure Recovery

The failure looks like this:

root@drbd1:~# drbd-overview
 0:data/0  StandAlone Primary/Unknown UpToDate/DUnknown /data/mysql ext3 3.9G 8.1M 3.7G 1%
root@drbd2:~# drbd-overview
 0:data/0  StandAlone Primary/Unknown UpToDate/DUnknown /data/mysql ext3 3.9G 8.1M 3.7G 1%

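State StandAlone: no network configuration is available (there is no usable replication or synchronization link). The resource has not been connected yet, or an administrator disconnected it with drbdadm disconnect <resource>, or the connection was dropped because of failed authentication or a split brain.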
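
To check this state from a script rather than by eye, the connection state can be read from the second column of the drbd-overview output shown above. The following is a minimal sketch; the function names drbd_cstate and drbd_is_standalone are made up for illustration and are not part of DRBD.

```shell
# Print the connection state (second field) of every resource line
# read from stdin, e.g. the output of `drbd-overview`.
drbd_cstate() {
    awk '{ print $2 }'
}

# Succeed when at least one resource on stdin reports StandAlone.
drbd_is_standalone() {
    drbd_cstate | grep -qx 'StandAlone'
}
```

Possible usage: drbd-overview | drbd_is_standalone && echo "resource is StandAlone"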

Check the log:

root@drbd1:~# tail -n 20 /var/log/syslog
May 23 20:34:41 drbd1 kernel: [ 4629.177175] drbd data: Peer authenticated using 20 bytes HMAC
May 23 20:34:41 drbd1 kernel: [ 4629.177389] drbd data: conn( WFConnection -> WFReportParams )
May 23 20:34:41 drbd1 kernel: [ 4629.177391] drbd data: Starting asender thread (from drbd_r_data [10450])
May 23 20:34:41 drbd1 kernel: [ 4629.186967] block drbd0: drbd_sync_handshake:
May 23 20:34:41 drbd1 kernel: [ 4629.186970] block drbd0: self B4EF9EF8D6B328BD:1E9AC6C2E7980795:4B519345CD4008DE:4B509345CD4008DE bits:1024 flags:0
May 23 20:34:41 drbd1 kernel: [ 4629.186972] block drbd0: peer 7B0DFE0CF2812103:1E9AC6C2E7980794:4B519345CD4008DE:4B509345CD4008DE bits:1 flags:2
May 23 20:34:41 drbd1 kernel: [ 4629.186973] block drbd0: uuid_compare()=100 by rule 90
May 23 20:34:41 drbd1 kernel: [ 4629.186976] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
May 23 20:34:41 drbd1 kernel: [ 4629.188312] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
May 23 20:34:41 drbd1 kernel: [ 4629.188324] block drbd0: Split-Brain detected but unresolved, dropping connection!
May 23 20:34:41 drbd1 kernel: [ 4629.189831] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 23 20:34:41 drbd1 kernel: [ 4629.191008] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 23 20:34:41 drbd1 kernel: [ 4629.191028] drbd data: conn( WFReportParams -> Disconnecting )
May 23 20:34:41 drbd1 kernel: [ 4629.191030] drbd data: error receiving ReportState, e: -5 l: 0!
May 23 20:34:41 drbd1 kernel: [ 4629.191496] drbd data: asender terminated
May 23 20:34:41 drbd1 kernel: [ 4629.191497] drbd data: Terminating drbd_a_data
May 23 20:34:41 drbd1 kernel: [ 4629.218488] drbd data: Connection closed
May 23 20:34:41 drbd1 kernel: [ 4629.218551] drbd data: conn( Disconnecting -> StandAlone )
May 23 20:34:41 drbd1 kernel: [ 4629.218553] drbd data: receiver terminated
May 23 20:34:41 drbd1 kernel: [ 4629.218554] drbd data: Terminating drbd_r_data
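
The decisive line in this log is "Split-Brain detected but unresolved, dropping connection!", after which the connection goes to StandAlone. A small sketch for finding that message without reading the whole log follows; the function name drbd_split_brain_count is hypothetical, and the default path is simply the syslog file used above.

```shell
# Count kernel log lines that report a detected split brain.
# $1 is the log file to search; defaults to the syslog path used above.
drbd_split_brain_count() {
    grep -c 'Split-Brain detected' "${1:-/var/log/syslog}"
}
```

A non-zero count suggests that the StandAlone state was caused by a split brain rather than by a manual disconnect.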

Check the service status:

root@drbd1:~# service drbd status
drbd driver loaded OK; device status:
version: 8.4.5 (api:1/proto:86-101)
srcversion: 5A4F43804B37BB28FCB1F47
m:res   cs          ro               ds                 p       mounted  fstype
0:data  StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext3

Here drbd1 is the primary node and drbd2 is the standby node.

Solution:

1. Make sure all DRBD devices are unmounted

root@drbd1:~# umount /dev/drbd0
root@drbd2:~# umount /dev/drbd0

2. Set all nodes to Secondary

root@drbd1:~# drbdadm secondary data
root@drbd2:~# drbdadm secondary data

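3. Disconnect the nodes (the resource is already StandAlone here, which is most likely why the command below fails with "unknown connection")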

root@drbd2:~# drbdadm disconnect data
??: Failure: (162) Invalid configuration request
additional info from kernel:
unknown connection
Command 'drbdsetup-84 disconnect ipv4:10.11.8.158:7789 ipv4:10.11.8.145:7789' terminated with exit code 10

4. Run on drbd2

root@drbd2:~# drbdadm connect data --discard-my-data
root@drbd2:~# drbd-overview
 0:data/0  WFConnection Secondary/Unknown UpToDate/DUnknown

State WFConnection: this node waits until the peer node becomes reachable over the network.
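
Instead of running drbd-overview repeatedly, the connection state can be polled until it reaches Connected. The sketch below assumes that drbdadm cstate <resource> is available (it prints the connection state of a resource); get_cstate and wait_connected are illustrative names, and the attempt count and delay are arbitrary defaults.

```shell
# Query the connection state of a resource, e.g. StandAlone,
# WFConnection, SyncTarget or Connected.
get_cstate() {
    drbdadm cstate "$1"
}

# Poll until the resource reports Connected, or give up.
# $1 = resource, $2 = max attempts (default 30), $3 = seconds between polls (default 1)
wait_connected() {
    res=$1
    tries=${2:-30}
    delay=${3:-1}
    while [ "$tries" -gt 0 ]; do
        [ "$(get_cstate "$res")" = "Connected" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

Possible usage: wait_connected data 60 2 || echo "data is still not Connected". While a resync is running the state is SyncSource or SyncTarget, so the function only returns success once the resync has finished.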

5. Run on drbd1

root@drbd1:~# drbdadm connect data
root@drbd1:~# drbd-overview
 0:data/0  Connected Secondary/Secondary UpToDate/UpToDate

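The state is back to normal. Both nodes are now Secondary, so drbd1 still has to be promoted with drbdadm primary data and /dev/drbd0 mounted again on /data/mysql before the service can use the data.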
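
The steps above can be collected into two small functions, one per node. This is only a sketch under several assumptions: RES, DEV, run, recover_victim and recover_survivor are names invented here, the failed disconnect from step 3 is left out, there is no error handling, and --discard-my-data is placed before the resource name, which is the form used in the DRBD 8.4 User's Guide (the transcript above placed it after). Setting DRY_RUN=1 only prints the commands.

```shell
RES="${RES:-data}"        # resource name from the transcript
DEV="${DEV:-/dev/drbd0}"  # DRBD device from the transcript

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the node whose changes are discarded (drbd2 above).
recover_victim() {
    run umount "$DEV"
    run drbdadm secondary "$RES"
    run drbdadm connect --discard-my-data "$RES"
}

# Run on the node whose data is kept (drbd1 above).
recover_survivor() {
    run umount "$DEV"
    run drbdadm secondary "$RES"
    run drbdadm connect "$RES"
}
```

Running the victim side first mirrors the order used in steps 4 and 5. Everything written on the victim since the split brain is lost, so it is worth checking with DRY_RUN=1 on the correct host before running it for real.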