ヨシのエンジニア備忘録 (Yoshi's Engineering Notes)

Clustering with Pacemaker + Corosync on Scientific Linux 6.0

After working through various sites, I finally got this built.
I'll write it up more cleanly later, but here are my notes while I still remember.

Pacemaker + Corosync

1. Disable SELinux
I got stuck on this one... orz

# vi /etc/selinux/config
SELINUX=enforcing
↓ change to
SELINUX=disabled

The new setting takes effect after a reboot.
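
The same edit can be scripted instead of done in vi. This is only a rough sketch: the function names disable_selinux_in and selinux_mode_in are my own, and it assumes GNU sed (for -i), which is what SL6 ships.

```shell
#!/bin/sh
# Sketch: set SELINUX=disabled in an SELinux config file.
# Both function names are illustrative helpers, not standard tools.

disable_selinux_in() {
    conf="${1:-/etc/selinux/config}"
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comment lines
    # do not match the pattern and are left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

selinux_mode_in() {
    conf="${1:-/etc/selinux/config}"
    # Print the value currently configured on the SELINUX= line.
    sed -n 's/^SELINUX=//p' "$conf"
}
```

Running disable_selinux_in with no argument (as root) edits /etc/selinux/config; passing a path lets you try it on a copy first.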

2. Edit the hosts file
# vi /etc/hosts
Append the following:

192.168.123.1 sl6-1
192.168.123.2 sl6-2
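
If you would rather not edit the file by hand, something like the following appends the entries only when they are missing. It is a sketch; add_host_entry is a made-up helper name, and the optional third argument exists so it can be tried on a scratch file.

```shell
#!/bin/sh
# Sketch: append "IP NAME" to a hosts file unless that mapping already exists.
# add_host_entry is an illustrative helper, not a standard command.

add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # awk exits 0 when a line already maps exactly this IP to this name.
    if ! awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

For this setup that would be add_host_entry 192.168.123.1 sl6-1 and add_host_entry 192.168.123.2 sl6-2, run as root.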

3. Install from the standard SL packages

# yum install corosync pacemaker
▼ Installed packages
corosync-1.2.3-21.el6.x86_64
pacemaker-1.1.2-7.el6.x86_64
corosynclib-1.2.3-21.el6.x86_64
pacemaker-libs-1.1.2-7.el6.x86_64

4. Edit the configuration

# cd /etc/corosync/
# cp -p corosync.conf.example corosync.conf
# vi corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.123.0
        mcastaddr: 226.94.1.1
        # mcastport: 4000
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
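
A note on bindnetaddr: it is the network address of the interconnect, not the node's own IP. A rough sketch of how to derive it follows. The network_addr helper is my own invention, and the /24 prefix for the 192.168.123.x network is an assumption on my part (the hosts entries above do not state the netmask).

```shell
#!/bin/sh
# Sketch: derive the network address (the bindnetaddr value) from an
# interface address and a prefix length. network_addr is an illustrative
# helper, not part of corosync.

network_addr() {
    ip="$1"
    prefix="$2"
    # Split the dotted quad into the four positional parameters.
    old_ifs="$IFS"
    IFS=.
    set -- $ip
    IFS="$old_ifs"
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # 4294967295 is 0xFFFFFFFF; keep only the top $prefix bits.
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    net=$(( $addr & $mask ))
    printf '%d.%d.%d.%d\n' \
        $(( ($net >> 24) & 255 )) $(( ($net >> 16) & 255 )) \
        $(( ($net >> 8) & 255 )) $(( $net & 255 ))
}
```

With the assumed /24, network_addr 192.168.123.1 24 prints 192.168.123.0, which matches the value used in the config above.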

Change ver from 0 to 1. I got stuck on this one too...

# vi /etc/corosync/service.d/pcmk
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}

If you don't change it, you get an error like the following.

Jan 29 11:02:03 sl6 pacemakerd: [1729]: ERROR: read_config: We can only start Pacemaker from init if using version 1 of the Pacemaker plugin for Corosync. Terminating.
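
To catch this before starting the daemons, a quick check of the ver value might look like the following. It is just a sketch; pcmk_plugin_ver is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the "ver:" value from the Pacemaker plugin service file.
# pcmk_plugin_ver is an illustrative helper, not a corosync/pacemaker tool.

pcmk_plugin_ver() {
    file="${1:-/etc/corosync/service.d/pcmk}"
    # Take the first line that starts (after optional whitespace) with "ver:";
    # comment lines start with "#" and therefore never match.
    sed -n 's/^[[:space:]]*ver:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1
}
```

For example: [ "$(pcmk_plugin_ver)" = "1" ] || echo "ver is not 1, pacemakerd will refuse to start from init"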

5. Start corosync and pacemaker

# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager (pacemakerd): [ OK ]

Enable automatic startup at boot

# chkconfig corosync on
# chkconfig pacemaker on

6. Verify startup
You can see that sl6-1 and sl6-2 have been recognized as nodes automatically.

# crm_mon -1
============
Last updated: Sun Jan 29 20:48:21 2012
Stack: openais
Current DC: sl6-1 - partition with quorum
Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ sl6-1 sl6-2 ]
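
If you want to check this from a script rather than by eye, the Online line can be pulled out of the output. A sketch only: online_nodes and online_count are made-up helper names, and the parsing assumes the output format shown above for this Pacemaker version.

```shell
#!/bin/sh
# Sketch: extract the online node names from "crm_mon -1" output on stdin.
# online_nodes and online_count are illustrative helpers.

online_nodes() {
    # Matches a line such as: "Online: [ sl6-1 sl6-2 ]"
    sed -n 's/^Online: \[ \(.*\) \][[:space:]]*$/\1/p'
}

online_count() {
    # Count the whitespace-separated node names.
    online_nodes | wc -w | tr -d ' '
}
```

Usage would be along the lines of crm_mon -1 | online_nodes.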

7. Configure resources
▼ Cluster-wide settings

# crm configure property no-quorum-policy="ignore" stonith-enabled="false"

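no-quorum-policy: apparently this should be set to ignore for a two-node cluster (when one node goes down, the survivor loses quorum and would otherwise stop its resources)
stonith-enabled: disables the STONITH feature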

▼ Default values for resource behavior

# crm configure rsc_defaults resource-stickiness="INFINITY" migration-threshold="2"

resource-stickiness: with INFINITY, a running resource is not failed back automatically
migration-threshold: the number of failures on a node after which the resource is moved off that node

▼ Virtual IP address resource

# crm configure primitive res_vip ocf:heartbeat:IPaddr2 \
params ip=192.168.11.1 nic="eth0" cidr_netmask="24" op monitor interval="10s"

▼ Ping-based health monitoring of the gateway

# crm configure primitive res_ping ocf:pacemaker:ping params \
name="default_ping_set" host_list="192.168.11.254" multiplier="100" \
dampen="1" op monitor interval="10s" timeout="60" op start timeout="60"

▼ Clone the gateway ping monitor so that it runs on every node

# crm configure clone clone_ping res_ping

▼ Fail the VIP over when the ping score drops below 100

# crm configure location vip_location res_vip rule -inf: not_defined default_ping_set or default_ping_set lt 100
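
How the numbers fit together: the ping agent sets default_ping_set to multiplier × the number of reachable hosts in host_list, so with one gateway and multiplier="100" the value is either 100 or 0. The following is only a toy model of that arithmetic and of the location rule, not how Pacemaker evaluates it internally; ping_score and vip_banned are made-up names.

```shell
#!/bin/sh
# Toy model of the ping attribute and the location rule above.
# ping_score and vip_banned are illustrative helpers, not Pacemaker commands.

ping_score() {
    # $1: number of reachable hosts in host_list, $2: multiplier
    echo $(( $1 * $2 ))
}

vip_banned() {
    # $1: attribute value (empty string means "not defined")
    # $2: threshold, 100 by default to mirror "lt 100" in the rule
    value="$1"
    threshold="${2:-100}"
    if [ -z "$value" ] || [ "$value" -lt "$threshold" ]; then
        echo yes    # the rule matches, so the node gets -inf for res_vip
    else
        echo no
    fi
}
```

With the gateway reachable, ping_score 1 100 gives 100 and vip_banned 100 answers no; once the gateway is unreachable the score is 0 and the node is banned. Note that a value of exactly 100 is not banned, since the rule uses lt rather than lte.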

8. Verify operation

# crm_mon -A
Last updated: Sun Jan 29 19:18:33 2012
Stack: openais
Current DC: sl6-1 - partition with quorum
Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ sl6-1 sl6-2 ]
res_vip (ocf::heartbeat:IPaddr2): Started sl6-1
Clone Set: clone_ping
Started: [ sl6-1 sl6-2 ]
Node Attributes:
* Node sl6-1:
+ default_ping_set : 100
* Node sl6-2:
+ default_ping_set : 100
Failed actions:
res_ping_monitor_0 (node=sl6-1, call=15, rc=5, status=complete): not installed
res_ping_monitor_0 (node=sl6-2, call=10, rc=5, status=complete): not installed

Try disconnecting only the eth0 side.
Pings to the gateway then stop getting through, and the VIP fails over.
You can see that the value of default_ping_set has changed from 100 to 0.

# crm_mon -A
============
Last updated: Sun Jan 29 19:12:36 2012
Stack: openais
Current DC: sl6-1 - partition with quorum
Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ sl6-1 sl6-2 ]
res_vip (ocf::heartbeat:IPaddr2): Started sl6-2
Clone Set: clone_ping
Started: [ sl6-1 sl6-2 ]
Node Attributes:
* Node sl6-1:
+ default_ping_set : 0 : Connectivity is lost
* Node sl6-2:
+ default_ping_set : 100
Failed actions:
res_ping_monitor_0 (node=sl6-1, call=15, rc=5, status=complete): not installed
res_ping_monitor_0 (node=sl6-2, call=10, rc=5, status=complete): not installed
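
To see where the VIP currently lives without reading the whole screen, the resource line can be filtered out. A sketch, assuming the output format shown above; resource_node is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the node a resource is started on, from crm_mon output on stdin.
# resource_node is an illustrative helper, not a Pacemaker command.

resource_node() {
    rsc="$1"
    # Matches lines such as: "res_vip (ocf::heartbeat:IPaddr2): Started sl6-2"
    # NF >= 3 guards against blank and short lines before touching $(NF-1).
    awk -v rsc="$rsc" 'NF >= 3 && $1 == rsc && $(NF-1) == "Started" { print $NF }'
}
```

For example, crm_mon -1 | resource_node res_vip should print sl6-1 before the cable is pulled and sl6-2 afterwards.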

Looking at sl6-2, you can see that the virtual IP address has moved over, as follows.
Note that an address added by the IPaddr2 resource does not show up in ifconfig output.

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:ee:31:66 brd ff:ff:ff:ff:ff:ff
inet 192.168.11.3/24 brd 192.168.11.255 scope global eth0
inet 192.168.11.1/24 brd 192.168.11.255 scope global secondary eth0
inet6 fe80::a00:27ff:feee:3166/64 scope link
valid_lft forever preferred_lft forever
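
The same check can be scripted by looking for the VIP among the inet lines. This is a rough sketch; has_address is a made-up helper name.

```shell
#!/bin/sh
# Sketch: succeed if the given IPv4 address appears in "ip addr show" output
# read from stdin. has_address is an illustrative helper.

has_address() {
    want="$1"
    # Compare the address part (before the "/") of every "inet" line.
    awk -v want="$want" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }'
}
```

Usage: ip addr show eth0 | has_address 192.168.11.1 && echo "VIP is on this node"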

■ References
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05.html#
http://linux-ha.sourceforge.jp/wp/archives/1178/2
http://gihyo.jp/admin/serial/01/pacemaker/0004
http://d.hatena.ne.jp/kaze-kaoru/20110901/1314847758

Sunday, January 29, 2012
