Data Load High Availability Case - Log Management System Configuration Manual

Cong Li - Jul 18 - Dev Community

1. Environment Preparation

  1. Select two servers with identical configurations as load machines, and install the same OS, Red Hat Enterprise Linux 6.5, on both nodes.

  2. Disable the firewall on each node. The following commands keep it from starting at boot:

    # chkconfig iptables off
    # chkconfig ip6tables off
    

    After restarting, you can verify that the firewall is disabled with:

    # chkconfig --list iptables
    iptables        0:off  1:off  2:off  3:off  4:off  5:off  6:off
    
    # chkconfig --list ip6tables
    ip6tables       0:off  1:off  2:off  3:off  4:off  5:off  6:off
    
  3. Disable SELinux on each node by setting SELINUX=disabled in the /etc/sysconfig/selinux file:

    # vi /etc/sysconfig/selinux
    
    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    
    # SELINUXTYPE= can take one of these two values:
    #     targeted - Targeted processes are protected,
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted
    

    After restarting, confirm that SELinux is disabled with:

    # sestatus
    SELinux status:                 disabled
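
The two verification steps above can be scripted so that they are easy to repeat on each node. The following is a minimal sketch, not part of the original manual: the function names runlevels_all_off and selinux_disabled are hypothetical helpers that only parse the output of the chkconfig and sestatus commands already shown.

```shell
# runlevels_all_off: reads `chkconfig --list <service>` output on stdin.
# Succeeds only if at least one line was read and no runlevel is "on".
runlevels_all_off() {
    awk 'NF { seen = 1 } /[0-6]:on/ { on = 1 } END { exit (seen && !on) ? 0 : 1 }'
}

# selinux_disabled: reads `sestatus` output on stdin.
# Succeeds if the status line reports "disabled".
selinux_disabled() {
    grep -Eq '^SELinux status:[[:space:]]+disabled$'
}

# Usage sketch (run on each node after the reboot):
#   chkconfig --list iptables  | runlevels_all_off || echo "iptables still enabled"
#   chkconfig --list ip6tables | runlevels_all_off || echo "ip6tables still enabled"
#   sestatus | selinux_disabled || echo "SELinux is not disabled"
```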
    

2. Allocate Physical Partitions

Allocate a physical partition (e.g., /dev/sdb1) on each load machine. Both partitions must have the same capacity.
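
One way to confirm the capacity requirement is to compare the partition sizes in bytes. This is a sketch under assumptions: same_capacity is a hypothetical helper, and the usage lines assume the sizes are read with blockdev --getsize64 as root (the ssh call will prompt for a password until SSH trust is set up in step 4).

```shell
# same_capacity BYTES_A BYTES_B
# Succeeds only when both arguments are non-empty digit strings and are equal.
same_capacity() {
    case "$1" in ''|*[!0-9]*) return 2 ;; esac
    case "$2" in ''|*[!0-9]*) return 2 ;; esac
    [ "$1" = "$2" ]
}

# Usage sketch (the peer IP is the one used in this manual):
#   local_size=$(blockdev --getsize64 /dev/sdb1)
#   peer_size=$(ssh root@132.121.12.174 blockdev --getsize64 /dev/sdb1)
#   same_capacity "$local_size" "$peer_size" || echo "partition sizes differ"
```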

3. Configure Hostnames

Add both machines (IPs: 132.121.12.173 and 132.121.12.174) to /etc/hosts on each node so that the hostnames resolve:

# On 132.121.12.173
[root@132.121.12.173 ~]# vim /etc/hosts
132.121.12.173  OCSJZ13
132.121.12.174  OCSJZ14

# On 132.121.12.174
[root@132.121.12.174 ~]# vim /etc/hosts
132.121.12.173  OCSJZ13
132.121.12.174  OCSJZ14
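
Because DRBD and the later ssh commands refer to the nodes by hostname, it can help to check the entries before continuing. The helper below is a hypothetical sketch, not part of the original procedure; it only reads a hosts-format file.

```shell
# hosts_has_entry FILE IP NAME
# Succeeds if FILE contains a line that maps IP to NAME (hosts-file format).
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage sketch (run on both nodes):
#   hosts_has_entry /etc/hosts 132.121.12.173 OCSJZ13 || echo "OCSJZ13 missing"
#   hosts_has_entry /etc/hosts 132.121.12.174 OCSJZ14 || echo "OCSJZ14 missing"
```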

4. Set Up SSH Trust

Establish SSH trust between the root users of the two load machines:

# On OCSJZ13
[root@OCSJZ13 ~]# ssh-keygen -t rsa (press Enter for all prompts)
[root@OCSJZ13 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@OCSJZ14 (enter the root password of OCSJZ14)

# On OCSJZ14
[root@OCSJZ14 ~]# ssh-keygen -t rsa (press Enter for all prompts)
[root@OCSJZ14 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@OCSJZ13 (enter the root password of OCSJZ13)

5. Set Up SSH Trust for Gbase User

Establish SSH trust from the gbase user on each load machine to all cluster nodes. Do this on both load machines. The trust only needs to be one-way: the load machines must be able to reach every cluster node without a password.
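
The manual gives no commands for this step. A minimal sketch, assuming the gbase user has already generated a key pair with ssh-keygen as in step 4, is shown below. The function push_gbase_key and the node list file (one host or IP per line) are hypothetical and not part of the original manual.

```shell
# push_gbase_key NODES_FILE [RUNNER]
# Copies the current user's public key to gbase@<node> for every node listed
# in NODES_FILE. Pass "echo" as RUNNER to print the commands instead of
# running them. The list is read on fd 3 so that ssh cannot consume it.
push_gbase_key() {
    local nodes_file=$1 run=${2:-} node
    while read -r node <&3 || [ -n "$node" ]; do
        [ -z "$node" ] && continue          # skip blank lines
        $run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "gbase@$node"
    done 3< "$nodes_file"
}

# Usage sketch (run as gbase on each load machine; cluster_nodes.txt is an
# assumed file name):
#   push_gbase_key cluster_nodes.txt echo   # dry run
#   push_gbase_key cluster_nodes.txt        # prompts for each node's password
```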

6. Prepare Installation Packages

On both load machines:

  1. Extract pkgs.tar.bz2 and install all RPM packages.
  2. Extract toolkit.tar.bz2, copy all programs from the sbin directory to /usr/sbin, and copy all files from the service directory to /etc/init.d.
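
The copy step for the toolkit can be sketched as a small function. This assumes the archive has already been unpacked (for example with tar -xjf toolkit.tar.bz2) and that the resulting directory contains the sbin and service subdirectories named above. The function name and the top-level directory name in the usage line are assumptions.

```shell
# install_toolkit SRC_DIR [SBIN_DEST] [INIT_DEST]
# Copies SRC_DIR/sbin/* to SBIN_DEST (default /usr/sbin) and
# SRC_DIR/service/* to INIT_DEST (default /etc/init.d), keeping file modes.
install_toolkit() {
    local src=$1 sbin_dest=${2:-/usr/sbin} init_dest=${3:-/etc/init.d}
    [ -d "$src/sbin" ] && [ -d "$src/service" ] || return 1
    cp -p "$src"/sbin/* "$sbin_dest"/ &&
    cp -p "$src"/service/* "$init_dest"/
}

# Usage sketch (run as root on both load machines; "./toolkit" is an assumed
# name for the directory created by the archive):
#   tar -xjf toolkit.tar.bz2 && install_toolkit ./toolkit
```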

7. Configure DRBD

Configure DRBD on both load machines with the same configuration file /etc/drbd.d/global_common.conf:

[root@OCSJZ13 ~]# vim /etc/drbd.d/global_common.conf
global {
   usage-count yes;
}

common {
   handlers {
       pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
       pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
       local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
   }

   net {
       protocol C;
   }

   syncer {
       rate 100M;
   }
}

Create the resource file /etc/drbd.d/dispdrbd.res on both load machines:

[root@OCSJZ13 ~]# vim /etc/drbd.d/dispdrbd.res
resource dispdrbd {
   on OCSJZ13 {
       device    /dev/drbd0;
       disk      /dev/sdb1;
       address   192.168.0.173:7789;
       meta-disk internal;
   }

   on OCSJZ14 {
       device    /dev/drbd0;
       disk      /dev/sdb1;
       address   192.168.0.174:7789;
       meta-disk internal;
   }
}

Create the DRBD resources on both load machines:

[root@OCSJZ13 ~]# dd if=/dev/zero bs=1M count=1 of=/dev/sdb1
[root@OCSJZ13 ~]# drbdadm create-md dispdrbd

# Start DRBD service on both machines
[root@OCSJZ13 ~]# service drbd start

# If synchronization does not start, execute on either machine
[root@OCSJZ14 ~]# drbdadm invalidate dispdrbd

After synchronization completes (cat /proc/drbd shows ds:UpToDate/UpToDate), set the primary node (here OCSJZ13, 132.121.12.173):

[root@OCSJZ13 ~]# drbdsetup /dev/drbd0 primary

# If setting primary fails
[root@OCSJZ13 ~]# drbdadm -- --overwrite-data-of-peer primary all

# Format the shared device
[root@OCSJZ13 ~]# mkfs.ext4 /dev/drbd0
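
The wait for synchronization described above can be automated instead of checking /proc/drbd by hand. The sketch below assumes a single DRBD resource, as in this setup. The function names are hypothetical, and the status file path is a parameter only so the logic can be tried against a saved copy.

```shell
# drbd_in_sync [STATUS_FILE]
# Succeeds when the status file (default /proc/drbd) reports both disks
# as UpToDate. With several resources this matches if any one is in sync.
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# wait_for_drbd_sync [STATUS_FILE] [INTERVAL_SECONDS]
# Polls until drbd_in_sync succeeds.
wait_for_drbd_sync() {
    local file=${1:-/proc/drbd} interval=${2:-10}
    until drbd_in_sync "$file"; do
        sleep "$interval"
    done
}

# Usage sketch (on OCSJZ13, before promoting it to primary):
#   wait_for_drbd_sync && echo "DRBD is in sync"
```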

8. Configure NFS

On both load machines:

[root@OCSJZ13 ~]# mkdir /nfsshare
[root@OCSJZ13 ~]# vim /etc/exports
/nfsshare *(rw,sync,no_root_squash)

9. Configure Corosync

On the primary node:

[root@OCSJZ13 ~]# corosync-keygen
[root@OCSJZ13 ~]# chmod 0400 /etc/corosync/authkey
[root@OCSJZ13 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[root@OCSJZ13 ~]# vim /etc/corosync/corosync.conf

compatibility: whitetank

totem {
   version: 2
   secauth: off
   threads: 0
   interface {
        member {
            memberaddr: 132.121.12.173
        }
        member {
            memberaddr: 132.121.12.174
        }
        ringnumber: 0
        bindnetaddr: 132.121.12.1
        mcastport: 5422
        ttl: 1
    }
    transport: udpu 
}

logging {
   fileline: off
   to_stderr: no
   to_logfile: yes
   to_syslog: yes
   logfile: /var/log/cluster/corosync.log
   debug: off
   timestamp: on
}

service {
   ver: 0
   name: pacemaker
   use_mgmtd: yes
}

Copy the configuration to the secondary node:

[root@OCSJZ13 ~]# scp /etc/corosync/authkey /etc/corosync/corosync.conf root@OCSJZ14:/etc/corosync

# On the secondary node
[root@OCSJZ14 ~]# chmod 0400 /etc/corosync/authkey
[root@OCSJZ14 ~]# vim /etc/corosync/corosync.conf

10. Start Services

Start the corosync service on both load machines:

[root@OCSJZ13 ~]# service corosync start

11. Configure Pacemaker

On one of the load machines, configure CRM:

[root@OCSJZ13 ~]# crm configure

Disable STONITH:

crm(live)configure# property stonith-enabled=false

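Set the no-quorum policy to ignore. In a two-node cluster the surviving node loses quorum as soon as its peer fails, so without this setting it would stop all resources instead of taking them over: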

crm(live)configure# property no-quorum-policy=ignore

Specify the default stickiness value for resources:

crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# commit

Configure DRBD (primary/secondary mode):

crm(live)configure# primitive dispdrbd ocf:linbit:drbd params drbd_resource=dispdrbd op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
crm(live)configure# master ms_dispdrbd dispdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

Configure the mount point:

crm(live)configure# primitive webfs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/nfsshare" fstype="ext4"
crm(live)configure# colocation webfs_on_ms_dispdrbd inf: webfs ms_dispdrbd:Master
crm(live)configure# order webfs_after_ms_dispdrbd inf: ms_dispdrbd:promote webfs:start

Configure the virtual IP (this IP is used to provide external services and must not conflict with other IPs in the system; the user-provided IP for this setup is 132.121.12.175):

crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip="132.121.12.175" cidr_netmask="24" op monitor interval="5s"
crm(live)configure# colocation vip_on_ms_dispdrbd inf: vip ms_dispdrbd:Master

Configure NFS:

crm(live)configure# primitive rpcbind lsb:rpcbind op monitor interval="10s"
crm(live)configure# colocation rpcbind_on_ms_dispdrbd inf: rpcbind ms_dispdrbd:Master
crm(live)configure# primitive nfsshare lsb:nfs op monitor interval="30s"
crm(live)configure# colocation nfsshare_on_ms_dispdrbd inf: nfsshare ms_dispdrbd:Master
crm(live)configure# order nfsshare_after_rpcbind mandatory: rpcbind nfsshare:start
crm(live)configure# order nfsshare_after_vip mandatory: vip nfsshare:start
crm(live)configure# order nfsshare_after_webfs mandatory: webfs nfsshare:start
crm(live)configure# commit
crm(live)configure# quit

Check the status of the Pacemaker service using the crm status command:

[root@OCSJZ13 ~]# crm status
============
Last updated: Thu Feb  7 21:20:35 2013
Last change: Thu Feb  7 20:36:54 2013 via cibadmin on OCSJZ13
Stack: openais
Current DC: OCSJZ13 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ OCSJZ13 OCSJZ14 ]

Master/Slave Set: ms_dispdrbd [dispdrbd]
    Masters: [ OCSJZ13 ]
    Slaves: [ OCSJZ14 ]
webfs  (ocf::heartbeat:Filesystem):    Started OCSJZ13
vip    (ocf::heartbeat:IPaddr):        Started OCSJZ13
nfsshare       (lsb:nfs):      Started OCSJZ13
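
Since every resource is colocated with the DRBD master, a quick sanity check is to confirm that all started resources run on the master node. The sketch below parses crm status output in the format shown above. The function name is hypothetical, and other Pacemaker versions may print the status differently.

```shell
# resources_off_master: reads `crm status` output on stdin and prints the name
# of every started resource that is not running on the DRBD master node.
# Exit status is 0 when all started resources sit on the master, 1 otherwise.
resources_off_master() {
    awk '
        $1 == "Masters:"                { master = $3 }
        NF >= 3 && $(NF-1) == "Started" { node[$1] = $NF }
        END {
            if (master == "") exit 1    # no master reported at all
            for (r in node)
                if (node[r] != master) { print r; bad = 1 }
            exit (bad ? 1 : 0)
        }
    '
}

# Usage sketch:
#   crm status | resources_off_master || echo "some resources are misplaced"
```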

12. Configure Related Configuration Files (All Configuration Files Must Be Consistent on Both Load Machines)

Configure gciplist.conf:

[root@OCSJZ13 ~]# gciplist HOSTNAME >/etc/gciplist.conf

(Note: HOSTNAME is the IP address of any node in the cluster. Regenerate this file whenever the cluster topology changes, that is, whenever nodes are added or removed.)

Configure dispmon.conf:

The load monitoring service can run in one of two modes, background service (daemon) mode or cron job mode, and dispmon.conf is configured differently for each.

1) Background Service Mode

[root@OCSJZ13 ~]# vim /etc/dispmon.conf
mode=daemon
exec=fetch_load.sh

(Note: fetch_load.sh stands for the file name of the user's load control executable, which must be deployed in the /usr/sbin directory.)

2) Cron Job Mode

[root@OCSJZ13 ~]# vim /etc/dispmon.conf
mode=crontab

Configure dispcron.conf (Only Needed for Cron Job Mode):

This file must be placed in the /etc directory and is configured in the same format as a crontab configuration file.

Note: The dispmon.conf and dispcron.conf configuration files must be set while the dispmon service on both load machines is stopped.
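
Because all configuration files in this step must be identical on both load machines, a checksum comparison can catch drift. The following sketch relies on the root SSH trust from step 4. The function name and the REMOTE_RUN variable (which defaults to ssh and exists only so the logic can be exercised without a peer) are assumptions, not part of the original tooling.

```shell
# Command used to reach the peer; ssh by default.
REMOTE_RUN=${REMOTE_RUN:-ssh}

# config_matches_peer PEER FILE
# Succeeds if FILE has the same MD5 checksum locally and on PEER.
# Returns 2 if either checksum cannot be computed.
config_matches_peer() {
    local peer=$1 file=$2 local_sum peer_sum
    local_sum=$(md5sum < "$file") || return 2
    peer_sum=$($REMOTE_RUN "root@$peer" "md5sum < $file") || return 2
    [ "$local_sum" = "$peer_sum" ]
}

# Usage sketch (on OCSJZ13):
#   for f in /etc/gciplist.conf /etc/dispmon.conf /etc/dispcron.conf; do
#       config_matches_peer OCSJZ14 "$f" || echo "$f differs or is missing"
#   done
```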

13. Complete the Configuration of Related Services

[root@OCSJZ13 ~]# crm configure

Configure the dispserver Service:

crm(live)configure# primitive dispserver lsb:dispsvr op monitor interval="10s"
crm(live)configure# colocation dispsvr_on_ms_dispdrbd inf: dispserver ms_dispdrbd:Master
crm(live)configure# order dispsvr_after_nfsshare mandatory: nfsshare dispserver:start

Configure the monitor Service:

crm(live)configure# primitive dispmon lsb:dispmon op monitor interval="10s"
crm(live)configure# colocation dispmon_on_ms_dispdrbd inf: dispmon ms_dispdrbd:Master
crm(live)configure# order dispmon_after_dispserver mandatory: dispserver dispmon:start
crm(live)configure# commit
crm(live)configure# quit