Introduction
This page describes several procedures which AlexisHuxley uses to configure and test cluster services on his network. The actual installation of cluster software etc. is covered by MDI.
Procedure: configuring a VM to access multiple bridges
On VM servers, pdi (see MDI) can configure three bridges, each connected to a different VLAN, and make them available to VMs. However, each VM's configuration still needs to be updated to make use of them.
Run:
    virsh shutdown <this-vm>
    virsh dumpxml <this-vm> > <this-vm>.xml
Edit the XML file and clone the NIC stanza twice. In each clone, increment the MAC address, the bridge name and the PCI slot. Leave the original NIC stanza unchanged and make sure that the new PCI slots do not clash with any already present! For example, if the original stanza was this:
    <interface type='bridge'>
      <mac address='00:16:3e:dd:54:cf'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
and PCI slot numbers 0x03 and 0x04 were used by other stanzas then you would add this:
    <interface type='bridge'>
      <mac address='00:16:3e:dd:54:d0'/>
      <source bridge='br1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='00:16:3e:dd:54:d1'/>
      <source bridge='br2'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
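The cloning step above can be sketched as a pair of small bash helpers. This is only an illustration: the function names next_mac and nic_stanza are hypothetical, and the bridge name and PCI slot still have to be chosen by hand so that they do not clash with existing stanzas.

```shell
#!/usr/bin/env bash
# Sketch only: next_mac and nic_stanza are hypothetical helper names.

# Print the MAC address that follows the given one (adds 1 to the 48-bit value).
next_mac() {
    local hex=${1//:/}
    printf '%012x\n' $(( (16#$hex + 1) & 0xffffffffffff )) | sed 's/../&:/g; s/:$//'
}

# Print an <interface> stanza for the given MAC address, bridge and PCI slot.
nic_stanza() {
    local mac=$1 bridge=$2 slot=$3
    cat <<EOF
<interface type='bridge'>
  <mac address='$mac'/>
  <source bridge='$bridge'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='$slot' function='0x0'/>
</interface>
EOF
}
```

For example, nic_stanza "$(next_mac 00:16:3e:dd:54:cf)" br1 0x05 prints a stanza like the first clone shown above, ready to be pasted into the XML file.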
Run:
    virsh undefine <this-vm>
    virsh define <this-vm>.xml
Run:
virsh start <this-vm>
libvirt or libvirt-tools has a bug whereby XML configuration data for multiple NICs overwrites the XML configuration data for the first NIC. On the first edit this gives the impression that there is only one NIC; on the second edit there really is only one NIC. For this reason it is a good idea to preserve the XML files used above.
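One way to notice the NIC-overwriting problem early is to compare the number of interface stanzas in the preserved XML file with the number in a fresh dump. The following is a minimal sketch; count_nics and same_nic_count are hypothetical helper names, and the check only counts stanzas without comparing their contents.

```shell
#!/usr/bin/env bash
# Sketch only: count_nics and same_nic_count are hypothetical helper names.

# Count the <interface> elements in libvirt domain XML read from stdin.
count_nics() {
    { grep -o '<interface ' || true; } | wc -l
}

# Succeed only if the two XML files contain the same number of NIC stanzas.
# $1 is the preserved file, $2 is a fresh "virsh dumpxml" output saved to a file.
same_nic_count() {
    local saved live
    saved=$(count_nics < "$1")
    live=$(count_nics < "$2")
    [ "$saved" -eq "$live" ]
}
```

After redefining the VM, dump its XML to a second file and run same_nic_count on the preserved file and the new dump; a non-zero exit status suggests that a NIC stanza has been lost.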
Procedure: tweaking basic cluster settings
This section lists various steps which may be needed; review them carefully to decide whether they are appropriate.
- Set the Unix password for the 'hacluster' account (this will be needed when using hb_gui).
Disable STONITH (taken from http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo), fix two-node quorum issues (taken from http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf) and make sure that resources do not migrate back by running:
    noodle# crm
    crm(live)# cib new configtmp
    INFO: building help index
    INFO: configtmp shadow CIB created
    crm(configtmp)# configure
    crm(configtmp)configure# property stonith-enabled=false
    crm(configtmp)configure# property no-quorum-policy=ignore
    crm(configtmp)configure# rsc_defaults resource-stickiness=100
    crm(configtmp)configure# verify
    crm(configtmp)configure# end
    There are changes pending. Do you want to commit them? y
    crm(configtmp)# cib use live
    crm(live)# cib commit configtmp
    INFO: commited 'configtmp' shadow CIB to the cluster
    crm(live)# cib delete configtmp
    crm(live)# quit
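The same three settings could probably also be applied non-interactively, since the crm shell accepts a command on its command line. Note that this sketch is not the procedure recorded above: it changes the live CIB directly instead of going through a shadow CIB, and there is no separate verify step. The CRM variable and the function name apply_basic_settings are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. CRM names the crm shell binary; override it (e.g. CRM=echo)
# to print the subcommands instead of running them.
CRM=${CRM:-crm}

# Apply the settings from the interactive session above, one command each.
# Unlike the shadow CIB approach, each command acts on the live CIB directly.
apply_basic_settings() {
    "$CRM" configure property stonith-enabled=false
    "$CRM" configure property no-quorum-policy=ignore
    "$CRM" configure rsc_defaults resource-stickiness=100
}
```

Setting CRM=echo before calling apply_basic_settings gives a dry run that only lists what would be executed.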
Procedure: testing using a dummy resource
Set up a dummy resource by running:
    noodle# crm
    crm(live)# cib new configtmp
    INFO: building help index
    INFO: configtmp shadow CIB created
    crm(configtmp)# configure
    crm(configtmp)configure# primitive dummy ocf:pacemaker:Dummy op monitor interval=10s
    WARNING: dummy: default timeout 20s for start is smaller than the advised 90
    WARNING: dummy: default timeout 20s for stop is smaller than the advised 100
    crm(configtmp)configure# verify
    WARNING: dummy: default timeout 20s for start is smaller than the advised 90
    WARNING: dummy: default timeout 20s for stop is smaller than the advised 100
    crm(configtmp)configure# end
    There are changes pending. Do you want to commit them? y
    crm(configtmp)# cib use live
    crm(live)# cib commit configtmp
    INFO: commited 'configtmp' shadow CIB to the cluster
    crm(live)# cib delete configtmp
    INFO: configtmp shadow CIB deleted
    crm(live)# quit
    bye
    noodle#
Test by running the following commands (based on http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo):
    root# crm
    crm(live)# configure show
    node doodle \
        attributes standby="off"
    node noodle \
        attributes standby="off"
    primitive dummy ocf:pacemaker:Dummy \
        op monitor interval="10s"
    property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        maintenance-mode="false" \
        last-lrm-refresh="1291797358"
    rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
    op_defaults $id="op_defaults-options" \
        record-pending="false"
    crm(live)# node show
    doodle: normal
        standby: off
    noodle: normal
        standby: off
    crm(live)# resource show
    dummy (ocf::pacemaker:Dummy) Started
    crm(live)# node standby <node-name>
    # verify resource is migrated to other node with "crm_mon -1"
    crm(live)# node online <node-name>
    # verify resource is not migrated back to other node with "crm_mon -1"
    crm(live)# resource migrate dummy <node-name>
    # verify resource is migrated with "crm_mon -1"
    crm(live)# resource stop dummy
    # verify resource is stopped with "crm_mon -1"
    crm(live)# resource start dummy
    # verify resource is started with "crm_mon -1"
    crm(live)# quit
    bye
    noodle#
Remove the dummy resource by running:
    noodle# crm
    crm(live)# cib new configtmp
    INFO: building help index
    INFO: configtmp shadow CIB created
    crm(configtmp)# configure
    crm(configtmp)configure# delete dummy
    INFO: hanging location:cli-prefer-dummy deleted
    crm(configtmp)configure# verify
    crm(configtmp)configure# end
    There are changes pending. Do you want to commit them? y
    crm(configtmp)# cib use live
    crm(live)# cib commit configtmp
    INFO: commited 'configtmp' shadow CIB to the cluster
    crm(live)# cib delete configtmp
    INFO: configtmp shadow CIB deleted
    crm(live)# quit
    bye
    noodle#
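The 'verify ... with "crm_mon -1"' checks in the test session could be scripted with a small filter that reports where a resource is running. This is a sketch under an assumption: it expects crm_mon -1 to print a line of the form "dummy (ocf::pacemaker:Dummy): Started noodle", which is not shown on this page and may differ between Pacemaker versions. The function name resource_node is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: resource_node is a hypothetical helper name.

# Print the node on which resource $1 is started, given "crm_mon -1" output
# on stdin. Assumes lines like: "dummy (ocf::pacemaker:Dummy): Started noodle".
# Prints nothing if the resource is not reported as started.
resource_node() {
    awk -v rsc="$1" '$1 == rsc && $3 == "Started" { print $4 }'
}
```

It would be used as crm_mon -1 | resource_node dummy, before and after each node standby, node online or resource migrate command, to compare the reported node names.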
Procedure: clustering Apache
- On all nodes install apache2
On all nodes prevent automatic startup:
    service apache2 stop
    update-rc.d apache2 remove
On all nodes configure apache to listen on an as-yet-unconfigured virtual interface:
perl -pi -e 's/^Listen.*/Listen 192.168.1.13:80/' /etc/apache2/ports.conf
- On NFS shared storage (e.g. NAS) allocate storage to be accessible to both nodes
On one node manually start resources to test understanding of what is required and in what order. E.g.:
    mount storage.pasta.net:/vol/webpages /var/www
    ifconfig eth0:1 192.168.1.13 up
    service apache2 start
and check web access on the virtual interface.
- Manually stop resources.
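The manual start and stop can be sketched as two functions, which makes the ordering explicit. The stop order shown here (the reverse of the start order) is an assumption, since the page does not list the stop commands. RUN is a hypothetical dry-run hook, and start_web and stop_web are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only. RUN is a hypothetical dry-run hook: with RUN=echo the commands
# are printed instead of executed; with RUN empty they are executed.
RUN=${RUN:-}

# Start the resources in the order used in the manual test above.
start_web() {
    $RUN mount storage.pasta.net:/vol/webpages /var/www
    $RUN ifconfig eth0:1 192.168.1.13 up
    $RUN service apache2 start
}

# Stop them again in the reverse order (assumed, not taken from the page).
stop_web() {
    $RUN service apache2 stop
    $RUN ifconfig eth0:1 down
    $RUN umount /var/www
}
```

Running with RUN=echo first is a cheap way to review the commands before they touch the node.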
Add a resource group containing 3 resources for this service (vNIC, mount, apache). The resulting resources looked like this:
    noodle# cibadmin -Q -o resources > resources.xml
    noodle# cat resources.xml
    <resources>
      <group id="webservices">
        <meta_attributes id="webservices-meta_attributes">
          <nvpair id="webservices-meta_attributes-target-role" name="target-role" value="started"/>
        </meta_attributes>
        <primitive class="ocf" id="vnic" provider="heartbeat" type="IPaddr2">
          <operations id="vnic-operations">
            <op id="vnic-op-monitor-10s" interval="10s" name="monitor" timeout="20s"/>
          </operations>
          <instance_attributes id="vnic-instance_attributes">
            <nvpair id="vnic-instance_attributes-ip" name="ip" value="192.168.1.13"/>
            <nvpair id="vnic-instance_attributes-nic" name="nic" value="eth0:1"/>
          </instance_attributes>
          <meta_attributes id="vnic-meta_attributes">
            <nvpair id="vnic-meta_attributes-target-role" name="target-role" value="started"/>
          </meta_attributes>
        </primitive>
        <primitive class="ocf" id="mount" provider="heartbeat" type="Filesystem">
          <operations id="mount-operations">
            <op id="mount-op-monitor-20" interval="20" name="monitor" timeout="40"/>
          </operations>
          <instance_attributes id="mount-instance_attributes">
            <nvpair id="mount-instance_attributes-device" name="device" value="storage.pasta.net:/vol/www"/>
            <nvpair id="mount-instance_attributes-directory" name="directory" value="/var/www"/>
          </instance_attributes>
          <meta_attributes id="mount-meta_attributes">
            <nvpair id="mount-meta_attributes-target-role" name="target-role" value="started"/>
          </meta_attributes>
        </primitive>
        <primitive class="lsb" id="apache2" type="apache2">
          <operations id="apache2-operations">
            <op id="apache2-op-monitor-15" interval="15" name="monitor" start-delay="15" timeout="15"/>
          </operations>
        </primitive>
      </group>
    </resources>
    noodle#
This could be reloaded with:
    cibadmin --replace --scope resources --xml-file resources.xml
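The page does not record how the group was originally created. As an illustration only, a roughly equivalent definition in crm shell syntax might look like the sketch below, with parameter values copied from resources.xml as-is and the target-role meta attributes left out. The function name web_group_config is hypothetical, and the result has not been checked against a running cluster.

```shell
#!/usr/bin/env bash
# Sketch only: web_group_config is a hypothetical helper name.

# Print a crm shell configuration intended to mirror resources.xml
# (values copied from the XML; target-role meta attributes omitted).
web_group_config() {
    cat <<'EOF'
primitive vnic ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.13 nic=eth0:1 \
    op monitor interval=10s timeout=20s
primitive mount ocf:heartbeat:Filesystem \
    params device=storage.pasta.net:/vol/www directory=/var/www \
    op monitor interval=20 timeout=40
primitive apache2 lsb:apache2 \
    op monitor interval=15 timeout=15 start-delay=15
group webservices vnic mount apache2
EOF
}
```

The output could be saved to a file and, assuming the installed crm shell supports it, loaded with crm configure load update followed by the file name. Putting the three primitives in one group keeps them on the same node and starts them in the listed order.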
Procedure: clustering Icinga
- On all nodes install icinga
Work around BTS#599555 by creating XXXX containing the following (with hostname adjusted):
    <VirtualHost *:80>
        ServerName icinga.pasta.net
        ServerAlias www.icinga.pasta.net
        DocumentRoot /usr/share/icinga/htdocs
        ScriptAlias /cgi-bin/icinga /usr/lib/cgi-bin/icinga
        # Where the stylesheets (config files) reside
        Alias /stylesheets /etc/icinga/stylesheets
        <Directory /usr/share/icinga/htdocs>
            Options FollowSymLinks
            Order allow,deny
            Allow from all
        </Directory>
        ErrorLog ${APACHE_LOG_DIR}/icinga.error.log
        CustomLog ${APACHE_LOG_DIR}/icinga.access.log combined
    </VirtualHost>
Run:
    /etc/init.d/apache2 reload
- In /etc/apache2/conf.d/icinga, locate the specification of the htpasswd.users file.
Use htpasswd to add an entry to that file.
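Locating the password file can be sketched as a one-line filter. It assumes that the file is named by Apache's AuthUserFile directive in the configuration, which the page does not state explicitly; the function name auth_user_file is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: auth_user_file is a hypothetical helper name.

# Print the path given by the first AuthUserFile directive in the
# Apache configuration file $1 (assumes the directive is present).
auth_user_file() {
    awk '$1 == "AuthUserFile" { print $2; exit }' "$1"
}
```

The result can then be passed to htpasswd as the password file argument, followed by the user name to add, for example with "$(auth_user_file /etc/apache2/conf.d/icinga)" as the first argument.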
At this point, I could access the Icinga tactical interface, but found it similar enough to Nagios that I did not want to continue.
