Enabling automatic failover
Configure automatic OpsCenter failover from the primary OpsCenter instance to the designated backup OpsCenter instance.
opscenterd.conf
The location of the opscenterd.conf file depends on the type of installation:
- Installer-Services or package installations: /etc/opscenter/opscenterd.conf
- Installer-No Services or tarball installations: install_location/conf/opscenterd.conf
- Windows installations: Program Files (x86)\DataStax Community\opscenter\conf\opscenterd.conf
address.yaml
The location of the address.yaml file depends on the type of installation:
- Installer-Services or package installations: /var/lib/datastax-agent/conf/address.yaml
- Installer-No Services or tarball installations: install_location/conf/address.yaml
Follow these steps to enable automatic failover from the primary OpsCenter instance to the designated backup OpsCenter instance. Enabling failover requires minimal initial setup on the backup OpsCenter.
To enable automatic OpsCenter failover:
Procedure
- Optional:
Set up a hostname/IP that can switch between primary
and backup OpsCenter instances to avoid changing the browser URL for OpsCenter
if a failover occurs.
If you do not set up a hostname or IP for seamless URL switching post-failover, inform your OpsCenter users of any alternate URL to access OpsCenter.
-
Mirror the configuration directories stored on the OpsCenter primary to the
OpsCenter backup using the method you prefer, such as an NFS mount or rsync.
-
If SSL is enabled, mirror the contents of the SSL configuration
directory on the primary OpsCenter machine to the backup OpsCenter
machine.
- /var/lib/opscenter/ssl (package installs)
- install_location/ssl (Installer-No Services or tarball installations)
$ scp /var/lib/opscenter/ssl/* secondary:/var/lib/opscenter/ssl
-
Mirror the contents of the main configuration directory on the primary
OpsCenter machine to the backup OpsCenter machine.
- /etc/opscenter (Installer-Services or package installations)
- install_location/conf (Installer-No Services or tarball installations)
$ scp /etc/opscenter/* secondary:/etc/opscenter
Note: The failover_configuration_directory should not be mirrored across OpsCenter installs when configuring OpsCenter to support failover.
-
Mirror the contents of the persist_directory location that indicates the current status
of the Repair Service. The location of the persist directory for the
Repair Service depends on the type of install:
- /var/lib/opscenter/repair_service (Installer-Services or package installations)
- install_location/repair_service (Installer-No Services or tarball installations)
$ scp /var/lib/opscenter/repair_service/* secondary:/var/lib/opscenter/repair_service
Repair Service progress is stored on the filesystem. If you mirror to an NFS mount, the Repair Service starts up after a failover from approximately the same point where it was interrupted. If you copy the directories manually or use rsync, the Repair Service resumes from the point at which the Repair Service directory was last synced. Otherwise, the Repair Service simply restarts rather than continuing from where it left off.
-
Create and run an automated script to keep the
mirrored directories in sync.
The following example crontab entries run rsync to synchronize the configuration directories every 5 minutes for Installer-Services or package installations. The trailing slash on each source path makes rsync copy the contents of the directory rather than creating a nested directory inside the destination:
*/5 * * * * /usr/bin/rsync -az /etc/opscenter/ <user>@<backup_host>:/etc/opscenter
*/5 * * * * /usr/bin/rsync -az /var/lib/opscenter/ssl/ <user>@<backup_host>:/var/lib/opscenter/ssl
The following example crontab entries run rsync to synchronize the configuration directories every 5 minutes for Installer-No Services or tarball installations:
*/5 * * * * /usr/bin/rsync -az install_location/conf/ <user>@<backup_host>:install_location/conf
*/5 * * * * /usr/bin/rsync -az install_location/ssl/ <user>@<backup_host>:install_location/ssl
Note: When a failover occurs, you must manually stop the sync scripts on the former primary and start the sync scripts on the new primary. Failure to do so will result in configuration changes on the new primary being overwritten by stale files from the former primary.
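The individual cron entries can also be collected into a single script, so that stopping or starting the sync after a failover means touching only one cron entry. The following is a minimal sketch, assuming a package installation and passwordless SSH to the backup host. The script name, the BACKUP and DRY_RUN variables, the sync_dir helper, and the placeholder host are all hypothetical and not part of OpsCenter. DRY_RUN defaults to 1, so the script only prints the commands it would run.

```shell
#!/bin/sh
# sync_opscenter.sh (hypothetical name): mirror OpsCenter directories to the backup.
# Assumes a package installation and passwordless SSH to the backup host.

BACKUP="${BACKUP:-opscenter@backup.example.com}"   # placeholder user@host
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only

# Directories to mirror. The failover directory is deliberately absent,
# because it must not be mirrored between OpsCenter installs.
DIRS="/etc/opscenter /var/lib/opscenter/ssl /var/lib/opscenter/repair_service"

sync_dir() {
    # Trailing slash on the source: copy the contents, not the directory itself.
    if [ "$DRY_RUN" = "1" ]; then
        echo "rsync -az $1/ $BACKUP:$1/"
    else
        /usr/bin/rsync -az "$1/" "$BACKUP:$1/"
    fi
}

for dir in $DIRS; do
    sync_dir "$dir"
done
```

Assuming the script is saved as /usr/local/bin/sync_opscenter.sh (a hypothetical path), a single crontab entry on the current primary could then be:
*/5 * * * * DRY_RUN=0 /usr/local/bin/sync_opscenter.sh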
- Optional:
If you want to override the default values, edit the [failover] section of the OpsCenter configuration file opscenterd.conf.
OpsCenter daemon failover default configuration parameters
- heartbeat_period: Frequency in seconds with which the primary OpsCenter sends a heartbeat to the backup OpsCenter. Default: 10
- heartbeat_reply_period: Frequency in seconds with which the OpsCenter backup sends a heartbeat to the primary OpsCenter. Default: 300
- heartbeat_fail_window: Amount of time in seconds that must elapse before the lack of a heartbeat triggers a failover. Default: 60
- failover_configuration_directory: Directory location where failover-specific configuration is stored. The failover_id file is also located in the failover directory. Default: /var/lib/opscenter/failover/ (package installs) or /opscenterd/failover/ (Installer-No Services or tarball installations)
Note: The failover configuration directory should not be mirrored or replicated across OpsCenter installs when configuring OpsCenter to support failover.
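To make the defaults concrete, the sketch below writes a sample [failover] section to a temporary file and reads the values back to estimate how many heartbeat intervals pass before a failover triggers. It assumes the usual key = value layout of opscenterd.conf. The failover_option helper is hypothetical, and the temporary file only stands in for the real configuration file.

```shell
#!/bin/sh
# Sample [failover] section with the documented defaults (package installation).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[failover]
heartbeat_period = 10
heartbeat_reply_period = 300
heartbeat_fail_window = 60
failover_configuration_directory = /var/lib/opscenter/failover/
EOF

# failover_option FILE KEY (hypothetical helper):
# print the value of KEY in the [failover] section of FILE.
failover_option() {
    awk -F' *= *' -v key="$2" '
        /^\[/ { in_section = ($0 == "[failover]") }
        in_section && $1 == key { print $2; exit }
    ' "$1"
}

period=$(failover_option "$conf" heartbeat_period)
window=$(failover_option "$conf" heartbeat_fail_window)

# With the defaults, 60 / 10 = 6 heartbeat intervals pass before a failover.
echo "intervals before failover: $((window / period))"
rm -f "$conf"
```

Lowering heartbeat_fail_window shortens the time to failover but leaves fewer missed heartbeats as a margin against brief network interruptions.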
-
On the backup OpsCenter in the failover directory, create a primary_opscenter_location configuration file that indicates the IP address of the primary OpsCenter daemon to monitor:
- /var/lib/opscenter/failover/primary_opscenter_location (package installs)
- /opscenterd/failover/primary_opscenter_location (Installer-No Services or tarball installations)
The primary_opscenter_location file should contain only the IP address of the primary OpsCenter instance and nothing more:
$ cat primary_opscenter_location
55.100.200.300
Ensure the user running OpsCenter has at least read permission for the primary_opscenter_location file. In the event of a failover, the backup OpsCenter deletes the primary_opscenter_location file before it takes over as the primary OpsCenter. After a failover, recreate the primary_opscenter_location file on the newly designated backup OpsCenter.
-
Ensure that address.yaml is not being managed
by third-party Configuration Management. During failover, OpsCenter
automatically changes
stomp_interface
in address.yaml to point to the backup opscenterd instance. If a separate Configuration Management system is managing address.yaml, that change might be undone when the Configuration Management system pushes its next update.
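The last two steps can be carried out and checked from the shell. The sketch below is illustrative only: write_primary_location and show_stomp_interface are hypothetical helper names, not OpsCenter commands. The first helper writes a file containing only the IP address and makes it world-readable, which is one simple way to satisfy the read-permission requirement. The second prints the stomp_interface entry from address.yaml so you can confirm which opscenterd instance an agent currently points to.

```shell
#!/bin/sh
# write_primary_location DIR IP (hypothetical helper):
# create DIR/primary_opscenter_location containing only the IP address
# of the primary OpsCenter, readable by any user.
write_primary_location() {
    printf '%s\n' "$2" > "$1/primary_opscenter_location" &&
        chmod 644 "$1/primary_opscenter_location"
}

# show_stomp_interface FILE (hypothetical helper):
# print the stomp_interface entry of an agent's address.yaml.
show_stomp_interface() {
    grep '^stomp_interface:' "$1"
}
```

On a package installation, usage might look like this, with <primary_ip> replaced by the address of the primary OpsCenter:
$ write_primary_location /var/lib/opscenter/failover <primary_ip>
$ show_stomp_interface /var/lib/datastax-agent/conf/address.yaml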