Enabling automatic failover

Configure automatic OpsCenter failover from the primary OpsCenter instance to the designated backup OpsCenter instance.

Enabling automatic failover requires minimal initial setup on the backup OpsCenter instance.

Important: When configuring automatic failover, the primary and secondary OpsCenter instances must run the same OpsCenter version, and the OpsCenter daemon (opscenterd) and the DataStax Agents must be running the same version before you configure high availability. DataStax cannot guarantee results if the primary and secondary OpsCenter instances are running different versions.

To enable automatic OpsCenter failover:

opscenterd.conf

The location of the opscenterd.conf file depends on the type of installation:
  • Package installations: /etc/opscenter/opscenterd.conf
  • Tarball installations: install_location/conf/opscenterd.conf

lcm.db

The location of the Lifecycle Manager database lcm.db depends on the type of installation:
  • Package installations: /var/lib/opscenter/lcm.db
  • Tarball installations: install_location/lcm.db

lcm.key

The location of the Lifecycle Manager database encryption key lcm.key depends on the type of installation:
  • Package installations: /etc/opscenter/lcm.key
  • Tarball installations: install_location/keys/lcm.key

passwd.db

The default location of the password database passwd.db for OpsCenter authentication depends on the type of installation:
  • Package installations: /etc/opscenter/passwd.db
  • Tarball installations: install_location/passwd.db

Prerequisites

Warning: Ensure that address.yaml is not being managed by third-party Configuration Management. During failover, OpsCenter automatically changes stomp_interface in address.yaml to point to the backup opscenterd instance. If a separate Configuration Management system is managing address.yaml, that change might be undone when the Configuration Management system pushes its next update.

Procedure

  1. Optional: Set up a hostname or IP address that can switch between the primary and backup OpsCenter instances, so that the browser URL for OpsCenter does not change if a failover occurs.
    If you do not set up a hostname or IP address for seamless URL switching after a failover, inform your OpsCenter users of the alternate URL for accessing OpsCenter.
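    One low-tech way to provide such a switchable name is a shared hostname that operators repoint at failover time. The following sketch uses a hosts-style file with documentation-range IPs; every hostname and address here is a placeholder, and in practice a low-TTL DNS record or a floating IP is preferable to a local hosts file:

```shell
# Sketch: repoint a shared OpsCenter hostname after a failover.
# All hostnames and IPs below are placeholders, not values from the product.
PRIMARY_IP="203.0.113.10"
BACKUP_IP="203.0.113.20"
HOSTS_FILE=$(mktemp)   # stand-in for /etc/hosts or a DNS zone file

# Before failover: the shared name points at the primary.
printf '%s   opscenter.example.com\n' "$PRIMARY_IP" > "$HOSTS_FILE"

# After failover: point the shared name at the backup instead.
sed -i "s/^$PRIMARY_IP/$BACKUP_IP/" "$HOSTS_FILE"
RESULT=$(cat "$HOSTS_FILE")
echo "$RESULT"
```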
  2. Mirror the configuration directories stored on the primary OpsCenter to the backup OpsCenter using the method you prefer, such as an NFS mount or rsync.
    1. If SSL is enabled, mirror the contents of the SSL configuration directory on the primary OpsCenter machine to the backup OpsCenter machine.
      • /var/lib/opscenter/ssl (package installs)
      • install_location/ssl (tarball installs)
      scp /var/lib/opscenter/ssl/* secondary:/var/lib/opscenter/ssl
    2. Mirror the contents of the main configuration directory on the primary OpsCenter machine to the backup OpsCenter machine.
      • /etc/opscenter (package installs)
      • install_location/conf (tarball installs)
      scp /etc/opscenter/* secondary:/etc/opscenter
      Note: The failover_configuration_directory should not be mirrored across OpsCenter installs when configuring OpsCenter to support failover.
    3. Mirror the contents of the persist_directory location that indicates the current status of the Repair Service. The location of the persist directory for the Repair Service depends on the type of install:
      • /var/lib/opscenter/repair_service (package installs)
      • install_location/repair_service (tarball installs)
      scp /var/lib/opscenter/repair_service/* secondary:/var/lib/opscenter/repair_service
      Repair Service progress is stored on the filesystem. If mirroring to an NFS mount, the Repair Service starts up after a failover from approximately the point where it was interrupted. If manually copying directories or using rsync, the Repair Service resumes from the point at which the repair_service directory was last synced. Otherwise, the Repair Service restarts from the beginning rather than continuing where it left off.
    4. Mirror the Lifecycle Manager database lcm.db:
      • /var/lib/opscenter/lcm.db (package installs)
      • install_location/lcm.db (tarball installs)
      scp /var/lib/opscenter/lcm.db secondary:/var/lib/opscenter/lcm.db
    5. Mirror the Lifecycle Manager database encryption key lcm.key:
      • /etc/opscenter/lcm.key (package installs)
      • install_location/keys/lcm.key (tarball installs)
      scp /etc/opscenter/lcm.key secondary:/etc/opscenter/lcm.key
    6. If Lifecycle Manager has generated any certificates for clusters configured to use node-to-node or client-to-node encryption, mirror the Lifecycle Manager certificate authority.
      • /var/lib/opscenter/ssl/lcm (package installs)
      • install_location/ssl/lcm/cacerts (tarball installs)
      scp -r /var/lib/opscenter/ssl/lcm secondary:/var/lib/opscenter/ssl/
    7. If OpsCenter role-based security is enabled, mirror the roles and password database passwd.db:
        • /etc/opscenter/passwd.db (package installs)
        • install_location/passwd.db (tarball installs)
      scp /etc/opscenter/passwd.db secondary:/etc/opscenter/passwd.db
    8. Create and run an automated script to keep the mirrored directories in sync.

      The following example cron scripts run rsync to synchronize the configuration directories every 5 minutes for a package installation:

      */5 * * * * /usr/bin/rsync -az /etc/opscenter <user>@<backup_host>:/etc/opscenter
      */5 * * * * /usr/bin/rsync -az /var/lib/opscenter/ssl <user>@<backup_host>:/var/lib/opscenter/ssl

      The following example cron scripts run rsync to synchronize the configuration directories every 5 minutes for a tarball installation:

      */5 * * * * /usr/bin/rsync -az install_location/conf <user>@<backup_host>:install_location/conf
      */5 * * * * /usr/bin/rsync -az install_location/ssl <user>@<backup_host>:install_location/ssl
      Note:

      When a failover occurs, you must manually stop the sync scripts on the former primary and start the sync scripts on the new primary. Failure to do so will result in configuration changes on the new primary being overwritten by stale files from the former primary.
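      The manual stop/start of the sync scripts can be partially guarded in the script itself. The following is a minimal sketch, assuming the backup's hostname (a placeholder here) and that opscenterd appears in the process list under that name; it only helps if opscenterd has actually been stopped on the demoted primary, so disabling the cron entry manually remains the safe approach:

```shell
#!/bin/sh
# Hypothetical guard for the cron sync job: only push configuration while
# opscenterd is running on this host, so a stopped (demoted) primary does
# not overwrite the new primary's files. BACKUP_HOST is a placeholder.
BACKUP_HOST="backup.example.com"

if pgrep -x opscenterd >/dev/null 2>&1; then
    SYNCED=yes
    /usr/bin/rsync -az /etc/opscenter "$BACKUP_HOST":/etc/opscenter
    /usr/bin/rsync -az /var/lib/opscenter/ssl "$BACKUP_HOST":/var/lib/opscenter/ssl
else
    SYNCED=no
    echo "opscenterd not running locally; skipping config sync"
fi
```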

  3. Optional: If you want to override the default values, edit the [failover] section of the OpsCenter configuration file opscenterd.conf.
    Note: Making any changes to the opscenterd.conf file requires restarting OpsCenter.
    Table 1. OpsCenter daemon failover default configuration parameters

    heartbeat_period
      Frequency in seconds at which the primary OpsCenter sends a heartbeat to the backup OpsCenter.
      Default: 10
    heartbeat_reply_period
      Frequency in seconds at which the backup OpsCenter sends a heartbeat reply to the primary OpsCenter.
      Default: 300
    heartbeat_fail_window
      Amount of time in seconds that must elapse without a heartbeat before a failover is triggered.
      Default: 60
    failover_configuration_directory
      Directory where failover-specific configuration, including the failover_id file, is stored.
      Note: The failover configuration directory should not be mirrored or replicated across OpsCenter installs when configuring OpsCenter to support failover.
      Default:
      • /var/lib/opscenter/failover/ (package installs)
      • install_location/failover/ (tarball installs)
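    As an illustration, a [failover] section overriding nothing would look like the following sketch; the values shown are simply the defaults listed above, restated as a config fragment:

```ini
# opscenterd.conf (on both primary and backup); values shown are the defaults.
[failover]
heartbeat_period = 10
heartbeat_reply_period = 300
heartbeat_fail_window = 60
failover_configuration_directory = /var/lib/opscenter/failover/
```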
  4. On the backup OpsCenter in the failover directory, create a primary_opscenter_location configuration file that indicates the IP address of the primary OpsCenter daemon to monitor:
    • /var/lib/opscenter/failover/primary_opscenter_location (package installs)
    • install_location/failover/primary_opscenter_location (tarball installs)
    The primary_opscenter_location file should only contain the IP address of the primary OpsCenter instance and nothing more:
    cat primary_opscenter_location
    198.51.100.5
    Ensure that the user running OpsCenter has at least read permission for the primary_opscenter_location file. In the event of a failover, the backup OpsCenter deletes the primary_opscenter_location file before taking over as the primary OpsCenter. After a failover, recreate the primary_opscenter_location file on the newly designated backup OpsCenter.
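    Creating the file can be sketched as follows. The real package-install path is /var/lib/opscenter/failover/primary_opscenter_location; this sketch writes to a scratch directory so it can run without root, and the IP address is a documentation-range placeholder:

```shell
# Sketch: create primary_opscenter_location on the backup OpsCenter.
# Real path (package install): /var/lib/opscenter/failover/primary_opscenter_location
FAILOVER_DIR=$(mktemp -d)   # scratch stand-in for the failover directory

# The file must contain only the primary's IP address.
echo "198.51.100.5" > "$FAILOVER_DIR/primary_opscenter_location"

# Make it readable by the user running opscenterd.
chmod 644 "$FAILOVER_DIR/primary_opscenter_location"

CONTENTS=$(cat "$FAILOVER_DIR/primary_opscenter_location")
echo "$CONTENTS"
```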