Datacenter-aware Round Robin Policy

A specialized round-robin load balancing policy that queries remote datacenters only when all local nodes are down. The policy round-robins requests across hosts in the local datacenter and falls back to a remote datacenter if necessary. The name of the local datacenter should be supplied by the user; if it is omitted, the first datacenter seen is treated as local.
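
The ordering described above can be sketched in plain Ruby. This is a toy model under stated assumptions, not the driver's implementation: the class name `ToyDCAwarePlanner`, its `plan` method, and the rotate-by-one scheme are hypothetical, and any limit on remote hosts is ignored here.

```ruby
# Toy round-robin planner (hypothetical class, not the driver's implementation).
class ToyDCAwarePlanner
  def initialize(local_hosts, remote_hosts)
    @local    = local_hosts
    @remote   = remote_hosts
    @position = 0
  end

  # Returns the hosts to try for one request, in order: local hosts first,
  # rotated by one position per call, then remote hosts as a fallback.
  def plan
    offset = @position
    @position += 1
    @local.rotate(offset) + @remote.rotate(offset)
  end
end

planner = ToyDCAwarePlanner.new(['127.0.0.3', '127.0.0.4'],
                                ['127.0.0.1', '127.0.0.2'])
p planner.plan # local hosts come first, remote hosts last
p planner.plan # the starting host shifts by one on the next request
```

In this model a remote host is reached only after every local host in the plan has been tried, which mirrors the "only when all local nodes are down" behavior.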

By default, this policy does not fall back to nodes in a remote datacenter. To enable fallback, pass the maximum number of remote hosts to use when constructing a policy instance. A nil value means that any number of remote hosts may be used.
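
The meaning of the remote-host limit can be illustrated with a small plain-Ruby sketch. The helper `remote_candidates` and its parameter names are hypothetical, and the default of 0 is an assumption inferred from the statement that no fallback happens unless a number is passed.

```ruby
# Toy model of the remote-host limit (hypothetical helper, not driver code).
#   max_remote = 0   -> never use remote hosts (assumed default)
#   max_remote = n   -> use at most n remote hosts
#   max_remote = nil -> no limit on remote hosts
def remote_candidates(remote_hosts, max_remote = 0)
  return remote_hosts.dup if max_remote.nil?

  remote_hosts.first(max_remote)
end

remotes = ['127.0.0.1', '127.0.0.2']

p remote_candidates(remotes)      # no fallback by default
p remote_candidates(remotes, 1)   # at most one remote host
p remote_candidates(remotes, nil) # every remote host is eligible
```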

By default, this policy does not use remote hosts for requests with local consistencies (:local_one or :local_quorum); this behavior can be changed via the constructor.
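
The local-consistency rule can be sketched as a single predicate. This is a plain-Ruby illustration, not the driver's code: `remote_allowed?`, `LOCAL_CONSISTENCIES`, and the flag name `use_remote_for_local` are hypothetical, and the flag is assumed to default to false.

```ruby
# Toy model of the local-consistency check (hypothetical names, not driver code).
LOCAL_CONSISTENCIES = [:local_one, :local_quorum].freeze

# Returns true when a request at the given consistency may be sent to a
# remote datacenter. use_remote_for_local stands in for the constructor
# option described above and is assumed to default to false.
def remote_allowed?(consistency, use_remote_for_local = false)
  return true unless LOCAL_CONSISTENCIES.include?(consistency)

  use_remote_for_local
end

p remote_allowed?(:one)                # non-local consistency
p remote_allowed?(:local_one)          # local consistency, default flag
p remote_allowed?(:local_quorum, true) # local consistency, flag enabled
```

When the predicate is false and no local host is up, there is nothing left to try, which is why such a request fails instead of being served remotely.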

Background

Given
a running Cassandra cluster in 2 datacenters with 2 nodes in each
And
the following schema:
CREATE KEYSPACE simplex WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};
CREATE TABLE simplex.songs (
  id uuid PRIMARY KEY,
  title text,
  album text,
  artist text,
  tags set<text>,
  data blob
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   756716f7-2e54-4715-9f00-91dcbea6cf50,
   'La Petite Tonkinoise',
   'Bye Bye Blackbird',
   'Joséphine Baker',
   {'jazz', '2013'}
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   f6071e72-48ec-4fcb-bf3e-379c8a696488,
   'Die Mösch',
   'In Gold',
   'Willi Ostermann',
   {'kölsch', '1996', 'birds'}
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   fbdf82ed-0063-4796-9c7c-a3d4f47b4b25,
   'Memo From Turner',
   'Performance',
   'Mick Jagger',
   {'soundtrack', '1991'}
);

First seen datacenter is considered local when not explicitly given

Given
the following example:
require 'cassandra'

policy     = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new
hosts      = ['127.0.0.3', '127.0.0.4']
cluster    = Cassandra.cluster(hosts: hosts, load_balancing_policy: policy)
session    = cluster.connect('simplex')

hosts_used = 4.times.map do
  info = session.execute("SELECT * FROM songs").execution_info
  info.hosts.last.ip
end.sort.uniq

puts hosts_used
When
it is executed
Then
its output should contain:
127.0.0.3
127.0.0.4
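
One way to picture the "first seen datacenter" rule is the plain-Ruby sketch below. It is an assumption-laden illustration rather than the driver's code: `Host`, `LocalDatacenterTracker`, and `host_up` are hypothetical names used only here.

```ruby
# Toy model of defaulting the local datacenter (hypothetical names).
Host = Struct.new(:ip, :datacenter)

class LocalDatacenterTracker
  attr_reader :datacenter

  def initialize(datacenter = nil)
    @datacenter = datacenter
  end

  # Called as hosts are discovered; the first host seen fixes the local
  # datacenter when none was given explicitly.
  def host_up(host)
    @datacenter ||= host.datacenter
    self
  end

  def local?(host)
    host.datacenter == @datacenter
  end
end

tracker = LocalDatacenterTracker.new
tracker.host_up(Host.new('127.0.0.3', 'dc2'))
tracker.host_up(Host.new('127.0.0.1', 'dc1'))
p tracker.datacenter # the datacenter of the first host seen
```

In the scenario above the contact points are 127.0.0.3 and 127.0.0.4, so under this model their datacenter becomes the local one and only those two hosts serve requests.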

Requests are automatically routed to local datacenter

Given
the following example:
require 'cassandra'

datacenter = "dc2"
policy     = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new(datacenter)
cluster    = Cassandra.cluster(load_balancing_policy: policy, consistency: :one)
session    = cluster.connect('simplex')

hosts_used = 4.times.map do
  info = session.execute("SELECT * FROM songs").execution_info
  info.hosts.last.ip
end.sort.uniq

puts hosts_used
When
it is executed
Then
its output should contain:
127.0.0.3
127.0.0.4

Requests are routed to remote datacenters if local datacenter is down

Given
the following example:
require 'cassandra'

datacenter     = "dc2"
remotes_to_try = nil
policy         = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new(datacenter, remotes_to_try)
cluster        = Cassandra.cluster(load_balancing_policy: policy, consistency: :one)
session        = cluster.connect('simplex')

hosts_used = 4.times.map do
  info = session.execute("SELECT * FROM songs").execution_info
  info.hosts.last.ip
end.sort.uniq

puts hosts_used
And
node 3 is stopped
And
node 4 is stopped
When
it is executed
Then
its output should contain:
127.0.0.1
127.0.0.2

Requests are routed up to a maximum number of hosts in remote datacenters

Given
the following example:
require 'cassandra'

datacenter     = "dc2"
remotes_to_try = 1
policy         = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new(datacenter, remotes_to_try)
cluster        = Cassandra.cluster(load_balancing_policy: policy, consistency: :one)
session        = cluster.connect('simplex')

hosts_used = 4.times.map do
  info = session.execute("SELECT * FROM songs").execution_info
  info.hosts.last.ip
end.sort.uniq

puts "Used #{hosts_used.size} host, with ip #{hosts_used.first}"
And
node 3 is stopped
And
node 4 is stopped
When
it is executed
Then
its output should match:
Used 1 host, with ip 127\.0\.0\.(1|2)

Requests with local consistencies are not routed to remote datacenters by default

Given
the following example:
require 'cassandra'

datacenter     = "dc2"
remotes_to_try = nil
policy         = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new(datacenter, remotes_to_try)
cluster        = Cassandra.cluster(load_balancing_policy: policy, consistency: :one)
session        = cluster.connect('simplex')

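begin
  # With both local nodes down, a :local_one request must not reach the remote datacenter.
  session.execute("SELECT * FROM songs", consistency: :local_one)
  puts "failure"
rescue Cassandra::Errors::NoHostsAvailable
  # Expected: the policy offers no remote hosts for a local consistency.
  puts "success"
end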
And
node 3 is stopped
And
node 4 is stopped
When
it is executed
Then
its output should contain:
success

Routing requests with local consistencies to remote datacenters

Given
the following example:
require 'cassandra'

datacenter     = "dc2"
remotes_to_try = nil
use_remote     = true
policy         = Cassandra::LoadBalancing::Policies::DCAwareRoundRobin.new(datacenter, remotes_to_try, use_remote)
cluster        = Cassandra.cluster(load_balancing_policy: policy)
session        = cluster.connect('simplex')

hosts_used = 4.times.map do
  info = session.execute("SELECT * FROM songs", consistency: :local_one).execution_info
  info.hosts.last.ip
end.sort.uniq

puts hosts_used
And
node 3 is stopped
And
node 4 is stopped
When
it is executed
Then
its output should contain:
127.0.0.1
127.0.0.2