Using the in-memory option

DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data that is mostly overwritten, such as in an application that mirrors stock exchange data. In such an application, only the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, the table you design for use in-memory should have the following characteristics:

  • Store a small amount of data
  • Experience a workload that is mostly overwrites
  • Be heavily trafficked

Check performance metrics, for example using OpsCenter, before and after enabling the in-memory option.

Limitation 

Currently, the in-memory option uses memory in the Java heap. Manage available memory carefully. Use the dsetool inmemorystatus command to get the size and capacity of a table in MB and the percentage of that capacity in use. Sizes are truncated to whole megabytes, so a table can report a size of 0MB and a nonzero usage. For example:
$ bin/dsetool inmemorystatus ks1 users
Keyspace        ColumnFamily            Size     Capacity   Usage
ks1             users                    0MB          1MB     52%
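The status output can also be checked by a script. The following is a minimal sketch in Python, assuming the output keeps the five-column layout shown above; the names TableMemory, parse_inmemorystatus, and tables_over are hypothetical helpers and are not part of dsetool.

```python
import re
from typing import List, NamedTuple


class TableMemory(NamedTuple):
    keyspace: str
    table: str
    size_mb: int
    capacity_mb: int
    usage_pct: int


# Matches a data row such as "ks1  users  0MB  1MB  52%".
# The header row does not match because its columns are not numeric.
_ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)MB\s+(\d+)MB\s+(\d+)%$")


def parse_inmemorystatus(text: str) -> List[TableMemory]:
    """Parse the text printed by `dsetool inmemorystatus` (assumed layout)."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            keyspace, table, size, capacity, usage = match.groups()
            rows.append(
                TableMemory(keyspace, table, int(size), int(capacity), int(usage))
            )
    return rows


def tables_over(rows: List[TableMemory], threshold_pct: int) -> List[TableMemory]:
    """Return the tables whose usage is at or above the threshold."""
    return [row for row in rows if row.usage_pct >= threshold_pct]


sample = """Keyspace        ColumnFamily            Size     Capacity   Usage
ks1             users                    0MB          1MB     52%"""

rows = parse_inmemorystatus(sample)
for row in tables_over(rows, threshold_pct=50):
    print(f"{row.keyspace}.{row.table} is at {row.usage_pct}% of {row.capacity_mb}MB")
```

Because sizes are truncated to whole megabytes, the usage percentage is the more reliable column to alert on for small tables.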

Creating a table using the in-memory option 

In CQL, to create a table that uses the in-memory option, add a CQL directive to the CREATE TABLE statement. Use the compaction directive in the statement to specify the MemoryOnlyStrategy class and size_limit_in_mb property, which limits the amount of data that the table can accommodate.

CREATE TABLE users (
uid text,
fname text,
lname text,
PRIMARY KEY (uid)
) WITH compaction= { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1 } AND caching = 'NONE';

To enable metered flushing, configure the memtable_flush_period_in_ms using the CREATE TABLE or ALTER TABLE statement.

Altering an on-disk table 

Use the ALTER TABLE statement to change a traditional table to one that uses the in-memory option, or vice versa. For example, suppose you have a traditional table named emp. Using the DESCRIBE command, you can see that the table is a traditional table by the absence of a line in the output that looks something like this:

compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'}

Alter the emp table to use the in-memory option and, as a best practice, disable caching:

ALTER TABLE emp WITH compaction =
  { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1 }
  AND caching = 'NONE';

Limiting the size of tables 

The size_limit_in_mb property is a required property of the in-memory option schema that you configure using CREATE TABLE or ALTER TABLE. Valid values are 1 to 1024, which limits tables in memory to 1GB (1024MB) per node. It is possible, but not recommended, to create multiple 1GB tables, but no single table can exceed 1GB per node. The total space you can allocate to a table in memory across the cluster is 1GB * nodes / replication factor. For example, the following configuration in a 10-node cluster can accommodate 5GB of data distributed over the cluster:

  • size_limit_in_mb=1024
  • replication factor = 2

Disabling key caching 

DataStax recommends disabling caching on tables configured to use the in-memory option. An error is logged if key caching is not disabled. Enabling row caching, on the other hand, causes an error condition. To disable both types of caching, set the table caching property to NONE.

ALTER TABLE users WITH caching = 'NONE';

Managing available memory 

Because DataStax Enterprise runs in a distributed environment, it cannot prevent you from adding data that exceeds the available memory. Data sizes can differ from node to node, which makes such prevention impossible. It is the Cassandra administrator's responsibility to manage available memory carefully.

Failure to manage available memory when using the in-memory option results in an error message that looks something like this when capacity is exceeded:

SEVERE: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
SEVERE: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.53.120.13 (null), abc.com/10.53.122.15 (Timeout during read), abc.com/10.53.120.18 (null))
.
.
.

Checking available memory

Cassandra does not hold any locks on data while running requests, so concurrent write requests might slightly exceed the size_limit_in_mb. To ensure that a table has more memory available than the size limit allows, check the AllMemtablesDataSize metric that Cassandra provides, using OpsCenter or JMX. Memtable flushes do not reduce the size of in-memory data.

Checking table properties

In cqlsh, use the DESCRIBE command to view table properties.

cqlsh> DESCRIBE TABLE users;

The output includes the size limit of the table data, size_limit_in_mb, and shows whether or not the table uses the in-memory option:

CREATE TABLE users (
  uid text PRIMARY KEY,
  fname text,
  lname text
 ) WITH
 bloom_filter_fp_chance=0.010000 AND
 comment='' AND
 dclocal_read_repair_chance=0.000000 AND
 gc_grace_seconds=432000 AND
 read_repair_chance=0.100000 AND
 replicate_on_write='true' AND
 populate_io_cache_on_flush='false' AND
 compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'} AND
 compression={'sstable_compression': 'LZ4Compressor'} AND
 caching = 'NONE';
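A script can make the same check on captured DESCRIBE output. The following is a minimal sketch in Python, assuming the compaction map is printed on one line in the form shown above; the function name in_memory_size_limit_mb is hypothetical.

```python
import ast
import re
from typing import Optional

# Captures the map literal that follows "compaction=" in DESCRIBE output.
_COMPACTION = re.compile(r"compaction\s*=\s*(\{[^}]*\})")


def in_memory_size_limit_mb(describe_output: str) -> Optional[int]:
    """Return size_limit_in_mb if the table uses MemoryOnlyStrategy, else None."""
    match = _COMPACTION.search(describe_output)
    if match is None:
        return None
    # The printed map uses quoted keys and values, so it parses as a Python dict literal.
    options = ast.literal_eval(match.group(1))
    if not str(options.get("class", "")).endswith("MemoryOnlyStrategy"):
        return None
    limit = options.get("size_limit_in_mb")
    return int(limit) if limit is not None else None


sample = "compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'} AND"
print(in_memory_size_limit_mb(sample))  # prints 1
```

A result of None indicates a traditional table, or output in a layout this sketch does not recognize.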

Overwriting data best practice

Overwrite data in memory using CQL insert or update operations. Overwriting existing in-memory data, rather than adding new data, makes the best use of the memory capacity you have.

Backing up and restoring data 

The procedure for backing up and restoring data is the same for in-memory and on-disk data. During the snapshot process, Cassandra flushes data to disk, and then creates hard links to the backed-up SSTable files for each keyspace in another named directory.

Flushing data to disk

To enable flushing of the memtable data to disk, change the memtable_flush_period_in_ms table property from its default setting of 0 (disabled) to a higher number, such as 3600000 (one hour, expressed in milliseconds). When the memtable flush period expires, Cassandra writes the contents of the memtable to disk and purges the data in the commit log. The size of in-memory data is not affected by flushing. When Cassandra flushes data in tables using the in-memory option to disk, new SSTables replace the old ones. When Cassandra flushes data to disk in tables that are not in-memory tables, old SSTables are not replaced.

Flushing data to disk does not remove in-memory data from the heap, as previously mentioned.

To automatically flush data to disk, configure the memtable_flush_period_in_ms using the CREATE TABLE or ALTER TABLE command. For example, configure the users_flushed table to flush the memtable every hour (3600000 ms).

CREATE TABLE users_flushed (
  uid text,
  fname text,
  lname text,
  PRIMARY KEY (uid)
  ) WITH compaction={'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1}
    AND memtable_flush_period_in_ms = 3600000 AND caching = 'NONE';
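Because memtable_flush_period_in_ms is expressed in milliseconds, it is easy to set a value that is off by a factor of 1000. The following is a minimal sketch in Python that converts an interval to milliseconds and builds the matching ALTER TABLE statement; the helper names flush_period_ms and alter_flush_statement are hypothetical, and the sketch only produces the statement text without running it.

```python
from datetime import timedelta


def flush_period_ms(interval: timedelta) -> int:
    """Convert an interval to whole milliseconds for memtable_flush_period_in_ms."""
    if interval < timedelta(0):
        raise ValueError("flush interval cannot be negative")
    # Integer division of two timedeltas gives an exact whole number of milliseconds.
    return interval // timedelta(milliseconds=1)


def alter_flush_statement(table: str, interval: timedelta) -> str:
    """Build the CQL statement that sets the flush period on a table."""
    return (
        f"ALTER TABLE {table} "
        f"WITH memtable_flush_period_in_ms = {flush_period_ms(interval)};"
    )


statement = alter_flush_statement("users_flushed", timedelta(hours=1))
print(statement)
# prints: ALTER TABLE users_flushed WITH memtable_flush_period_in_ms = 3600000;
```

An interval of zero produces a value of 0, which corresponds to the default setting that disables periodic flushing.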

Alternatively, you can flush data to disk manually. To manually flush data to disk, use the nodetool flush command. For example, in the bin directory, flush the data from mykeyspace and mytable:

nodetool flush mykeyspace mytable

The nodetool flush command performs the operation on the current node and results in the following background operations:

  • Creates a new SSTable
  • Deletes the commit logs that refer to data in the flushed memtables

To save time, flush data to disk before backing up in-memory data.