Using the in-memory option
DataStax Enterprise includes the in-memory option for storing data in and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data that is mostly overwritten, such as in an application that mirrors stock exchange data: the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, a table designed for the in-memory option should have the following characteristics:
- Store a small amount of data
- Experience a workload that is mostly overwrites
- Be heavily trafficked
Check performance metrics before and after enabling the in-memory option, using OpsCenter for example.
Limitation
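The in-memory option uses memory in the Java heap, so manage available memory carefully. To check the size, capacity, and percentage of memory used by an in-memory table, use the dsetool inmemorystatus command:
$ bin/dsetool inmemorystatus ks1 users
Keyspace  ColumnFamily  Size  Capacity  Usage
ks1       users         0MB   1MB       52%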
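If you check inmemorystatus from a monitoring script, the tabular output can be parsed into records. The following Python sketch assumes the output is a header line followed by whitespace-separated rows in the column order shown above (Keyspace, ColumnFamily, Size, Capacity, Usage), with sizes suffixed by MB and usage by %. The function names and the 80% alert threshold are hypothetical choices, not DataStax recommendations.

```python
from dataclasses import dataclass


@dataclass
class InMemoryStatus:
    keyspace: str
    table: str
    size_mb: int
    capacity_mb: int
    usage_pct: float


def parse_inmemorystatus(output: str) -> list[InMemoryStatus]:
    """Parse `dsetool inmemorystatus` output (assumed: one header line, then rows)."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        keyspace, table, size, capacity, usage = line.split()
        rows.append(InMemoryStatus(
            keyspace=keyspace,
            table=table,
            size_mb=int(size.removesuffix("MB")),
            capacity_mb=int(capacity.removesuffix("MB")),
            usage_pct=float(usage.removesuffix("%")),
        ))
    return rows


def tables_near_capacity(rows: list[InMemoryStatus], threshold_pct: float = 80.0):
    # Hypothetical alert threshold; tune it for your workload.
    return [r for r in rows if r.usage_pct >= threshold_pct]


sample = """Keyspace  ColumnFamily  Size  Capacity  Usage
ks1       users         0MB   1MB       52%"""
status = parse_inmemorystatus(sample)
print(status[0].usage_pct)           # 52.0
print(tables_near_capacity(status))  # []
```

Parsing into typed records keeps the alerting rule separate from the text format, so only parse_inmemorystatus needs to change if the output layout differs in your DSE version.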
Creating a table using the in-memory option
In CQL, to create a table that uses the in-memory option, add a compaction directive to the CREATE TABLE statement. The directive specifies the MemoryOnlyStrategy class and the size_limit_in_mb property, which limits the amount of data that the table can hold. As a best practice, also disable caching:
CREATE TABLE users (
  uid text,
  fname text,
  lname text,
  PRIMARY KEY (uid)
) WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1 }
  AND caching = 'NONE';
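When tables are created from application code, a small helper can assemble this statement and reject out-of-range limits before they reach the cluster. The following Python sketch is hypothetical (the function name and parameters are not part of any DataStax API): it only builds the CQL string, checks size_limit_in_mb against the 1 to 1024 range described under Limiting the size of tables, and leaves execution to cqlsh or a driver.

```python
def in_memory_create_table(table: str, columns: dict[str, str],
                           primary_key: str, size_limit_in_mb: int) -> str:
    """Build a CREATE TABLE statement that uses MemoryOnlyStrategy.

    Hypothetical helper: it only formats CQL text and never connects to DSE.
    """
    # size_limit_in_mb is required and must be between 1 and 1024 (1GB per node).
    if not 1 <= size_limit_in_mb <= 1024:
        raise ValueError("size_limit_in_mb must be between 1 and 1024")
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    return (
        f"CREATE TABLE {table} ( {column_defs}, PRIMARY KEY ({primary_key}) ) "
        f"WITH compaction = {{ 'class': 'MemoryOnlyStrategy', "
        f"'size_limit_in_mb': {size_limit_in_mb} }} "
        "AND caching = 'NONE';"  # caching disabled as a best practice
    )


print(in_memory_create_table(
    "users", {"uid": "text", "fname": "text", "lname": "text"}, "uid", 1))
```

Validating on the client side only catches typos early; the cluster still enforces its own rules when the statement runs.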
To flush the memtable to disk periodically (metered flushing), set the memtable_flush_period_in_ms table property in the CREATE TABLE or ALTER TABLE statement.
Altering an on-disk table
Use the ALTER TABLE statement to change a traditional table to one that uses the in-memory option, or vice versa. For example, suppose you have a traditional table named emp. The output of the DESCRIBE command shows that emp is a traditional table because it lacks a line like this one:
compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'}
Alter the emp table to use the in-memory option and, as a best practice, disable caching:
ALTER TABLE emp WITH compaction = { 'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1 } AND caching = 'NONE';
Limiting the size of tables
The size_limit_in_mb property is a required property of the in-memory option schema that you configure using CREATE TABLE or ALTER TABLE. Valid values are 1 to 1024, so a table in memory is limited to 1GB (1024MB) per node. It is possible, but not recommended, to create multiple 1GB tables, but no single table can exceed 1GB per node. Across the cluster, the total space you can allocate to a table in memory is size_limit_in_mb * number of nodes / replication factor, which is at most 1GB * number of nodes / replication factor. For example, this configuration in a 10-node cluster can accommodate 5GB of data distributed over the cluster:
- size_limit_in_mb=1024
- replication factor = 2
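The capacity formula can be checked with a few lines of arithmetic. This Python sketch assumes data is spread evenly across nodes, which real clusters only approximate; the function name and the sanity check that the cluster has at least replication-factor nodes are illustrative additions, not DataStax rules.

```python
def cluster_in_memory_capacity_mb(size_limit_in_mb: int, nodes: int,
                                  replication_factor: int) -> float:
    """Unique data one in-memory table can hold across the cluster.

    Each node holds at most size_limit_in_mb for the table and every row is
    stored replication_factor times, so capacity = limit * nodes / RF.
    Assumes an even data distribution across nodes.
    """
    if not 1 <= size_limit_in_mb <= 1024:
        raise ValueError("size_limit_in_mb must be between 1 and 1024")
    # Illustrative sanity check: fewer nodes than replicas makes no sense here.
    if replication_factor < 1 or nodes < replication_factor:
        raise ValueError("need at least replication_factor nodes")
    return size_limit_in_mb * nodes / replication_factor


# The configuration from the text: 10 nodes, size_limit_in_mb=1024, RF=2.
print(cluster_in_memory_capacity_mb(1024, 10, 2))  # 5120.0 (5GB)
```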
Disabling key caching
DataStax recommends disabling caching on tables configured to use the in-memory option. An error is logged if key caching is not disabled, and enabling row caching causes an error condition. To disable both types of caching, set the table caching property to NONE.
ALTER TABLE users WITH caching = 'NONE';
Managing available memory
Because DataStax Enterprise runs in a distributed environment, it cannot prevent you from adding more data than the available memory can hold: the data size can differ from node to node, which makes such prevention impossible. The Cassandra administrator is responsible for managing available memory carefully.
If you do not manage available memory when using the in-memory option, an error message like the following appears when capacity is exceeded:
SEVERE: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
SEVERE: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.53.120.13 (null), abc.com/10.53.122.15 (Timeout during read), abc.com/10.53.120.18 (null))
. . .
Checking available memory
Cassandra does not hold any locks on data while running requests, so concurrent write requests might slightly exceed the size_limit_in_mb. Cassandra provides the AllMemtablesDataSize metric for checking available memory, so you can ensure that more memory is available for a table than its size limit allows. Use OpsCenter or JMX to check the AllMemtablesDataSize metric. Memtable flushes do not reduce the size of in-memory data.
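Because concurrent writes can overshoot size_limit_in_mb slightly, a capacity check should compare available memory against the sum of the size limits plus some margin. The Python sketch below assumes you have already determined how much memory is available on the node (for example, by watching AllMemtablesDataSize in OpsCenter or over JMX) and know the size limits of that node's in-memory tables. The function name and the 10% overshoot margin are arbitrary assumptions, not documented values.

```python
def has_headroom(available_mb: float, size_limits_mb: list[int],
                 overshoot_margin: float = 0.10) -> bool:
    """Return True if available memory exceeds every in-memory table's limit
    plus a safety margin for writes that overshoot size_limit_in_mb.

    overshoot_margin is a hypothetical 10% default, not a DataStax value.
    """
    required_mb = sum(size_limits_mb) * (1 + overshoot_margin)
    return available_mb > required_mb


# Two in-memory tables limited to 512MB and 256MB on this node.
print(has_headroom(1024, [512, 256]))  # True  (needs more than roughly 845MB)
print(has_headroom(800, [512, 256]))   # False
```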
Checking table properties
In cqlsh, use the DESCRIBE command to view table properties.
cqlsh> DESCRIBE TABLE users;
The output shows whether the table uses the in-memory option (the MemoryOnlyStrategy class) and the size limit of the table data (size_limit_in_mb):
CREATE TABLE users (
  uid text PRIMARY KEY,
  fname text,
  lname text
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='NONE' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=432000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
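To check these properties from a script, for example to confirm that a table is in memory and has caching disabled, you can pattern-match the DESCRIBE output. This Python sketch assumes the compaction map and the caching setting are formatted like the sample output above; it uses regular expressions rather than a real CQL parser, so treat it as a rough check. The function name is hypothetical.

```python
import re


def in_memory_properties(describe_output: str) -> dict:
    """Extract in-memory settings from DESCRIBE TABLE output (rough regex check)."""
    compaction = re.search(r"compaction\s*=\s*\{([^}]*)\}", describe_output)
    caching = re.search(r"caching\s*=\s*'([^']*)'", describe_output)
    props = {
        "in_memory": False,
        "size_limit_in_mb": None,
        "caching": caching.group(1) if caching else None,
    }
    if compaction:
        body = compaction.group(1)
        # A traditional table lacks MemoryOnlyStrategy in its compaction map.
        props["in_memory"] = "MemoryOnlyStrategy" in body
        limit = re.search(r"'size_limit_in_mb'\s*:\s*'?(\d+)'?", body)
        if limit:
            props["size_limit_in_mb"] = int(limit.group(1))
    return props


sample = ("CREATE TABLE users ( uid text PRIMARY KEY, fname text, lname text ) "
          "WITH caching='NONE' AND "
          "compaction={'size_limit_in_mb': '1', 'class': 'MemoryOnlyStrategy'};")
print(in_memory_properties(sample))
# {'in_memory': True, 'size_limit_in_mb': 1, 'caching': 'NONE'}
```

The same check distinguishes a traditional table such as emp: its compaction map has no MemoryOnlyStrategy, so in_memory comes back False.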
Overwriting data best practice
Overwrite data in memory using CQL INSERT or UPDATE operations. Overwriting existing rows, rather than adding new ones, makes the most of the limited memory capacity available to an in-memory table.
Backing up and restoring data
The procedure for backing up and restoring data is the same for in-memory and on-disk data. During the snapshot process, Cassandra flushes data to disk and then creates hard links to the backed-up SSTable files for each keyspace in another named directory.
Flushing data to disk
To enable flushing memtable data to disk, change the memtable_flush_period_in_ms table property from its default of 0 (disabled) to a higher value, such as 3600000 to flush every hour (the property is expressed in milliseconds). When the memtable flush period expires, Cassandra writes the contents of the memtable to disk and purges the corresponding data from the commit log. Flushing does not affect the size of in-memory data. When Cassandra flushes data in tables that use the in-memory option to disk, the new SSTables replace the old ones; when it flushes data in tables that are not in-memory tables, the old SSTables are not replaced.
Flushing data to disk does not remove in-memory data from the heap, as previously mentioned.
To flush data to disk automatically, configure memtable_flush_period_in_ms using the CREATE TABLE or ALTER TABLE command. For example, the following statement configures the users_flushed table to flush its memtable every 3600 ms (3.6 seconds):
CREATE TABLE users_flushed (
  uid text,
  fname text,
  lname text,
  PRIMARY KEY (uid)
) WITH compaction = {'class': 'MemoryOnlyStrategy', 'size_limit_in_mb': 1}
  AND memtable_flush_period_in_ms = 3600
  AND caching = 'NONE';
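Because memtable_flush_period_in_ms is expressed in milliseconds, it is easy to set a much shorter period than intended: 3600 means 3.6 seconds, while an hourly flush needs 3600000. The following Python sketch, a hypothetical helper that is not part of any DataStax tool, converts a duration to milliseconds and emits the matching ALTER TABLE statement.

```python
from datetime import timedelta


def flush_period_ms(period: timedelta) -> int:
    """Convert a flush period to the milliseconds memtable_flush_period_in_ms expects."""
    ms = round(period.total_seconds() * 1000)
    if ms < 0:
        raise ValueError("flush period cannot be negative")
    return ms  # 0 disables periodic flushing (the default)


def alter_flush_period(table: str, period: timedelta) -> str:
    """Build an ALTER TABLE statement that sets the flush period (text only)."""
    return (f"ALTER TABLE {table} WITH memtable_flush_period_in_ms = "
            f"{flush_period_ms(period)};")


print(flush_period_ms(timedelta(hours=1)))  # 3600000
print(alter_flush_period("users", timedelta(hours=1)))
# ALTER TABLE users WITH memtable_flush_period_in_ms = 3600000;
```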
Alternatively, flush data to disk manually with the nodetool flush command. For example, from the bin directory, flush the data in mytable of the mykeyspace keyspace:
nodetool flush mykeyspace mytable
The nodetool flush command performs the operation on the current node and results in the following background operations:
- Creates a new SSTable
- Deletes the commit logs that refer to data in the flushed memtables
To save time, flush data to disk before backing up in-memory data.
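The flush-then-back-up sequence can be scripted. The Python sketch below builds two standard nodetool commands, nodetool flush followed by nodetool snapshot with a tag, and runs them with subprocess. It assumes nodetool is on the PATH of the node being backed up; the keyspace, table, and tag names are placeholders, and the helper functions are hypothetical. Run it on each node, because nodetool flush acts only on the current node.

```python
import subprocess


def backup_commands(keyspace: str, table: str, tag: str) -> list[list[str]]:
    """Commands to flush an in-memory table, then snapshot its keyspace."""
    return [
        # Write the memtable to a new SSTable on this node.
        ["nodetool", "flush", keyspace, table],
        # Hard-link the keyspace's SSTables into a snapshot named by the tag.
        ["nodetool", "snapshot", "-t", tag, keyspace],
    ]


def run_backup(keyspace: str, table: str, tag: str) -> None:
    for cmd in backup_commands(keyspace, table, tag):
        # check=True raises CalledProcessError if a step fails, so the
        # snapshot is not attempted after a failed flush.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in backup_commands("mykeyspace", "mytable", "inmem-backup"):
        print(" ".join(cmd))
    # run_backup("mykeyspace", "mytable", "inmem-backup")  # uncomment on each node
```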