Recovering data lost due to TTL timestamp overflow

Reinsert data that was removed because the TTL expiration timestamp overflowed and wrapped around to a date in the past.

Earlier versions had no protection against expiration timestamps later than the maximum date the storage engine can represent (2038-01-19T03:14:06+00:00). Before 5.1.7 and 5.0.12, a long TTL that produces an expiration timestamp later than January 19, 2038 causes the date to overflow: the year wraps around and the data expires immediately. Records expired by overflow are not queryable and are permanently removed after compaction. This occurs only when the TTL value is close to the maximum of 630720000 seconds (20 years).
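The wraparound can be illustrated with a few lines of arithmetic. The following is a minimal sketch, assuming the expiration is kept as a signed 32-bit count of seconds since 1970-01-01 UTC (an assumption suggested by the 2038 limit and by the negative values that the sstablemetadata tool reports). The names to_int32 and stored_expiration are hypothetical helpers, not part of any DataStax tool.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit field would (assumed storage width)."""
    return (value + 2**31) % 2**32 - 2**31


def stored_expiration(write_time: datetime, ttl_seconds: int) -> int:
    """Return the expiration in epoch seconds after 32-bit wraparound."""
    write_seconds = int((write_time - EPOCH).total_seconds())
    return to_int32(write_seconds + ttl_seconds)


# A write on 2020-01-01 with the maximum 20-year TTL would expire in 2039,
# which is past the limit, so the stored value wraps to a negative number.
write_time = datetime(2020, 1, 1, tzinfo=timezone.utc)
expiration = stored_expiration(write_time, 630720000)
print(expiration, EPOCH + timedelta(seconds=expiration))
```

A negative result corresponds to a date before 1970, which is why the affected records are treated as already expired.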

The earliest possible date for an expiration timestamp overflow is 2018-01-19T03:14:06+00:00. As time progresses and 2038-01-19T03:14:06+00:00 approaches, the maximum supported TTL value gradually decreases.
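To see how the usable TTL shrinks over time, the sketch below computes the largest TTL that still yields a representable expiration for a given write time. It is only an illustration of the arithmetic; max_safe_ttl is a hypothetical helper, and the two constants are taken from the values quoted on this page.

```python
from datetime import datetime, timezone

MAX_TTL_SECONDS = 630720000  # 20 years, the maximum TTL quoted above
MAX_EXPIRATION = datetime(2038, 1, 19, 3, 14, 6, tzinfo=timezone.utc)


def max_safe_ttl(write_time: datetime) -> int:
    """Largest TTL (seconds) whose expiration does not pass MAX_EXPIRATION."""
    remaining = int((MAX_EXPIRATION - write_time).total_seconds())
    # Never exceed the configured maximum and never go below zero.
    return max(0, min(MAX_TTL_SECONDS, remaining))


for year in (2010, 2025, 2030, 2037):
    when = datetime(year, 1, 1, tzinfo=timezone.utc)
    print(year, max_safe_ttl(when))
```

For write times far enough from the limit the function returns the full 20-year maximum; closer to 2038 it returns only the seconds that remain.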

To recover data with overflowed timestamps from SSTables that have not been compacted since the data was inserted with an overflowed expiration, use one of the following methods:
Tip: To find out whether an SSTable has an entry with an overflowed expiration, inspect it with the sstablemetadata tool and look for a negative value in the min local deletion time field. Back up SSTables in this condition immediately, because they are subject to data loss during compaction.
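The check described in the tip can be scripted. The following is a rough sketch that scans text captured from sstablemetadata for the min local deletion time value. The exact label and layout of that line vary between versions, so the regular expression and the sample text are assumptions, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed label; adjust the pattern to match the output of your version.
_MIN_LDT = re.compile(r"min local deletion time:\s*(-?\d+)", re.IGNORECASE)


def min_local_deletion_time(metadata_output: str) -> Optional[int]:
    """Extract the min local deletion time from sstablemetadata output, if present."""
    match = _MIN_LDT.search(metadata_output)
    return int(match.group(1)) if match else None


def has_overflowed_expiration(metadata_output: str) -> bool:
    """True when the min local deletion time is negative (overflowed)."""
    value = min_local_deletion_time(metadata_output)
    return value is not None and value < 0


# Hypothetical sample line, not captured from a real SSTable.
sample = "SSTable min local deletion time: -2086410496\n"
print(has_overflowed_expiration(sample))
```

In practice the text would come from running sstablemetadata against each SSTable data file and capturing its standard output. Any file flagged this way should be backed up before compaction runs.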

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/cassandra/cassandra.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/cassandra/conf/cassandra.yaml

Procedure

Offline overflowed expiration recovery method

Online overflowed expiration recovery method

  • To recover data from a table that has not gone through compaction:
    Attention: Running scrub in a production environment may negatively impact cluster performance. DataStax recommends using the offline method instead.
    1. Disable compaction on the node.
      nodetool disableautocompaction
      Warning: This step is crucial. The data might be removed permanently during compaction.
    2. Copy the SSTables containing entries with overflowed expiration time to the data directory.
    3. Load the SSTables.
      nodetool refresh keyspace_name table_name
    4. Run the scrub command with the --reinsert-overflowed-ttl option on the table.
      nodetool scrub --reinsert-overflowed-ttl keyspace_name table_name

      See /en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsScrub.html#toolsScrub__reinsertTTLOverflow for details

    5. For indexed tables, use the dsetool reload_core command on a search node to load and reindex the reinserted values.
      dsetool reload_core reindex=true keyspace_name.table_name
    6. Re-enable compaction after verifying that scrub recovered the missing entries.
      nodetool enableautocompaction
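The online procedure above can be summarized as an ordered checklist. The sketch below only builds and prints the steps; it deliberately executes nothing, because copying the SSTables and verifying the result are manual actions. The function name online_recovery_steps is hypothetical, and the keyspace and table arguments passed to nodetool refresh, as well as the nodetool enableautocompaction command in the last step, are assumptions based on standard nodetool usage rather than on this page.

```python
from typing import List, Optional, Tuple

# (description, command); command is None for manual steps.
Step = Tuple[str, Optional[str]]


def online_recovery_steps(keyspace: str, table: str, indexed: bool = False) -> List[Step]:
    """Build the ordered checklist for the online recovery method."""
    steps: List[Step] = [
        ("Disable compaction on the node", "nodetool disableautocompaction"),
        ("Copy the affected SSTables to the data directory", None),
        # Assumption: refresh takes the keyspace and table as arguments.
        ("Load the SSTables", f"nodetool refresh {keyspace} {table}"),
        ("Reinsert overflowed entries",
         f"nodetool scrub --reinsert-overflowed-ttl {keyspace} {table}"),
    ]
    if indexed:
        steps.append(("Reload and reindex on a search node",
                      f"dsetool reload_core reindex=true {keyspace}.{table}"))
    # Assumption: enableautocompaction reverses the first step.
    steps.append(("Verify the recovered entries, then re-enable compaction",
                  "nodetool enableautocompaction"))
    return steps


for number, (description, command) in enumerate(
        online_recovery_steps("keyspace_name", "table_name", indexed=True), start=1):
    print(f"{number}. {description}: {command or '(manual step)'}")
```

Keeping compaction disabled until the final step is the important ordering constraint: every other step assumes the overflowed entries are still present in the SSTables.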