Configuring the Backup Service to upload very large files to Amazon S3

Change the settings of the agent to allow it to upload very large files to Amazon S3.

The default settings for the DataStax agent prevent it from uploading SSTables to S3 that are larger than approximately 150 GB. This limit exists to keep the agent's memory usage in check and will be lifted in a future version. To raise the limit, modify the agent's JVM properties in the datastax-agent-env.sh file on each node. The defaults in datastax-agent-env.sh are:

JVM_OPTS="$JVM_OPTS -Xmx128M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=16777216"

To increase the maximum SSTable size that the agent can upload, modify these properties:

-Xmx128M
-Djclouds.mpu.parts.size=16777216

The -Xmx setting controls the agent's heap size. The -Djclouds.mpu.parts.size setting controls the chunk size used when uploading files to S3. Because S3 multipart uploads allow at most 10,000 parts, the chunk size determines the maximum file size the agent can upload: maximum file size = chunk size × 10,000 parts. Larger chunks require more memory on the agent, so the heap size must be increased along with the chunk size.
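
As a quick sanity check of the default limit, multiply the default chunk size by the 10,000-part maximum (a back-of-the-envelope calculation, which the section above rounds to approximately 150 GB):

# Maximum upload size = chunk size × maximum number of parts
echo $(( 16777216 * 10000 ))
# => 167772160000 bytes, roughly 156 GiB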

Here are example settings that allow uploading 250 GB SSTables:

-Xmx256M
-Djclouds.mpu.parts.size=32000000

These settings increase the chunk size to 32 MB and the heap size to 256 MB, raising the maximum upload size to 320 GB (32,000,000 bytes × 10,000 parts), which is enough for 250 GB SSTables.
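
For reference, here is a sketch of the resulting JVM_OPTS line in datastax-agent-env.sh, with the magnitude property left at its default. The restart command shown assumes a package installation; the exact command depends on how the agent was installed.

JVM_OPTS="$JVM_OPTS -Xmx256M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=32000000"

# Restart the agent so the new settings take effect
sudo service datastax-agent restart

Remember to make the same change on every node that runs an agent.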