Configuring the agent to upload very large files to Amazon S3

Change the settings on the agent to allow uploading very large files to Amazon S3.

The default settings for the DataStax agent prevent it from uploading SSTables over a certain size to S3. This limitation is in place to prevent the agent from using too much memory and will be lifted in a future version. The default maximum SSTable size is approximately 150 GB. Increase the maximum SSTable size allowed for uploads by modifying agent properties in the datastax-agent-env.sh file on each node. The defaults in datastax-agent-env.sh are:

JVM_OPTS="$JVM_OPTS -Xmx128M -Djclouds.mpu.parts.magnitude=100000
    -Djclouds.mpu.parts.size=16777216"
To increase the maximum SSTable size that the agent can upload, modify these two properties:
  • -Xmx128M — controls the heap size of the agent.
  • -Djclouds.mpu.parts.size=16777216 — controls the chunk (part) size used when uploading files to S3.

Because S3 multipart uploads are limited to 10,000 parts per object, the chunk size determines the largest file the agent can upload (chunk size × 10,000 parts). Increasing the chunk size requires more memory on the agent, so the agent heap size must be increased as well.
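As a rough illustration of that limit, the following shell arithmetic (not part of the agent configuration, just a back-of-the-envelope check using the values from this topic) shows how the chunk size maps to a maximum upload size:

# Maximum uploadable file size = jclouds.mpu.parts.size x 10,000 parts (the S3 multipart limit).
echo $(( 16777216 * 10000 ))    # default 16 MiB chunks -> ~168 GB (~156 GiB), hence the ~150 GB default limit
echo $(( 32000000 * 10000 ))    # 32 MB chunks -> ~320 GB (~298 GiB), enough headroom for 250 GB SSTables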

Here are example settings that allow uploading SSTables of approximately 250 GB:

-Xmx256M
-Djclouds.mpu.parts.size=32000000

These settings increase the chunk size to about 32 MB and the agent heap size to 256 MB, which allows the larger SSTables to be uploaded.
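
Putting it together, here is a sketch of how the updated line in datastax-agent-env.sh might look, assuming the default jclouds.mpu.parts.magnitude value is left in place:

JVM_OPTS="$JVM_OPTS -Xmx256M -Djclouds.mpu.parts.magnitude=100000 \
    -Djclouds.mpu.parts.size=32000000"

As with the defaults, make this change in datastax-agent-env.sh on every node.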