Step 6: Optimize the Playlist application

Optimize the Playlist application by paging result sets and enabling caching.

Now that the Playlist application is functionally complete, you will optimize its performance by limiting the number of rows your queries fetch at a time and by enabling row caching in Cassandra.

The location of the cassandra.yaml file depends on the type of installation:

Package installations: /etc/cassandra/cassandra.yaml
Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
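If a deployment script or tool needs to find cassandra.yaml, the rule above could be encoded as in the following minimal sketch. CassandraConfigLocator and InstallType are hypothetical names, and /opt/dse in the example only stands in for install_location.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: CassandraConfigLocator and InstallType are illustrative
// names, not part of Cassandra or the Playlist application.
class CassandraConfigLocator {

    enum InstallType { PACKAGE, TARBALL }

    // Returns the expected location of cassandra.yaml for an installation type.
    // installLocation is only used for tarball installations.
    static Path cassandraYaml(InstallType type, Path installLocation) {
        switch (type) {
            case PACKAGE:
                return Paths.get("/etc/cassandra/cassandra.yaml");
            case TARBALL:
                return installLocation.resolve("resources/cassandra/conf/cassandra.yaml");
            default:
                throw new IllegalArgumentException("Unknown installation type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(cassandraYaml(InstallType.PACKAGE, null));
        // "/opt/dse" is a hypothetical install_location
        System.out.println(cassandraYaml(InstallType.TARBALL, Paths.get("/opt/dse")));
    }
}
```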

Using paging in queries 

Without paging, Cassandra holds the entire result set of a query in memory and sends it to the application in a single block. The Cassandra Java driver, however, has a paging feature that retrieves the results of a query in configurable chunks. In this step, you will configure the paging feature to return sets of 200 rows at a time for the query that returns the largest result set, in TracksDAO.listSongsByGenre.
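With paging enabled, the driver requests one page, hands its rows to your loop, and requests the next page only when the current one is used up. The following pure-Java simulation sketches that behavior without a cluster: PagedIterator is a hypothetical class, not a driver type, and an in-memory list stands in for the rows stored in Cassandra.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simulation only: PagedIterator is hypothetical, not part of the Cassandra Java
// driver. serverRows stands in for all rows that match the query.
class PagedIterator<T> implements Iterator<T> {
    private final List<T> serverRows;
    private final int fetchSize;              // rows per page, like setFetchSize(200)
    private List<T> currentPage = new ArrayList<>();
    private int positionInPage = 0;
    private int nextOffset = 0;               // stands in for the driver's paging state
    private int roundTrips = 0;               // simulated requests sent to the server

    PagedIterator(List<T> serverRows, int fetchSize) {
        if (fetchSize <= 0) throw new IllegalArgumentException("fetchSize must be positive");
        this.serverRows = serverRows;
        this.fetchSize = fetchSize;
    }

    // Simulates one request: returns at most fetchSize rows starting at nextOffset.
    private void fetchNextPage() {
        int end = Math.min(nextOffset + fetchSize, serverRows.size());
        currentPage = new ArrayList<>(serverRows.subList(nextOffset, end));
        positionInPage = 0;
        nextOffset = end;
        roundTrips++;
    }

    @Override
    public boolean hasNext() {
        // Request another page only when the current page is consumed and rows remain.
        if (positionInPage == currentPage.size() && nextOffset < serverRows.size()) {
            fetchNextPage();
        }
        return positionInPage < currentPage.size();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return currentPage.get(positionInPage++);
    }

    int roundTrips() { return roundTrips; }
}
```

Draining this iterator over 1,000 rows with a fetch size of 200 makes 5 simulated requests, and at most 200 rows sit in the current page at any time.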

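import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import java.util.ArrayList;
import java.util.List;

public static List<TracksDAO> listSongsByGenre(String genre, int num_tracks) {

  // Select up to num_tracks tracks from the partition for this genre
  String queryText = "SELECT * FROM track_by_genre WHERE genre = ? LIMIT ?";
  PreparedStatement preparedStatement = getSession().prepare(queryText);
  BoundStatement boundStatement = preparedStatement.bind(genre, num_tracks);

  // Fetch the results from Cassandra 200 rows at a time
  boundStatement.setFetchSize(200);
  ResultSet results = getSession().execute(boundStatement);

  List<TracksDAO> tracks = new ArrayList<>();

  // Iterating the ResultSet makes the driver request the next page
  // automatically once the current page has been consumed
  for (Row row : results) {
    tracks.add(new TracksDAO(row));
  }

  return tracks;
}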

The setFetchSize method is called on the statement object and sets the number of rows the driver retrieves per page. The right page size depends on your data set and your application. A page size that is too small means more requests are sent to Cassandra as the result set is traversed, which hurts performance; a page size that is too large means each response holds more data in memory at once.
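The trade-off comes down to simple arithmetic: reading N rows with fetch size F takes ceil(N / F) page requests, and each page holds at most F rows. The helper below is a hypothetical sketch for reasoning about that, not a driver API; the 10,000-row result is illustrative, and 5000 matches the Java driver's default fetch size when setFetchSize is not called.

```java
// Hypothetical helper for reasoning about fetch sizes; not a driver API.
class FetchSizeMath {

    // Page requests needed to read totalRows rows: ceil(totalRows / fetchSize).
    static long pageRequests(long totalRows, int fetchSize) {
        if (fetchSize <= 0) throw new IllegalArgumentException("fetchSize must be positive");
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    // Upper bound on the rows held in memory for a single page.
    static long rowsPerPage(long totalRows, int fetchSize) {
        return Math.min(totalRows, fetchSize);
    }

    public static void main(String[] args) {
        long totalRows = 10_000;  // illustrative result size, not a measurement
        for (int fetchSize : new int[] {10, 200, 5000}) {
            System.out.printf("fetchSize=%d -> %d requests, at most %d rows per page%n",
                    fetchSize, pageRequests(totalRows, fetchSize), rowsPerPage(totalRows, fetchSize));
        }
    }
}
```

For 10,000 rows this prints 1000 requests at a fetch size of 10, 50 at 200, and 2 at 5000.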

Enabling row caching in Cassandra 

When you enable row caching on a table, Cassandra keeps rows from frequently accessed partitions in a RAM cache. The cache speeds up queries that read those rows by reducing the number of times Cassandra needs to read from disk. You can configure how many rows to cache per partition by setting the rows_per_partition attribute of the caching option when creating or altering a table. Setting rows_per_partition to ALL caches all the rows in the partition.
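To make the effect of rows_per_partition concrete, the following sketch models a row cache as a least-recently-used map from partition key to the first N rows of that partition. It is a conceptual model under simplifying assumptions: RowCacheModel is a hypothetical class, the disk read is simulated by a function, and writes, which also affect Cassandra's real row cache, are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Conceptual model only: Cassandra's real row cache is implemented differently and
// is also updated or invalidated by writes, which this sketch ignores.
// RowCacheModel and its methods are hypothetical names.
class RowCacheModel {

    // Cached head of one partition.
    private static final class CachedHead {
        final List<String> rows;
        final boolean wholePartition;  // true when the partition has no rows beyond these

        CachedHead(List<String> rows, boolean wholePartition) {
            this.rows = rows;
            this.wholePartition = wholePartition;
        }
    }

    private final int rowsPerPartition;                       // like 'rows_per_partition':'N'
    private final Function<String, List<String>> diskReader;  // simulated read from disk
    private final Map<String, CachedHead> cache;
    private int diskReads = 0;

    RowCacheModel(int maxPartitions, int rowsPerPartition,
                  Function<String, List<String>> diskReader) {
        this.rowsPerPartition = rowsPerPartition;
        this.diskReader = diskReader;
        // accessOrder = true keeps entries in least-recently-used order, so
        // removeEldestEntry evicts the partition that was read longest ago.
        this.cache = new LinkedHashMap<String, CachedHead>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedHead> eldest) {
                return size() > maxPartitions;
            }
        };
    }

    // Returns the first `limit` rows of a partition, from RAM when the cached head covers them.
    List<String> read(String partitionKey, int limit) {
        CachedHead head = cache.get(partitionKey);
        if (head != null && (limit <= head.rows.size() || head.wholePartition)) {
            return new ArrayList<>(head.rows.subList(0, Math.min(limit, head.rows.size())));
        }
        // Cache miss: read the partition from "disk" and cache its first rowsPerPartition rows.
        diskReads++;
        List<String> partition = diskReader.apply(partitionKey);
        int n = Math.min(rowsPerPartition, partition.size());
        cache.put(partitionKey,
                new CachedHead(new ArrayList<>(partition.subList(0, n)), n == partition.size()));
        return new ArrayList<>(partition.subList(0, Math.min(limit, partition.size())));
    }

    int diskReads() { return diskReads; }
}
```

In this model, with 100 rows cached per partition, a query for the first 50 or first 100 tracks of a cached genre is served from memory, while a query for 150 tracks still goes to disk.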

CREATE TABLE my_table (
  id uuid PRIMARY KEY,
  status text)
WITH caching = {'rows_per_partition':'100'};

In this case, you will alter the track_by_genre and track_by_artist tables to cache the first 100 rows of each partition.

ALTER TABLE track_by_genre WITH caching = {'rows_per_partition':'100'};
ALTER TABLE track_by_artist WITH caching = {'rows_per_partition':'100'};
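These statements can be run in cqlsh. If you would rather apply them from the application, for example with getSession().execute(...), a helper like the one below could build and validate the statement text first. CachingDdl is a hypothetical name, and the sketch assumes rows_per_partition takes ALL, NONE, or a positive row count.

```java
import java.util.Locale;

// Hypothetical helper (not part of the Playlist source): builds the CQL that sets
// a table's rows_per_partition caching option.
class CachingDdl {

    // Unquoted CQL identifier, optionally prefixed with a keyspace name.
    private static final String TABLE_NAME = "([A-Za-z][A-Za-z0-9_]*\\.)?[A-Za-z][A-Za-z0-9_]*";

    // rowsPerPartition must be "ALL", "NONE", or a positive integer.
    static String alterRowCaching(String table, String rowsPerPartition) {
        if (!table.matches(TABLE_NAME)) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        String value = rowsPerPartition.trim().toUpperCase(Locale.ROOT);
        if (!value.equals("ALL") && !value.equals("NONE")) {
            int rows = Integer.parseInt(value);  // NumberFormatException for non-numeric input
            if (rows <= 0) {
                throw new IllegalArgumentException("rows_per_partition must be positive: " + rows);
            }
            value = Integer.toString(rows);
        }
        return "ALTER TABLE " + table + " WITH caching = {'rows_per_partition':'" + value + "'}";
    }

    public static void main(String[] args) {
        System.out.println(alterRowCaching("track_by_genre", "100"));
        System.out.println(alterRowCaching("track_by_artist", "100"));
    }
}
```

The returned string omits the trailing semicolon used in cqlsh. Validating the value up front means a typo such as 1OO fails in the application instead of producing a malformed statement.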

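Using a row cache requires more memory on each node. The amount of memory Cassandra dedicates to the row cache is set by the row_cache_size_in_mb option in cassandra.yaml. Its default value is 0, which disables the row cache, so set it to a value greater than 0 for the table-level caching settings to have any effect.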
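A rough starting point for row_cache_size_in_mb is the number of partitions you expect to keep hot, times the rows cached per partition, times the average row size. The sketch below does that arithmetic; RowCacheSizing is a hypothetical name, all inputs in the example are assumptions, and the estimate ignores Cassandra's per-entry overhead, so treat it as a lower bound.

```java
// Back-of-the-envelope sizing only: ignores Cassandra's per-entry overhead.
// RowCacheSizing is a hypothetical helper; the example inputs are assumptions.
class RowCacheSizing {

    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Approximate MB needed to cache the first rowsPerPartition rows of each hot partition.
    static long estimateRowCacheMb(long hotPartitions, int rowsPerPartition, long avgRowBytes) {
        long bytes = hotPartitions * rowsPerPartition * avgRowBytes;
        return (bytes + BYTES_PER_MB - 1) / BYTES_PER_MB;  // round up to whole MB
    }

    public static void main(String[] args) {
        // Assumed: 2,000 hot partitions, 100 cached rows each, about 500 bytes per row
        System.out.println(estimateRowCacheMb(2_000, 100, 500) + " MB");
    }
}
```

With these assumed inputs the estimate is 96 MB (100,000,000 bytes rounded up to whole mebibytes).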

Cassandra also supports key caching, which helps Cassandra find the location of a partition on disk and so reduces disk seeks. The key cache is enabled by default, so you don't need to turn it on explicitly.
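The idea behind the key cache can be sketched as a map from partition key to the partition's position in a data file, consulted before the on-disk partition index. KeyCacheModel below is a hypothetical, unbounded simplification: Cassandra's real key cache is size-limited and keyed per SSTable, and the index lookup here is simulated by a function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Conceptual model only: KeyCacheModel is hypothetical. The real key cache is
// bounded in size and keyed per SSTable; this sketch uses one unbounded map.
class KeyCacheModel {
    private final Map<String, Long> offsets = new HashMap<>();
    private final ToLongFunction<String> indexLookup;  // simulated partition index search
    private int indexLookups = 0;

    KeyCacheModel(ToLongFunction<String> indexLookup) {
        this.indexLookup = indexLookup;
    }

    // Returns the data-file offset of a partition, consulting the cache first.
    long offsetOf(String partitionKey) {
        Long cached = offsets.get(partitionKey);
        if (cached != null) {
            return cached;  // hit: no index read needed
        }
        indexLookups++;     // miss: search the partition index, then remember the result
        long offset = indexLookup.applyAsLong(partitionKey);
        offsets.put(partitionKey, offset);
        return offset;
    }

    int indexLookups() { return indexLookups; }
}
```

Repeated reads of the same partition then skip the index search, which is where the saved disk seeks come from.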

Changes from Step 5 

To see all code changes between the step5 and step6 branches, enter the following command in a terminal in the playlist directory:

git diff step5..step6