Step 6: Optimize the Playlist application
Optimize the Playlist application with paging result sets and caching.
Now that the Playlist application is functionally complete, you will optimize its performance by limiting the number of rows returned at a time by its queries and by enabling caching in Cassandra.
Location of cassandra.yaml by installation type:
Package installations: /etc/cassandra/cassandra.yaml
Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
Using paging in queries
When fetching large result sets without paging, Cassandra holds the entire result set in memory and sends it to the application in a single block. The Cassandra Java driver, however, has a paging feature that retrieves the results of a query in chunks of a configurable size. In this step, you will configure paging to return 200 rows at a time for the query that returns the largest result set, in TracksDAO.listSongsByGenre.
public static List<TracksDAO> listSongsByGenre(String genre, int num_tracks) {
    String queryText = "SELECT * FROM track_by_genre WHERE genre = ? LIMIT ?";
    PreparedStatement preparedStatement = getSession().prepare(queryText);
    BoundStatement boundStatement = preparedStatement.bind(genre, num_tracks);

    // Retrieve the results 200 rows at a time instead of in a single block.
    boundStatement.setFetchSize(200);
    ResultSet results = getSession().execute(boundStatement);

    // Iterating over the result set fetches further pages as they are needed.
    List<TracksDAO> tracks = new ArrayList<>();
    for (Row row : results) {
        tracks.add(new TracksDAO(row));
    }
    return tracks;
}
The setFetchSize method is called on the statement object, and its argument is the number of rows to retrieve per page. The right page size depends on your data set and your application. If the page size is too small, the driver has to send more requests to Cassandra as the result set is traversed, which results in poor performance.
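To make the cost of a small page size concrete, the following is a minimal simulation, not driver code. The PagedRows class and its countFetches method are hypothetical names invented for this illustration; the class only mimics an iterator that pulls rows from a backing list one page at a time and counts how many page requests that takes. It assumes one request per page and ignores any extra trailing request a real driver might send.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Simulation only: mimics a paged result set that fetches rows from a
 * backing list in fixed-size pages. It does not use the Cassandra driver.
 */
final class PagedRows implements Iterable<Integer> {
    private final List<Integer> allRows; // stands in for the rows held by the server
    private final int fetchSize;         // rows requested per page
    private int fetchCount = 0;          // number of page requests made so far

    PagedRows(List<Integer> allRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.allRows = allRows;
        this.fetchSize = fetchSize;
    }

    int getFetchCount() {
        return fetchCount;
    }

    // Simulates one round trip: returns at most fetchSize rows from offset.
    private List<Integer> fetchPage(int offset) {
        fetchCount++;
        int end = Math.min(offset + fetchSize, allRows.size());
        return allRows.subList(offset, end);
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int nextOffset = 0;
            private Iterator<Integer> page = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Request the next page only when the current one is used up.
                while (!page.hasNext() && nextOffset < allRows.size()) {
                    List<Integer> rows = fetchPage(nextOffset);
                    nextOffset += rows.size();
                    page = rows.iterator();
                }
                return page.hasNext();
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
    }

    // Traverses totalRows simulated rows and reports how many pages were requested.
    static int countFetches(int totalRows, int fetchSize) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            rows.add(i);
        }
        PagedRows paged = new PagedRows(rows, fetchSize);
        int seen = 0;
        for (int ignored : paged) {
            seen++;
        }
        if (seen != totalRows) {
            throw new IllegalStateException("not all rows were traversed");
        }
        return paged.getFetchCount();
    }

    public static void main(String[] args) {
        System.out.println(countFetches(1000, 200)); // page size used in this step
        System.out.println(countFetches(1000, 10));  // much smaller page size
    }
}
```

In this simulation, traversing 1000 rows takes 5 page requests with a page size of 200 and 100 page requests with a page size of 10. A larger page means fewer requests but more rows held in memory at once.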
Enabling row caching in Cassandra
When you enable row caching on a table, Cassandra detects frequently accessed partitions and stores rows of data in a RAM cache. A cache improves the performance of queries that access those rows by limiting the number of times Cassandra needs to read from disk storage. You can configure how many rows to cache per partition by setting the rows_per_partition attribute of the caching option when creating or altering a table. Setting rows_per_partition to ALL caches all the rows in the partition.
CREATE TABLE my_table (
    id uuid PRIMARY KEY,
    status text)
WITH caching = {'rows_per_partition':'100'};
In this case, you will alter the track_by_genre and track_by_artist tables to cache the first 100 rows on each partition.
ALTER TABLE track_by_genre WITH caching = {'rows_per_partition':'100'};
ALTER TABLE track_by_artist WITH caching = {'rows_per_partition':'100'};
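If you would rather generate these statements from application code than type them by hand, a sketch along the following lines may help. The CachingCql class and its alterRowCaching method are hypothetical helpers, not part of the Playlist application or the Cassandra Java driver. The method only formats the CQL text shown above and, as a simplifying assumption, accepts either a positive integer or ALL for rows_per_partition.

```java
/**
 * Hypothetical helper for illustration: formats the ALTER TABLE statement
 * used in this step. It builds a string and does not contact Cassandra.
 */
final class CachingCql {

    static String alterRowCaching(String table, String rowsPerPartition) {
        // Accept only simple unquoted identifiers to avoid building malformed CQL.
        if (!table.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        // Assumption for this sketch: allow a positive integer or the keyword ALL.
        if (!rowsPerPartition.equals("ALL") && !rowsPerPartition.matches("[1-9][0-9]*")) {
            throw new IllegalArgumentException(
                "rows_per_partition must be a positive integer or ALL: " + rowsPerPartition);
        }
        return "ALTER TABLE " + table
            + " WITH caching = {'rows_per_partition':'" + rowsPerPartition + "'};";
    }

    public static void main(String[] args) {
        // Prints the two statements used in this step.
        for (String table : new String[] {"track_by_genre", "track_by_artist"}) {
            System.out.println(alterRowCaching(table, "100"));
        }
    }
}
```

The resulting strings can be run in cqlsh or passed to whatever mechanism you already use to execute schema changes.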
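Using a row cache requires more memory on each node. The amount of memory Cassandra dedicates to the row cache is configured in the row_cache_size_in_mb option in cassandra.yaml. The default value of 0 disables the row cache, so set this option to a positive value for the table-level caching settings to take effect.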
Cassandra also supports key caching, which helps Cassandra find the location of a partition on disk and so decreases disk seek times. Key caches are enabled by default, so you don't need to turn the key cache on explicitly.
Changes from Step 5
To see all code changes from the step5 branch, enter the following command in a terminal in the playlist directory:
git diff step5..step6