Paging Large Queries

Cassandra 2.0+ offers support for automatic query paging. Starting with version 2.0 of the driver, if protocol_version is greater than 2 (it is by default), queries returning large result sets will be automatically paged.

Controlling the Page Size

By default, Session.default_fetch_size controls how many rows will be fetched per page. This can be overridden per-query by setting fetch_size on a Statement. By default, each page will contain at most 5000 rows.

Handling Paged Results

Whenever the number of result rows for are query exceed the page size, an instance of PagedResult will be returned instead of a normal list. This class implements the iterator interface, so you can treat it like a normal iterator over rows:

from dse.query import SimpleStatement
query = "SELECT * FROM users"  # users contains 100 rows
statement = SimpleStatement(query, fetch_size=10)
for user_row in session.execute(statement):
    process_user(user_row)

Whenever there are no more rows in the current page, the next page will be fetched transparently. However, note that it is possible for an Exception to be raised while fetching the next page, just like you might see on a normal call to session.execute().

If you use Session.execute_async() along with, ResponseFuture.result(), the first page will be fetched before result() returns, but latter pages will be transparently fetched synchronously while iterating the result.

Handling Paged Results with Callbacks

If callbacks are attached to a query that returns a paged result, the callback will be called once per page with a normal list of rows.

Use ResponseFuture.has_more_pages and ResponseFuture.start_fetching_next_page() to continue fetching pages. For example:

class PagedResultHandler(object):

    def __init__(self, future):
        self.error = None
        self.finished_event = Event()
        self.future = future
        self.future.add_callbacks(
            callback=self.handle_page,
            errback=self.handle_err)

    def handle_page(self, rows):
        for row in rows:
            process_row(row)

        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished_event.set()

    def handle_error(self, exc):
        self.error = exc
        self.finished_event.set()

future = session.execute_async("SELECT * FROM users")
handler = PagedResultHandler(future)
handler.finished_event.wait()
if handler.error:
    raise handler.error

Resume Paged Results

You can resume the pagination when executing a new query by using the ResultSet.paging_state. This can be useful if you want to provide some stateless pagination capabilities to your application (ie. via http). For example:

from dse.query import SimpleStatement
query = "SELECT * FROM users"
statement = SimpleStatement(query, fetch_size=10)
results = session.execute(statement)

# save the paging_state somewhere and return current results
session['paging_stage'] = results.paging_state

# resume the pagination sometime later...
statement = SimpleStatement(query, fetch_size=10)
ps = session['paging_state']
results = session.execute(statement, paging_state=ps)