Caching edges and properties

How to configure graph caching for edges and properties.

Caching can improve query performance and is configurable. DSE Graph has two types of cache: an adjacency list cache and an index/property cache. Either edges or properties can be cached by using the cache() option of the schema API vertexLabel() method. Caching can be configured for all edges, all properties, or a filtered set of edges. Vertices are not cached directly, but caching the properties and edges that define the relationships between vertices has essentially the same effect.

Property caching takes effect only when indexes exist and are used by a query. If no index exists, no caching occurs, and full graph scan queries are never cached. Adjacency list caching is enabled when caching is configured for edges.
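As a minimal sketch of the index requirement, the Gremlin console statements below assume that a `meal` vertex label with a `calories` property key already exists in the schema; the index name `byCalories` is hypothetical and chosen only for illustration. The index is what allows the later traversal to be served from the property cache; without it, the traversal would be a full scan and nothing would be cached.

```groovy
// Assumption: the 'meal' vertex label and 'calories' property key already exist.
// 'byCalories' is a hypothetical index name used only for this example.
schema.vertexLabel('meal').index('byCalories').secondary().by('calories').add()

// Cache all properties of meal vertices for up to an hour (3600 seconds).
schema.vertexLabel('meal').cache().properties().ttl(3600).add()

// This traversal can use the index on 'calories', so the data it reads
// is eligible for the property cache.
g.V().hasLabel('meal').has('calories', lt(850))
```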

The caches are local to each node, and data is loaded into the cache when a query reads it. Both caches default to 128 MB in the dse.yaml file, set by adjacency_cache_size_in_mb and index_cache_size_in_mb. Both caches use off-heap memory and are implemented as least recently used (LRU) caches.

Caching is intended to make queries more efficient when a later query requires the same information.

Because graph cache is local to each node in the cluster, the cached data can differ between nodes, so a query can use the cache on one node but not on another. Data is added to the cache only when it is not already found there. Graph caching has no mechanism for invalidating entries: nothing is flushed, and the cache is not updated when an element is deleted or modified. Cached data expires only according to the time-to-live (TTL) value set when caching is configured for an element, so set a low TTL for elements (property keys, vertex labels, edge labels) that change often to avoid stale data.

Graph cache is most useful for rarely changed graph data and for queries that run repeatedly. If a query differs from an earlier one, even only in sort order, the graph cache is not used to reduce its latency. For instance, caching the calories property for meal vertices will speed up a query asking for all meals with fewer than 850 calories, if that query is repeated. Note that all properties of all meal vertices are cached along with calories.
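To illustrate the repeated-query point, the sketch below assumes a `meal` vertex label whose `calories` property is indexed and whose properties are cached. Running the first traversal again unchanged is the case that can benefit from the cache; according to the behavior described above, the second traversal, which differs only by adding a sort order, does not use the cached results.

```groovy
// Assumption: 'meal' has an index on 'calories' and property caching enabled.
// Repeating this exact traversal can be served from the graph cache.
g.V().hasLabel('meal').has('calories', lt(850))

// Same filter, but sorted by calories: per the text above, a query that
// differs even in sort order does not reuse the cached results.
g.V().hasLabel('meal').has('calories', lt(850)).order().by('calories')
```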

The location of the dse.yaml file depends on the type of installation:
  • Installer-Services: /etc/dse/dse.yaml
  • Package installations: /etc/dse/dse.yaml
  • Installer-No Services: install_location/resources/dse/conf/dse.yaml
  • Tarball installations: install_location/resources/dse/conf/dse.yaml

Procedure

  • Cache all properties of author vertices for up to an hour (3600 seconds):
    schema.vertexLabel('author').cache().properties().ttl(3600).add()

    Enabling the property cache causes index queries on the specified vertex label to use the index cache (IndexCache).

  • Cache both incoming and outgoing created edges of author vertices for up to a minute (60 seconds):
    schema.vertexLabel('author').cache().bothE('created').ttl(60).add()