Indexing graph overview
Explain indexes and how they affect DSE Graph performance.
DSE Graph implements two types of indexes: vertex-centric indexes and global indexes. Vertex-centric indexes are local and specific to a single vertex. Global indexes are specific to a vertex label and property and are graph-wide. All indexes contribute to the performance of graph traversals on large distributed graphs.
Vertex-centric indexing (VCI) overview
Vertex-centric indexes (VCI) are specific to a vertex, unlike global indexes, which span the whole graph and index elements for fast global lookups. Vertex-centric indexes sort and index the incident edges and adjacent vertices of a vertex according to the incident edge labels or properties. When a vertex is queried, its index is consulted to avoid a linear scan of all incident edges, so the cost of each lookup can be reduced from O(n) to O(log n) or O(1). A typical graph traversal touches numerous vertices, so without indexes the cost of scanning incident edges compounds at every step.
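The difference between a linear scan and an indexed lookup of incident edges can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the edge tuples, the `linear_scan` and `indexed_lookup` helpers, and the sorted-list index are hypothetical illustrations, not how DSE Graph stores or queries a vertex-centric index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical incident edges of one vertex:
# (edge_label, edge_property_value, adjacent_vertex_id)
edges = [
    ("rated", 5, "v7"),
    ("knows", 0, "v2"),
    ("rated", 3, "v4"),
    ("rated", 5, "v9"),
    ("created", 0, "v3"),
]


def linear_scan(edges, label, value):
    """O(n): inspect every incident edge of the vertex."""
    return [e[2] for e in edges if e[0] == label and e[1] == value]


# Toy vertex-centric index: the same edges kept sorted by (label, value).
index = sorted(edges)
keys = [(e[0], e[1]) for e in index]


def indexed_lookup(label, value):
    """O(log n) to locate the matching range, then read only the matches."""
    lo = bisect_left(keys, (label, value))
    hi = bisect_right(keys, (label, value))
    return [e[2] for e in index[lo:hi]]


scan_result = linear_scan(edges, "rated", 5)
index_result = indexed_lookup("rated", 5)
```

Both functions return the same adjacent vertices; the indexed version simply avoids touching edges that cannot match.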
In DSE Graph, a vertex-centric index is maintained as a materialized view (MV) table. When data is written or updated in the graph, the index is updated as well, rather than relying on the read repair functionality of Cassandra. Using an MV table results in higher write latencies but lower read latencies for graph traversals. Edge indexes and property indexes are vertex-centric indexes.
Global indexing overview
Indexes can affect traversal query performance. Decreasing the number of starting points for a graph traversal can greatly reduce the latency for a query result. If a traversal must start by checking all the vertices in a graph, time is lost finding the right starting point. If a starting vertex can be identified, that time is not required. Indexing the location of vertices based on the vertex label and property value improves the performance.
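As a rough illustration of why a global index shortens the search for a starting point, the following Python sketch compares a full scan of all vertices with a lookup keyed by property value for one vertex label. The vertex dictionaries and the `build_global_index` and `full_scan` helpers are hypothetical and exist only for this example; they do not reflect DSE Graph internals.

```python
from collections import defaultdict

# Hypothetical vertices, each with a label and a "name" property.
vertices = [
    {"id": 1, "label": "person", "name": "Ann"},
    {"id": 2, "label": "person", "name": "Bob"},
    {"id": 3, "label": "recipe", "name": "Stew"},
    {"id": 4, "label": "person", "name": "Ann"},
]


def full_scan(vertices, label, prop, value):
    """Without an index: check every vertex in the graph."""
    return [v["id"] for v in vertices
            if v["label"] == label and v.get(prop) == value]


def build_global_index(vertices, label, prop):
    """Toy global index: property value -> ids of vertices with that label."""
    index = defaultdict(list)
    for v in vertices:
        if v["label"] == label and prop in v:
            index[v[prop]].append(v["id"])
    return index


person_by_name = build_global_index(vertices, "person", "name")

# With the index, the traversal's starting vertices come from one lookup.
start_ids = person_by_name.get("Ann", [])
```

The index is specific to one vertex label and one property, which matches the description of global indexes given earlier in this section.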
Global indexing uses the built-in indexing features of DSE. DSE Graph data is stored in Cassandra, so two of the available index types are based on Cassandra's secondary indexing capability and on materialized views. In addition, DSE Graph can take advantage of DSE Search for full text and string indexing. Global indexing can be accomplished with any of these three index types. The type to select depends on the data itself. The type of index lookup also affects performance.
selectivity = ( cardinality / number of rows ) * 100%
In general, low cardinality results in low selectivity, and high cardinality results in high selectivity. Searching by country is a good candidate for secondary indexing.
Materialized view indexing uses Cassandra materialized views to store data. Materialized views are tables generated from a base table to provide a query based on a different primary key than the base table. This type of index is best used for values of high cardinality or nearly unique values or high selectivity. Searching materialized views yields similar response times to searching base tables, although writing the data incurs a small time penalty.
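The selectivity formula given above can be worked through with a small Python function. The row counts and cardinalities below are made-up numbers chosen only to contrast a low-cardinality property (such as country) with a nearly unique one; the `selectivity` helper is hypothetical.

```python
def selectivity(cardinality, number_of_rows):
    """Selectivity as a percentage: (cardinality / number of rows) * 100."""
    if number_of_rows <= 0:
        raise ValueError("number_of_rows must be positive")
    return cardinality / number_of_rows * 100.0


# Assumed example: 1,000,000 vertices spread over 200 distinct countries.
country_selectivity = selectivity(200, 1_000_000)

# Assumed example: 1,000,000 vertices, each with its own unique identifier.
unique_id_selectivity = selectivity(1_000_000, 1_000_000)

print(f"country:   {country_selectivity:.2f}%")
print(f"unique id: {unique_id_selectivity:.2f}%")
```

Under these assumed numbers the country property has very low selectivity, which fits the guidance for secondary indexing, while the unique identifier has the high selectivity that suits a materialized view index.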
Search indexes rely on DSE Search. Since graph data is stored in Cassandra tables, one search core is available per vertex label. Because of this property, the syntax for search indexes differs from that of secondary and materialized view indexes. For each vertex label that will be indexed with search, all properties must be added to a single search index named search. Because search is implemented with DSE Search, two indexing options are available: full text and string. Full text indexing performs tokenization and secondary processing such as case normalization, and is useful for queries that require a partial match of text. String indexing is useful for queries where an exact string is sought and no tokenization should be performed, such as for Solr faceting. Like secondary indexes, this type of index is best for low selectivity.
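The contrast between full text and string indexing can be sketched as follows. This is a deliberately simplified model: the whitespace-and-punctuation tokenizer and the `full_text_match` and `string_match` helpers are assumptions made for illustration, and the real analysis performed by DSE Search is considerably richer.

```python
import re


def full_text_tokens(text):
    """Toy full text analysis: split into word tokens and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def full_text_match(term, text):
    """Partial match: the term only needs to equal one token of the text."""
    return term.lower() in full_text_tokens(text)


def string_match(value, text):
    """Exact match: the whole stored string must equal the query value."""
    return value == text


stored = "Beef soup with carrots"

partial_hit = full_text_match("SOUP", stored)                  # token match
exact_miss = string_match("soup", stored)                      # not the whole string
exact_hit = string_match("Beef soup with carrots", stored)     # whole string
```

In this model a full text lookup finds the value from a single word regardless of case, while a string lookup succeeds only on the complete, untokenized value.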
Composite index keys are not currently supported in DSE Graph.
Indexing best practices
More than one index can be created on the same property, such as creating both a materialized view index and a search index on the property amount. The DSE Graph query optimizer automatically uses the appropriate index when processing a query; it is not possible to designate which index type to use. To ensure the best performance, the order of preference that DSE Graph uses is MV index > secondary index > DSE Search index. Different index types may be created on different properties as appropriate, based on the selectivity.
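The stated order of preference can be expressed as a simple selection rule. The sketch below models only that ordering; the `choose_index` helper and the index-type names are hypothetical, and the actual query optimizer is not described in this document and likely weighs more than this.

```python
# Preference order as stated: MV index > secondary index > DSE Search index.
PREFERENCE = ["materialized", "secondary", "search"]


def choose_index(available_types):
    """Return the most preferred index type present on a property, or None."""
    for index_type in PREFERENCE:
        if index_type in available_types:
            return index_type
    return None


# Hypothetical property "amount" with both an MV index and a search index.
chosen_for_amount = choose_index({"search", "materialized"})

# Hypothetical property with only secondary and search indexes.
chosen_for_other = choose_index({"search", "secondary"})
```

With both a materialized view index and a search index on amount, this rule picks the materialized view index, as the preference order implies.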
A special case exists for indexing vertices created with composite keys; a search index is the only choice for indexing two or more properties, especially for graph loading with the DSE Graph Loader. Separate materialized view indexes will not be used for the property keys that make up the composite key (custom vertex id) and the DSE Graph Loader will fail to create the vertices.
In general, secondary indexes in DSE Graph are limited in usefulness, for the same reasons that restrict their general use in DSE. Materialized view indexing should be considered
If a search index is created, be aware that building the index can take time, and that until the index is available, queries that depend on the index can fail. Applications that create schema and then immediately insert data that requires search indexes will likely experience errors. Also, queries that use search indexes should be run on DSE Search-enabled nodes in the cluster.
Search indexes require resources. By default, each index allocates a minimum of 256 MB, and each index requires two physical cores. For a typical 32 GB node, 16 search indexes would be a reasonable number to create.
tokenRegex will display case insensitivity in queries, whether a search index is used or not.