Indexing
Explain indexes and how they affect DSE Graph performance.
Indexes play a significant role in making DSE Graph queries performant. Graph queries that must traverse the entire graph to find information perform poorly, which is why full-scan queries are disallowed in production environments. Two aspects of querying a graph can be improved with indexing: finding the initial vertex or vertices from which to start a traversal, and narrowing the edges and vertices to traverse from this starting point. DSE Graph implements two types of indexes, global indexes and vertex-centric indexes (VCIs), to address these different aspects of query processing. Global indexes are used to find the starting point for a query by matching a value of a vertex property. Vertex-centric indexes are used to narrow down the scope of a query after a starting point is defined.
Global indexing overview
A vertex can be retrieved directly by its vertex id:
g.V(['~label':'person', 'personId':1])
However, identifying a vertex by vertex id is rather restrictive. Using a vertex label and a property in a traversal allows DSE Graph to identify the DSE node where the vertex data resides without reading all data from all DSE nodes. Most graph queries will first use a global index to find a starting vertex with a friendlier property:
g.V().has('person', 'name', 'Julia Child')
Since the name property is not part of the vertex id, an index is required to match the search conditions with the correct vertex, and that index is a global index.
Global indexing in DSE Graph can be accomplished with one of three DSE indexing methods: a materialized view (MV), a search index, or a secondary index.
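The appropriate method depends on the cardinality (the number of distinct values) of the indexed property and on its selectivity:
selectivity = ( cardinality / number of rows ) * 100%
In general, low cardinality results in low selectivity, and high cardinality results in high selectivity.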
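As a worked illustration of the selectivity formula, the following minimal Python sketch computes selectivity for two properties. The vertex count and the numbers of distinct values are hypothetical figures chosen for the example, not measurements from any DSE Graph deployment.

```python
def selectivity(cardinality: int, rows: int) -> float:
    """Return selectivity as a percentage: (cardinality / rows) * 100."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return cardinality / rows * 100


# Hypothetical graph with 1,000,000 person vertices.
ROWS = 1_000_000

# Assumed: about 200 distinct country values (low cardinality).
country = selectivity(200, ROWS)

# Assumed: every personId is distinct (high cardinality).
person_id = selectivity(ROWS, ROWS)

print(f"country selectivity:  {country:.2f}%")
print(f"personId selectivity: {person_id:.0f}%")
```

With these assumed figures, country has very low selectivity while personId has the maximum possible selectivity, which matches the rule of thumb that selectivity rises with cardinality.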
Searching materialized views yields similar response times to searching base tables, although writing the data incurs a small time penalty. When data is written or updated in the graph, the index information is updated in the MV table along with the graph tables. Using an MV table therefore raises write latencies but lowers read latencies for graph traversals.
Search indexes are used when textual, numeric, or geospatial indexing is required, and they rely on DSE Search. Since graph data is stored in DSE database tables, one search core is available per vertex label. For each vertex label that will be indexed with search, all properties must be added to a single search index named search. Because search is implemented with DSE Search, all data types can be indexed. For the two textual indexing options, full text and string, the option is defined per property key, because each produces different indexing results. Full text indexing performs tokenization and secondary processing such as case normalization. Full text indexing is useful for queries where a partial match of text is required, and lends itself to regular expression (regex) searching. String indexing is useful for queries where an exact string is sought and no tokenization is required, similar to Solr faceting. This type of index is best for low selectivity, but lends itself to fuzzy matching for both tokenized and non-tokenized indexing.
Secondary indexing in DSE Graph follows the same rule of thumb as DSE secondary indexing. This type of index is meant for lower cardinality values, or alternatively, for low selectivity values. The number of values for indexing should number in the tens to hundreds at most; for instance, searching by country is a good candidate for secondary indexing. In addition, only equality conditions can be used to match values, and no ordering or range queries on values can be used. If more complex value matching is required, search indexes are the superior choice.
Index type | Use |
---|---|
Materialized view | Most efficient index for high cardinality, high selectivity vertex properties and equality predicates. |
Secondary index | Efficient index for low cardinality, low selectivity vertex properties and equality predicates. |
Search index | Efficient and versatile index for vertex properties with a wide range of cardinality and selectivity. A search index supports a variety of predicates, including textual, numeric, and geospatial predicates. |
Composite index keys are not currently supported in DSE Graph.
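The table above can be read as a small decision rule. The Python sketch below encodes it under stated assumptions: the function name suggest_index and the low_cardinality_limit cutoff are hypothetical, with the cutoff standing in for "tens to hundreds at most". Neither is a DSE Graph setting or API.

```python
def suggest_index(distinct_values: int, equality_only: bool,
                  low_cardinality_limit: int = 1000) -> str:
    """Suggest a global index type using the rule-of-thumb table.

    low_cardinality_limit is an assumed cutoff for "low cardinality";
    tune it to your own data rather than treating it as a product limit.
    """
    if not equality_only:
        # Range, ordering, textual, or geospatial matching needs a search index.
        return "search"
    if distinct_values < low_cardinality_limit:
        # Few distinct values matched by equality: secondary index territory.
        return "secondary"
    # Many distinct values matched by equality: materialized view.
    return "materialized view"


print(suggest_index(195, equality_only=True))        # country-like property
print(suggest_index(5_000_000, equality_only=True))  # name-like property
print(suggest_index(195, equality_only=False))       # needs range or text matching
```

This follows the table only. The best-practice guidance on preferring materialized views over secondary indexes may override the secondary branch in practice.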
Vertex-centric indexing (VCI) overview
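After a global index locates the starting vertex, a vertex-centric index narrows the edges or properties examined from that vertex. An edge index can, for example, limit the edges followed by the value of an edge property:
g.V().has('person', 'name', 'Julia Child').outE('created').has('createDate', gt(1960-01-01))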
Property indexes are created to index meta-properties. Property indexes can support both equality and inequality predicates, and are useful in cases where a range of values must be returned by a query. This example finds all the countries that Fritz Streiff lived in and orders them by the year he started living in each country:
g.V().has('person', 'name', 'Fritz Streiff').properties('country').has('startYear', order().by(decr))
Vertex-centric indexing in DSE Graph is accomplished with materialized views (MVs) for both edge and property indexes, and has the same characteristics as described above for global indexes.
Indexing best practices
The most important fact to remember is that a search index is the only choice for indexing two or more properties that define the starting point for a query. Multiple materialized view or secondary indexes cannot be combined for global indexing. For instance, the following traversal will use only one index, not both, if the indexes are materialized view or secondary indexes:
g.V().has('person', 'gender', 'F').has('person', 'country', 'France')
If a search index is defined, both properties, country and gender, are used. Once the starting point is defined, a vertex-centric index can be used to narrow the query.
More than one index can be created on the same property, such as creating both a materialized view (MV) index and a search index on the property amount. The DSE Graph query optimizer automatically uses the appropriate index when processing a query; a query cannot designate which index type to use. The order of preference that DSE Graph uses is MV index > secondary index > DSE Search index to ensure best performance. However, choosing the optimal type of index is key to good performance. For instance, it is important to understand the limitations of materialized views, and base the number of MV indexes on that understanding. Different index types may be created on different properties as appropriate, based on the selectivity. In general, secondary indexes in DSE Graph are limited in usefulness, for the same reasons that restrict their general use in DSE. Materialized view indexing should be the first choice, unless textual search is required, in which case a search index is selected.
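The stated order of preference can be modeled in a few lines. This Python sketch is a simplified illustration, not the DSE Graph optimizer: the function pick_index and the index-type labels are hypothetical, and the only rule it encodes beyond the preference order is that materialized view and secondary global indexes serve equality predicates.

```python
# Preference order described in the text: MV > secondary > search.
PREFERENCE = ("materialized", "secondary", "search")

# Global index types that only serve equality predicates.
EQUALITY_ONLY = {"materialized", "secondary"}


def pick_index(available, predicate):
    """Return the preferred index type able to serve the predicate, or None."""
    for index_type in PREFERENCE:
        if index_type not in available:
            continue
        if index_type in EQUALITY_ONLY and predicate != "eq":
            continue
        return index_type
    return None


# Both an MV index and a search index exist on the property amount.
on_amount = {"materialized", "search"}
print(pick_index(on_amount, "eq"))  # materialized
print(pick_index(on_amount, "gt"))  # search
```

In this model, keeping both index types on one property pays off only when queries mix equality lookups with predicates that the materialized view cannot serve.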
If a search index is created, be aware that building the index can take time, and that until the index is available, queries that depend on the index can fail. Applications that create schema and then immediately insert data that requires search indexes will likely experience errors. Also, queries that use search indexes should be run on DSE Search-enabled nodes in the cluster. Search indexes also require extra resources. Each index allocates a minimum of 256MB of memory by default, and each index will require two physical cores. For a typical 32GB node, 16 search indexes would be a reasonable number to create.
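Because queries can fail until a search index finishes building, an application may want to retry instead of failing on the first error. The following Python sketch shows one possible retry loop. The names run_with_retry and run_query are hypothetical, the attempt count and delay are arbitrary defaults, and a real client should catch the specific exception its driver raises, not the generic Exception used here.

```python
import time


def run_with_retry(run_query, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call run_query() until it succeeds or the attempts are exhausted.

    run_query is any zero-argument callable that submits the traversal.
    The wait grows linearly: delay_seconds, 2 * delay_seconds, and so on.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as error:  # assumption: narrow this to your driver's error type
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds * attempt)
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def fake_query():
        # Stand-in for a search-backed traversal; fails until the third call.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("search index not yet available")
        return ["Julia Child"]

    print(run_with_retry(fake_query, delay_seconds=0.01))
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting.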
The tokenRegex predicate is case-insensitive in queries, whether or not a search index is used.
Textual search indexes are by default indexed in both tokenized (TextField) and non-tokenized (StrField) forms. This means that all textual predicates (token, tokenPrefix, tokenRegex, eq, neq, regex, prefix) will be usable with all textual vertex properties indexed. Practically, search indexes should be created using the asString() method only in cases where there is absolutely no use for tokenization and text analysis, such as for inventory categories (silverware, shoes, clothing). The asText() method is used if searching tokenized text, such as long multi-sentence descriptions. The query optimizer will choose whether to use analyzed or non-analyzed indexing based on the textual predicate used.
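The difference between tokenized and non-tokenized matching can be seen in a toy model. The Python sketch below is an assumption-laden illustration: its tokenizer simply lowercases and splits on non-alphanumeric characters, which is far simpler than the text analysis DSE Search performs, and the helper names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def token_match(text, word):
    """Tokenized (full text style) match: is the word one of the tokens?"""
    return word.lower() in tokenize(text)


def exact_match(text, value):
    """Non-tokenized (string style) match: the whole value must be identical."""
    return text == value


description = "Braise the beef, then simmer in red wine."
category = "silverware"

print(token_match(description, "Beef"))     # True: case is normalized per token
print(exact_match(description, "beef"))     # False: the whole string differs
print(exact_match(category, "silverware"))  # True: exact category lookup
```

Long descriptions benefit from the tokenized form, while short fixed values such as inventory categories only ever need the exact form.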
It is possible to modify search index schema to change search characteristics. Although DSE Graph will not overwrite these out-of-band changes, it is recommended that you do not add or remove fields in this manner - only DSE Graph commands should be used. The general use of this feature is mainly to change the behavior of a search, such as adding case sensitivity to a type of search.