Creating index schema
All index schema is based on previously created properties and vertex labels, and is added to the existing schema with create().
Prerequisites
Create vertex label schema.
Procedure
indexFor
-
Determine an index for a given query using indexFor:
schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('person').materializedView('person_by_name').ifNotExists().partitionBy('name').clusterBy('person_id', Asc).create()
The partition key of the materialized view (MV) index is name, the property for which the index is built, while the clustering key is person_id, the partition key of the base table.
To name the index differently, create the index manually and change the value of the materializedView step.
-
Automatically create a recommended index for a particular query using indexFor:
schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).apply()
==>Creating the following indexes:
schema.vertexLabel('person').materializedView('person_by_name').ifNotExists().partitionBy('name').clusterBy('person_id', Asc).create()
OK
The partition key of the MV index is name, the property for which the index is built, while the clustering key is person_id, the partition key of the base table.
To name the index differently, create the index manually and change the value of the materializedView step.
-
Determine an index for a given query involving edges using indexFor:
schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('stars', 5)).analyze()
==>Traversal requires that the following indexes are created:
schema.edgeLabel('reviewed').from('person').to('recipe').materializedView('person__reviewed__recipe_by_person_person_id_stars').ifNotExists().partitionBy(OUT, 'person_id').partitionBy('stars').clusterBy(IN, 'recipe_id', Asc).create()
This analysis looks for all reviewed edges from person to recipe that have a stars rating of exactly 5.
The partition key of the MV index is the person_id property of the outgoing vertex label together with the stars property, for which the index is built, while the clustering key is recipe_id, the partition key of the incoming vertex label.
To name the index differently, create the index manually and change the value of the materializedView step.
-
Determine an index for a particular query that examines a CQL collection using indexFor:
schema.indexFor(g.V().has('recipe', 'cuisine', contains('French')).values('name')).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('recipe').secondaryIndex('recipe_2i_by_cuisine').ifNotExists().by('cuisine').indexValues().create()
This analysis examines the cuisine set in the recipe vertex label and recommends a secondary index, so that queries can narrow the results to particular set values with the contains predicate.
-
Determine an index for a particular query that uses search predicates using indexFor:
schema.indexFor(g.V().has('recipe', 'instructions', token('Saute'))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('recipe').searchIndex().ifNotExists().by('instructions').create()
This analysis recommends a search index that can look for the tokenized word Saute in the instructions property of the vertex label recipe.
Search indexes can index more than one property, but only one search index can be created for each vertex or edge label.
-
Determine an index for a particular query that uses geospatial predicates using indexFor:
schema.indexFor(g.V().hasLabel('location').has('geo_point', Geo.inside(Geo.point(-110,30),20, Geo.Unit.DEGREES)).values('name')).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('location').searchIndex().ifNotExists().by('geo_point').create()
All geospatial queries must use a search index unless the exact partition key is used to look up the geospatial item.
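The steps above mention that a recommended index can be given a different name by creating it manually. The following is a minimal sketch of that, assuming the QuickStart person vertex label whose partition key is person_id. The index name person_name_lookup is hypothetical; every other step is copied from the analyze() output for the name query.

```groovy
// Same definition that indexFor recommends for
// g.V().has('person', 'name', 'Julia CHILD'), with only the index name changed.
// 'person_name_lookup' is a hypothetical name chosen for illustration.
schema.vertexLabel('person').
  materializedView('person_name_lookup').
  ifNotExists().
  partitionBy('name').
  clusterBy('person_id', Asc).
  create()

// Re-run the analysis to check whether the query still needs an index.
schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).analyze()
```

Keeping the partitionBy and clusterBy steps identical to the recommendation means only the name differs from what apply() would have created.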
Materialized views
-
Create a vertex label materialized view index manually for a property:
schema.vertexLabel('person').
materializedView('person_by_name').
ifNotExists().
partitionBy('name').
create()
Identify the vertex label, then specify the property key to use as the partition key with partitionBy().
Name the index in the materializedView() step.
The created index schema can be examined with schema.vertexLabels().describe().
-
Create an edge label materialized view index manually for a property:
schema.edgeLabel('reviewed').
from('person').to('recipe').
materializedView('person__reviewed__recipe_by_person_person_id_year').
ifNotExists().
partitionBy(OUT, 'person_id').
clusterBy('year', Asc).
clusterBy(IN, 'recipe_id', Asc).
create()
Edge indexes can be tricky to create manually; letting indexFor analyze the query and recommend the index may be easier.
This index supports a query that filters reviewed edges on year, such as g.V().hasLabel('person').outE('reviewed').has('year', gt('2020-12-01' as LocalDate)). The indexFor section shows the corresponding analysis for the stars property.
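As a sketch of how the two materialized view indexes above are meant to be used, the following statements inspect the schema and then run traversals that filter on the indexed properties. The traversals are taken from the indexFor examples on this page. The sketch assumes the QuickStart schema and data are loaded and that year holds a date value.

```groovy
// List the vertex label schema, which includes the person_by_name index.
schema.vertexLabels().describe()

// Vertex lookup that filters on the indexed name property.
g.V().has('person', 'name', 'Julia CHILD')

// Edge lookup: the outgoing person_id is the partition key of the index and
// year is its first clustering column, so a range filter on year fits it.
g.V().hasLabel('person').
  outE('reviewed').
  has('year', gt('2020-12-01' as LocalDate))
```

If a traversal is rejected for lack of an index, running it through schema.indexFor(...).analyze() shows which index definition is expected.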
Secondary indexes
-
Create a vertex label secondary index manually for a multi-valued property:
schema.vertexLabel('person').
secondaryIndex('person_2i_by_nickname').
ifNotExists().
by('nickname').
indexValues().
create()
A secondary index differs from a materialized view index in using by() instead of partitionBy(), plus additional steps such as indexValues().
Because secondary indexes are used for collections, the by() step identifies the collection property name.
The index options for collections, such as the indexValues() option used here, are detailed in Indexing.
-
Create a secondary index manually for a map property that replaces a meta-property:
schema.vertexLabel('person').
secondaryIndex('person_2i_by_badge').
ifNotExists().
by('badge').
indexKeys().
create()
This secondary index indexes the map badge by its keys using indexKeys().
A query such as g.V().has('person', 'badge', containsKey('gold')).values('badge') returns the badge map of each person who has a gold badge, including the date on which each badge was earned:
==>{gold=2017-01-01, silver=2016-01-01}
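The two secondary indexes above serve collection predicates. The following is a brief sketch of matching traversals, assuming the QuickStart person data, in which nickname is a set and badge is a map. The first traversal comes from the comments in the QuickStart example and the second from the text above.

```groovy
// Served by person_2i_by_nickname: indexValues() on a set supports contains().
g.V().has('person', 'nickname', contains('Simone'))

// Served by person_2i_by_badge: indexKeys() on a map supports containsKey().
g.V().has('person', 'badge', containsKey('gold')).values('badge')

// To check whether any further index is needed, analyze the traversal.
schema.indexFor(g.V().has('person', 'badge', containsKey('gold')).values('badge')).analyze()
```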
Search indexes
-
Create a vertex label search index manually:
schema.vertexLabel('recipe').
searchIndex().
ifNotExists().
by('instructions').asText().
by('name').
by('cuisine').
waitForIndex(30).
create()
If no option is specified, as with name and cuisine, the property is indexed both asText() and asString().
For a search index with a smaller storage footprint, when only one type of indexing is needed, specify the appropriate type as shown for instructions.
Search indexes are created for both full text and string searches by default, but a property can be restricted to one or the other with asText() or asString(), respectively.
Textual search indexes are by default indexed in both tokenized (TextField) and non-tokenized (StrField) forms.
This means that all textual predicates (token, tokenPrefix, tokenRegex, eq, neq, regex, prefix) will be usable with all textual vertex or edge properties indexed.
In practice, use the asString() method only where there is no use for tokenization and text analysis, such as for inventory categories (silverware, shoes, clothing).
Use the asText() method when searching tokenized text, such as long multi-sentence descriptions.
The query optimizer chooses whether to use analyzed or non-analyzed indexing based on the textual predicate used.
Only one search index can be created per vertex label.
-
Create an edge label search index manually:
schema.edgeLabel('reviewed').
from('person').to('recipe').
searchIndex().
ifNotExists().
by('comment').
create()
In this search index, the property comment is indexed both as full text and as string, so all textual predicates can be used.
Non-text properties are also indexed without asText() or asString().
-
Create a geospatial search index manually:
schema.vertexLabel('location').
searchIndex().
ifNotExists().
by('geo_point').
create()
In this example, the property geo_point is a point defining a longitude and latitude.
The search index includes geo_point without a qualifying asText() or asString() method.
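To tie the index options to the predicates they serve, here is a small sketch using traversals that appear in the indexFor section and in the comments of the QuickStart example. It assumes the search indexes above have been created and the QuickStart data is loaded.

```groovy
// Tokenized search: instructions is indexed asText().
g.V().has('recipe', 'instructions', token('Saute'))

// comment is indexed as both text and string, so both kinds of predicate apply.
// Non-tokenized: matches values that start with Yummy.
g.V().hasLabel('person').outE('reviewed').has('comment', prefix('Yummy'))
// Tokenized: matches values that contain the word Yummy.
g.V().hasLabel('person').outE('reviewed').has('comment', token('Yummy'))

// Geospatial search on geo_point.
g.V().hasLabel('location').
  has('geo_point', Geo.inside(Geo.point(-110, 30), 20, Geo.Unit.DEGREES)).
  values('name')
```

If only non-tokenized matching were needed on a property, adding asString() after its by() step, in the same way asText() is used for instructions, would restrict it to that form of indexing.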
Example
The following indexes are used for the DataStax Graph QuickStart example that appears throughout the documentation:
// MATERIALIZED VIEW INDEX FOR A VERTEX LABEL
// for predicates that are not search-specific or specific to CQL collections
// schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).analyze()
// tag::MVIndexPerson[]
schema.vertexLabel('person').
materializedView('person_by_name').
ifNotExists().
partitionBy('name').
create()
// end::MVIndexPerson[]
// tag::vertexMVindex[]
schema.vertexLabel('meal').
materializedView('meal_by_type').
ifNotExists().
partitionBy('type').
waitForIndex().
create()
schema.vertexLabel('ingredient').
materializedView('ingredient_by_name').
ifNotExists().
partitionBy('name').
create()
schema.vertexLabel('location').
materializedView('location_by_name').
ifNotExists().
partitionBy('name').
clusterBy('loc_id', Asc).
create()
schema.vertexLabel('meal_item').
materializedView('meal_item_by_name').
ifNotExists().
partitionBy('name').
clusterBy('item_id', Asc).
create()
schema.vertexLabel('recipe').
materializedView('recipe_by_name').
ifNotExists().
partitionBy('name').
clusterBy('recipe_id', Asc).
create()
// end::vertexMVindex[]
// MATERIALIZED VIEW INDEX FOR AN EDGE LABEL
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('stars', 5)).analyze()
schema.edgeLabel('reviewed').
from('person').to('recipe').
materializedView('person__reviewed__recipe_by_person_person_id_stars').
ifNotExists().
partitionBy(OUT, 'person_id').
partitionBy('stars').
clusterBy(IN, 'recipe_id', Asc).
create()
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('year', gt('2020-12-01' as LocalDate))).analyze()
// tag::edgeMVindex[]
schema.edgeLabel('reviewed').
from('person').to('recipe').
materializedView('person__reviewed__recipe_by_person_person_id_year').
ifNotExists().
partitionBy(OUT, 'person_id').
clusterBy('year', Asc).
clusterBy(IN, 'recipe_id', Asc).
create()
// end::edgeMVindex[]
// SECONDARY INDEX
// for specific predicates used with collections (set/list/map)
// (contains(x), containsKey(x), containsValue(x), entryEq(x, y))
// schema.indexFor(g.V().has('recipe', 'cuisine', contains('French')).values('name')).analyze()
schema.vertexLabel('recipe').
secondaryIndex('recipe_2i_by_cuisine').
ifNotExists().
by('cuisine').
indexValues().
create()
// This is a replacement for a search index in 6.7 and earlier using a multi-property
// Since the multi-prop is now a set, a 2i index is used
// schema.indexFor(g.V().has('person', 'nickname', contains('Simone'))).analyze()
// tag::secIndexPerson[]
schema.vertexLabel('person').
secondaryIndex('person_2i_by_nickname').
ifNotExists().
by('nickname').
indexValues().
create()
// end::secIndexPerson[]
// This is a replacement for a property index in 6.7 and earlier using a meta-property
// This one works: g.V().has('person', 'country.field1', 'France').values()
// This one doesn't work: g.V().has('person', 'country.field2', '1960-01-01' as LocalDate)
// schema.indexFor(g.V().has('person', 'country.field2', '1960-01-01' as LocalDate)).analyze()
// tag::secIndex[]
schema.vertexLabel('person').
secondaryIndex('person_2i_by_country').
ifNotExists().
by('country').
indexValues().
create()
// end::secIndex[]
// schema.indexFor(g.V().has('person', 'badge', containsKey('gold')).values('badge')).analyze()
// tag::metaSecIndex[]
schema.vertexLabel('person').
secondaryIndex('person_2i_by_badge').
ifNotExists().
by('badge').
indexKeys().
create()
// end::metaSecIndex[]
// tag::edgeSecIndex[]
schema.edgeLabel('is_stocked_with').
from('store').to('ingredient').
secondaryIndex('store_is_stocked_with_ingredient_by_store_store_id_expire_date').
ifNotExists().
partitionBy(OUT, 'store_id').
clusterBy('expire_date', Asc).
clusterBy(IN, 'ingred_id', Asc).
create()
// end::edgeSecIndex[]
// SEARCH INDEX
// asString(): non-tokenized: regex, prefix, eq, neq, fuzzy, phrase
// asText(): tokenized: tokenRegex, tokenPrefix, token, tokenFuzzy
// if not specified, both asString() and asText() are created
// Geospatial: Geo.inside, Geo.insideCartesian, neq, without
// schema.indexFor(g.V().has('recipe', 'instructions', token('Saute'))).analyze()
// tag::searchIndexRecipe[]
schema.vertexLabel('recipe').
searchIndex().
ifNotExists().
by('instructions').asText().
by('name').
by('cuisine').
waitForIndex(30).
create()
// end::searchIndexRecipe[]
// tag::searchIndex[]
// schema.indexFor(g.V().has('book', 'publish_year', neq(1960))).analyze()
// schema.indexFor(g.V().has('book', 'publish_year', eq(1961))).analyze()
schema.vertexLabel('book').
searchIndex().
ifNotExists().
by('name').
by('publish_year').
create()
schema.vertexLabel('store').
searchIndex().
ifNotExists().
by('name').
create()
schema.vertexLabel('home').
searchIndex().
ifNotExists().
by('name').
create()
schema.vertexLabel('fridge_sensor').
searchIndex().
ifNotExists().
by('city_id').
by('sensor_id').
by('name').
create()
// end::searchIndex[]
// Will return only one record, the one that STARTS with Yummy
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('comment', prefix('Yummy'))).analyze()
// Will return two records, each which has the word Yummy somewhere in the comments
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('comment', token('Yummy'))).analyze()
// tag::edgeSearchIndex[]
schema.edgeLabel('reviewed').
from('person').to('recipe').
searchIndex().
ifNotExists().
by('comment').
create()
// end::edgeSearchIndex[]
// schema.indexFor(g.V().hasLabel('location').has('geo_point', Geo.inside(Geo.point(-110,30),20, Geo.Unit.DEGREES)).values('name')).analyze()
// tag::geoSearchIndex[]
schema.vertexLabel('location').
searchIndex().
ifNotExists().
by('geo_point').
create()
// end::geoSearchIndex[]