Creating index schema

All index schema is based on previously created vertex labels, edge labels, and their properties, and each index is added to the existing schema with create().

Procedure

indexFor

  • Determine an index for a given query using indexFor:

schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('person').materializedView('person_by_name').ifNotExists().partitionBy('name').clusterBy('person_id', Asc).create()

The partition key of the materialized view (MV) index is the property for which the index is built, name, and the clustering key is the base table’s partition key, person_id. To give the index a different name, create it manually and change the argument of the materializedView() step.

  • Automatically create a recommended index for a particular query using indexFor:

schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).apply()
==>Creating the following indexes:
schema.vertexLabel('person').materializedView('person_by_name').ifNotExists().partitionBy('name').clusterBy('person_id', Asc).create()
OK

apply() creates the same index that analyze() recommends: name is the partition key and person_id is the clustering key.

  • Determine an index for the given query involving edges using indexFor:

    schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('stars', 5)).analyze()
    ==>Traversal requires that the following indexes are created:
    schema.edgeLabel('reviewed').
      from('person').to('recipe').
      materializedView('person__reviewed__recipe_by_person_person_id_stars').
      ifNotExists().
      partitionBy(OUT, 'person_id').
      partitionBy('stars').
      clusterBy(IN, 'recipe_id', Asc).
      create()

    This analysis looks for all reviewed edges from person to recipe that have a stars rating of exactly 5. The MV index has a composite partition key made up of the outgoing vertex label’s partition key, person_id, and the indexed property, stars. Its clustering key is the incoming vertex label’s partition key, recipe_id. To give the index a different name, create it manually and change the argument of the materializedView() step.

  • Determine an index for a particular query that examines a CQL collection using indexFor:

    schema.indexFor(g.V().has('recipe', 'cuisine', contains('French')).values('name')).analyze()
    ==>Traversal requires that the following indexes are created:
    schema.vertexLabel('recipe').secondaryIndex('recipe_2i_by_cuisine').ifNotExists().by('cuisine').indexValues().create()

    Because cuisine is a set (a CQL collection) on the recipe vertex label, indexFor recommends a secondary index on the set values. This index lets queries narrow the results to particular set values with the contains() predicate.

  • Determine an index for a particular query that uses search predicates using indexFor:

    schema.indexFor(g.V().has('recipe', 'instructions', token('Saute'))).analyze()
    ==>Traversal requires that the following indexes are created:
    schema.vertexLabel('recipe').searchIndex().ifNotExists().by('instructions').create()

    This analysis recommends a search index that can find the tokenized word Saute in the instructions property of the recipe vertex label. A search index can include more than one property, but only one search index can be created for each vertex or edge label.

  • Determine an index for a particular query that uses geospatial predicates using indexFor:

    schema.indexFor(g.V().hasLabel('location').has('geo_point', Geo.inside(Geo.point(-110,30),20, Geo.Unit.DEGREES)).values('name')).analyze()
    ==>Traversal requires that the following indexes are created:
    schema.vertexLabel('location').searchIndex().ifNotExists().by('geo_point').create()

    A geospatial query must use a search index unless it searches for the item by its exact partition key.
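The recommendations above follow a pattern that the comments in the Example section also spell out:

* collection predicates lead to a secondary index;
* search-specific and geospatial predicates lead to a search index;
* plain equality on a regular property leads to a materialized view.

The Java sketch below encodes that rule of thumb as a lookup table. It is a hypothetical reading aid, not a description of how indexFor() works internally, and it does not call any DSE API. The class name, the enum, and the use of plain strings for predicate names are assumptions made for illustration. The real indexFor() analyzes the whole traversal and may recommend something different.

```java
import java.util.Set;

// Hypothetical rule-of-thumb model of which index kind a has() predicate needs,
// based on the examples on this page. NOT the indexFor() algorithm.
public class IndexKindSketch {

    enum IndexKind { MATERIALIZED_VIEW, SECONDARY_INDEX, SEARCH_INDEX }

    // Predicates on CQL collections (set/list/map) -> secondary index
    static final Set<String> COLLECTION_PREDICATES =
        Set.of("contains", "containsKey", "containsValue", "entryEq");

    // Textual predicates other than eq, plus neq and geospatial -> search index
    static final Set<String> SEARCH_PREDICATES = Set.of(
        "token", "tokenPrefix", "tokenRegex", "regex", "prefix", "neq",
        "Geo.inside", "Geo.insideCartesian");

    static IndexKind recommend(String predicate) {
        if (COLLECTION_PREDICATES.contains(predicate)) {
            return IndexKind.SECONDARY_INDEX;
        }
        if (SEARCH_PREDICATES.contains(predicate)) {
            return IndexKind.SEARCH_INDEX;
        }
        // Anything else (e.g. plain eq on a regular property) -> materialized view
        return IndexKind.MATERIALIZED_VIEW;
    }

    public static void main(String[] args) {
        System.out.println(recommend("eq"));         // MATERIALIZED_VIEW
        System.out.println(recommend("contains"));   // SECONDARY_INDEX
        System.out.println(recommend("token"));      // SEARCH_INDEX
        System.out.println(recommend("Geo.inside")); // SEARCH_INDEX
    }
}
```

In the Example section, eq(1961) on book is listed next to a search index. This suggests that equality can also be served by a search index when one exists, so treat the default branch as a simplification.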

Materialized views

  • Create a vertex label materialized view index manually for a property:

schema.vertexLabel('person').
  materializedView('person_by_name').
  ifNotExists().
  partitionBy('name').
  create()

Identify the vertex label, and set the index's partition key to the property being indexed with partitionBy(). Name the index in the materializedView() step. To examine the created index schema, use schema.vertexLabels().describe().

  • Create an edge label materialized view index manually for a property:

schema.edgeLabel('reviewed').
  from('person').to('recipe').
  materializedView('person__reviewed__recipe_by_person_person_id_year').
  ifNotExists().
  partitionBy(OUT, 'person_id').
  clusterBy('year', Asc).
  clusterBy(IN, 'recipe_id', Asc).
  create()

Edge indexes can be tricky to create manually, so analyzing the query with indexFor() is often easier. This index is the one recommended for the query schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('year', gt('2020-12-01' as LocalDate))).analyze(), which is also listed in the Example section.
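To see why the partition and clustering keys matter, it can help to picture an MV index as rows grouped by partition key and kept sorted by clustering key within each partition. With that layout, an equality lookup on the partition key reads one partition. A range predicate on a clustering key, such as gt() on year, reads an ordered slice of that partition.

The Java sketch below is a conceptual, in-memory model of that layout only. It is not how DSE stores a materialized view. The class name and methods are hypothetical, integer ids stand in for the real partition keys, and the (year, recipe_id) clustering is simplified to year with recipe_id stored as the row value.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model of an MV index: partition key -> rows sorted by clustering key
// (ascending, like clusterBy(..., Asc)). Illustration only, not DSE code.
public class MvSketch<P, C extends Comparable<? super C>, V> {

    private final Map<P, TreeMap<C, List<V>>> partitions = new HashMap<>();

    void insert(P partitionKey, C clusteringKey, V row) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>())
                  .computeIfAbsent(clusteringKey, k -> new ArrayList<>())
                  .add(row);
    }

    // Equality on the partition key: read one partition, in clustering order.
    List<V> lookup(P partitionKey) {
        List<V> out = new ArrayList<>();
        TreeMap<C, List<V>> partition = partitions.get(partitionKey);
        if (partition != null) {
            partition.values().forEach(out::addAll);
        }
        return out;
    }

    // Range on the clustering key within one partition: an ordered slice (like gt()).
    List<V> greaterThan(P partitionKey, C lowerBound) {
        List<V> out = new ArrayList<>();
        TreeMap<C, List<V>> partition = partitions.get(partitionKey);
        if (partition != null) {
            partition.tailMap(lowerBound, false).values().forEach(out::addAll);
        }
        return out;
    }

    public static void main(String[] args) {
        // Like person_by_name: partition = name, clustered by a (made-up) person id.
        MvSketch<String, Integer, Integer> byName = new MvSketch<>();
        byName.insert("Julia CHILD", 1, 1);
        byName.insert("James BEARD", 2, 2);
        System.out.println(byName.lookup("Julia CHILD")); // [1]

        // Like the year index: partition = person id, clustering = year,
        // row = recipe id (made-up sample data).
        MvSketch<Integer, LocalDate, Integer> byYear = new MvSketch<>();
        byYear.insert(1, LocalDate.parse("2021-03-10"), 2003);
        byYear.insert(1, LocalDate.parse("2020-11-20"), 2001);
        byYear.insert(1, LocalDate.parse("2021-01-05"), 2002);
        System.out.println(byYear.greaterThan(1, LocalDate.parse("2020-12-01"))); // [2002, 2003]
    }
}
```

A composite partition key, such as the (person_id, stars) key of the stars index in the indexFor() section, can be modeled by using a List such as List.of(personId, stars) as the partition key.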

Secondary indexes

  • Create a vertex label secondary index manually for a multi-valued property:

schema.vertexLabel('person').
  secondaryIndex('person_2i_by_nickname').
  ifNotExists().
  by('nickname').
  indexValues().
  create()

A secondary index differs from a materialized view index in two ways: it uses by() instead of partitionBy(), and it takes additional steps such as indexValues(). Because secondary indexes are used for collections, the by() step identifies the collection property. The collection index options, such as the indexValues() used here, are detailed in Indexing.

  • Create a secondary index manually for a map property that replaces a meta-property:

schema.vertexLabel('person').
  secondaryIndex('person_2i_by_badge').
  ifNotExists().
  by('badge').
  indexKeys().
  create()

This secondary index indexes the keys of the map badge, as specified by indexKeys(). For example, the query g.V().has('person', 'badge', containsKey('gold')).values('badge') returns the badge map of each person who has a gold badge, including the date on which each badge was earned:

==>{gold=2017-01-01, silver=2016-01-01}
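Conceptually, a secondary index on a collection behaves like an inverted map from each indexed element to the vertices that hold it. This section uses two options:

* indexValues() indexes the elements of a set (or the values of a map), as for cuisine and nickname.
* indexKeys() indexes the keys of a map, as for badge.

The Java sketch below models both options. It is an illustration only, not the DSE or Cassandra implementation. The class and method names are hypothetical, and the vertex ids and collection contents are made-up sample data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative inverted-map model of a secondary index on a collection property.
// Not DSE code: it only shows what indexValues() vs indexKeys() make searchable.
public class CollectionIndexSketch {

    // indexed element -> ids of vertices whose collection holds that element
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Like by('cuisine').indexValues() on a set property
    void indexValues(int vertexId, Set<String> values) {
        for (String v : values) {
            postings.computeIfAbsent(v, k -> new TreeSet<>()).add(vertexId);
        }
    }

    // Like by('badge').indexKeys() on a map property: only the keys are indexed
    void indexKeys(int vertexId, Map<String, String> map) {
        for (String key : map.keySet()) {
            postings.computeIfAbsent(key, k -> new TreeSet<>()).add(vertexId);
        }
    }

    // contains(x) for a values index, containsKey(x) for a keys index
    Set<Integer> find(String element) {
        return postings.getOrDefault(element, Set.of());
    }

    public static void main(String[] args) {
        CollectionIndexSketch cuisineIdx = new CollectionIndexSketch();
        cuisineIdx.indexValues(2001, Set.of("French", "American"));
        cuisineIdx.indexValues(2002, Set.of("Italian"));
        System.out.println(cuisineIdx.find("French")); // [2001]

        CollectionIndexSketch badgeIdx = new CollectionIndexSketch();
        badgeIdx.indexKeys(1, Map.of("gold", "2017-01-01", "silver", "2016-01-01"));
        badgeIdx.indexKeys(2, Map.of("silver", "2018-06-01"));
        System.out.println(badgeIdx.find("gold"));   // [1]
        System.out.println(badgeIdx.find("silver")); // [1, 2]
    }
}
```

With indexKeys(), the dates stored as map values are not indexed, so a lookup by a date finds nothing in this model.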

Search indexes

  • Create a vertex label search index manually:

schema.vertexLabel('recipe').
  searchIndex().
  ifNotExists().
  by('instructions').asText().
  by('name').
  by('cuisine').
  waitForIndex(30).
  create()

If no option is specified, as for name and cuisine, the property is indexed both asText() and asString(). For a search index with a smaller storage footprint, when only one type of indexing is needed, specify the type explicitly as shown for instructions.

By default, search indexes support both full-text and string searches; asText() and asString() restrict a property to one of these, respectively. Textual properties are indexed by default in both tokenized (TextField) and non-tokenized (StrField) forms, so all textual predicates (token, tokenPrefix, tokenRegex, eq, neq, regex, prefix) can be used with every textual vertex or edge property in the index.

In practice, use asString() only when there is no use for tokenization and text analysis, such as for inventory categories (silverware, shoes, clothing). Use asText() for tokenized searches of text such as long, multi-sentence descriptions. The query optimizer chooses between analyzed and non-analyzed indexing based on the textual predicate used.

Only one search index can be created per vertex label.
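The difference between asText() and asString() shows up in which predicates can match. With a tokenized (asText) index, token() matches a word anywhere in the value. With a non-tokenized (asString) index, the whole value is a single term, so prefix() only tests how the value starts.

The Java sketch below illustrates this with two made-up review comments. It assumes a deliberately crude analyzer that lowercases the text and splits on non-alphanumeric characters, and it treats string terms as case-sensitive. The real DSE Search (Solr) analyzer chain is configurable and richer, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Rough model of tokenized (asText) vs non-tokenized (asString) matching.
// ASSUMPTION: tokenizing = lowercase + split on non-letters/digits; not DSE's analyzer.
public class TextVsStringSketch {

    // asText() model: break the value into lowercase tokens.
    static List<String> tokenize(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                     .filter(t -> !t.isEmpty())
                     .collect(Collectors.toList());
    }

    // token(x): x equals some token of the value.
    static boolean token(String value, String word) {
        return tokenize(value).contains(word.toLowerCase(Locale.ROOT));
    }

    // tokenPrefix(x): some token starts with x.
    static boolean tokenPrefix(String value, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        return tokenize(value).stream().anyMatch(t -> t.startsWith(p));
    }

    // asString() model: the whole value is one case-sensitive term.
    static boolean prefix(String value, String p) {
        return value.startsWith(p);
    }

    static boolean eq(String value, String other) {
        return value.equals(other);
    }

    public static void main(String[] args) {
        String first = "Yummy! Will make again.";
        String second = "Pretty yummy soup, needs salt.";
        System.out.println(prefix(first, "Yummy") + " " + prefix(second, "Yummy")); // true false
        System.out.println(token(first, "Yummy") + " " + token(second, "Yummy"));   // true true
    }
}
```

This mirrors the comments next to the edge search index in the Example section: prefix('Yummy') finds only the comment that starts with Yummy, while token('Yummy') finds every comment that contains the word.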

  • Create an edge label search index manually:

schema.edgeLabel('reviewed').
  from('person').to('recipe').
  searchIndex().
  ifNotExists().
  by('comment').
  create()

Because no option is specified, the comment property is indexed as both full text and string, so all textual predicates can be used with it. Non-text properties are also added to a search index without asText() or asString().

  • Create a geospatial search index manually:

schema.vertexLabel('location').
  searchIndex().
  ifNotExists().
  by('geo_point').
  create()

In this example, the property geo_point is a point defining a longitude and latitude. The search index includes geo_point without a qualifying asText() or asString() method.
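The indexFor() section uses Geo.inside(Geo.point(-110,30), 20, Geo.Unit.DEGREES), and the Example section lists Geo.insideCartesian as a separate predicate. This suggests that Geo.inside measures distance on the globe. The Java sketch below shows the kind of test such a query performs. It assumes that Geo.point takes (longitude, latitude), that a radius in DEGREES is a great-circle (angular) distance, and that points on the boundary count as inside. Those semantics are assumptions, not confirmed DSE behavior. The class and method names are hypothetical, and the distance uses the standard haversine formula.

```java
// Sketch of a "point inside a circle of radius r degrees" test.
// ASSUMPTIONS: points are (longitude, latitude); the radius is a great-circle
// angular distance in degrees; the boundary is inclusive. Not DSE code.
public class GeoInsideSketch {

    // Central angle between two (lon, lat) points, in degrees (haversine formula).
    static double angularDistanceDegrees(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dPhi / 2), 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        // Clamp to 1.0 to guard against rounding just above 1 for antipodal points.
        return Math.toDegrees(2 * Math.asin(Math.min(1.0, Math.sqrt(h))));
    }

    static boolean inside(double centerLon, double centerLat, double radiusDeg,
                          double lon, double lat) {
        return angularDistanceDegrees(centerLon, centerLat, lon, lat) <= radiusDeg;
    }

    public static void main(String[] args) {
        // 10 degrees due north of (-110, 30): inside a 20-degree circle.
        System.out.println(inside(-110, 30, 20, -110, 40)); // true
        // 25 degrees due north: outside.
        System.out.println(inside(-110, 30, 20, -110, 55)); // false
    }
}
```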

Example

The following indexes are used for the DataStax Graph QuickStart example that appears throughout the documentation:

// MATERIALIZED VIEW INDEX FOR A VERTEX LABEL
// for predicates that are not search-specific or specific to CQL collections

// schema.indexFor(g.V().has('person', 'name', 'Julia CHILD')).analyze()
// tag::MVIndexPerson[]
schema.vertexLabel('person').
  materializedView('person_by_name').
  ifNotExists().
  partitionBy('name').
  create()
// end::MVIndexPerson[]

// tag::vertexMVindex[]
schema.vertexLabel('meal').
  materializedView('meal_by_type').
  ifNotExists().
  partitionBy('type').
  waitForIndex().
  create()

schema.vertexLabel('ingredient').
  materializedView('ingredient_by_name').
  ifNotExists().
  partitionBy('name').
  create()

schema.vertexLabel('location').
  materializedView('location_by_name').
  ifNotExists().
  partitionBy('name').
  clusterBy('loc_id', Asc).
  create()

schema.vertexLabel('meal_item').
  materializedView('meal_item_by_name').
  ifNotExists().
  partitionBy('name').
  clusterBy('item_id', Asc).
  create()

schema.vertexLabel('recipe').
  materializedView('recipe_by_name').
  ifNotExists().
  partitionBy('name').
  clusterBy('recipe_id', Asc).
  create()
// end::vertexMVindex[]

// MATERIALIZED VIEW INDEX FOR AN EDGE LABEL

// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('stars', 5)).analyze()
schema.edgeLabel('reviewed').
  from('person').to('recipe').
  materializedView('person__reviewed__recipe_by_person_person_id_stars').
  ifNotExists().
  partitionBy(OUT, 'person_id').
  partitionBy('stars').
  clusterBy(IN, 'recipe_id', Asc).
  create()

// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('year', gt('2020-12-01' as LocalDate))).analyze()
// tag::edgeMVindex[]
schema.edgeLabel('reviewed').
  from('person').to('recipe').
  materializedView('person__reviewed__recipe_by_person_person_id_year').
  ifNotExists().
  partitionBy(OUT, 'person_id').
  clusterBy('year', Asc).
  clusterBy(IN, 'recipe_id', Asc).
  create()
// end::edgeMVindex[]

// SECONDARY INDEX
// for specific predicates used with collections (set/list/map)
// contains(x), containsKey(x), containsValue(x), entryEq(x, y)

//schema.indexFor(g.V().has('recipe', 'cuisine', contains('French')).values('name')).analyze()
schema.vertexLabel('recipe').
  secondaryIndex('recipe_2i_by_cuisine').
  ifNotExists().
  by('cuisine').
  indexValues().
  create()

// This is a replacement for a search index in 6.7 and earlier using a multi-property
// Since the multi-prop is now a set, a 2i index is used
// schema.indexFor(g.V().has('person', 'nickname', contains('Simone'))).analyze()
// tag::secIndexPerson[]
schema.vertexLabel('person').
  secondaryIndex('person_2i_by_nickname').
  ifNotExists().
  by('nickname').
  indexValues().
  create()
// end::secIndexPerson[]

// This is a replacement for a property index in 6.7 and earlier using a meta-property
// This one works: g.V().has('person', 'country.field1', 'France').values()
// This one doesn't work: g.V().has('person', 'country.field2', '1960-01-01' as LocalDate)

// schema.indexFor(g.V().has('person', 'country.field2', '1960-01-01' as LocalDate)).analyze()
// tag::secIndex[]
schema.vertexLabel('person').
  secondaryIndex('person_2i_by_country').
  ifNotExists().
  by('country').
  indexValues().
  create()
// end::secIndex[]

// schema.indexFor(g.V().has('person', 'badge', containsKey('gold')).values('badge')).analyze()
// tag::metaSecIndex[]
schema.vertexLabel('person').
  secondaryIndex('person_2i_by_badge').
  ifNotExists().
  by('badge').
  indexKeys().
  create()
// end::metaSecIndex[]

// tag::edgeSecIndex[]
schema.edgeLabel('is_stocked_with').
  from('store').to('ingredient').
  secondaryIndex('store_is_stocked_with_ingredient_by_store_store_id_expire_date').
  ifNotExists().
  partitionBy(OUT, 'store_id').
  clusterBy('expire_date', Asc).
  clusterBy(IN, 'ingred_id', Asc).
  create()
// end::edgeSecIndex[]

// SEARCH INDEX
// asString(): non-tokenized: regex, prefix, eq, neq, fuzzy, phrase
// asText(): tokenized: tokenRegex, tokenPrefix, token, tokenFuzzy
// if not specified, both asString() and asText() are created
// Geospatial: Geo.inside, Geo.insideCartesian, neq, without

// schema.indexFor(g.V().has('recipe', 'instructions', token('Saute'))).analyze()
// tag::searchIndexRecipe[]
schema.vertexLabel('recipe').
  searchIndex().
  ifNotExists().
  by('instructions').asText().
  by('name').
  by('cuisine').
  waitForIndex(30).
  create()
// end::searchIndexRecipe[]

// tag::searchIndex[]
// schema.indexFor(g.V().has('book', 'publish_year', neq(1960))).analyze()
// schema.indexFor(g.V().has('book', 'publish_year', eq(1961))).analyze()
schema.vertexLabel('book').
  searchIndex().
  ifNotExists().
  by('name').
  by('publish_year').
  create()

schema.vertexLabel('store').
  searchIndex().
  ifNotExists().
  by('name').
  create()

schema.vertexLabel('home').
  searchIndex().
  ifNotExists().
  by('name').
  create()

schema.vertexLabel('fridge_sensor').
  searchIndex().
  ifNotExists().
  by('city_id').
  by('sensor_id').
  by('name').
  create()
// end::searchIndex[]

// Will return only one record, the one that STARTS with Yummy
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('comment', prefix('Yummy'))).analyze()
// Will return two records, each of which has the word Yummy somewhere in the comment
// schema.indexFor(g.V().hasLabel('person').outE('reviewed').has('comment', token('Yummy'))).analyze()
// tag::edgeSearchIndex[]
schema.edgeLabel('reviewed').
  from('person').to('recipe').
  searchIndex().
  ifNotExists().
  by('comment').
  create()
// end::edgeSearchIndex[]

// schema.indexFor(g.V().hasLabel('location').has('geo_point', Geo.inside(Geo.point(-110,30),20, Geo.Unit.DEGREES)).values('name')).analyze()
// tag::geoSearchIndex[]
schema.vertexLabel('location').
  searchIndex().
  ifNotExists().
  by('geo_point').
  create()
// end::geoSearchIndex[]

© 2024 DataStax