SAI indexing with collections

SAI supports collections of type map, list, and set. Collections allow you to group and store data together in a column.

In a relational database, a grouping such as a user’s multiple email addresses is achieved via many-to-one joined relationship between (for example) a user table and an email table. DSE avoids joins between two tables by storing the user’s email addresses in a collection column in the user table. Each collection specifies the data type of the data held.

A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns.

In CQL queries of database tables with SAI indexes, the CONTAINS clauses are supported with, and specific to:

  • SAI collection maps with keys, values, and entries

  • SAI collections with list and set types

Using the set type

This example uses the following table and index:

CREATE CUSTOM INDEX IF NOT EXISTS teams_idx ON cycling.cyclist_career_teams (teams) USING 'StorageAttachedIndex';

An index is created on the set column teams in the cyclist_career_teams table.

Query with CONTAINS from the set teams column:

SELECT lastname, teams FROM cycling.cyclist_career_teams 
  WHERE teams CONTAINS 'Rabobank-Liv Giant';
Results
 lastname | teams
----------+----------------------------------------------------------------------------------------------------
    BRAND | {'AA Drink - Leontien.nl', 'Leontien.nl', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}

(1 rows)

Using the list type

This example uses the following table and index:

CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
  year int,
  month int,
  events list<text>,
  PRIMARY KEY (year, month)
);
CREATE CUSTOM INDEX events_idx ON cycling.upcoming_calendar (events) 
  USING 'StorageAttachedIndex';

An index is created on the list column events in the upcoming_calendar table.

Query with CONTAINS from the list events column:

SELECT * FROM cycling.upcoming_calendar 
  WHERE events CONTAINS 'Criterium du Dauphine';
Results
 year | month | events
------+-------+----------------------------------------------------------------------------
 2015 |     6 | ['Tour de France', 'Criterium du Dauphine', 'Vuelta Ciclista a Venezuela']

(1 rows)

Using the map type

This example uses the following table and indexes:

CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
  id uuid PRIMARY KEY,
  firstname text,
  lastname text,
  teams map<int, text>
);
CREATE CUSTOM INDEX IF NOT EXISTS team_keys_idx
  ON cycling.cyclist_teams ( KEYS (teams) ) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS team_values_idx
  ON cycling.cyclist_teams ( VALUES (teams) ) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS team_entries_idx
  ON cycling.cyclist_teams ( ENTRIES (teams) ) USING 'StorageAttachedIndex';

Indexes created on the map column teams in the cyclist_career_teams table target the keys, values, and full entries of the column data.

Query with KEYS from the map teams column:

SELECT firstname,lastname,teams FROM cycling.cyclist_teams 
  WHERE teams CONTAINS KEY 2015;
Results
 firstname | lastname   | teams
-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
  Marianne |        VOS |                                                                                                                                   {2015: 'Rabobank-Liv Woman Cycling Team'}

(2 rows)

Query a value from the map teams column, noting that only the keyword CONTAINS is included:

SELECT firstname,lastname,teams FROM cycling.cyclist_teams 
  WHERE teams CONTAINS 'Team Garmin - Cervelo';
Results
 firstname | lastname   | teams
-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Jamie |    BENNETT |                                                                     {2011: 'Team Garmin - Cervelo', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team'}
 Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

(2 rows)

Query entries from the map teams column, noting the difference in the WHERE clause:

SELECT firstname,lastname,teams FROM cycling.cyclist_teams 
  WHERE teams CONTAINS 'Team Garmin - Cervelo';
Results
 firstname | lastname   | teams
-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Jamie |    BENNETT |                                                                     {2011: 'Team Garmin - Cervelo', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team'}
 Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

(2 rows)

Query for two entries from the map teams column:

SELECT firstname,lastname,teams FROM cycling.cyclist_teams
  WHERE teams[2014] = 'Boels:Dolmans Cycling Team'
    AND teams[2015] = 'Boels:Dolmans Cycling Team';
Results
 firstname | lastname   | teams
-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

(1 rows)

Query for either one of two entries from the map teams column:

SELECT firstname,lastname,teams FROM cycling.cyclist_teams
WHERE teams[2014] = 'Boels:Dolmans Cycling Team'
  OR teams[2015] = 'Boels:Dolmans Cycling Team';
Results
 firstname | lastname   | teams
-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Jamie |    BENNETT |                                                                     {2011: 'Team Garmin - Cervelo', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team'}
 Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

(2 rows)

For more information, see:

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com