Examine SAI column index and query rules

Understand the columns on which you can define SAI indexes and run queries.

Let's walk through another example to understand the columns on which you can define SAI indexes and subsequently run queries that reference those columns.

While this guide's prior topic, SAI quick start, focused on defining multiple indexes based on non-primary key columns (a very useful feature), this topic expands the discussion to SAI indexes on each column in a composite partition key. Then this topic adds an example using collection maps.

Tip: DataStax Astra cloud users: read this topic to understand the column and query rules. If you specified a different keyspace name for your Astra database, use it instead of the keyspace name shown in examples here.

Also refer to the Better Botz sample app and its SAI indexing examples in the Astra documentation.

About composite partition keys

SAI makes it possible to define an index using a single column in the table's composite partition key. A composite partition key means that the partition is made up of multiple columns. In this case with an SAI index, you would specify just one of the columns that comprises the composite partition key, as shown in this topic. You could also define a separate SAI index that specifies another single column from the composite partition key.

Without an SAI index defined, in your query, the WHERE clause would need to specify all the partition key columns or use ALLOW FILTERING. This type of query can result in unpredictable performance. SAI solves this problem – see the examples below.

Create the keyspace and table, and insert simple data

First, create the demo2 keyspace:
CREATE KEYSPACE demo2 WITH replication = 
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
Define a database table where the composite partition key is made from id and age. We also define a name column cluster key:
CREATE TABLE demo2.person_id_name_compositkey2 
       (id int, age int, name text, primary key ((id, age), name));
Add some data:
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (1, 'Rocco', 21); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (2, 'John', 50); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (3, 'Bryn', 33); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (4, 'Jason', 34); 
Important: Notice that at this point, there is no SAI index defined.

Queries before SAI indexes are defined

Without an SAI index, a user cannot query on the partition key columns individually. Example:
SELECT * from demo2.person_id_name_compositkey2 where id > 0;
InvalidRequest: Error from server: code=2200 
[Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. 
If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Define SAI indexes

Instead of using ALLOW FILTERING in your queries, now define SAI indexes. Examples:
CREATE CUSTOM INDEX ON demo2.person_id_name_compositkey2 (id) 
       USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON demo2.person_id_name_compositkey2 (age) 
       USING 'StorageAttachedIndex';

Run the example query again

With the SAI indexes defined, we can run the prior query example again, this time with desired results.
SELECT * from demo2.person_id_name_compositkey2 where id > 0;
 id | age | name
----+-----+-------
  3 |  33 |  Bryn
  1 |  21 |  Rocco
  2 |  50 |  John
  4 |  34 |  Jason
  
(4 rows)
(Try another query that specifies the composite partition's age column in the query.)

What about SAI and a partition key with only one column?

When there is only one column defining the partition key, there is no need for an index. SAI returns an error message. Example:
CREATE CUSTOM INDEX ON demo2.person_id_name_primarykey (id) 
       USING 'StorageAttachedIndex';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot create secondary index on the only partition key column id"

Using collections with SAI

SAI supports collections of type map, list, and set. Using these features with SAI indexes was introduced in DSE 6.8.3. Collections allow you to group and store data together in a column.
Tip: Starting in DSE 6.8.5, you have the option of creating multiple SAI collection indexes with different map types on the same column.

In a relational database, a grouping such as a user's multiple email addresses is achieved via many-to-one joined relationship between (for example) a user table and an email table. DSE avoids joins between two tables by storing the user's email addresses in a collection column in the user table. Each collection specifies the data type of the data held.

A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns.

See the SAI collection examples in the next two sections: Also refer to CREATE CUSTOM INDEX.
Tip: In CQL queries of database tables with SAI indexes, the CONTAINS clauses are supported with, and specific to:
  • SAI collection maps with keys, values, and entries
  • SAI collections with list and set types
For examples, see the sections below.

SAI collection map examples with keys, values, and entries

The following examples demonstrate using collection maps of multiple types (keys, values, entries) in SAI indexes. For related information, see Creating collections and Using map type.

First, create the keyspace.
Note: If you are logged into the DataStax Astra CQL Console, you already have a keyspace defined for your cloud database; skip this next step and specify your keyspace name in the USE command.
CREATE KEYSPACE demo3 WITH replication = 
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE demo3;
Create an audit table, with a collection map named text_map.
CREATE TABLE audit ( id int PRIMARY KEY , text_map map<text, text>);
Create multiple SAI collection maps on the same column, each using keys, values, and entries.
Note: Creating multiple SAI collection indexes with different map types on the same column requires DSE 6.8.4 or later. If you're using DSE 6.8.3, Submit a DROP INDEX index-name; command before adding the next map type on the same column.
CREATE CUSTOM INDEX ON audit (keys(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (values(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (entries(text_map)) USING 'StorageAttachedIndex';
Insert some data:
INSERT INTO audit (id, text_map) values (1, {'Carlos':'Perotti', 'Marcel':'Silva'});
INSERT INTO audit (id, text_map) values (2, {'Giovani':'Pasi', 'Frances':'Giardello'});
INSERT INTO audit (id, text_map) values (3, {'Mark':'Pastore', 'Irene':'Cantona'});
Query all example:
SELECT * FROM audit;
 id | text_map
----+---------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(3 rows)
Query example with CONTAINS KEY that uses the SAI with collection map type keys:
SELECT * FROM audit WHERE text_map CONTAINS KEY 'Giovani';
 id | text_map
----+---------------------------------------------
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}

(1 rows)
Query example that uses the SAI index with collection map type entries:
SELECT * FROM audit WHERE text_map['Irene'] = 'Cantona' AND text_map['Mark'] = 'Pastore';
 id | text_map
----+-----------------------------------------
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(1 rows)
Query example with CONTAINS that uses the SAI index with map type values:
SELECT * FROM audit WHERE text_map CONTAINS 'Silva';
 id | text_map
----+------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}

(1 rows)

SAI collection examples with list and set types

These examples demonstrate using collections with the list and set types in SAI indexes. For related information, see:
If you have not already, create the keyspace.
Note: If you are logged into the DataStax Astra CQL Console, you already have a keyspace defined for your cloud database; skip this next step and specify your keyspace name in the USE command.
CREATE KEYSPACE IF NOT EXISTS demo3 WITH replication = 
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE demo3;

Using the list type

Create a calendar table with a collection of type list.
CREATE TABLE calendar (key int PRIMARY KEY, years list<int>);
Create an SAI index using the collection's years column.
CREATE CUSTOM INDEX ON calendar(years) USING 'StorageAttachedIndex';
Insert some random int list data for years, just for demo purposes.
Tip: Notice the INSERT command's square brackets syntax for list values.
INSERT INTO calendar (key, years) VALUES (0, [1990,1996]);
INSERT INTO calendar (key, years) VALUES (1, [2000,2010]);
INSERT INTO calendar (key, years) VALUES (2, [2001,1990]);
Query with CONTAINS example from the list:
SELECT * FROM calendar  WHERE years CONTAINS 1990;
 key | years
-----+--------------
   0 | [1990, 1996]
   2 | [2001, 1990]
(2 rows)

This example created the calendar table with years list<int>. Of course, you could have created the table with years list<text>, for example, inserted 'string' values, and queried on the strings.

Using the set type

Now create a calendar2 table with a collection of type set.
CREATE TABLE calendar2 (key int PRIMARY KEY, years set<int>);
Create an SAI index using the collection's years column – this time for the calendar2 table.
CREATE CUSTOM INDEX ON calendar2(years) USING 'StorageAttachedIndex';
Insert some random int set data for years, again just for demo purposes.
Tip: Notice the INSERT command's curly braces syntax for set values.
INSERT INTO calendar2 (key, years) VALUES (0, {1990,1996});
INSERT INTO calendar2 (key, years) VALUES (1, {2000,2010});
INSERT INTO calendar2 (key, years) VALUES (2, {2001,1990,2020});
Query with CONTAINS example from the list:
SELECT * FROM calendar2  WHERE years CONTAINS 1990;
 key | years
-----+--------------------
   0 |       {1990, 1996}
   2 | {1990, 2001, 2020}

(2 rows)

What's next?

The next topic looks at configuration options for SAI indexes.