Examine SAI column index and query rules

Understand the columns on which you can define SAI indexes and run queries.

Let's walk through another example to understand the columns on which you can define SAI indexes and subsequently run queries that reference those columns.

While this guide's prior topic, SAI quick start, focused on defining multiple indexes based on non-primary key columns (a very useful feature), this topic expands the discussion to SAI indexes on each column in a composite partition key. Then this topic adds an example using collection maps.

Tip: DataStax Astra cloud users: read this topic to understand the column and query rules. If you specified a different keyspace name for your Astra database, use it instead of the keyspace name shown in examples here.

Also refer to the Better Botz sample app and its SAI indexing examples in the Astra documentation.

About composite partition keys

SAI makes it possible to define an index using a single column in the table's composite partition key. A composite partition key means that the partition is made up of multiple columns. In this case with an SAI index, you would specify just one of the columns that comprises the composite partition key, as shown in this topic. You could also define a separate SAI index that specifies another single column from the composite partition key.

Without an SAI index defined, in your query, the WHERE clause would need to specify all the partition key columns or use ALLOW FILTERING. This type of query can result in unpredictable performance. SAI solves this problem – see the examples below.

Create the keyspace and table, and insert simple data

First, create the demo2 keyspace:
CREATE KEYSPACE demo2 WITH replication = 
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
Define a database table where the composite partition key is made from id and age. We also define a name column cluster key:
CREATE TABLE demo2.person_id_name_compositkey2 
       (id int, age int, name text, primary key ((id, age), name));
Add some data:
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (1, 'Rocco', 21); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (2, 'John', 50); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (3, 'Bryn', 33); 
INSERT INTO demo2.person_id_name_compositkey2 (id, name, age) VALUES (4, 'Jason', 34); 
Important: Notice that at this point, there is no SAI index defined.

Queries before SAI indexes are defined

Without an SAI index, a user cannot query on the partition key columns individually. Example:
SELECT * from demo2.person_id_name_compositkey2 where id > 0;
InvalidRequest: Error from server: code=2200 
[Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. 
If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Define SAI indexes

Instead of using ALLOW FILTERING in your queries, now define SAI indexes. Examples:
CREATE CUSTOM INDEX ON demo2.person_id_name_compositkey2 (id) 
       USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON demo2.person_id_name_compositkey2 (age) 
       USING 'StorageAttachedIndex';

Run the example query again

With the SAI indexes defined, we can run the prior query example again, this time with desired results.
SELECT * from demo2.person_id_name_compositkey2 where id > 0;
 id | age | name
----+-----+-------
  3 |  33 |  Bryn
  1 |  21 |  Rocco
  2 |  50 |  John
  4 |  34 |  Jason
  
(4 rows)
(Try another query that specifies the composite partition's age column in the query.)

What about SAI and a partition key with only one column?

When there is only one column defining the partition key, there is no need for an index. SAI returns an error message. Example:
CREATE CUSTOM INDEX ON demo2.person_id_name_primarykey (id) 
       USING 'StorageAttachedIndex';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot create secondary index on the only partition key column id"

Using collections with SAI

SAI supports collections, which enable you to group and store data together in a column.
Tip: Starting in DSE 6.8.4, you have the option of creating multiple SAI indexes using collection maps on the same column, as shown in the examples below.

In a relational database, a grouping such as a user's multiple email addresses is achieved via many-to-one joined relationship between (for example) a user table and an email table. DSE avoids joins between two tables by storing the user's email addresses in a collection column in the user table. Each collection specifies the data type of the data held.

A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns.

See the examples in the next section. Also refer to CREATE CUSTOM INDEX.

SAI collection map examples with keys, values, and entries

These examples demonstrate using collection maps with multiple types in SAI indexes. For related information, see Creating collections and Using map type.

First, create the keyspace:
CREATE KEYSPACE demo3 WITH replication = 
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE demo3;
Create an audit table, with a collection map named text_map.
CREATE TABLE demo3.audit ( id int PRIMARY KEY , text_map map<text, text>);
Create multiple SAI collection maps on the same column, each using keys, values, and entries.
CREATE CUSTOM INDEX ON demo3.audit (keys(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON demo3.audit (values(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON demo3.audit (entries(text_map)) USING 'StorageAttachedIndex';
Insert some data:
INSERT INTO demo3.audit (id, text_map) values (1, {'Carlos':'Perotti', 'Marcel':'Silva'});
INSERT INTO demo3.audit (id, text_map) values (2, {'Giovani':'Pasi', 'Frances':'Giardello'});
INSERT INTO demo3.audit (id, text_map) values (3, {'Mark':'Pastore', 'Irene':'Cantona'});
Query all example:
SELECT * FROM demo3.audit;
 id | text_map
----+---------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(3 rows)
Query example that uses the SAI index with collection map type entries:
SELECT * FROM demo3.audit WHERE text_map['Irene'] = 'Cantona' AND text_map['Mark'] = 'Pastore';
 id | text_map
----+-----------------------------------------
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(1 rows)
Query example with CONTAINS KEY that uses the SAI with collection map type keys:
SELECT * FROM demo3.audit WHERE text_map CONTAINS KEY 'Giovani';
 id | text_map
----+---------------------------------------------
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}

(1 rows)
Query example with CONTAINS that uses the SAI index with map type values:
SELECT * FROM demo3.audit WHERE text_map CONTAINS 'Silva';
 id | text_map
----+------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}

(1 rows)

What's next?

The next topic looks at configuration options for SAI indexes.