Query with SAI
This page demonstrates how to run queries on tables with SAI indexes.
To query with SAI, you need a keyspace with a table and one or more SAI indexes.
If you haven’t created an SAI index yet, see the SAI quickstart to learn how to create multiple indexes based on non-primary key columns.
Query operators supported for SAI
SAI only supports |
SAI supports the following query operators for tables with SAI indexes:
-
Numerics:
=
,<
,>
,⇐
,>=
,AND
,IN
,OR
(for vector databases) -
Strings:
=
,AND
,IN
-
Collections:
=
CONTAINS
,CONTAINS KEY
SAI does not supports the following query operators for tables with SAI indexes:
-
Strings or Numerics:
LIKE
,OR
(for non-vector databases)
Vector search with SAI
For an example and explanation of vector search with CQL, see Vector search quickstart.
Single index match on a column
This example uses the following table and indexes:
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE CUSTOM INDEX commenter_idx ON cycling.comments_vs (commenter)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};
CREATE CUSTOM INDEX created_at_idx ON cycling.comments_vs (created_at)
USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX comment_ann_idx ON cycling.comments_vs(comment_vector)
USING 'StorageAttachedIndex';
The column commenter
is not the partition key in this table, so an index is required to query on it.
Query for a match on that column:
SELECT * FROM cycling.comments_vs WHERE commenter = 'Alex';
Results
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+-----------------------------------------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | b'?fff?\\n=q=\\xf5\\xc2\\x8f=\\xcc\\xcc\\xcd?s33' | Alex | f259bc00-38c3-11ef-bd85-f92c3c7170c3
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | b'?}p\\xa4?\\x00\\x00\\x00?}p\\xa4=\\xcc\\xcc\\xcd>\\xae\\x14{' | Alex | f2583561-38c3-11ef-bd85-f92c3c7170c3
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-02-14 20:43:20.000000+0000 | Raining too hard should have postponed | b'>\\xe6ff=\\xb8Q\\xec<#\\xd7\\n>L\\xcc\\xcd=\\xe1G\\xae' | Alex | f2557640-38c3-11ef-bd85-f92c3c7170c3
(3 rows)
Note that a failure will occur if you try this query before creating the index:
SELECT * FROM cycling.comments_vs WHERE commenter = 'Alex';
Results
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+-----------------------------------------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | b'?fff?\\n=q=\\xf5\\xc2\\x8f=\\xcc\\xcc\\xcd?s33' | Alex | f259bc00-38c3-11ef-bd85-f92c3c7170c3
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | b'?}p\\xa4?\\x00\\x00\\x00?}p\\xa4=\\xcc\\xcc\\xcd>\\xae\\x14{' | Alex | f2583561-38c3-11ef-bd85-f92c3c7170c3
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-02-14 20:43:20.000000+0000 | Raining too hard should have postponed | b'>\\xe6ff=\\xb8Q\\xec<#\\xd7\\n>L\\xcc\\xcd=\\xe1G\\xae' | Alex | f2557640-38c3-11ef-bd85-f92c3c7170c3
(3 rows)
Single index match on a column with options
SAI options include case_sensitive
, ascii
, and normalize
.
This example uses the following table and indexes:
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE CUSTOM INDEX commenter_idx ON cycling.comments_vs (commenter)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};
Case-sensitivity
The column commenter
is not the partition key in this table, so an index is required to query on it.
If we want to check commenter
as a case-sensitive value, we can use the case_sensitive
option set to true
.
Note that no results are returned if you use an inappropriately case-sensitive value in the query:
SELECT comment,comment_vector,commenter FROM cycling.comments_vs
WHERE commenter : 'alex';
Results
comment | comment_vector | commenter
-----------------------+-------------------------------------------------------+-----------
Rest stops were great | b'?}p\xa4?\x00\x00\x00?}p\xa4=\xcc\xcc\xcd>\xae\x14{' | alex
(1 rows)
When we switch the case of the cyclist’s name to match the case in the index, the query succeeds:
SELECT comment,comment_vector,commenter FROM cycling.comments_vs
WHERE commenter : 'Alex';
Results
comment | comment_vector | commenter
----------------------------------------+-------------------------------------------------------+-----------
LATE RIDERS SHOULD NOT DELAY THE START | b'?fff?\n=q=\xf5\xc2\x8f=\xcc\xcc\xcd?s33' | Alex
Second rest stop was out of water | b'?}p\xa4?\x00\x00\x00?}p\xa4=\xcc\xcc\xcd>\xae\x14{' | Alex
Raining too hard should have postponed | b'>\xe6ff=\xb8Q\xec<#\xd7\n>L\xcc\xcd=\xe1G\xae' | Alex
LATE RIDERS SHOULD NOT BE ALLOWED | b'?fff?\n=q=\xf5\xc2\x8f=\xcc\xcc\xcd?s33' | alexander
(4 rows)
Multiple index matching with operators
These examples use the following table and indexes. Several indexes are created for the table to demonstrate how to query for matches on more than one column.
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE CUSTOM INDEX commenter_idx ON cycling.comments_vs (commenter)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};
CREATE CUSTOM INDEX created_at_idx ON cycling.comments_vs (created_at)
USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX comment_ann_idx ON cycling.comments_vs(comment_vector)
USING 'StorageAttachedIndex';
-
AND: Query for matches on more than one column where both columns must match:
SELECT * FROM cycling.comments_vs WHERE created_at = '2017-03-21 21:11:09.999' AND commenter : 'Alex';
Results
id | created_at | comment | comment_vector | commenter | record_id --------------------------------------+---------------------------------+-----------------------------------+-----------------------------------------------------------------+-----------+-------------------------------------- e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | b'?}p\\xa4?\\x00\\x00\\x00?}p\\xa4=\\xcc\\xcc\\xcd>\\xae\\x14{' | Alex | f2583561-38c3-11ef-bd85-f92c3c7170c3 (1 rows)
-
IN: Query for matches from a list of values:
SELECT * FROM cycling.comments_vs WHERE created_at IN ('2017-03-21 21:11:09.999','2017-03-22 01:16:59.001');
Results
id | created_at | comment | comment_vector | commenter | record_id --------------------------------------+---------------------------------+-----------------------------------+---------------------------------------------------------------------------------+-----------+-------------------------------------- e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | b'?}p\\xa4?\\x00\\x00\\x00?}p\\xa4=\\xcc\\xcc\\xcd>\\xae\\x14{' | Alex | f2583561-38c3-11ef-bd85-f92c3c7170c3 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | b'=\\xcc\\xcc\\xcd>\\xcc\\xcc\\xcd=\\xcc\\xcc\\xcd?\\x05\\x1e\\xb8=\\xb8Q\\xec' | Amy | f25d3e71-38c3-11ef-bd85-f92c3c7170c3 (2 rows)
-
OR: Query for a match on either one column or the other:
SELECT * FROM cycling.comments_vs WHERE created_at='2017-03-21 21:11:09.999' OR created_at='2017-03-22 01:16:59.001';
Results
id | created_at | comment | comment_vector | commenter | record_id --------------------------------------+---------------------------------+-----------------------------------+---------------------------------------------------------------------------------+-----------+-------------------------------------- e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | b'?}p\\xa4?\\x00\\x00\\x00?}p\\xa4=\\xcc\\xcc\\xcd>\\xae\\x14{' | Alex | f2583561-38c3-11ef-bd85-f92c3c7170c3 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | b'=\\xcc\\xcc\\xcd>\\xcc\\xcc\\xcd=\\xcc\\xcc\\xcd?\\x05\\x1e\\xb8=\\xb8Q\\xec' | Amy | f25d3e71-38c3-11ef-bd85-f92c3c7170c3 (2 rows)
See also
For more examples, see SAI examples.