Vector support

Native CQL vector type

Introduced in Cassandra 5.0, DSE 6.9 and DataStax Astra, the native CQL vector type is represented in the driver as a CqlVector<T>.

The vector type is handled by the driver the same way as any other CQL type.

The CqlVector<T> C# type

The API documentation for this class contains useful information. Here are some examples:

Creating vectors

// these 2 are equivalent
var vector1 = new CqlVector<int>(1, 2, 3);
var vector2 = CqlVector<int>.New(new int[] { 1, 2, 3 });

// CqlVector<int>.New requires an array; if you prefer another collection type such as List,
// you can call the LINQ extension method .ToArray() first - note that it performs a copy
var vector3 = CqlVector<int>.New(new List<int> { 1, 2, 3 }.ToArray());

// create a vector with the specified number of dimensions (similar to creating an array with new int[dimensions])
var vector4 = CqlVector<int>.New(3);

// convert an array to a CqlVector without copying
var vector5 = new int[] { 1, 2, 3 }.AsCqlVector();

// convert an IEnumerable to a CqlVector (calls .ToArray() internally so it performs a copy)
var vector6 = new int[] { 1, 2, 3 }.ToCqlVector();

Modifying vectors

var vector = CqlVector<int>.New(3);

// you can use the index operator just as if you were dealing with an array or list
vector[0] = 1; 
vector[1] = 2;
vector[2] = 3;

Equality

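Equals() is defined in the CqlVector<T> class, but keep in mind that it uses the LINQ SequenceEqual method internally. SequenceEqual compares elements with their default equality comparer, so nested arrays/collections are compared by reference rather than by content, and Equals() will not work correctly for those cases.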

var vector1 = new CqlVector<int>(1, 2, 3);
var vector2 = new CqlVector<int>(1, 2, 3);
vector1.Equals(vector2); // this returns true
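
The nested-collection caveat can be reproduced with plain arrays, without the driver. The following is a minimal sketch that only uses System.Linq; the class NestedSequenceEqualDemo and the helper DeepSequenceEqual are hypothetical names introduced for illustration and are not part of the driver API. It assumes that a vector whose elements are arrays behaves like the outer array here, since Equals() relies on SequenceEqual.

```csharp
using System;
using System.Linq;

public static class NestedSequenceEqualDemo
{
    // Hypothetical helper (not part of the driver): compares two arrays of arrays
    // by content, applying SequenceEqual to each pair of inner arrays.
    public static bool DeepSequenceEqual<T>(T[][] left, T[][] right)
    {
        if (left.Length != right.Length)
        {
            return false;
        }

        for (var k = 0; k < left.Length; k++)
        {
            if (!left[k].SequenceEqual(right[k]))
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        var a = new[] { new[] { 1, 2 }, new[] { 3, 4 } };
        var b = new[] { new[] { 1, 2 }, new[] { 3, 4 } };

        // The inner int[] instances are compared by reference, so this is False
        // even though the contents are identical.
        Console.WriteLine(a.SequenceEqual(b));

        // Comparing the inner arrays by content gives True.
        Console.WriteLine(DeepSequenceEqual(a, b));
    }
}
```

If vectors with nested elements need to be compared, a content-based comparison along these lines is one option; for vectors of primitive values such as int or float, Equals() is sufficient.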

Writing vector data and performing vector search operations

The vector type is handled by the driver the same way as any other CQL type.

The following examples use this schema. In this case, j is a 3-dimensional vector column of float values. Both the vector subtype and the number of dimensions can be changed. Any CQL type is valid as a vector subtype.

CREATE TABLE IF NOT EXISTS table1 (
     i int PRIMARY KEY, 
     j vector<float, 3>
);

/* Supported by C* 5.0, for vector search with the ANN operator */
CREATE CUSTOM INDEX IF NOT EXISTS ann_table1_index ON table1(j) USING 'StorageAttachedIndex';

Simple Statements

await session.ExecuteAsync(
    new SimpleStatement(
        "INSERT INTO table1 (i, j) VALUES (?, ?)", 
        1, 
        new CqlVector<float>(1.0f, 2.0f, 3.0f)));
var rowSet = await session.ExecuteAsync(
    new SimpleStatement(
        "SELECT * FROM table1 ORDER BY j ANN OF ? LIMIT ?", 
        new CqlVector<float>(0.6f, 0.5f, 0.9f),
        1));
var row = rowSet.Single();
var i = row.GetValue<int>("i");
var j = row.GetValue<CqlVector<float>?>("j");

Prepared Statements

var psInsert = await session.PrepareAsync("INSERT INTO table1 (i, j) VALUES (?, ?)");
var psSelect = await session.PrepareAsync("SELECT * FROM table1 ORDER BY j ANN OF ? LIMIT ?");

var boundInsert = psInsert.Bind(2, new CqlVector<float>(5.0f, 6.0f, 7.0f));
await session.ExecuteAsync(boundInsert);

var boundSelect = psSelect.Bind(new CqlVector<float>(4.7f, 5.0f, 5.0f), 1);
var rowSet = await session.ExecuteAsync(boundSelect);

var row = rowSet.Single();
var i = row.GetValue<int>("i");
var j = row.GetValue<CqlVector<float>>("j");

LINQ and Mapper

The LINQ component of the driver doesn’t support the ANN operator, so it’s best to avoid LINQ for vector search queries. If a particular workload doesn’t require the ANN operator, LINQ can be used with vector columns without issues.

// you can also configure the mapping programmatically by providing a MappingConfiguration object
// to the Table/Mapper constructors (or by using MappingConfiguration.Global) instead of these attributes
[Cassandra.Mapping.Attributes.Table("table1")]
public class TestTable1
{
    [Cassandra.Mapping.Attributes.PartitionKey]
    [Cassandra.Mapping.Attributes.Column("i")]
    public int I { get; set; }

    [Cassandra.Mapping.Attributes.Column("j")]
    public CqlVector<float>? J { get; set; }
}

// LINQ

var table = new Table<TestTable1>(session);
await table
    .Insert(new TestTable1 { I = 3, J = new CqlVector<float>(10.1f, 10.2f, 10.3f) })
    .ExecuteAsync();

// Using AllowFiltering is not recommended due to unpredictable performance. 
// Here we use AllowFiltering because the example schema is meant to showcase vector search
// but the ANN operator is not supported in LINQ yet.
var entity = (await table
    .Where(t => t.I == 3 && t.J == CqlVector<float>.New(new[] { 10.1f, 10.2f, 10.3f }))
    .AllowFiltering()
    .ExecuteAsync()).SingleOrDefault();

// Alternative select using Query syntax instead of Method syntax
var entityFromQuery = (await (
    from t in table
    where t.J == CqlVector<float>.New(new[] { 10.1f, 10.2f, 10.3f })
    select t
    ).AllowFiltering().ExecuteAsync()).SingleOrDefault();

// Mapper

var mapper = new Mapper(session);
await mapper.InsertAsync(
    new TestTable1 { I = 4, J = new CqlVector<float>(11.1f, 11.2f, 11.3f) });
var vectorSearchData = await mapper.FetchAsync<TestTable1>(
    "ORDER BY j ANN OF ? LIMIT ?", 
    new CqlVector<float>(10.9f, 10.9f, 10.9f), 
    1);
var nearestEntity = vectorSearchData.SingleOrDefault();