Use vector search with Cassandra Query Language (CQL)
This example demonstrates how to use vector search with Cassandra Query Language (CQL) in Hyper-Converged Database (HCD) 1.2.
Vector search supports machine learning workloads by comparing data by similarity within a database, even when no explicit relationship between the items is defined. A vector is an array of floating-point values that represents a specific object or entity.
The foundation of vector search is embeddings, which are compact representations of text as vectors of floating-point numbers. These embeddings are generated by feeding the text through an API, which uses a neural network to transform the input into a fixed-length vector. Embeddings capture the semantic meaning of the text, providing a more nuanced understanding than traditional term-based approaches. Inputs that are substantially similar produce output vectors that are geometrically close; inputs that are not similar produce vectors that are geometrically further apart.
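The idea that similar inputs yield geometrically close vectors can be sketched in a few lines of Python. This is a minimal illustration, not part of HCD: the three vectors are hand-written stand-ins for real embeddings, and cosine similarity is assumed here as the closeness measure.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors standing in for embeddings of three short texts.
rain_ride = [0.9, 0.1, 0.0]   # e.g. "riding in the rain"
wet_ride = [0.8, 0.2, 0.1]    # e.g. "cycling on wet roads"
bike_price = [0.0, 0.1, 0.9]  # e.g. "cost of a new bike"

similar = cosine_similarity(rain_ride, wet_ride)
dissimilar = cosine_similarity(rain_ride, bike_price)

# The two related texts score higher than the unrelated pair.
print(similar > dissimilar)
```

A real embeddings generator produces vectors with far more dimensions, but the comparison works the same way.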
To enable vector search, a vector data type is available in your HCD database.
For more information, see vector search examples in the CQL documentation.
To use vector search in the cloud, check out the Astra DB Serverless vector search quickstart.
Prerequisites
- Install HCD
- Install a terminal to run your client
- Identify your credentials
Install HCD
Install HCD if you haven’t already. For exploration, use the Docker installation option.
Install a terminal to run your client
The clients can be tested by running them in a terminal. You’ll want Xterm, Terminal, or another terminal emulator.
Identify your credentials
include::ROOT:partial$contents through an embeddings generator, as well as the query you were asking to match. This example only shows the mechanics of using CQL to create vector search data objects.
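To make the matching step concrete, the following Python sketch ranks a few stored vectors by their distance to a query vector. It is an illustration under stated assumptions, not how HCD is implemented: the row keys and vectors are hypothetical, Euclidean distance is assumed as the measure, and the comparison is exhaustive, whereas a database index would typically avoid scanning every row.

```python
import math


def nearest(query, rows, k):
    """Return the keys of the k stored vectors closest to the query."""
    # math.dist computes the Euclidean distance between two points.
    ranked = sorted(rows, key=lambda name: math.dist(query, rows[name]))
    return ranked[:k]


# Hypothetical stored vectors, keyed by an illustrative row name.
stored = {
    "comment_a": [1.0, 0.0, 0.0],
    "comment_b": [0.0, 1.0, 0.0],
    "comment_c": [0.7, 0.7, 0.0],
}

# Hypothetical query vector, as if produced by the same embeddings generator.
query = [0.9, 0.1, 0.0]

closest = nearest(query, stored, k=2)
print(closest)
```

The stored data and the query must come from the same embeddings generator and have the same dimension for the distances to be meaningful.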
Create vector search keyspace
Create a new vector search keyspace called cycling:
CREATE KEYSPACE IF NOT EXISTS cycling
WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CassIO (cassio.org) abstracts away the details of accessing the Cassandra database for the typical needs of generative artificial intelligence (AI) or other machine learning workloads. CassIO offers a low-boilerplate, ready-to-use set of tools for seamless integration of Cassandra in most AI-oriented applications.
For more information, see CassIO.