About vector search
Here’s a closer look at how Serverless Cassandra with Vector Search performs and why.
CassIO
For typical generative artificial intelligence (AI) or other machine learning workloads, CassIO abstracts away the details of accessing the Cassandra database. CassIO offers a low-boilerplate, ready-to-use set of tools for seamless integration of Cassandra in most AI-oriented applications. For more, see CassIO.
Preprocessing embedding vectors
Depending on the nature of the data, some form of preprocessing might be necessary before writing to the database to get the best results from vector search. Two common preprocessing methods are normalizing and standardizing.
Method | Definition | Features |
---|---|---|
Normalizing (not required for all vector examples) | Scale data to a length of one by dividing each element in a vector by the vector’s length (also known as its Euclidean norm or L2 norm) | If embeddings are NOT normalized, |
Standardizing | Shift and scale data for a mean of zero and a standard deviation of one | |
When you use OpenAI, PaLM, or SimCSE to generate your embeddings, they are normalized by default. If you use a different library, normalize your vectors and set the similarity function to |
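As a minimal sketch of the two preprocessing methods described in the table, the following standard-library Python shows one way to normalize a single vector and to standardize a batch of vectors per dimension. The function names `normalize` and `standardize` are illustrative and not part of any Cassandra or CassIO API. The sketch also assumes that standardizing is applied per dimension across the whole dataset, using the population standard deviation.

```python
import math
from statistics import fmean, pstdev


def normalize(vector):
    """Scale a vector to unit length by dividing each element by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def standardize(vectors):
    """Shift and scale each dimension to a mean of 0 and a standard deviation of 1."""
    columns = list(zip(*vectors))
    means = [fmean(col) for col in columns]
    # Population standard deviation; a constant dimension falls back to 1.0
    # so that it maps to 0 instead of causing a division by zero.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(vec, means, stdevs)]
        for vec in vectors
    ]


# The L2 norm of [3, 4] is 5, so the normalized vector has length 1.
unit = normalize([3.0, 4.0])

# Each column of the result has mean 0 and standard deviation 1.
scaled = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

In practice, the means and standard deviations computed at write time would also need to be applied to query vectors, so that queries and stored embeddings live in the same space.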
Similarity metric
Vector search requires appropriate similarity metrics such as cosine, Euclidean, or Jaccard. The data type and desired search behavior determine the metric. Our search supports cosine (default), dot product, and Euclidean similarities.
DataStax recommends using |
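To make the three supported metrics concrete, here is a small standard-library Python sketch of their textbook definitions. The function names are illustrative. These are the raw mathematical quantities, and the scores a database returns may be rescaled versions of them. The example also shows why normalization matters: for unit-length vectors, dot product and cosine agree, while for unnormalized vectors the dot product also reflects vector length.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [0.6, 0.8]  # unit length
b = [1.0, 0.0]  # unit length
c = [3.0, 4.0]  # same direction as a, but length 5

# Unit-length vectors: dot product and cosine give the same value.
same_for_unit = math.isclose(dot_product(a, b), cosine_similarity(a, b))

# Unnormalized vector: cosine only sees direction, dot product also sees length.
cos_ac = cosine_similarity(a, c)
dot_ac = dot_product(a, c)
```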
Scalability and performance
Scalability is a critical consideration as your dataset expands. Vector search algorithms should be designed to handle large-scale datasets efficiently. The database distributes data and accesses that data with parallel processing, enhancing performance.
Evaluation and iteration
Iteratively refining the vector representations, similarity metrics, indexing techniques, or preprocessing steps can lead to better search performance and user satisfaction. Continuously evaluate search results against known ground truth and user feedback, and iterate on these choices to improve them.
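One common way to evaluate results against known truth is recall at k: the fraction of known-relevant items that appear in the top k results for a query. The sketch below is a hypothetical helper in standard-library Python, not part of any Cassandra tooling. It assumes you have a labeled set of queries that maps each query to the IDs that should be returned.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of known-relevant items that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(results, ground_truth, k):
    """Average recall at k over every query in the ground-truth set."""
    scores = [recall_at_k(results[q], ground_truth[q], k) for q in ground_truth]
    return sum(scores) / len(scores)


# Hypothetical labeled queries and the ranked IDs a search returned for them.
ground_truth = {"q1": ["a", "b"], "q2": ["c"]}
results = {"q1": ["a", "x", "b"], "q2": ["y", "z", "c"]}

recall_2 = mean_recall_at_k(results, ground_truth, k=2)
recall_3 = mean_recall_at_k(results, ground_truth, k=3)
```

Tracking a number like this before and after each change to the embeddings, metric, or preprocessing gives a concrete signal for whether the change helped.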
Integrations with vector search
Third-party integrations connect your Serverless Cassandra with Vector Search to various Large Language Model (LLM) frameworks. Use integrations to streamline your vector-based similarity searches and to aid in developing LLM-powered applications.
For the full list of third-party integrations, see Astra DB integrations.