Intro to vector databases

The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.

Vector databases enable use cases that require efficient similarity search.


Embeddings are vectors, often generated by machine learning models, that capture semantic relationships between concepts or objects. Related objects are positioned close to each other in the embedding space.

Preprocess embeddings

You may need to normalize or standardize your vectors before writing them to the database.

Normalization

Scale data to a length of one by dividing each element of a vector by the vector’s length (its Euclidean norm, also called the L2 norm).

  • Eliminates the impact of vector scale.

  • Makes high-dimensional data consistent.

  • Allows you to use the dot product similarity metric, which is ~50% faster than cosine similarity.
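
As a minimal sketch in plain Python (no database client assumed), the following normalizes two vectors. It then checks that their dot product equals the cosine similarity of the originals, which is why normalized data lets you use the dot product metric. The helper names are illustrative.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to length 1 by dividing each element by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity divides the dot product by both vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 2.0]
a_unit = l2_normalize(a)  # [0.6, 0.8]
b_unit = l2_normalize(b)

# For unit-length vectors both lengths are 1, so the dot product alone
# gives the same value as cosine similarity on the original vectors.
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))  # True
```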


Standardization

Shift and scale data to a mean of zero and a standard deviation of one.

  • Gives each feature zero mean and unit variance, like a standard normal distribution, although the shape of the distribution does not change.

  • Ensures all features contribute equally to distance calculations.
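
Standardization is usually applied per feature (each vector position) across a dataset. The sketch below assumes that reading and uses only the standard library; `standardize` is a hypothetical helper. Standardized vectors are generally not unit length, so standardization alone does not make the dot product equivalent to cosine similarity.

```python
import statistics


def standardize(vectors: list[list[float]]) -> list[list[float]]:
    """Z-score each feature (vector position) across the whole dataset."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # A constant feature has zero spread; map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(vec, means, stdevs)]
        for vec in vectors
    ]


data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize(data)

# Both features now have mean 0 and standard deviation 1,
# even though the second one started out 100 times larger.
for col in zip(*scaled):
    print(round(statistics.fmean(col), 6), round(statistics.pstdev(col), 6))
```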

If embeddings are not normalized, dot product similarity returns meaningless results without raising any error. Embeddings generated by OpenAI, PaLM, or SimCSE models are normalized by default. If you use a different library, you may need to normalize your vectors before you can use the dot product.
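
To see why unnormalized vectors break the dot product, consider this toy example with made-up two-dimensional vectors. The raw dot product favors the longer vector even though it points in a less similar direction. After normalization, the ranking follows the angle between vectors, as cosine similarity would.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rank(query: list[float], vectors: dict[str, list[float]]) -> list[str]:
    """Order document names by dot-product score, highest first."""
    return sorted(vectors, key=lambda name: dot(query, vectors[name]), reverse=True)


query = [1.0, 0.0]
docs = {
    "same_direction": [1.0, 0.1],  # almost parallel to the query
    "long_but_off": [5.0, 5.0],    # 45 degrees away, but much longer
}

# Raw dot product: the longer vector wins because length inflates its score.
print(rank(query, docs))  # ['long_but_off', 'same_direction']

# After normalization, only direction matters, matching cosine similarity.
unit_docs = {name: l2_normalize(v) for name, v in docs.items()}
print(rank(l2_normalize(query), unit_docs))  # ['same_direction', 'long_but_off']
```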

Define a vector field

It’s important to define the right type and embedding model for your vector fields.


Vector type

Vector fields use the VECTOR type with a fixed dimensionality: the number of floats in each vector. For example, a field for 768-dimensional embeddings is declared as VECTOR<FLOAT, 768>. The embedding model you use determines the dimension value.
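
Because the dimensionality is fixed, it can help to check embedding length in application code before writing. This is a hypothetical client-side guard that assumes a field declared as VECTOR<FLOAT, 768>. It does not use any Astra DB API.

```python
EXPECTED_DIMENSION = 768  # assumed to match a VECTOR<FLOAT, 768> field definition


def validate_embedding(vector: list[float], dimension: int = EXPECTED_DIMENSION) -> list[float]:
    """Reject vectors whose length doesn't match the field's fixed dimensionality."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} floats, got {len(vector)}")
    return [float(x) for x in vector]


validate_embedding([0.0] * 768)  # accepted
try:
    # For example, output from a different model with a larger dimension.
    validate_embedding([0.0] * 1536)
except ValueError as err:
    print(err)  # expected 768 floats, got 1536
```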

Embedding model

Select an embedding model that gives your dataset good structure, meaning related objects end up near each other in the embedding space. You may need to test several embedding models to find the best fit. You must embed queries with the same embedding model you used for the data.
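
One way to enforce the same-model rule is to record which model produced a collection's vectors and check it on every write and query. The class and the model names below are hypothetical placeholders, not part of any Astra DB API.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddedCollection:
    """Hypothetical in-memory collection tagged with the model that embedded it."""

    model_name: str
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, vector: list[float], model_name: str) -> None:
        self._require_same_model(model_name)
        self.vectors.append(vector)

    def check_query(self, model_name: str) -> None:
        self._require_same_model(model_name)

    def _require_same_model(self, model_name: str) -> None:
        # Different models produce different embedding spaces, so distances
        # between their vectors are not meaningful.
        if model_name != self.model_name:
            raise ValueError(
                f"collection was embedded with {self.model_name!r}, not {model_name!r}"
            )


docs = EmbeddedCollection(model_name="model-a")  # placeholder model names
docs.add([0.1, 0.2, 0.3], model_name="model-a")
try:
    docs.check_query(model_name="model-b")
except ValueError as err:
    print(err)  # collection was embedded with 'model-a', not 'model-b'
```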

There are many embedding models. Here are some of the most popular models to get you started:

Model Dimensions Link

At its core, a vector database is about efficient vector search, which allows you to find similar content. Here’s how vector search works:

  1. Create a collection of embeddings for some content.

  2. Pick a new piece of content.

  3. Generate an embedding for that piece of content.

  4. Run a similarity search on the collection.

The search returns the content in your collection whose embeddings are most similar to the embedding of the new content.
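
The four steps can be sketched as a brute-force search in plain Python. The `embed` function is a hypothetical toy that counts words from a fixed vocabulary; a real application would call an embedding model instead. Production vector databases typically use approximate nearest-neighbor indexes rather than scoring every row as this sketch does.

```python
import math

# Hypothetical toy "embedding model": counts words from a fixed vocabulary.
VOCAB = ["cat", "dog", "pet", "car", "engine", "road"]


def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# 1. Create a collection of embeddings for some content.
texts = ["the cat is a pet", "my dog is a good pet", "the car engine is loud", "a car on the road"]
collection = [(text, embed(text)) for text in texts]

# 2-3. Pick a new piece of content and embed it with the same model.
query_vector = embed("is a cat a good pet")


# 4. Run a similarity search: score every item and keep the k best.
def search(query: list[float], items: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    scored = sorted(items, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


print(search(query_vector, collection))  # ['the cat is a pet', 'my dog is a good pet']
```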

To use vector search effectively, you need to pair it with metadata and the right embedding model.

  • Store relevant metadata about a vector in other fields of the same table. For example, if your vector represents an image, store a reference to the original image in the same row.

  • Select an embedding model based on your data and the queries you will make. Embedding models exist for text, images, audio, video, and more.
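
To illustrate keeping metadata next to the vector, each hypothetical row below stores an image URL and a caption alongside a made-up embedding. A search can then return something a person can use rather than a list of floats.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageRow:
    """Hypothetical table row: the vector plus metadata that points back to the source."""

    image_url: str          # reference to the original image
    caption: str
    embedding: list[float]  # made-up values standing in for an image model's output


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rows = [
    ImageRow("https://example.com/images/beach.jpg", "sunny beach", [0.9, 0.1, 0.0]),
    ImageRow("https://example.com/images/forest.jpg", "pine forest", [0.1, 0.9, 0.2]),
]

query_embedding = [0.8, 0.2, 0.1]  # pretend embedding of a new beach photo
best = max(rows, key=lambda row: cosine(query_embedding, row.embedding))

# The search ranks by vector, but what you hand back is the metadata.
print(best.caption, best.image_url)  # sunny beach https://example.com/images/beach.jpg
```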

Vector embeddings can replace or augment some functions of a traditional database, but they are not a replacement for other data types. Because of the following limitations, vector search works best as a supplement to existing search techniques:

  • Vector embeddings are not human-readable.

  • Embeddings are not well suited to retrieving specific records from a table directly. However, you can pair a vector search with a traditional search. For example, you can find the most similar blog posts by a particular author.

  • The embedding model might not be able to capture all relevant information from the data, leading to incorrect or incomplete results.
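
The blog post example might look like the following sketch. An exact-match filter on a metadata field narrows the candidates, and vector similarity ranks what remains. The post data and embeddings are invented for illustration.

```python
import math

posts = [
    {"title": "Intro to CQL", "author": "ana", "embedding": [0.9, 0.1]},
    {"title": "Tuning compaction", "author": "ben", "embedding": [0.8, 0.3]},
    {"title": "Vector search basics", "author": "ana", "embedding": [0.2, 0.9]},
    {"title": "Data modeling tips", "author": "ana", "embedding": [0.7, 0.4]},
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similar_posts_by_author(query_embedding: list[float], author: str, k: int = 2) -> list[str]:
    # Traditional search: exact match on a metadata field.
    candidates = [post for post in posts if post["author"] == author]
    # Vector search: rank the remaining posts by similarity to the query.
    candidates.sort(key=lambda post: cosine(query_embedding, post["embedding"]), reverse=True)
    return [post["title"] for post in candidates[:k]]


print(similar_posts_by_author([1.0, 0.0], "ana"))  # ['Intro to CQL', 'Data modeling tips']
```

Without the author filter, "Tuning compaction" would outrank "Data modeling tips" on similarity alone. The metadata condition is what restricts results to one author.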

Common use cases

Vector search is important for LLM use cases, including Retrieval-Augmented Generation (RAG) and AI agents.

Retrieval-Augmented Generation (RAG)

RAG is a technique for improving the accuracy of an LLM by adding relevant content directly to the LLM’s context window. Here’s how it works:

  1. Pick an embedding model.

  2. Generate embeddings from your data.

  3. Store these embeddings in a vector database.

  4. When the user submits a query, generate an embedding from the query using the same model.

  5. Run a vector search to find data that’s similar to the user’s query.

  6. Pass this data to the LLM so it’s available in the context window.
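
The six steps above can be sketched end to end without a real model or database. Everything here is an assumption for illustration: `embed` is a toy word-count model, an in-memory list stands in for the vector database, and `build_prompt` returns the prompt text instead of calling an LLM.

```python
import math

# Hypothetical toy embedding model (steps 1-2): counts words from a fixed vocabulary.
VOCAB = ["refunds", "shipping", "days", "password", "reset", "account"]


def embed(text: str) -> list[float]:
    words = [w.strip("?.,!").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Step 3: store the embeddings; an in-memory list stands in for the vector database.
chunks = [
    "Refunds are issued within 5 business days.",
    "Shipping takes 3 to 7 days.",
    "To reset your password, open account settings.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]


def build_prompt(question: str, k: int = 1) -> str:
    query_vector = embed(question)  # Step 4: embed the query with the same model.
    # Step 5: vector search for the k most similar chunks.
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    # Step 6: place the retrieved text in the prompt you send to the LLM.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How do I reset my password?"))
```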

Now, when the LLM generates a response, it is less likely to make things up (hallucinate). See our chatbot and recommendation system tutorials to learn how to implement RAG.

AI agents

An AI agent provides an LLM with the ability to take different actions depending on the goal. In the preceding RAG example, a user might submit a query unrelated to your content. You can build an agent to take the necessary actions to fetch relevant content.

For example, you might design an agent to run a Google search with the user’s query. It can pass the results of that search to the LLM’s context window. It can also generate embeddings and store both the content and the embeddings in a vector database. In this way, your agent can build a persistent memory of the world and its actions.
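
Here is a minimal sketch of that memory loop, under heavy assumptions. `web_search` is a hypothetical stub standing in for a real search tool, `embed` is a toy word-count model used for illustration, and an in-memory list stands in for a vector database. The agent answers from memory when a stored item is similar enough. Otherwise it fetches, embeds, and stores new content.

```python
import math
from typing import Optional

# Hypothetical toy embedding model: counts words from a fixed vocabulary.
VOCAB = ["weather", "paris", "tokyo", "rain", "sun"]


def embed(text: str) -> list[float]:
    words = [w.strip("?.,!:").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class AgentMemory:
    """Stores (content, embedding) pairs; a vector database would play this role."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def best_match(self, query: str) -> tuple[Optional[str], float]:
        q = embed(query)
        best_text, best_score = None, 0.0
        for text, vector in self.items:
            score = cosine(q, vector)
            if score > best_score:
                best_text, best_score = text, score
        return best_text, best_score


def web_search(query: str) -> list[str]:
    # Hypothetical stub for a real search tool; returns canned text.
    return [f"Search result: {query}"]


def gather_context(memory: AgentMemory, query: str, threshold: float = 0.8) -> Optional[str]:
    text, score = memory.best_match(query)
    if score >= threshold:
        return text  # memory already covers this topic
    for result in web_search(query):  # otherwise act: fetch new content
        memory.add(result)            # and remember it for later queries
    return memory.best_match(query)[0]


memory = AgentMemory()
memory.add("Tokyo weather: rain all week.")
print(gather_context(memory, "weather in Paris"))  # Search result: weather in Paris
print(len(memory.items))                           # 2
print(gather_context(memory, "Paris weather"))     # answered from memory, no new search
```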

