Terminology
The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.
This page is a collection of terms and concepts used to describe vector databases within the context of Generative AI, with links for further reading.
- Agent
-
A system that can make decisions based on its inputs or environment. Intelligent agents, such as those employed in machine learning and artificial intelligence systems, often use vector databases to facilitate rapid and efficient search, comparison, and retrieval of high-dimensional data. Read More
- Approximate Nearest Neighbors (ANN) search
-
A method used to quickly find approximate nearest neighbors in large datasets, often with high-dimensional features, sacrificing some accuracy for speed. Read More
- Chunking
-
Chunking breaks text into chunks (subsets of tokens) that represent a piece of information. In techniques like RAG, documents undergo chunking, where embeddings are generated from these chunks, stored in a vector database, and retrieved as part of the prompting process. Read More
- Collection
-
A dataset that contains various forms of data, such as vector embeddings. The terms "dataset" and "collection" are often used interchangeably; this guide uses "collection". Read More
- Dataset
-
A collection of data points or records used for analysis. The terms "dataset" and "collection" are often used interchangeably; this guide uses "collection". Read More
- Embedding
-
The process of turning data, such as words or images, into vectors that capture its meaning. The resulting vector is also called an embedding. Read More
- FLARE pattern
-
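FLARE (Forward-Looking Active REtrieval augmented generation) is a method for asking questions of AI models more effectively. The model predicts the upcoming sentence and, when that prediction contains low-confidence tokens, retrieves relevant documents and regenerates the sentence. Read More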
- Indexing
-
Organizing data to make retrieval more efficient. Read More
- k-Nearest Neighbors (kNN)
-
A supervised machine learning algorithm that classifies an item based on the majority class of its 'k' most similar items in the dataset. Read More
- Large Language Models (LLMs)
-
Models trained on large amounts of text that can interpret natural language and generate long passages of text. Read More
- Normalization
-
The process of adjusting data values to a common scale to ensure that different features have equal importance in machine learning algorithms. Read More
- Prompt engineering
-
Crafting the right questions to get desired answers from AI. Read More
- RAG (Retrieval Augmented Generation)
-
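A method that retrieves relevant documents, typically from a vector database, and supplies them to a generative model as context so that the model generates a response grounded in them. Read More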
- Reflexion
-
The ability of an AI agent to iteratively inspect its own code, evaluate its performance, and correct mistakes. Read More
- Similarity metric/function
-
A function that quantifies how similar two objects or datasets are, commonly used in machine learning and data analysis. Read More
- Tokenizer
-
A tool or process that breaks down input data, such as text, into smaller units, often called tokens, that are semantically relevant for processing in a model. Read More
- Transformers
-
A type of deep learning architecture used for processing sequences of data. Read More
- Vector
-
An ordered list of numbers, frequently used in AI. Embeddings are a specific type of vector that encodes semantic meaning. Read More
- Vector database
-
A database designed for storing vectors and querying them by similarity. Read More
- Vector index
-
A data structure used to efficiently store and query high-dimensional vectors for similarity or distance-based retrievals. Read More
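
Several of the terms above (normalization, similarity metric, and k-Nearest Neighbors) fit together in a few lines of code. The following is a minimal sketch in plain Python, not how Astra DB or any particular vector database implements search. The toy collection, its labels, and the function names are all hypothetical, chosen for illustration only. It performs an exact, brute-force search that scores every vector; an ANN search would instead use a vector index to skip most comparisons, trading some accuracy for speed.

```python
import math
from collections import Counter


def normalize(v):
    """Normalization: scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Similarity metric: cosine of the angle between two vectors."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def nearest_neighbors(query, collection, k):
    """Exact (brute-force) search: score every vector, keep the top k."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def knn_classify(query, collection, k):
    """kNN: return the majority label among the k most similar vectors."""
    neighbors = nearest_neighbors(query, collection, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy collection of (label, vector) pairs.
# Real embeddings usually have hundreds or thousands of dimensions.
collection = [
    ("cat", [0.9, 0.1, 0.0]),
    ("cat", [0.8, 0.2, 0.1]),
    ("dog", [0.1, 0.9, 0.0]),
    ("dog", [0.2, 0.8, 0.1]),
    ("dog", [0.0, 0.7, 0.3]),
]
query = [0.85, 0.15, 0.05]

print(knn_classify(query, collection, k=3))
```

With `k=3`, the two "cat" vectors and one "dog" vector are the closest matches to the query, so the majority vote yields "cat". Because cosine similarity normalizes both vectors first, only their direction matters, not their magnitude.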