RAGStack Tests
The latest RAGStack test reports are available on the Testspace dashboard.
Why is this important?
Generative AI moves very quickly. The RAGStack test suite is designed to ensure that the RAGStack components are working together as expected, no matter what new features are added or what new versions of the underlying components are released upstream.
This thoroughly tested approach allows your enterprise to confidently deploy RAGStack in production without worrying about breaking changes.
For the test source code, see the ragstack-e2e-tests GitHub repository.
Testspace reports
RAGStack tests are published to Testspace.
Tests are run multiple times daily against DataStax Enterprise (DSE) and vector-enabled Astra DB Serverless databases.
Test suites
Suite and Environment | Tested Components | Tests Run
---|---|---
RAGStack test suite - LangChain dev - DSE | Tests LangChain against DSE snapshot | e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.llama_index.test_compatibility_rag
RAGStack test suite - LangChain dev - Astra DB Serverless | Tests LangChain against Astra DB Serverless snapshot | e2e_tests.langchain.test_astra, e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.langchain_llamaindex.test_astra, e2e_tests.llama_index.test_astra, e2e_tests.llama_index.test_compatibility_rag
RAGStack test suite - RAGStack dev - DSE | Tests RAGStack against DSE snapshot | e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.llama_index.test_compatibility_rag
RAGStack test suite - RAGStack dev - Astra DB Serverless | Tests RAGStack against Astra DB Serverless snapshot | e2e_tests.langchain.test_astra, e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.langchain_llamaindex.test_astra, e2e_tests.llama_index.test_astra, e2e_tests.llama_index.test_compatibility_rag
RAGStack test suite - LlamaIndex dev - DSE | Tests LlamaIndex against DSE snapshot | e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.llama_index.test_compatibility_rag
RAGStack test suite - LlamaIndex dev - Astra DB Serverless | Tests LlamaIndex against Astra DB Serverless snapshot | e2e_tests.langchain.test_astra, e2e_tests.langchain.test_compatibility_rag, e2e_tests.langchain.test_document_loaders, e2e_tests.langchain_llamaindex.test_astra, e2e_tests.llama_index.test_astra, e2e_tests.llama_index.test_compatibility_rag
RAGStack security scans - RAGStack dev | Tests for vulnerabilities with Snyk | security-scans
RAGStack security scans - RAGStack latest | Tests for vulnerabilities with Snyk | security-scans
What is being tested?
test_astra
These tests cover:

- Basic vector search functionality.
- How the system handles errors during data ingestion.
- Error handling when the vector store is initialized with incorrect connection parameters.
- Filtering documents based on metadata alone, without considering their vector representations.
- Various aspects of vector search combined with metadata filtering.
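For illustration, here is a hedged sketch of what a basic vector-search check of this kind might look like. It assumes the langchain-astradb package and Astra DB credentials in environment variables; the FakeEmbeddings model, collection name, and assertions are illustrative stand-ins, not the suite's actual test code.

```python
import os

from langchain_astradb import AstraDBVectorStore
from langchain_community.embeddings import FakeEmbeddings


def test_basic_vector_search_sketch():
    # Connection details come from the environment; the collection name is illustrative.
    vstore = AstraDBVectorStore(
        embedding=FakeEmbeddings(size=16),
        collection_name="ragstack_e2e_sketch",
        api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    )
    try:
        vstore.add_texts(
            texts=["RAGStack is a curated stack for RAG applications."],
            metadatas=[{"source": "docs"}],
        )

        # Plain similarity search.
        hits = vstore.similarity_search("What is RAGStack?", k=1)
        assert len(hits) == 1

        # The same search constrained by a metadata filter.
        filtered = vstore.similarity_search(
            "What is RAGStack?", k=1, filter={"source": "docs"}
        )
        assert filtered and filtered[0].metadata["source"] == "docs"
    finally:
        # Clean up the collection created for the test.
        vstore.delete_collection()
```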
test_compatibility_rag
These tests cover:

- RAG tests: exercise Retrieval-Augmented Generation (RAG) capabilities.
- Multimodal RAG tests: exercise multimodal RAG capabilities (handling both text and images).
- Chat model tests: check the functionality of chat models. Each test sets up a chat prompt and expects the chat model to generate a relevant response.
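As an illustration of the RAG capability being exercised, here is a minimal sketch of a LangChain (LCEL) RAG chain of the kind these tests build. The real suite parametrizes over many embedding, LLM, and vector store combinations; the helper name below is hypothetical.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough


def build_rag_chain(retriever, chat_model):
    """Hypothetical helper: answer a question using retrieved context."""
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
    )
    # LCEL pipeline: fetch documents, fill the prompt, call the model, parse to a string.
    return (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | chat_model
        | StrOutputParser()
    )


# Usage with any LangChain retriever and chat model, for example:
#   chain = build_rag_chain(vector_store.as_retriever(), chat_model)
#   answer = chain.invoke("When was MyFakeProductForTesting released?")
```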
test_document_loaders
These tests cover:

- The CSVLoader, which loads documents from a CSV file.
- The WebBaseLoader, which fetches documents from specified web URLs.
- The S3DirectoryLoader, which loads documents from an AWS S3 bucket.
- The AzureBlobStorageContainerLoader, which loads documents from an Azure Blob Storage container.
- The AstraDBLoader, which loads documents from an Astra DB Serverless database.
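As a concrete example of one of these checks, here is a hedged sketch in the spirit of the CSVLoader test. The CSV content and assertions are illustrative; the other loaders follow the same load-and-assert pattern against their respective backends.

```python
from pathlib import Path

from langchain_community.document_loaders import CSVLoader


def test_csv_loader_sketch(tmp_path: Path):
    # Write a tiny CSV file to a pytest-provided temporary directory.
    csv_file = tmp_path / "products.csv"
    csv_file.write_text("name,price\nwidget,10\ngadget,20\n")

    docs = CSVLoader(file_path=str(csv_file)).load()

    # CSVLoader emits one Document per row, with each row rendered as "column: value" lines.
    assert len(docs) == 2
    assert "widget" in docs[0].page_content
```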
e2e_tests.langchain_llamaindex.test_astra
These tests cover:

- Ingesting a document using LlamaIndex and then retrieving it using LangChain.
- Ingesting a document using LangChain and retrieving it using LlamaIndex, the opposite of the first test.
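A rough sketch of the cross-framework idea follows: write documents with LlamaIndex, then read the same Astra DB collection back through LangChain. It assumes the llama-index-vector-stores-astra-db and langchain-astradb packages plus OpenAI embeddings, and uses an illustrative collection name. Whether the two libraries agree on the stored document schema is exactly what the real test verifies, so treat this as an outline rather than a guaranteed-working recipe.

```python
import os

from langchain_astradb import AstraDBVectorStore as LangchainAstraDBVectorStore
from langchain_openai import OpenAIEmbeddings
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.astra_db import AstraDBVectorStore as LlamaAstraDBVectorStore

COLLECTION = "cross_framework_sketch"  # illustrative collection name

# 1. Ingest a document with LlamaIndex.
llama_store = LlamaAstraDBVectorStore(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    collection_name=COLLECTION,
    embedding_dimension=1536,
)
VectorStoreIndex.from_documents(
    [Document(text="RAGStack bundles LangChain and LlamaIndex.")],
    storage_context=StorageContext.from_defaults(vector_store=llama_store),
)

# 2. Retrieve from the same collection with LangChain.
langchain_store = LangchainAstraDBVectorStore(
    embedding=OpenAIEmbeddings(),
    collection_name=COLLECTION,
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
print(langchain_store.similarity_search("What does RAGStack bundle?", k=1))
```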
security-scans
These tests cover:

- Scanning Python dependencies for vulnerabilities using Snyk.
- Scanning the RAGStack Docker image for vulnerabilities using Snyk.
Navigate Testspace dashboard
Tests are presented in a hierarchical structure.
To navigate from your Testspace Project down to an individual test case, select each item in the hierarchy to drill down to the next level.
- Project (contains spaces). Example: ragstack-ai
- Space (contains test sequences). Example: RAGStack test suite - LangChain dev - DSE
- Test sequence (contains tests). Example: e2e_tests.langchain.test_compatibility_rag
- Test cases (passed, failed, skipped, etc.). Example: Test rag: openai embedding | openai llm | cassandra | rag custom chain
LangSmith trace
LangSmith tracing currently requires logging into Testspace. We are working on a solution to make these traces publicly available.
Within individual test cases, LangSmith traces are also available to view.
A LangSmith trace displays the test’s entire LLM chain, including the input prompt, the generated response, token spend, and the metadata associated with the response.
For example, you can see that a test fails because the LLM lacks the context to answer the prompt "and when was it released?": it doesn't understand what "it" refers to. Providing the LLM with more context would likely solve this problem.
query: ' I do not have enough context to rephrase the follow up question "and when was it released?" into a standalone question. Without knowing what "it" refers to in the original conversation, I cannot create a coherent standalone question. Please provide more context about what "MyFakeProductForTesting" refers to so I can understand what the follow up question is asking about.'
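For context on where these traces come from: LangSmith tracing is typically switched on through environment variables before the chain under test runs. A hedged sketch, with a placeholder API key and an illustrative project name:

```python
import os

# Enable LangSmith tracing for everything executed after this point.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"   # placeholder
os.environ["LANGCHAIN_PROJECT"] = "ragstack-e2e-sketch"        # illustrative project name

# Any LangChain chain invoked now (for example the RAG chain sketched earlier)
# records a trace with the prompt, the generated response, and token usage.
```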
Metrics
Testspace provides a number of metrics to help you understand the health of your test suite.
- Results Strength - measures the stability of results and infrastructure with the Pass Rate and Health Rate metrics.
- Test Effectiveness - measures whether tests are effectively capturing side effects with the Effective Regression Rate metric, the percentage of results with unique regressions, including invalid results.
- Workflow Efficiency - measures whether failures are being resolved quickly and efficiently with the Resolved Failures and Failure Resolution Time metrics.
For more, see the Testspace docs.
Results
The Results tab displays the results of the latest Test Sequence run.
Filter tracked test failures by New, Flaky, Consistent, Resolved, and Exempt.