RAGStack Tests

The latest RAGStack test reports are available on the Testspace dashboard.

Why is this important?

Generative AI moves very quickly. The RAGStack test suite is designed to ensure that the RAGStack components are working together as expected, no matter what new features are added or what new versions of the underlying components are released upstream.

This thoroughly tested approach allows your enterprise to confidently deploy RAGStack in production without worrying about breaking changes.

For the test source code, see ragstack-e2e-tests GitHub repository.

Testspace reports

RAGStack tests are published to Testspace.

Tests are run multiple times daily against DataStax Enterprise (DSE) and vector-enabled Astra DB Serverless databases.

Test suites

Suite and Environment Tested Components Tests Run

RAGStack test suite - LangChain dev - DSE

Tests LangChain against DSE snapshot

e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.llama_index.test_compatibility_rag

RAGStack test suite - LangChain dev - Astra DB Serverless

Tests LangChain against Astra DB Serverless snapshot

e2e_tests.langchain.test_astra e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.langchain_llamaindex.test_astra e2e_tests.llama_index.test_astra e2e_tests.llama_index.test_compatibility_rag

RAGStack test suite - RAGStack dev - DSE

Tests RAGStack against DSE snapshot

e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.llama_index.test_compatibility_rag

RAGStack test suite - RAGStack dev - Astra DB Serverless

Tests RAGStack against Astra DB Serverless snapshot

e2e_tests.langchain.test_astra e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.langchain_llamaindex.test_astra e2e_tests.llama_index.test_astra e2e_tests.llama_index.test_compatibility_rag

RAGStack test suite - LlamaIndex dev - DSE

Tests LlamaIndex against DSE snapshot

e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.llama_index.test_compatibility_rag

RAGStack test suite - LLamaIndex dev - Astra DB Serverless

Tests LLamaIndex against Astra DB Serverless snapshot

e2e_tests.langchain.test_astra e2e_tests.langchain.test_compatibility_rag e2e_tests.langchain.test_document_loaders e2e_tests.langchain_llamaindex.test_astra e2e_tests.llama_index.test_astra e2e_tests.llama_index.test_compatibility_rag

RAGStack security scans - RAGStack dev

Tests for vulnerabilties with Snyk

security-scans

RAGStack security scans - RAGStack latest

Tests for vulnerabilties with Snyk

security-scans

What is being tested?

test_astra

Test Description

test_basic_vector_search

Tests the basic vector search functionality.

test_ingest_errors

Checks how the system handles errors during data ingestion.

test_wrong_connection_parameters

Tests error handling when the vector store is initialized with incorrect connection parameters.

test_basic_metadata_filtering_no_vector

Tests the functionality of filtering documents based on metadata, without considering their vector representations.

test_vector_search_with_metadata

Tests various aspects of vector search combined with metadata filtering.

test_compatibility_rag

Test Description

test_rag

RAG Tests: This part tests Retrieval-Augmented Generation (RAG) capabilities.

test_multimodal

Multimodal RAG Tests: This tests the multimodal RAG capabilities (handling both text and images).

test_chat

Chat Model Tests: These tests check the functionality of chat models. It sets up a chat prompt and expects the chat model to generate a relevant response.

test_document_loaders

Test Description

test_csv_loader

Tests the CSVLoader, which loads documents from a CSV file.

test_web_based_loader

Tests the WebBaseLoader, which fetches documents from specified web URLs.

test_s3_loader

Tests the S3DirectoryLoader, which loads documents from an AWS S3 bucket.

test_azure_blob_doc_loader

Tests the AzureBlobStorageContainerLoader, which loads documents from an Azure Blob Storage container.

test_astradb_loader

Tests the AstraDBLoader, which loads documents from an Astra DB Serverless database.

e2e_tests.langchain_llamaindex.test_astra

Test Description

test_ingest_llama_retrieve_langchain

This test checks the integration where a document is ingested using LlamaIndex and then retrieved using LangChain.

test_ingest_langchain_retrieve_llama_index

This test ingests a document using LangChain and retrieves it using LlamaIndex, the opposite of the first test.

security-scans

Test Description

Python dependencies

This test scans Python dependencies for vulnerabilities using Snyk.

Docker image

This test scans the RAGStack Docker image for vulnerabilities using Snyk.

Tests are presented in a hierarchical structure.

To navigate from your Testspace Project down to an individual test case, select each item in the hierarchy to drill down to the next level.

Testspace hierarchy
  • Project (contains spaces) Example: ragstack-ai

    • Space (contains test sequences) Example: RAGStack test suite - LangChain dev - DSE

      • Test sequence (contains tests) Example: e2e_tests.langchain.test_compatibility_rag

        • Test cases (passed, failed, skipped, etc.) Example: Test rag: openai embedding | openai llm | cassandra | rag custom chain

LangSmith trace

LangSmith tracing currently requires logging into Testspace. We are working on a solution to make these traces publicly available.

Within individual test cases, LangSmith traces are also available to view.

A LangSmith trace displays the test’s entire LLM chain, including the input prompt, the generated response, token spend, and the metadata associated with the response.

For example, you can see that a test fails because the LLM lacks the context to answer the prompt and when was it released? because it doesn’t understand what it is. Providing the LLM more context would likely solve this problem.

query: ' I do not have enough context to rephrase the follow up question "and when was it released?" into a standalone question. Without knowing what "it" refers to in the original conversation, I cannot create a coherent standalone question. Please provide more context about what "MyFakeProductForTesting" refers to so I can understand what the follow up question is asking about.'

Metrics

Testspace provides a number of metrics to help you understand the health of your test suite.

  • Results Strength - measures the stability of results and infrastructure with the Pass Rate and Health Rate metrics.

  • Test Effectiveness - measures if tests are effectively capturing side-effects with the Effective Regression Rate metric. Measures the percentage of results with unique regressions, including invalid results.

  • Workflow Efficiency - measures if failures are being resolved quickly and efficiently with the Resolved Failures and Failure Resolution Time metrics.

For more, see the Testspace docs.

Results

The Results tab displays the results of the latest Test Sequence run.

Filter tracked test failures by New, Flaky, Consistent, Resolved, and Exempt.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com