Migrate to RAGStack

Migrating existing LangChain or LlamaIndex applications to RAGStack is easy - just change your requirements.txt or pyproject.toml file to use ragstack-ai.

RAGStack contains the below packages as of version 0.7.0. When RAGStack is installed, these packages are replaced with the stable, tested ragstack-ai versions listed here. For the latest list, see Changelog.

Library Version

astrapy

>=0.7.0,<0.8.0

cassio

>=0.1.3,<0.2.0

langchain

==0.1.4

llama-index

==0.9.48

unstructured

>=0.10,<0.11

Example LangChain migration

Here is a simple LangChain application that loads a dataset from HuggingFace and embeds the document objects in Astra DB Serverless.

langchain-migration.py
import os
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

load_dotenv()

ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
OPEN_AI_API_KEY = os.environ.get("OPENAI_API_KEY")
ASTRA_DB_COLLECTION = os.environ.get("ASTRA_DB_COLLECTION")

embedding = OpenAIEmbeddings()
vstore = AstraDBVectorStore(
    embedding=embedding,
    collection_name=ASTRA_DB_COLLECTION,
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)
print(vstore.astra_db.collection(ASTRA_DB_COLLECTION).find())

philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])

docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)

inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")

print(vstore.astra_db.collection(ASTRA_DB_COLLECTION).find())

vstore.clear()
  1. This application requires installation of the following packages:

    pip install langchain datasets openai astrapy tiktoken python-dotenv
  2. You decide you want to use RAGStack’s pinned, tested version of LangChain (langchain-0.0.349) instead of the latest version of LangChain (langchain-0.0.350).

  3. Install the ragstack-ai package with the --upgrade-strategy="only-if-needed" option. This ensures pip will not upgrade any packages that are already installed, unless required by the ragstack-ai package.

    pip install ragstack-ai --upgrade-strategy="only-if-needed"

    If you’re having trouble with your migration, try uninstalling your current LangChain package and reinstalling the ragstack-ai package.

    pip uninstall langchain
    Successfully uninstalled langchain-0.0.350
    pip install ragstack-ai --upgrade-strategy="only-if-needed"
  4. Once the ragstack-ai package is installed, run pip list to see your current list of packages. Notice that the installed version of langchain is 0.0.349.

    Pip list
    Package             Version
    ------------------- ------------
    aiohttp             3.9.1
    aiosignal           1.3.1
    annotated-types     0.6.0
    anyio               4.1.0
    astrapy             0.6.2
    attrs               23.1.0
    backoff             2.2.1
    beautifulsoup4      4.12.2
    cassandra-driver    3.28.0
    cassio              0.1.3
    certifi             2023.11.17
    chardet             5.2.0
    charset-normalizer  3.3.2
    click               8.1.7
    dataclasses-json    0.6.3
    datasets            2.15.0
    Deprecated          1.2.14
    dill                0.3.7
    distro              1.8.0
    emoji               2.9.0
    filelock            3.13.1
    filetype            1.2.0
    frozenlist          1.4.0
    fsspec              2023.10.0
    geomet              0.2.1.post1
    greenlet            3.0.2
    h11                 0.14.0
    h2                  4.1.0
    hpack               4.0.0
    httpcore            1.0.2
    httpx               0.25.2
    huggingface-hub     0.19.4
    hyperframe          6.0.1
    idna                3.6
    joblib              1.3.2
    jsonpatch           1.33
    jsonpointer         2.4
    langchain           0.0.349
    langchain-community 0.0.1
    langchain-core      0.0.13
    langdetect          1.0.9
    langsmith           0.0.69
    llama-index         0.9.14
    lxml                4.9.3
    marshmallow         3.20.1
    multidict           6.0.4
    multiprocess        0.70.15
    mypy-extensions     1.0.0
    nest-asyncio        1.5.8
    nltk                3.8.1
    numpy               1.26.2
    openai              1.3.8
    packaging           23.2
    pandas              2.1.4
    pip                 23.2.1
    pyarrow             14.0.1
    pyarrow-hotfix      0.6
    pydantic            2.5.2
    pydantic_core       2.14.5
    python-dateutil     2.8.2
    python-dotenv       1.0.0
    python-iso639       2023.12.11
    python-magic        0.4.27
    pytz                2023.3.post1
    PyYAML              6.0.1
    ragstack-ai         0.3.1
    rapidfuzz           3.5.2
    regex               2023.10.3
    requests            2.31.0
    setuptools          65.5.0
    six                 1.16.0
    sniffio             1.3.0
    soupsieve           2.5
    SQLAlchemy          2.0.23
    tabulate            0.9.0
    tenacity            8.2.3
    tiktoken            0.5.2
    tqdm                4.66.1
    typing_extensions   4.9.0
    typing-inspect      0.9.0
    tzdata              2023.3
    unstructured        0.10.30
    urllib3             2.1.0
    wrapt               1.16.0
    xxhash              3.4.1
    yarl                1.9.4
  5. Run your application…​

    python3 langchain-migration.py

…​and you should see the same output as before, with no changes to your code required!

Example LlamaIndex migration

Here is an application that uses LlamaIndex to index a set of documents.

llama-migration.py
import os
from dotenv import load_dotenv
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.vector_stores import AstraDBVectorStore
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext

load_dotenv()

ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")

# Download and load dataset
dataset = download_llama_dataset("PaulGrahamEssayDataset", "./data")
documents = SimpleDirectoryReader("./data/source_files").load_data()

# Display basic information about the documents
print(f"Total documents: {len(documents)}")
first_doc = documents[0]
print(f"First document, id: {first_doc.doc_id}")
print(f"First document, hash: {first_doc.hash}")
print(f"First document, text ({len(first_doc.text)} characters):\n{'=' * 20}\n{first_doc.text[:360]} ...")

# Setup AstraDB Vector Store
astra_db_store = AstraDBVectorStore(
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
    collection_name="test",
    embedding_dimension=1536
)

# Create Storage Context and Index
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query the index
def execute_query(query_string, mode="default", top_k=3, mmr_prefetch_factor=None):
    retriever = index.as_retriever(
        vector_store_query_mode=mode,
        similarity_top_k=top_k,
        vector_store_kwargs={"mmr_prefetch_factor": mmr_prefetch_factor} if mmr_prefetch_factor else {}
    )

    nodes_with_scores = retriever.retrieve(query_string)

    print(query_string)
    print(f"Found {len(nodes_with_scores)} nodes.")
    for idx, node_with_score in enumerate(nodes_with_scores):
        print(f"    [{idx}] score = {node_with_score.score}")
        print(f"        id    = {node_with_score.node.node_id}")
        print(f"        text  = {node_with_score.node.text[:90]} ...")

# Execute queries
query_string_1 = "Why did the author choose to work on AI?"
execute_query(query_string_1)
execute_query(query_string_1, mode="mmr", mmr_prefetch_factor=4)
  1. This application requires installation of the following packages:

    pip install llama-index
  2. Your application is tested and working at llama-index version 0.9.29. But then, LlamaIndex version 0.10.1 changes the module to split every integration into its own PyPi package. Oh no, your application no longer works!

  3. You decide to use RAGStack’s pinned, tested version of LlamaIndex (currently 0.9.34) instead of the latest version of LlamaIndex(0.10.1), to avoid this sudden change happening again in the future.

  4. Install the ragstack-ai package with the --upgrade-strategy="only-if-needed" option. This ensures pip will not upgrade any packages that are already installed, unless required by the ragstack-ai package.

    pip install ragstack-ai --upgrade-strategy="only-if-needed"

    If you’re having trouble with your migration, try uninstalling your current LlamaIndex packages and reinstalling the ragstack-ai package.

    pip uninstall llama-index-agent-openai llama-index-core llama-index-embeddings-openai llama-index-legacy llama-index-llms-openai llama-index-multi-modal-llms-openai llama-index-question-gen-openai llama-index-readers-file llama-index-program-openai
    Successfully uninstalled llama-index-0.9.29
    pip install ragstack-ai --upgrade-strategy="only-if-needed"
  5. Once the ragstack-ai package is installed, run pip list to see your current list of packages. Notice that the installed version of llama-index is 0.9.34.

    Pip list
    Package             Version
    ------------------- ------------
    aiohttp             3.9.1
    aiosignal           1.3.1
    annotated-types     0.6.0
    anyio               4.2.0
    astrapy             0.7.4
    attrs               23.2.0
    backoff             2.2.1
    beautifulsoup4      4.12.3
    cassandra-driver    3.29.0
    cassio              0.1.4
    certifi             2023.11.17
    chardet             5.2.0
    charset-normalizer  3.3.2
    click               8.1.7
    dataclasses-json    0.6.3
    Deprecated          1.2.14
    deprecation         2.1.0
    distro              1.9.0
    emoji               2.10.0
    filetype            1.2.0
    frozenlist          1.4.1
    fsspec              2023.12.2
    geomet              0.2.1.post1
    greenlet            3.0.3
    h11                 0.14.0
    h2                  4.1.0
    hpack               4.0.0
    httpcore            1.0.2
    httpx               0.25.2
    hyperframe          6.0.1
    idna                3.6
    joblib              1.3.2
    jsonpatch           1.33
    jsonpointer         2.4
    langchain           0.1.4
    langchain-community 0.0.16
    langchain-core      0.1.16
    langchain-openai    0.0.3
    langdetect          1.0.9
    langsmith           0.0.83
    llama-index         0.9.34
    lxml                5.1.0
    marshmallow         3.20.2
    multidict           6.0.4
    mypy-extensions     1.0.0
    nest-asyncio        1.6.0
    networkx            3.2.1
    nltk                3.8.1
    numpy               1.26.3
    openai              1.9.0
    packaging           23.2
    pandas              2.2.0
    pip                 23.3.1
    pydantic            2.5.3
    pydantic_core       2.14.6
    python-dateutil     2.8.2
    python-dotenv       1.0.1
    python-iso639       2024.1.2
    python-magic        0.4.27
    pytz                2023.3.post1
    PyYAML              6.0.1
    ragstack-ai         0.6.0
    rapidfuzz           3.6.1
    regex               2023.12.25
    requests            2.31.0
    setuptools          68.2.2
    six                 1.16.0
    sniffio             1.3.0
    soupsieve           2.5
    SQLAlchemy          2.0.25
    tabulate            0.9.0
    tenacity            8.2.3
    tiktoken            0.5.2
    toml                0.10.2
    tqdm                4.66.1
    typing_extensions   4.9.0
    typing-inspect      0.9.0
    tzdata              2023.4
    unstructured        0.10.30
    urllib3             2.1.0
    wrapt               1.16.0
    yarl                1.9.4
  6. Run your application…​

    python3 llama-migration.py

    …​and you should see the same output as before, with no changes to your code required!

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com