Knowledge Graph RAG

Knowledge Graph is a RAGStack library that provides graph-based representation and retrieval of information. It is designed to store and retrieve information in a way that is more efficient and accurate than vector-based similarity search over Document chunks.

See the Knowledge graph example code to get started using Knowledge Graph RAG.

This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only.

The ragstack-ai-knowledge-graph library

The ragstack-ai-knowledge-graph library contains functions for the extraction and traversal of entity-centric knowledge graphs.

To install the package, run:

pip install ragstack-ai-knowledge-graph

To install the library as an extra with the RAGStack Langchain package, run:

pip install "ragstack-ai-langchain[knowledge-graph]"

How is Knowledge Graph different from RAG?

Short answer: it isn’t. Knowledge graphs are a method of doing RAG, but with a different representation of the information.

RAG with similarity search creates a vector representation of information based on chunks of text. The query is compared to the question, and the most similar chunks are returned as the answer.

Knowledge graph RAG extracts a knowledge graph from information, and stores the graph representation in a vector or graph knowledge store.

Instead of a similarity search query, the graph store is traversed to extract a sub-graph of the knowledge graph’s edges and properties. For example, a query for "Marie Curie" returns a sub-graph of nodes representing her relationships, accomplishments, and other relevant information - the context.

You’re telling the graph store to "start with this node, and show me the relationships to a depth of 2 nodes outwards."

Here is how the Knowledge graph example code uses the Knowledge Graph library to extract a sub-graph around Marie Curie:

from ragstack_knowledge_graph.traverse import Node

graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))

Result:

{Marie Curie(Person) -> Chemist(Profession): HAS_PROFESSION,
 Marie Curie(Person) -> French(Nationality): HAS_NATIONALITY,
 Marie Curie(Person) -> Nobel Prize(Award): WON,
 Marie Curie(Person) -> Physicist(Profession): HAS_PROFESSION,
 Marie Curie(Person) -> Pierre Curie(Person): MARRIED_TO,
 Marie Curie(Person) -> Polish(Nationality): HAS_NATIONALITY,
 Marie Curie(Person) -> Professor(Profession): HAS_PROFESSION,
 Marie Curie(Person) -> Radioactivity(Scientific concept): RESEARCHED,
 Marie Curie(Person) -> Radioactivity(Scientific field): RESEARCHED_IN,
 Marie Curie(Person) -> University Of Paris(Organization): WORKED_AT,
 Pierre Curie(Person) -> Nobel Prize(Award): WON}

As with RAG, this sub-graph context is then dropped into the prompt to generate answers.

ANSWER_PROMPT = (
    "The original question is given below."
    "This question has been used to retrieve information from a knowledge graph."
    "The matching triples are shown below."
    "Use the information in the triples to answer the original question.\n\n"
    "Original Question: {question}\n\n"
    "Knowledge Graph Triples:\n{context}\n\n"
    "Response:"
)

chain = (
    { "question": RunnablePassthrough() }
       # extract_entities is provided by the Cassandra knowledge graph library
       # and extracts entitise as shown above.
    | RunnablePassthrough.assign(entities = extract_entities(llm))
    | RunnablePassthrough.assign(
        # graph_store.as_runnable() is provided by the CassandraGraphStore
        # and takes one or more entities and retrieves the relevant sub-graph(s).
        triples = itemgetter("entities") | graph_store.as_runnable())
    | RunnablePassthrough.assign(
        context = itemgetter("triples") | RunnableLambda(_combine_relations))
    | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
    | llm
)

Result:

Nodes: [Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Profession'), Node(id='Chemist', type='Profession'), Node(id='Radioactivity', type='Scientific concept'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Institution'), Node(id='Professor', type='Profession')]
Relationships: [Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Scientific concept'), type='RESEARCHED'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Pierre Curie', type='Person'), type='MARRIED_TO'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Institution'), type='WORKED_AT'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Professor', type='Profession'), type='IS_A')]
Chain Response: content='Marie Curie was a physicist, chemist, and professor. She was of French and Polish nationality. She was married to Pierre Curie and both of them won the Nobel Prize. She worked at the University of Paris and researched radioactivity.' response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 308, 'total_tokens': 358}, 'model_name': 'gpt-4', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-79178e44-64a0-4077-8b90-f21fd004f745-0'

Knowledge Graph, RAGStack, and Astra DB

Knowledge graph extracts graphs from documents using the LLMGraphTransformer library from Langchain, stores the graphs in a Cassandra database, and traverses the graph to extract sub-graphs for answering questions with a custom function.

A graph database or query language isn’t required to use the knowledge graph library.

Retrieving the sub-knowledge graph around a few nodes is a simple graph traversal, while graph DBs are designed for much more complex queries searching for paths with specific sequences of properties. Sub-knowledge graph traversal is often only to a depth of 2 or 3, since nodes which are farther removed become irrelevant to the question pretty quickly. This can be expressed as a few rounds of simple queries (one for each step) or a SQL join.

Eliminating the need for a separate graph database makes it easier to use knowledge graphs. Using Astra DB or Cassandra simplifies transactional writes to both the graph and other data stored in the same place, and likely scales better. Finally, using RAGStack ensures Langchain components like LLMGraphTransformer remain stable.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com