Knowledge Graph

Use RAGStack and LLMGraphTransformer to extract knowledge triples and store them in a graph store backed by DataStax Astra DB.

This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only.

Prerequisites

  • An active DataStax Astra DB database

  • Python 3.11 (required for the Union and Self type hints used by the library)

  • An OpenAI API key

Environment

  1. Install dependencies:

    pip install "ragstack-ai-langchain[knowledge-graph]" python-dotenv
  2. Create a .env file and store the necessary credentials:

    OPENAI_API_KEY="sk-..."
    ASTRA_DB_DATABASE_ID="670d40c2-80f9-4cb0-8c74-d524dd6944d1"
    ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
    ASTRA_DB_KEYSPACE="default_keyspace"

If you’re running the code in a Colab notebook, use getpass to set the necessary environment variables instead of a .env file.
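A minimal Colab cell might look like the following sketch; it assumes the same variable names as the .env file above.

    import os
    from getpass import getpass

    # Prompt for each credential instead of reading a .env file.
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
    os.environ["ASTRA_DB_DATABASE_ID"] = input("Astra DB database ID: ")
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("Astra DB application token: ")
    os.environ["ASTRA_DB_KEYSPACE"] = input("Astra DB keyspace: ") or "default_keyspace"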

Create a graph store in Astra

  1. Import the necessary libraries and load the variables from your .env file.

    import dotenv
    import cassio
    from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI
    from langchain_core.documents import Document
    from ragstack_knowledge_graph.render import render_graph_documents
    from ragstack_knowledge_graph.traverse import Node
    from ragstack_knowledge_graph import extract_entities
    from operator import itemgetter
    from langchain_core.runnables import RunnableLambda, RunnablePassthrough
    from langchain_core.prompts import ChatPromptTemplate
    
    dotenv.load_dotenv()
  2. Initialize a connection to Astra DB with the CassIO library. With auto=True, CassIO reads the connection settings from the environment variables you set earlier; an alternative that passes credentials explicitly is shown after this list.

    cassio.init(auto=True)
  3. Create the graph store.

    graph_store = CassandraGraphStore()
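As an alternative to auto=True in step 2, you can pass the Astra DB credentials to cassio.init explicitly. This is a minimal sketch that assumes the same environment variable names as the .env file above.

    import os
    import cassio

    # Pass the Astra DB credentials explicitly instead of relying on auto-detection.
    cassio.init(
        database_id=os.environ["ASTRA_DB_DATABASE_ID"],
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),
    )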

Extract a knowledge graph from your data

  1. Extract a knowledge graph from sample text with LLMGraphTransformer, and print the extracted nodes and relationships.

    llm = ChatOpenAI(temperature=0, model_name="gpt-4")
    
    llm_transformer = LLMGraphTransformer(llm=llm)
    
    text = """
    Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
    She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
    Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of Paris.
    """
    documents = [Document(page_content=text)]
    graph_documents = llm_transformer.convert_to_graph_documents(documents)
    print(f"Nodes:{graph_documents[0].nodes}")
    print(f"Relationships:{graph_documents[0].relationships}")
  2. Render the extracted graph with GraphViz and save the graph documents to the Astra DB graph store.

    render_graph_documents(graph_documents)
    graph_store.add_graph_documents(graph_documents)

Query the graph store

  1. Query the graph store. The as_runnable method takes configuration for how the subgraph is extracted, such as the number of traversal steps, and returns a LangChain Runnable. The Runnable can be invoked on a node or a sequence of nodes to traverse from those starting points; a sketch with multiple starting nodes follows this list.

    graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))
  2. To help you get started, the library also provides a Runnable that extracts the starting entities from a question.

    extract_entities(llm).invoke({"question": "Who is Marie Curie?"})
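As mentioned in step 1, the traversal Runnable can also be invoked on a sequence of starting nodes. This is a minimal sketch that reuses the graph_store created earlier.

    # Traverse from several starting points at once by passing a sequence of nodes.
    traversal = graph_store.as_runnable(steps=2)
    traversal.invoke([Node("Marie Curie", "Person"), Node("Pierre Curie", "Person")])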

Query Chain

Create a chain that does the following:

  1. Use the entity extraction Runnable from the library to determine the starting points.

  2. Retrieve the sub-knowledge graphs starting from those nodes.

  3. Create a context containing those knowledge triples.

  4. Apply the LLM to answer the question given the context.

    llm = ChatOpenAI(model_name="gpt-4")
    
    def _combine_relations(relations):
        return "\n".join(map(repr, relations))
    
    ANSWER_PROMPT = (
        "The original question is given below. "
        "This question has been used to retrieve information from a knowledge graph. "
        "The matching triples are shown below. "
        "Use the information in the triples to answer the original question.\n\n"
        "Original Question: {question}\n\n"
        "Knowledge Graph Triples:\n{context}\n\n"
        "Response:"
    )
    
    chain = (
        {"question": RunnablePassthrough()}
        | RunnablePassthrough.assign(entities=extract_entities(llm))
        | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
        | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
        | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
        | llm
    )

    response = chain.invoke("Who is Marie Curie?")
    print(f"Chain Response: {response}")
  5. Save the complete code (shown in the Complete code section below) as knowledge-graph-marie-curie.py, then run the chain end-to-end to answer a question using the retrieved knowledge.

    python3.11 knowledge-graph-marie-curie.py

    Result:

    Nodes: [Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Profession'), Node(id='Chemist', type='Profession'), Node(id='Radioactivity', type='Scientific concept'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Institution'), Node(id='Professor', type='Profession')]
    Relationships: [Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Scientific concept'), type='RESEARCHED'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Pierre Curie', type='Person'), type='MARRIED_TO'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Institution'), type='WORKED_AT'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Professor', type='Profession'), type='IS_A')]
    Chain Response: content='Marie Curie was a physicist, chemist, and professor. She was of French and Polish nationality. She was married to Pierre Curie and both of them won the Nobel Prize. She worked at the University of Paris and researched radioactivity.' response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 308, 'total_tokens': 358}, 'model_name': 'gpt-4', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-79178e44-64a0-4077-8b90-f21fd004f745-0'
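The chain returns a LangChain chat message object. To print only the answer text rather than the full message with metadata shown above, use its content attribute:

    print(f"Chain Response: {response.content}")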

Complete code

Python
import dotenv
import cassio
from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from ragstack_knowledge_graph.render import render_graph_documents
from ragstack_knowledge_graph.traverse import Node
from ragstack_knowledge_graph import extract_entities
from operator import itemgetter
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

# Load environment variables
dotenv.load_dotenv()

# Initialize cassio
cassio.init(auto=True)

# Create graph store
graph_store = CassandraGraphStore()

# Initialize LLM for graph transformer
llm = ChatOpenAI(temperature=0, model_name="gpt-4")
llm_transformer = LLMGraphTransformer(llm=llm)

# Sample text
text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]

# Convert documents to graph documents
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes: {graph_documents[0].nodes}")
print(f"Relationships: {graph_documents[0].relationships}")

# Render the extracted graph to GraphViz
render_graph_documents(graph_documents)

# Save the extracted graph documents to the AstraDB / Cassandra Graph Store
graph_store.add_graph_documents(graph_documents)

# Query the graph
graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))

# Example showing extracted entities (nodes)
extract_entities(llm).invoke({"question": "Who is Marie Curie?"})

# Define the answer prompt
ANSWER_PROMPT = (
    "The original question is given below. "
    "This question has been used to retrieve information from a knowledge graph. "
    "The matching triples are shown below. "
    "Use the information in the triples to answer the original question.\n\n"
    "Original Question: {question}\n\n"
    "Knowledge Graph Triples:\n{context}\n\n"
    "Response:"
)

# Combine relations function
def _combine_relations(relations):
    return "\n".join(map(repr, relations))

# Create the chain for querying
chain = (
    {"question": RunnablePassthrough()}
    | RunnablePassthrough.assign(entities=extract_entities(llm))
    | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
    | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
    | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
    | llm
)

# Invoke the chain
response = chain.invoke("Who is Marie Curie?")
print(f"Chain Response: {response}")

Use KnowledgeSchema instead of LLMGraphTransformer

As an alternative to building your graph with LLMGraphTransformer, the knowledge graph library includes a knowledge extraction system called KnowledgeSchema that lets you define your nodes and relationships in a YAML file and load it to guide the graph extraction process.

Example usage

  1. Copy the sample marie_curie_schema.yaml file from the RAGStack repo. This example assumes you copy it to the same directory as your script.

  2. Create a new Python script and add the following code. In this example, KnowledgeSchema is initialized from a YAML file, the KnowledgeSchemaExtractor uses an LLM to extract knowledge from the source according to the YAML-defined schema, and the extracted nodes and relationships are printed.

    extraction-test.py
    from os import path
    
    from langchain_core.documents import Document
    from langchain_core.language_models import BaseChatModel
    from langchain_openai import ChatOpenAI
    
    from ragstack_knowledge_graph.extraction import (
        KnowledgeSchema,
        KnowledgeSchemaExtractor,
    )
    
    OPENAI_API_KEY = "sk-..."
    
    def extractor(llm: BaseChatModel) -> KnowledgeSchemaExtractor:
        schema = KnowledgeSchema.from_file(
            path.join(path.dirname(__file__), "./marie_curie_schema.yaml")
        )
        return KnowledgeSchemaExtractor(
            llm=llm,
            schema=schema,
        )
    
    MARIE_CURIE_SOURCE = """
    Marie Curie, was a Polish and naturalised-French physicist and chemist who
    conducted pioneering research on radioactivity. She was the first woman to win a
    Nobel Prize, the first person to win a Nobel Prize twice, and the only person to
    win a Nobel Prize in two scientific fields. Her husband, Pierre Curie, was a
    co-winner of her first Nobel Prize, making them the first-ever married couple to
    win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of
    Paris.
    """
    
    def test_extraction(extractor: KnowledgeSchemaExtractor):
        results = extractor.extract([Document(page_content=MARIE_CURIE_SOURCE)])
    
        print("Extracted Nodes:")
        for node in results[0].nodes:
            print(f"Node ID: {node.id}, Type: {node.type}")
    
        print("\nExtracted Relationships:")
        for relationship in results[0].relationships:
            print(f"Relationship: {relationship.source.id} -> {relationship.target.id}, Type: {relationship.type}")
    
    if __name__ == "__main__":
        llm = ChatOpenAI(temperature=0, model_name="gpt-4", openai_api_key=OPENAI_API_KEY)
        extractor_instance = extractor(llm)
        test_extraction(extractor_instance)
  3. Run the script with python3 extraction-test.py and view the results.

    Extracted Nodes:
    Node ID: Marie Curie, Type: Person
    Node ID: Polish, Type: Nationality
    Node ID: French, Type: Nationality
    Node ID: Physicist, Type: Occupation
    Node ID: Chemist, Type: Occupation
    Node ID: Nobel Prize, Type: Award
    Node ID: Pierre Curie, Type: Person
    Node ID: University Of Paris, Type: Institution
    Node ID: Professor, Type: Occupation
    
    Extracted Relationships:
    Relationship: Marie Curie -> Polish, Type: HAS_NATIONALITY
    Relationship: Marie Curie -> French, Type: HAS_NATIONALITY
    Relationship: Marie Curie -> Physicist, Type: HAS_OCCUPATION
    Relationship: Marie Curie -> Chemist, Type: HAS_OCCUPATION
    Relationship: Marie Curie -> Nobel Prize, Type: RECEIVED
    Relationship: Pierre Curie -> Nobel Prize, Type: RECEIVED
    Relationship: Marie Curie -> Pierre Curie, Type: MARRIED_TO
    Relationship: Pierre Curie -> Marie Curie, Type: MARRIED_TO
    Relationship: Marie Curie -> University Of Paris, Type: WORKED_AT
    Relationship: Marie Curie -> Professor, Type: HAS_OCCUPATION
