Knowledge Graph

Use RAGStack and LLMGraphTransformer to extract knowledge triples and store them in a graph store backed by DataStax Astra DB.

This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only.

Prerequisites

  • An active DataStax Astra DB database

  • Python 3.11 (required for the Union and Self type hints used by the library)

  • An OpenAI API key

Environment

  1. Install dependencies:

    pip install "ragstack-ai-langchain[knowledge-graph]" python-dotenv
  2. Create a .env file and store the necessary credentials:

    OPENAI_API_KEY="sk-..."
    ASTRA_DB_DATABASE_ID="670d40c2-80f9-4cb0-8c74-d524dd6944d1"
    ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
    ASTRA_DB_KEYSPACE="default_keyspace"

If you’re running the code in a Colab notebook, use getpass to set the necessary environment variables instead of a .env file.
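A minimal Colab cell might look like the following sketch; it assumes the same variable names as the .env file above.

    import os
    from getpass import getpass

    # Prompt for each credential instead of reading a .env file.
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
    os.environ["ASTRA_DB_DATABASE_ID"] = input("Astra DB database ID: ")
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("Astra DB application token: ")
    os.environ["ASTRA_DB_KEYSPACE"] = input("Astra DB keyspace: ") or "default_keyspace"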

Create a graph store in Astra

  1. Import the necessary libraries and load the variables from your .env file.

    import dotenv
    import cassio
    from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI
    from langchain_core.documents import Document
    from ragstack_knowledge_graph.render import render_graph_documents
    from ragstack_knowledge_graph.traverse import Node
    from ragstack_knowledge_graph import extract_entities
    from operator import itemgetter
    from langchain_core.runnables import RunnableLambda, RunnablePassthrough
    from langchain_core.prompts import ChatPromptTemplate
    
    dotenv.load_dotenv()
  2. Initialize a connection to Astra DB with the CassIO library. With auto=True, CassIO reads the connection settings from the environment variables you set earlier; an alternative that passes credentials explicitly is shown after this list.

    cassio.init(auto=True)
  3. Create the graph store.

    graph_store = CassandraGraphStore()
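As an alternative to auto=True in step 2, you can pass the Astra DB credentials to cassio.init explicitly. This is a minimal sketch that assumes the same environment variable names as the .env file above.

    import os
    import cassio

    # Pass the Astra DB credentials explicitly instead of relying on auto-detection.
    cassio.init(
        database_id=os.environ["ASTRA_DB_DATABASE_ID"],
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),
    )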

Extract a knowledge graph from your data

  1. Extract a knowledge graph from sample text with LLMGraphTransformer, and print the extracted nodes and relationships.

    llm = ChatOpenAI(temperature=0, model_name="gpt-4")
    
    llm_transformer = LLMGraphTransformer(llm=llm)
    
    text = """
    Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
    She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
    Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of Paris.
    """
    documents = [Document(page_content=text)]
    graph_documents = llm_transformer.convert_to_graph_documents(documents)
    print(f"Nodes:{graph_documents[0].nodes}")
    print(f"Relationships:{graph_documents[0].relationships}")
  2. Render the extracted graph with GraphViz and save the graph documents to the Astra DB graph store.

    render_graph_documents(graph_documents)
    graph_store.add_graph_documents(graph_documents)

Query the graph store

  1. Query the graph store. The as_runnable method takes configuration for how the subgraph is extracted, such as the number of traversal steps, and returns a LangChain Runnable. The Runnable can be invoked on a node or a sequence of nodes to traverse from those starting points; a sketch with multiple starting nodes follows this list.

    graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))
  2. To help you get started, the library also provides a Runnable that extracts the starting entities from a question.

    extract_entities(llm).invoke({"question": "Who is Marie Curie?"})
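As mentioned in step 1, the traversal Runnable can also be invoked on a sequence of starting nodes. This is a minimal sketch that reuses the graph_store created earlier.

    # Traverse from several starting points at once by passing a sequence of nodes.
    traversal = graph_store.as_runnable(steps=2)
    traversal.invoke([Node("Marie Curie", "Person"), Node("Pierre Curie", "Person")])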

Query Chain

Create a chain that does the following:

  1. Use the entity extraction Runnable from the library to determine the starting points.

  2. Retrieve the sub-knowledge graphs starting from those nodes.

  3. Create a context containing those knowledge triples.

  4. Apply the LLM to answer the question given the context.

    llm = ChatOpenAI(model_name="gpt-4")
    
    def _combine_relations(relations):
        return "\n".join(map(repr, relations))
    
    ANSWER_PROMPT = (
        "The original question is given below. "
        "This question has been used to retrieve information from a knowledge graph. "
        "The matching triples are shown below. "
        "Use the information in the triples to answer the original question.\n\n"
        "Original Question: {question}\n\n"
        "Knowledge Graph Triples:\n{context}\n\n"
        "Response:"
    )
    
    chain = (
        {"question": RunnablePassthrough()}
        | RunnablePassthrough.assign(entities=extract_entities(llm))
        | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
        | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
        | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
        | llm
    )

    response = chain.invoke("Who is Marie Curie?")
    print(f"Chain Response: {response}")
  5. Save the complete code (shown in the Complete code section below) as knowledge-graph-marie-curie.py, then run the chain end-to-end to answer a question using the retrieved knowledge.

    python3.11 knowledge-graph-marie-curie.py

    Result:

    Nodes: [Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Profession'), Node(id='Chemist', type='Profession'), Node(id='Radioactivity', type='Scientific concept'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Institution'), Node(id='Professor', type='Profession')]
    Relationships: [Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Scientific concept'), type='RESEARCHED'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Pierre Curie', type='Person'), type='MARRIED_TO'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Institution'), type='WORKED_AT'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Professor', type='Profession'), type='IS_A')]
    Chain Response: content='Marie Curie was a physicist, chemist, and professor. She was of French and Polish nationality. She was married to Pierre Curie and both of them won the Nobel Prize. She worked at the University of Paris and researched radioactivity.' response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 308, 'total_tokens': 358}, 'model_name': 'gpt-4', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-79178e44-64a0-4077-8b90-f21fd004f745-0'
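The chain returns a LangChain chat message object. To print only the answer text rather than the full message with metadata shown above, use its content attribute:

    print(f"Chain Response: {response.content}")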

Complete code

Python
import dotenv
import cassio
from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from ragstack_knowledge_graph.render import render_graph_documents
from ragstack_knowledge_graph.traverse import Node
from ragstack_knowledge_graph import extract_entities
from operator import itemgetter
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

# Load environment variables
dotenv.load_dotenv()

# Initialize cassio
cassio.init(auto=True)

# Create graph store
graph_store = CassandraGraphStore()

# Initialize LLM for graph transformer
llm = ChatOpenAI(temperature=0, model_name="gpt-4")
llm_transformer = LLMGraphTransformer(llm=llm)

# Sample text
text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]

# Convert documents to graph documents
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes: {graph_documents[0].nodes}")
print(f"Relationships: {graph_documents[0].relationships}")

# Render the extracted graph to GraphViz
render_graph_documents(graph_documents)

# Save the extracted graph documents to the AstraDB / Cassandra Graph Store
graph_store.add_graph_documents(graph_documents)

# Query the graph
graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))

# Example showing extracted entities (nodes)
extract_entities(llm).invoke({"question": "Who is Marie Curie?"})

# Define the answer prompt
ANSWER_PROMPT = (
    "The original question is given below. "
    "This question has been used to retrieve information from a knowledge graph. "
    "The matching triples are shown below. "
    "Use the information in the triples to answer the original question.\n\n"
    "Original Question: {question}\n\n"
    "Knowledge Graph Triples:\n{context}\n\n"
    "Response:"
)

# Combine relations function
def _combine_relations(relations):
    return "\n".join(map(repr, relations))

# Create the chain for querying
chain = (
    {"question": RunnablePassthrough()}
    | RunnablePassthrough.assign(entities=extract_entities(llm))
    | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
    | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
    | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
    | llm
)

# Invoke the chain
response = chain.invoke("Who is Marie Curie?")
print(f"Chain Response: {response}")

Use KnowledgeSchema instead of LLMGraphTransformer

As an alternative to building your graph with LLMGraphTransformer, the knowledge graph library includes a knowledge extraction system called KnowledgeSchema that lets you define your nodes and relationships in a YAML file and load it to guide the graph extraction process.

Example usage

  1. Copy the sample marie_curie_schema.yaml file from the RAGStack repo. This example assumes you copy it to the same directory as your script.

  2. Create a new Python script and add the following code. In this example, KnowledgeSchema is initialized from a YAML file, the KnowledgeSchemaExtractor uses an LLM to extract knowledge from the source according to the YAML-defined schema, and the extracted nodes and relationships are printed.

    extraction-test.py
    from os import path
    
    from langchain_core.documents import Document
    from langchain_core.language_models import BaseChatModel
    from langchain_openai import ChatOpenAI
    
    from ragstack_knowledge_graph.extraction import (
        KnowledgeSchema,
        KnowledgeSchemaExtractor,
    )
    
    OPENAI_API_KEY = "sk-..."
    
    def extractor(llm: BaseChatModel) -> KnowledgeSchemaExtractor:
        schema = KnowledgeSchema.from_file(
            path.join(path.dirname(__file__), "./marie_curie_schema.yaml")
        )
        return KnowledgeSchemaExtractor(
            llm=llm,
            schema=schema,
        )
    
    MARIE_CURIE_SOURCE = """
    Marie Curie, was a Polish and naturalised-French physicist and chemist who
    conducted pioneering research on radioactivity. She was the first woman to win a
    Nobel Prize, the first person to win a Nobel Prize twice, and the only person to
    win a Nobel Prize in two scientific fields. Her husband, Pierre Curie, was a
    co-winner of her first Nobel Prize, making them the first-ever married couple to
    win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
    She was, in 1906, the first woman to become a professor at the University of
    Paris.
    """
    
    def test_extraction(extractor: KnowledgeSchemaExtractor):
        results = extractor.extract([Document(page_content=MARIE_CURIE_SOURCE)])
    
        print("Extracted Nodes:")
        for node in results[0].nodes:
            print(f"Node ID: {node.id}, Type: {node.type}")
    
        print("\nExtracted Relationships:")
        for relationship in results[0].relationships:
            print(f"Relationship: {relationship.source.id} -> {relationship.target.id}, Type: {relationship.type}")
    
    if __name__ == "__main__":
        llm = ChatOpenAI(temperature=0, model_name="gpt-4", openai_api_key=OPENAI_API_KEY)
        extractor_instance = extractor(llm)
        test_extraction(extractor_instance)
  3. Run the script with python3 extraction-test.py and view the results.

    Extracted Nodes:
    Node ID: Marie Curie, Type: Person
    Node ID: Polish, Type: Nationality
    Node ID: French, Type: Nationality
    Node ID: Physicist, Type: Occupation
    Node ID: Chemist, Type: Occupation
    Node ID: Nobel Prize, Type: Award
    Node ID: Pierre Curie, Type: Person
    Node ID: University Of Paris, Type: Institution
    Node ID: Professor, Type: Occupation
    
    Extracted Relationships:
    Relationship: Marie Curie -> Polish, Type: HAS_NATIONALITY
    Relationship: Marie Curie -> French, Type: HAS_NATIONALITY
    Relationship: Marie Curie -> Physicist, Type: HAS_OCCUPATION
    Relationship: Marie Curie -> Chemist, Type: HAS_OCCUPATION
    Relationship: Marie Curie -> Nobel Prize, Type: RECEIVED
    Relationship: Pierre Curie -> Nobel Prize, Type: RECEIVED
    Relationship: Marie Curie -> Pierre Curie, Type: MARRIED_TO
    Relationship: Pierre Curie -> Marie Curie, Type: MARRIED_TO
    Relationship: Marie Curie -> University Of Paris, Type: WORKED_AT
    Relationship: Marie Curie -> Professor, Type: HAS_OCCUPATION
