Knowledge Graph
Use RAGStack, `LLMGraphTransformer`, and DataStax AstraDB to extract knowledge triples and store them in an AstraDB-backed graph store.
This feature is currently under development and has not been fully tested. It is not supported for use in production environments; use it in testing and development environments only.
Prerequisites
- An active DataStax AstraDB database
- Python 3.11 (to use `Union` and `Self` type hints)
- An OpenAI API key
Environment
- Install dependencies:

```
pip install "ragstack-ai-langchain[knowledge-graph]" python-dotenv
```
- Create a `.env` file and store the necessary credentials:

```
OPENAI_API_KEY="sk-..."
ASTRA_DB_DATABASE_ID="670d40c2-80f9-4cb0-8c74-d524dd6944d1"
ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
ASTRA_DB_KEYSPACE="default_keyspace"
```

If you're running the notebook in Colab, run a cell that uses `getpass` to set the necessary environment variables.
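A minimal sketch of such a cell, assuming the same variable names as the `.env` file above:

```python
import getpass
import os

# Prompt for each credential instead of reading a .env file.
for var in (
    "OPENAI_API_KEY",
    "ASTRA_DB_DATABASE_ID",
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_KEYSPACE",
):
    if var not in os.environ:
        os.environ[var] = getpass.getpass(f"{var}: ")
```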
Create a graph store in Astra
- Import the necessary libraries and load the variables from your `.env` file.

```python
import dotenv
import cassio
from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from ragstack_knowledge_graph.render import render_graph_documents
from ragstack_knowledge_graph.traverse import Node
from ragstack_knowledge_graph import extract_entities
from operator import itemgetter
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

dotenv.load_dotenv()
```
- Initialize a connection to AstraDB with the CassIO library. An explicit alternative to auto-detection is shown after these steps.

```python
cassio.init(auto=True)
```
- Create the graph store. `CassandraGraphStore` was imported in the first step.

```python
graph_store = CassandraGraphStore()
```
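If you prefer not to rely on `auto=True`, CassIO can also be initialized with explicit parameters. A minimal sketch, assuming the `.env` variables from the Environment section are loaded:

```python
import os

import cassio

# Explicit initialization with the same credentials that auto=True
# resolves from the environment.
cassio.init(
    database_id=os.environ["ASTRA_DB_DATABASE_ID"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),
)
```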
Extract a knowledge graph from your data
- Extract a knowledge graph from your text with `LLMGraphTransformer`.

```python
llm = ChatOpenAI(temperature=0, model_name="gpt-4")
llm_transformer = LLMGraphTransformer(llm=llm)

text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes: {graph_documents[0].nodes}")
print(f"Relationships: {graph_documents[0].relationships}")
```
- Render the extracted graph with GraphViz and save the graph documents to the AstraDB graph store.

```python
render_graph_documents(graph_documents)
graph_store.add_graph_documents(graph_documents)
```
Query the graph store
- Query the `GraphStore`. The `as_runnable` method takes some configuration for how to extract the subgraph and returns a LangChain `Runnable`. This `Runnable` can be invoked on a node or a sequence of nodes to traverse from those starting points.

```python
graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))
```
- To get started, the library also provides a `Runnable` for extracting the starting entities from a question.

```python
extract_entities(llm).invoke({"question": "Who is Marie Curie?"})
```
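The two pieces compose: `as_runnable` accepts a node or a sequence of nodes, so the entities returned by `extract_entities` can be used directly as traversal starting points. A minimal sketch, assuming the `llm` and `graph_store` from the previous steps (the chain in the next section wires the same steps together with LCEL):

```python
# Extract starting entities from a question, then traverse the graph
# from those nodes.
starting_nodes = extract_entities(llm).invoke({"question": "Who is Marie Curie?"})
triples = graph_store.as_runnable(steps=2).invoke(starting_nodes)
for triple in triples:
    print(repr(triple))
```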
Query chain
Create a chain that does the following:

- Use the entity extraction `Runnable` from the library to determine the starting points.
- Retrieve the sub-knowledge graphs starting from those nodes.
- Create a context containing those knowledge triples.
- Apply the LLM to answer the question given the context.

```python
llm = ChatOpenAI(model_name="gpt-4")

def _combine_relations(relations):
    return "\n".join(map(repr, relations))

ANSWER_PROMPT = (
    "The original question is given below. "
    "This question has been used to retrieve information from a knowledge graph. "
    "The matching triples are shown below. "
    "Use the information in the triples to answer the original question.\n\n"
    "Original Question: {question}\n\n"
    "Knowledge Graph Triples:\n{context}\n\n"
    "Response:"
)

chain = (
    {"question": RunnablePassthrough()}
    | RunnablePassthrough.assign(entities=extract_entities(llm))
    | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
    | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
    | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
    | llm
)

response = chain.invoke("Who is Marie Curie?")
print(f"Chain Response: {response}")
```
- Run the chain end-to-end to answer a question using the retrieved knowledge:

```
python3.11 knowledge-graph-marie-curie.py
```
Result:

```
Nodes: [Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Profession'), Node(id='Chemist', type='Profession'), Node(id='Radioactivity', type='Scientific concept'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Institution'), Node(id='Professor', type='Profession')]
Relationships: [Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Scientific concept'), type='RESEARCHED'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Pierre Curie', type='Person'), type='MARRIED_TO'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Institution'), type='WORKED_AT'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Professor', type='Profession'), type='IS_A')]
Chain Response: content='Marie Curie was a physicist, chemist, and professor. She was of French and Polish nationality. She was married to Pierre Curie and both of them won the Nobel Prize. She worked at the University of Paris and researched radioactivity.' response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 308, 'total_tokens': 358}, 'model_name': 'gpt-4', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-79178e44-64a0-4077-8b90-f21fd004f745-0'
```
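The chain returns a LangChain chat message, which is why the `content='...'` and `response_metadata={...}` fields appear in the output above. To print only the answer text, read the message's `content` attribute:

```python
# Print just the generated answer instead of the full message object.
print(f"Chain Response: {response.content}")
```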
Complete code
```python
import dotenv
import cassio
from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from ragstack_knowledge_graph.render import render_graph_documents
from ragstack_knowledge_graph.traverse import Node
from ragstack_knowledge_graph import extract_entities
from operator import itemgetter
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
# Load environment variables
dotenv.load_dotenv()
# Initialize cassio
cassio.init(auto=True)
# Create graph store
graph_store = CassandraGraphStore()
# Initialize LLM for graph transformer
llm = ChatOpenAI(temperature=0, model_name="gpt-4")
llm_transformer = LLMGraphTransformer(llm=llm)
# Sample text
text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
# Convert documents to graph documents
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes: {graph_documents[0].nodes}")
print(f"Relationships: {graph_documents[0].relationships}")
# Render the extracted graph to GraphViz
render_graph_documents(graph_documents)
# Save the extracted graph documents to the AstraDB / Cassandra Graph Store
graph_store.add_graph_documents(graph_documents)
# Query the graph
graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))
# Example showing extracted entities (nodes)
extract_entities(llm).invoke({"question": "Who is Marie Curie?"})
# Define the answer prompt
ANSWER_PROMPT = (
    "The original question is given below. "
    "This question has been used to retrieve information from a knowledge graph. "
    "The matching triples are shown below. "
    "Use the information in the triples to answer the original question.\n\n"
    "Original Question: {question}\n\n"
    "Knowledge Graph Triples:\n{context}\n\n"
    "Response:"
)
# Combine relations function
def _combine_relations(relations):
    return "\n".join(map(repr, relations))
# Create the chain for querying
chain = (
    {"question": RunnablePassthrough()}
    | RunnablePassthrough.assign(entities=extract_entities(llm))
    | RunnablePassthrough.assign(triples=itemgetter("entities") | graph_store.as_runnable())
    | RunnablePassthrough.assign(context=itemgetter("triples") | RunnableLambda(_combine_relations))
    | ChatPromptTemplate.from_messages([ANSWER_PROMPT])
    | llm
)
# Invoke the chain
response = chain.invoke("Who is Marie Curie?")
print(f"Chain Response: {response}")
```
Use KnowledgeSchema instead of LLMGraphTransformer
Instead of using `LLMGraphTransformer` to build your graph, the knowledge graph library also includes a knowledge extraction system called `KnowledgeSchema` that lets you define your nodes and relationships in a YAML file and load it to guide the graph extraction process.
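As an illustration only, a schema file of this kind lists the allowed node types and the relationships between them. The field names below are assumptions, not the library's documented format; treat the sample `marie_curie_schema.yaml` in the RAGStack repo as the authoritative reference.

```yaml
# Hypothetical sketch of a knowledge schema. See the sample file in the
# RAGStack repo for the actual field names and structure.
nodes:
  - type: Person
    description: An individual human being.
  - type: Award
    description: A prize or honor.
relationships:
  - edge_type: RECEIVED
    source_types: [Person]
    target_types: [Award]
    description: The person received the award.
```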
Example usage
- Copy the sample `marie_curie_schema.yaml` file from the RAGStack repo. This example assumes you copy it to the same directory as your script.
- Create a new Python script and add the following code. In this example, `KnowledgeSchema` is initialized from a YAML file, the `KnowledgeSchemaExtractor` uses an LLM to extract knowledge from the source according to the YAML-defined schema, and the extracted nodes and relationships are printed.

extraction-test.py

```python
from os import path

from langchain_community.graphs.graph_document import Node, Relationship
from langchain_core.documents import Document
from langchain_core.language_models import BaseChatModel
from langchain_openai import ChatOpenAI

from ragstack_knowledge_graph.extraction import (
    KnowledgeSchema,
    KnowledgeSchemaExtractor,
)

OPENAI_API_KEY = "sk-..."


def extractor(llm: BaseChatModel) -> KnowledgeSchemaExtractor:
    schema = KnowledgeSchema.from_file(
        path.join(path.dirname(__file__), "./marie_curie_schema.yaml")
    )
    return KnowledgeSchemaExtractor(
        llm=llm,
        schema=schema,
    )


MARIE_CURIE_SOURCE = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""


def test_extraction(extractor: KnowledgeSchemaExtractor):
    results = extractor.extract([Document(page_content=MARIE_CURIE_SOURCE)])
    print("Extracted Nodes:")
    for node in results[0].nodes:
        print(f"Node ID: {node.id}, Type: {node.type}")
    print("\nExtracted Relationships:")
    for relationship in results[0].relationships:
        print(f"Relationship: {relationship.source.id} -> {relationship.target.id}, Type: {relationship.type}")


if __name__ == "__main__":
    llm = ChatOpenAI(temperature=0, model_name="gpt-4", openai_api_key=OPENAI_API_KEY)
    extractor_instance = extractor(llm)
    test_extraction(extractor_instance)
```
- Run the script with `python3 extraction-test.py` and view the results.

```
Extracted Nodes:
Node ID: Marie Curie, Type: Person
Node ID: Polish, Type: Nationality
Node ID: French, Type: Nationality
Node ID: Physicist, Type: Occupation
Node ID: Chemist, Type: Occupation
Node ID: Nobel Prize, Type: Award
Node ID: Pierre Curie, Type: Person
Node ID: University Of Paris, Type: Institution
Node ID: Professor, Type: Occupation

Extracted Relationships:
Relationship: Marie Curie -> Polish, Type: HAS_NATIONALITY
Relationship: Marie Curie -> French, Type: HAS_NATIONALITY
Relationship: Marie Curie -> Physicist, Type: HAS_OCCUPATION
Relationship: Marie Curie -> Chemist, Type: HAS_OCCUPATION
Relationship: Marie Curie -> Nobel Prize, Type: RECEIVED
Relationship: Pierre Curie -> Nobel Prize, Type: RECEIVED
Relationship: Marie Curie -> Pierre Curie, Type: MARRIED_TO
Relationship: Pierre Curie -> Marie Curie, Type: MARRIED_TO
Relationship: Marie Curie -> University Of Paris, Type: WORKED_AT
Relationship: Marie Curie -> Professor, Type: HAS_OCCUPATION
```
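The extractor's results have the same `.nodes` and `.relationships` structure as the graph documents produced by `LLMGraphTransformer`, so they can presumably be persisted the same way. A sketch, assuming the `CassandraGraphStore` setup from earlier in this topic (including `cassio.init`):

```python
# Store the schema-guided extraction results in the same graph store
# used for the LLMGraphTransformer flow.
from ragstack_knowledge_graph.cassandra_graph_store import CassandraGraphStore

graph_store = CassandraGraphStore()
results = extractor_instance.extract([Document(page_content=MARIE_CURIE_SOURCE)])
graph_store.add_graph_documents(results)
```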