Kùzu
Kùzu is an embeddable, scalable, extremely fast graph database. It is permissively licensed under the MIT license, and its source code is available on GitHub.
Key characteristics of Kùzu:
- Performance and scalability: Implements modern, state-of-the-art join algorithms for graphs.
- Usability: Very easy to set up and get started with, as there are no servers (embedded architecture).
- Interoperability: Can conveniently scan and copy data from external columnar formats, CSV, JSON, and relational databases.
- Structured property graph model: Implements the property graph model, with added structure.
- Cypher support: Allows convenient querying of the graph in Cypher, a declarative query language.
Get started with Kùzu by visiting its documentation.
Setup
Kùzu is an embedded database (it runs in-process), so there is no server to manage. Install the following dependencies to get started:
pip install -U langchain-kuzu langchain-openai langchain-experimental
This installs Kùzu along with its LangChain integration, as well as OpenAI's Python package so that we can use OpenAI's LLMs. If you want to use a different LLM provider, you can install its corresponding LangChain integration package instead.
First, here is how to create a Kùzu database on your local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
Create a KuzuGraph
Kùzu's integration with LangChain makes it convenient to create and update graphs from unstructured text, and also to query graphs via a Text2Cypher pipeline that leverages LangChain's LLM chains. First, we create a KuzuGraph object by passing the database object created above to its constructor.
from langchain_kuzu.graphs.kuzu_graph import KuzuGraph
graph = KuzuGraph(db, allow_dangerous_requests=True)
Say we want to transform the following text into a graph:
text = "Tim Cook is the CEO of Apple. Apple has its headquarters in California."
We will use an LLM to extract the nodes and relationships from the text via the LLMGraphTransformer. To make the graph more useful, we define the following schema, so that the LLM will only extract nodes and relationships that match it.
# Define schema
allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
("Person", "IS_CEO_OF", "Company"),
("Company", "HAS_HEADQUARTERS_IN", "Location"),
]
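To see how such a schema constrains extraction, here is a sketch of the idea in plain Python: only triples whose (source type, relation, target type) appear in the allow-list are kept. The actual filtering happens inside LLMGraphTransformer; the `is_allowed` helper below is purely illustrative.

```python
# Hypothetical illustration (not the library's internals) of how an
# allowed-relationships schema constrains which triples are kept.
allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]

def is_allowed(src_type: str, rel: str, dst_type: str) -> bool:
    # A triple passes only if both endpoint types are allowed node labels
    # and the full (source, relation, target) pattern is in the allow-list.
    return (
        src_type in allowed_nodes
        and dst_type in allowed_nodes
        and (src_type, rel, dst_type) in allowed_relationships
    )

print(is_allowed("Person", "IS_CEO_OF", "Company"))  # True
print(is_allowed("Person", "WORKS_AT", "Company"))   # False (not in schema)
```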
The LLMGraphTransformer class provides a convenient way to convert the text into a list of graph documents.
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
# Define the LLMGraphTransformer
llm_transformer = LLMGraphTransformer(
llm=ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=OPENAI_API_KEY), # noqa: F821
allowed_nodes=allowed_nodes,
allowed_relationships=allowed_relationships,
)
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
graph_documents[:2]
[GraphDocument(nodes=[Node(id='Tim Cook', type='Person', properties={}), Node(id='Apple', type='Company', properties={}), Node(id='California', type='Location', properties={})], relationships=[Relationship(source=Node(id='Tim Cook', type='Person', properties={}), target=Node(id='Apple', type='Company', properties={}), type='IS_CEO_OF', properties={}), Relationship(source=Node(id='Apple', type='Company', properties={}), target=Node(id='California', type='Location', properties={}), type='HAS_HEADQUARTERS_IN', properties={})], source=Document(metadata={}, page_content='Tim Cook is the CEO of Apple. Apple has its headquarters in California.'))]
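The relationships in the GraphDocument above encode a list of (source, relation, target) triples. A plain-dict mirror of that structure (hypothetical, not the library's classes) makes the shape explicit:

```python
# Plain-dict stand-in for the relationships extracted above, to show the
# triples a GraphDocument encodes. Not the langchain_experimental classes.
relationships = [
    {"source": "Tim Cook", "type": "IS_CEO_OF", "target": "Apple"},
    {"source": "Apple", "type": "HAS_HEADQUARTERS_IN", "target": "California"},
]
triples = [(r["source"], r["type"], r["target"]) for r in relationships]
print(triples)
# [('Tim Cook', 'IS_CEO_OF', 'Apple'), ('Apple', 'HAS_HEADQUARTERS_IN', 'California')]
```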
We can then call the add_graph_documents method on the KuzuGraph object defined above to ingest the graph documents into the Kùzu database. The include_source argument is set to True, so that we also create relationships between each entity node and its source document.
# Add the graph document to the graph
graph.add_graph_documents(
graph_documents,
include_source=True,
)
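Conceptually, include_source=True adds a Chunk node holding the source text, plus a MENTIONS edge from that chunk to each extracted entity (Kùzu stores these as per-type relationship tables such as MENTIONS_Chunk_Person, as the schema section below shows). A toy sketch in plain dicts, not the library's API:

```python
# Illustrative sketch (plain Python, not langchain-kuzu internals) of the
# extra structure that include_source=True creates.
chunk = {
    "label": "Chunk",
    "text": "Tim Cook is the CEO of Apple. Apple has its headquarters in California.",
}
entities = ["Tim Cook", "Apple", "California"]
# One MENTIONS edge from the source chunk to every extracted entity.
mentions = [(chunk["label"], "MENTIONS", e) for e in entities]
print(mentions)
# [('Chunk', 'MENTIONS', 'Tim Cook'), ('Chunk', 'MENTIONS', 'Apple'), ('Chunk', 'MENTIONS', 'California')]
```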
Create a KuzuQAChain
To query the graph via a Text2Cypher pipeline, we can define a KuzuQAChain object. Then, we can invoke the chain with a query by connecting to the existing database stored in the test_db directory defined above.
from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain
# Create the KuzuQAChain with verbosity enabled to see the generated Cypher queries
chain = KuzuQAChain.from_llm(
llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3, api_key=OPENAI_API_KEY), # noqa: F821
graph=graph,
verbose=True,
allow_dangerous_requests=True,
)
Note that we set the temperature slightly above zero to avoid the LLM being overly terse in its responses.
Let's ask some questions using the QA chain.
chain.invoke("Who is the CEO of Apple?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p
Full Context:
[{'p': {'_id': {'offset': 0, 'table': 1}, '_label': 'Person', 'id': 'Tim Cook', 'type': 'entity'}}]
> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}
chain.invoke("Where is Apple headquartered?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (c:Company {id: 'Apple'})-[:HAS_HEADQUARTERS_IN]->(l:Location) RETURN l
Full Context:
[{'l': {'_id': {'offset': 0, 'table': 2}, '_label': 'Location', 'id': 'California', 'type': 'entity'}}]
> Finished chain.
{'query': 'Where is Apple headquartered?',
'result': 'Apple is headquartered in California.'}
Refresh the graph schema
If you mutate or update the graph, you can inspect the refreshed schema information that the Text2Cypher chain uses to generate Cypher statements.
You don't need to call refresh_schema() manually each time, as it is called automatically whenever you invoke the chain.
graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Person'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Location'}, {'properties': [('id', 'STRING'), ('text', 'STRING'), ('type', 'STRING')], 'label': 'Chunk'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Company'}]
Relationships properties: [{'properties': [], 'label': 'HAS_HEADQUARTERS_IN'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Person'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Location'}, {'properties': [], 'label': 'IS_CEO_OF'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Company'}]
Relationships: ['(:Company)-[:HAS_HEADQUARTERS_IN]->(:Location)', '(:Chunk)-[:MENTIONS_Chunk_Person]->(:Person)', '(:Chunk)-[:MENTIONS_Chunk_Location]->(:Location)', '(:Person)-[:IS_CEO_OF]->(:Company)', '(:Chunk)-[:MENTIONS_Chunk_Company]->(:Company)']
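The node-properties schema printed above is a list of dicts, one per node table. It can be flattened into one readable line per table, which is handy for inspection or for building prompts. The `describe` helper below is a small sketch of that, not part of the langchain-kuzu API:

```python
# Flatten the node-properties schema (same structure as printed above)
# into "Label(prop: TYPE, ...)" lines. Illustrative helper only.
node_properties = [
    {"label": "Person", "properties": [("id", "STRING"), ("type", "STRING")]},
    {"label": "Chunk", "properties": [("id", "STRING"), ("text", "STRING"), ("type", "STRING")]},
]

def describe(tables):
    lines = []
    for t in tables:
        props = ", ".join(f"{name}: {dtype}" for name, dtype in t["properties"])
        lines.append(f"{t['label']}({props})")
    return lines

print("\n".join(describe(node_properties)))
# Person(id: STRING, type: STRING)
# Chunk(id: STRING, text: STRING, type: STRING)
```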
Use separate LLMs for Cypher and answer generation
You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.
chain = KuzuQAChain.from_llm(
cypher_llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),
qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
graph=graph,
verbose=True,
allow_dangerous_requests=True,
)
chain.invoke("Who is the CEO of Apple?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p.id, p.type
Full Context:
[{'p.id': 'Tim Cook', 'p.type': 'entity'}]
> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}