
Kùzu

Kùzu is an embeddable, scalable, extremely fast graph database. It is permissively licensed under the MIT license, and you can see its source code here.

Key characteristics of Kùzu:

  • Performance and scalability: Implements modern, state-of-the-art join algorithms for graphs.
  • Usability: Very easy to set up and get started with, as there are no servers (embedded architecture).
  • Interoperability: Can conveniently scan and copy data from external columnar formats, CSV, JSON and relational databases.
  • Structured property graph model: Implements the property graph model, with added structure.
  • Cypher support: Allows convenient querying of the graph in Cypher, a declarative query language.

Get started with Kùzu by visiting their documentation.

Setup

Kùzu is an embedded database (it runs in-process), so there is no server to manage. Install the following dependencies to get started:

pip install -U langchain-kuzu langchain-openai langchain-experimental

This installs Kùzu along with its LangChain integration, as well as OpenAI's Python package so that we can use OpenAI's large language models (LLMs). If you want to use another LLM provider, you can install their respective Python packages that come with their LangChain integrations.

Here's how you would first create a Kùzu database on your local machine and connect to it:

import kuzu

db = kuzu.Database("test_db")
conn = kuzu.Connection(db)

Create the KuzuGraph

Kùzu's integration with LangChain makes it convenient to create and update graphs from unstructured text, and also to query graphs via a Text2Cypher pipeline that utilizes LangChain's LLM chain capabilities. To begin, we create a KuzuGraph object that uses the database object we created above, via the KuzuGraph constructor.

from langchain_kuzu.graphs.kuzu_graph import KuzuGraph

graph = KuzuGraph(db, allow_dangerous_requests=True)

Say we want to transform the following text into a graph:

text = "Tim Cook is the CEO of Apple. Apple has its headquarters in California."

We will use LLMGraphTransformer to have an LLM extract nodes and relationships from the text. To make the graph more useful, we will define the following schema, so that the LLM only extracts nodes and relationships that match it.

# Define schema
allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]
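The two lists act as a whitelist: any extracted triple whose (source label, relationship type, target label) pattern is not in the schema is discarded. A pure-Python sketch of that filtering idea (the `filter_triples` helper is hypothetical, for illustration only — the real filtering happens inside LLMGraphTransformer):

```python
# Hypothetical illustration of how a schema whitelist constrains extraction
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]

def filter_triples(triples):
    """Keep only triples whose label pattern matches the allowed schema."""
    allowed = set(allowed_relationships)
    return [
        (src, rel, dst)
        for (src, rel, dst) in triples
        if (src[1], rel, dst[1]) in allowed
    ]

extracted = [
    (("Tim Cook", "Person"), "IS_CEO_OF", ("Apple", "Company")),
    (("Apple", "Company"), "FOUNDED_IN", ("1976", "Year")),  # not in schema
]
print(filter_triples(extracted))  # only the first triple survives
```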

The LLMGraphTransformer class provides a convenient way to convert the text into a list of graph documents.

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

# Define the LLMGraphTransformer
llm_transformer = LLMGraphTransformer(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=OPENAI_API_KEY),  # noqa: F821
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
)

documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
graph_documents[:2]
[GraphDocument(nodes=[Node(id='Tim Cook', type='Person', properties={}), Node(id='Apple', type='Company', properties={}), Node(id='California', type='Location', properties={})], relationships=[Relationship(source=Node(id='Tim Cook', type='Person', properties={}), target=Node(id='Apple', type='Company', properties={}), type='IS_CEO_OF', properties={}), Relationship(source=Node(id='Apple', type='Company', properties={}), target=Node(id='California', type='Location', properties={}), type='HAS_HEADQUARTERS_IN', properties={})], source=Document(metadata={}, page_content='Tim Cook is the CEO of Apple. Apple has its headquarters in California.'))]

We can then call the add_graph_documents method on the KuzuGraph object defined above to ingest the graph documents into the Kùzu database. The include_source argument is set to True so that we also create relationships between each entity node and the source document it came from.

# Add the graph document to the graph
graph.add_graph_documents(
    graph_documents,
    include_source=True,
)

Create the KuzuQAChain

To query the graph via a Text2Cypher pipeline, we can define a KuzuQAChain object. Then, we can invoke the chain with queries by connecting to the existing database stored in the test_db directory defined above.

from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain

# Create the KuzuQAChain with verbosity enabled to see the generated Cypher queries
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3, api_key=OPENAI_API_KEY),  # noqa: F821
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)

Note that we set the temperature slightly above zero to avoid the LLM being overly terse in its responses.

Let's ask some questions using the QA chain.

chain.invoke("Who is the CEO of Apple?")


> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p
Full Context:
[{'p': {'_id': {'offset': 0, 'table': 1}, '_label': 'Person', 'id': 'Tim Cook', 'type': 'entity'}}]

> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}
chain.invoke("Where is Apple headquartered?")


> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (c:Company {id: 'Apple'})-[:HAS_HEADQUARTERS_IN]->(l:Location) RETURN l
Full Context:
[{'l': {'_id': {'offset': 0, 'table': 2}, '_label': 'Location', 'id': 'California', 'type': 'entity'}}]

> Finished chain.
{'query': 'Where is Apple headquartered?',
'result': 'Apple is headquartered in California.'}

Refresh the graph schema

If you mutate or update the graph, you can inspect the refreshed schema information that the Text2Cypher chain uses to generate Cypher statements. You don't need to call refresh_schema() manually every time, as it's called automatically when you invoke the chain.

graph.refresh_schema()

print(graph.get_schema)
Node properties: [{'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Person'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Location'}, {'properties': [('id', 'STRING'), ('text', 'STRING'), ('type', 'STRING')], 'label': 'Chunk'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Company'}]
Relationships properties: [{'properties': [], 'label': 'HAS_HEADQUARTERS_IN'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Person'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Location'}, {'properties': [], 'label': 'IS_CEO_OF'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Company'}]
Relationships: ['(:Company)-[:HAS_HEADQUARTERS_IN]->(:Location)', '(:Chunk)-[:MENTIONS_Chunk_Person]->(:Person)', '(:Chunk)-[:MENTIONS_Chunk_Location]->(:Location)', '(:Person)-[:IS_CEO_OF]->(:Company)', '(:Chunk)-[:MENTIONS_Chunk_Company]->(:Company)']

Use separate LLMs for Cypher and answer generation

You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.

chain = KuzuQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)
chain.invoke("Who is the CEO of Apple?")


> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p.id, p.type
Full Context:
[{'p.id': 'Tim Cook', 'p.type': 'entity'}]

> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}