适用于 Apache Gremlin 的 Azure Cosmos DB
Azure Cosmos DB for Apache Gremlin is a graph database service that can be used to store massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily.
Gremlin is a graph traversal language and virtual machine developed by
Apache TinkerPopof theApache Software Foundation.
此笔记本展示了如何使用大型语言模型(LLMs)为图数据库提供自然语言接口,您可以使用 Gremlin 查询语言对其进行查询。
设置
安装一个库:
!pip3 install gremlinpython
你需要一个Azure CosmosDB 图数据库实例。一种选择是在Azure中创建一个免费的CosmosDB 图数据库实例。
创建 Cosmos DB 帐户和图时,请使用 /type 作为分区键。
cosmosdb_name = "mycosmosdb"
cosmosdb_db_id = "graphtesting"
cosmosdb_db_graph_id = "mygraph"
cosmosdb_access_Key = "longstring=="
import nest_asyncio
from langchain_community.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_openai import AzureChatOpenAI
graph = GremlinGraph(
url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
password=cosmosdb_access_Key,
)
填充数据库
假设你的数据库是空的,你可以使用GraphDocuments来填充它。
对于 Gremlin,始终为每个节点添加一个名为“label”的属性。 如果没有设置标签,则使用 Node.type 作为标签。 对于使用自然 ID 的 Cosmos,这样做是有意义的,因为它们在图浏览器中是可见的。
source_doc = Document(
page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(
id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"}
)
actor3 = Node(
id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"}
)
rel1 = Relationship(
id=5, type="ActedIn", source=actor1, target=movie, properties={"label": "ActedIn"}
)
rel2 = Relationship(
id=6, type="ActedIn", source=actor2, target=movie, properties={"label": "ActedIn"}
)
rel3 = Relationship(
id=7, type="ActedIn", source=actor3, target=movie, properties={"label": "ActedIn"}
)
rel4 = Relationship(
id=8,
type="Starring",
source=movie,
target=actor1,
properties={"label": "Strarring"},
)
rel5 = Relationship(
id=9,
type="Starring",
source=movie,
target=actor2,
properties={"label": "Strarring"},
)
rel6 = Relationship(
id=10,
type="Straring",
source=movie,
target=actor3,
properties={"label": "Strarring"},
)
graph_doc = GraphDocument(
nodes=[movie, actor1, actor2, actor3],
relationships=[rel1, rel2, rel3, rel4, rel5, rel6],
source=source_doc,
)
# The underlying python-gremlin has a problem when running in notebook
# The following line is a workaround to fix the problem
nest_asyncio.apply()
# Add the document to the CosmosDB graph.
graph.add_graph_documents([graph_doc])
刷新图谱模式信息
如果数据库的模式发生变化(在更新后),您可以刷新模式信息。
graph.refresh_schema()
print(graph.schema)
查询图谱
现在我们可以使用 Gremlin QA 链来向图提出问题。
chain = GremlinQAChain.from_llm(
AzureChatOpenAI(
temperature=0,
azure_deployment="gpt-4-turbo",
),
graph=graph,
verbose=True,
)
chain.invoke("Who played in The Matrix?")
chain.run("How many people played in The Matrix?")