YDB
YDB is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.
本笔记本展示了如何使用与 YDB 向量存储相关的功能。
设置
首先,使用 Docker 设置本地 YDB:
! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk
您需要安装 langchain-ydb 才能使用此集成
! pip install -qU langchain-ydb
凭据
此笔记本无需凭证,只需确保已按上述说明安装了相关包。
如果您希望获得一流的模型调用自动追踪功能,还可以通过取消注释以下代码来设置您的 LangSmith API 密钥:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
初始化
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings
settings = YDBSettings(
table="ydb_example",
strategy=YDBSearchStrategy.COSINE_SIMILARITY,
)
vector_store = YDB(embeddings, config=settings)
管理向量存储
创建向量存储后,您可以通过添加和删除不同的项目来与之交互。
将项目添加到向量存储
准备要处理的文档:
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
metadata={"source": "tweet"},
)
document_2 = Document(
page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
metadata={"source": "news"},
)
document_3 = Document(
page_content="Building an exciting new project with LangChain - come check it out!",
metadata={"source": "tweet"},
)
document_4 = Document(
page_content="Robbers broke into the city bank and stole $1 million in cash.",
metadata={"source": "news"},
)
document_5 = Document(
page_content="Wow! That was an amazing movie. I can't wait to see it again.",
metadata={"source": "tweet"},
)
document_6 = Document(
page_content="Is the new iPhone worth the price? Read this review to find out.",
metadata={"source": "website"},
)
document_7 = Document(
page_content="The top 10 soccer players in the world right now.",
metadata={"source": "website"},
)
document_8 = Document(
page_content="LangGraph is the best framework for building stateful, agentic applications!",
metadata={"source": "tweet"},
)
document_9 = Document(
page_content="The stock market is down 500 points today due to fears of a recession.",
metadata={"source": "news"},
)
document_10 = Document(
page_content="I have a bad feeling I am going to get deleted :(",
metadata={"source": "tweet"},
)
documents = [
document_1,
document_2,
document_3,
document_4,
document_5,
document_6,
document_7,
document_8,
document_9,
document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
你可以通过使用 add_documents 函数将项目添加到你的向量存储中。
vector_store.add_documents(documents=documents, ids=uuids)
Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]
['947be6aa-d489-44c5-910e-62e4d58d2ffb',
'7a62904d-9db3-412b-83b6-f01b34dd7de3',
'e5a49c64-c985-4ed7-ac58-5ffa31ade699',
'99cf4104-36ab-4bd5-b0da-e210d260e512',
'5810bcd0-b46e-443e-a663-e888c9e028d1',
'190c193d-844e-4dbb-9a4b-b8f5f16cfae6',
'f8912944-f80a-4178-954e-4595bf59e341',
'34fc7b09-6000-42c9-95f7-7d49f430b904',
'0f6b6783-f300-4a4d-bb04-8025c4dfd409',
'46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']
从向量存储中删除项目
你可以使用 delete 函数通过ID从向量存储中删除项目。
vector_store.delete(ids=[uuids[-1]])
True
查询向量存储
一旦创建了向量存储并添加了相关文档,您可能就希望在链或智能体的执行过程中对其进行查询。
直接查询
相似性搜索
可以按如下方式执行一个简单的相似性搜索:
results = vector_store.similarity_search(
"LangChain provides abstractions to make working with LLMs easy", k=2
)
for res in results:
print(f"* {res.page_content} [{res.metadata}]")
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
带分数的相似性搜索
你也可以使用分数进行搜索:
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=3)
for res, score in results:
print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
* [SIM=0.212] I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
过滤
你可以使用如下所述的过滤器进行搜索:
results = vector_store.similarity_search_with_score(
"What did I eat for breakfast?",
k=4,
filter={"source": "tweet"},
)
for res, _ in results:
print(f"* {res.page_content} [{res.metadata}]")
* I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
通过转换为检索器进行查询
您还可以将向量存储转换为检索器,以便在链中更轻松地使用。
以下是将你的向量存储转换为检索器,然后使用简单查询和过滤器调用该检索器的方法。
retriever = vector_store.as_retriever(
search_kwargs={"k": 2},
)
results = retriever.invoke(
"Stealing from the bank is a crime", filter={"source": "news"}
)
for res in results:
print(f"* {res.page_content} [{res.metadata}]")
* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]
* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]
检索增强生成的用法
有关如何使用此向量存储进行检索增强生成 (RAG) 的指南,请参阅以下部分:
API 参考
有关所有 YDB 功能和配置的详细文档,请访问API参考:https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html