
Baidu Cloud ElasticSearch VectorSearch

Baidu Cloud VectorSearch is a fully managed, enterprise-grade distributed search and analysis service that is 100% compatible with open source. It provides a low-cost, high-performance, and reliable retrieval and analysis platform for structured and unstructured data. As a vector database, it supports multiple index types and similarity distance methods.
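To make "similarity distance methods" concrete, here is a minimal pure-Python sketch of two widely used measures, cosine similarity and Euclidean (L2) distance. It is an illustration only: the function names are made up for this example, and it says nothing about which measures a given Baidu Cloud index is configured with or how the service computes them internally.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two vectors pointing the same way but with different lengths.
u, v = [3.0, 4.0], [6.0, 8.0]
print(cosine_similarity(u, v))  # 1.0 (direction is identical)
print(l2_distance(u, v))        # 5.0 (magnitudes differ)
```

The two measures can disagree, as the example shows, so the metric chosen for an index affects which documents rank as nearest.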

Baidu Cloud ElasticSearch provides a privilege management mechanism that lets you configure cluster privileges freely to further ensure data security.

This notebook shows how to use functionality related to the Baidu Cloud ElasticSearch VectorStore. To run it, you need a Baidu Cloud ElasticSearch instance up and running:

Read the help document to quickly get familiar with and configure a Baidu Cloud ElasticSearch instance.

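Once the instance is up and running, follow these steps: split the documents, get embeddings, connect to the Baidu Cloud ElasticSearch instance, index the documents, and perform vector retrieval.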

We need to install the following Python packages first.

%pip install --upgrade --quiet langchain-community elasticsearch==7.11.0

First, we want to use QianfanEmbeddings, so we have to get the Qianfan AK and SK. For details about Qianfan, see Baidu Qianfan Workshop.

import getpass
import os

if "QIANFAN_AK" not in os.environ:
    os.environ["QIANFAN_AK"] = getpass.getpass("Your Qianfan AK:")
if "QIANFAN_SK" not in os.environ:
    os.environ["QIANFAN_SK"] = getpass.getpass("Your Qianfan SK:")

Second, split the documents and get embeddings.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

from langchain_community.embeddings import QianfanEmbeddingsEndpoint

embeddings = QianfanEmbeddingsEndpoint()
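To build intuition for the `chunk_size` and `chunk_overlap` parameters used above, here is a deliberately simplified sketch that cuts a string into fixed-width windows. It is not how `CharacterTextSplitter` works internally: that class first splits on a separator (by default a blank line) and then merges the pieces up to `chunk_size`, so its chunks follow paragraph boundaries. The helper name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new
    window starts (chunk_size - chunk_overlap) characters after the last.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(sliding_chunks(sample, chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
print(sliding_chunks(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as in this notebook, no text is repeated across chunks. A non-zero overlap repeats some text so that a sentence straddling a boundary still appears intact in at least one chunk, at the cost of indexing more vectors.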

Then, create a vector store instance backed by an accessible Baidu ElasticSearch cluster.

# Create a bes instance and index docs.
from langchain_community.vectorstores import BESVectorStore

bes = BESVectorStore.from_documents(
    documents=docs,
    embedding=embeddings,
    bes_url="your bes cluster url",
    index_name="your vector index",
)
bes.client.indices.refresh(index="your vector index")
API Reference: BESVectorStore
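If the cluster has privilege management enabled, the connection will probably need credentials as well. The sketch below gathers connection settings from environment variables so they stay out of the notebook. The variable names (`BES_URL`, `BES_INDEX`, `BES_USER`, `BES_PASSWORD`) and the helper `bes_connection_kwargs` are hypothetical. The `user` and `password` keyword names reflect my understanding of the `BESVectorStore` constructor and should be checked against the installed `langchain-community` version.

```python
import os
from typing import Dict, Mapping, Optional


def bes_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for BESVectorStore from environment-style settings.

    BES_URL and BES_INDEX are required. BES_USER and BES_PASSWORD are
    optional and only included when both are set. All four variable
    names are assumptions made for this example.
    """
    source = os.environ if env is None else env
    kwargs = {
        "bes_url": source["BES_URL"],
        "index_name": source["BES_INDEX"],
    }
    user = source.get("BES_USER")
    password = source.get("BES_PASSWORD")
    if user and password:
        # Assumed keyword names; verify against your BESVectorStore version.
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs
```

Under those assumptions, the store could then be created with `BESVectorStore.from_documents(documents=docs, embedding=embeddings, **bes_connection_kwargs())`.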

Finally, query and retrieve data.

query = "What did the president say about Ketanji Brown Jackson"
docs = bes.similarity_search(query)
print(docs[0].page_content)
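The query above prints only the best match. To inspect several results at once, a small helper along the following lines may be useful. It relies only on the `similarity_search(query, k=...)` method and on the `page_content` and `metadata` attributes that LangChain documents carry. The helper name `top_hits` and the `preview_chars` parameter are made up for this sketch.

```python
from typing import Any, Dict, List, Tuple


def top_hits(
    store: Any, query: str, k: int = 3, preview_chars: int = 80
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (preview, metadata) pairs for the k best matches.

    `store` is any object exposing similarity_search(query, k=...),
    such as the `bes` vector store created earlier.
    """
    hits = store.similarity_search(query, k=k)
    results = []
    for doc in hits:
        # Collapse whitespace so each preview fits on one line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        results.append((preview, dict(doc.metadata)))
    return results
```

For example, `for preview, meta in top_hits(bes, query): print(meta.get("source"), preview)` would list the source file recorded by `TextLoader` next to the start of each matching chunk.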

If you encounter any problems during use, please feel free to contact liuboyao@baidu.com or chenweixu01@baidu.com, and we will do our best to support you.