Alibaba Cloud OpenSearch

Alibaba Cloud OpenSearch is a one-stop platform for developing intelligent search services. OpenSearch is built on the large-scale distributed search engine developed by Alibaba. It serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services for different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and enterprise big data queries.

OpenSearch helps you develop high-quality, maintenance-free, and high-performance intelligent search services to provide your users with high search efficiency and accuracy.

OpenSearch provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results.

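This notebook shows how to use functionality related to the Alibaba Cloud OpenSearch Vector Search Edition.

Setting up

Purchase an instance and configure it

Purchase OpenSearch Vector Search Edition from Alibaba Cloud and configure the instance according to the help documentation.

To run this notebook, you need a running OpenSearch Vector Search Edition instance.

Alibaba Cloud OpenSearch Vector Store class

The AlibabaCloudOpenSearch class supports the following functions: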

  • add_texts
  • add_documents
  • from_texts
  • from_documents
  • similarity_search
  • asimilarity_search
  • similarity_search_by_vector
  • asimilarity_search_by_vector
  • similarity_search_with_relevance_scores
  • delete_doc_by_texts

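Read the help documentation to quickly get familiar with and configure an OpenSearch Vector Search Edition instance.

If you encounter any problems during use, please feel free to contact xingshaomin.xsm@alibaba-inc.com, and we will do our best to provide you with assistance and support.

After the instance is up and running, follow these steps: split the documents, get embeddings, connect to the Alibaba Cloud OpenSearch instance, index the documents, and perform vector retrieval.

We need to install the following Python packages first.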

%pip install --upgrade --quiet  langchain-community alibabacloud_ha3engine_vector

We want to use OpenAIEmbeddings, so we need to get an OpenAI API key.

import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Example

from langchain_community.vectorstores import (
    AlibabaCloudOpenSearch,
    AlibabaCloudOpenSearchSettings,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

Split documents and get embeddings.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
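
To make the chunk_size and chunk_overlap parameters above more concrete, here is a minimal standard-library sketch of fixed-window chunking. It is a simplified illustration and not the actual CharacterTextSplitter algorithm; the function name split_into_chunks and the sample text are hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 250  # 2,500 characters of stand-in text
chunks = split_into_chunks(sample, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # prints [1000, 1000, 500]
```

With chunk_overlap=0, as in the notebook, the chunks partition the text without repetition; a positive overlap repeats the tail of each chunk at the start of the next one.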

Create OpenSearch settings.

settings = AlibabaCloudOpenSearchSettings(
    endpoint="The endpoint of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    instance_id="The identifier of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    protocol="Communication protocol between the SDK and the server; the default is http.",
    username="The username specified when purchasing the instance.",
    password="The password specified when purchasing the instance.",
    namespace="The instance data is partitioned based on the namespace field. If namespaces are enabled, you need to specify the namespace field name during initialization; otherwise, queries cannot be executed correctly.",
    table_name="The table name specified during instance configuration.",
    embedding_field_separator="Delimiter used when writing vector field data; the default is a comma.",
    output_fields="The list of fields returned when invoking OpenSearch; by default, the list of values in field_name_mapping.",
    field_name_mapping={
        "id": "id",  # The id field name mapping of the index document.
        "document": "document",  # The text field name mapping of the index document.
        "embedding": "embedding",  # The embedding field name mapping of the index document.
        "name_of_the_metadata_specified_during_search": "opensearch_metadata_field_name,=",
        # The metadata field name mapping of the index document. Multiple entries can be specified.
        # Each value contains the mapped field name and an operator; the operator is used when executing a metadata filter query.
        # Currently supported comparison operators are: > (greater than), < (less than), = (equal to), <= (less than or equal to), >= (greater than or equal to), != (not equal to).
        # Refer to this link: https://help.aliyun.com/zh/open-search/vector-search-edition/filter-expression
    },
)

# for example

# settings = AlibabaCloudOpenSearchSettings(
#     endpoint='ha-cn-5yd3fhdm102.public.ha.aliyuncs.com',
#     instance_id='ha-cn-5yd3fhdm102',
#     username='instance user name',
#     password='instance password',
#     table_name='test_table',
#     field_name_mapping={
#         "id": "id",
#         "document": "document",
#         "embedding": "embedding",
#         "string_field": "string_field,=",
#         "int_field": "int_field,=",
#         "float_field": "float_field,=",
#         "double_field": "double_field,=",
#     },
# )
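
Each metadata entry in field_name_mapping packs a field name and an operator into one string such as "int_field,>=". The sketch below shows one way to split and validate such values before handing them to the settings object. It is an illustration only, not the library's internal parsing; the helper name parse_mapping_value and the sample mapping are hypothetical, and the operator set is copied from the comments in the settings above.

```python
from typing import Optional, Tuple

# Operators listed in the settings comments above.
SUPPORTED_OPERATORS = {">", "<", "=", "<=", ">=", "!="}


def parse_mapping_value(value: str) -> Tuple[str, Optional[str]]:
    """Split 'field_name,operator' into its parts; operator is None if absent."""
    field, sep, operator = value.partition(",")
    field, operator = field.strip(), operator.strip()
    if not field:
        raise ValueError(f"empty field name in mapping value {value!r}")
    if not sep:
        # Plain mappings such as "id" carry no operator.
        return field, None
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator {operator!r} in {value!r}")
    return field, operator


# Hypothetical mapping used only for this illustration.
field_name_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,>=",
}
parsed = {key: parse_mapping_value(value) for key, value in field_name_mapping.items()}
for key, (field, operator) in parsed.items():
    print(key, "->", field, operator)
```

Running a check like this before creating the settings surfaces a mistyped operator early, rather than at query time.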

Create an OpenSearch access instance from the settings.

# Create an OpenSearch instance and index docs.
# docs is a list of Document objects, so from_documents is used here.
opensearch = AlibabaCloudOpenSearch.from_documents(
    documents=docs, embedding=embeddings, config=settings
)

or

# Create an OpenSearch instance without indexing any documents yet.
opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)

Add texts and build the index.

metadatas = [
    {"string_field": "value1", "int_field": 1, "float_field": 1.0, "double_field": 2.0},
    {"string_field": "value2", "int_field": 2, "float_field": 3.0, "double_field": 4.0},
    {"string_field": "value3", "int_field": 3, "float_field": 5.0, "double_field": 6.0},
]
# The keys of each metadata dict must match field_name_mapping in settings.
# add_texts expects plain strings, so pass the text of the first three chunks,
# one per metadata entry above.
texts = [doc.page_content for doc in docs[:3]]
opensearch.add_texts(texts=texts, ids=[], metadatas=metadatas)
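
Because metadata keys have to line up with field_name_mapping, it can help to check them before indexing. The following is a minimal sketch of such a check using plain dictionaries; the helper name unmapped_metadata_keys and the extra "author" key are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List, Set


def unmapped_metadata_keys(
    metadatas: List[Dict[str, Any]], field_name_mapping: Dict[str, str]
) -> Set[str]:
    """Return metadata keys that have no entry in field_name_mapping."""
    used: Set[str] = set()
    for metadata in metadatas:
        used.update(metadata)
    return used - set(field_name_mapping)


# Hypothetical mapping and metadata; "author" is deliberately left unmapped.
field_name_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,=",
}
metadatas = [
    {"string_field": "value1", "int_field": 1},
    {"string_field": "value2", "int_field": 2, "author": "unknown"},
]
missing = unmapped_metadata_keys(metadatas, field_name_mapping)
print(sorted(missing))  # prints ['author']
```

An empty result means every metadata key has a corresponding mapping entry.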

Query and retrieve data.

query = "What did the president say about Ketanji Brown Jackson"
docs = opensearch.similarity_search(query)
print(docs[0].page_content)
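
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea with cosine similarity over toy three-dimensional vectors. It is not how OpenSearch implements retrieval (the distance metric and index type depend on the instance configuration); the vectors, texts, and the top_k helper are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": (chunk text, embedding) pairs with invented values.
indexed = [
    ("chunk about the Supreme Court nominee", [0.9, 0.1, 0.0]),
    ("chunk about the economy", [0.1, 0.9, 0.1]),
    ("chunk about infrastructure", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.2, 0.0]
results = top_k(query_vector, indexed, k=2)
print(results[0][0])  # prints chunk about the Supreme Court nominee
```

A real vector index avoids scoring every stored vector, but the ranking idea is the same.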

Query and retrieve data with metadata.

query = "What did the president say about Ketanji Brown Jackson"
metadata = {
    "string_field": "value1",
    "int_field": 1,
    "float_field": 1.0,
    "double_field": 2.0,
}
docs = opensearch.similarity_search(query, filter=metadata)
print(docs[0].page_content)
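
The operator attached to each metadata field decides how the filter value is compared with the stored value. The following client-side sketch mimics those semantics on an in-memory list, assuming the comparison reads "stored value, operator, filter value". The real filtering is performed by the OpenSearch service; the records, the field_operators dictionary, and the matches helper are hypothetical.

```python
import operator
from typing import Any, Callable, Dict

# Python equivalents of the operators supported in field_name_mapping.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
}


def matches(
    record_metadata: Dict[str, Any],
    filter_values: Dict[str, Any],
    field_operators: Dict[str, str],
) -> bool:
    """True if every filter key passes its comparison against the stored value."""
    for key, wanted in filter_values.items():
        if key not in record_metadata:
            return False
        # Assumption: fall back to equality when no operator is configured.
        compare = OPERATORS[field_operators.get(key, "=")]
        if not compare(record_metadata[key], wanted):
            return False
    return True


# Invented records standing in for indexed chunks.
records = [
    {"text": "first chunk", "metadata": {"string_field": "value1", "int_field": 1}},
    {"text": "second chunk", "metadata": {"string_field": "value2", "int_field": 2}},
    {"text": "third chunk", "metadata": {"string_field": "value3", "int_field": 3}},
]
field_operators = {"string_field": "=", "int_field": ">="}
metadata_filter = {"int_field": 2}
kept = [r["text"] for r in records if matches(r["metadata"], metadata_filter, field_operators)]
print(kept)  # prints ['second chunk', 'third chunk']
```

With every operator set to "=", as in the example settings, a filter keeps only records whose metadata equal the given values exactly.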
