TileDB
TileDB is a powerful engine for indexing and querying dense and sparse multi-dimensional arrays.
TileDB offers approximate nearest neighbor (ANN) search through the TileDB-Vector-Search module. It provides serverless execution of ANN queries and stores vector indexes on local disk or in cloud object stores (e.g., AWS S3).
For more details, see the TileDB-Vector-Search documentation.
This notebook shows how to use the TileDB vector database.
%pip install --upgrade --quiet tiledb-vector-search langchain-community langchain-huggingface langchain-text-splitters sentence-transformers
Basic Example
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import TileDB
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter
raw_documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)
db = TileDB.from_documents(
    documents, embeddings, index_uri="/tmp/tiledb_index", index_type="FLAT"
)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
docs[0].page_content
Similarity search by vector
embedding_vector = embeddings.embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
docs[0].page_content
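The index above was created with index_type="FLAT", which is generally understood to mean an exhaustive (brute-force) comparison of the query vector against every stored vector. The following is a minimal pure-Python sketch of that idea, not TileDB's implementation. The helper names `squared_l2` and `flat_search`, the toy vectors, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(query_vector, stored_vectors, k=4):
    """Hypothetical helper: return (index, distance) pairs for the k nearest vectors.

    Every stored vector is compared with the query, which is what makes
    the search "flat": there is no pruning or approximation step.
    """
    scored = [(i, squared_l2(query_vector, v)) for i, v in enumerate(stored_vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings" standing in for real model output.
stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = flat_search([0.9, 0.1], stored, k=2)
nearest_indices = [i for i, _ in nearest]
```

An exhaustive scan like this is exact but costs time proportional to the number of stored vectors, which is the usual reason to consider an approximate index type for larger collections.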
Similarity search with score
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]
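Each result from similarity_search_with_score is a (document, score) pair. A common follow-up step is to drop weak matches. The sketch below assumes the score is a distance, so a lower value means a closer match; verify this for your index before relying on it. The helper `filter_by_distance`, the threshold, and the sample pairs are hypothetical, and plain strings stand in for Document objects so that the snippet runs on its own.

```python
def filter_by_distance(results, max_distance):
    """Hypothetical helper: keep (item, distance) pairs within max_distance.

    Assumes a lower distance means a more similar result.
    """
    return [(item, dist) for item, dist in results if dist <= max_distance]


# Made-up pairs shaped like the output of similarity_search_with_score.
sample_results = [("chunk A", 0.42), ("chunk B", 0.97), ("chunk C", 1.35)]
close_matches = filter_by_distance(sample_results, max_distance=1.0)
```

With real results, the same function could be applied to docs_and_scores directly, since it only unpacks each pair and compares the second element.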
Maximal Marginal Relevance Search (MMR)
In addition to using similarity search in the retriever object, you can also use mmr as the retriever.
retriever = db.as_retriever(search_type="mmr")
retriever.invoke(query)
Or use max_marginal_relevance_search directly:
db.max_marginal_relevance_search(query, k=2, fetch_k=10)
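To make the effect of MMR concrete, here is a simplified pure-Python sketch of the selection rule: each step picks the candidate that balances relevance to the query against similarity to the results already chosen. This is an illustration under stated assumptions, not the code LangChain or TileDB runs. The names `cosine` and `mmr_select`, the toy vectors, and the use of cosine similarity are choices made for the example; `lambda_mult` plays the usual role of the relevance/diversity trade-off.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vector, candidates, k=2, lambda_mult=0.5):
    """Hypothetical helper: return indices of k candidates chosen by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vector, candidates[i])
            # Highest similarity to anything already picked (0.0 on the first pass).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
candidates = [[1.0, 0.0], [0.98, 0.05], [0.5, 0.85]]
query_vec = [0.9, 0.3]
picked = mmr_select(query_vec, candidates, k=2)
by_relevance_only = mmr_select(query_vec, candidates, k=2, lambda_mult=1.0)
```

In this toy data, ranking by relevance alone returns the two near-duplicate candidates, while the balanced setting replaces the second of them with the less similar candidate. In the call above, fetch_k sets how many candidates are fetched before this kind of re-ranking reduces them to k.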