Bagel

Bagel (Open Inference platform for AI) is like GitHub for AI data. It is a collaborative platform where users can create, share, and manage inference datasets. It supports private projects for independent developers, internal collaboration for enterprises, and public contributions for data DAOs.

Installation and Setup

pip install bagelML langchain-community
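To confirm that the installation succeeded before running the examples, a small check along these lines may help. It is a minimal sketch: the import names `bagel` (for `bagelML`) and `langchain_community` (for `langchain-community`) are assumptions about how the packages expose themselves, and `missing_packages` is a hypothetical helper, not part of either library.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names: `bagel` for bagelML, `langchain_community` for langchain-community.
required = ["bagel", "langchain_community"]
missing = missing_packages(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

`find_spec` only looks the module up without importing it, so the check has no side effects.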

Create VectorStore from texts

from langchain_community.vectorstores import Bagel

texts = ["hello bagel", "hello langchain", "I love salad", "my car", "a dog"]
# create cluster and add texts
cluster = Bagel.from_texts(cluster_name="testing", texts=texts)

API Reference: Bagel

# similarity search
cluster.similarity_search("bagel", k=3)

[Document(page_content='hello bagel', metadata={}),
 Document(page_content='my car', metadata={}),
 Document(page_content='I love salad', metadata={})]

# the score is a distance metric, so lower is better
cluster.similarity_search_with_score("bagel", k=3)

[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),
 (Document(page_content='my car', metadata={}), 1.4783176183700562),
 (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]
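The "lower is better" ordering of the scores above can be illustrated without a running cluster. The sketch below ranks a few texts by squared Euclidean distance to a query vector. The two-dimensional embeddings are invented for illustration, and squared Euclidean distance is an assumption: the text above does not say which distance metric Bagel uses.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query_vec, items, k=3):
    """Return the k (text, distance) pairs with the smallest distance."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical 2-D embeddings, invented for illustration only.
toy_items = [
    ("hello bagel", [0.9, 0.1]),
    ("my car", [0.1, 0.9]),
    ("I love salad", [0.0, 1.0]),
    ("a dog", [-1.0, 0.0]),
]
toy_query = [1.0, 0.0]

top = rank_by_distance(toy_query, toy_items, k=3)
for text, distance in top:
    print(f"{distance:.2f}  {text}")
```

The closest item comes first, mirroring how the result with the smallest score leads the list returned by `similarity_search_with_score`.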

# delete the cluster
cluster.delete_cluster()

Create VectorStore from docs

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)[:10]

API Reference: TextLoader | CharacterTextSplitter
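To give a rough idea of what splitting with `chunk_size=1000` and `chunk_overlap=0` does, here is a simplified, standalone sketch. It is not the `CharacterTextSplitter` implementation: the function name `split_on_paragraphs` is hypothetical, and the sketch assumes the text is cut at blank lines and that pieces are merged greedily up to the size limit with no overlap.

```python
def split_on_paragraphs(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
for chunk in split_on_paragraphs(sample, chunk_size=10):
    print(repr(chunk))
```

With a 10-character limit, the first two paragraphs fit into one chunk, while the oversized third paragraph becomes a chunk of its own.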

# create cluster with docs
cluster = Bagel.from_documents(cluster_name="testing_with_docs", documents=docs)

# similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = cluster.similarity_search(query)
print(docs[0].page_content[:102])

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the

Get all text/doc from Cluster

texts = ["hello bagel", "this is langchain"]
cluster = Bagel.from_texts(cluster_name="testing", texts=texts)
cluster_data = cluster.get()

# all keys
cluster_data.keys()

dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])

# all values and keys
cluster_data

{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',
  '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',
  'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',
  'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',
  '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',
  '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',
  '581e691e-3762-11ee-a8ab-b7b7b34f99ba',
  '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],
 'embeddings': None,
 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],
 'documents': ['hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain']}

cluster.delete_cluster()
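The dictionary returned by `cluster.get()` stores ids, documents, and metadata as parallel lists, and the output above contains repeated texts. The sketch below shows one way to turn such a dictionary into per-item records and to list the distinct texts. The helpers `to_records` and `unique_texts` and the sample ids are hypothetical; only the dictionary keys are taken from the output above.

```python
def to_records(cluster_data):
    """Zip the parallel lists of a get()-style dictionary into one record per item."""
    ids = cluster_data["ids"]
    documents = cluster_data["documents"]
    metadatas = cluster_data.get("metadatas") or [{} for _ in ids]
    return [
        {"id": item_id, "text": text, "metadata": metadata}
        for item_id, text, metadata in zip(ids, documents, metadatas)
    ]


def unique_texts(cluster_data):
    """Return the distinct documents in first-seen order."""
    return list(dict.fromkeys(cluster_data["documents"]))


# Sample data with the same shape as the output above; the ids are made up.
sample_data = {
    "ids": ["id-1", "id-2", "id-3", "id-4"],
    "embeddings": None,
    "metadatas": [{}, {}, {}, {}],
    "documents": ["hello bagel", "this is langchain", "hello bagel", "this is langchain"],
}

records = to_records(sample_data)
print(records[0])
print(unique_texts(sample_data))
```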

Create cluster with metadata and filter using metadata

texts = ["hello bagel", "this is langchain"]
metadatas = [{"source": "notion"}, {"source": "google"}]

cluster = Bagel.from_texts(cluster_name="testing", texts=texts, metadatas=metadatas)
cluster.similarity_search_with_score("hello bagel", where={"source": "notion"})

[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]

# delete the cluster
cluster.delete_cluster()
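The effect of the `where={"source": "notion"}` filter can be pictured as an exact match on metadata fields. The following standalone sketch models only that equality case. The helper names `matches` and `filter_by_metadata` are hypothetical, and whether Bagel's `where` argument accepts richer operators is not covered here.

```python
def matches(metadata, where):
    """True when every key in `where` is present in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in where.items())


def filter_by_metadata(texts, metadatas, where):
    """Keep the (text, metadata) pairs whose metadata satisfies the equality filter."""
    return [(text, meta) for text, meta in zip(texts, metadatas) if matches(meta, where)]


texts = ["hello bagel", "this is langchain"]
metadatas = [{"source": "notion"}, {"source": "google"}]

kept = filter_by_metadata(texts, metadatas, where={"source": "notion"})
print(kept)
```

Only the text whose `source` is `notion` survives the filter, which matches the single result returned by the filtered search above.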

Related

Vector store conceptual guide
Vector store how-to guides
