Hippo
Transwarp Hippo is an enterprise-level cloud-native distributed vector database that supports storage, retrieval, and management of massive vector-based datasets. It efficiently solves problems such as vector similarity search and high-density vector clustering.
Hippofeatures high availability, high performance, and easy scalability. It has many functions, such as multiple vector search indexes, data partitioning and sharding, data persistence, incremental data ingestion, vector scalar field filtering, and mixed queries. It can effectively meet the high real-time search demands of enterprises for massive vector data
入门指南
唯一的先决条件是来自 OpenAI 网站的 API 密钥。确保您已经启动了一个 Hippo 实例。
安装依赖
首先,我们需要安装一些依赖项,例如 OpenAI、Langchain 和 Hippo-API。请注意,您应该根据您的环境安装合适的版本。
%pip install --upgrade --quiet langchain langchain_community tiktoken langchain-openai
%pip install --upgrade --quiet hippo-api==1.1.0.rc3
Requirement already satisfied: hippo-api==1.1.0.rc3 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (1.1.0rc3)
Requirement already satisfied: pyyaml>=6.0 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (from hippo-api==1.1.0.rc3) (6.0.1)
注意:Python版本需要 >=3.8。
最佳实践
导入依赖包
import os
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.hippo import Hippo
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
加载知识文档
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
分割知识文档
在这里,我们使用 Langchain 的 CharacterTextSplitter 进行文本分割。分隔符是句号。分割后,每个文本片段不超过 1000 个字符,且重复字符数为 0。
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
声明嵌入模型
下面,我们使用 Langchain 中的 OpenAIEmbeddings 方法创建 OpenAI 或 Azure 嵌入模型。
# openai
embeddings = OpenAIEmbeddings()
# azure
# embeddings = OpenAIEmbeddings(
# openai_api_type="azure",
# openai_api_base="x x x",
# openai_api_version="x x x",
# model="x x x",
# deployment="x x x",
# openai_api_key="x x x"
# )
声明Hippo客户端
HIPPO_CONNECTION = {"host": "IP", "port": "PORT"}
存储文档
print("input...")
# insert docs
vector_store = Hippo.from_documents(
docs,
embedding=embeddings,
table_name="langchain_test",
connection_args=HIPPO_CONNECTION,
)
print("success")
input...
success
基于知识的问答执行
创建大型语言问答模型
下面,我们分别使用 Langchain 中的 AzureChatOpenAI 和 ChatOpenAI 方法创建 OpenAI 或 Azure 的大型语言问答模型。
# llm = AzureChatOpenAI(
# openai_api_base="x x x",
# openai_api_version="xxx",
# deployment_name="xxx",
# openai_api_key="xxx",
# openai_api_type="azure"
# )
llm = ChatOpenAI(openai_api_key="YOUR OPENAI KEY", model_name="gpt-3.5-turbo-16k")
根据问题获取相关知识:
query = "Please introduce COVID-19"
# query = "Please introduce Hippo Core Architecture"
# query = "What operations does the Hippo Vector Database support for vector data?"
# query = "Does Hippo use hardware acceleration technology? Briefly introduce hardware acceleration technology."
# Retrieve similar content from the knowledge base,fetch the top two most similar texts.
res = vector_store.similarity_search(query, 2)
content_list = [item.page_content for item in res]
text = "".join(content_list)
构建提示模板
prompt = f"""
Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise."
[Article]:{text}
Please answer this question in conjunction with the above article:{query}
"""
等待大型语言模型生成答案
response_with_hippo = llm.predict(prompt)
print(f"response_with_hippo:{response_with_hippo}")
response = llm.predict(query)
print("==========================================")
print(f"response_without_hippo:{response}")
response_with_hippo:COVID-19 is a virus that has impacted every aspect of our lives for over two years. It is a highly contagious and mutates easily, requiring us to remain vigilant in combating its spread. However, due to progress made and the resilience of individuals, we are now able to move forward safely and return to more normal routines.
==========================================
response_without_hippo:COVID-19 is a contagious respiratory illness caused by the novel coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China and has since spread globally, leading to a pandemic. The virus primarily spreads through respiratory droplets when an infected person coughs, sneezes, talks, or breathes, and can also spread by touching contaminated surfaces and then touching the face. COVID-19 symptoms include fever, cough, shortness of breath, fatigue, muscle or body aches, sore throat, loss of taste or smell, headache, and in severe cases, pneumonia and organ failure. While most people experience mild to moderate symptoms, it can lead to severe illness and even death, particularly among older adults and those with underlying health conditions. To combat the spread of the virus, various preventive measures have been implemented globally, including social distancing, wearing face masks, practicing good hand hygiene, and vaccination efforts.