UpTrain
UpTrain is an open-source platform for evaluating and improving LLM applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives guidance on how to resolve them.
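UpTrain Callback Handler
This notebook shows how the UpTrain callback handler integrates seamlessly into your pipeline to run a variety of evaluations. We have chosen a few evaluations that we felt are appropriate for evaluating the chains. These evaluations run automatically, and their results are shown in the output. More details on UpTrain's evaluations can be found in the UpTrain documentation.
The following LangChain retrievers are highlighted for demonstration: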
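1. Vanilla RAG:
RAG plays a crucial role in retrieving context and generating responses. To ensure both retrieval performance and response quality, we run the following evaluations: Context Relevance, Factual Accuracy, and Response Completeness.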
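2. Multi Query Generation:
MultiQueryRetriever creates multiple variants of a question that have a similar meaning to the original question. Given this added complexity, we include the previous evaluations and add the following:
- Multi Query Accuracy: ensures that the generated queries mean the same as the original query.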
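3. Context Compression and Reranking:
Reranking reorders nodes by their relevance to the query and keeps the top n nodes. Since the number of nodes can decrease once reranking is complete, we run the following evaluations:
Together, these evaluations ensure the robustness and effectiveness of RAG, MultiQueryRetriever, and the reranking process in the chain.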
Install Dependencies
%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank
Note: you can also install faiss-gpu instead of faiss-cpu if you want to use the GPU version of the library.
Import Libraries
from getpass import getpass
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
)
Load the Documents
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
Split the Document into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
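RecursiveCharacterTextSplitter prefers to break on paragraph and line boundaries before falling back to smaller separators, so its chunks follow the text's structure. The sketch below is deliberately simpler and only illustrates what chunk_size and chunk_overlap control. It cuts fixed character windows, and each new window starts chunk_size - chunk_overlap characters after the previous one. The function split_fixed is hypothetical and is not part of LangChain.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Hypothetical splitter: fixed character windows, each overlapping the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1) yields ["abcd", "defg", "ghij"]. With chunk_overlap=0, as in the cell above, consecutive chunks share no characters.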
Create the Retriever
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()
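db.as_retriever() returns the stored chunks whose embeddings are closest to the query embedding. As a rough illustration of that lookup, the hypothetical top_k_by_l2 function below ranks vectors by Euclidean distance with an exhaustive scan, similar in spirit to a flat FAISS index. This rests on two assumptions: that LangChain's FAISS wrapper uses Euclidean distance by default, and that k=4 matches the retriever's default number of results. Real embeddings also have far more than two dimensions.

```python
import math
from typing import List, Sequence


def top_k_by_l2(query: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[int]:
    """Return the indices of the k vectors closest to `query` by Euclidean (L2) distance."""
    # Exhaustive scan: compute every distance, then sort the indices by it.
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]
```

The returned indices would map back to the text chunks stored alongside the vectors.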
Define the LLM
llm = ChatOpenAI(temperature=0, model="gpt-4")
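Setup
UpTrain provides you with:
- Dashboards with advanced drill-down and filtering options
- Insights and common topics among failing cases
- Observability and real-time monitoring of production data
- Regression testing via seamless integration with your CI/CD pipelines
You can choose between the following options for evaluating with UpTrain:
1. UpTrain's Open-Source Software (OSS):
You can use the open-source evaluation service to evaluate your model. In this case, you need to provide an OpenAI API key, because UpTrain uses GPT models to grade the responses generated by the LLM. You can obtain a key from OpenAI.
To view your evaluations in the UpTrain dashboard, set it up by running the following commands in your terminal: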
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
This starts the UpTrain dashboard on your local machine. You can access it at http://localhost:3000/dashboard.
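The evaluation logs in this guide report "Local server not running" when no local UpTrain server is reachable. Before running the chains, you could check whether anything is listening on the dashboard's port with a small hypothetical helper such as dashboard_running below. It only tests that a TCP connection to the given host and port succeeds. It does not confirm that the process is the UpTrain dashboard. The choice of port 3000 is an assumption taken from the URL above, and UpTrain's own server may listen elsewhere.

```python
import socket


def dashboard_running(host: str = "localhost", port: int = 3000, timeout: float = 0.5) -> bool:
    """Hypothetical check: True if something accepts TCP connections on host:port."""
    try:
        # The connection is opened and immediately closed; no data is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```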
Parameters:
- key_type="openai"
- api_key="OPENAI_API_KEY"
- project_name="PROJECT_NAME"
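2. UpTrain Managed Service and Dashboard:
Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account and get free trial credits. If you need more trial credits, book a call with the UpTrain maintainers.
The benefits of using the managed service are:
- No need to set up the UpTrain dashboard on your local machine.
- Access to many LLMs without needing their API keys.
Once the evaluations are done, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard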
Parameters:
- key_type="uptrain"
- api_key="UPTRAIN_API_KEY"
- project_name="PROJECT_NAME"
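Note: project_name is the name of the project under which the evaluation results will be shown in the UpTrain dashboard.
Set the API Key
The notebook prompts you to enter an API key. You can choose between an OpenAI API key and an UpTrain API key by setting KEY_TYPE in the cell below; it is passed to the handler as key_type.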
KEY_TYPE = "openai" # or "uptrain"
API_KEY = getpass()
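The cell above passes KEY_TYPE and API_KEY to the handler, while the Parameters lists above also mention project_name. To fail early on a typo in KEY_TYPE, a small hypothetical helper such as build_handler_kwargs could assemble and check those three keyword arguments before they are passed on. It only accepts the two key_type values named in this guide. It does not verify the key itself against OpenAI or UpTrain.

```python
from typing import Dict

# The two key types described in the Setup section above.
VALID_KEY_TYPES = {"openai", "uptrain"}


def build_handler_kwargs(key_type: str, api_key: str, project_name: str) -> Dict[str, str]:
    """Hypothetical helper: validate and bundle the keyword arguments listed under Parameters."""
    if key_type not in VALID_KEY_TYPES:
        raise ValueError(f"key_type must be one of {sorted(VALID_KEY_TYPES)}, got {key_type!r}")
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"key_type": key_type, "api_key": api_key, "project_name": project_name}
```

Assuming UpTrainCallbackHandler accepts project_name, as the Parameters lists suggest, the result could be unpacked as UpTrainCallbackHandler(**build_handler_kwargs(KEY_TYPE, API_KEY, "my_project")). Here "my_project" is a placeholder name.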
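1. Vanilla RAG
The UpTrain callback handler automatically captures the query, context, and response once they are generated. It then runs the following three evaluations on the response, each graded from 0 to 1:
- Context Relevance: checks whether the retrieved context is relevant to the question.
- Factual Accuracy: checks whether the response is supported by the context, i.e. whether the LLM is hallucinating or giving incorrect information.
- Response Completeness: checks whether the response contains all the information the query asks for.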
# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)
# Create the chain
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| rag_prompt_text
| llm
| StrOutputParser()
)
# Create the uptrain callback handler
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}
# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(query, config=config)
2024-04-17 17:03:44.969 | INFO | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:05.809 | INFO | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!
Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that she is a former top litigator in private practice, a former federal public defender, and comes from a family of public school educators and police officers. He described her as a consensus builder and noted that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by both Democrats and Republicans.
Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0
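The output above prints the question, the response, and one 0-to-1 score per check. If you want to post-process results in the same shape, for example to flag low scores, a hypothetical record type like EvalRecord below mirrors that layout. It is not an UpTrain type, and any threshold you pass to below() is your own choice rather than an UpTrain default.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalRecord:
    """Hypothetical container for one graded chain run (not an UpTrain type)."""

    question: str
    response: str
    scores: dict[str, float] = field(default_factory=dict)

    def add_score(self, check: str, score: float) -> None:
        # The checks in this guide are graded from 0 to 1; reject anything else.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{check} score {score} is outside [0, 1]")
        self.scores[check] = score

    def report(self) -> str:
        # Same layout as the printed output: one "<check> Score: <value>" line per check.
        lines = [f"Question: {self.question}", f"Response: {self.response}"]
        lines += [f"{check} Score: {score}" for check, score in self.scores.items()]
        return "\n".join(lines)

    def below(self, threshold: float) -> list[str]:
        # Names of checks that scored strictly below the given threshold.
        return [check for check, score in self.scores.items() if score < threshold]
```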
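2. Multi Query Generation
MultiQueryRetriever addresses the problem that a RAG pipeline might not return the best set of documents for a query. It generates multiple queries with the same meaning as the original query and then fetches documents for each of them.
To evaluate this retriever, UpTrain runs the following evaluation:
- Multi Query Accuracy: checks whether the generated queries mean the same as the original query.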
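The retrieval side of this approach can be sketched in plain Python. In this sketch, documents are plain strings and retrieve is any function that maps one query to a list of documents. In the real chain, the query variants come from the LLM and retrieval goes through the FAISS retriever. The hypothetical multi_query_retrieve merges the per-query results and drops duplicates, so a chunk returned for several variants reaches the prompt only once. To our knowledge, LangChain's MultiQueryRetriever also deduplicates its merged results, but its exact ordering may differ from this sketch.

```python
from typing import Callable, List


def multi_query_retrieve(
    queries: List[str], retrieve: Callable[[str], List[str]]
) -> List[str]:
    """Hypothetical helper: retrieve for each query variant, then keep the unique union."""
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            # Keep only the first occurrence of each document, preserving order.
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```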
# Create the retriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)
# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}
# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)
chain = (
{"context": multi_query_retriever, "question": RunnablePassthrough()}
| rag_prompt_text
| llm
| StrOutputParser()
)
# Invoke the chain with a query
question = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(question, config=config)
2024-04-17 17:04:10.675 | INFO | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:16.804 | INFO | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!
Question: What did the president say about Ketanji Brown Jackson
Multi Queries:
- How did the president comment on Ketanji Brown Jackson?
- What were the president's remarks regarding Ketanji Brown Jackson?
- What statements has the president made about Ketanji Brown Jackson?
Multi Query Accuracy Score: 0.5
2024-04-17 17:04:22.027 | INFO | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:44.033 | INFO | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!
Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0
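3. Context Compression and Reranking
The reranking process reorders nodes by their relevance to the query and keeps the top n nodes. Since the number of nodes can decrease once reranking is complete, we run the following evaluations: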
# Create the retriever
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
# Create the chain
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}
# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)
2024-04-17 17:04:46.462 | INFO | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:53.561 | INFO | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard
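The reranking step in this section can be illustrated with a toy scorer. FlashrankRerank scores query-document pairs with a ranking model, whereas the hypothetical overlap_score below just measures shared words. The hypothetical rerank_top_n reorders the retrieved chunks by that score and truncates them to top_n. The default of 3 is an assumption modeled on FlashrankRerank's default, so confirm it in your installed version. This truncation is why the number of nodes can shrink after reranking.

```python
import re
from typing import Callable, List


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the document."""
    query_words = set(re.findall(r"\w+", query.lower()))
    doc_words = set(re.findall(r"\w+", doc.lower()))
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank_top_n(
    query: str,
    docs: List[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Reorder docs by descending relevance and keep only the top_n best."""
    # sorted() is stable, so equally scored documents keep their retrieval order.
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]
```

Swapping overlap_score for a model-based scorer would leave rerank_top_n unchanged, since it only depends on the score function's signature.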