How to add memory to chatbots
A key feature of chatbots is their ability to use the content of previous conversation turns as context. This state management can take several forms, including:
- Simply stuffing previous messages into a chat model prompt.
- The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.
- More complex modifications, such as synthesizing summaries for long conversations.
We'll go into more detail on a few of these techniques below!
This guide previously built a chatbot using RunnableWithMessageHistory. You can access that version of the guide in the v0.2 docs.
As of the LangChain v0.3 release, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.
If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan to deprecate this functionality in the near future; it works well for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.
For more details, see How to migrate to LangGraph memory.
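For reference, a minimal sketch of the legacy RunnableWithMessageHistory pattern looks like the following. It assumes the same gpt-4o-mini chat model configured in the Setup section below; the store dictionary, get_session_history function, and session_id value are illustrative only:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import HumanMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

store = {}  # maps a session_id to its chat history

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # Create a history object for new sessions and reuse it afterwards
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

legacy_chain = RunnableWithMessageHistory(
    ChatOpenAI(model="gpt-4o-mini"),
    get_session_history,
)

legacy_chain.invoke(
    [HumanMessage(content="Hi! I'm Nemo.")],
    config={"configurable": {"session_id": "legacy-1"}},
)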
Setup
You'll need to install a few packages, and set your OpenAI API key as an environment variable named OPENAI_API_KEY:
%pip install --upgrade --quiet langchain langchain-openai langgraph
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
Let's also set up the chat model that we'll use for the examples below.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o-mini")
Message passing
The simplest form of memory is simply passing chat history messages into a chain. Here's an example:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
SystemMessage(
content="You are a helpful assistant. Answer all questions to the best of your ability."
),
MessagesPlaceholder(variable_name="messages"),
]
)
chain = prompt | model
ai_msg = chain.invoke(
{
"messages": [
HumanMessage(
content="Translate from English to French: I love programming."
),
AIMessage(content="J'adore la programmation."),
HumanMessage(content="What did you just say?"),
],
}
)
print(ai_msg.content)
I said, "I love programming" in French: "J'adore la programmation."
We can see that by passing the previous conversation into a chain, the model can use it as context to answer questions. This is the basic concept underpinning chatbot memory; the rest of this guide demonstrates convenient techniques for passing or reformatting messages.
Automatic history management
The previous examples pass messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also provides a way to build applications with memory using LangGraph's persistence. You can enable persistence in a LangGraph application by providing a checkpointer when compiling the graph.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability."
)
messages = [SystemMessage(content=system_prompt)] + state["messages"]
response = model.invoke(messages)
return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
We'll pass only the latest input to the conversation here and let LangGraph track the conversation history using the checkpointer:
app.invoke(
{"messages": [HumanMessage(content="Translate to French: I love programming.")]},
config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39})]}
app.invoke(
{"messages": [HumanMessage(content="What did I just ask you?")]},
config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39}),
HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='c667529b-7c41-4cc0-9326-0af47328b816'),
AIMessage(content='You asked me to translate "I love programming" into French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 54, 'total_tokens': 67, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-134a7ea0-d3a4-4923-bd58-25e5a43f6a1f-0', usage_metadata={'input_tokens': 54, 'output_tokens': 13, 'total_tokens': 67})]}
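As a quick aside, the checkpointer keys conversation state by thread_id, so invoking the app with a different thread_id starts from an empty history and the model will not see the exchange above (a sketch; the thread_id value is just for illustration):
app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "some-other-thread"}},
)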
Modifying chat history
Modifying stored chat messages can help your chatbot handle a variety of situations. Here are some examples:
Trimming messages
LLMs and chat models have limited context windows, and even if you're not hitting the limit directly, you may want to reduce the amount of distraction the model has to deal with. One solution is to trim the history messages before passing them to the model. Let's use an example history with the app we declared above:
demo_ephemeral_chat_history = [
HumanMessage(content="Hey there! I'm Nemo."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you today?"),
AIMessage(content="Fine thanks!"),
]
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage(content="What's my name?")]
},
config={"configurable": {"thread_id": "2"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}, id='c933eca3-5fd8-4651-af16-20fe2d49c216'),
AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-a0b21acc-9dbb-4fb6-a953-392020f37d88-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}
We can see that the app remembers the preloaded name.
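If you want to verify what the checkpointer has stored for that thread, one option is to read the persisted state back with the compiled graph's get_state method (a small sketch; the thread_id matches the invocation above):
state = app.get_state({"configurable": {"thread_id": "2"}})
# Print the message types and contents accumulated for thread "2"
for message in state.values["messages"]:
    print(type(message).__name__, ":", message.content)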
But say we have a very small context window, and we want to trim the number of messages passed to the model down to only the 2 most recent ones. We can use the built-in trim_messages utility to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 "token" and keep only the last two messages:
from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
# Define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
trimmed_messages = trimmer.invoke(state["messages"])
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability."
)
messages = [SystemMessage(content=system_prompt)] + trimmed_messages
response = model.invoke(messages)
return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's call this new app and check the response:
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage(content="What is my name?")]
},
config={"configurable": {"thread_id": "3"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='a22ab7c5-8617-4821-b3e9-a9e7dca1ff78'),
AIMessage(content="I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 39, 'total_tokens': 66, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-f7b32d72-9f57-4705-be7e-43bf1c3d293b-0', usage_metadata={'input_tokens': 39, 'output_tokens': 27, 'total_tokens': 66})]}
We can see that trim_messages was called, and only the two most recent messages are passed to the model. In this case, this means the model forgot the name we gave it.
Check out our guide on trimming messages to learn more.
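If you want to trim by real model tokens instead of message count, trim_messages also accepts a chat model as its token_counter. Here is a minimal sketch; the max_tokens value of 20 is arbitrary for illustration, and exactly how many messages survive depends on the model's tokenizer:
from langchain_core.messages import trim_messages

# Trim by token count as measured by the chat model itself
token_trimmer = trim_messages(
    strategy="last",
    max_tokens=20,  # token budget rather than message count
    token_counter=model,  # use the chat model to count tokens
    include_system=True,  # keep the system message if one is present
    start_on="human",  # ensure the kept history starts on a human turn
    allow_partial=False,
)

token_trimmer.invoke(demo_ephemeral_chat_history)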
Summary memory
We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history:
demo_ephemeral_chat_history = [
HumanMessage(content="Hey there! I'm Nemo."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you today?"),
AIMessage(content="Fine thanks!"),
]
And now, let's update the model-calling function to distill previous interactions into a summary:
from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability. "
"The provided chat history includes a summary of the earlier conversation."
)
system_message = SystemMessage(content=system_prompt)
message_history = state["messages"][:-1] # exclude the most recent user input
# Summarize the messages if the chat history reaches a certain size
if len(message_history) >= 4:
last_human_message = state["messages"][-1]
# Invoke the model to generate conversation summary
summary_prompt = (
"Distill the above chat messages into a single summary message. "
"Include as many specific details as you can."
)
summary_message = model.invoke(
message_history + [HumanMessage(content=summary_prompt)]
)
# Delete messages that we no longer want to show up
delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
# Re-add user message
human_message = HumanMessage(content=last_human_message.content)
# Call the model with summary & response
response = model.invoke([system_message, summary_message, human_message])
message_updates = [summary_message, human_message, response] + delete_messages
else:
message_updates = model.invoke([system_message] + state["messages"])
return {"messages": message_updates}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's see if it remembers the name we gave it:
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage("What did I say my name was?")]
},
config={"configurable": {"thread_id": "4"}},
)
{'messages': [AIMessage(content="Nemo greeted me, and I responded positively, indicating that I'm doing well.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 60, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ee42f98d-907d-4bad-8f16-af2db789701d-0', usage_metadata={'input_tokens': 60, 'output_tokens': 16, 'total_tokens': 76}),
HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='788555ea-5b1f-4c29-a2f2-a92f15d147be'),
AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 67, 'total_tokens': 75, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-099a43bd-a284-4969-bb6f-0be486614cd8-0', usage_metadata={'input_tokens': 67, 'output_tokens': 8, 'total_tokens': 75})]}
Note that invoking the app again will keep accumulating the history until it reaches the specified number of messages (four in our case). At that point, we will generate another summary from the initial summary plus the new messages, and so on.
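For example, a follow-up call on the same thread (a sketch; output omitted) appends to the summarized history, and once the threshold is reached again a new summary will be generated:
app.invoke(
    {"messages": [HumanMessage(content="Can you repeat my name back to me in French?")]},
    config={"configurable": {"thread_id": "4"}},
)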