How to add memory to chatbots
A key feature of chatbots is their ability to use the content of previous conversation turns as context. This state management can take several forms, including:
- Simply stuffing previous messages into a chat model prompt.
- The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.
- More complex modifications, such as synthesizing summaries for long conversations.
We'll go into more detail on a few of these techniques below!
This guide previously built a chatbot using RunnableWithMessageHistory. You can access that version of the guide in the v0.2 docs.
As of the LangChain v0.3 release, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.
If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan to deprecate this functionality in the near future; it works well for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.
For more details, see How to migrate to LangGraph memory.
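For reference, a minimal sketch of the legacy RunnableWithMessageHistory pattern looks like the following. It assumes the same gpt-4o-mini chat model configured in the Setup section below; the store dictionary, get_session_history function, and session_id value are illustrative only:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import HumanMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

store = {}  # maps a session_id to its chat history

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # Create a history object for new sessions and reuse it afterwards
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

legacy_chain = RunnableWithMessageHistory(
    ChatOpenAI(model="gpt-4o-mini"),
    get_session_history,
)

legacy_chain.invoke(
    [HumanMessage(content="Hi! I'm Nemo.")],
    config={"configurable": {"session_id": "legacy-1"}},
)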
Setup
You'll need to install a few packages, and set your OpenAI API key as an environment variable named OPENAI_API_KEY:
%pip install --upgrade --quiet langchain langchain-openai langgraph
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
Let's also set up the chat model that we'll use for the examples below.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o-mini")
Message passing
The simplest form of memory is simply passing chat history messages into a chain. Here's an example:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
SystemMessage(
content="You are a helpful assistant. Answer all questions to the best of your ability."
),
MessagesPlaceholder(variable_name="messages"),
]
)
chain = prompt | model
ai_msg = chain.invoke(
{
"messages": [
HumanMessage(
content="Translate from English to French: I love programming."
),
AIMessage(content="J'adore la programmation."),
HumanMessage(content="What did you just say?"),
],
}
)
print(ai_msg.content)
I said, "I love programming" in French: "J'adore la programmation."
We can see that by passing the previous conversation into a chain, the model can use it as context to answer questions. This is the basic concept underpinning chatbot memory; the rest of this guide demonstrates convenient techniques for passing or reformatting messages.
Automatic history management
The previous examples pass messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also provides a way to build applications with memory using LangGraph's persistence. You can enable persistence in a LangGraph application by providing a checkpointer when compiling the graph.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability."
)
messages = [SystemMessage(content=system_prompt)] + state["messages"]
response = model.invoke(messages)
return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
We'll pass only the latest input to the conversation here and let LangGraph track the conversation history using the checkpointer:
app.invoke(
{"messages": [HumanMessage(content="Translate to French: I love programming.")]},
config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39})]}
app.invoke(
{"messages": [HumanMessage(content="What did I just ask you?")]},
config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39}),
HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='c667529b-7c41-4cc0-9326-0af47328b816'),
AIMessage(content='You asked me to translate "I love programming" into French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 54, 'total_tokens': 67, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-134a7ea0-d3a4-4923-bd58-25e5a43f6a1f-0', usage_metadata={'input_tokens': 54, 'output_tokens': 13, 'total_tokens': 67})]}
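As a quick aside, the checkpointer keys conversation state by thread_id, so invoking the app with a different thread_id starts from an empty history and the model will not see the exchange above (a sketch; the thread_id value is just for illustration):
app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "some-other-thread"}},
)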
Modifying chat history
Modifying stored chat messages can help your chatbot handle a variety of situations. Here are some examples:
Trimming messages
LLMs and chat models have limited context windows, and even if you're not hitting the limit directly, you may want to reduce the amount of distraction the model has to deal with. One solution is to trim the history messages before passing them to the model. Let's use an example history with the app we declared above:
demo_ephemeral_chat_history = [
HumanMessage(content="Hey there! I'm Nemo."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you today?"),
AIMessage(content="Fine thanks!"),
]
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage(content="What's my name?")]
},
config={"configurable": {"thread_id": "2"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}, id='c933eca3-5fd8-4651-af16-20fe2d49c216'),
AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-a0b21acc-9dbb-4fb6-a953-392020f37d88-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}
We can see that the app remembers the preloaded name.
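If you want to verify what the checkpointer has stored for that thread, one option is to read the persisted state back with the compiled graph's get_state method (a small sketch; the thread_id matches the invocation above):
state = app.get_state({"configurable": {"thread_id": "2"}})
# Print the message types and contents accumulated for thread "2"
for message in state.values["messages"]:
    print(type(message).__name__, ":", message.content)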
But say we have a very small context window, and we want to trim the number of messages passed to the model down to only the 2 most recent ones. We can use the built-in trim_messages utility to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 "token" and keep only the last two messages:
from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
# Define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
trimmed_messages = trimmer.invoke(state["messages"])
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability."
)
messages = [SystemMessage(content=system_prompt)] + trimmed_messages
response = model.invoke(messages)
return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's call this new app and check the response:
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage(content="What is my name?")]
},
config={"configurable": {"thread_id": "3"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='a22ab7c5-8617-4821-b3e9-a9e7dca1ff78'),
AIMessage(content="I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 39, 'total_tokens': 66, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-f7b32d72-9f57-4705-be7e-43bf1c3d293b-0', usage_metadata={'input_tokens': 39, 'output_tokens': 27, 'total_tokens': 66})]}
We can see that trim_messages was called, and only the two most recent messages are passed to the model. In this case, this means the model forgot the name we gave it.
Check out our guide on trimming messages to learn more.
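If you want to trim by real model tokens instead of message count, trim_messages also accepts a chat model as its token_counter. Here is a minimal sketch; the max_tokens value of 20 is arbitrary for illustration, and exactly how many messages survive depends on the model's tokenizer:
from langchain_core.messages import trim_messages

# Trim by token count as measured by the chat model itself
token_trimmer = trim_messages(
    strategy="last",
    max_tokens=20,  # token budget rather than message count
    token_counter=model,  # use the chat model to count tokens
    include_system=True,  # keep the system message if one is present
    start_on="human",  # ensure the kept history starts on a human turn
    allow_partial=False,
)

token_trimmer.invoke(demo_ephemeral_chat_history)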
Summary memory
We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history:
demo_ephemeral_chat_history = [
HumanMessage(content="Hey there! I'm Nemo."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you today?"),
AIMessage(content="Fine thanks!"),
]
And now, let's update the model-calling function to distill previous interactions into a summary:
from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
system_prompt = (
"You are a helpful assistant. "
"Answer all questions to the best of your ability. "
"The provided chat history includes a summary of the earlier conversation."
)
system_message = SystemMessage(content=system_prompt)
message_history = state["messages"][:-1] # exclude the most recent user input
# Summarize the messages if the chat history reaches a certain size
if len(message_history) >= 4:
last_human_message = state["messages"][-1]
# Invoke the model to generate conversation summary
summary_prompt = (
"Distill the above chat messages into a single summary message. "
"Include as many specific details as you can."
)
summary_message = model.invoke(
message_history + [HumanMessage(content=summary_prompt)]
)
# Delete messages that we no longer want to show up
delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
# Re-add user message
human_message = HumanMessage(content=last_human_message.content)
# Call the model with summary & response
response = model.invoke([system_message, summary_message, human_message])
message_updates = [summary_message, human_message, response] + delete_messages
else:
message_updates = model.invoke([system_message] + state["messages"])
return {"messages": message_updates}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's see if it remembers the name we gave it:
app.invoke(
{
"messages": demo_ephemeral_chat_history
+ [HumanMessage("What did I say my name was?")]
},
config={"configurable": {"thread_id": "4"}},
)
{'messages': [AIMessage(content="Nemo greeted me, and I responded positively, indicating that I'm doing well.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 60, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ee42f98d-907d-4bad-8f16-af2db789701d-0', usage_metadata={'input_tokens': 60, 'output_tokens': 16, 'total_tokens': 76}),
HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='788555ea-5b1f-4c29-a2f2-a92f15d147be'),
AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 67, 'total_tokens': 75, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-099a43bd-a284-4969-bb6f-0be486614cd8-0', usage_metadata={'input_tokens': 67, 'output_tokens': 8, 'total_tokens': 75})]}
Note that invoking the app again will keep accumulating the history until it reaches the specified number of messages (four in our case). At that point, we will generate another summary from the initial summary plus the new messages, and so on.
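For example, a follow-up call on the same thread (a sketch; output omitted) appends to the summarized history, and once the threshold is reached again a new summary will be generated:
app.invoke(
    {"messages": [HumanMessage(content="Can you repeat my name back to me in French?")]},
    config={"configurable": {"thread_id": "4"}},
)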