How to trim messages

Prerequisites

This guide assumes familiarity with the concepts used below: messages, chat models, chaining, and chat history.

The methods in this guide also require langchain-core>=0.2.9.

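All models have finite context windows, meaning there is a limit on how many tokens they can take as input. If you have very long messages, or a chain/agent that accumulates a long message history, you need to manage the length of the messages you pass to the model.

trim_messages can be used to reduce the size of a chat history to a specified token count or a specified message count.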

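If the trimmed chat history is passed directly back into a chat model, it should satisfy the following properties:

The resulting chat history should be valid. Usually this means that the following properties hold:
- The chat history starts with either (1) a HumanMessage or (2) a SystemMessage followed by a HumanMessage.
- The chat history ends with either a HumanMessage or a ToolMessage.
- A ToolMessage only appears after an AIMessage that involved a tool call.
This can be achieved by setting start_on="human" and end_on=("human", "tool").

It includes recent messages and drops old messages from the chat history. This can be achieved by setting strategy="last".

Usually, the new chat history should include the SystemMessage if one was present in the original chat history, since the SystemMessage carries special instructions for the chat model. When present, the SystemMessage is almost always the first message in the history. This can be achieved by setting include_system=True.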
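
The validity rules listed above can be checked mechanically. The following is a minimal sketch and not part of langchain-core: is_valid_history is a hypothetical helper, and messages are reduced to plain role strings ("system", "human", "ai", "ai_tool_call", "tool") chosen purely for illustration.

```python
def is_valid_history(roles: list[str]) -> bool:
    """Check the three validity rules on a sequence of message roles.

    Roles are illustrative strings, not langchain-core types:
    "system", "human", "ai", "ai_tool_call" (an AI message that requested
    a tool call), and "tool".
    """
    if not roles:
        return False
    # Rule 1: start with a human message, or a system message followed by one.
    starts_ok = roles[0] == "human" or roles[:2] == ["system", "human"]
    # Rule 2: end with a human message or a tool message.
    ends_ok = roles[-1] in ("human", "tool")
    # Rule 3: a tool message must directly follow an AI tool call, or another
    # tool message answering the same call.
    for i, role in enumerate(roles):
        if role == "tool":
            previous = roles[i - 1] if i > 0 else None
            if previous not in ("ai_tool_call", "tool"):
                return False
    return starts_ok and ends_ok


print(is_valid_history(["system", "human"]))              # True
print(is_valid_history(["human", "ai_tool_call", "tool"]))  # True
print(is_valid_history(["ai", "human"]))                  # False
```

A check like this can be handy in tests to confirm that a trimming configuration never produces a history the model would reject.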

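Trimming based on token count

Here we trim the chat history based on token count. The trimmed chat history will be a valid chat history that includes the SystemMessage.

To keep the most recent messages, we set strategy="last". We also set include_system=True to include the SystemMessage, and start_on="human" to make sure the resulting chat history is valid.

This is a good default configuration when using trim_messages based on token count. Remember to adjust token_counter and max_tokens for your use case.

Notice that for token_counter we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you are trimming messages to fit into the context window of that specific model. The example below uses the approximate counter count_tokens_approximately instead, which does not call a model: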

pip install -qU langchain-openai

from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
    trim_messages,
)
from langchain_core.messages.utils import count_tokens_approximately

messages = [
    SystemMessage("you're a good assistant, you always respond with a joke."),
    HumanMessage("i wonder why it's called langchain"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage("what do you call a speechless parrot"),
]


trim_messages(
    messages,
    # Keep the most recent messages that fit within max_tokens.
    strategy="last",
    # Remember to adjust based on your model
    # or else pass a custom token_counter
    token_counter=count_tokens_approximately,
    # Remember to adjust based on the desired conversation
    # length
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    allow_partial=False,
)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

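Trimming based on message count

Alternatively, we can trim the chat history based on message count by setting token_counter=len. In this case, each message counts as a single token, and max_tokens controls the maximum number of messages.

This is a good default configuration when using trim_messages based on message count. Remember to adjust max_tokens for your use case.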

trim_messages(
    messages,
    # Keep the most recent messages that fit within max_tokens.
    strategy="last",
    token_counter=len,
    # When token_counter=len, each message
    # will be counted as a single token.
    # Remember to adjust for your use case
    max_tokens=5,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
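
To see why max_tokens=5 yields four messages here, it can help to model the trimming by hand. The sketch below is a simplified stand-in, not the library's implementation: trim_last is a hypothetical function, messages are (role, content) tuples rather than real message objects, and only the strategy="last", include_system=True, and start_on="human" behaviour is modelled.

```python
from typing import Callable

# Hypothetical stand-in for real message objects: (role, content) pairs.
Message = tuple[str, str]


def trim_last(
    messages: list[Message],
    max_tokens: int,
    token_counter: Callable[[list[Message]], int],
) -> list[Message]:
    """Simplified model of strategy="last" with a kept system message."""
    system, rest = [], list(messages)
    if rest and rest[0][0] == "system":
        system, rest = rest[:1], rest[1:]
    # The system message, if any, is charged against the budget first.
    budget = max_tokens - (token_counter(system) if system else 0)
    kept: list[Message] = []
    # Walk backwards, keeping the longest suffix that fits the budget.
    for msg in reversed(rest):
        if token_counter([msg] + kept) > budget:
            break
        kept.insert(0, msg)
    # Drop leading messages until the window starts on a human message.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


history = [
    ("system", "s"),
    ("human", "h1"),
    ("ai", "a1"),
    ("human", "h2"),
    ("ai", "a2"),
    ("human", "h3"),
]
# With token_counter=len, max_tokens=5 means "at most 5 messages".
trimmed = trim_last(history, max_tokens=5, token_counter=len)
print([role for role, _ in trimmed])  # ['system', 'human', 'ai', 'human']
```

In this model the system message uses one of the five slots, the last four messages fit the remaining budget, and the leading AI message is then dropped so that the window starts on a human message.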

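Advanced usage

You can use trim_messages as a building block to create more complex processing logic.

If we want to allow splitting up the contents of a message, we can specify allow_partial=True: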

trim_messages(
    messages,
    max_tokens=56,
    strategy="last",
    token_counter=count_tokens_approximately,
    include_system=True,
    allow_partial=True,
)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
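
Note the leading newline in the AIMessage content above: the message was split and only its tail was kept. The sketch below illustrates one way such a split could work, under the assumption that text is divided at newline characters with the separators retained. split_keep_newlines, keep_tail, and the word-based counter are hypothetical names for illustration, not the library's code.

```python
from typing import Callable


def split_keep_newlines(text: str) -> list[str]:
    """Split text on newlines, keeping each newline attached to its piece."""
    parts = text.split("\n")
    return [p + "\n" for p in parts[:-1]] + parts[-1:]


def keep_tail(text: str, max_tokens: int, count: Callable[[str], int]) -> str:
    """Keep the longest run of trailing pieces whose joined text fits the budget."""
    kept: list[str] = []
    for piece in reversed(split_keep_newlines(text)):
        if count("".join([piece] + kept)) > max_tokens:
            break
        kept.insert(0, piece)
    return "".join(kept)


def word_count(text: str) -> int:
    """Crude illustrative counter: one token per whitespace-separated word."""
    return len(text.split())


content = (
    "Hmmm let me think.\n\n"
    "Why, he's probably chasing after the last cup of coffee in the office!"
)
print(repr(keep_tail(content, max_tokens=15, count=word_count)))
```

With a budget of 15 words, the final sentence (13 words) and the blank line before it fit, while the first line does not, so the kept text starts with a newline just as in the output above.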

By default, the SystemMessage is not included, so you can drop it either by setting include_system=False or by omitting the include_system argument.

trim_messages(
    messages,
    max_tokens=45,
    strategy="last",
    token_counter=count_tokens_approximately,
)

[AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

We can perform the flipped operation of getting the first max_tokens tokens by specifying strategy="first":

trim_messages(
    messages,
    max_tokens=45,
    strategy="first",
    token_counter=count_tokens_approximately,
)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]

Using ChatModel as a token counter

You can pass a ChatModel as the token counter. This will use ChatModel.get_num_tokens_from_messages. Let's demonstrate how to use it with OpenAI:

from langchain_openai import ChatOpenAI

trim_messages(
    messages,
    max_tokens=45,
    strategy="first",
    token_counter=ChatOpenAI(model="gpt-4o"),
)

API Reference: ChatOpenAI

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]

Writing a custom token counter

We can write a custom token counter function that takes in a list of messages and returns an int.

pip install -qU tiktoken

from typing import List

import tiktoken
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)


def str_token_counter(text: str) -> int:
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))


def tiktoken_counter(messages: List[BaseMessage]) -> int:
    """Approximately reproduce https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

    For simplicity only supports str Message.contents.
    """
    num_tokens = 3  # every reply is primed with <|start|>assistant<|message|>
    tokens_per_message = 3
    tokens_per_name = 1
    for msg in messages:
        if isinstance(msg, HumanMessage):
            role = "user"
        elif isinstance(msg, AIMessage):
            role = "assistant"
        elif isinstance(msg, ToolMessage):
            role = "tool"
        elif isinstance(msg, SystemMessage):
            role = "system"
        else:
            raise ValueError(f"Unsupported messages type {msg.__class__}")
        num_tokens += (
            tokens_per_message
            + str_token_counter(role)
            + str_token_counter(msg.content)
        )
        if msg.name:
            num_tokens += tokens_per_name + str_token_counter(msg.name)
    return num_tokens


trim_messages(
    messages,
    token_counter=tiktoken_counter,
    # Keep the most recent messages that fit within max_tokens.
    strategy="last",
    # Tokens are counted by tiktoken_counter.
    # Remember to adjust for your use case
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)

API Reference: BaseMessage | ToolMessage

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
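
When a tokenizer is not available, a rough character-based estimate can serve the same role, since a counter only needs to map a list of messages to an int. The sketch below is a heuristic under stated assumptions: char_budget_counter and the Msg stand-in class are hypothetical, and the ratio of 4 characters per token and the overhead of 3 tokens per message are illustrative guesses, not measured values.

```python
import math
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a message; only .content is read."""

    content: str


def char_budget_counter(
    messages: list,
    chars_per_token: float = 4.0,    # assumed average, tune per model
    overhead_per_message: int = 3,   # assumed per-message framing cost
) -> int:
    """Estimate tokens from character counts; supports str contents only."""
    total = 0
    for msg in messages:
        total += math.ceil(len(msg.content) / chars_per_token)
        total += overhead_per_message
    return total


print(char_budget_counter([Msg("abcdefgh")]))  # 2 content tokens + 3 overhead = 5
```

Because the function only reads the content attribute, it should also accept real message objects with string contents, though an estimate like this can drift noticeably from a model's true token count.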

Chaining

trim_messages can be used imperatively (as above) or declaratively, which makes it easy to compose with other components in a chain.

llm = ChatOpenAI(model="gpt-4o")

# Notice we don't pass in messages. This creates
# a RunnableLambda that takes messages as input
trimmer = trim_messages(
    token_counter=llm,
    # Keep the most recent messages that fit within max_tokens.
    strategy="last",
    # Tokens are counted by the model passed as token_counter.
    # Remember to adjust for your use case
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)

chain = trimmer | llm
chain.invoke(messages)

AIMessage(content='A "polly-no-wanna-cracker"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 32, 'total_tokens': 43, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_90d33c15d4', 'finish_reason': 'stop', 'logprobs': None}, id='run-b1f8b63b-6bc2-4df4-b3b9-dfc4e3e675fe-0', usage_metadata={'input_tokens': 32, 'output_tokens': 11, 'total_tokens': 43, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

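Looking at the LangSmith trace, we can see that the messages are trimmed before they are passed to the model: https://smith.langchain.com/public/65af12c4-c24d-4824-90f0-6547566e59bb/r

Looking at just the trimmer, we can see that it is a Runnable object that can be invoked like all Runnables: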

trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
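
The declarative form used above, where omitting the messages returns something that can be invoked later, can be pictured with plain Python. This is only an analogy under stated assumptions: keep_recent and pipe are hypothetical helpers built on functools.partial, and they do not reproduce how Runnables are actually implemented.

```python
from functools import partial
from typing import Callable, Optional


def keep_recent(messages: Optional[list] = None, *, max_messages: int):
    """Return trimmed messages, or a configured callable if messages is omitted."""
    if messages is None:
        # Declarative use: remember the configuration, wait for the input.
        return partial(keep_recent, max_messages=max_messages)
    # Imperative use: trim right away.
    return list(messages[-max_messages:]) if max_messages > 0 else []


def pipe(*steps: Callable) -> Callable:
    """Compose steps left to right, loosely mirroring `a | b` on Runnables."""

    def run(value):
        for step in steps:
            value = step(value)
        return value

    return run


trimmer = keep_recent(max_messages=2)
chain = pipe(trimmer, len)  # `len` stands in for the model call

print(trimmer(["a", "b", "c"]))  # ['b', 'c']
print(chain(["a", "b", "c"]))    # 2
```

The design point is that configuration and input are supplied at different times, which is what lets the trimmer sit in front of a model inside a chain.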

Using with ChatMessageHistory

Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_history = InMemoryChatMessageHistory(messages=messages[:-1])


def dummy_get_session_history(session_id):
    if session_id != "1":
        return InMemoryChatMessageHistory()
    return chat_history


trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=llm,
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    # start_on="human" makes sure we produce a valid chat history
    start_on="human",
)

chain = trimmer | llm
chain_with_history = RunnableWithMessageHistory(chain, dummy_get_session_history)
chain_with_history.invoke(
    [HumanMessage("what do you call a speechless parrot")],
    config={"configurable": {"session_id": "1"}},
)

API Reference: InMemoryChatMessageHistory | RunnableWithMessageHistory

AIMessage(content='A "polygon"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_c17d3befe7', 'finish_reason': 'stop', 'logprobs': None}, id='run-71d9fce6-bb0c-4bb3-acc8-d5eaee6ae7bc-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})

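Looking at the LangSmith trace, we can see that we retrieve all of the messages, but before they are passed to the model they are trimmed down to just the system message and the last human message: https://smith.langchain.com/public/17dd700b-9994-44ca-930c-116e00997315/r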
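
The distinction between what is stored and what the model sees can be simulated without any model at all. The sketch below is illustrative only: window and fake_model are hypothetical stand-ins, messages are (role, content) tuples, and trimming is done by message count rather than tokens.

```python
# Hypothetical stand-in for real message objects: (role, content) pairs.
Message = tuple[str, str]


def window(messages: list[Message], max_messages: int) -> list[Message]:
    """Keep the system message plus recent messages, starting on a human turn."""
    system = messages[:1] if messages and messages[0][0] == "system" else []
    rest = messages[len(system):]
    room = max_messages - len(system)
    tail = rest[-room:] if room > 0 else []
    while tail and tail[0][0] != "human":
        tail = tail[1:]
    return system + tail


def fake_model(messages: list[Message]) -> Message:
    """Stand-in for a chat model call; reports how many messages it received."""
    return ("ai", f"received {len(messages)} messages")


stored: list[Message] = [("system", "be brief")]
sizes = []
for turn in range(4):
    stored.append(("human", f"question {turn}"))
    model_input = window(stored, max_messages=4)  # trim only the model's view
    sizes.append(len(model_input))
    stored.append(fake_model(model_input))        # full history keeps growing

print(len(stored), sizes)  # 9 [2, 4, 4, 4]
```

The stored history grows with every turn, while the input handed to the model stays bounded, which is the same pattern visible in the trace above.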

API reference

For a complete description of all arguments, see the API reference: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html
