
ChatOpenAI

This notebook provides a quick overview for getting started with OpenAI chat models. For detailed documentation of all ChatOpenAI features and configurations, head to the API reference.

OpenAI has several chat models. You can find information about their latest models, including costs, context windows, and supported input types, in the OpenAI docs.

Azure OpenAI

Note that certain OpenAI models can also be accessed via the Microsoft Azure platform. To use the Azure OpenAI service, use the AzureChatOpenAI integration.

Overview

Integration details

Class | Package | Local | Serializable | JS support
ChatOpenAI | langchain-openai | ❌ | beta | ✅

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Setup

To access OpenAI models you'll need to create an OpenAI account, get an API key, and install the langchain-openai integration package.

Credentials

Head to https://platform.openai.com to sign up for OpenAI and generate an API key. Once you've done this, set the OPENAI_API_KEY environment variable:

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The LangChain OpenAI integration lives in the langchain-openai package:

%pip install -qU langchain-openai

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",  # if you prefer to pass the API key in directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)
API Reference: ChatOpenAI

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 31, 'total_tokens': 36}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-63219b22-03e3-4561-8cc4-78b7c7c3a3ca-0', usage_metadata={'input_tokens': 31, 'output_tokens': 5, 'total_tokens': 36})
print(ai_msg.content)
J'adore la programmation.

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
API Reference: ChatPromptTemplate
AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 26, 'total_tokens': 32}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-350585e1-16ca-4dad-9460-3d9e7e49aaf1-0', usage_metadata={'input_tokens': 26, 'output_tokens': 6, 'total_tokens': 32})

工具调用

OpenAI has a tool calling API (we use "tool calling" and "function calling" interchangeably here) that lets you describe tools and their arguments, and have the model return a JSON object with the tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.

ChatOpenAI.bind_tools()

With ChatOpenAI.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to an OpenAI tool schema, which looks like:

{
    "name": "...",
    "description": "...",
    "parameters": {...}  # JSONSchema
}

and passed in every model invocation.
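For illustration, here is a hand-written schema of that shape for a simple weather tool. The name, description, and parameter docs below are hypothetical examples; bind_tools generates this structure for you from whatever class or function you pass in.

```python
import json

# Hand-written OpenAI tool schema for a hypothetical weather tool; this is
# the shape that bind_tools produces from a Pydantic class or a function.
get_weather_schema = {
    "name": "GetWeather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            }
        },
        "required": ["location"],
    },
}

print(json.dumps(get_weather_schema, indent=2))
```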

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = llm.bind_tools([GetWeather])
ai_msg = llm_with_tools.invoke(
    "what is the weather like in San Francisco",
)
ai_msg
ai_msg
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'function': {'arguments': '{"location":"San Francisco, CA"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-1617c9b2-dda5-4120-996b-0333ed5992e2-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})

strict=True

Requires langchain-openai>=0.1.21

As of Aug 6, 2024, OpenAI supports a strict argument when calling tools that will enforce that the tool argument schema is respected by the model. See more here: https://platform.openai.com/docs/guides/function-calling

Note: If strict=True, the tool definition will also be validated, and only a subset of JSON schema is accepted. Crucially, the schema cannot have optional arguments (those with default values). Read the full docs on which schema types are supported here: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas

llm_with_tools = llm.bind_tools([GetWeather], strict=True)
ai_msg = llm_with_tools.invoke(
    "what is the weather like in San Francisco",
)
ai_msg
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'function': {'arguments': '{"location":"San Francisco, CA"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-5e3356a9-132d-4623-8e73-dd5a898cf4a6-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})

AIMessage.tool_calls

Note that the AIMessage has a tool_calls attribute. This contains the tool calls in a standardized ToolCall format that is model-provider agnostic.

ai_msg.tool_calls
[{'name': 'GetWeather',
  'args': {'location': 'San Francisco, CA'},
  'id': 'call_jUqhd8wzAIzInTJl72Rla8ht',
  'type': 'tool_call'}]

For more on binding tools and tool call outputs, head to the tool calling docs.
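As a sketch of how an application might act on these standardized tool calls, the snippet below dispatches a ToolCall dict to a local function, the way an agent loop would after reading ai_msg.tool_calls. The get_weather stub is hypothetical and only for illustration; it is not part of LangChain.

```python
# Dispatch a standardized ToolCall dict to a local Python function.
def get_weather(location: str) -> str:
    # Stub implementation for illustration only.
    return f"Sunny in {location}"


tool_registry = {"GetWeather": get_weather}

# Shape copied from the ai_msg.tool_calls output above.
tool_call = {
    "name": "GetWeather",
    "args": {"location": "San Francisco, CA"},
    "id": "call_jUqhd8wzAIzInTJl72Rla8ht",
    "type": "tool_call",
}

result = tool_registry[tool_call["name"]](**tool_call["args"])
print(result)  # Sunny in San Francisco, CA
```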

Responses API

Requires langchain-openai>=0.3.9

OpenAI supports a Responses API that is oriented toward building agentic applications. It includes a suite of built-in tools, including web and file search. It also supports management of conversation state, allowing you to continue a conversational thread without explicitly passing in previous messages, as well as output from reasoning processes.

ChatOpenAI will route to the Responses API if one of these features is used. You can also specify use_responses_api=True when instantiating ChatOpenAI.

Built-in tools

Equipping ChatOpenAI with built-in tools will ground its responses with outside information, such as context in files or on the web. The AIMessage generated by the model will include information about the built-in tool invocation.

To trigger a web search, pass {"type": "web_search_preview"} to the model as you would another tool.

Tip

You can also pass built-in tools as invocation params:

llm.invoke("...", tools=[{"type": "web_search_preview"}])
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What was a positive news story from today?")
API Reference: ChatOpenAI

Note that the response includes structured content blocks containing both the text of the response and OpenAI annotations citing its sources:

response.content
[{'type': 'text',
'text': 'Today, a heartwarming story emerged from Minnesota, where a group of high school robotics students built a custom motorized wheelchair for a 2-year-old boy named Cillian Jackson. Born with a genetic condition that limited his mobility, Cillian\'s family couldn\'t afford the $20,000 wheelchair he needed. The students at Farmington High School\'s Rogue Robotics team took it upon themselves to modify a Power Wheels toy car into a functional motorized wheelchair for Cillian, complete with a joystick, safety bumpers, and a harness. One team member remarked, "I think we won here more than we do in our competitions. Instead of completing a task, we\'re helping change someone\'s life." ([boredpanda.com](https://www.boredpanda.com/wholesome-global-positive-news/?utm_source=openai))\n\nThis act of kindness highlights the profound impact that community support and innovation can have on individuals facing challenges. ',
'annotations': [{'end_index': 778,
'start_index': 682,
'title': '“Global Positive News”: 40 Posts To Remind Us There’s Good In The World',
'type': 'url_citation',
'url': 'https://www.boredpanda.com/wholesome-global-positive-news/?utm_source=openai'}]}]
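Since these content blocks are plain dictionaries, extracting the cited URLs is ordinary list processing. A small sketch using the shape shown above (with the text abbreviated):

```python
# Content blocks copied (and abbreviated) from the sample response above.
content = [
    {
        "type": "text",
        "text": "Today, a heartwarming story emerged from Minnesota...",
        "annotations": [
            {
                "type": "url_citation",
                "start_index": 682,
                "end_index": 778,
                "title": "Global Positive News",
                "url": "https://www.boredpanda.com/wholesome-global-positive-news/?utm_source=openai",
            }
        ],
    }
]

# Collect every URL citation across all content blocks.
urls = [
    annotation["url"]
    for block in content
    for annotation in block.get("annotations", [])
    if annotation["type"] == "url_citation"
]
print(urls)
```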
Tip

You can recover just the text content of the response as a string by using response.text(). For example, to stream the response text:

for token in llm_with_tools.stream("..."):
    print(token.text(), end="|")

See the streaming guide for more detail.

The output message will also contain information from any tool invocations:

response.additional_kwargs
{'tool_outputs': [{'id': 'ws_67d192aeb6cc81918e736ad4a57937570d6f8507990d9d71',
'status': 'completed',
'type': 'web_search_call'}]}

To trigger a file search, pass a file search tool to the model as you would another tool. You will need to populate an OpenAI-managed vector store and include the vector store ID in the tool definition. See the OpenAI documentation for more detail.

llm = ChatOpenAI(model="gpt-4o-mini")

openai_vector_store_ids = [
    "vs_...",  # your IDs here
]

tool = {
    "type": "file_search",
    "vector_store_ids": openai_vector_store_ids,
}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What is deep research by OpenAI?")
print(response.text())
Deep Research by OpenAI is a new capability integrated into ChatGPT that allows for the execution of multi-step research tasks independently. It can synthesize extensive amounts of online information and produce comprehensive reports similar to what a research analyst would do, significantly speeding up processes that would typically take hours for a human.

### Key Features:
- **Independent Research**: Users simply provide a prompt, and the model can find, analyze, and synthesize information from hundreds of online sources.
- **Multi-Modal Capabilities**: The model is also able to browse user-uploaded files, plot graphs using Python, and embed visualizations in its outputs.
- **Training**: Deep Research has been trained using reinforcement learning on real-world tasks that require extensive browsing and reasoning.

### Applications:
- Useful for professionals in sectors like finance, science, policy, and engineering, enabling them to obtain accurate and thorough research quickly.
- It can also be beneficial for consumers seeking personalized recommendations on complex purchases.

### Limitations:
Although Deep Research presents significant advancements, it has some limitations, such as the potential to hallucinate facts or struggle with authoritative information.

Deep Research aims to facilitate access to thorough and documented information, marking a significant step toward the broader goal of developing artificial general intelligence (AGI).

As with web search, the response will include content blocks with citations:

response.content[0]["annotations"][:2]
[{'file_id': 'file-3UzgX7jcC8Dt9ZAFzywg5k',
'index': 346,
'type': 'file_citation',
'filename': 'deep_research_blog.pdf'},
{'file_id': 'file-3UzgX7jcC8Dt9ZAFzywg5k',
'index': 575,
'type': 'file_citation',
'filename': 'deep_research_blog.pdf'}]

It will also include information from the built-in tool invocation:

response.additional_kwargs
{'tool_outputs': [{'id': 'fs_67d196fbb83c8191ba20586175331687089228ce932eceb1',
'queries': ['What is deep research by OpenAI?'],
'status': 'completed',
'type': 'file_search_call'}]}

Computer use

ChatOpenAI supports the "computer-use-preview" model, a specialized model for the built-in computer use tool. To enable this feature, pass a computer use tool as you would pass another tool.

Currently, tool outputs for computer use are present in AIMessage.additional_kwargs["tool_outputs"]. To reply to the computer use tool call, construct a ToolMessage with {"type": "computer_call_output"} in its additional_kwargs. The content of the message will be a screenshot. Below, we demonstrate a simple example.

First, load two screenshots:

import base64


def load_png_as_base64(file_path):
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode("utf-8")


screenshot_1_base64 = load_png_as_base64(
    "/path/to/screenshot_1.png"
)  # perhaps a screenshot of an application
screenshot_2_base64 = load_png_as_base64(
    "/path/to/screenshot_2.png"
)  # perhaps a screenshot of the Desktop
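To sanity-check the helper offline, you can round-trip some throwaway bytes through it; the PNG magic bytes below stand in for a real screenshot (this self-test is an illustration, not part of the original notebook):

```python
import base64
import os
import tempfile


def load_png_as_base64(file_path):
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode("utf-8")


# Write throwaway bytes (PNG magic number + filler) in place of a real image.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"\x89PNG\r\n\x1a\nfake-image-bytes")
    path = f.name

encoded = load_png_as_base64(path)
# Base64 encoding is lossless: decoding restores the original bytes.
assert base64.b64decode(encoded).startswith(b"\x89PNG")
os.remove(path)
```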
from langchain_openai import ChatOpenAI

# Initialize model
llm = ChatOpenAI(
    model="computer-use-preview",
    model_kwargs={"truncation": "auto"},
)

# Bind computer-use tool
tool = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}
llm_with_tools = llm.bind_tools([tool])

# Construct input message
input_message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": (
                "Click the red X to close and reveal my Desktop. "
                "Proceed, no confirmation needed."
            ),
        },
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_1_base64}",
        },
    ],
}

# Invoke model
response = llm_with_tools.invoke(
    [input_message],
    reasoning={
        "generate_summary": "concise",
    },
)
API Reference: ChatOpenAI

The response will include a call to the computer-use tool in its additional_kwargs:

response.additional_kwargs
{'reasoning': {'id': 'rs_67ddb381c85081919c46e3e544a161e8051ff325ba1bad35',
'summary': [{'text': 'Closing Visual Studio Code application',
'type': 'summary_text'}],
'type': 'reasoning'},
'tool_outputs': [{'id': 'cu_67ddb385358c8191bf1a127b71bcf1ea051ff325ba1bad35',
'action': {'button': 'left', 'type': 'click', 'x': 17, 'y': 38},
'call_id': 'call_Ae3Ghz8xdqZQ01mosYhXXMho',
'pending_safety_checks': [],
'status': 'completed',
'type': 'computer_call'}]}

Next, we construct a ToolMessage with these properties:

  1. It has a tool_call_id matching the call_id from the computer call.
  2. It has {"type": "computer_call_output"} in its additional_kwargs.
  3. Its content is either an image_url or an input_image output block (see the OpenAI docs for formatting).
from langchain_core.messages import ToolMessage

tool_call_id = response.additional_kwargs["tool_outputs"][0]["call_id"]

tool_message = ToolMessage(
    content=[
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_2_base64}",
        }
    ],
    # content=f"data:image/png;base64,{screenshot_2_base64}",  # <-- also acceptable
    tool_call_id=tool_call_id,
    additional_kwargs={"type": "computer_call_output"},
)
API Reference: ToolMessage

We can now invoke the model again using the message history:

messages = [
    input_message,
    response,
    tool_message,
]

response_2 = llm_with_tools.invoke(
    messages,
    reasoning={
        "generate_summary": "concise",
    },
)
response_2.text()
'Done! The Desktop is now visible.'

Instead of passing back the entire sequence, we can also use the previous_response_id:

previous_response_id = response.response_metadata["id"]

response_2 = llm_with_tools.invoke(
    [tool_message],
    previous_response_id=previous_response_id,
    reasoning={
        "generate_summary": "concise",
    },
)
response_2.text()
'The Visual Studio Code terminal has been closed and your desktop is now visible.'

Managing conversation state

The Responses API supports management of conversation state.

Manually manage state

You can manage state manually or using LangGraph, as with other chat models:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])

first_query = "What was a positive news story from today?"
messages = [{"role": "user", "content": first_query}]

response = llm_with_tools.invoke(messages)
response_text = response.text()
print(f"{response_text[:100]}... {response_text[-100:]}")
API Reference: ChatOpenAI
As of March 12, 2025, here are some positive news stories that highlight recent uplifting events:

*... exemplify positive developments in health, environmental sustainability, and community well-being.
second_query = (
    "Repeat my question back to me, as well as the last sentence of your answer."
)

messages.extend(
    [
        response,
        {"role": "user", "content": second_query},
    ]
)
second_response = llm_with_tools.invoke(messages)
print(second_response.text())
print(second_response.text())
Your question was: "What was a positive news story from today?"

The last sentence of my answer was: "These stories exemplify positive developments in health, environmental sustainability, and community well-being."
Tip

You can use LangGraph to manage conversational threads for you in a variety of backends, including in-memory and Postgres. See this tutorial to get started.

Passing previous_response_id

When using the Responses API, LangChain messages will include an "id" field in their metadata. Passing this ID to subsequent invocations will continue the conversation. Note that this is equivalent to manually passing in messages from a billing perspective.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    use_responses_api=True,
)
response = llm.invoke("Hi, I'm Bob.")
print(response.text())
API Reference: ChatOpenAI
Hi Bob! How can I assist you today?
second_response = llm.invoke(
    "What is my name?",
    previous_response_id=response.response_metadata["id"],
)
print(second_response.text())
Your name is Bob. How can I help you today, Bob?

Reasoning output

Some OpenAI models will generate separate text content illustrating their reasoning process. See OpenAI's reasoning documentation for details.

OpenAI can return a summary of the model's reasoning (although it does not expose the raw reasoning tokens). To configure ChatOpenAI to return this summary, specify the reasoning parameter:

from langchain_openai import ChatOpenAI

reasoning = {
    "effort": "medium",  # 'low', 'medium', or 'high'
    "summary": "auto",  # 'detailed', 'auto', or None
}

llm = ChatOpenAI(
    model="o4-mini",
    use_responses_api=True,
    model_kwargs={"reasoning": reasoning},
)
response = llm.invoke("What is 3^3?")

# Output
response.text()
API Reference: ChatOpenAI
'3^3 = 3 × 3 × 3 = 27.'
# Reasoning
reasoning = response.additional_kwargs["reasoning"]
for block in reasoning["summary"]:
    print(block["text"])
**Calculating power of three**

The user is asking for the result of 3 to the power of 3, which I know is 27. It's a straightforward question, so I’ll keep my answer concise: 27. I could explain that this is the same as multiplying 3 by itself twice: 3 × 3 × 3 equals 27. However, since the user likely just needs the answer, I’ll simply respond with 27.

Fine-tuning

You can call fine-tuned OpenAI models by passing in your corresponding model_name parameter.

This generally takes the form ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID}. For example:

fine_tuned_model = ChatOpenAI(
    temperature=0, model_name="ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR"
)

fine_tuned_model.invoke(messages)
AIMessage(content="J'adore la programmation.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 31, 'total_tokens': 39}, 'model_name': 'ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-0f39b30e-c56e-4f3b-af99-5c948c984146-0', usage_metadata={'input_tokens': 31, 'output_tokens': 8, 'total_tokens': 39})
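The identifier is colon-delimited, so its parts can be recovered with a simple split. The parse_ft_model_name helper below is purely illustrative; it is not part of langchain-openai.

```python
def parse_ft_model_name(name: str) -> dict:
    """Split an identifier of the form ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID}."""
    # The double colon before the model ID yields an empty field we discard.
    prefix, base_model, org_name, _, model_id = name.split(":")
    if prefix != "ft":
        raise ValueError(f"not a fine-tuned model identifier: {name!r}")
    return {"base_model": base_model, "org_name": org_name, "model_id": model_id}


print(parse_ft_model_name("ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR"))
# {'base_model': 'gpt-3.5-turbo-0613', 'org_name': 'langchain', 'model_id': '7qTVM5AR'}
```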

Multimodal inputs

OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the multimodal inputs docs.

You can see the list of models that support different modalities in OpenAI's documentation.

At the time of this doc's writing, the main OpenAI models you would use are:

  • Image inputs: gpt-4o, gpt-4o-mini
  • Audio inputs: gpt-4o-audio-preview

For an example of passing in image inputs, see the multimodal inputs how-to guide.

Below is an example of passing audio inputs to gpt-4o-audio-preview:

import base64

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-audio-preview",
    temperature=0,
)

with open(
    "../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav",
    "rb",
) as f:
    # b64 encode it
    audio = f.read()
    audio_b64 = base64.b64encode(audio).decode()


output_message = llm.invoke(
    [
        (
            "human",
            [
                {"type": "text", "text": "Transcribe the following:"},
                # the audio clip says "I'm sorry, but I can't create..."
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        ),
    ]
)
output_message.content
API Reference: ChatOpenAI
"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?"

Predicted output

Info

Requires langchain-openai>=0.2.6

Some OpenAI models (such as their gpt-4o and gpt-4o-mini series) support Predicted Outputs, which allow you to pass in a known portion of the LLM's expected output ahead of time to reduce latency. This is useful for cases such as editing text or code, where only a small portion of the model's output will change.

Here's an example:

code = """
/// <summary>
/// Represents a user with a first name, last name, and username.
/// </summary>
public class User
{
    /// <summary>
    /// Gets or sets the user's first name.
    /// </summary>
    public string FirstName { get; set; }

    /// <summary>
    /// Gets or sets the user's last name.
    /// </summary>
    public string LastName { get; set; }

    /// <summary>
    /// Gets or sets the user's username.
    /// </summary>
    public string Username { get; set; }
}
"""

llm = ChatOpenAI(model="gpt-4o")
query = (
    "Replace the Username property with an Email property. "
    "Respond only with code, and with no markdown formatting."
)
response = llm.invoke(
    [{"role": "user", "content": query}, {"role": "user", "content": code}],
    prediction={"type": "content", "content": code},
)
print(response.content)
print(response.response_metadata)
/// <summary>
/// Represents a user with a first name, last name, and email.
/// </summary>
public class User
{
    /// <summary>
    /// Gets or sets the user's first name.
    /// </summary>
    public string FirstName { get; set; }

    /// <summary>
    /// Gets or sets the user's last name.
    /// </summary>
    public string LastName { get; set; }

    /// <summary>
    /// Gets or sets the user's email.
    /// </summary>
    public string Email { get; set; }
}
{'token_usage': {'completion_tokens': 226, 'prompt_tokens': 166, 'total_tokens': 392, 'completion_tokens_details': {'accepted_prediction_tokens': 49, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 107}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_45cf54deae', 'finish_reason': 'stop', 'logprobs': None}

Note that currently predictions are billed as additional tokens and may increase your usage and costs in exchange for this reduced latency.

Audio Generation (Preview)

Info

Requires langchain-openai>=0.2.3

OpenAI has a new audio generation feature that allows you to use audio inputs and outputs with the gpt-4o-audio-preview model.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-audio-preview",
    temperature=0,
    model_kwargs={
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},
    },
)

output_message = llm.invoke(
    [
        ("human", "Are you made by OpenAI? Just answer yes or no"),
    ]
)
API Reference: ChatOpenAI

output_message.additional_kwargs['audio'] will contain a dictionary like

{
    'data': '<audio data b64-encoded>',
    'expires_at': 1729268602,
    'id': 'audio_67127d6a44348190af62c1530ef0955a',
    'transcript': 'Yes.'
}

The format will be whatever is passed in model_kwargs['audio']['format'].

We can also pass this message with audio data back to the model as part of a message history before OpenAI's expires_at is reached.
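A minimal sketch of guarding on that expiry before replaying the audio in history. The payload shape mirrors the example above, and audio_is_reusable is a hypothetical helper, not part of langchain-openai.

```python
import time

# Payload shape mirroring the audio dict shown above.
audio_payload = {
    "data": "<audio data b64-encoded>",
    "expires_at": 1729268602,
    "id": "audio_67127d6a44348190af62c1530ef0955a",
    "transcript": "Yes.",
}


def audio_is_reusable(payload, now=None):
    """True if the audio may still be passed back in message history."""
    now = time.time() if now is None else now
    return now < payload["expires_at"]


# Before the expires_at timestamp the audio can be replayed; after, it cannot.
print(audio_is_reusable(audio_payload, now=0))  # True
print(audio_is_reusable(audio_payload, now=2_000_000_000))  # False
```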

Note

Output audio is stored under the audio key in AIMessage.additional_kwargs, but input content blocks are typed with an input_audio type and key in HumanMessage.content lists.

For more information, see OpenAI's audio docs.

history = [
    ("human", "Are you made by OpenAI? Just answer yes or no"),
    output_message,
    ("human", "And what is your name? Just give your name."),
]
second_output_message = llm.invoke(history)

Flex processing

OpenAI offers a variety of service tiers. The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.

To use it, initialize the model with service_tier="flex":

llm = ChatOpenAI(model="o4-mini", service_tier="flex")

Note that this is a beta feature that is only available for a subset of models. See OpenAI's docs for more detail.

API reference

For detailed documentation of all ChatOpenAI features and configurations, head to the API reference.