Hugging Face Endpoints

The Hugging Face Hub is an online platform hosting over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, where people can easily collaborate and build ML together.

The Hugging Face Hub also offers various endpoints for building ML applications. This example shows how to connect to the different endpoint types.

In particular, text generation is powered by Text Generation Inference (TGI): a custom-built Rust, Python, and gRPC server for fast text generation inference.

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

Installation and Setup

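To use this integration, install the huggingface_hub Python package, along with langchain-huggingface, which provides the HuggingFaceEndpoint class imported above.

%pip install --upgrade --quiet huggingface_hub langchain-huggingface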

# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
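If the token is already exported in your shell, prompting for it again is unnecessary. A minimal sketch of that variation follows; the helper name `get_hf_token` is hypothetical and not part of any library.

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # Hidden prompt, so the token does not end up in notebook output.
        token = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = token
    return token
```

Calling `get_hf_token()` once at the top of a notebook would then replace the explicit `getpass()` call above.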

Prepare Examples

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

from langchain_core.prompts import PromptTemplate

API Reference: PromptTemplate

question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
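To see the text the model will actually receive, it can help to render the template by hand. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate`, on the assumption that the template uses the default f-string style with a single `{question}` placeholder.

```python
question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

# str.format fills the {question} placeholder the same way an
# f-string-style prompt template would for this simple case.
rendered = template.format(question=question)
print(rendered)
```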

Examples

Here is an example of how to use the HuggingFaceEndpoint integration with the free Serverless Endpoints API.

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
)
llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))

Dedicated Endpoint

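The free serverless API lets you implement solutions and iterate quickly, but it may be rate limited for heavy use cases, because the load is shared with other requests.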
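One common way to cope with occasional rate limiting is to retry with exponential backoff. The helper below is a generic sketch, not part of LangChain or huggingface_hub; the name `call_with_backoff` and its defaults are assumptions, and you would narrow `retry_on` to whatever exception your client actually raises.

```python
import time


def call_with_backoff(fn, *, retries=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before the next try.
            sleep(base_delay * 2 ** attempt)
```

With the chain defined earlier, a call might look like `call_with_backoff(lambda: llm_chain.invoke({"question": question}))`.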

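For enterprise workloads, the best option is Inference Endpoints - Dedicated. This gives access to fully managed infrastructure that offers more flexibility and speed. These resources come with continuous support and uptime guarantees, as well as options such as autoscaling.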

# Set the url to your Inference Endpoint below
your_endpoint_url = "https://fayjubiy2xqn36z0.us-east-1.aws.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=f"{your_endpoint_url}",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
llm.invoke("What did foo say about bar?")

Streaming

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url=f"{your_endpoint_url}",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

API Reference: StreamingStdOutCallbackHandler | HuggingFaceEndpoint
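As an alternative to a callback handler, LangChain LLMs also expose a `stream()` method that yields text chunks. The helper below is a small sketch built on that method; the name `stream_to_stdout` is hypothetical, and it assumes each chunk is a plain string.

```python
def stream_to_stdout(llm, prompt: str) -> str:
    """Print chunks as they arrive and return the full generated text."""
    pieces = []
    for chunk in llm.stream(prompt):
        # Flush so partial output shows up immediately.
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(pieces)
```

For example, `stream_to_stdout(llm, "What did foo say about bar?")` would print tokens as they arrive and return the assembled answer.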

The same HuggingFaceEndpoint class can be used with a local Hugging Face TGI instance serving the LLM. See the TGI repository for details on supported hardware (GPU, TPU, Gaudi, and others).
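A rough sketch of pointing the class at a local TGI server follows. The host, port, and helper names are assumptions for illustration; use whatever address your TGI container actually listens on. The sampling parameters are copied from the dedicated endpoint example above.

```python
def local_tgi_kwargs(host: str = "localhost", port: int = 8080) -> dict:
    """Build constructor arguments for a TGI server at host:port (assumed address)."""
    return {
        "endpoint_url": f"http://{host}:{port}/",
        "max_new_tokens": 512,
        "temperature": 0.01,
        "repetition_penalty": 1.03,
    }


def build_local_llm(**overrides):
    """Create a HuggingFaceEndpoint for the local server, applying any overrides."""
    # Imported lazily so the kwargs helper works without the package installed.
    from langchain_huggingface import HuggingFaceEndpoint

    return HuggingFaceEndpoint(**{**local_tgi_kwargs(), **overrides})
```

With a server running, `build_local_llm().invoke("What did foo say about bar?")` would send the prompt to the local instance instead of a hosted endpoint.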

