
IPEX-LLM: Local BGE Embeddings on Intel GPU

IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.

This example goes over how to use LangChain to conduct embedding tasks with ipex-llm optimizations on Intel GPU. This would be helpful in applications such as RAG, document QA, etc.

Note

It is recommended that only Windows users with Intel Arc A-Series GPU (except for Intel Arc A300-Series or Pro A60) run this Jupyter notebook directly. For other cases (e.g. Linux users, Intel iGPU, etc.), it is recommended to run the code with Python scripts in terminal for best experiences.

Install Prerequisites

To benefit from IPEX-LLM on Intel GPUs, there are several prerequisite steps for tools installation and environment preparation.

If you are a Windows user, visit the Install IPEX-LLM on Windows with Intel GPU Guide, and follow Install Prerequisites to update your GPU driver (optional) and install Conda.

If you are a Linux user, visit the Install IPEX-LLM on Linux with Intel GPU guide, and follow Install Prerequisites to install the GPU driver, the Intel® oneAPI Base Toolkit 2024.0, and Conda.

Setup

After the prerequisites are installed, you should have created a conda environment with all of them in place. Start the Jupyter service in this conda environment, then install the LangChain packages:

%pip install -qU langchain langchain-community

Install IPEX-LLM for optimizations on Intel GPU, as well as sentence-transformers:

%pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
%pip install sentence-transformers

Note

You can also use https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ as the extra-index-url.

Runtime Configuration

For optimal performance, it is recommended to set several environment variables based on your device:

For Windows users with Intel Core Ultra integrated GPU

import os

os.environ["SYCL_CACHE_PERSISTENT"] = "1"
os.environ["BIGDL_LLM_XMX_DISABLED"] = "1"

For Windows users with Intel Arc A-Series GPU

import os

os.environ["SYCL_CACHE_PERSISTENT"] = "1"

Note

For the first time that each model runs on Intel iGPU/Intel Arc A300-Series or Pro A60, it may take several minutes to compile.

For other GPU types, please refer to here for Windows users, and here for Linux users.
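Combining the two cases above, a minimal sketch that applies the recommended variables might look like the following. The `USING_CORE_ULTRA_IGPU` flag is a hypothetical switch you set by hand according to your hardware:

```python
import os

# Recommended for all Intel GPUs: persist the compiled SYCL kernel
# cache across runs to avoid recompilation on every start.
os.environ["SYCL_CACHE_PERSISTENT"] = "1"

# Hypothetical flag: set it yourself based on your hardware.
USING_CORE_ULTRA_IGPU = True

if USING_CORE_ULTRA_IGPU:
    # Recommended only for Intel Core Ultra integrated GPUs.
    os.environ["BIGDL_LLM_XMX_DISABLED"] = "1"
```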

Basic Usage

Setting device to "xpu" in model_kwargs when initializing IpexLLMBgeEmbeddings will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:

from langchain_community.embeddings import IpexLLMBgeEmbeddings

embedding_model = IpexLLMBgeEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",
    model_kwargs={"device": "xpu"},
    encode_kwargs={"normalize_embeddings": True},
)

API Reference

sentence = "IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency."
query = "What is IPEX-LLM?"

text_embeddings = embedding_model.embed_documents([sentence, query])
print(f"text_embeddings[0][:10]: {text_embeddings[0][:10]}")
print(f"text_embeddings[1][:10]: {text_embeddings[1][:10]}")

query_embedding = embedding_model.embed_query(query)
print(f"query_embedding[:10]: {query_embedding[:10]}")
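Since encode_kwargs sets normalize_embeddings=True, the returned vectors are unit-length, so cosine similarity reduces to a dot product. As a minimal sketch of how you might rank documents against a query, the snippet below uses toy 2-dimensional vectors in place of real BGE embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embed_documents / embed_query outputs.
doc_vectors = [[0.6, 0.8], [1.0, 0.0]]
query_vector = [0.8, 0.6]

# Score each document against the query and rank highest first.
scores = [round(cosine_similarity(v, query_vector), 4) for v in doc_vectors]
ranked = sorted(range(len(doc_vectors)), key=lambda i: -scores[i])
print(scores)  # [0.96, 0.8]
print(ranked)  # [0, 1]
```

With real embeddings you would pass the lists returned by embed_documents and embed_query in place of the toy vectors above.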