Nuclia
Nuclia automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.
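The Nuclia Understanding API document transformer splits text into paragraphs and sentences, identifies entities, provides a summary of the text, and generates embeddings for all the sentences.
To use the Nuclia Understanding API, you need a Nuclia account. You can create one for free at https://nuclia.cloud, and then create a NUA key.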
%pip install --upgrade --quiet protobuf
%pip install --upgrade --quiet nucliadb-protos
import os
os.environ["NUCLIA_ZONE"] = "<YOUR_ZONE>" # e.g. europe-1
os.environ["NUCLIA_NUA_KEY"] = "<YOUR_API_KEY>"
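A forgotten or still-placeholder value in either variable is easy to miss until the first API call fails. The following is a minimal sketch of a pre-flight check. The helper name `missing_nuclia_settings` is hypothetical and not part of LangChain or Nuclia, and treating angle-bracketed values as placeholders is an assumption based on the template above.

```python
import os

# The two variables configured above.
REQUIRED_VARS = ("NUCLIA_ZONE", "NUCLIA_NUA_KEY")


def missing_nuclia_settings(env=None):
    """Return the required variable names that are unset or still placeholders.

    Hypothetical helper for illustration; not part of any library.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # Assumption: a value such as "<YOUR_ZONE>" is an unedited template.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_nuclia_settings()
    if problems:
        print("Set these variables before continuing:", ", ".join(problems))
```

Running the check before instantiating the tool lets configuration mistakes surface with a clear message rather than as a failed request.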
To use the Nuclia document transformer, you need to instantiate a NucliaUnderstandingAPI tool with enable_ml set to True:
from langchain_community.tools.nuclia import NucliaUnderstandingAPI
nua = NucliaUnderstandingAPI(enable_ml=True)
API Reference: NucliaUnderstandingAPI
The Nuclia document transformer must be called in async mode, so you need to use the atransform_documents method:
import asyncio

from langchain_community.document_transformers.nuclia_text_transform import (
    NucliaTextTransformer,
)
from langchain_core.documents import Document


async def process():
    documents = [
        Document(page_content="<TEXT 1>", metadata={}),
        Document(page_content="<TEXT 2>", metadata={}),
        Document(page_content="<TEXT 3>", metadata={}),
    ]
    nuclia_transformer = NucliaTextTransformer(nua)
    transformed_documents = await nuclia_transformer.atransform_documents(documents)
    print(transformed_documents)


asyncio.run(process())
API Reference: NucliaTextTransformer | Document
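The async calling pattern used above can be tried without a Nuclia account. The sketch below uses stand-ins: `FakeDocument` and `FakeAsyncTransformer` are hypothetical classes written for this illustration, they make no network calls, and the `length` metadata key is invented. It does not reflect what Nuclia actually returns. It only shows how an async-only `atransform_documents` method is awaited and driven from synchronous code.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeAsyncTransformer:
    """Hypothetical async-only transformer; performs no real API calls."""

    async def _annotate(self, doc):
        # Yield to the event loop, as a real network request would.
        await asyncio.sleep(0)
        doc.metadata["length"] = len(doc.page_content)  # invented example field
        return doc

    async def atransform_documents(self, documents):
        # Process all documents concurrently; gather preserves input order.
        results = await asyncio.gather(*(self._annotate(d) for d in documents))
        return list(results)


async def process_all(texts):
    docs = [FakeDocument(page_content=t) for t in texts]
    return await FakeAsyncTransformer().atransform_documents(docs)


if __name__ == "__main__":
    # asyncio.run starts a new event loop, which works in a plain script.
    for doc in asyncio.run(process_all(["first text", "second, longer text"])):
        print(doc.metadata)
```

In a Jupyter notebook an event loop is already running, so `asyncio.run(...)` raises a `RuntimeError` there; awaiting the coroutine directly in a cell (`await process_all([...])`) is the usual alternative.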