Doctran

Doctran is a Python package that uses LLMs and open-source NLP libraries to transform raw text into clean, structured, information-dense documents optimized for vector space retrieval. You can think of Doctran as a black box: messy strings go in, and clean, labelled strings come out.

Installation and Setup

pip install doctran

Document Transformers

Document Interrogator

See a usage example for DoctranQATransformer.

from langchain_community.document_transformers import DoctranQATransformer
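As a rough illustration of how the interrogator might be used, the sketch below wraps it in a helper that attaches generated question/answer pairs to each document. The helper name `add_questions_and_answers` and the default model name are assumptions made for this example, and the code assumes an OpenAI API key is available in the `OPENAI_API_KEY` environment variable.

```python
from typing import List


def add_questions_and_answers(texts: List[str], model: str = "gpt-3.5-turbo") -> list:
    """Return the Q&A pairs Doctran generates for each input text."""
    # Imported lazily so this module still loads when the optional
    # dependencies (langchain-community, doctran) are not installed.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranQATransformer

    documents = [Document(page_content=text) for text in texts]

    # The API key is read from OPENAI_API_KEY when not passed explicitly.
    # The model name is an assumption; substitute one you have access to.
    transformer = DoctranQATransformer(openai_api_model=model)
    transformed = transformer.transform_documents(documents)

    # Each transformed document is expected to carry its generated pairs
    # under the "questions_and_answers" metadata key.
    return [doc.metadata.get("questions_and_answers") for doc in transformed]
```

Calling `add_questions_and_answers(["Some long passage of text."])` sends the passage to the OpenAI API, so it needs network access and incurs API usage.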

Property Extractor

See a usage example for DoctranPropertyExtractor.

from langchain_community.document_transformers import DoctranPropertyExtractor
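The property extractor takes a list of property definitions that describe what to pull out of the text. The sketch below is one possible setup: the `PROPERTIES` schema, the helper name `extract_properties`, and the default model name are all illustrative assumptions. Because releases differ in whether the synchronous or asynchronous method is implemented, the helper tries the synchronous call first and falls back to the asynchronous one.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical schema: property names, descriptions, and enum values are
# illustrative and should be adapted to your own documents.
PROPERTIES: List[Dict[str, Any]] = [
    {
        "name": "category",
        "description": "What type of document this is.",
        "type": "string",
        "enum": ["update", "action_item", "customer_feedback", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "People mentioned in the text.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
]


def extract_properties(text: str, model: str = "gpt-3.5-turbo") -> Any:
    """Return the properties Doctran extracts from a single text."""
    # Imported lazily so this module still loads without the optional
    # dependencies (langchain-community, doctran).
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranPropertyExtractor

    # The model name is an assumption; the API key comes from OPENAI_API_KEY.
    extractor = DoctranPropertyExtractor(properties=PROPERTIES, openai_api_model=model)
    documents = [Document(page_content=text)]
    try:
        transformed = extractor.transform_documents(documents)
    except NotImplementedError:
        # Fall back for releases that only implement the async variant.
        transformed = asyncio.run(extractor.atransform_documents(documents))

    # Extracted values are expected under the "extracted_properties" key.
    return transformed[0].metadata.get("extracted_properties")
```

Keeping the schema in a module-level constant makes it easy to review and reuse across calls, since the schema is the main thing that changes between use cases.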

Document Translator

See a usage example for DoctranTextTranslator.

from langchain_community.document_transformers import DoctranTextTranslator
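A minimal sketch of translating a single text might look like the following. The helper name `translate_text`, the target language, and the default model name are assumptions for illustration, and an OpenAI API key is assumed to be set in `OPENAI_API_KEY`.

```python
def translate_text(text: str, language: str = "spanish", model: str = "gpt-3.5-turbo") -> str:
    """Return the text translated into the target language via Doctran."""
    # Imported lazily so this module still loads without the optional
    # dependencies (langchain-community, doctran).
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranTextTranslator

    # The model name is an assumption; the API key comes from OPENAI_API_KEY.
    translator = DoctranTextTranslator(language=language, openai_api_model=model)
    translated = translator.transform_documents([Document(page_content=text)])

    # The translated text is expected to replace the document's page content.
    return translated[0].page_content
```

Translating before embedding can be useful when queries arrive in one language and the source documents are written in another.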