Doctran
Doctran is a Python package. It uses LLMs and open-source NLP libraries to transform raw text into clean, structured, information-dense documents that are optimized for vector space retrieval. You can think of Doctran as a black box where messy strings go in and nice, clean, labelled strings come out.
Installation and Setup
pip install doctran
Document Transformers
Document Interrogator
See a usage example for DoctranQATransformer.
from langchain_community.document_transformers import DoctranQATransformer
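As a rough illustration of the interrogator, the sketch below wraps a string in a Document, runs it through DoctranQATransformer, and formats the resulting question/answer pairs. It assumes that `doctran` and `langchain-community` are installed and that an `OPENAI_API_KEY` environment variable is set. It imports the class from `langchain_community.document_transformers`, where recent releases define it. The helper names `interrogate` and `format_qa` are hypothetical, and the `questions_and_answers` metadata key and the `question`/`answer` fields are assumptions based on recent releases. Older releases may only offer the async `atransform_documents` method.

```python
from typing import Any, Dict, List


def interrogate(text: str) -> List[Dict[str, Any]]:
    """Return the question/answer pairs Doctran generates for `text`."""
    # Imported lazily so the module still loads where the packages are missing.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranQATransformer

    documents = [Document(page_content=text)]
    transformed = DoctranQATransformer().transform_documents(documents)
    # Assumption: the pairs are stored under this metadata key.
    return transformed[0].metadata.get("questions_and_answers") or []


def format_qa(pairs: List[Dict[str, Any]]) -> str:
    """Render question/answer pairs as plain text (hypothetical helper)."""
    return "\n".join(
        f"Q: {pair.get('question', '')}\nA: {pair.get('answer', '')}"
        for pair in pairs
    )
```

Calling `format_qa(interrogate(some_text))` would then give a Q&A rendering of the document. Storing text in this form can help retrieval because user queries are often phrased as questions.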
Property Extractor
See a usage example for DoctranPropertyExtractor.
from langchain_community.document_transformers import DoctranPropertyExtractor
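The property extractor is driven by a schema that describes each field to pull out of the text. The sketch below shows what such a schema might look like and how it could be passed to DoctranPropertyExtractor. The schema contents, the helper name `extract_properties`, and the `extracted_properties` metadata key are assumptions for illustration. The class is imported from `langchain_community.document_transformers`, and an `OPENAI_API_KEY` environment variable is assumed to be set.

```python
from typing import Any, Dict, List

# Hypothetical schema: each entry describes one property to extract.
PROPERTIES: List[Dict[str, Any]] = [
    {
        "name": "category",
        "description": "What type of email this is.",
        "type": "string",
        "enum": ["update", "action_item", "customer_feedback", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "A list of all people mentioned in this email.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
]


def extract_properties(text: str) -> Dict[str, Any]:
    """Return the properties Doctran extracts from `text`."""
    # Imported lazily so the module still loads where the packages are missing.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranPropertyExtractor

    extractor = DoctranPropertyExtractor(properties=PROPERTIES)
    transformed = extractor.transform_documents([Document(page_content=text)])
    # Assumption: the results are stored under this metadata key.
    return transformed[0].metadata.get("extracted_properties") or {}
```

The extracted values land in document metadata, so they could be used as filters in a vector store alongside the embedded text.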
Document Translator
See a usage example for DoctranTextTranslator.
from langchain_community.document_transformers import DoctranTextTranslator
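A minimal sketch of translating a single string, assuming the class is importable from `langchain_community.document_transformers`, that it accepts a `language` argument, and that an `OPENAI_API_KEY` environment variable is set. The helper name `translate` and its default target language are hypothetical.

```python
def translate(text: str, language: str = "spanish") -> str:
    """Return `text` translated into `language` by Doctran."""
    # Imported lazily so the module still loads where the packages are missing.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranTextTranslator

    translator = DoctranTextTranslator(language=language)
    translated = translator.transform_documents([Document(page_content=text)])
    # The translated text replaces the original page content.
    return translated[0].page_content
```

Translating documents into one common language before embedding them may improve retrieval when queries and documents are written in different languages.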