
WikipediaRetriever

Overview

Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.

This notebook shows how to retrieve wiki pages from wikipedia.org and convert them into the Document format used downstream.

Integration details

Retriever | Source | Package
WikipediaRetriever | Wikipedia articles | langchain_community

Setup

To enable automated tracing of individual tool runs, set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The integration lives in the langchain-community package. We also need to install the wikipedia Python package itself.

%pip install -qU langchain_community wikipedia

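Instantiation

Now we can instantiate our retriever:

WikipediaRetriever parameters include:

  • optional lang: default="en". Use it to search a specific language edition of Wikipedia.
  • optional load_max_docs: default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a small number for experiments. There is currently a hard limit of 300.
  • optional load_all_available_meta: default=False. By default only the most important fields are downloaded: Published (date when the document was published/last updated), title, and Summary. If True, other fields are also downloaded.

get_relevant_documents() has one argument, query: free text used to find documents in Wikipedia.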
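The parameters above can be collected and checked before the retriever is constructed. The sketch below is illustrative, not the library's own validation: `retriever_kwargs` and `build_retriever` are hypothetical helper names, the keyword names are copied from the list above, and whether the installed langchain_community version honors each keyword (in particular load_max_docs) is an assumption to verify against the API reference.

```python
HARD_LIMIT = 300  # hard limit on load_max_docs quoted in the parameter list above


def retriever_kwargs(lang="en", load_max_docs=100, load_all_available_meta=False):
    """Collect constructor arguments, rejecting values outside 1..HARD_LIMIT.

    Hypothetical helper: the range check is this sketch's own, not the library's.
    """
    if not 1 <= load_max_docs <= HARD_LIMIT:
        raise ValueError(
            f"load_max_docs must be between 1 and {HARD_LIMIT}, got {load_max_docs}"
        )
    return {
        "lang": lang,
        "load_max_docs": load_max_docs,
        "load_all_available_meta": load_all_available_meta,
    }


def build_retriever(**overrides):
    """Construct a WikipediaRetriever from the checked arguments."""
    # Imported lazily so retriever_kwargs can be used without the package installed.
    from langchain_community.retrievers import WikipediaRetriever

    return WikipediaRetriever(**retriever_kwargs(**overrides))


# A small load_max_docs keeps experiments fast, as the list above recommends.
small = retriever_kwargs(lang="ja", load_max_docs=2)
print(small)
```

Calling `build_retriever(lang="ja", load_max_docs=2)` would then hand the same checked values to the constructor; that call needs the packages from the installation step.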

from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever()
API Reference: WikipediaRetriever

Usage

docs = retriever.invoke("TOKYO GHOUL")
print(docs[0].page_content[:400])
Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who loo
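Each returned document exposes page_content and metadata, so a small helper can summarize a result list. The sketch below uses SimpleNamespace stand-ins rather than real retriever output so that it runs offline; `preview` is a hypothetical name, and the metadata key "title" follows the field list given earlier (other keys may differ by version).

```python
from types import SimpleNamespace


def preview(docs, max_chars=400):
    """Return one (title, snippet) pair per document.

    Works on any object with `page_content` and `metadata` attributes.
    """
    rows = []
    for doc in docs:
        title = doc.metadata.get("title", "<untitled>")
        snippet = doc.page_content[:max_chars]
        rows.append((title, snippet))
    return rows


# Stand-in documents; real ones would come from retriever.invoke("TOKYO GHOUL").
fake_docs = [
    SimpleNamespace(
        page_content="Tokyo Ghoul is a Japanese dark fantasy manga series.",
        metadata={"title": "Tokyo Ghoul"},
    ),
    SimpleNamespace(page_content="An article without a title field.", metadata={}),
]

for title, snippet in preview(fake_docs, max_chars=20):
    print(f"{title}: {snippet}")
```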

Use within a chain

Like other retrievers, WikipediaRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

pip install -qU "langchain[openai]"

import getpass
import os

# Prompt for the key only when it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """
    Answer the question based only on the context provided.
    Context: {context}
    Question: {question}
    """
)


def format_docs(docs):
    # Join the retrieved pages into one context string, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(
    "Who is the main character in `Tokyo Ghoul` and does he transform into a ghoul?"
)
'The main character in Tokyo Ghoul is Ken Kaneki, who transforms into a ghoul after receiving an organ transplant from a ghoul named Rize.'
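To make the data flow explicit, the pipeline can be traced in plain Python. This is a sketch of what the composition amounts to for a single call, not how LangChain executes it internally: `trace_chain`, `fake_retrieve`, and `fake_llm` are hypothetical stand-ins, chosen so the example runs without network access or an API key.

```python
from types import SimpleNamespace

TEMPLATE = (
    "Answer the question based only on the context provided.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def format_docs(docs):
    # Same joining rule as in the chain above: a blank line between documents.
    return "\n\n".join(doc.page_content for doc in docs)


def trace_chain(question, retrieve, llm):
    """Run the stages by hand and return the filled prompt and the answer."""
    # Stage 1: the dict step sends the same input to both branches.
    inputs = {"context": format_docs(retrieve(question)), "question": question}
    # Stage 2: fill the prompt template.
    prompt_text = TEMPLATE.format(**inputs)
    # Stages 3 and 4: call the model, then reduce its reply to a plain string.
    answer = str(llm(prompt_text))
    return prompt_text, answer


def fake_retrieve(query):
    # Stand-in for retriever.invoke(query).
    return [
        SimpleNamespace(page_content="Ken Kaneki is the main character."),
        SimpleNamespace(
            page_content="He transforms into a ghoul after an organ transplant."
        ),
    ]


def fake_llm(prompt_text):
    # Stand-in model: reports how many characters of prompt it received.
    return f"(model saw {len(prompt_text)} characters)"


prompt_text, answer = trace_chain("Who is the main character?", fake_retrieve, fake_llm)
print(prompt_text)
print(answer)
```

The point of the trace is that the question string reaches the prompt twice: once unchanged, and once by way of the retriever and format_docs as the context.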

API reference

For detailed documentation of all WikipediaRetriever features and configurations, head to the API reference.