Apify Actor
Apify Actors are cloud programs designed for a wide range of web scraping, crawling, and data extraction tasks. These actors facilitate automated data gathering from the web, enabling users to extract, process, and store information efficiently. Actors can be used to perform tasks like scraping e-commerce sites for product details, monitoring price changes, or gathering search engine results. They integrate seamlessly with Apify Datasets, allowing the structured data collected by actors to be stored, managed, and exported in formats like JSON, CSV, or Excel for further analysis or use.
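Overview
This notebook shows how to use Apify Actors with LangChain to automate web scraping and data extraction. The langchain-apify package integrates Apify's cloud-based tools with LangChain agents, enabling efficient data collection and processing for AI applications.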
Setup
This integration lives in the langchain-apify package, which can be installed with pip.
%pip install langchain-apify
Prerequisites
import os
os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
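The two assignments above leave placeholder strings in the environment if they are not edited. A minimal sketch of a guard that fails early in that case; `require_env` is a hypothetical helper for illustration and is not part of langchain-apify:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    # Treat the placeholder strings from this notebook ("your-...") as missing.
    if not value or value.startswith("your-"):
        raise RuntimeError(f"Set {name} to a real value before running this notebook")
    return value
```

Calling `require_env("APIFY_API_TOKEN")` and `require_env("OPENAI_API_KEY")` before instantiating the tool would surface a missing token immediately rather than as an authentication error later on.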
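Instantiation
Here we instantiate ApifyActorsTool so that it can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used in this way.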
from langchain_apify import ApifyActorsTool
tool = ApifyActorsTool("apify/rag-web-browser")
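Invocation
ApifyActorsTool takes a single argument, run_input, a dictionary that is passed to the Actor as its run input. The run input schema is documented in the Input section of the Actor's detail page. See the RAG Web Browser input schema.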
tool.invoke({"run_input": {"query": "what is apify?", "maxResults": 2}})
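The call above nests the Actor's input fields under a `run_input` key. A small sketch of building that payload in one place; `build_tool_input` is a hypothetical helper, and the only field names it assumes are `query` and `maxResults`, taken from the example above:

```python
from typing import Any


def build_tool_input(query: str, max_results: int = 2, **extra: Any) -> dict:
    """Wrap Actor input fields in the {"run_input": {...}} shape used above."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    run_input = {"query": query, "maxResults": max_results}
    # Any further fields should come from the Actor's documented input schema.
    run_input.update(extra)
    return {"run_input": run_input}


payload = build_tool_input("what is apify?", max_results=2)
print(payload)
```

The resulting dictionary can then be passed to the tool as `tool.invoke(payload)`.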
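Chaining
We can provide the created tool to an agent. When asked to search for information, the agent calls the Apify Actor, which searches the web and retrieves the search results.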
%pip install langgraph langchain-openai
from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
model = ChatOpenAI(model="gpt-4o")
tools = [tool]
graph = create_react_agent(model, tools=tools)
inputs = {"messages": [("user", "search for what is Apify")]}
for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    # skip tool messages
    if isinstance(message, ToolMessage):
        continue
    message.pretty_print()
================================ Human Message =================================
search for what is Apify
================================== Ai Message ==================================
Tool Calls:
  apify_actor_apify_rag-web-browser (call_27mjHLzDzwa5ZaHWCMH510lm)
 Call ID: call_27mjHLzDzwa5ZaHWCMH510lm
  Args:
    run_input: {"run_input":{"query":"Apify","maxResults":3,"outputFormats":["markdown"]}}
================================== Ai Message ==================================
Apify is a comprehensive platform for web scraping, browser automation, and data extraction. It offers a wide array of tools and services that cater to developers and businesses looking to extract data from websites efficiently and effectively. Here's an overview of Apify:
1. **Ecosystem and Tools**:
- Apify provides an ecosystem where developers can build, deploy, and publish data extraction and web automation tools called Actors.
- The platform supports various use cases such as extracting data from social media platforms, conducting automated browser-based tasks, and more.
2. **Offerings**:
- Apify offers over 3,000 ready-made scraping tools and code templates.
- Users can also build custom solutions or hire Apify's professional services for more tailored data extraction needs.
3. **Technology and Integration**:
- The platform supports integration with popular tools and services like Zapier, GitHub, Google Sheets, Pinecone, and more.
- Apify supports open-source tools and technologies such as JavaScript, Python, Puppeteer, Playwright, Selenium, and its own Crawlee library for web crawling and browser automation.
4. **Community and Learning**:
- Apify hosts a community on Discord where developers can get help and share expertise.
- It offers educational resources through the Web Scraping Academy to help users become proficient in data scraping and automation.
5. **Enterprise Solutions**:
- Apify provides enterprise-grade web data extraction solutions with high reliability, 99.95% uptime, and compliance with SOC2, GDPR, and CCPA standards.
For more information, you can visit [Apify's official website](https://apify.com/) or their [GitHub page](https://github.com/apify) which contains their code repositories and further details about their projects.
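The streaming loop prints every non-tool message, including the intermediate tool-call turn. When only the final answer is needed, one option is to pick the last AI message that carries text. The sketch below assumes LangChain messages expose a `type` attribute (`"human"`, `"ai"`, `"tool"`) and string `content`; `final_ai_text` is a hypothetical helper, and stand-in objects are used so it runs without API keys:

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def final_ai_text(messages: Iterable) -> Optional[str]:
    """Return the content of the last non-empty AI message, or None."""
    for message in reversed(list(messages)):
        is_ai = getattr(message, "type", None) == "ai"
        if is_ai and getattr(message, "content", ""):
            return message.content
    return None


# Stand-in messages that mimic the shape of the conversation shown above.
demo = [
    SimpleNamespace(type="human", content="search for what is Apify"),
    SimpleNamespace(type="ai", content=""),  # tool-call turn, no text
    SimpleNamespace(type="tool", content="...raw search results..."),
    SimpleNamespace(type="ai", content="Apify is a platform for web scraping."),
]
print(final_ai_text(demo))
```

With the real agent, the same function could presumably be applied to the `messages` list in the state returned by `graph.invoke(inputs)`.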
API reference
For more information on how to use this integration, see the git repository or the Apify integration documentation.