Commit d27cad5

docs(graph): added new graphs and schema
1 parent 5684578 commit d27cad5

2 files changed: +22 -11 lines changed


docs/source/scrapers/graphs.rst

Lines changed: 22 additions & 8 deletions
@@ -3,21 +3,29 @@ Graphs
 
 Graphs are scraping pipelines aimed at solving specific tasks. They are composed by nodes which can be configured individually to address different aspects of the task (fetching data, extracting information, etc.).
 
-There are three types of graphs available in the library:
+There are several types of graphs available in the library, each with its own purpose and functionality. The most common ones are:
 
-- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information from using LLM.
+- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information using LLM.
+- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) to extract information using LLM. It is built on top of SmartScraperGraph.
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
+- **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).
 
 With the introduction of `GPT-4o`, two new powerful graphs have been created:
 
 - **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
 - **OmniSearchGraph**: similar to `SearchGraph`, but with the ability to scrape images and describe them.
 
+
 .. note::
 
    They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.
 
+
+.. note::
+
+   We can pass an optional `schema` parameter to the graph constructor to specify the output schema. If not provided or set to `None`, the schema will be generated by the LLM itself.
+
 OmniScraperGraph
 ^^^^^^^^^^^^^^^^
 
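The new note introduces an optional `schema` argument, and the updated code samples in this commit pass `schema=schema` without showing how `schema` is defined. As a hedged illustration only, the sketch below assumes a schema can be written as a plain dict mapping field names to Python types. The format the graph constructors actually expect at this commit is not shown in the diff. `projects_schema` and `matches_schema` are hypothetical names, and the `None` branch mirrors the documented default that leaves the output shape to the LLM.

```python
from typing import Any, Dict, Optional

# Hypothetical output schema for a "projects" prompt: field name -> expected type.
# Assumption: a plain dict is used here for illustration; it is not necessarily
# the format that the graph constructors accept for `schema`.
projects_schema: Dict[str, type] = {
    "title": str,
    "description": str,
}


def matches_schema(record: Dict[str, Any], schema: Optional[Dict[str, type]]) -> bool:
    """Return True if `record` contains every schema field with the expected type.

    A schema of None mirrors the documented default: no user-supplied
    constraint, so any record is accepted.
    """
    if schema is None:
        return True
    return all(
        key in record and isinstance(record[key], expected)
        for key, expected in schema.items()
    )


# Hand-written records for illustration (not real scraper output).
print(matches_schema({"title": "Project A", "description": "A demo project"}, projects_schema))  # True
print(matches_schema({"title": 1}, projects_schema))  # False
print(matches_schema({"any": "shape"}, None))  # True
```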
@@ -41,7 +49,8 @@ It will fetch the data from the source and extract the information based on the
 omni_scraper_graph = OmniScraperGraph(
     prompt="List me all the projects with their titles and image links and descriptions.",
     source="https://perinim.github.io/projects",
-    config=graph_config
+    config=graph_config,
+    schema=schema
 )
 
 result = omni_scraper_graph.run()
@@ -70,15 +79,16 @@ It will create a search query, fetch the first n results from the search engine,
 # Create the OmniSearchGraph instance
 omni_search_graph = OmniSearchGraph(
     prompt="List me all Chioggia's famous dishes and describe their pictures.",
-    config=graph_config
+    config=graph_config,
+    schema=schema
 )
 
 # Run the graph
 result = omni_search_graph.run()
 print(result)
 
-SmartScraperGraph
-^^^^^^^^^^^^^^^^^
+SmartScraperGraph & SmartScraperMultiGraph
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. image:: ../../assets/smartscrapergraph.png
    :align: center
@@ -100,12 +110,14 @@ It will fetch the data from the source and extract the information based on the
 smart_scraper_graph = SmartScraperGraph(
     prompt="List me all the projects with their descriptions",
     source="https://perinim.github.io/projects",
-    config=graph_config
+    config=graph_config,
+    schema=schema
 )
 
 result = smart_scraper_graph.run()
 print(result)
 
+**SmartScraperMultiGraph** is similar to SmartScraperGraph, but it can handle multiple sources. We define the graph configuration, create an instance of the SmartScraperMultiGraph class, and run the graph.
 
 SearchGraph
 ^^^^^^^^^^^
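The new `SmartScraperMultiGraph` paragraph describes a multi-page scraper built on top of SmartScraperGraph. Here is a minimal sketch of that fan-out-and-merge idea in plain Python, with a stub in place of the real single-page graph. `scrape_many`, `fake_scrape_one`, and the `ScrapeOne` alias are hypothetical. Concatenating list-valued fields is only one possible merge strategy; the library may combine per-page answers differently (for example, with another LLM call).

```python
from typing import Callable, Dict, List

# Hypothetical signature for a single-page scraper: (prompt, source) -> answer dict.
ScrapeOne = Callable[[str, str], Dict[str, List[str]]]


def scrape_many(prompt: str, sources: List[str], scrape_one: ScrapeOne) -> Dict[str, List[str]]:
    """Run the single-page scraper on every source and merge list-valued fields."""
    merged: Dict[str, List[str]] = {}
    for source in sources:
        answer = scrape_one(prompt, source)
        for key, values in answer.items():
            # Concatenate per-page results under the same key, in source order.
            merged.setdefault(key, []).extend(values)
    return merged


def fake_scrape_one(prompt: str, source: str) -> Dict[str, List[str]]:
    """Stand-in for SmartScraperGraph; returns a canned answer per source."""
    return {"projects": [f"project from {source}"]}


result = scrape_many(
    "List me all the projects with their descriptions",
    ["https://example.com/a", "https://example.com/b"],
    fake_scrape_one,
)
print(result)
# {'projects': ['project from https://example.com/a', 'project from https://example.com/b']}
```

The sources are processed one after another to keep the sketch simple. A real multi-page graph could also run the single-page scrapers concurrently.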
@@ -132,7 +144,8 @@ It will create a search query, fetch the first n results from the search engine,
 # Create the SearchGraph instance
 search_graph = SearchGraph(
     prompt="List me all the traditional recipes from Chioggia",
-    config=graph_config
+    config=graph_config,
+    schema=schema
 )
 
 # Run the graph
@@ -169,6 +182,7 @@ It will fetch the data from the source, extract the information based on the pro
     prompt="Make a detailed audio summary of the projects.",
     source="https://perinim.github.io/projects/",
     config=graph_config,
+    schema=schema
 )
 
 result = speech_graph.run()

pyproject.toml

Lines changed: 0 additions & 3 deletions
@@ -12,7 +12,6 @@ authors = [
     { name = "Lorenzo Padoan", email = "[email protected]" }
 ]
 dependencies = [
-    # python = ">=3.9, <3.12"
     "langchain==0.1.15",
     "langchain-openai==0.1.6",
     "langchain-google-genai==1.0.3",
@@ -32,8 +31,6 @@ dependencies = [
     "playwright==1.43.0",
     "google==3.0.0",
     "yahoo-search-py==0.3",
-    "networkx==3.3",
-    "pyvis==0.3.2",
     "undetected-playwright==0.3.0",
 ]
