Graphs are scraping pipelines aimed at solving specific tasks. They are composed of nodes that can be configured individually to address different aspects of the task (fetching data, extracting information, etc.).
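
To make the node idea concrete, here is a minimal, library-independent sketch of a pipeline in which each node reads one key from a shared state dictionary and writes another. The ``Node`` class, ``run_pipeline``, the ``fetch`` and ``extract`` functions, and the state keys are hypothetical illustrations, not ScrapeGraphAI's own node API. The fetch step returns canned HTML instead of downloading a page.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Any, Callable, Dict, List

   State = Dict[str, Any]


   @dataclass
   class Node:
       """Hypothetical node: reads ``input_key`` from the state and writes ``output_key``."""
       name: str
       input_key: str
       output_key: str
       func: Callable[[Any], Any]

       def execute(self, state: State) -> State:
           # Each node only touches its own keys, so it can be configured in isolation.
           state[self.output_key] = self.func(state[self.input_key])
           return state


   def run_pipeline(nodes: List[Node], state: State) -> State:
       # Execute the nodes in order, passing the shared state from one to the next.
       for node in nodes:
           state = node.execute(state)
       return state


   def fetch(url: str) -> str:
       # Stand-in for a fetch step: returns canned HTML instead of downloading the page.
       return "<h1>Project A</h1><h1>Project B</h1>"


   def extract(html: str) -> List[str]:
       # Stand-in for an LLM extraction step: naively pulls out the <h1> titles.
       return [part.split("</h1>")[0] for part in html.split("<h1>")[1:]]


   pipeline = [
       Node("fetch", input_key="url", output_key="html", func=fetch),
       Node("extract", input_key="html", output_key="answer", func=extract),
   ]

   result = run_pipeline(pipeline, {"url": "https://example.com"})
   print(result["answer"])

Each node only declares which state keys it consumes and produces. You could therefore replace the naive ``extract`` function with an LLM-backed one without changing the fetch step.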

The library provides several types of graphs, each designed for a specific task. The most common ones are:

- **SmartScraperGraph**: single-page scraper that requires a user-defined prompt and a URL (or local file) and uses an LLM to extract the requested information.
- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) and uses an LLM to extract the requested information. It is built on top of SmartScraperGraph.
- **SearchGraph**: multi-page scraper that requires only a user-defined prompt and uses an LLM to extract information from search engine results. It is built on top of SmartScraperGraph.
- **SpeechGraph**: text-to-speech pipeline that generates an answer together with an audio file of it. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
- **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).

With the introduction of ``GPT-4o``, two new graphs have been created:

- **OmniScraperGraph**: similar to ``SmartScraperGraph``, but it can also scrape images and describe them.
- **OmniSearchGraph**: similar to ``SearchGraph``, but it can also scrape images and describe them.

.. note::

   All graphs use a graph configuration to set up the LLM and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.

.. note::

   We can pass an optional ``schema`` parameter to the graph constructor to specify the output schema. If it is not provided or is set to ``None``, the schema is generated by the LLM itself.
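
The next sketch shows roughly what an output schema pins down. It uses dataclasses to describe the expected shape of a project-listing answer, then converts a JSON answer into that shape and fails if a field is missing. ``Project``, ``ProjectList`` and ``to_project_list`` are hypothetical names. This is not the format that the ``schema`` parameter itself expects; check the API reference for that.

.. code-block:: python

   import json
   from dataclasses import dataclass
   from typing import List


   @dataclass
   class Project:
       title: str
       description: str


   @dataclass
   class ProjectList:
       projects: List[Project]


   def to_project_list(raw: str) -> ProjectList:
       """Hypothetical helper: parse a JSON answer and enforce the expected fields."""
       data = json.loads(raw)
       # Project(**item) raises TypeError if a field is missing or unexpected.
       return ProjectList(projects=[Project(**item) for item in data["projects"]])


   answer = '{"projects": [{"title": "Project A", "description": "First project"}]}'
   parsed = to_project_list(answer)
   print(parsed.projects[0].title)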

OmniScraperGraph
^^^^^^^^^^^^^^^^

.. code-block:: python

   from scrapegraphai.graphs import OmniScraperGraph

   # Assumes graph_config and schema have already been defined
   omni_scraper_graph = OmniScraperGraph(
       prompt="List me all the projects with their titles and image links and descriptions.",
       source="https://perinim.github.io/projects",
       config=graph_config,
       schema=schema
   )

   result = omni_scraper_graph.run()
   print(result)

OmniSearchGraph
^^^^^^^^^^^^^^^

.. code-block:: python

   from scrapegraphai.graphs import OmniSearchGraph

   # Assumes graph_config and schema have already been defined

   # Create the OmniSearchGraph instance
   omni_search_graph = OmniSearchGraph(
       prompt="List me all Chioggia's famous dishes and describe their pictures.",
       config=graph_config,
       schema=schema
   )

   # Run the graph
   result = omni_search_graph.run()
   print(result)

SmartScraperGraph & SmartScraperMultiGraph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: ../../assets/smartscrapergraph.png
   :align: center

.. code-block:: python

   from scrapegraphai.graphs import SmartScraperGraph

   # Assumes graph_config and schema have already been defined
   smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the projects with their descriptions",
       source="https://perinim.github.io/projects",
       config=graph_config,
       schema=schema
   )

   result = smart_scraper_graph.run()
   print(result)

**SmartScraperMultiGraph** works like SmartScraperGraph but takes a list of sources instead of a single one. We define the graph configuration, create an instance of the SmartScraperMultiGraph class, and run the graph.
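
One way to picture handling multiple sources is to fan out over the sources and then merge the per-source answers. The sketch below shows only that idea. ``scrape_one`` and ``scrape_many`` are hypothetical names, and ``scrape_one`` stands in for a single SmartScraperGraph run by returning canned data. Merging by concatenating list values under the same key is an assumed strategy, not necessarily what SmartScraperMultiGraph does internally.

.. code-block:: python

   from typing import Any, Callable, Dict, List

   Answer = Dict[str, List[Any]]


   def scrape_one(prompt: str, source: str) -> Answer:
       # Hypothetical stand-in for one SmartScraperGraph run on a single source.
       return {"projects": [f"project from {source}"]}


   def scrape_many(
       prompt: str,
       sources: List[str],
       scraper: Callable[[str, str], Answer] = scrape_one,
   ) -> Answer:
       # Fan out: run the single-page scraper once per source.
       answers = [scraper(prompt, source) for source in sources]
       # Merge: concatenate list values that share the same key (assumed strategy).
       merged: Answer = {}
       for answer in answers:
           for key, values in answer.items():
               merged.setdefault(key, []).extend(values)
       return merged


   result = scrape_many(
       "List me all the projects",
       ["https://example.com/a", "https://example.com/b"],
   )
   print(result)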

SearchGraph
^^^^^^^^^^^

.. code-block:: python

   from scrapegraphai.graphs import SearchGraph

   # Assumes graph_config and schema have already been defined

   # Create the SearchGraph instance
   search_graph = SearchGraph(
       prompt="List me all the traditional recipes from Chioggia",
       config=graph_config,
       schema=schema
   )

   # Run the graph
   result = search_graph.run()
   print(result)
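
SearchGraph differs from the multi-source case because it finds its sources itself. The outline below sketches that flow under a few assumptions. The prompt is reused directly as the search query; the real graph is expected to formulate the query with the LLM. Only the first ``max_results`` URLs are kept, and each page is scraped with the same prompt. ``search_graph_sketch``, ``fake_search``, ``fake_scrape`` and ``max_results`` are hypothetical names, and both callables return canned data.

.. code-block:: python

   from typing import Callable, List, Tuple


   def search_graph_sketch(
       prompt: str,
       search: Callable[[str], List[str]],
       scrape: Callable[[str, str], List[str]],
       max_results: int = 3,
   ) -> Tuple[List[str], List[str]]:
       """Hypothetical outline of a prompt-only, search-driven scrape."""
       # 1. Derive a search query from the prompt (reused as-is in this sketch).
       query = prompt
       # 2. Keep only the first `max_results` URLs returned by the search engine.
       urls = search(query)[:max_results]
       # 3. Scrape each result page with the same prompt and collect every answer.
       answers: List[str] = []
       for url in urls:
           answers.extend(scrape(prompt, url))
       return urls, answers


   def fake_search(query: str) -> List[str]:
       # Canned results standing in for a real search engine call.
       return [f"https://example.com/result{i}" for i in range(1, 6)]


   def fake_scrape(prompt: str, url: str) -> List[str]:
       # Canned answer standing in for a single-page SmartScraperGraph run.
       return [f"recipe from {url}"]


   urls, answers = search_graph_sketch(
       "List me all the traditional recipes from Chioggia", fake_search, fake_scrape
   )
   print(urls)
   print(answers)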

SpeechGraph
^^^^^^^^^^^

.. code-block:: python

   prompt="Make a detailed audio summary of the projects.",