
Commit 650c3aa

docs(scriptcreator): enhance documentation
1 parent a10b060 commit 650c3aa

File tree: 5 files changed (+81 −6 lines)


docs/assets/scriptcreatorgraph.png

53.7 KB

docs/source/scrapers/graphs.rst

Lines changed: 39 additions & 2 deletions
@@ -6,11 +6,15 @@ Graphs are scraping pipelines aimed at solving specific tasks. They are composed
 There are several types of graphs available in the library, each with its own purpose and functionality. The most common ones are:
 
 - **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information using LLM.
-- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) to extract information using LLM. It is built on top of SmartScraperGraph.
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
 - **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).
 
+There are also two additional graphs that can handle multiple sources:
+
+- **SmartScraperMultiGraph**: similar to `SmartScraperGraph`, but with the ability to handle multiple sources.
+- **ScriptCreatorMultiGraph**: similar to `ScriptCreatorGraph`, but with the ability to handle multiple sources.
+
 With the introduction of `GPT-4o`, two new powerful graphs have been created:
 
 - **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
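For orientation, here is a purely illustrative sketch of the kind of BeautifulSoup script a ScriptCreatorGraph run aims to produce. The HTML, selectors, and field names below are invented for the example (they are not actual library output), and the page is inlined so the sketch runs without network access:

```python
# Hypothetical example of a generated scraping script (not real ScriptCreatorGraph output).
from bs4 import BeautifulSoup

# Inlined stand-in for a fetched projects page.
html = """
<ul class="projects">
  <li><a href="/projects/one">Project One</a></li>
  <li><a href="/projects/two">Project Two</a></li>
</ul>
"""

# Parse the page and extract one record per project link.
soup = BeautifulSoup(html, "html.parser")
projects = [
    {"title": a.get_text(strip=True), "link": a["href"]}
    for a in soup.select("ul.projects li a")
]
print(projects)
```

A real generated script would fetch the page (e.g. with `requests`) instead of inlining it, but the parsing pattern is the same.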
@@ -186,4 +190,37 @@ It will fetch the data from the source, extract the information based on the pro
    )
 
    result = speech_graph.run()
-   print(result)
+   print(result)
+
+
+ScriptCreatorGraph & ScriptCreatorMultiGraph
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. image:: ../../assets/scriptcreatorgraph.png
+   :align: center
+   :width: 90%
+   :alt: ScriptCreatorGraph
+
+First we define the graph configuration, which includes the LLM model and other parameters.
+Then we create an instance of the ScriptCreatorGraph class, passing the prompt, source, and configuration as arguments. Finally, we run the graph and print the result.
+
+.. code-block:: python
+
+   from scrapegraphai.graphs import ScriptCreatorGraph
+
+   graph_config = {
+       "llm": {...},
+       "library": "beautifulsoup4"
+   }
+
+   script_creator_graph = ScriptCreatorGraph(
+       prompt="Create a Python script to scrape the projects.",
+       source="https://perinim.github.io/projects/",
+       config=graph_config,
+       schema=schema
+   )
+
+   result = script_creator_graph.run()
+   print(result)
+
+**ScriptCreatorMultiGraph** is similar to ScriptCreatorGraph, but it can handle multiple sources. We define the graph configuration, create an instance of the ScriptCreatorMultiGraph class, and run the graph.
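As a minimal sketch of the multi-source variant, assuming the `ScriptCreatorMultiGraph` constructor mirrors `ScriptCreatorGraph` but accepts a list of sources (the URLs and config values here are placeholders, not tested output):

```python
# Hypothetical sketch: assumes ScriptCreatorMultiGraph takes the same
# arguments as ScriptCreatorGraph, with `source` as a list.
graph_config = {
    "llm": {"model": "openai/gpt-4o", "api_key": "YOUR-API-KEY"},  # placeholder credentials
    "library": "beautifulsoup4",
}
sources = [  # illustrative URLs only
    "https://perinim.github.io/projects/",
    "https://perinim.github.io/cv/",
]

try:
    from scrapegraphai.graphs import ScriptCreatorMultiGraph

    graph = ScriptCreatorMultiGraph(
        prompt="Create a Python script to scrape the projects.",
        source=sources,
        config=graph_config,
    )
    print(graph.run())
except Exception as exc:  # library missing or no real API key in this sketch
    print(f"skipping live run: {exc}")
```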

requirements-dev.lock

Lines changed: 33 additions & 3 deletions
@@ -30,6 +30,9 @@ anyio==4.3.0
     # via openai
     # via starlette
     # via watchfiles
+async-timeout==4.0.3
+    # via aiohttp
+    # via langchain
 attrs==23.2.0
     # via aiohttp
     # via jsonschema
@@ -47,7 +50,8 @@ boto3==1.34.113
 botocore==1.34.113
     # via boto3
     # via s3transfer
-burr==0.19.1
+burr==0.22.1
+    # via burr
     # via scrapegraphai
 cachetools==5.3.3
     # via google-auth
@@ -63,6 +67,13 @@ click==8.1.7
     # via streamlit
     # via typer
     # via uvicorn
+colorama==0.4.6
+    # via click
+    # via loguru
+    # via pytest
+    # via sphinx
+    # via tqdm
+    # via uvicorn
 contourpy==1.2.1
     # via matplotlib
 cycler==0.12.1
@@ -82,6 +93,9 @@ docutils==0.19
     # via sphinx
 email-validator==2.1.1
     # via fastapi
+exceptiongroup==1.2.1
+    # via anyio
+    # via pytest
 faiss-cpu==1.8.0
     # via scrapegraphai
 fastapi==0.111.0
@@ -136,6 +150,7 @@ graphviz==0.20.3
     # via scrapegraphai
 greenlet==3.0.3
     # via playwright
+    # via sqlalchemy
 groq==0.8.0
     # via langchain-groq
 grpcio==1.64.0
@@ -170,6 +185,10 @@ idna==3.7
     # via yarl
 imagesize==1.4.1
     # via sphinx
+importlib-metadata==7.1.0
+    # via sphinx
+importlib-resources==6.4.0
+    # via matplotlib
 iniconfig==2.0.0
     # via pytest
 jinja2==3.1.4
@@ -430,6 +449,8 @@ tokenizers==0.19.1
     # via anthropic
 toml==0.10.2
     # via streamlit
+tomli==2.0.1
+    # via pytest
 toolz==0.12.1
     # via altair
 tornado==6.4
@@ -443,7 +464,9 @@ tqdm==4.66.4
 typer==0.12.3
     # via fastapi-cli
 typing-extensions==4.12.0
+    # via altair
     # via anthropic
+    # via anyio
     # via fastapi
     # via fastapi-pagination
     # via google-generativeai
@@ -455,9 +478,11 @@ typing-extensions==4.12.0
     # via pyee
     # via sf-hamilton
     # via sqlalchemy
+    # via starlette
     # via streamlit
     # via typer
     # via typing-inspect
+    # via uvicorn
 typing-inspect==0.9.0
     # via dataclasses-json
     # via sf-hamilton
@@ -475,11 +500,16 @@ urllib3==1.26.18
 uvicorn==0.29.0
     # via burr
     # via fastapi
-uvloop==0.19.0
-    # via uvicorn
+watchdog==4.0.1
+    # via streamlit
 watchfiles==0.21.0
     # via uvicorn
 websockets==12.0
     # via uvicorn
+win32-setctime==1.1.0
+    # via loguru
 yarl==1.9.4
     # via aiohttp
+zipp==3.19.2
+    # via importlib-metadata
+    # via importlib-resources

requirements.lock

Lines changed: 9 additions & 0 deletions
@@ -22,6 +22,9 @@ anyio==4.3.0
     # via groq
     # via httpx
     # via openai
+async-timeout==4.0.3
+    # via aiohttp
+    # via langchain
 attrs==23.2.0
     # via aiohttp
 beautifulsoup4==4.12.3
@@ -40,6 +43,8 @@ certifi==2024.2.2
     # via requests
 charset-normalizer==3.3.2
     # via requests
+colorama==0.4.6
+    # via tqdm
 dataclasses-json==0.6.6
     # via langchain
     # via langchain-community
@@ -49,6 +54,8 @@ distro==1.9.0
     # via anthropic
     # via groq
     # via openai
+exceptiongroup==1.2.1
+    # via anyio
 faiss-cpu==1.8.0
     # via scrapegraphai
 filelock==3.14.0
@@ -87,6 +94,7 @@ graphviz==0.20.3
     # via scrapegraphai
 greenlet==3.0.3
     # via playwright
+    # via sqlalchemy
 groq==0.8.0
     # via langchain-groq
 grpcio==1.64.0
@@ -270,6 +278,7 @@ tqdm==4.66.4
     # via semchunk
 typing-extensions==4.12.0
     # via anthropic
+    # via anyio
     # via google-generativeai
     # via groq
     # via huggingface-hub

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -16,6 +16,5 @@ free-proxy==1.1.1
 langchain-groq==0.1.3
 playwright==1.43.0
 langchain-aws==0.1.2
-yahoo-search-py==0.3
 undetected-playwright==0.3.0
 semchunk==1.0.1
