
Pre/beta #371

Merged
merged 153 commits into from
Jun 17, 2024
153 commits
b553602
Merge pull request #314 from VinciGit00/main
VinciGit00 May 29, 2024
4fcb990
fix: oneapi model
VinciGit00 May 29, 2024
6ea1d2c
ci(release): 1.5.3-beta.1 [skip ci]
semantic-release-bot May 29, 2024
1aa8c86
removed unused file
VinciGit00 May 29, 2024
4639f0c
fix: typo in prompt
May 30, 2024
e734830
Merge pull request #319 from jmfk/pre/beta
PeriniM May 30, 2024
b57bcef
ci(release): 1.5.3-beta.2 [skip ci]
semantic-release-bot May 30, 2024
cdba5ef
Create chinese.md
VinciGit00 May 30, 2024
6d1d91a
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 30, 2024
930f673
feat: removed rag node
VinciGit00 May 31, 2024
25352a5
Merge branch 'pre/beta' into temp
VinciGit00 May 31, 2024
25de33e
Merge pull request #320 from VinciGit00/temp
VinciGit00 May 31, 2024
38d138e
ci(release): 1.5.5-beta.1 [skip ci]
semantic-release-bot May 31, 2024
f5cbd80
feat: add pdf scraper multi graph
VinciGit00 Jun 1, 2024
4d42d7b
add example
VinciGit00 Jun 1, 2024
5bda918
feat: add json multiscraper
VinciGit00 Jun 1, 2024
fff1232
add rag node
VinciGit00 Jun 1, 2024
1fe4975
add openai and oneapi examples
VinciGit00 Jun 1, 2024
5cfc101
feat: add forcing format as json
VinciGit00 Jun 2, 2024
1d217e4
ci(release): 1.6.0-beta.1 [skip ci]
semantic-release-bot Jun 2, 2024
fa9722d
add examples
VinciGit00 Jun 2, 2024
b408655
feat: add csv scraper and xml scraper multi
VinciGit00 Jun 2, 2024
743dfe1
add all possible examples
VinciGit00 Jun 3, 2024
79ace11
Merge pull request #323 from VinciGit00/refactoring-pdf_scraper
PeriniM Jun 3, 2024
ed1dc0b
ci(release): 1.6.0-beta.2 [skip ci]
semantic-release-bot Jun 3, 2024
1dde43c
add new examples
VinciGit00 Jun 3, 2024
8de720d
feat: removed a bug
VinciGit00 Jun 3, 2024
b70cb37
ci(release): 1.6.0-beta.3 [skip ci]
semantic-release-bot Jun 3, 2024
c8d556d
feat: fix an if
VinciGit00 Jun 3, 2024
f36dd8b
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 3, 2024
08a14ef
ci(release): 1.6.0-beta.4 [skip ci]
semantic-release-bot Jun 3, 2024
55b4865
Merge pull request #338 from VinciGit00/main
VinciGit00 Jun 4, 2024
244aada
feat: refactoring of an in if
VinciGit00 Jun 4, 2024
dde0c7e
ci(release): 1.6.0-beta.5 [skip ci]
semantic-release-bot Jun 4, 2024
7a13a68
feat: refactoring of rag node
VinciGit00 Jun 4, 2024
acece72
Update cleanup_html.py
seyf97 Jun 4, 2024
7ed2fe8
feat: add dynamic caching
VinciGit00 Jun 4, 2024
4c0d0e9
Merge pull request #339 from seyf97/seyf97-link_extraction_patch
VinciGit00 Jun 4, 2024
f81442b
removed unused if
VinciGit00 Jun 4, 2024
58cd523
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 4, 2024
fff89f4
feat: refactoring of abstract graph
VinciGit00 Jun 4, 2024
ac8e7c1
ci(release): 1.6.0-beta.6 [skip ci]
semantic-release-bot Jun 4, 2024
376f758
feat(pydantic): added pydantic output schema
PeriniM Jun 4, 2024
f8b08e0
feat(append_node): append node to existing graph
PeriniM Jun 4, 2024
e96b701
Integrates with Burr's Forking/spawning ability
elijahbenizzy Jun 5, 2024
74fd530
Merge branch 'pre/beta' into 332-pydantic-schema-validation
VinciGit00 Jun 5, 2024
a7443a7
Merge pull request #341 from VinciGit00/332-pydantic-schema-validation
VinciGit00 Jun 5, 2024
cab5f68
ci(release): 1.6.0-beta.7 [skip ci]
semantic-release-bot Jun 5, 2024
5d20186
feat: add json as output
VinciGit00 Jun 5, 2024
7a6f016
ci(release): 1.6.0-beta.8 [skip ci]
semantic-release-bot Jun 5, 2024
450fde6
add get functions on the dictionary
VinciGit00 Jun 5, 2024
4f53b09
add examples for schema
VinciGit00 Jun 5, 2024
5c9843f
fix(schema): fixed json output
PeriniM Jun 5, 2024
5d1fbf8
feat(indexify-node): add example
PeriniM Jun 5, 2024
dd2b3a8
add examples
VinciGit00 Jun 5, 2024
d790361
feat: add caching
VinciGit00 Jun 6, 2024
543b487
add default folder for the cache
VinciGit00 Jun 7, 2024
f41a755
Merge pull request #356 from VinciGit00/321-integration-with-indexify
PeriniM Jun 7, 2024
ca8aff8
ci(release): 1.6.0-beta.9 [skip ci]
semantic-release-bot Jun 7, 2024
8696ade
docs: stylize badges in readme
iamgodot Jun 7, 2024
2000baa
Merge pull request #358 from iamgodot/docs
VinciGit00 Jun 8, 2024
e1f045b
feat: add new chunking function
VinciGit00 Jun 8, 2024
a6061cb
Merge pull request #344 from DAGWorks-Inc/burr-spawning-apps
PeriniM Jun 8, 2024
cfa1336
feat(version): update burr version
PeriniM Jun 8, 2024
8228225
Merge pull request #360 from VinciGit00/burr_integration
PeriniM Jun 8, 2024
4d0d8fa
ci(release): 1.6.0-beta.10 [skip ci]
semantic-release-bot Jun 8, 2024
1981230
add multi scraper integration
VinciGit00 Jun 8, 2024
cb00c4f
changed model
VinciGit00 Jun 8, 2024
c14fb88
add examples
VinciGit00 Jun 9, 2024
fe8083f
Update pdf_scraper_graph_haiku.py
VinciGit00 Jun 9, 2024
bde0249
add examples
VinciGit00 Jun 9, 2024
13f8ca5
"Refactor SearchLinkNode test: simplify setup, add patching for execu…
tejhande Jun 9, 2024
9326637
Merge branch 'main' into pre/beta
VinciGit00 Jun 9, 2024
3453ac0
ci(release): 1.6.0-beta.11 [skip ci]
semantic-release-bot Jun 9, 2024
8bec47e
Merge branch 'pre/beta' into all
VinciGit00 Jun 9, 2024
2f0568c
Merge pull request #364 from VinciGit00/all
VinciGit00 Jun 9, 2024
84a74b2
ci(release): 1.7.0-beta.1 [skip ci]
semantic-release-bot Jun 9, 2024
b0511ae
feat: Add tests for RobotsNode and update test setup
tejhande Jun 10, 2024
08f1be6
feat: Add tests for SmartScraperGraph using sample text and configura…
tejhande Jun 10, 2024
c286b16
feat: Add tests for SmartScraperGraph using sample text and configura…
tejhande Jun 10, 2024
9e7038c
feat: Add tests for SmartScraperGraph using sample text and configura…
tejhande Jun 10, 2024
c927145
feat: Add tests for SmartScraperGraph using sample text and configura…
tejhande Jun 10, 2024
40747c3
Merge branch 'main' into main
tejhande Jun 10, 2024
0748e10
Merge pull request #365 from tejhande/main
VinciGit00 Jun 10, 2024
e5bb5ae
ci(release): 1.7.0-beta.2 [skip ci]
semantic-release-bot Jun 10, 2024
8f405ff
Add the ability to specify load state
Jun 11, 2024
fa951b4
Merge pull request #368 from stevenmichaelthomas/wait-for-network-idle
VinciGit00 Jun 11, 2024
c881f64
fix(cache): correctly pass the node arguments and logging
PeriniM Jun 11, 2024
edddb68
docs(cache): added cache_path param
PeriniM Jun 11, 2024
589da1d
Merge pull request #351 from VinciGit00/faiss_integration
PeriniM Jun 11, 2024
893aadd
Merge pull request #359 from VinciGit00/semchunk_integration
PeriniM Jun 11, 2024
5d692bf
feat(schema): merge scripts to follow pydantic schema
PeriniM Jun 11, 2024
a10b060
Merge pull request #361 from VinciGit00/multi_scraper_implementation
PeriniM Jun 11, 2024
650c3aa
docs(scriptcreator): enhance documentation
PeriniM Jun 11, 2024
15421ef
feat(merge): add scriptcreatormulti, rag cache and semchunk
PeriniM Jun 11, 2024
ab00f23
fix(node): fixed generate answer node pydantic schema
PeriniM Jun 11, 2024
6f994ce
Merge pull request #370 from VinciGit00/dev
PeriniM Jun 11, 2024
85a75c8
ci(release): 1.7.0-beta.3 [skip ci]
semantic-release-bot Jun 11, 2024
6b4cdf9
fix: common params
PeriniM Jun 12, 2024
b4d7532
ci(release): 1.7.0-beta.4 [skip ci]
semantic-release-bot Jun 12, 2024
828bdee
Update smart_scraper_graph.py
supercoder-dev Jun 12, 2024
879c94a
Update cleanup_html.py
supercoder-dev Jun 12, 2024
d0e300a
Update fetch_node.py
supercoder-dev Jun 12, 2024
5065aa0
Merge branch 'pre/beta' into supercoder-327
VinciGit00 Jun 12, 2024
0145b8f
Merge pull request #372 from supercoder-dev/supercoder-327
VinciGit00 Jun 12, 2024
1e7f334
feat: update fetch node
VinciGit00 Jun 12, 2024
9952d98
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 12, 2024
79b8326
ci(release): 1.7.0-beta.5 [skip ci]
semantic-release-bot Jun 12, 2024
e6c7940
feat: add Parse_Node
VinciGit00 Jun 12, 2024
58a257f
update model tokens
VinciGit00 Jun 12, 2024
e45f159
enhanced performance and readibility
VinciGit00 Jun 12, 2024
dc1340e
Update generate_answer_pdf_node.py
VinciGit00 Jun 12, 2024
1705046
Update pdf_scraper_graph.py
VinciGit00 Jun 12, 2024
071f3d1
docs: fix label&logo for github action badges
iamgodot Jun 12, 2024
cc9f5cc
Merge pull request #375 from iamgodot/docs
VinciGit00 Jun 12, 2024
17dd936
test: fix tests for fetch node with proper mock&refactor
iamgodot Jun 13, 2024
2a9ab69
Strip out the scheme from the server address URI
Kshitij-Jande Jun 13, 2024
9dd2a63
Merge pull request #379 from iamgodot/test
VinciGit00 Jun 13, 2024
49c7e0e
fix: test for fetch node
VinciGit00 Jun 13, 2024
dae3158
ci(release): 1.7.0-beta.6 [skip ci]
semantic-release-bot Jun 13, 2024
a6757ac
Merge pull request #380 from Kshitij-Jande/main
VinciGit00 Jun 13, 2024
283b61f
docs: better logging
PeriniM Jun 13, 2024
7a34562
refactoring of merging answers nodes
VinciGit00 Jun 13, 2024
49f1795
Merge pull request #381 from VinciGit00/improve_generate_answer_nodes
PeriniM Jun 14, 2024
91c5b5a
fix(multi): updated multi pdf scraper with schema
PeriniM Jun 14, 2024
203de83
fix(pdf): correctly read .pdf files
PeriniM Jun 14, 2024
12f4386
Merge branch 'pre/beta' into 349-problem-with-scrapegraphaigraphspdf_…
PeriniM Jun 14, 2024
1ae7c6a
Merge pull request #373 from VinciGit00/349-problem-with-scrapegrapha…
PeriniM Jun 14, 2024
7da6cd2
ci(release): 1.7.0-beta.7 [skip ci]
semantic-release-bot Jun 14, 2024
9b0e627
changed source to text
PeriniM Jun 14, 2024
d8a99e9
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM Jun 14, 2024
09cb6e9
refactor: add missing schemas and renamed files
PeriniM Jun 14, 2024
62b372b
fix: shallow copy config of create_embedder
Jun 15, 2024
c31706f
fixed tests
VinciGit00 Jun 15, 2024
3d34ea6
Merge pull request #384 from liaoliaojun/fix/merge
VinciGit00 Jun 16, 2024
a87702f
ci(release): 1.7.0-beta.8 [skip ci]
semantic-release-bot Jun 16, 2024
2419003
fix: fix robot node
VinciGit00 Jun 16, 2024
0c5d6e2
ci(release): 1.7.0-beta.9 [skip ci]
semantic-release-bot Jun 16, 2024
4c8becc
overwrite common params to affect nodes config
PeriniM Jun 16, 2024
d8d5cd2
Update abstract_graph.py
shubihu Jun 17, 2024
da93162
Merge branch 'pre/beta' into main
VinciGit00 Jun 17, 2024
d7cd1df
Merge pull request #387 from shubihu/main
VinciGit00 Jun 17, 2024
7f3b907
ci(release): 1.7.0-beta.10 [skip ci]
semantic-release-bot Jun 17, 2024
6a753f2
add smart_scraper_openai_test
VinciGit00 Jun 17, 2024
93342b4
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 17, 2024
080a318
feat(telemetry): add telemetry module
PeriniM Jun 17, 2024
39bf4c9
docs: refactor graph section and added telemetry
PeriniM Jun 17, 2024
d9559ef
Merge pull request #389 from VinciGit00/telemetry
PeriniM Jun 17, 2024
c016efd
ci(release): 1.7.0-beta.11 [skip ci]
semantic-release-bot Jun 17, 2024
03ffebc
fix: add chinese embedding model
VinciGit00 Jun 17, 2024
48ff87f
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 Jun 17, 2024
a794405
ci(release): 1.7.0-beta.12 [skip ci]
semantic-release-bot Jun 17, 2024
a8251bd
add new lock files
VinciGit00 Jun 17, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -23,6 +23,7 @@ docs/source/_static/
venv/
.venv/
.vscode/
.conda/

# exclude pdf, mp3
*.pdf
@@ -38,3 +39,6 @@ lib/
*.html
.idea

# extras
cache/
run_smart_scraper.py
295 changes: 288 additions & 7 deletions CHANGELOG.md

Large diffs are not rendered by default.

17 changes: 10 additions & 7 deletions README.md
@@ -5,11 +5,11 @@
| [русский](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/russian.md)


[![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
[![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen?style=for-the-badge)](https://github.com/pylint-dev/pylint)
[![Pylint](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/pylint.yml?label=Pylint&logo=github&style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
[![CodeQL](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/codeql.yml?label=CodeQL&logo=github&style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)

ScrapeGraphAI is a *web scraping* Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).
@@ -46,11 +46,14 @@ The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.r
Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).

## 💻 Usage
There are three main scraping pipelines that can be used to extract information from a website (or local file):
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file):
- `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source;
- `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine;
- `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.
- `SmartScraperMultiGraph`: multiple page scraper given a single prompt
- `ScriptCreatorGraph`: single-page scraper that extracts information from a website and generates a Python script.

- `SmartScraperMultiGraph`: multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources;
- `ScriptCreatorMultiGraph`: multi-page scraper that generates a Python script for extracting information from multiple pages given a single prompt and a list of sources.

It is possible to use different LLMs through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.

Binary file added docs/assets/scriptcreatorgraph.png
5 changes: 4 additions & 1 deletion docs/source/conf.py
@@ -36,4 +36,7 @@
"source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
"source_branch": "main",
"source_directory": "docs/source/",
}
'navigation_with_keys': True,
'sidebar_hide_name': False,
}

3 changes: 0 additions & 3 deletions docs/source/index.rst
@@ -22,9 +22,6 @@
   :caption: Scrapers

   scrapers/graphs
   scrapers/llm
   scrapers/graph_config
   scrapers/benchmarks

.. toctree::
   :maxdepth: 2
1 change: 1 addition & 0 deletions docs/source/scrapers/graph_config.rst
@@ -13,6 +13,7 @@ Some interesting ones are:
- `loader_kwargs`: A dictionary with additional parameters to be passed to the `Loader` class, such as `proxy`.
- `burr_kwargs`: A dictionary with additional parameters to enable the `Burr` graphical user interface.
- `max_images`: The maximum number of images to be analyzed. Useful in `OmniScraperGraph` and `OmniSearchGraph`.
- `cache_path`: The path where the cache files will be saved. If the path already exists, the cache will be loaded from it.
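
For example, a configuration that enables caching could look like the following sketch. The `llm` values are placeholders, and `"cache/"` is an arbitrary example directory rather than a documented default.

.. code-block:: python

    # Hypothetical configuration; replace the placeholder LLM settings.
    graph_config = {
        "llm": {
            "api_key": "YOUR_API_KEY",  # placeholder
            "model": "gpt-3.5-turbo",  # placeholder model name
        },
        # Cache files are written here; if the path already exists,
        # the cache is loaded from it instead.
        "cache_path": "cache/",
    }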

.. _Burr:

192 changes: 8 additions & 184 deletions docs/source/scrapers/graphs.rst
@@ -3,187 +3,11 @@ Graphs

Graphs are scraping pipelines aimed at solving specific tasks. They are composed of nodes which can be configured individually to address different aspects of the task (fetching data, extracting information, etc.).
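
As a rough illustration of how nodes cooperate, the following sketch chains plain Python functions that each read from and write to a shared state dictionary. The functions `fetch`, `parse` and `generate_answer`, the state keys, and the hard-coded HTML are hypothetical stand-ins chosen so the example runs offline. They are not the library's node classes, and simple string handling replaces the LLM call.

.. code-block:: python

    from typing import Callable, Dict, List

    State = Dict[str, object]
    Node = Callable[[State], State]


    def fetch(state: State) -> State:
        # Hypothetical fetch step: a real node would download state["source"];
        # fixed HTML is used here so the sketch runs offline.
        state["html"] = "<ul><li>Project A</li><li>Project B</li></ul>"
        return state


    def parse(state: State) -> State:
        # Hypothetical parse step: pull the text of each <li> element.
        html = str(state["html"])
        state["chunks"] = [part.split("</li>")[0] for part in html.split("<li>")[1:]]
        return state


    def generate_answer(state: State) -> State:
        # Stand-in for the LLM call: build a JSON-like answer from the chunks.
        state["answer"] = {"projects": list(state["chunks"])}
        return state


    def run_pipeline(nodes: List[Node], state: State) -> State:
        # Each node receives the state produced by the previous one.
        for node in nodes:
            state = node(state)
        return state


    result = run_pipeline(
        [fetch, parse, generate_answer],
        {"prompt": "List me all the projects", "source": "https://example.com"},
    )
    print(result["answer"])

Because each step only touches the shared state, one step can be swapped or reconfigured without changing the others. This mirrors the idea that nodes can be configured individually.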

There are several types of graphs available in the library, each with its own purpose and functionality. The most common ones are:

- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information using an LLM.
- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) to extract information using an LLM. It is built on top of SmartScraperGraph.
- **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using an LLM. It is built on top of SmartScraperGraph.
- **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
- **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).

With the introduction of `GPT-4o`, two new powerful graphs have been created:

- **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
- **OmniSearchGraph**: similar to `SearchGraph`, but with the ability to scrape images and describe them.


.. note::

They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.


.. note::

We can pass an optional `schema` parameter to the graph constructor to specify the output schema. If not provided or set to `None`, the schema will be generated by the LLM itself.

OmniScraperGraph
^^^^^^^^^^^^^^^^

.. image:: ../../assets/omniscrapergraph.png
   :align: center
   :width: 90%
   :alt: OmniScraperGraph
|

First we define the graph configuration, which includes the LLM model and other parameters. Then we create an instance of the OmniScraperGraph class, passing the prompt, source, and configuration as arguments. Finally, we run the graph and print the result.
It will fetch the data from the source and extract the information based on the prompt in JSON format.

.. code-block:: python

    from scrapegraphai.graphs import OmniScraperGraph

    # Fill in the LLM settings (see the LLM section)
    graph_config = {
        "llm": {...},
        "max_images": 5,  # maximum number of images to analyze
    }

    omni_scraper_graph = OmniScraperGraph(
        prompt="List me all the projects with their titles and image links and descriptions.",
        source="https://perinim.github.io/projects",
        config=graph_config,
        schema=None,  # optional output schema; None lets the LLM infer it
    )

    result = omni_scraper_graph.run()
    print(result)

OmniSearchGraph
^^^^^^^^^^^^^^^

.. image:: ../../assets/omnisearchgraph.png
   :align: center
   :width: 80%
   :alt: OmniSearchGraph
|

Similar to OmniScraperGraph, we define the graph configuration, create an instance of the OmniSearchGraph class, and run the graph.
It will create a search query, fetch the first n results from the search engine, run n OmniScraperGraph instances, and return the results in JSON format.

.. code-block:: python

    from scrapegraphai.graphs import OmniSearchGraph

    # Fill in the LLM settings (see the LLM section)
    graph_config = {
        "llm": {...},
        "max_images": 5,  # maximum number of images to analyze
    }

    # Create the OmniSearchGraph instance
    omni_search_graph = OmniSearchGraph(
        prompt="List me all Chioggia's famous dishes and describe their pictures.",
        config=graph_config,
        schema=None,  # optional output schema; None lets the LLM infer it
    )

    # Run the graph
    result = omni_search_graph.run()
    print(result)

SmartScraperGraph & SmartScraperMultiGraph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: ../../assets/smartscrapergraph.png
   :align: center
   :width: 90%
   :alt: SmartScraperGraph
|

First we define the graph configuration, which includes the LLM model and other parameters. Then we create an instance of the SmartScraperGraph class, passing the prompt, source, and configuration as arguments. Finally, we run the graph and print the result.
It will fetch the data from the source and extract the information based on the prompt in JSON format.

.. code-block:: python

    from scrapegraphai.graphs import SmartScraperGraph

    # Fill in the LLM settings (see the LLM section)
    graph_config = {
        "llm": {...},
    }

    smart_scraper_graph = SmartScraperGraph(
        prompt="List me all the projects with their descriptions",
        source="https://perinim.github.io/projects",
        config=graph_config,
        schema=None,  # optional output schema; None lets the LLM infer it
    )

    result = smart_scraper_graph.run()
    print(result)

**SmartScraperMultiGraph** is similar to SmartScraperGraph, but it can handle multiple sources. We define the graph configuration, create an instance of the SmartScraperMultiGraph class, and run the graph.
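
Conceptually, handling multiple sources amounts to running a single-source scraper on each one with the same prompt and collecting the answers. The sketch below shows only that fan-out pattern. `scrape_many` and `fake_scrape_one` are hypothetical helpers, the latter standing in for a SmartScraperGraph run. The library's SmartScraperMultiGraph may combine the partial answers differently.

.. code-block:: python

    from typing import Callable, Dict, List


    def scrape_many(prompt: str, sources: List[str],
                    scrape_one: Callable[[str, str], dict]) -> Dict[str, dict]:
        """Run a single-source scraper on every source, keyed by source."""
        results: Dict[str, dict] = {}
        for source in sources:
            # The prompt stays the same; only the source changes.
            results[source] = scrape_one(prompt, source)
        return results


    def fake_scrape_one(prompt: str, source: str) -> dict:
        # Hypothetical stand-in for running SmartScraperGraph on one source.
        return {"source": source, "prompt": prompt}


    merged = scrape_many(
        "List me all the projects with their descriptions",
        ["https://example.com/a", "https://example.com/b"],
        fake_scrape_one,
    )
    print(sorted(merged))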

SearchGraph
^^^^^^^^^^^

.. image:: ../../assets/searchgraph.png
   :align: center
   :width: 80%
   :alt: SearchGraph
|

Similar to SmartScraperGraph, we define the graph configuration, create an instance of the SearchGraph class, and run the graph.
It will create a search query, fetch the first n results from the search engine, run n SmartScraperGraph instances, and return the results in JSON format.


.. code-block:: python

    from scrapegraphai.graphs import SearchGraph

    # Fill in the LLM and embedding model settings (see the LLM section)
    graph_config = {
        "llm": {...},
        "embeddings": {...},
    }

    # Create the SearchGraph instance
    search_graph = SearchGraph(
        prompt="List me all the traditional recipes from Chioggia",
        config=graph_config,
        schema=None,  # optional output schema; None lets the LLM infer it
    )

    # Run the graph
    result = search_graph.run()
    print(result)
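
The search-then-scrape flow described above can be sketched in plain Python. `search_and_scrape`, `fake_search` and `fake_scrape_page` are hypothetical stand-ins for the search engine and for a per-page SmartScraperGraph run, and `max_results` plays the role of n. The prompt is used verbatim as the query, a simplification of the graph's own query-creation step.

.. code-block:: python

    import json
    from typing import Callable, List


    def search_and_scrape(prompt: str,
                          search: Callable[[str], List[str]],
                          scrape_page: Callable[[str, str], dict],
                          max_results: int = 3) -> str:
        # 1. Build the query (simplified: the prompt itself).
        query = prompt
        # 2. Keep only the first n result URLs.
        urls = search(query)[:max_results]
        # 3. Scrape each page with the same prompt.
        answers = [scrape_page(prompt, url) for url in urls]
        # 4. Return the combined answers as JSON.
        return json.dumps({"results": answers})


    def fake_search(query: str) -> List[str]:
        # Hypothetical search engine returning ten ranked URLs.
        return [f"https://example.com/{i}" for i in range(10)]


    def fake_scrape_page(prompt: str, url: str) -> dict:
        # Hypothetical per-page scraper.
        return {"url": url}


    output = search_and_scrape("List me all the traditional recipes from Chioggia",
                               fake_search, fake_scrape_page, max_results=2)
    print(output)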


SpeechGraph
^^^^^^^^^^^

.. image:: ../../assets/speechgraph.png
   :align: center
   :width: 90%
   :alt: SpeechGraph
|

Similar to SmartScraperGraph, we define the graph configuration, create an instance of the SpeechGraph class, and run the graph.
It will fetch the data from the source, extract the information based on the prompt, and generate an audio file with the answer, as well as the answer itself, in JSON format.

.. code-block:: python

    from scrapegraphai.graphs import SpeechGraph

    # Fill in the LLM and text-to-speech model settings
    graph_config = {
        "llm": {...},
        "tts_model": {...},
    }

    # ************************************************
    # Create the SpeechGraph instance and run it
    # ************************************************

    speech_graph = SpeechGraph(
        prompt="Make a detailed audio summary of the projects.",
        source="https://perinim.github.io/projects/",
        config=graph_config,
        schema=None,  # optional output schema; None lets the LLM infer it
    )

    result = speech_graph.run()
    print(result)
.. toctree::
   :maxdepth: 4

   types
   llm
   graph_config
   benchmarks
   telemetry
72 changes: 72 additions & 0 deletions docs/source/scrapers/telemetry.rst
@@ -0,0 +1,72 @@
===============
Usage Analytics
===============

ScrapeGraphAI collects **anonymous** usage data by default to improve the library and guide development efforts.

**Events Captured**

We capture events in the following scenarios:

1. When a ``Graph`` finishes running.
2. When an exception is raised in one of the nodes.

**Data Collected**

The data captured is limited to:

- Operating System and Python version
- A persistent UUID to identify the session, stored in ``~/.scrapegraphai.conf``

Additionally, the following properties are collected:

.. code-block:: python

    properties = {
        "graph_name": graph_name,
        "llm_model": llm_model_name,
        "embedder_model": embedder_model_name,
        "source_type": source_type,
        "execution_time": execution_time,
        "error_node": error_node_name,
    }

For more details, refer to the `telemetry.py <https://github.com/VinciGit00/Scrapegraph-ai/blob/main/scrapegraphai/telemetry/telemetry.py>`_ module.
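
A persistent identifier of this kind can be kept with the standard library alone. The following sketch assumes a hypothetical ``anonymous_id`` key and ``get_or_create_id`` helper. It generates a UUID once, stores it in an INI-style file and returns the same value on later calls. Refer to ``telemetry.py`` for what the library actually stores.

.. code-block:: python

    import configparser
    import tempfile
    import uuid
    from pathlib import Path


    def get_or_create_id(conf_path: Path) -> str:
        """Return the UUID stored in conf_path, creating and saving one if absent."""
        config = configparser.ConfigParser()
        config.read(conf_path)  # a missing file is silently skipped
        defaults = config["DEFAULT"]
        if "anonymous_id" not in defaults:  # hypothetical key name
            defaults["anonymous_id"] = str(uuid.uuid4())
            with open(conf_path, "w") as conf_file:
                config.write(conf_file)
        return defaults["anonymous_id"]


    # Demo in a temporary directory instead of the real home directory.
    demo_path = Path(tempfile.mkdtemp()) / ".scrapegraphai.conf"
    first_id = get_or_create_id(demo_path)
    second_id = get_or_create_id(demo_path)
    print(first_id == second_id)  # True: the ID is reused across calls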

**Opting Out**

If you prefer not to participate in telemetry, you can opt out using any of the following methods:

1. **Programmatically Disable Telemetry**:

   Add the following code at the beginning of your script:

   .. code-block:: python

      from scrapegraphai import telemetry
      telemetry.disable_telemetry()

2. **Configuration File**:

   Set the ``telemetry_enabled`` key to ``false`` in ``~/.scrapegraphai.conf`` under the ``[DEFAULT]`` section:

   .. code-block:: ini

      [DEFAULT]
      telemetry_enabled = False

3. **Environment Variable**:

   - **For a Shell Session**:

     .. code-block:: bash

        export SCRAPEGRAPHAI_TELEMETRY_ENABLED=false

   - **For a Single Command**:

     .. code-block:: bash

        SCRAPEGRAPHAI_TELEMETRY_ENABLED=false python my_script.py

By following any of these methods, you can easily opt out of telemetry and ensure your usage data is not collected.
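
Since these options can be combined, it helps to see how a program might resolve them. The sketch below reads the ``telemetry_enabled`` key and the ``SCRAPEGRAPHAI_TELEMETRY_ENABLED`` variable described in this section. Several details are assumptions for illustration and may differ from ``telemetry.py``:

- the ``is_telemetry_enabled`` helper itself;
- the default of ``True``;
- the set of false-like strings;
- the precedence (environment variable, then configuration file, then default).

.. code-block:: python

    import configparser
    import os
    import tempfile
    from pathlib import Path
    from typing import Mapping

    FALSY = {"0", "false", "no", "off"}  # assumed false-like values


    def is_telemetry_enabled(conf_path: Path,
                             environ: Mapping[str, str] = os.environ,
                             default: bool = True) -> bool:
        """Hypothetical resolution order: default, then config file, then env var."""
        enabled = default
        config = configparser.ConfigParser()
        config.read(conf_path)  # a missing file leaves only the default
        if "telemetry_enabled" in config["DEFAULT"]:
            enabled = config["DEFAULT"].getboolean("telemetry_enabled")
        value = environ.get("SCRAPEGRAPHAI_TELEMETRY_ENABLED")
        if value is not None:  # assumed to override the config file
            enabled = value.strip().lower() not in FALSY
        return enabled


    tmp_dir = Path(tempfile.mkdtemp())
    missing = tmp_dir / "missing.conf"
    opted_out = tmp_dir / ".scrapegraphai.conf"
    opted_out.write_text("[DEFAULT]\ntelemetry_enabled = False\n")

    print(is_telemetry_enabled(missing, environ={}))  # True (default)
    print(is_telemetry_enabled(opted_out, environ={}))  # False (config file)
    print(is_telemetry_enabled(
        missing, environ={"SCRAPEGRAPHAI_TELEMETRY_ENABLED": "false"}))  # False (env var)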