Commit ced2bbc

docs(concurrent): refactor theme and add benchmark SearchGraph

1 parent c0d26d6

File tree

12 files changed: +64 additions, -11 deletions

docs/assets/searchgraph.png (binary image, 3.14 KB)

docs/assets/smartscrapergraph.png (binary image, 1.46 KB)

docs/assets/speechgraph.png (binary image, 2.37 KB)

docs/source/conf.py

Lines changed: 22 additions & 5 deletions

@@ -14,19 +14,36 @@
 # import all the modules
 sys.path.insert(0, os.path.abspath('../../'))
 
-project = 'scrapegraphai'
-copyright = '2024, Marco Vinciguerra'
-author = 'Marco Vinciguerra'
+project = 'ScrapeGraphAI'
+copyright = '2024, ScrapeGraphAI'
+author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
+
+html_last_updated_fmt = "%b %d, %Y"
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx_wagtail_theme']
 
 templates_path = ['_templates']
 exclude_patterns = []
 
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = 'sphinx_rtd_theme'
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_wagtail_theme'
+
+html_theme_options = dict(
+    project_name = "ScrapeGraphAI",
+    logo = "scrapegraphai_logo.png",
+    logo_alt = "ScrapeGraphAI",
+    logo_height = 59,
+    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
+    logo_width = 45,
+    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
+    footer_links = ",".join(
+        ["Landing Page|https://scrapegraphai.com/",
+         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
+    ),
+)
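The `footer_links` option above is a single comma-separated string of `Label|URL` pairs, which is why the commit builds it with `",".join(...)`. A minimal sketch of that assembly (standalone, outside conf.py):

```python
# Assemble the footer_links string the same way the new conf.py does:
# each entry is "Label|URL", entries are joined by commas.
footer_links = ",".join(
    [
        "Landing Page|https://scrapegraphai.com/",
        "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro",
    ]
)
print(footer_links)
# → Landing Page|https://scrapegraphai.com/,Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro
```

Joining at configuration time keeps the list readable in conf.py while still producing the flat string the theme option expects.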

docs/source/getting_started/installation.rst

Lines changed: 3 additions & 1 deletion

@@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following com
 
    pip install scrapegraphai
 
-**Note:** It is higly recommended to install the library in a virtual environment (conda, venv, etc.)
+.. important::
+
+   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
 
 If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
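The workflow the new `.. important::` note recommends can be sketched as a short shell session (assuming a POSIX shell and `python3` on the PATH; conda users would substitute `conda create`/`conda activate`):

```shell
# Create an isolated environment, activate it, then install from PyPI,
# so scrapegraphai's dependencies don't touch the system Python.
python3 -m venv .venv
. .venv/bin/activate
pip install scrapegraphai
```

Deactivate later with `deactivate`; the environment lives entirely inside the `.venv` directory.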

docs/source/index.rst

Lines changed: 1 addition & 0 deletions

@@ -24,6 +24,7 @@
    scrapers/graphs
    scrapers/llm
    scrapers/graph_config
+   scrapers/benchmarks
 
 .. toctree::
    :maxdepth: 2

docs/source/introduction/overview.rst

Lines changed: 4 additions & 2 deletions

@@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc.
 as well as local models which can run on your machine using Ollama.
 
-Diagram
-=======
+Library Diagram
+===============
+
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing etc...
 Finally the scraped and processed data gets fed to an LLM which generates a response.
 
 .. image:: ../../assets/project_overview_diagram.png
    :align: center
+   :width: 70%
    :alt: ScrapegraphAI Overview

docs/source/scrapers/benchmarks.rst

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+Benchmarks
+==========
+
+SearchGraph
+^^^^^^^^^^^
+
+`SearchGraph` instantiates a `SmartScraperGraph` object for each URL and extracts the data from the HTML.
+A concurrent approach is used to speed up the process, and the following table shows the time required for a scraping task with different **batch sizes**.
+Only two results are taken into account.
+
+.. list-table:: SearchGraph
+   :header-rows: 1
+
+   * - Batch Size
+     - Total Time (s)
+   * - 1
+     - 31.1
+   * - 2
+     - 33.52
+   * - 4
+     - 28.47
+   * - 16
+     - 21.80
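The batch-size effect in the table can be sketched as a semaphore-bounded concurrent loop. This is an illustrative simulation, not the library's implementation: `scrape_one` stands in for one `SmartScraperGraph` run, and the URLs and timings are made up.

```python
import asyncio

async def scrape_one(url: str) -> str:
    # Hypothetical stand-in for one SmartScraperGraph run.
    await asyncio.sleep(0.01)  # simulate fetch + LLM latency
    return f"data from {url}"

async def scrape_all(urls: list[str], batch_size: int) -> list[str]:
    # At most `batch_size` scrapers run concurrently, mirroring how
    # SearchGraph processes its result URLs in batches.
    sem = asyncio.Semaphore(batch_size)

    async def bounded(url: str) -> str:
        async with sem:
            return await scrape_one(url)

    return await asyncio.gather(*(bounded(u) for u in urls))

results = asyncio.run(
    scrape_all([f"https://example.com/{i}" for i in range(8)], batch_size=4)
)
print(len(results))  # → 8
```

With a larger `batch_size` more per-URL waits overlap, which is why the measured total time generally drops as the batch size grows.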

docs/source/scrapers/graph_config.rst

Lines changed: 2 additions & 0 deletions

@@ -1,3 +1,5 @@
+.. _Configuration:
+
 Additional Parameters
 =====================

docs/source/scrapers/graphs.rst

Lines changed: 3 additions & 1 deletion

@@ -9,7 +9,9 @@ There are currently three types of graphs available in the library:
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using an LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
 
-**Note:** they all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections.
+.. note::
+
+   They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.
 
 SmartScraperGraph
 ^^^^^^^^^^^^^^^^^
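The shared graph configuration the note refers to is a plain dictionary. A minimal sketch of its shape (the key names and model string here are illustrative assumptions; the exact keys each graph accepts are defined by the library's Configuration docs):

```python
# Illustrative shape of the graph configuration dictionary shared by
# SmartScraperGraph, SearchGraph, and SpeechGraph. Keys are assumptions
# for the sketch, not an exhaustive or authoritative schema.
graph_config = {
    "llm": {
        "model": "ollama/mistral",  # e.g. a local model served by Ollama
        "temperature": 0,
    },
}

# Each graph type would then take a prompt, a source, and this config.
print(sorted(graph_config["llm"].keys()))  # → ['model', 'temperature']
```

Keeping the configuration in one dictionary is what lets the three graph types swap LLM backends without changing the pipeline itself.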

docs/source/scrapers/llm.rst

Lines changed: 5 additions & 1 deletion

@@ -1,3 +1,5 @@
+.. _llm:
+
 LLM
 ===
 
@@ -7,7 +9,9 @@ These models are specified inside the graph configuration dictionary and can be
 - **Local Models**: These models are hosted on the local machine and can be used without any API key.
 - **API-based Models**: These models are hosted on the cloud and require an API key to access them (e.g. OpenAI, Groq, etc.).
 
-**Note**: If the emebedding model is not specified, the library will use the default one for that LLM, if available.
+.. note::
+
+   If the embedding model is not specified, the library will use the default one for that LLM, if available.
 
 Local Models
 ------------

requirements-dev.txt

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 sphinx==7.1.2
-sphinx-rtd-theme==2.0.0
+sphinx-wagtail-theme==6.3.0
 pytest==8.0.0
