
Commit e256b75

docs(refactor): added proxy-rotation usage and refactor readthedocs
1 parent 5d6d996 commit e256b75


16 files changed, +398 -230 lines changed


docs/assets/searchgraph.png

50.2 KB

docs/assets/smartscrapergraph.png

58.2 KB

docs/assets/speechgraph.png

45.8 KB

docs/source/conf.py

Lines changed: 0 additions & 1 deletion
@@ -30,4 +30,3 @@
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
 html_theme = 'sphinx_rtd_theme'
-html_static_path = ['_static']

docs/source/getting_started/examples.rst

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,9 @@
 Examples
 ========
 
-Here some example of the different ways to scrape with ScrapegraphAI
+Let's suppose you want to scrape a website to get a list of projects with their descriptions.
+You can use the `SmartScraperGraph` class to do that.
+The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.
 
 OpenAI models
 ^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
    # ************************************************
 
    smart_scraper_graph = SmartScraperGraph(
-       prompt="List me all the news with their description.",
+       prompt="List me all the projects with their description.",
        # also accepts a string with the already downloaded HTML code
        source="https://perinim.github.io/projects",
        config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
    result = smart_scraper_graph.run()
    print(result)
 
+To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!
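For context, the updated example reads end-to-end roughly as sketched below. This is only a sketch assembled from the documentation touched by this commit; the `graph_config` keys shown (`api_key`, `model`) are assumptions and may differ between library versions.

.. code-block:: python

   from scrapegraphai.graphs import SmartScraperGraph

   # Assumed configuration keys; see the Scrapers section of the docs for
   # the options supported by your installed version.
   graph_config = {
       "llm": {
           "api_key": "YOUR_OPENAI_API_KEY",  # placeholder, not a real key
           "model": "gpt-3.5-turbo",
       },
   }

   smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the projects with their description.",
       # also accepts a string with the already downloaded HTML code
       source="https://perinim.github.io/projects",
       config=graph_config,
   )

   result = smart_scraper_graph.run()
   print(result)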

docs/source/getting_started/installation.rst

Lines changed: 15 additions & 6 deletions
@@ -7,26 +7,35 @@ for this project.
 Prerequisites
 ^^^^^^^^^^^^^
 
-- `Python 3.8+ <https://www.python.org/downloads/>`_
-- `pip <https://pip.pypa.io/en/stable/getting-started/>`
-- `ollama <https://ollama.com/>` *optional for local models
+- `Python >=3.9,<3.12 <https://www.python.org/downloads/>`_
+- `pip <https://pip.pypa.io/en/stable/getting-started/>`_
+- `Ollama <https://ollama.com/>`_ (optional for local models)
 
 
 Install the library
 ^^^^^^^^^^^^^^^^^^^^
 
+The library is available on PyPI, so it can be installed using the following command:
+
 .. code-block:: bash
 
    pip install scrapegraphai
 
+**Note:** It is higly recommended to install the library in a virtual environment (conda, venv, etc.)
+
+If your clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
+
+.. code-block:: bash
+
+   poetry install
+
 Additionally on Windows when using WSL
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+If you are using Windows Subsystem for Linux (WSL) and you are facing issues with the installation of the library, you might need to install the following packages:
+
 .. code-block:: bash
 
    sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2
 
-As simple as that! You are now ready to scrape gnamgnamgnam 👿👿👿
-
-
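A quick way to confirm that either install path worked is a minimal import check such as the one below (standard library only; `scrapegraphai` is the distribution name used in the pip command above):

.. code-block:: python

   # Post-install sanity check: the distribution should resolve to a version
   # and the top-level package should import without errors.
   from importlib.metadata import version

   import scrapegraphai  # raises ImportError if the installation is broken

   print(version("scrapegraphai"))  # same name as used in `pip install`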

docs/source/index.rst

Lines changed: 13 additions & 6 deletions
@@ -3,12 +3,6 @@
 You can adapt this file completely to your liking, but it should at least
 contain the root `toctree` directive.
 
-Welcome to scrapegraphai-ai's documentation!
-=======================================
-
-Here you will find all the information you need to get started.
-The following sections will guide you through the installation process and the usage of the library.
-
 .. toctree::
    :maxdepth: 2
    :caption: Introduction
@@ -22,6 +16,19 @@ The following sections will guide you through the installation process and the u
 
    getting_started/installation
    getting_started/examples
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Scrapers
+
+   scrapers/graphs
+   scrapers/llm
+   scrapers/graph_config
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Modules
+
    modules/modules
 
 Indices and tables

docs/source/introduction/contributing.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Contributing
 ============
 
 Hey, you want to contribute? Awesome!
-Just fork the repo, make your changes, and send me a pull request.
+Just fork the repo, make your changes, and send a pull request.
 If you're not sure if it's a good idea, open an issue and we'll discuss it.
 
 Go and check out the `contributing guidelines <https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md>`__ for more information.

docs/source/introduction/overview.rst

Lines changed: 16 additions & 11 deletions
@@ -1,20 +1,25 @@
+.. image:: ../../assets/scrapegraphai_logo.png
+   :align: center
+   :width: 50%
+   :alt: ScrapegraphAI
+
 Overview
 ========
 
-In a world where web pages are constantly changing and in a data-hungry world there is a need for a new generation of scrapers, and this is where ScrapegraphAI was born.
-An opensource library with the aim of starting a new era of scraping tools that are more flexible and require less maintenance by developers, with the use of LLMs.
+ScrapeGraphAI is a open-source web scraping python library designed to usher in a new era of scraping tools.
+In today's rapidly evolving and data-intensive digital landscape, this library stands out by integrating LLM and
+direct graph logic to automate the creation of scraping pipelines for websites and various local documents, including XML,
+HTML, JSON, and more.
 
-.. image:: ../../assets/scrapegraphai_logo.png
-   :align: center
-   :width: 100px
-   :alt: ScrapegraphAI
+Simply specify the information you need to extract, and ScrapeGraphAI handles the rest,
+providing a more flexible and low-maintenance solution compared to traditional scraping tools.
 
 Why ScrapegraphAI?
 ==================
 
-ScrapegraphAI in our vision represents a significant step forward in the field of web scraping, offering an open-source solution designed to meet the needs of a constantly evolving web landscape. Here's why ScrapegraphAI stands out:
-
-Flexibility and Adaptability
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages. ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
+Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
+ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
 This flexibility ensures that scrapers remain functional even when website layouts change.
+
+We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc.
+as well as local models which can run on your machine using Ollama.
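To illustrate the local-model support mentioned above, the following sketch swaps the cloud LLM for a model served by Ollama. The configuration keys and model names (`ollama/mistral`, `ollama/nomic-embed-text`, `base_url`) are assumptions taken from the library's examples and may differ between versions.

.. code-block:: python

   from scrapegraphai.graphs import SmartScraperGraph

   # Assumed configuration for a locally hosted model served by Ollama.
   graph_config = {
       "llm": {
           "model": "ollama/mistral",             # any chat model pulled into Ollama
           "temperature": 0,
           "format": "json",                      # ask Ollama for JSON output
           "base_url": "http://localhost:11434",  # default Ollama endpoint
       },
       "embeddings": {
           "model": "ollama/nomic-embed-text",
           "base_url": "http://localhost:11434",
       },
   }

   smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the projects with their description.",
       source="https://perinim.github.io/projects",
       config=graph_config,
   )

   print(smart_scraper_graph.run())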

docs/source/modules/modules.rst

Lines changed: 0 additions & 3 deletions
@@ -1,6 +1,3 @@
-scrapegraphai
-=============
-
 .. toctree::
    :maxdepth: 4

docs/source/modules/yosoai.graphs.rst

Lines changed: 0 additions & 29 deletions
This file was deleted.

docs/source/modules/yosoai.nodes.rst

Lines changed: 0 additions & 61 deletions
This file was deleted.

docs/source/modules/yosoai.rst

Lines changed: 0 additions & 110 deletions
This file was deleted.
