
Commit ffe8cd8

1 parent 60f673d commit ffe8cd8

1 file changed: +20 -26 lines changed


README.md

Lines changed: 20 additions & 26 deletions
@@ -1,17 +1,18 @@
1-
# 🕷️ ScrapeGraphAI: You Only Scrape Once
21

2+
# 🕷️ ScrapeGraphAI: You Only Scrape Once
33
[English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md) | [日本語](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/japanese.md)
44
| [한국어](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/korean.md)
55
| [Русский](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/russian.md) | [Türkçe](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/turkish.md)
66

7+
78
[![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
89
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen?style=for-the-badge)](https://github.com/pylint-dev/pylint)
910
[![Pylint](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/pylint.yml?label=Pylint&logo=github&style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
1011
[![CodeQL](https://img.shields.io/github/actions/workflow/status/VinciGit00/Scrapegraph-ai/codeql.yml?label=CodeQL&logo=github&style=for-the-badge)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
1112
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
1213
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)
1314

14-
ScrapeGraphAI is a _web scraping_ python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
15+
ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
1516

1617
Just say which information you want to extract and the library will do it for you!
1718

@@ -38,11 +39,9 @@ Additional dependecies can be added while installing the library:
3839
- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
3940

4041
This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
41-
4242
```bash
4343
pip install scrapegraphai[other-language-models]
4444
```
45-
4645
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.
4746

4847
```bash
@@ -57,12 +56,13 @@ Additional dependecies can be added while installing the library:
5756

5857
</details>
5958

60-
## 💻 Usage
6159

60+
## 💻 Usage
6261
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
6362

6463
The most common one is the `SmartScraperGraph`, which extracts information from a single page given a user prompt and a source URL.
6564

65+
6666
```python
6767
import json
6868
from scrapegraphai.graphs import SmartScraperGraph
@@ -98,17 +98,16 @@ The output will be a dictionary like the following:
9898
"contact_email": "[email protected]"
9999
}
100100
```
101-
102101
There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
103102

104-
| Pipeline Name | Description |
105-
| ----------------------- | ------------------------------------------------------------------------------------------------------------- |
106-
| SmartScraperGraph | Single-page scraper that only needs a user prompt and an input source. |
107-
| SearchGraph | Multi-page scraper that extracts information from the top n search results of a search engine. |
108-
| SpeechGraph | Single-page scraper that extracts information from a website and generates an audio file. |
109-
| ScriptCreatorGraph | Single-page scraper that extracts information from a website and generates a Python script. |
110-
| SmartScraperMultiGraph | Multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources. |
111-
| ScriptCreatorMultiGraph | Multi-page scraper that generates a Python script for extracting information from multiple pages and sources. |
103+
| Pipeline Name | Description |
104+
|-------------------------|------------------------------------------------------------------------------------------------------------------|
105+
| SmartScraperGraph | Single-page scraper that only needs a user prompt and an input source. |
106+
| SearchGraph | Multi-page scraper that extracts information from the top n search results of a search engine. |
107+
| SpeechGraph | Single-page scraper that extracts information from a website and generates an audio file. |
108+
| ScriptCreatorGraph | Single-page scraper that extracts information from a website and generates a Python script. |
109+
| SmartScraperMultiGraph | Multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources. |
110+
| ScriptCreatorMultiGraph | Multi-page scraper that generates a Python script for extracting information from multiple pages and sources. |
112111

113112
For each of these graphs there is the multi version. It allows to make calls of the LLM in parallel.
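The idea behind the multi versions, running the same work over several sources concurrently, can be sketched without the library. This is only an illustration: `fake_scrape` and `scrape_many` are hypothetical names, `fake_scrape` stands in for one LLM-backed scrape, and nothing here is ScrapeGraphAI's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(source: str) -> dict:
    """Hypothetical stand-in for one LLM-backed scrape of a single source."""
    return {"source": source, "length": len(source)}


def scrape_many(sources: list[str], max_workers: int = 4) -> list[dict]:
    """Run fake_scrape over all sources in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(fake_scrape, sources))


results = scrape_many(["https://example.com/a", "https://example.com/b"])
print(results)
```

Threads suit this kind of workload because each call mostly waits on network I/O rather than using the CPU.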
114113

@@ -117,7 +116,6 @@ It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**,
117116
Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command, if you want to use local models.
118117

119118
## 🔍 Demo
120-
121119
Official streamlit demo:
122120

123121
[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)
@@ -133,7 +131,6 @@ The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.r
133131
Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).
134132

135133
## 🏆 Sponsors
136-
137134
<div style="text-align: center;">
138135
<a href="https://2ly.link/1zaXG">
139136
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/browserbase_logo.png" alt="Browserbase" style="width: 10%;">
@@ -159,18 +156,15 @@ Please see the [contributing guidelines](https://github.com/VinciGit00/Scrapegra
159156
[![My Skills](https://skillicons.dev/icons?i=linkedin)](https://www.linkedin.com/company/scrapegraphai/)
160157
[![My Skills](https://skillicons.dev/icons?i=twitter)](https://twitter.com/scrapegraphai)
161158

162-
## 📈 Telemetry
163-
159+
## 📈 Telemetry
164160
We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).
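As a minimal sketch, the opt-out can also be set from Python instead of the shell. Only the variable name and value come from the README; setting it before the library is imported is an assumption, made because packages commonly read such settings at import time.

```python
import os

# Opt out of telemetry using the variable named in the README.
# Assumption: this should run before `scrapegraphai` is imported.
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"

telemetry_setting = os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"]
print(telemetry_setting)
```

Exporting the variable in the shell or a `.env` file before starting the process has the same effect and avoids any dependence on import order.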
165161

166-
## ❤️ Contributors
167162

163+
## ❤️ Contributors
168164
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)
169165

170166
## 🎓 Citations
171-
172167
If you have used our library for research purposes please quote us with the following reference:
173-
174168
```text
175169
@misc{scrapegraph-ai,
176170
author = {Marco Perini, Lorenzo Padoan, Marco Vinciguerra},
@@ -187,11 +181,11 @@ If you have used our library for research purposes please quote us with the foll
187181
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/logo_authors.png" alt="Authors_logos">
188182
</p>
189183

190-
| | Contact Info |
191-
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
192-
| Marco Vinciguerra | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/) |
193-
| Marco Perini | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/perinim/) |
194-
| Lorenzo Padoan | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/lorenzo-padoan-4521a2154/) |
184+
| | Contact Info |
185+
|--------------------|----------------------|
186+
| Marco Vinciguerra | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/) |
187+
| Marco Perini | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/perinim/) |
188+
| Lorenzo Padoan | [![Linkedin Badge](https://img.shields.io/badge/-Linkedin-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/lorenzo-padoan-4521a2154/) |
195189

196190
## 📜 License
197191
