
reallignment #246


Merged: 118 commits, May 15, 2024

Commits (118)

51aa109
feat: add turboscraper (alfa)
VinciGit00 May 6, 2024
67d5fbf
feat: new search_graph
VinciGit00 May 6, 2024
cc28d5a
docs: fixed unused param and install
PeriniM May 8, 2024
ae5655f
docs(readme): improve main readme
PeriniM May 8, 2024
4bf90f3
docs: fixed speechgraphexample
PeriniM May 8, 2024
bd8afaf
Fixed "NameError: name 'GoogleGenerativeAIEmbeddings' is not defined"
arjuuuuunnnnn May 9, 2024
13238f4
Merge pull request #185 from arjuuuuunnnnn/main
VinciGit00 May 9, 2024
0bb68d1
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 9, 2024
5449ebf
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 9, 2024
7b07fdf
add groq example
VinciGit00 May 9, 2024
772e064
docs: Update README.md
lurenss May 10, 2024
82318b9
add sponsor
VinciGit00 May 10, 2024
7ee5078
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 10, 2024
67b8a14
Update README.md
VinciGit00 May 10, 2024
f8d8d71
docs: updated sponsor logo
PeriniM May 10, 2024
702e913
Update README.md
VinciGit00 May 10, 2024
198420c
docs: update instructions to use with LocalAI
mudler May 10, 2024
a433399
Merge pull request #205 from mudler/patch-1
VinciGit00 May 10, 2024
86be41e
Revert "docs: update instructions to use with LocalAI"
PeriniM May 10, 2024
b5c1a7b
Merge pull request #206 from VinciGit00/revert-205-patch-1
PeriniM May 10, 2024
23b1e5f
Merge branch 'main' into docs
PeriniM May 10, 2024
b8a8ebb
Merge pull request #207 from VinciGit00/docs
PeriniM May 10, 2024
2f4fd45
fix(pytest): add dependency for mocking testing functions
DiTo97 May 10, 2024
db2234b
feat(webdriver-backend): add dynamic import scripts from module and file
DiTo97 May 10, 2024
2170131
feat(proxy-rotation): add parse (IP address) or search (from broker) …
DiTo97 May 10, 2024
768719c
feat(safe-web-driver): enchanced the original `AsyncChromiumLoader` w…
DiTo97 May 10, 2024
fc2aa3a
Merge branch 'pre/beta' of https://github.com/DiTo97/Scrapegraph-ai i…
DiTo97 May 10, 2024
67d8fec
Minor typo fix for clarity
epage480 May 10, 2024
627cbee
feat(parallel-exeuction): add asyncio event loop dispatcher with sema…
DiTo97 May 10, 2024
4088474
Added parse_html option in parse_node
epage480 May 10, 2024
aac51ba
Removed dead code, allows GenerateScraperNode to generate scraper with
epage480 May 10, 2024
24c3b05
Removed nonfunctional RAG node from ScriptCreatorGraph
epage480 May 10, 2024
0683e78
Merge branch 'pre/beta' into fix-GenerateScraperGraph
epage480 May 10, 2024
300fd5d
Fetch links in the page while parsing html
mayurdb May 11, 2024
1fa77e5
Merge pull request #215 from epage480/fix-GenerateScraperGraph
VinciGit00 May 11, 2024
b752499
Merge pull request #217 from mayurdb/fetchLinkFix
VinciGit00 May 11, 2024
04a4d84
Update serp_api_logo.png
VinciGit00 May 11, 2024
78f2174
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 11, 2024
2563773
fix: crash asyncio due dependency version
lurenss May 11, 2024
d359814
ci(release): 0.10.1 [skip ci]
semantic-release-bot May 11, 2024
dc91719
Update cleanup_html.py
VinciGit00 May 11, 2024
b54d984
fix(chromium-loader): ensure it subclasses langchain's base loader
DiTo97 May 11, 2024
76b0e39
update tests
VinciGit00 May 11, 2024
13ae918
docs: add diagram showing general structure/flow of the library
daniele-roncaglioni May 11, 2024
df271b6
Add search link node that can find out relevant links in the webpage
mayurdb May 11, 2024
8f1fbe7
minor changes
mayurdb May 11, 2024
ea3b545
Merge branch 'pre/beta' into deepScrape
mayurdb May 11, 2024
9a67a26
Update documentation
mayurdb May 11, 2024
dd29c16
Merge branch 'deepScrape' of github.com:mayurdb/Scrapegraph-ai into d…
mayurdb May 11, 2024
d8ed76b
Merge pull request #221 from mayurdb/deepScrape
VinciGit00 May 11, 2024
b441b30
docs: update overview diagram with more models
daniele-roncaglioni May 11, 2024
3b9ec9b
Merge pull request #220 from daniele-roncaglioni/102-library-overview…
VinciGit00 May 11, 2024
156b67b
feat: add support for deepseek-chat
f-aguzzi May 11, 2024
e004c7c
Merge pull request #223 from f-aguzzi/pre/beta
VinciGit00 May 12, 2024
106fb12
ci(release): 0.11.0-beta.3 [skip ci]
semantic-release-bot May 12, 2024
e2350ed
feat: add new prompt info
VinciGit00 May 12, 2024
f359d5c
Merge pull request #224 from VinciGit00/fixing-prompts
VinciGit00 May 12, 2024
4ccddda
ci(release): 0.11.0-beta.4 [skip ci]
semantic-release-bot May 12, 2024
1e9a564
fix(proxy-rotation): removed duplicated arg and passed the loader_kwa…
PeriniM May 12, 2024
30758b4
Create smart_scarper_deepseek.py
VinciGit00 May 12, 2024
5d6d996
fix(proxy-rotation): removed max_shape duplicate
PeriniM May 13, 2024
e256b75
docs(refactor): added proxy-rotation usage and refactor readthedocs
PeriniM May 13, 2024
0c36a7e
feat: added proxy rotation
PeriniM May 13, 2024
7e8acd8
Merge branch 'pre/beta' into fix/fetch-node-proxybroker
PeriniM May 13, 2024
b8079f8
Merge pull request #211 from DiTo97/fix/fetch-node-proxybroker
PeriniM May 13, 2024
fc56d6b
Update README.md
VinciGit00 May 13, 2024
353382b
ci(release): 0.11.0-beta.5 [skip ci]
semantic-release-bot May 13, 2024
0c15947
fix(fetch-node): removed isSoup from default
PeriniM May 13, 2024
2724d3d
ci(release): 0.11.0-beta.6 [skip ci]
semantic-release-bot May 13, 2024
c7ec114
docs(refactor): changed example
PeriniM May 13, 2024
60ed80f
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM May 13, 2024
7c91f9f
add examples for deepseek
VinciGit00 May 13, 2024
39be38f
Fixed anthropic/bedrock conflict; Removed duplicate class Claude; Upd…
JGalego May 13, 2024
d0167de
fix: bug for claude
VinciGit00 May 13, 2024
f0f7373
ci(release): 0.11.0-beta.7 [skip ci]
semantic-release-bot May 13, 2024
f3d44c0
Merge pull request #228 from JGalego/fix/bedrock-support
VinciGit00 May 13, 2024
dedc733
fix(asyncio): replaced deepcopy with copy due to serialization problems
PeriniM May 13, 2024
859c5d5
Refactored to include custom AWS client for bedrock; Added missing An…
JGalego May 13, 2024
28ab8da
Merge pull request #229 from JGalego/feat/custom-aws-creds
VinciGit00 May 13, 2024
c0d26d6
ad bedrocl
VinciGit00 May 13, 2024
d9752b1
chore: update models_tokens.py with new model configurations
arsaboo May 13, 2024
a8d5e7d
feat(batchsize): tested different batch sizes and systems
PeriniM May 13, 2024
367dea5
Merge branch 'pre/beta' into feat/parallel-node-execution
PeriniM May 13, 2024
62a74a5
Merge pull request #213 from DiTo97/feat/parallel-node-execution
PeriniM May 13, 2024
df918fa
Merge pull request #231 from arsaboo/models
PeriniM May 13, 2024
fa4edb4
ci(release): 0.11.0-beta.8 [skip ci]
semantic-release-bot May 13, 2024
ced2bbc
docs(concurrent): refactor theme and added benchmarck searchgraph
PeriniM May 14, 2024
4fd8a39
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
PeriniM May 14, 2024
d6f5ca8
Merge branch 'main' into pre/beta
VinciGit00 May 14, 2024
5914fa8
Update poetry.lock
VinciGit00 May 14, 2024
d2877d8
ci(release): 0.11.0-beta.9 [skip ci]
semantic-release-bot May 14, 2024
52a4a3b
feat: add gpt-4o
f-aguzzi May 14, 2024
8e46799
Merge pull request #235 from f-aguzzi/pre/beta
PeriniM May 14, 2024
218b8ed
ci(release): 0.11.0-beta.10 [skip ci]
semantic-release-bot May 14, 2024
90955ca
feat(gpt-4o): image to text single node test
PeriniM May 14, 2024
a296927
feat(omni-scraper): working OmniScraperGraph with images
PeriniM May 14, 2024
fcb3abb
feat(omni-search): added omni search graph and updated docs
PeriniM May 14, 2024
a6e1813
fix(fetch_node): bug in handling local files
PeriniM May 14, 2024
a458ec4
Update the prompt for the search_link_node
mayurdb May 14, 2024
d76badd
Merge pull request #239 from mayurdb/deepScrapeFix
VinciGit00 May 14, 2024
932df8d
Merge pull request #238 from VinciGit00/gpt4-omni
VinciGit00 May 14, 2024
8727d03
ci(release): 0.11.0-beta.11 [skip ci]
semantic-release-bot May 14, 2024
2a57940
Merge pull request #234 from VinciGit00/pre/beta
VinciGit00 May 14, 2024
c55a3b1
ci(release): 0.11.0 [skip ci]
semantic-release-bot May 14, 2024
b0a67ba
fix(docs): requirements-dev
PeriniM May 14, 2024
6effe25
ci(release): 0.11.1 [skip ci]
semantic-release-bot May 14, 2024
78d1940
docs(main-readme): fixed some typos
PeriniM May 15, 2024
8fc2510
chore(package manager)!: move from poetry to rye
f-aguzzi May 15, 2024
672bd29
Merge pull request #244 from f-aguzzi/main
VinciGit00 May 15, 2024
c0b6f02
ci(release): 1.0.0 [skip ci]
semantic-release-bot May 15, 2024
24d56af
Update pyproject.toml
VinciGit00 May 15, 2024
096b665
fix(searchgraph): used shallow copy to serialize obj
PeriniM May 15, 2024
a81d2b7
ci(release): 1.0.1 [skip ci]
semantic-release-bot May 15, 2024
7ccd51a
add rye update script bash
VinciGit00 May 15, 2024
694d3ab
Merge branch 'main' of https://github.com/VinciGit00/Scrapegraph-ai
VinciGit00 May 15, 2024
efb781f
docs(rye): replaced poetry with rye
PeriniM May 15, 2024
22cd9e3
Merge branch 'search_link_context' into main
VinciGit00 May 15, 2024
51200d6
ci(release): 1.1.0 [skip ci]
semantic-release-bot May 15, 2024
11 changes: 4 additions & 7 deletions .github/workflows/release.yml
@@ -14,11 +14,8 @@ jobs:
run: |
sudo apt update
sudo apt install -y git
- name: Install Python Env and Poetry
uses: actions/setup-python@v5
with:
python-version: '3.9'
- run: pip install poetry
- name: Install the latest version of rye
uses: eifinger/setup-rye@v3
- name: Install Node Env
uses: actions/setup-node@v4
with:
@@ -30,8 +27,8 @@ jobs:
persist-credentials: false
- name: Build app
run: |
poetry install
poetry build
rye sync --no-lock
rye build
id: build_cache
if: success()
- name: Cache build
2 changes: 2 additions & 0 deletions .gitignore
@@ -31,4 +31,6 @@ examples/graph_examples/ScrapeGraphAI_generated_graph
examples/**/result.csv
examples/**/result.json
main.py
*.python-version
*.lock

1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.9.19
237 changes: 237 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

184 changes: 59 additions & 125 deletions README.md
@@ -8,14 +8,14 @@
[![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)


ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph logic to create scraping pipelines for websites, documents and XML files.
ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).

Just say which information you want to extract and the library will do it for you!

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
</p>


## 🚀 Quick install

The reference page for Scrapegraph-ai is available on the official page of PyPI: [pypi](https://pypi.org/project/scrapegraphai/).
@@ -39,20 +39,23 @@ Try it directly on the web using Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)

Follow the procedure on the following link to setup your OpenAI API key: [link](https://scrapegraph-ai.readthedocs.io/en/latest/index.html).

## 📖 Documentation

The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).

Check out also the docusaurus [documentation](https://scrapegraph-doc.onrender.com/).
Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).

## 💻 Usage
You can use the `SmartScraper` class to extract information from a website using a prompt.
There are three main scraping pipelines that can be used to extract information from a website (or local file):
- `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source;
- `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine;
- `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.

It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.

### Case 1: SmartScraper using Local Models

The `SmartScraper` class is a direct graph implementation that uses the most common nodes present in a web scraping pipeline. For more information, please see the [documentation](https://scrapegraph-ai.readthedocs.io/en/latest/).
### Case 1: Extracting information using Ollama
Remember to download the model on Ollama separately!
Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command.

```python
from scrapegraphai.graphs import SmartScraperGraph
@@ -67,11 +70,12 @@ graph_config = {
"embeddings": {
"model": "ollama/nomic-embed-text",
"base_url": "http://localhost:11434", # set Ollama URL
}
},
"verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the articles",
prompt="List me all the projects with their descriptions",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects",
config=graph_config
@@ -82,160 +86,86 @@ print(result)

```

### Case 2: Extracting information using Docker
The output will be a list of projects with their descriptions like the following:

Note: before using the local model remember to create the docker container!
```text
docker-compose up -d
docker exec -it ollama ollama pull stablelm-zephyr
```
You can use which models available on Ollama or your own model instead of stablelm-zephyr
```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
"llm": {
"model": "ollama/mistral",
"temperature": 0,
"format": "json", # Ollama needs the format to be specified explicitly
# "model_tokens": 2000, # set context length arbitrarily
},
}

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the articles",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)
{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
```
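
As a small illustrative sketch (not part of this PR), the returned dictionary can be pretty-printed or written to disk with the standard library alone; here `result` stands in for the value returned by `smart_scraper_graph.run()` above, and the output file name is arbitrary:

```python
import json

# Stand-in for the dictionary returned by smart_scraper_graph.run()
result = {
    "projects": [
        {
            "title": "Rotary Pendulum RL",
            "description": "Open Source project aimed at controlling a real life rotary pendulum using RL algorithms",
        },
    ]
}

# Pretty-print the extracted data
print(json.dumps(result, indent=2, ensure_ascii=False))

# Persist it for later processing (file name is arbitrary)
with open("projects.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)
```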

### Case 2: SearchGraph using Mixed Models

### Case 3: Extracting information using Openai model
```python
from scrapegraphai.graphs import SmartScraperGraph
OPENAI_API_KEY = "YOUR_API_KEY"

graph_config = {
"llm": {
"api_key": OPENAI_API_KEY,
"model": "gpt-3.5-turbo",
},
}

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the articles",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects",
config=graph_config
)
We use **Groq** for the LLM and **Ollama** for the embeddings.

result = smart_scraper_graph.run()
print(result)
```

### Case 4: Extracting information using Groq
```python
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

groq_key = os.getenv("GROQ_APIKEY")
from scrapegraphai.graphs import SearchGraph

# Define the configuration for the graph
graph_config = {
"llm": {
"model": "groq/gemma-7b-it",
"api_key": groq_key,
"api_key": "GROQ_API_KEY",
"temperature": 0
},
"embeddings": {
"model": "ollama/nomic-embed-text",
"temperature": 0,
"base_url": "http://localhost:11434",
"base_url": "http://localhost:11434", # set ollama URL arbitrarily
},
"headless": False
"max_results": 5,
}

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the projects with their description and the author.",
source="https://perinim.github.io/projects",
# Create the SearchGraph instance
search_graph = SearchGraph(
prompt="List me all the traditional recipes from Chioggia",
config=graph_config
)

result = smart_scraper_graph.run()
# Run the graph
result = search_graph.run()
print(result)
```

The output will be a list of recipes like the following:

### Case 5: Extracting information using Azure
```python
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings

lm_model_instance = AzureChatOpenAI(
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
)

embedder_model_instance = AzureOpenAIEmbeddings(
azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
graph_config = {
"llm": {"model_instance": llm_model_instance},
"embeddings": {"model_instance": embedder_model_instance}
}

smart_scraper_graph = SmartScraperGraph(
prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
source="https://www.hmhco.com/event",
config=graph_config
)
{'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
```
### Case 3: SpeechGraph using OpenAI

You just need to pass the OpenAI API key and the model name.

### Case 6: Extracting information using Gemini
```python
from scrapegraphai.graphs import SmartScraperGraph
GOOGLE_APIKEY = "YOUR_API_KEY"
from scrapegraphai.graphs import SpeechGraph

# Define the configuration for the graph
graph_config = {
"llm": {
"api_key": GOOGLE_APIKEY,
"model": "gemini-pro",
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
"model": "tts-1",
"voice": "alloy"
},
"output_path": "audio_summary.mp3",
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
prompt="List me all the articles",
source="https://perinim.github.io/projects",
config=graph_config
# ************************************************
# Create the SpeechGraph instance and run it
# ************************************************

speech_graph = SpeechGraph(
prompt="Make a detailed audio summary of the projects.",
source="https://perinim.github.io/projects/",
config=graph_config,
)

result = smart_scraper_graph.run()
result = speech_graph.run()
print(result)
```

The output for all 3 the cases will be a dictionary with the extracted information, for example:

```bash
{
'titles': [
'Rotary Pendulum RL'
],
'descriptions': [
'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'
]
}
```

The output will be an audio file with the summary of the projects on the page.
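
As a minimal sketch (not from the diff above), the generated audio can be checked afterwards, assuming the `output_path` of `audio_summary.mp3` set in the configuration:

```python
from pathlib import Path

# Matches the "output_path" entry of graph_config above
audio_file = Path("audio_summary.mp3")

if audio_file.exists():
    size_kb = audio_file.stat().st_size / 1024
    print(f"Audio summary saved to {audio_file} ({size_kb:.1f} KiB)")
else:
    print("No audio file found - check the SpeechGraph configuration above")
```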

## 🤝 Contributing

Feel free to contribute and join our Discord server to discuss with us improvements and give us suggestions!
@@ -253,6 +183,10 @@ Wanna visualize the roadmap in a more interactive way? Check out the [markmap](h

## ❤️ Contributors
[![Contributors](https://contrib.rocks/image?repo=VinciGit00/Scrapegraph-ai)](https://github.com/VinciGit00/Scrapegraph-ai/graphs/contributors)
## Sponsors
<p align="center">
<a href="https://serpapi.com?utm_source=scrapegraphai"><img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;"></a>
</p>

## 🎓 Citations
If you have used our library for research purposes please quote us with the following reference:
@@ -269,7 +203,7 @@ If you have used our library for research purposes please quote us with the foll
## Authors

<p align="center">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/logo_authors.png" alt="Authors Logos">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/logo_authors.png" alt="Authors_logos">
</p>

| | Contact Info |
Binary file added docs/assets/omniscrapergraph.png
Binary file added docs/assets/omnisearchgraph.png
Binary file added docs/assets/project_overview_diagram.fig
Binary file added docs/assets/project_overview_diagram.png
Binary file added docs/assets/searchgraph.png
Binary file added docs/assets/serp_api_logo.png
Binary file added docs/assets/smartscrapergraph.png
Binary file added docs/assets/speechgraph.png
28 changes: 22 additions & 6 deletions docs/source/conf.py
@@ -14,20 +14,36 @@
# import all the modules
sys.path.insert(0, os.path.abspath('../../'))

project = 'scrapegraphai'
copyright = '2024, Marco Vinciguerra'
author = 'Marco Vinciguerra'
project = 'ScrapeGraphAI'
copyright = '2024, ScrapeGraphAI'
author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'

html_last_updated_fmt = "%b %d, %Y"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']

templates_path = ['_templates']
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
# html_theme = 'sphinx_rtd_theme'
html_theme = 'sphinx_wagtail_theme'

html_theme_options = dict(
project_name = "ScrapeGraphAI",
logo = "scrapegraphai_logo.png",
logo_alt = "ScrapeGraphAI",
logo_height = 59,
logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
logo_width = 45,
github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
footer_links = ",".join(
["Landing Page|https://scrapegraphai.com/",
"Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
),
)
7 changes: 5 additions & 2 deletions docs/source/getting_started/examples.rst
@@ -1,7 +1,9 @@
Examples
========

Here some example of the different ways to scrape with ScrapegraphAI
Let's suppose you want to scrape a website to get a list of projects with their descriptions.
You can use the `SmartScraperGraph` class to do that.
The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.

OpenAI models
^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
# ************************************************

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the news with their description.",
prompt="List me all the projects with their description.",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects",
config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
result = smart_scraper_graph.run()
print(result)

To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!
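
As an illustrative sketch (not part of this PR), a customized `graph_config` can combine options that already appear in the examples above, for example an OpenAI model together with the `verbose` and `headless` flags; the API key placeholder and parameter values are assumptions:

```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",  # placeholder, not a real key
        "model": "gpt-3.5-turbo",
        "temperature": 0,
    },
    "verbose": True,    # print execution information while the graph runs
    "headless": False,  # show the browser window while fetching the page
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description.",
    source="https://perinim.github.io/projects",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
```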