Commit ddbfbe1

Merge branch 'pre/beta' into generate_answer_parallel

2 parents: df1ecc0 + 7080a0a

51 files changed: +2913 −274 lines

.github/update-requirements.yml

Lines changed: 26 additions & 0 deletions

```yaml
name: Update requirements
on:
  push:
    paths:
      - 'pyproject.toml'
      - '.github/workflows/update-requirements.yml'

jobs:
  update:
    name: Update requirements
    runs-on: ubuntu-latest
    steps:
      - name: Install the latest version of rye
        uses: eifinger/setup-rye@v3
      - name: Build app
        run: rye run update-requirements
  commit:
    name: Commit changes
    run: |
      git config --global user.name 'github-actions'
      git config --global user.email 'github-actions[bot]@users.noreply.github.com'
      git add .
      git commit -m "ci: update requirements.txt [skip ci]"
      git push
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

CHANGELOG.md

Lines changed: 65 additions & 0 deletions

The merge adds entries for 1.11.0-beta.1, 1.10.0-beta.8, 1.10.0-beta.7, and 1.10.0-beta.6, interleaved with the existing 1.10.4 and 1.10.3 sections. The top of the new file reads:

```markdown
## [1.11.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.4...v1.11.0-beta.1) (2024-07-23)


### Features

* add new toml ([fcb3220](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fcb3220868e7ef1127a7a47f40d0379be282e6eb))
* add nvidia connection ([fc0dadb](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fc0dadb8f812dfd636dec856921a971b58695ce3))


### Bug Fixes

* **md_conversion:** add absolute links md, added missing dependency ([12b5ead](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/12b5eada6ea783770afd630ede69b8cf867a7ded))


### chore

* **dependecies:** add script to auto-update requirements ([3289c7b](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3289c7bf5ec58ac3d04e9e5e8e654af9abcee228))
* **ci:** set up workflow for requirements auto-update ([295fc28](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/295fc28ceb02c78198f7fbe678352503b3259b6b))
* update requirements.txt ([c7bac98](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c7bac98d2e79e5ab98fa65d7efa858a2cdda1622))
* upgrade dependencies and scripts ([74d142e](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/74d142eaae724b087eada9c0c876b40a2ccc7cae))
* **pyproject:** upgrade dependencies ([0425124](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0425124c570f765b98fcf67ba6649f4f9fe76b15))


### Docs

* add hero image ([4182e23](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/4182e23e3b8d8f141b119b6014ae3ff20b3892e3))
* updated readme ([c377ae0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c377ae0544a78ebdc0d15f8d23b3846c26876c8c))


### CI

* **release:** 1.10.0-beta.6 [skip ci] ([254bde7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/254bde7008b41ffa434925e3ae84340c53a565bd))
* **release:** 1.10.0-beta.7 [skip ci] ([1756e85](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1756e8522f3874de8afbef9ac327f9b3f1a49d07))
* **release:** 1.10.0-beta.8 [skip ci] ([255e569](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/255e569172b1029bc2a723b2ec66bcf3d3ee3791))

## [1.10.0-beta.8](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.7...v1.10.0-beta.8) (2024-07-23)

## [1.10.4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.3...v1.10.4) (2024-07-22)


### Bug Fixes

* **md_conversion:** add absolute links md, added missing dependency ([12b5ead](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/12b5eada6ea783770afd630ede69b8cf867a7ded))

## [1.10.0-beta.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.6...v1.10.0-beta.7) (2024-07-23)


### Features

* add nvidia connection ([fc0dadb](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fc0dadb8f812dfd636dec856921a971b58695ce3))


### chore

* **dependecies:** add script to auto-update requirements ([3289c7b](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3289c7bf5ec58ac3d04e9e5e8e654af9abcee228))
* **ci:** set up workflow for requirements auto-update ([295fc28](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/295fc28ceb02c78198f7fbe678352503b3259b6b))
* update requirements.txt ([c7bac98](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c7bac98d2e79e5ab98fa65d7efa858a2cdda1622))

## [1.10.0-beta.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.0-beta.5...v1.10.0-beta.6) (2024-07-22)

* parse node ([09256f7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/09256f7b11a7a1c2aba01cf8de70401af1e86fe4))

## [1.10.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.10.2...v1.10.3) (2024-07-22)
```

Later hunks also add `* add new toml ([fcb3220](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fcb3220868e7ef1127a7a47f40d0379be282e6eb))` to the existing 1.10.0 Features list and `* **pyproject:** upgrade dependencies ([0425124](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0425124c570f765b98fcf67ba6649f4f9fe76b15))` to its chore list, along with blank-line spacing changes.
README.md

Lines changed: 40 additions & 115 deletions

````diff
@@ -17,7 +17,7 @@ ScrapeGraphAI is a *web scraping* python library that uses LLM and direct graph
 Just say which information you want to extract and the library will do it for you!
 
 <p align="center">
-  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapegraphai_logo.png" alt="Scrapegraph-ai Logo" style="width: 50%;">
+  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/sgai-hero.png" alt="ScrapeGraphAI Hero" style="width: 100%;">
 </p>
 
 ## 🚀 Quick install
@@ -26,159 +26,84 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
 
 ```bash
 pip install scrapegraphai
+
+playwright install
 ```
 
 **Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
 
-## 🔍 Demo
-Official streamlit demo:
-
-[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)
-
-Try it directly on the web using Google Colab:
-
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)
-
-## 📖 Documentation
-
-The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
-
-Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).
-
 ## 💻 Usage
-There are multiple standard scraping pipelines that can be used to extract information from a website (or local file):
-- `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source;
-- `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine;
-- `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.
-- `ScriptCreatorGraph`: single-page scraper that extracts information from a website and generates a Python script.
+There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
 
-- `SmartScraperMultiGraph`: multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources;
-- `ScriptCreatorMultiGraph`: multi-page scraper that generates a Python script for extracting information from multiple pages given a single prompt and a list of sources.
-
-It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.
+The most common one is the `SmartScraperGraph`, which extracts information from a single page given a user prompt and a source URL.
 
-### Case 1: SmartScraper using Local Models
-
-Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command.
 
 ```python
+import json
 from scrapegraphai.graphs import SmartScraperGraph
 
+# Define the configuration for the scraping pipeline
 graph_config = {
     "llm": {
-        "model": "ollama/mistral",
-        "temperature": 0,
-        "format": "json",  # Ollama needs the format to be specified explicitly
-        "base_url": "http://localhost:11434",  # set Ollama URL
-    },
-    "embeddings": {
-        "model": "ollama/nomic-embed-text",
-        "base_url": "http://localhost:11434",  # set Ollama URL
+        "api_key": "YOUR_OPENAI_APIKEY",
+        "model": "gpt-4o-mini",
     },
     "verbose": True,
+    "headless": False,
 }
 
+# Create the SmartScraperGraph instance
 smart_scraper_graph = SmartScraperGraph(
-    prompt="List me all the projects with their descriptions",
-    # also accepts a string with the already downloaded HTML code
-    source="https://perinim.github.io/projects",
+    prompt="Find some information about what does the company do, the name and a contact email.",
+    source="https://scrapegraphai.com/",
     config=graph_config
 )
 
+# Run the pipeline
 result = smart_scraper_graph.run()
-print(result)
-
+print(json.dumps(result, indent=4))
 ```
 
-The output will be a list of projects with their descriptions like the following:
+The output will be a dictionary like the following:
 
 ```python
-{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
-```
-
-### Case 2: SearchGraph using Mixed Models
-
-We use **Groq** for the LLM and **Ollama** for the embeddings.
-
-```python
-from scrapegraphai.graphs import SearchGraph
-
-# Define the configuration for the graph
-graph_config = {
-    "llm": {
-        "model": "groq/gemma-7b-it",
-        "api_key": "GROQ_API_KEY",
-        "temperature": 0
-    },
-    "embeddings": {
-        "model": "ollama/nomic-embed-text",
-        "base_url": "http://localhost:11434",  # set ollama URL arbitrarily
-    },
-    "max_results": 5,
+{
+    "company": "ScrapeGraphAI",
+    "name": "ScrapeGraphAI Extracting content from websites and local documents using LLM",
+    "contact_email": "[email protected]"
 }
+```
 
-# Create the SearchGraph instance
-search_graph = SearchGraph(
-    prompt="List me all the traditional recipes from Chioggia",
-    config=graph_config
-)
+There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
 
-# Run the graph
-result = search_graph.run()
-print(result)
-```
+| Pipeline Name           | Description                                                                                                   |
+|-------------------------|---------------------------------------------------------------------------------------------------------------|
+| SmartScraperGraph       | Single-page scraper that only needs a user prompt and an input source.                                        |
+| SearchGraph             | Multi-page scraper that extracts information from the top n search results of a search engine.                |
+| SpeechGraph             | Single-page scraper that extracts information from a website and generates an audio file.                     |
+| ScriptCreatorGraph      | Single-page scraper that extracts information from a website and generates a Python script.                   |
+| SmartScraperMultiGraph  | Multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources. |
+| ScriptCreatorMultiGraph | Multi-page scraper that generates a Python script for extracting information from multiple pages and sources. |
 
-The output will be a list of recipes like the following:
+It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.
 
-```python
-{'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
-```
-### Case 3: SpeechGraph using OpenAI
+Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command, if you want to use local models.
 
-You just need to pass the OpenAI API key and the model name.
+## 🔍 Demo
+Official streamlit demo:
 
-```python
-from scrapegraphai.graphs import SpeechGraph
+[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)
 
-graph_config = {
-    "llm": {
-        "api_key": "OPENAI_API_KEY",
-        "model": "gpt-3.5-turbo",
-    },
-    "tts_model": {
-        "api_key": "OPENAI_API_KEY",
-        "model": "tts-1",
-        "voice": "alloy"
-    },
-    "output_path": "audio_summary.mp3",
-}
+Try it directly on the web using Google Colab:
 
-# ************************************************
-# Create the SpeechGraph instance and run it
-# ************************************************
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)
 
-speech_graph = SpeechGraph(
-    prompt="Make a detailed audio summary of the projects.",
-    source="https://perinim.github.io/projects/",
-    config=graph_config,
-)
+## 📖 Documentation
 
-result = speech_graph.run()
-print(result)
+The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
 
-```
+Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).
 
-The output will be an audio file with the summary of the projects on the page.
-
-## Sponsors
-<div style="text-align: center;">
-  <a href="https://serpapi.com?utm_source=scrapegraphai">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
-  </a>
-  <a href="https://dashboard.statproxies.com/?refferal=scrapegraph">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
-  </a>
-</div>
 
 ## 🤝 Contributing
````
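The rewritten Usage section routes every pipeline through the same `graph_config` dict. As a standalone illustration of that shape, here is a tiny pre-flight check; the `validate_graph_config` helper is hypothetical, not a library API:

```python
def validate_graph_config(config: dict) -> list:
    """Return a list of problems found in a scrapegraphai-style config dict.
    Hypothetical helper for illustration; not part of the library."""
    problems = []
    llm = config.get("llm")
    if not isinstance(llm, dict):
        problems.append("missing 'llm' section")
    elif "model" not in llm:
        problems.append("'llm' section has no 'model'")
    return problems


# The shape used by the README's updated example
config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}
print(validate_graph_config(config))  # expect: []
```

A check like this can surface a missing `model` key before a graph run fails mid-pipeline.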

docs/assets/sgai-hero.png

66.9 KB (new image)

Lines changed: 55 additions & 0 deletions

```python
"""
Basic example of scraping pipeline using CSVScraperMultiGraph from CSV documents
"""

import os
import pandas as pd
from dotenv import load_dotenv
from scrapegraphai.graphs import CSVScraperMultiGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

load_dotenv()

# ************************************************
# Read the CSV file
# ************************************************

FILE_NAME = "inputs/username.csv"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

text = pd.read_csv(file_path)

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "api_key": os.getenv("NEMOTRON_APIKEY"),
        "model": "nvidia/meta/llama3-70b-instruct",
    }
}

# ************************************************
# Create the CSVScraperMultiGraph instance and run it
# ************************************************

csv_scraper_graph = CSVScraperMultiGraph(
    prompt="List me all the last names",
    source=[str(text), str(text)],
    config=graph_config
)

result = csv_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = csv_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
```
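The example above serializes the DataFrame with `str(...)` before handing it to the graph. A standalone sketch of just that step, with hypothetical sample data standing in for `inputs/username.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of inputs/username.csv
csv_text = "Username,Identifier,First name,Last name\nbooker12,9012,Rachel,Booker\n"
text = pd.read_csv(io.StringIO(csv_text))

# CSVScraperMultiGraph receives the table serialized as plain text,
# one copy per entry in the source list
sources = [str(text), str(text)]
print(sources[0])
```

Passing `str(text)` sends the DataFrame's printed representation (column headers plus rows) into the prompt, which is why the same table can simply be repeated to simulate multiple sources.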

0 commit comments