Pre/beta #682


Merged: 146 commits, Sep 19, 2024
Commits (146)
49ae56c
Added screenshot preparation script for screenshot scraping
Santabot123 Aug 23, 2024
e11f0cd
Added text_detection.py and updated screenshot_preparation.py
Santabot123 Aug 24, 2024
b4f8ea4
add __init__.py and docstrings
Santabot123 Aug 26, 2024
0cf7c44
correct the typo and updated select_area_with_ipywidget()
Santabot123 Aug 26, 2024
7e23c3d
correct the typo
Santabot123 Aug 26, 2024
c0a0e69
remove some comments and image
Santabot123 Aug 27, 2024
92bec28
Updated requirements.txt
Santabot123 Aug 28, 2024
aa9e85f
remove some comments and image
Santabot123 Aug 26, 2024
90d7549
updated requirements.txt
Santabot123 Aug 28, 2024
6e9911c
Merge branch 'pre/beta' of https://github.com/Santabot123/Scrapegraph…
Santabot123 Aug 28, 2024
55a7727
Update requirements.txt
Santabot123 Aug 28, 2024
8f615ad
feat: add togheterai
VinciGit00 Aug 28, 2024
34942de
chore(examples): create Together AI examples
f-aguzzi Aug 28, 2024
5f604d1
Merge pull request #605 from ScrapeGraphAI/togheter_ai_integration
f-aguzzi Aug 28, 2024
d7f6036
ci(release): 1.16.0-beta.1 [skip ci]
semantic-release-bot Aug 28, 2024
c348f67
fix: update generate answernode
VinciGit00 Aug 30, 2024
735120d
Merge branch 'screenshot-scraper-fix' into pre/beta
VinciGit00 Aug 30, 2024
405f28e
Merge pull request #606 from Santabot123/pre/beta
VinciGit00 Aug 30, 2024
a0d2113
refactoring of folders
VinciGit00 Aug 30, 2024
388630c
fix: screenshot scraper
VinciGit00 Aug 30, 2024
a73573d
update version
VinciGit00 Aug 31, 2024
9fd6509
Update pyproject.toml
VinciGit00 Aug 31, 2024
9c2aefa
Merge pull request #612 from ScrapeGraphAI/598-1140+-pydantic-validat…
f-aguzzi Aug 31, 2024
1c37d5d
ci(release): 1.16.0-beta.2 [skip ci]
semantic-release-bot Aug 31, 2024
0e0b280
Merge branch 'pre/beta' into temp
VinciGit00 Sep 1, 2024
86f9442
Merge pull request #615 from ScrapeGraphAI/temp
VinciGit00 Sep 1, 2024
886c987
ci(release): 1.16.0-beta.3 [skip ci]
semantic-release-bot Sep 1, 2024
f51b155
add example
VinciGit00 Sep 1, 2024
8422463
feat:expose the search engine params to user
goasleep Sep 2, 2024
a8b0e4a
updated token calculation on parsenode
tm-robinson Sep 2, 2024
3d265a8
change GenerateScraperNode to only use first chunk
tm-robinson Sep 2, 2024
1bcc0bf
Merge pull request #620 from goasleep/feature/export_search_engine
VinciGit00 Sep 2, 2024
ba5c7ad
ci(release): 1.16.0-beta.4 [skip ci]
semantic-release-bot Sep 2, 2024
e741602
Merge branch 'pre/beta' into 543-ScriptCreatorGraph-only-use-first-chunk
VinciGit00 Sep 2, 2024
fd0a902
Merge pull request #619 from tm-robinson/543-ScriptCreatorGraph-only-…
VinciGit00 Sep 2, 2024
13efd4e
ci(release): 1.17.0-beta.1 [skip ci]
semantic-release-bot Sep 2, 2024
ef2db0c
Update pyproject.toml
VinciGit00 Sep 2, 2024
74dfc69
fix(DeepSeek): proper model initialization
f-aguzzi Sep 2, 2024
398b2c5
fix(Ollama): instance model from correct package
f-aguzzi Sep 2, 2024
1e466cd
Merge branch 'pre/beta' into screenshot-scraper-fix
VinciGit00 Sep 2, 2024
3ff69cb
Merge pull request #614 from ScrapeGraphAI/screenshot-scraper-fix
VinciGit00 Sep 2, 2024
89b1f10
Merge pull request #621 from ScrapeGraphAI/609-fix-deepseek-instancing
VinciGit00 Sep 2, 2024
08afc92
ci(release): 1.17.0-beta.2 [skip ci]
semantic-release-bot Sep 2, 2024
66a3b6d
fix: Parse Node scraping link and img urls allowing OmniScraper to work
LorenzoPaleari Sep 2, 2024
57337a0
fix: Removed link_urls and img_ulrs from FetchNode output
LorenzoPaleari Sep 2, 2024
b8ef937
fix(ScreenshotScraper): impose dynamic imports
f-aguzzi Sep 2, 2024
5242166
fix(SmartScraper): pass llm_model to ParseNode
f-aguzzi Sep 2, 2024
aed5452
Merge pull request #624 from ScrapeGraphAI/fix-import-errors
VinciGit00 Sep 2, 2024
fc55418
ci(release): 1.17.0-beta.3 [skip ci]
semantic-release-bot Sep 2, 2024
81af62d
Merge pull request #622 from LorenzoPaleari/pre/beta
VinciGit00 Sep 2, 2024
5e99071
ci(release): 1.17.0-beta.4 [skip ci]
semantic-release-bot Sep 2, 2024
8e74ac5
fix: correctly parsing output when using structured_output
LorenzoPaleari Sep 2, 2024
8442700
Merge pull request #626 from LorenzoPaleari/598-fix-pydantic-errors
VinciGit00 Sep 2, 2024
16ab1bf
ci(release): 1.17.0-beta.5 [skip ci]
semantic-release-bot Sep 2, 2024
52fe441
fix(ScreenShotScraper): static import of optional dependencies
f-aguzzi Sep 4, 2024
e477a44
Merge pull request #631 from ScrapeGraphAI/627-PIL-import-error
VinciGit00 Sep 4, 2024
50c9c6b
ci(release): 1.17.0-beta.6 [skip ci]
semantic-release-bot Sep 4, 2024
bd4b26d
feat: ConcatNode.py added for heavy merge operations
ekinsenler Sep 4, 2024
f83c3d1
add example for gemini
ekinsenler Sep 4, 2024
c0339d9
fix file name
ekinsenler Sep 4, 2024
63a5d18
fix(AbstractGraph): Bedrock init issues
f-aguzzi Sep 5, 2024
31aff6b
Merge pull request #636 from ScrapeGraphAI/633-bedrock-support-fix
VinciGit00 Sep 5, 2024
4347afb
ci(release): 1.17.0-beta.7 [skip ci]
semantic-release-bot Sep 5, 2024
2859fb7
feat(AbstractGraph): add adjustable rate limit
f-aguzzi Sep 5, 2024
c382b9d
Merge pull request #630 from ScrapeGraphAI/595-rate-limit-error
VinciGit00 Sep 6, 2024
85c374e
ci(release): 1.17.0-beta.8 [skip ci]
semantic-release-bot Sep 6, 2024
8b02cb4
Merge pull request #632 from ekinsenler/concat_node
VinciGit00 Sep 6, 2024
77d0fd3
ci(release): 1.17.0-beta.9 [skip ci]
semantic-release-bot Sep 6, 2024
9e9c775
add examples multi concat
VinciGit00 Sep 6, 2024
94e69a0
feat: add scrape_do_integration
VinciGit00 Sep 6, 2024
f5e7a8b
fix of the bug for fetching the code
VinciGit00 Sep 6, 2024
8883bce
asdd proxy integratrion
VinciGit00 Sep 6, 2024
167f970
feat: fetch_node improved
VinciGit00 Sep 7, 2024
afb6eb7
feat: return urls in searchgraph
VinciGit00 Sep 7, 2024
ef7a589
fix: screenshot_scraper
VinciGit00 Sep 7, 2024
af28885
ci(release): 1.17.0-beta.10 [skip ci]
semantic-release-bot Sep 7, 2024
9016bb5
Merge pull request #639 from ScrapeGraphAI/scrape_do_integration
f-aguzzi Sep 7, 2024
a73fec5
ci(release): 1.17.0-beta.11 [skip ci]
semantic-release-bot Sep 7, 2024
fc738ca
Update parse_node.py
VinciGit00 Sep 8, 2024
14c5e6b
Merge branch 'pre/beta' into temp
VinciGit00 Sep 8, 2024
9f52602
Merge pull request #646 from ScrapeGraphAI/temp
VinciGit00 Sep 8, 2024
eddcb79
ci(release): 1.19.0-beta.1 [skip ci]
semantic-release-bot Sep 8, 2024
f2bb22d
fix: temporary fix for parse_node
VinciGit00 Sep 9, 2024
8a0d46b
Merge pull request #641 from ScrapeGraphAI/urls_search_graph
f-aguzzi Sep 9, 2024
32a102a
Merge pull request #648 from ScrapeGraphAI/637-it-can´t-scrape-urls-f…
f-aguzzi Sep 9, 2024
23a260c
ci(release): 1.19.0-beta.2 [skip ci]
semantic-release-bot Sep 9, 2024
947ebd2
fix: parse node
VinciGit00 Sep 10, 2024
4c14fd7
Merge pull request #650 from ScrapeGraphAI/637-it-can´t-scrape-urls-f…
f-aguzzi Sep 10, 2024
38cba96
ci(release): 1.19.0-beta.3 [skip ci]
semantic-release-bot Sep 10, 2024
380174d
add chunking functionn
VinciGit00 Sep 10, 2024
1a7f21f
feat: removed semchunk and used tikton
VinciGit00 Sep 10, 2024
24c38f9
ci(release): 1.19.0-beta.4 [skip ci]
semantic-release-bot Sep 10, 2024
4ee7753
Merge pull request #654 from ScrapeGraphAI/main
VinciGit00 Sep 10, 2024
7621a7c
ci(release): 1.19.0-beta.5 [skip ci]
semantic-release-bot Sep 10, 2024
fe3aa28
refactoring of the code
VinciGit00 Sep 11, 2024
57a58e1
docs: Updated the graph_config in the documentation.
shenghongtw Sep 12, 2024
9eb40e1
Update script_generator_openai.py
VinciGit00 Sep 12, 2024
ca31bd9
Merge pull request #658 from shenghongtw/docs/Updated_the_graph_confi…
VinciGit00 Sep 12, 2024
18277c1
Merge branch 'pre/beta' into temp
VinciGit00 Sep 12, 2024
5ff8cc7
Merge pull request #659 from ScrapeGraphAI/temp
VinciGit00 Sep 12, 2024
ed8e173
ci(release): 1.19.0-beta.6 [skip ci]
semantic-release-bot Sep 12, 2024
da9726f
updates to tokenization for #651 to implement for mistral and ollama
tm-robinson Sep 12, 2024
dc4a76b
use semchunk by default as the other code is causing tokenizers to be…
tm-robinson Sep 12, 2024
c64ce88
refactoting of imports
VinciGit00 Sep 12, 2024
4a16f14
Merge pull request #660 from tm-robinson/651-add-tokenization-for-oll…
VinciGit00 Sep 12, 2024
b805aea
fix: pyproject.toml dependencies
VinciGit00 Sep 12, 2024
4ab26a2
ci(release): 1.19.0-beta.7 [skip ci]
semantic-release-bot Sep 12, 2024
ec6b164
feat: refactoring of the tokenization function
VinciGit00 Sep 12, 2024
88b2c46
ci(release): 1.19.0-beta.8 [skip ci]
semantic-release-bot Sep 12, 2024
c3d1b7c
fix: OmniScraerGraph working.
LorenzoPaleari Sep 12, 2024
039ba2e
fix: Fixed pydantic error on SearchGraphs
LorenzoPaleari Sep 12, 2024
66ea166
fix: Added support for nested structure
LorenzoPaleari Sep 13, 2024
a92dddb
fix: update all nodes that were using MergeNode or IteratorNode
LorenzoPaleari Sep 13, 2024
da827a7
Merge pull request #662 from LorenzoPaleari/580-omni-scraper-not-working
VinciGit00 Sep 13, 2024
7ad6f21
ci(release): 1.19.0-beta.9 [skip ci]
semantic-release-bot Sep 13, 2024
2ae26e9
Merge pull request #664 from LorenzoPaleari/598-fix-pydantic-validati…
VinciGit00 Sep 13, 2024
92f5df2
ci(release): 1.19.0-beta.10 [skip ci]
semantic-release-bot Sep 13, 2024
9e3171b
feat: add copy for smart_scraper_multi_concat
VinciGit00 Sep 13, 2024
40f54cb
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/Scrapegra…
VinciGit00 Sep 13, 2024
edfb185
ci(release): 1.19.0-beta.11 [skip ci]
semantic-release-bot Sep 13, 2024
063dd1a
Merge pull request #667 from ScrapeGraphAI/main
VinciGit00 Sep 13, 2024
e657113
fix: Refactor code to use CustomOpenAiCallbackManager for exclusive a…
LorenzoPaleari Sep 14, 2024
d7afdb1
Merge pull request #670 from LorenzoPaleari/576-exec-info-misses-nest…
VinciGit00 Sep 14, 2024
bd2afef
ci(release): 1.19.0-beta.12 [skip ci]
semantic-release-bot Sep 14, 2024
5d1fe68
Merge branch 'pre/beta' into temp
VinciGit00 Sep 14, 2024
0eec93e
Merge pull request #672 from ScrapeGraphAI/temp
VinciGit00 Sep 14, 2024
cc8392e
ci(release): 1.20.0-beta.1 [skip ci]
semantic-release-bot Sep 14, 2024
7681a45
fix: Add mistral-common dependency
LorenzoPaleari Sep 14, 2024
cb505ce
Merge pull request #673 from LorenzoPaleari/fix-no-mistral_common-error
VinciGit00 Sep 15, 2024
c717bb6
Merge branch 'main' into pre/beta
VinciGit00 Sep 16, 2024
3f45c17
fix: fetch_node condition
VinciGit00 Sep 16, 2024
5b5cb5b
fix: Error in pyproject dependencies
LorenzoPaleari Sep 17, 2024
1b85f54
Merge pull request #677 from LorenzoPaleari/fix-pyproject-error
VinciGit00 Sep 17, 2024
4f8b55d
ci(release): 1.20.0-beta.2 [skip ci]
semantic-release-bot Sep 17, 2024
28b85a3
refactor: Output parser code
LorenzoPaleari Sep 17, 2024
eb89549
feat: updated pydantic to v2
LorenzoPaleari Sep 17, 2024
8a37c6b
feat: added Bedrock and Mistral to exec info
LorenzoPaleari Sep 18, 2024
e6e2ce6
Merge pull request #679 from LorenzoPaleari/output-parser-update
VinciGit00 Sep 18, 2024
cca783c
ci(release): 1.20.0-beta.3 [skip ci]
semantic-release-bot Sep 18, 2024
932412e
fix: update pyproject.toml
VinciGit00 Sep 18, 2024
1079553
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/Scrapegra…
VinciGit00 Sep 18, 2024
c81f970
ci(release): 1.20.0-beta.4 [skip ci]
semantic-release-bot Sep 18, 2024
95a5ee2
Merge pull request #680 from LorenzoPaleari/exec-info-enhanced
VinciGit00 Sep 18, 2024
6ae2028
Update CONTRIBUTING.md
VinciGit00 Sep 18, 2024
b0fef3f
ci(release): 1.20.0-beta.5 [skip ci]
semantic-release-bot Sep 18, 2024
0cdd47e
Merge branch 'main' into temp
VinciGit00 Sep 19, 2024
Files changed
339 changes: 336 additions & 3 deletions CHANGELOG.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions CONTRIBUTING.md
@@ -74,4 +74,10 @@ If you encounter any issues or have suggestions for improvements, please open an
ScrapeGraphAI is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more information.
By contributing to this project, you agree to license your contributions under the same license.

ScrapeGraphAI uses code from the LangChain
framework. You can find the original license below.

LANGCHAIN LICENSE
https://github.com/langchain-ai/langchain/blob/master/LICENSE

Can't wait to see your contributions! :smile:
14 changes: 11 additions & 3 deletions README.md
@@ -38,9 +38,10 @@ Additional dependencies can be added while installing the library:

- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.

```bash
pip install scrapegraphai[other-language-models]
```

This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]
```

- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.

@@ -58,6 +59,13 @@ Additional dependencies can be added while installing the library:



### Installing "More Browser Options"

This group includes an OCR scraper for websites:
```bash
pip install scrapegraphai[screenshot_scraper]
```

## 💻 Usage
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

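The README hunk above introduces the `screenshot_scraper` extra but stops at the install command. For orientation, here is a rough sketch of how the screenshot-scraping utilities added by this PR (screenshot_preparation.py, text_detection.py) might be wired together; the module path, function names, and signatures are inferred from the commit history above and should be treated as assumptions, not the documented API.

```python
# Hypothetical sketch only: names and signatures are assumptions inferred from
# the screenshot_preparation.py / text_detection.py commits in this PR.
import asyncio

from scrapegraphai.utils.screenshot_scraping import (
    take_screenshot,   # assumed: async, captures a page screenshot via a headless browser
    crop_image,        # assumed: crops the screenshot to a region of interest
    detect_text,       # assumed: runs OCR over the (cropped) image
)

# Capture a screenshot of the target page.
image = asyncio.run(take_screenshot(url="https://perinim.github.io/projects/"))

# Optionally crop to the area that contains the content of interest.
cropped = crop_image(image, LEFT=0, TOP=0, RIGHT=800, BOTTOM=600)

# Extract the text with the OCR backend pulled in by the screenshot_scraper extra.
print(detect_text(cropped))
```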
2 changes: 1 addition & 1 deletion docs/chinese.md
@@ -133,7 +133,7 @@ from scrapegraphai.graphs import SpeechGraph
graph_config = {
"llm": {
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
"model": "openai/gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
2 changes: 1 addition & 1 deletion docs/japanese.md
@@ -133,7 +133,7 @@ from scrapegraphai.graphs import SpeechGraph
graph_config = {
"llm": {
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
"model": "openai/gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
2 changes: 1 addition & 1 deletion docs/korean.md
@@ -132,7 +132,7 @@ from scrapegraphai.graphs import SpeechGraph
graph_config = {
"llm": {
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
"model": "openai/gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
2 changes: 1 addition & 1 deletion docs/russian.md
@@ -138,7 +138,7 @@ from scrapegraphai.graphs import SpeechGraph
graph_config = {
"llm": {
"api_key": "OPENAI_API_KEY",
"model": "gpt-3.5-turbo",
"model": "openai/gpt-3.5-turbo",
},
"tts_model": {
"api_key": "OPENAI_API_KEY",
2 changes: 1 addition & 1 deletion docs/source/getting_started/examples.rst
@@ -22,7 +22,7 @@ OpenAI models
graph_config = {
"llm": {
"api_key": openai_key,
"model": "gpt-3.5-turbo",
"model": "openai/gpt-3.5-turbo",
},
}

2 changes: 1 addition & 1 deletion examples/anthropic/custom_graph_haiku.py
@@ -40,7 +40,7 @@

fetch_node = FetchNode(
input="url | local_dir",
output=["doc", "link_urls", "img_urls"],
output=["doc"],
node_config={
"verbose": True,
"headless": True,
48 changes: 48 additions & 0 deletions examples/anthropic/rate_limit_haiku.py
@@ -0,0 +1,48 @@
"""
Basic example of scraping pipeline using SmartScraper while setting an API rate limit.
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info


# required environment variables in .env
# ANTHROPIC_API_KEY
load_dotenv()

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "anthropic/claude-3-haiku-20240307",
"rate_limit": {
"requests_per_second": 1
}
},
}

smart_scraper_graph = SmartScraperGraph(
prompt="""Don't say anything else. Output JSON only. List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
# also accepts a string with the already downloaded HTML code
source="https://www.hmhco.com/event",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
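The rate_limit block in the example above caps the Anthropic model at one request per second. As background, here is a minimal standalone sketch of LangChain's InMemoryRateLimiter, which ScrapeGraphAI's rate_limit option presumably builds on; that mapping is an assumption, since the diff itself only shows the config key.

```python
# Standalone sketch of the underlying LangChain primitive; the assumption that
# ScrapeGraphAI's "rate_limit" option maps onto it is not confirmed by this diff.
from langchain_core.rate_limiters import InMemoryRateLimiter

limiter = InMemoryRateLimiter(
    requests_per_second=1,      # mirrors "requests_per_second": 1 above
    check_every_n_seconds=0.1,  # how often the token bucket is polled
    max_bucket_size=1,          # disallow bursts above the configured rate
)

# A LangChain chat model accepts this object via its rate_limiter parameter,
# e.g. ChatAnthropic(model="claude-3-haiku-20240307", rate_limiter=limiter).
limiter.acquire()  # blocks until a request slot is available
```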
2 changes: 1 addition & 1 deletion examples/anthropic/search_graph_schema_haiku.py
@@ -5,7 +5,7 @@
import os
from typing import List
from dotenv import load_dotenv
from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from scrapegraphai.graphs import SearchGraph

load_dotenv()
39 changes: 39 additions & 0 deletions examples/anthropic/smart_scraper_multi_concat_haiku.py
@@ -0,0 +1,39 @@
"""
Basic example of scraping pipeline using SmartScraper
"""

import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperMultiConcatGraph

load_dotenv()

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "anthropic/claude-3-haiku-20240307",
},
}


# *******************************************************
# Create the SmartScraperMultiGraph instance and run it
# *******************************************************

multiple_search_graph = SmartScraperMultiConcatGraph(
prompt="Who is Marco Perini?",
source= [
"https://perinim.github.io/",
"https://perinim.github.io/cv/"
],
schema=None,
config=graph_config
)

result = multiple_search_graph.run()
print(json.dumps(result, indent=4))
2 changes: 1 addition & 1 deletion examples/anthropic/smart_scraper_schema_haiku.py
@@ -4,7 +4,7 @@

import os
from typing import List
from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
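Several hunks above swap `langchain_core.pydantic_v1` for plain `pydantic`, i.e. the example schemas now target pydantic v2. Below is a minimal sketch of what a schema-driven call looks like after that change; the field names and API key placeholder are illustrative, while the prompt, source URL, and model string are taken from other examples in this PR.

```python
from typing import List

from pydantic import BaseModel, Field  # pydantic v2, as in the updated examples
from scrapegraphai.graphs import SmartScraperGraph

class Project(BaseModel):
    title: str = Field(description="Title of the project")
    description: str = Field(description="Short description of the project")

class Projects(BaseModel):
    projects: List[Project]

graph = SmartScraperGraph(
    prompt="List me all the projects with their description",
    source="https://perinim.github.io/projects/",
    schema=Projects,
    config={"llm": {"api_key": "YOUR_OPENAI_KEY", "model": "openai/gpt-3.5-turbo"}},
)

result = graph.run()
print(result)
```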
57 changes: 57 additions & 0 deletions examples/azure/rate_limit_azure.py
@@ -0,0 +1,57 @@
"""
Basic example of scraping pipeline using SmartScraper with a custom rate limit
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info


# required environment variable in .env
# AZURE_OPENAI_ENDPOINT
# AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
# MODEL_NAME
# AZURE_OPENAI_API_KEY
# OPENAI_API_TYPE
# AZURE_OPENAI_API_VERSION
# AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
load_dotenv()


# ************************************************
# Initialize the model instances
# ************************************************

graph_config = {
"llm": {
"api_key": os.environ["AZURE_OPENAI_KEY"],
"model": "azure_openai/gpt-3.5-turbo",
"rate_limit": {
"requests_per_second": 1
},
},
"verbose": True,
"headless": False
}

smart_scraper_graph = SmartScraperGraph(
prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
# also accepts a string with the already downloaded HTML code
source="https://www.hmhco.com/event",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
2 changes: 1 addition & 1 deletion examples/azure/search_graph_schema_azure.py
@@ -9,7 +9,7 @@
from scrapegraphai.graphs import SearchGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from typing import List

# ************************************************
4 changes: 2 additions & 2 deletions examples/azure/smart_scraper_multi_azure.py
@@ -1,8 +1,8 @@
"""
Basic example of scraping pipeline using SmartScraper
"""

import os, json
import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperMultiGraph

39 changes: 39 additions & 0 deletions examples/azure/smart_scraper_multi_concat_azure.py
@@ -0,0 +1,39 @@
"""
Basic example of scraping pipeline using SmartScraper
"""

import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperMultiConcatGraph

load_dotenv()

# ************************************************
# Define the configuration for the graph
# ************************************************
graph_config = {
"llm": {
"api_key": os.environ["AZURE_OPENAI_KEY"],
"model": "azure_openai/gpt-3.5-turbo",
},
"verbose": True,
"headless": False
}

# *******************************************************
# Create the SmartScraperMultiGraph instance and run it
# *******************************************************

multiple_search_graph = SmartScraperMultiConcatGraph(
prompt="Who is Marco Perini?",
source= [
"https://perinim.github.io/",
"https://perinim.github.io/cv/"
],
schema=None,
config=graph_config
)

result = multiple_search_graph.run()
print(json.dumps(result, indent=4))
2 changes: 1 addition & 1 deletion examples/azure/smart_scraper_schema_azure.py
@@ -5,7 +5,7 @@
import os
import json
from typing import List
from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

2 changes: 1 addition & 1 deletion examples/bedrock/custom_graph_bedrock.py
@@ -55,7 +55,7 @@

fetch_node = FetchNode(
input="url | local_dir",
output=["doc", "link_urls", "img_urls"],
output=["doc"],
node_config={
"verbose": True,
"headless": True,
47 changes: 47 additions & 0 deletions examples/bedrock/rate_limit_bedrock.py
@@ -0,0 +1,47 @@
"""
Basic example of scraping pipeline using SmartScraper with a custom rate limit
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()


# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
"llm": {
"client": "client_name",
"model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
"temperature": 0.0,
"rate_limit": {
"requests_per_second": 1
},
}
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the projects with their description",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects/",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
2 changes: 1 addition & 1 deletion examples/bedrock/search_graph_schema_bedrock.py
@@ -4,7 +4,7 @@
from scrapegraphai.graphs import SearchGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from typing import List

# ************************************************