Skip to content

Commit 14c5e6b

Browse files
authored
Merge branch 'pre/beta' into temp
2 parents c5ffdef + a73fec5 commit 14c5e6b

File tree

125 files changed

+3165
-447
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

125 files changed

+3165
-447
lines changed

CHANGELOG.md

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,13 +8,15 @@
88
## [1.18.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0...v1.18.0) (2024-09-08)
99

1010

11+
1112
### Features
1213

1314
* **browser_base_fetch:** add async_mode to support both synchronous and asynchronous execution ([d56253d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d56253d183969584cacc0cb164daa0152462f21c))
1415

1516
## [1.17.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.16.0...v1.17.0) (2024-09-08)
1617

1718

19+
1820
### Features
1921

2022
* **docloaders:** Enhance browser_base_fetch function flexibility ([57fd01f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/57fd01f9a76ea8ea69ec04b7238ab58ca72ac8f4))
@@ -24,9 +26,84 @@
2426

2527
* **sponsor:** 🅱️ Browserbase sponsor 🅱️ ([a540139](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a5401394cc939d9a5fc58b8a9145141c2f047bab))
2628

29+
* **AbstractGraph:** add adjustable rate limit ([2859fb7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/2859fb72d699f26b617ed2f949cdcfca1671c5c8))
30+
31+
## [1.17.0-beta.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.6...v1.17.0-beta.7) (2024-09-05)
32+
33+
34+
### Bug Fixes
35+
36+
* **AbstractGraph:** Bedrock init issues ([63a5d18](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/63a5d18486789ce1b4a8f5ea661fc83779fceca2)), closes [#633](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/633)
37+
38+
## [1.17.0-beta.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.5...v1.17.0-beta.6) (2024-09-04)
39+
40+
41+
### Bug Fixes
42+
43+
* **ScreenShotScraper:** static import of optional dependencies ([52fe441](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/52fe441c5af9c728983a2c3cd880fe9afcb5d428))
44+
45+
## [1.17.0-beta.5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.4...v1.17.0-beta.5) (2024-09-02)
46+
47+
48+
### Bug Fixes
49+
50+
* correctly parsing output when using structured_output ([8e74ac5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8e74ac55a16ca012b52affbc754e4b04130e65db))
51+
52+
## [1.17.0-beta.4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.3...v1.17.0-beta.4) (2024-09-02)
53+
54+
55+
### Bug Fixes
56+
57+
* Parse Node scraping link and img urls allowing OmniScraper to work ([66a3b6d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/66a3b6d6a3efdf1ee72b802fc9bf8175482c45bd))
58+
* Removed link_urls and img_ulrs from FetchNode output ([57337a0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/57337a0a8c86fb28c9ccbd70d41acfc9abea11f0))
59+
60+
## [1.17.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.2...v1.17.0-beta.3) (2024-09-02)
61+
62+
63+
### Bug Fixes
64+
65+
* **ScreenshotScraper:** impose dynamic imports ([b8ef937](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b8ef93738ec4ae48c361fe5650df5194e845a2b1))
66+
* **SmartScraper:** pass llm_model to ParseNode ([5242166](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/52421665759032bcfad80ce540efebe5f47310f6))
67+
68+
## [1.17.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.1...v1.17.0-beta.2) (2024-09-02)
69+
70+
71+
### Bug Fixes
72+
73+
* **Ollama:** instance model from correct package ([398b2c5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/398b2c556faf518ca28ccc284bc8761a16281cf7))
74+
* **DeepSeek:** proper model initialization ([74dfc69](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/74dfc693f6e487d20da58704284fe9f492d2b2aa))
75+
* screenshot scraper ([388630c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/388630c0ffa2850c3d5ea47e62b71b41795203d8))
76+
77+
## [1.17.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.16.0...v1.17.0-beta.1) (2024-09-02)
78+
79+
80+
### Features
81+
82+
* add togheterai ([8f615ad](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8f615adef320dacdd214a184981384dd05df8171))
83+
84+
85+
### Bug Fixes
86+
87+
* update generate answernode ([c348f67](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/c348f674ad0caae4f4dc04e194fae9634e01b621))
88+
89+
90+
### chore
91+
92+
* **examples:** create Together AI examples ([34942de](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/34942deca514df53e8aa1c7f96f812ee78b994bf))
93+
94+
95+
### CI
96+
97+
* **release:** 1.16.0-beta.1 [skip ci] ([d7f6036](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d7f6036f907eda8d1faa0944da4d1d168ca4c40e))
98+
* **release:** 1.16.0-beta.2 [skip ci] ([1c37d5d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1c37d5db1c637f791133df254838a0deade6d6be))
99+
* **release:** 1.16.0-beta.3 [skip ci] ([886c987](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/886c987172bb57fb59863e4d7b494797bba16980))
100+
* **release:** 1.16.0-beta.4 [skip ci] ([ba5c7ad](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ba5c7adcea138d993005377f4cfe438795e1b124))
101+
102+
27103
## [1.16.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.2...v1.16.0) (2024-09-01)
28104

29105

106+
30107
### Features
31108

32109
* add deepcopy error ([71b22d4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/71b22d48804c462798109bb47ec792a5a3c70b6e))
@@ -40,10 +117,28 @@
40117
## [1.15.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.1...v1.15.2) (2024-09-01)
41118

42119

120+
## [1.16.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.16.0-beta.2...v1.16.0-beta.3) (2024-09-01)
121+
122+
43123
### Bug Fixes
44124

45125
* pyproject.toml ([360ce1c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/360ce1c0e468c959e63555120ac7cecf55563846))
46126

127+
128+
### CI
129+
130+
* **release:** 1.15.2 [skip ci] ([d88730c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d88730ccc7190d09a54e6c24db1644512b576430))
131+
132+
## [1.15.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.1...v1.15.2) (2024-09-01)
133+
134+
135+
136+
137+
### Bug Fixes
138+
139+
* pyproject.toml ([360ce1c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/360ce1c0e468c959e63555120ac7cecf55563846))
140+
141+
47142
## [1.15.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0...v1.15.1) (2024-08-28)
48143

49144

README.md

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -38,9 +38,10 @@ Additional dependecies can be added while installing the library:
3838

3939
- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
4040

41-
```bash
42-
pip install scrapegraphai[other-language-models]
43-
```
41+
42+
This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
43+
```bash
44+
pip install scrapegraphai[other-language-models]
4445

4546
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.
4647

@@ -58,6 +59,13 @@ Additional dependecies can be added while installing the library:
5859

5960

6061

62+
### Installing "More Browser Options"
63+
64+
This group includes an ocr scraper for websites
65+
```bash
66+
pip install scrapegraphai[screenshot_scraper]
67+
```
68+
6169
## 💻 Usage
6270
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
6371

examples/anthropic/custom_graph_haiku.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@
4040

4141
fetch_node = FetchNode(
4242
input="url | local_dir",
43-
output=["doc", "link_urls", "img_urls"],
43+
output=["doc"],
4444
node_config={
4545
"verbose": True,
4646
"headless": True,
Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
"""
2+
Basic example of scraping pipeline using SmartScraper while setting an API rate limit.
3+
"""
4+
5+
import os
6+
from dotenv import load_dotenv
7+
from scrapegraphai.graphs import SmartScraperGraph
8+
from scrapegraphai.utils import prettify_exec_info
9+
10+
11+
# required environment variables in .env
12+
# ANTHROPIC_API_KEY
13+
load_dotenv()
14+
15+
# ************************************************
16+
# Create the SmartScraperGraph instance and run it
17+
# ************************************************
18+
19+
graph_config = {
20+
"llm": {
21+
"api_key": os.getenv("ANTHROPIC_API_KEY"),
22+
"model": "anthropic/claude-3-haiku-20240307",
23+
"rate_limit": {
24+
"requests_per_second": 1
25+
}
26+
},
27+
}
28+
29+
smart_scraper_graph = SmartScraperGraph(
30+
prompt="""Don't say anything else. Output JSON only. List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
31+
event_end_date, event_end_time, location, event_mode, event_category,
32+
third_party_redirect, no_of_days,
33+
time_in_hours, hosted_or_attending, refreshments_type,
34+
registration_available, registration_link""",
35+
# also accepts a string with the already downloaded HTML code
36+
source="https://www.hmhco.com/event",
37+
config=graph_config
38+
)
39+
40+
result = smart_scraper_graph.run()
41+
print(result)
42+
43+
# ************************************************
44+
# Get graph execution info
45+
# ************************************************
46+
47+
graph_exec_info = smart_scraper_graph.get_execution_info()
48+
print(prettify_exec_info(graph_exec_info))
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
"""
2+
Basic example of scraping pipeline using SmartScraper
3+
"""
4+
5+
import os
6+
import json
7+
from dotenv import load_dotenv
8+
from scrapegraphai.graphs import SmartScraperMultiConcatGraph
9+
10+
load_dotenv()
11+
12+
# ************************************************
13+
# Create the SmartScraperGraph instance and run it
14+
# ************************************************
15+
16+
graph_config = {
17+
"llm": {
18+
"api_key": os.getenv("ANTHROPIC_API_KEY"),
19+
"model": "anthropic/claude-3-haiku-20240307",
20+
},
21+
}
22+
23+
24+
# *******************************************************
25+
# Create the SmartScraperMultiGraph instance and run it
26+
# *******************************************************
27+
28+
multiple_search_graph = SmartScraperMultiConcatGraph(
29+
prompt="Who is Marco Perini?",
30+
source= [
31+
"https://perinim.github.io/",
32+
"https://perinim.github.io/cv/"
33+
],
34+
schema=None,
35+
config=graph_config
36+
)
37+
38+
result = multiple_search_graph.run()
39+
print(json.dumps(result, indent=4))

examples/azure/rate_limit_azure.py

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
"""
2+
Basic example of scraping pipeline using SmartScraper with a custom rate limit
3+
"""
4+
5+
import os
6+
from dotenv import load_dotenv
7+
from scrapegraphai.graphs import SmartScraperGraph
8+
from scrapegraphai.utils import prettify_exec_info
9+
10+
11+
# required environment variable in .env
12+
# AZURE_OPENAI_ENDPOINT
13+
# AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
14+
# MODEL_NAME
15+
# AZURE_OPENAI_API_KEY
16+
# OPENAI_API_TYPE
17+
# AZURE_OPENAI_API_VERSION
18+
# AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
19+
load_dotenv()
20+
21+
22+
# ************************************************
23+
# Initialize the model instances
24+
# ************************************************
25+
26+
graph_config = {
27+
"llm": {
28+
"api_key": os.environ["AZURE_OPENAI_KEY"],
29+
"model": "azure_openai/gpt-3.5-turbo",
30+
"rate_limit": {
31+
"requests_per_second": 1
32+
},
33+
},
34+
"verbose": True,
35+
"headless": False
36+
}
37+
38+
smart_scraper_graph = SmartScraperGraph(
39+
prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
40+
event_end_date, event_end_time, location, event_mode, event_category,
41+
third_party_redirect, no_of_days,
42+
time_in_hours, hosted_or_attending, refreshments_type,
43+
registration_available, registration_link""",
44+
# also accepts a string with the already downloaded HTML code
45+
source="https://www.hmhco.com/event",
46+
config=graph_config
47+
)
48+
49+
result = smart_scraper_graph.run()
50+
print(result)
51+
52+
# ************************************************
53+
# Get graph execution info
54+
# ************************************************
55+
56+
graph_exec_info = smart_scraper_graph.get_execution_info()
57+
print(prettify_exec_info(graph_exec_info))

examples/azure/smart_scraper_multi_azure.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
"""
22
Basic example of scraping pipeline using SmartScraper
33
"""
4-
5-
import os, json
4+
import os
5+
import json
66
from dotenv import load_dotenv
77
from scrapegraphai.graphs import SmartScraperMultiGraph
88

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
"""
2+
Basic example of scraping pipeline using SmartScraper
3+
"""
4+
5+
import os
6+
import json
7+
from dotenv import load_dotenv
8+
from scrapegraphai.graphs import SmartScraperMultiConcatGraph
9+
10+
load_dotenv()
11+
12+
# ************************************************
13+
# Define the configuration for the graph
14+
# ************************************************
15+
graph_config = {
16+
"llm": {
17+
"api_key": os.environ["AZURE_OPENAI_KEY"],
18+
"model": "azure_openai/gpt-3.5-turbo",
19+
},
20+
"verbose": True,
21+
"headless": False
22+
}
23+
24+
# *******************************************************
25+
# Create the SmartScraperMultiGraph instance and run it
26+
# *******************************************************
27+
28+
multiple_search_graph = SmartScraperMultiConcatGraph(
29+
prompt="Who is Marco Perini?",
30+
source= [
31+
"https://perinim.github.io/",
32+
"https://perinim.github.io/cv/"
33+
],
34+
schema=None,
35+
config=graph_config
36+
)
37+
38+
result = multiple_search_graph.run()
39+
print(json.dumps(result, indent=4))

examples/bedrock/custom_graph_bedrock.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,7 @@
5555

5656
fetch_node = FetchNode(
5757
input="url | local_dir",
58-
output=["doc", "link_urls", "img_urls"],
58+
output=["doc"],
5959
node_config={
6060
"verbose": True,
6161
"headless": True,

0 commit comments

Comments
 (0)