Commit 5a67bca

Merge branch 'pre/beta' into pr/161
2 parents 2ac9e16 + ac6d200 commit 5a67bca

39 files changed: +628 -244 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -36,3 +36,4 @@ poetry.lock
 # lock files
 *.lock
 poetry.lock
+

CHANGELOG.md

Lines changed: 41 additions & 10 deletions
@@ -1,27 +1,58 @@
-## [0.9.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0) (2024-05-04)
+## [0.9.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.7...v0.9.0-beta.8) (2024-05-06)


 ### Features

-* Enable end users to pass model instances of HuggingFaceHub ([7599234](https://github.com/VinciGit00/Scrapegraph-ai/commit/7599234ab9563ca4ee9b7f5b2d0267daac621ecf))
+* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
+
+## [0.9.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.6...v0.9.0-beta.7) (2024-05-06)


 ### Bug Fixes

-* trailing whitespace ([2878695](https://github.com/VinciGit00/Scrapegraph-ai/commit/2878695d5f35cc9d81f24e4844fdc1988d10cb26))
+* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))

+## [0.9.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.5...v0.9.0-beta.6) (2024-05-06)

-### Build

-* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([0a17c74](https://github.com/VinciGit00/Scrapegraph-ai/commit/0a17c74e50d0457aec289e81183e9c779c735842))
-* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([aff6f98](https://github.com/VinciGit00/Scrapegraph-ai/commit/aff6f983b02a37ced21826847a6ace5fb15ecf3d))
+### Features

+* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))

-### CI
+## [0.9.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.4...v0.9.0-beta.5) (2024-05-06)

-* **release:** 0.8.0-beta.1 [skip ci] ([d277b34](https://github.com/VinciGit00/Scrapegraph-ai/commit/d277b349a98848749a7e38ea3c511271bced3b71))
-* **release:** 0.8.0-beta.2 [skip ci] ([892500a](https://github.com/VinciGit00/Scrapegraph-ai/commit/892500afe93c4d96dcffe897b382977a22079b83))
-* **release:** 0.9.0-beta.1 [skip ci] ([14615a7](https://github.com/VinciGit00/Scrapegraph-ai/commit/14615a73c71bb5250772a75c415c57cb153660f8))
+
+### Features
+
+* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
+* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
+* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
+
+## [0.9.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.3...v0.9.0-beta.4) (2024-05-05)
+
+
+### Features
+
+* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
+
+## [0.9.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.2...v0.9.0-beta.3) (2024-05-05)
+
+
+### Features
+
+* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
+
+## [0.9.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.1...v0.9.0-beta.2) (2024-05-05)
+
+
+### Features
+
+* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))
+
+
+### Bug Fixes
+
+* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))

 ## [0.9.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0-beta.1) (2024-05-04)

SECURITY.md

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 ## Reporting a Vulnerability

 For reporting a vulnerability contact directly [email protected]
+

examples/openai/custom_graph_openai.py

Lines changed: 25 additions & 6 deletions
@@ -4,6 +4,8 @@

 import os
 from dotenv import load_dotenv
+
+from langchain_openai import OpenAIEmbeddings
 from scrapegraphai.models import OpenAI
 from scrapegraphai.graphs import BaseGraph
 from scrapegraphai.nodes import FetchNode, ParseNode, RAGNode, GenerateAnswerNode, RobotsNode
@@ -20,7 +22,7 @@
         "api_key": openai_key,
         "model": "gpt-3.5-turbo",
         "temperature": 0,
-        "streaming": True
+        "streaming": False
     },
 }

@@ -29,33 +31,50 @@
 # ************************************************

 llm_model = OpenAI(graph_config["llm"])
+embedder = OpenAIEmbeddings(api_key=llm_model.openai_api_key)

 # define the nodes for the graph
 robot_node = RobotsNode(
     input="url",
     output=["is_scrapable"],
-    node_config={"llm": llm_model}
+    node_config={
+        "llm_model": llm_model,
+        "verbose": True,
+    }
 )

 fetch_node = FetchNode(
     input="url | local_dir",
     output=["doc"],
-    node_config={"headless": True, "verbose": True}
+    node_config={
+        "verbose": True,
+        "headless": True,
+    }
 )
 parse_node = ParseNode(
     input="doc",
     output=["parsed_doc"],
-    node_config={"chunk_size": 4096}
+    node_config={
+        "chunk_size": 4096,
+        "verbose": True,
+    }
 )
 rag_node = RAGNode(
     input="user_prompt & (parsed_doc | doc)",
     output=["relevant_chunks"],
-    node_config={"llm": llm_model},
+    node_config={
+        "llm_model": llm_model,
+        "embedder_model": embedder,
+        "verbose": True,
+    }
 )
 generate_answer_node = GenerateAnswerNode(
     input="user_prompt & (relevant_chunks | parsed_doc | doc)",
     output=["answer"],
-    node_config={"llm": llm_model},
+    node_config={
+        "llm_model": llm_model,
+        "verbose": True,
+    }
 )

 # ************************************************

examples/openai/search_graph_multi.py

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+"""
+Example of custom graph using existing nodes
+"""
+
+import os
+from dotenv import load_dotenv
+from langchain_openai import OpenAIEmbeddings
+from scrapegraphai.models import OpenAI
+from scrapegraphai.graphs import BaseGraph, SmartScraperGraph
+from scrapegraphai.nodes import SearchInternetNode, GraphIteratorNode, MergeAnswersNode
+load_dotenv()
+
+# ************************************************
+# Define the configuration for the graph
+# ************************************************
+
+openai_key = os.getenv("OPENAI_APIKEY")
+
+graph_config = {
+    "llm": {
+        "api_key": openai_key,
+        "model": "gpt-3.5-turbo",
+    },
+}
+
+# ************************************************
+# Create a SmartScraperGraph instance
+# ************************************************
+
+smart_scraper_graph = SmartScraperGraph(
+    prompt="",
+    source="",
+    config=graph_config
+)
+
+# ************************************************
+# Define the graph nodes
+# ************************************************
+
+llm_model = OpenAI(graph_config["llm"])
+embedder = OpenAIEmbeddings(api_key=llm_model.openai_api_key)
+
+search_internet_node = SearchInternetNode(
+    input="user_prompt",
+    output=["urls"],
+    node_config={
+        "llm_model": llm_model,
+        "max_results": 5, # num of search results to fetch
+        "verbose": True,
+    }
+)
+
+graph_iterator_node = GraphIteratorNode(
+    input="user_prompt & urls",
+    output=["results"],
+    node_config={
+        "graph_instance": smart_scraper_graph,
+        "verbose": True,
+    }
+)
+
+merge_answers_node = MergeAnswersNode(
+    input="user_prompt & results",
+    output=["answer"],
+    node_config={
+        "llm_model": llm_model,
+        "verbose": True,
+    }
+)
+
+# ************************************************
+# Create the graph by defining the connections
+# ************************************************
+
+graph = BaseGraph(
+    nodes=[
+        search_internet_node,
+        graph_iterator_node,
+        merge_answers_node
+    ],
+    edges=[
+        (search_internet_node, graph_iterator_node),
+        (graph_iterator_node, merge_answers_node)
+    ],
+    entry_point=search_internet_node
+)
+
+# ************************************************
+# Execute the graph
+# ************************************************
+
+result, execution_info = graph.execute({
+    "user_prompt": "List me all the typical Chioggia dishes."
+})
+
+# get the answer from the result
+result = result.get("answer", "No answer found.")
+print(result)

examples/openai/search_graph_openai.py

Lines changed: 3 additions & 1 deletion
@@ -19,14 +19,16 @@
         "api_key": openai_key,
         "model": "gpt-3.5-turbo",
     },
+    "max_results": 5,
+    "verbose": True,
 }

 # ************************************************
 # Create the SearchGraph instance and run it
 # ************************************************

 search_graph = SearchGraph(
-    prompt="List me top 5 eyeliner products for a gift.",
+    prompt="List me the best escursions near Trento",
     config=graph_config
 )

examples/openai/smart_scraper_openai.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
         "api_key": openai_key,
         "model": "gpt-3.5-turbo",
     },
-    "verbose": True,
+    "verbose": False,
 }

 # ************************************************

examples/single_node/robot_node.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 robots_node = RobotsNode(
     input="url",
     output=["is_scrapable"],
-    node_config={"llm": llm_model,
+    node_config={"llm_model": llm_model,
                  "headless": False
                  }
 )
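
The node examples touched by this commit rename the `node_config` key `llm` to `llm_model`. For calling code still written against the old key, the rename could be applied with a small helper along these lines. This is only a sketch: `migrate_node_config` is a hypothetical name and is not part of scrapegraphai, and the string standing in for the model object is a placeholder.

```python
from typing import Any, Dict


def migrate_node_config(node_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of node_config with the old "llm" key renamed to "llm_model".

    Hypothetical helper for illustration only; not part of scrapegraphai.
    """
    migrated = dict(node_config)  # shallow copy, the caller's dict is left untouched
    # Only rename when the new key is absent, so an explicit "llm_model" always wins.
    if "llm" in migrated and "llm_model" not in migrated:
        migrated["llm_model"] = migrated.pop("llm")
    return migrated


# Placeholder string used in place of a real model instance.
old_config = {"llm": "placeholder-model", "headless": False}
new_config = migrate_node_config(old_config)
print(new_config)
```

The helper copies the dictionary rather than mutating it, so existing configuration objects shared between nodes are not changed behind the caller's back.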

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "scrapegraphai"

-version = "0.9.0"
+version = "0.9.0b8"

 description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
 authors = [
@@ -41,7 +41,8 @@ free-proxy = "1.1.1"
 langchain-groq = "0.1.3"
 playwright = "^1.43.0"
 langchain-aws = "^0.1.2"
-
+langchain-anthropic = "^0.1.11"
+yahoo-search-py="^0.3"

 [tool.poetry.dev-dependencies]
 pytest = "8.0.0"
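
The changelog labels this release `0.9.0-beta.8`, while `pyproject.toml` records `0.9.0b8`, which appears to be the PEP 440 spelling of the same pre-release. A minimal sketch of that mapping follows, assuming only `alpha`, `beta`, and `rc` tags on a three-part version; `semver_to_pep440` is a hypothetical name, and the release tooling used by the project may do this differently.

```python
import re

# Assumed tag mapping: semantic-release style tag -> PEP 440 pre-release letter.
_PRERELEASE_TAGS = {"alpha": "a", "beta": "b", "rc": "rc"}
_VERSION_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$")


def semver_to_pep440(version: str) -> str:
    """Convert e.g. "0.9.0-beta.8" to "0.9.0b8"; plain releases pass through."""
    match = _VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release, tag, number = match.groups()
    if tag is None:
        return release  # final release, nothing to rewrite
    return f"{release}{_PRERELEASE_TAGS[tag]}{number}"


print(semver_to_pep440("0.9.0-beta.8"))  # 0.9.0b8
```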

requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -16,3 +16,5 @@ free-proxy==1.1.1
 langchain-groq==0.1.3
 playwright==1.43.0
 langchain-aws==0.1.2
+langchain-anthropic==0.1.11
+yahoo-search-py==0.3

scrapegraphai/graphs/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 __init__.py file for graphs folder
 """

+from .abstract_graph import AbstractGraph
 from .base_graph import BaseGraph
 from .smart_scraper_graph import SmartScraperGraph
 from .speech_graph import SpeechGraph
