Commit b1d3804

Merge pull request #585 from ScrapeGraphAI/anthropic-refactoring
Anthropic refactoring
2 parents 88e76ce + 37a4a8a commit b1d3804

50 files changed: 363 additions and 232 deletions

CHANGELOG.md

Lines changed: 31 additions & 2 deletions
@@ -1,15 +1,44 @@
-## [1.14.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1) (2024-08-24)
+## [1.15.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.2...v1.15.0-beta.3) (2024-08-24)
+
+
+
+### Bug Fixes
+
+* update abstract graph ([86fe5fc](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/86fe5fcaf1a6ba28786678874378f07fba1db40f))
+
+## [1.15.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.0-beta.1...v1.15.0-beta.2) (2024-08-23)
 
 
 ### Bug Fixes
 
-* add claude3.5 sonnet ([ee8f8b3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ee8f8b31ecfe4ffd311528d2f48cb055e4609d99))
+* abstract graph ([cf1fada](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cf1fada36a6716cb0e24bbc5da7509446a964145))
+
 
 
 ### Docs
 
 * added sponsors ([b3a2d0d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b3a2d0d65a41f6e645fac3fc84f702fdf64b951c))
 
+## [1.15.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.1-beta.1...v1.15.0-beta.1) (2024-08-23)
+
+
+### Features
+
+* ligthweigthing the library ([62f32e9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/62f32e994bcb748dfef4f7e1b2e5213a989c33cc))
+
+
+### Bug Fixes
+
+* Azure OpenAI issue ([a92b9c6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a92b9c6970049a4ba9dbdf8eff3eeb7f98c6c639))
+
+## [1.14.1-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.14.0...v1.14.1-beta.1) (2024-08-21)
+
+
+### Bug Fixes
+
+* **models_tokens:** add llama2 and llama3 sizes explicitly ([b05ec16](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b05ec16b252d00c9c9ee7c6d4605b420851c7754))
+
+
 ## [1.14.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.13.3...v1.14.0) (2024-08-20)
 
 

README.md

Lines changed: 22 additions & 0 deletions
@@ -32,6 +32,28 @@ playwright install
 
 **Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
 
+To use the optional modules, install them yourself with the following commands:
+
+### Installing "Other Language Models"
+
+This group allows you to use additional language models like Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
+```bash
+pip install scrapegraphai[other-language-models]
+
+```
+### Installing "More Semantic Options"
+
+This group includes tools for advanced semantic processing, such as Graphviz.
+```bash
+pip install scrapegraphai[more-semantic-options]
+```
+### Installing "More Browser Options"
+
+This group includes additional browser management options, such as BrowserBase.
+```bash
+pip install scrapegraphai[more-browser-options]
+```
+
 ## 💻 Usage
 There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
 
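Because the README hunk above moves several integrations into optional install groups, a script may want to check whether a group's modules are present before it uses them. The following is a minimal sketch of such a check. The group names come from the README, but the module names in `OPTIONAL_GROUPS` and the helpers `missing_modules` and `install_hint` are assumptions made for illustration; they are not part of this commit or of the ScrapeGraphAI API.

```python
from importlib.util import find_spec

# Hypothetical mapping from extras group to top-level module names.
# The module names are assumptions for illustration, not taken from the commit.
OPTIONAL_GROUPS = {
    "other-language-models": ["langchain_anthropic", "langchain_groq"],
    "more-semantic-options": ["graphviz"],
    "more-browser-options": ["browserbase"],
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


def install_hint(group):
    """Return a pip command for the group if any of its modules is missing, else None."""
    missing = missing_modules(OPTIONAL_GROUPS[group])
    if not missing:
        return None
    return f"pip install scrapegraphai[{group}]  # missing: {', '.join(missing)}"


if __name__ == "__main__":
    for group in OPTIONAL_GROUPS:
        print(install_hint(group) or f"{group}: all modules found")
```

The check only looks up top-level module names, so it does not import the packages or verify their versions.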

docs/README.md

Lines changed: 0 additions & 11 deletions
@@ -9,12 +9,6 @@ markmap:
 
 ## **Short-Term Goals**
 
-- Integration with more llm APIs
-
-- Test proxy rotation implementation
-
-- Add more search engines inside the SearchInternetNode
-
 - Improve the documentation (ReadTheDocs)
 - [Issue #102](https://github.com/VinciGit00/Scrapegraph-ai/issues/102)
 
@@ -23,9 +17,6 @@ markmap:
 ## **Medium-Term Goals**
 
 - Node for handling API requests
-
-- Improve SearchGraph to look into the first 5 results of the search engine
-
 - Make scraping more deterministic
 - Create DOM tree of the website
 - HTML tag text embeddings with tags metadata
@@ -70,5 +61,3 @@ markmap:
 - Automatic generation of scraping pipelines from a given prompt
 
 - Create API for the library
-
-- Finetune a LLM for html content

docs/source/scrapers/llm.rst

Lines changed: 32 additions & 0 deletions
@@ -194,3 +194,35 @@ We can also pass a model instance for the chat model and the embedding model. Fo
             "model_instance": embedder_model_instance
         }
     }
+
+Other LLM models
+^^^^^^^^^^^^^^^^
+
+We can also pass a model instance for the chat model and the embedding model through the **model_instance** parameter.
+This feature enables you to utilize a Langchain model instance.
+You will discover the model you require within the provided list:
+
+- `chat model list <https://python.langchain.com/v0.2/docs/integrations/chat/#all-chat-models>`_
+- `embedding model list <https://python.langchain.com/v0.2/docs/integrations/text_embedding/#all-embedding-models>`_.
+
+For instance, consider **chat model** Moonshot. We can integrate it in the following manner:
+
+.. code-block:: python
+
+    from langchain_community.chat_models.moonshot import MoonshotChat
+
+    # The configuration parameters are contingent upon the specific model you select
+    llm_instance_config = {
+        "model": "moonshot-v1-8k",
+        "base_url": "https://api.moonshot.cn/v1",
+        "moonshot_api_key": "MOONSHOT_API_KEY",
+    }
+
+    llm_model_instance = MoonshotChat(**llm_instance_config)
+    graph_config = {
+        "llm": {
+            "model_instance": llm_model_instance,
+            "model_tokens": 5000
+        },
+    }
+
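The documentation hunk above pairs `model_instance` with `model_tokens` in the `llm` section. A small helper can make that pairing explicit and catch a missing token budget early. This is only a sketch: `build_instance_config` and `FakeChatModel` are hypothetical names, and the idea that `model_tokens` should always accompany `model_instance` is inferred from the example rather than confirmed by the commit.

```python
def build_instance_config(model_instance, model_tokens):
    """Build a graph_config dict around an already constructed chat model.

    Hypothetical helper: it mirrors the shape of the documented example and
    is not part of the library.
    """
    if model_instance is None:
        raise ValueError("model_instance must be a constructed chat model")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(model_tokens, bool) or not isinstance(model_tokens, int) or model_tokens <= 0:
        raise ValueError("model_tokens must be a positive integer")
    return {
        "llm": {
            "model_instance": model_instance,
            "model_tokens": model_tokens,
        },
    }


class FakeChatModel:
    """Stand-in for a LangChain chat model so the sketch runs offline."""


graph_config = build_instance_config(FakeChatModel(), 5000)
```

In real use the stand-in would be replaced by an instance such as the `MoonshotChat` object built in the documented example.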

examples/anthropic/csv_scraper_haiku.py renamed to examples/anthropic/csv_scraper_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }
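The hunk above, like the other Anthropic examples in this commit, changes the model string from a bare name to a `provider/model` form. As an illustration of how such an identifier can be handled, here is a minimal sketch. `split_model_id` and `with_provider` are hypothetical helpers written for this note; they do not show how ScrapeGraphAI itself parses the string.

```python
def split_model_id(model_id):
    """Split 'provider/model' into (provider, model).

    Returns (None, model_id) when no provider prefix is present.
    """
    provider, sep, name = model_id.partition("/")
    if not sep:
        return None, model_id
    return provider, name


def with_provider(model_id, provider):
    """Add the provider prefix to a bare model name; leave prefixed ids unchanged."""
    current, _ = split_model_id(model_id)
    if current is not None:
        return model_id
    return f"{provider}/{model_id}"


old_style = "claude-3-haiku-20240307"
new_style = with_provider(old_style, "anthropic")
print(new_style)
```

`str.partition` splits on the first slash only, so a model name that itself contains a slash stays intact after the provider is removed.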

examples/anthropic/csv_scraper_graph_multi_haiku.py renamed to examples/anthropic/csv_scraper_graph_multi_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

examples/anthropic/custom_graph_haiku.py renamed to examples/anthropic/custom_graph_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/json_scraper_haiku.py renamed to examples/anthropic/json_scraper_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/json_scraper_multi_haiku.py renamed to examples/anthropic/json_scraper_multi_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/pdf_scraper_graph_haiku.py renamed to examples/anthropic/pdf_scraper_graph_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/pdf_scraper_multi_haiku.py renamed to examples/anthropic/pdf_scraper_multi_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/scrape_plain_text_haiku.py renamed to examples/anthropic/scrape_plain_text_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/script_generator_haiku.py renamed to examples/anthropic/script_generator_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/script_multi_generator_haiku.py renamed to examples/anthropic/script_multi_generator_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
     "library": "beautifulsoup"

examples/anthropic/search_graph_haiku.py renamed to examples/anthropic/search_graph_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/search_graph_schema_haiku.py renamed to examples/anthropic/search_graph_schema_anthropic.py

Lines changed: 3 additions & 2 deletions
@@ -27,8 +27,9 @@ class Dishes(BaseModel):
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
-        "max_tokens": 4000},
+        "model": "anthropic/claude-3-haiku-20240307",
+        "max_tokens": 4000
+    },
 }
 
 # ************************************************

examples/anthropic/search_link_graph_haiku.py renamed to examples/anthropic/search_link_graph_anthropic.py

Lines changed: 5 additions & 2 deletions
@@ -29,8 +29,11 @@
 # ************************************************
 
 graph_config = {
-    "llm": {"model_instance": llm_model_instance},
-    "embeddings": {"model_instance": embedder_model_instance}
+    "llm": {
+        "api_key": os.getenv("ANTHROPIC_API_KEY"),
+        "model": "anthropic/claude-3-haiku-20240307",
+        "max_tokens": 4000
+    },
 }
 
 # ************************************************

examples/anthropic/smart_scraper_haiku.py renamed to examples/anthropic/smart_scraper_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/smart_scraper_multi_haiku.py renamed to examples/anthropic/smart_scraper_multi_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/smart_scraper_schema_haiku.py renamed to examples/anthropic/smart_scraper_schema_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ class Projects(BaseModel):
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

examples/anthropic/xml_scraper_haiku.py renamed to examples/anthropic/xml_scraper_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000
     },
 }

examples/anthropic/xml_scraper_graph_multi_haiku.py renamed to examples/anthropic/xml_scraper_graph_multi_anthropic.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("ANTHROPIC_API_KEY"),
-        "model": "claude-3-haiku-20240307",
+        "model": "anthropic/claude-3-haiku-20240307",
         "max_tokens": 4000},
 }
 

examples/model_instance/.env.example

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+"""
+Basic example of scraping pipeline using SmartScraper and model_instance
+"""
+
+import os, json
+from scrapegraphai.graphs import SmartScraperGraph
+from scrapegraphai.utils import prettify_exec_info
+from langchain_community.chat_models.moonshot import MoonshotChat
+from dotenv import load_dotenv
+load_dotenv()
+
+# ************************************************
+# Define the configuration for the graph
+# ************************************************
+
+
+llm_instance_config = {
+    "model": "moonshot-v1-8k",
+    "base_url": "https://api.moonshot.cn/v1",
+    "moonshot_api_key": os.getenv("MOONLIGHT_API_KEY"),
+}
+
+
+llm_model_instance = MoonshotChat(**llm_instance_config)
+
+graph_config = {
+    "llm": {
+        "model_instance": llm_model_instance,
+        "model_tokens": 10000
+    },
+    "verbose": True,
+    "headless": True,
+}
+
+# ************************************************
+# Create the SmartScraperGraph instance and run it
+# ************************************************
+
+smart_scraper_graph = SmartScraperGraph(
+    prompt="List me what does the company do, the name and a contact email.",
+    source="https://scrapegraphai.com/",
+    config=graph_config
+)
+
+result = smart_scraper_graph.run()
+print(json.dumps(result, indent=4))
+
+# ************************************************
+# Get graph execution info
+# ************************************************
+
+graph_exec_info = smart_scraper_graph.get_execution_info()
+print(prettify_exec_info(graph_exec_info))
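The example script above reads its key with `os.getenv("MOONLIGHT_API_KEY")`, which returns `None` when the variable is missing. A small guard can turn that into an explicit error before the model is constructed. This is a sketch under that assumption; `require_env` is a hypothetical helper and is not part of the commit.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Treat both "unset" and "empty string" as missing.
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value
```

In the script it could replace the direct lookup, for example `"moonshot_api_key": require_env("MOONLIGHT_API_KEY")`, so that a missing key is reported at startup.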

examples/moonshot/.env.example

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+MOONLIGHT_API_KEY="YOUR MOONLIGHT API KEY"

examples/moonshot/readme.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+This folder offers an example of how to use ScrapeGraph-AI with Moonshot and SmartScraperGraph. For more usage examples, refer to the OpenAI examples.
