The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
Check out also the docusaurus [documentation](https://scrapegraph-doc.onrender.com/).

## 💻 Usage

There are three main scraping pipelines that can be used to extract information from a website (or local file):

- `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source;
- `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine;
- `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.

It is possible to use different LLMs through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.

### Case 1: SmartScraper using Local Models

Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command.

```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        # ... (llm settings elided in this excerpt)
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
```

The output will be a list of projects with their descriptions like the following:

```python
{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
```
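Since the pipeline returns a plain Python dict (as the printed output above shows), it can be post-processed with standard tools. A minimal sketch iterating over the example output above:

```python
# Example output of a SmartScraperGraph run, as shown above (truncated)
result = {'projects': [
    {'title': 'Rotary Pendulum RL',
     'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'},
    {'title': 'DQN Implementation from scratch',
     'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'},
]}

# Print one line per extracted project
for project in result['projects']:
    print(f"- {project['title']}")
```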

### Case 2: SearchGraph using Mixed Models

We use **Groq** for the LLM and **Ollama** for the embeddings.

```python
from scrapegraphai.graphs import SearchGraph

# Define the configuration for the graph
graph_config = {
    "llm": {
        "model": "groq/gemma-7b-it",
        "api_key": "GROQ_API_KEY",
        "temperature": 0
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "max_results": 5,
}

# Create the SearchGraph instance
search_graph = SearchGraph(
    prompt="List me all the traditional recipes from Chioggia",
    config=graph_config
)

# Run the graph
result = search_graph.run()
print(result)
```
The output will be a list of recipes like the following:
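Since the exact output depends on the live search results, here is an illustrative sketch of persisting a result of that shape with the standard `json` module. The recipe names below are placeholders, not actual scraper output:

```python
import json

# Placeholder result with the same dict shape a SearchGraph run returns;
# the values here are illustrative only, not real scraper output.
result = {'recipes': [{'name': 'Sarde in saor'}, {'name': 'Bigoli in salsa'}]}

# Persist the scraped data to disk for later use
with open('recipes.json', 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=2)

# Reload to verify the round-trip
with open('recipes.json', encoding='utf-8') as f:
    loaded = json.load(f)

print(loaded == result)  # prints: True
```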