Async change #25

Merged
merged 30 commits on Jan 8, 2025
Changes from all commits
Commits
30 commits
7b88876
docs: added api reference
PeriniM Dec 16, 2024
6929a7a
docs: added api reference
PeriniM Dec 16, 2024
855c2e5
docs: api reference
PeriniM Dec 16, 2024
320de37
docs: github trending sdk
PeriniM Dec 17, 2024
c2fc1ef
docs: added open in colab badge
PeriniM Dec 17, 2024
7fad92c
docs: added zillow example
PeriniM Dec 18, 2024
3567034
chore: fix pyproject version
f-aguzzi Dec 18, 2024
f67769e
docs(cookbook): added two new examples
PeriniM Dec 18, 2024
479dbdb
docs: added langchain-scrapegraph examples
PeriniM Dec 19, 2024
8f3a87e
docs: added two langchain-scrapegraph examples
PeriniM Dec 19, 2024
ca8de3e
refactoring examples
VinciGit00 Dec 21, 2024
ec22ac8
refactoring images
VinciGit00 Dec 21, 2024
9f1e0cf
docs: added wired langgraph react agent
PeriniM Dec 21, 2024
e1bfd6a
docs: link typo
PeriniM Dec 21, 2024
e68c1bd
docs: added cookbook reference
PeriniM Dec 21, 2024
b353876
add examples
VinciGit00 Dec 21, 2024
a4fe204
Merge branch 'main' of https://github.com/ScrapeGraphAI/scrapegraph-sdk
VinciGit00 Dec 21, 2024
6e06afa
docs: research agent
PeriniM Dec 22, 2024
c02c411
feat: update doc readme
VinciGit00 Dec 23, 2024
c596c44
fix: houses examples and typos
lurenss Dec 27, 2024
1d0cb46
docs: updated new documentation urls
PeriniM Dec 27, 2024
dec1548
Update scrapegraph_llama_index.ipynb
VinciGit00 Dec 27, 2024
f860167
docs: fixed cookbook images and urls
PeriniM Dec 28, 2024
5fa2b42
docs: added two new examples
PeriniM Dec 29, 2024
cb05e8a
Update README.md
VinciGit00 Jan 3, 2025
6de5eb2
docs: llama-index @VinciGit00
PeriniM Jan 3, 2025
945b876
feat: add time varying timeout
VinciGit00 Jan 4, 2025
433e7d0
Update uv.lock
VinciGit00 Jan 4, 2025
701a4c1
chore: fix _make_request not using it
PeriniM Jan 8, 2025
49b8e4b
fix: make timeout optional
PeriniM Jan 8, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.env
# Ignore .DS_Store files anywhere in the repository
.DS_Store
**/.DS_Store
*.csv
12 changes: 8 additions & 4 deletions README.md
@@ -3,17 +3,21 @@
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python SDK](https://img.shields.io/badge/Python_SDK-Latest-blue)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py)
[![JavaScript SDK](https://img.shields.io/badge/JavaScript_SDK-Latest-yellow)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js)
[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://scrapegraphai.com/docs)
[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://docs.scrapegraphai.com)

<p align="left">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official SDKs for the ScrapeGraph AI API - Intelligent web scraping powered by AI. Extract structured data from any webpage with natural language prompts.

The credits can be bougth [here](https://scrapegraphai.com)!
Get your [API key](https://scrapegraphai.com)!

## 🚀 Quick Links

- [Python SDK Documentation](scrapegraph-py/README.md)
- [JavaScript SDK Documentation](scrapegraph-js/README.md)
- [API Documentation](https://scrapegraphai.com/docs)
- [API Documentation](https://docs.scrapegraphai.com)
- [Website](https://scrapegraphai.com)

## 📦 Installation
@@ -69,7 +73,7 @@ Extract information from a local HTML file using AI.
For detailed documentation and examples, visit:
- [Python SDK Guide](scrapegraph-py/README.md)
- [JavaScript SDK Guide](scrapegraph-js/README.md)
- [API Documentation](https://scrapegraphai.com/docs)
- [API Documentation](https://docs.scrapegraphai.com)

## 💬 Support & Feedback

687 changes: 687 additions & 0 deletions cookbook/chat-webpage-simple-rag/scrapegraph_burr_lancedb.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions cookbook/company-info/scrapegraph_langchain.ipynb


1,807 changes: 1,807 additions & 0 deletions cookbook/company-info/scrapegraph_llama_index.ipynb


1 change: 1 addition & 0 deletions cookbook/company-info/scrapegraph_sdk.ipynb


1 change: 1 addition & 0 deletions cookbook/github-trending/scrapegraph_langchain.ipynb


999 changes: 999 additions & 0 deletions cookbook/github-trending/scrapegraph_llama_index.ipynb


1 change: 1 addition & 0 deletions cookbook/github-trending/scrapegraph_sdk.ipynb


1 change: 1 addition & 0 deletions cookbook/homes-forsale/scrapegraph_langchain.ipynb


799 changes: 799 additions & 0 deletions cookbook/homes-forsale/scrapegraph_llama_index.ipynb


1 change: 1 addition & 0 deletions cookbook/homes-forsale/scrapegraph_sdk.ipynb


1,302 changes: 1,302 additions & 0 deletions cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb


1 change: 1 addition & 0 deletions cookbook/wired-news/scrapegraph_langchain.ipynb


1 change: 1 addition & 0 deletions cookbook/wired-news/scrapegraph_langgraph.ipynb


1,438 changes: 1,438 additions & 0 deletions cookbook/wired-news/scrapegraph_llama_index.ipynb


1 change: 1 addition & 0 deletions cookbook/wired-news/scrapegraph_sdk.ipynb


11 changes: 6 additions & 5 deletions scrapegraph-js/README.md
@@ -1,9 +1,10 @@
# 🌐 ScrapeGraph JavaScript SDK

[![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/ScrapeGraphAI/scrapegraph-sdk/actions/workflows/ci.yml/badge.svg)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/actions)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.scrapegraphai.com)
[![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.scrapegraphai.com)

<p align="left">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official JavaScript/TypeScript SDK for the ScrapeGraph AI API - Smart web scraping powered by AI.

@@ -215,7 +216,7 @@ Contributions are welcome! Please feel free to submit a Pull Request. For major
## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://scrapegraphai.com/documentation)
- [Documentation](https://docs.scrapegraphai.com)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

## 💬 Support
9 changes: 9 additions & 0 deletions scrapegraph-js/cookbook/README.md
@@ -0,0 +1,9 @@
## 📚 Official Cookbook

Looking for examples and guides? Then head over to the official ScrapeGraph SDK [Cookbook](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/cookbook)!

The cookbook provides step-by-step instructions, practical examples, and tips to help you get started and make the most out of ScrapeGraph SDK.

You will find some colab notebooks with our partners as well, including Langchain 🦜 and LlamaIndex 🦙

Happy scraping! 🚀
12 changes: 8 additions & 4 deletions scrapegraph-py/README.md
@@ -4,9 +4,13 @@
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://scrapegraph-py.readthedocs.io/en/latest/?badge=latest)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
<p align="left">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official [Python SDK ](https://scrapegraphai.com) for the ScrapeGraph API - Smart web scraping powered by AI.

## 📦 Installation

@@ -142,7 +146,7 @@ asyncio.run(main())

## 📖 Documentation

For detailed documentation, visit [scrapegraphai.com/docs](https://scrapegraphai.com/docs)
For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com)

## 🛠️ Development

@@ -173,7 +177,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://scrapegraphai.com/docs)
- [Documentation](https://docs.scrapegraphai.com)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---
9 changes: 9 additions & 0 deletions scrapegraph-py/cookbook/README.md
@@ -0,0 +1,9 @@
## 📚 Official Cookbook

Looking for examples and guides? Then head over to the official ScrapeGraph SDK [Cookbook](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/cookbook)!

The cookbook provides step-by-step instructions, practical examples, and tips to help you get started and make the most out of ScrapeGraph SDK.

You will find some colab notebooks with our partners as well, such as Langchain 🦜 and LlamaIndex 🦙

Happy scraping! 🚀
2 changes: 1 addition & 1 deletion scrapegraph-py/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "scrapegraph_py"
version = "0.0.3"
version = "1.8.0"
description = "ScrapeGraph Python SDK for API"
authors = [
{ name = "Marco Vinciguerra", email = "[email protected]" },
171 changes: 81 additions & 90 deletions scrapegraph-py/scrapegraph_py/async_client.py
@@ -1,7 +1,6 @@
import asyncio
from typing import Any, Optional

import aiohttp
from aiohttp import ClientSession, ClientTimeout, TCPConnector
from aiohttp.client_exceptions import ClientError
from pydantic import BaseModel
@@ -27,15 +26,15 @@ class AsyncClient:
def from_env(
cls,
verify_ssl: bool = True,
timeout: float = 120,
timeout: Optional[float] = None,
max_retries: int = 3,
retry_delay: float = 1.0,
):
"""Initialize AsyncClient using API key from environment variable.

Args:
verify_ssl: Whether to verify SSL certificates
timeout: Request timeout in seconds
timeout: Request timeout in seconds. None means no timeout (infinite)
max_retries: Maximum number of retry attempts
retry_delay: Delay between retries in seconds
"""
@@ -56,7 +55,7 @@ def __init__(
self,
api_key: str = None,
verify_ssl: bool = True,
timeout: float = 120,
timeout: Optional[float] = None,
max_retries: int = 3,
retry_delay: float = 1.0,
):
@@ -65,7 +64,7 @@ def __init__(
Args:
api_key: API key for authentication. If None, will try to load from environment
verify_ssl: Whether to verify SSL certificates
timeout: Request timeout in seconds
timeout: Request timeout in seconds. None means no timeout (infinite)
max_retries: Maximum number of retry attempts
retry_delay: Delay between retries in seconds
"""
@@ -91,7 +90,7 @@ def __init__(
self.retry_delay = retry_delay

ssl = None if verify_ssl else False
self.timeout = ClientTimeout(total=timeout)
self.timeout = ClientTimeout(total=timeout) if timeout is not None else None

self.session = ClientSession(
headers=self.headers, connector=TCPConnector(ssl=ssl), timeout=self.timeout
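The hunk above is the behavioral core of the timeout change: `timeout=None` now means "no deadline" rather than the old 120-second default. A minimal stdlib-only sketch of that semantics (a hypothetical helper, not the SDK's actual code, since the real client delegates this to aiohttp's `ClientTimeout`):

```python
import asyncio

async def run_with_optional_timeout(coro_factory, timeout=None):
    """Run a coroutine with an optional total timeout.

    timeout=None mirrors the PR's new default: no deadline is applied,
    matching `ClientTimeout(total=timeout) if timeout is not None else None`.
    """
    if timeout is None:
        # No deadline: await the call directly.
        return await coro_factory()
    # Deadline set: raise TimeoutError if the call overruns it.
    return await asyncio.wait_for(coro_factory(), timeout)

async def slow_call():
    await asyncio.sleep(0.05)
    return "done"

print(asyncio.run(run_with_optional_timeout(slow_call)))             # done
print(asyncio.run(run_with_optional_timeout(slow_call, timeout=1)))  # done
```

Note that passing `ClientTimeout(total=None)` to aiohttp would also disable the total timeout; the PR instead passes `timeout=None` to `ClientSession`, leaving aiohttp's own defaults in effect for anything not overridden.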
@@ -137,6 +136,33 @@ async def _make_request(self, method: str, url: str, **kwargs) -> Any:
logger.info(f"⏳ Waiting {retry_delay}s before retry {attempt + 2}")
await asyncio.sleep(retry_delay)

async def markdownify(self, website_url: str):
"""Send a markdownify request"""
logger.info(f"🔍 Starting markdownify request for {website_url}")

request = MarkdownifyRequest(website_url=website_url)
logger.debug("✅ Request validation passed")

result = await self._make_request(
"POST", f"{API_BASE_URL}/markdownify", json=request.model_dump()
)
logger.info("✨ Markdownify request completed successfully")
return result

async def get_markdownify(self, request_id: str):
"""Get the result of a previous markdownify request"""
logger.info(f"🔍 Fetching markdownify result for request {request_id}")

# Validate input using Pydantic model
GetMarkdownifyRequest(request_id=request_id)
logger.debug("✅ Request ID validation passed")

result = await self._make_request(
"GET", f"{API_BASE_URL}/markdownify/{request_id}"
)
logger.info(f"✨ Successfully retrieved result for request {request_id}")
return result

async def smartscraper(
self,
website_url: str,
@@ -154,17 +180,11 @@ async def smartscraper(
)
logger.debug("✅ Request validation passed")

try:
async with self.session.post(
f"{API_BASE_URL}/smartscraper", json=request.model_dump()
) as response:
response.raise_for_status()
result = await handle_async_response(response)
logger.info("✨ Smartscraper request completed successfully")
return result
except aiohttp.ClientError as e:
logger.error(f"❌ Smartscraper request failed: {str(e)}")
raise ConnectionError(f"Failed to connect to API: {str(e)}")
result = await self._make_request(
"POST", f"{API_BASE_URL}/smartscraper", json=request.model_dump()
)
logger.info("✨ Smartscraper request completed successfully")
return result

async def get_smartscraper(self, request_id: str):
"""Get the result of a previous smartscraper request"""
@@ -174,80 +194,8 @@ async def get_smartscraper(self, request_id: str):
GetSmartScraperRequest(request_id=request_id)
logger.debug("✅ Request ID validation passed")

async with self.session.get(
f"{API_BASE_URL}/smartscraper/{request_id}"
) as response:
result = await handle_async_response(response)
logger.info(f"✨ Successfully retrieved result for request {request_id}")
return result

async def get_credits(self):
"""Get credits information"""
logger.info("💳 Fetching credits information")

async with self.session.get(
f"{API_BASE_URL}/credits",
) as response:
result = await handle_async_response(response)
logger.info(
f"✨ Credits info retrieved: {result.get('remaining_credits')} credits remaining"
)
return result

async def submit_feedback(
self, request_id: str, rating: int, feedback_text: Optional[str] = None
):
"""Submit feedback for a request"""
logger.info(f"📝 Submitting feedback for request {request_id}")
logger.debug(f"⭐ Rating: {rating}, Feedback: {feedback_text}")

feedback = FeedbackRequest(
request_id=request_id, rating=rating, feedback_text=feedback_text
)
logger.debug("✅ Feedback validation passed")

async with self.session.post(
f"{API_BASE_URL}/feedback", json=feedback.model_dump()
) as response:
result = await handle_async_response(response)
logger.info("✨ Feedback submitted successfully")
return result

async def close(self):
"""Close the session to free up resources"""
logger.info("🔒 Closing AsyncClient session")
await self.session.close()
logger.debug("✅ Session closed successfully")

async def __aenter__(self):
return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()

async def markdownify(self, website_url: str):
"""Send a markdownify request"""
logger.info(f"🔍 Starting markdownify request for {website_url}")

request = MarkdownifyRequest(website_url=website_url)
logger.debug("✅ Request validation passed")

result = await self._make_request(
"POST", f"{API_BASE_URL}/markdownify", json=request.model_dump()
)
logger.info("✨ Markdownify request completed successfully")
return result

async def get_markdownify(self, request_id: str):
"""Get the result of a previous markdownify request"""
logger.info(f"🔍 Fetching markdownify result for request {request_id}")

# Validate input using Pydantic model
GetMarkdownifyRequest(request_id=request_id)
logger.debug("✅ Request ID validation passed")

result = await self._make_request(
"GET", f"{API_BASE_URL}/markdownify/{request_id}"
"GET", f"{API_BASE_URL}/smartscraper/{request_id}"
)
logger.info(f"✨ Successfully retrieved result for request {request_id}")
return result
@@ -288,3 +236,46 @@ async def get_localscraper(self, request_id: str):
)
logger.info(f"✨ Successfully retrieved result for request {request_id}")
return result

async def submit_feedback(
self, request_id: str, rating: int, feedback_text: Optional[str] = None
):
"""Submit feedback for a request"""
logger.info(f"📝 Submitting feedback for request {request_id}")
logger.debug(f"⭐ Rating: {rating}, Feedback: {feedback_text}")

feedback = FeedbackRequest(
request_id=request_id, rating=rating, feedback_text=feedback_text
)
logger.debug("✅ Feedback validation passed")

result = await self._make_request(
"POST", f"{API_BASE_URL}/feedback", json=feedback.model_dump()
)
logger.info("✨ Feedback submitted successfully")
return result

async def get_credits(self):
"""Get credits information"""
logger.info("💳 Fetching credits information")

result = await self._make_request(
"GET",
f"{API_BASE_URL}/credits",
)
logger.info(
f"✨ Credits info retrieved: {result.get('remaining_credits')} credits remaining"
)
return result

async def close(self):
"""Close the session to free up resources"""
logger.info("🔒 Closing AsyncClient session")
await self.session.close()
logger.debug("✅ Session closed successfully")

async def __aenter__(self):
return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
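The refactor above routes every endpoint (smartscraper, markdownify, localscraper, feedback, credits) through the single `_make_request` helper instead of per-method `session.post`/`session.get` blocks. A stdlib-only sketch of that consolidated retry loop, with `request_fn` as a hypothetical stand-in for one aiohttp call (the real helper also handles response parsing and logging):

```python
import asyncio

async def make_request_with_retries(request_fn, max_retries=3, retry_delay=1.0):
    """Retry a transient-failure-prone async call, as _make_request now
    does once for all endpoints."""
    for attempt in range(max_retries):
        try:
            return await request_fn()
        except ConnectionError:
            if attempt == max_retries - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            # Wait before the next attempt (the SDK uses a fixed delay).
            await asyncio.sleep(retry_delay)

calls = {"n": 0}

async def flaky():
    """Simulated endpoint that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return {"status": "completed"}

print(asyncio.run(make_request_with_retries(flaky, retry_delay=0.01)))
# → {'status': 'completed'} after two transient failures
```

Centralizing this also fixes the inconsistency the PR title points at: before the change, `smartscraper` had its own try/except with no retries while other methods had none at all, so `max_retries`/`retry_delay` were accepted by the constructor but not uniformly honored.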