Added Markdownify and Localscraper #14


Merged
merged 3 commits into from Dec 5, 2024
Changes from all commits
7 changes: 7 additions & 0 deletions scrapegraph-py/CHANGELOG.md
@@ -1,3 +1,10 @@
## [1.7.0-beta.1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.6.0...v1.7.0-beta.1) (2024-12-05)


### Features

* add markdownify and localscraper ([6296510](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/6296510b22ce511adde4265532ac6329a05967e0))

## [1.6.0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.5.0...v1.6.0) (2024-12-05)


31 changes: 28 additions & 3 deletions scrapegraph-py/CONTRIBUTING.md
@@ -13,11 +13,36 @@ Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome con

## Getting Started

### Development Setup

1. Fork the repository on GitHub **(FROM pre/beta branch)**.
2. Clone your forked repository:
```bash
git clone https://github.com/ScrapeGraphAI/scrapegraph-sdk.git
cd scrapegraph-sdk/scrapegraph-py
```

3. Install dependencies using uv (recommended):
```bash
# Install uv if you haven't already
pip install uv

# Install dependencies
uv sync

# Install pre-commit hooks
uv run pre-commit install
```

4. Run tests:
```bash
# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_client.py
```

5. Make your changes or additions.
6. Test your changes thoroughly.
7. Commit your changes with descriptive commit messages.
203 changes: 107 additions & 96 deletions scrapegraph-py/README.md
@@ -6,164 +6,175 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://scrapegraph-py.readthedocs.io/en/latest/?badge=latest)

Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.

## 📦 Installation

```bash
pip install scrapegraph-py
```

## 🚀 Features

- 🤖 AI-powered web scraping
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication

## 🎯 Quick Start

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```
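The feature list above mentions automatic retries. As a rough sketch of that idea only (the SDK's real retry logic lives inside the client; `with_retries`, its parameters, and `flaky` are invented for this illustration), a transient failure can be retried with exponential backoff:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Retry fn with exponential backoff. Illustrative only: this
    helper and its parameters are invented for the sketch, not the
    SDK's actual internals."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds -- a stand-in for a transient network error
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky))  # → ok
```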

> [!NOTE]
> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
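For reference, a minimal stdlib-only sketch of the environment-variable pattern the note describes (no SDK calls here; setting the variable from code is for illustration only -- normally it comes from your shell or a `.env` file loaded with python-dotenv's `load_dotenv()`):

```python
import os

# Illustration only: in practice SGAI_API_KEY is set outside the program.
os.environ.setdefault("SGAI_API_KEY", "your-api-key-here")

# With SGAI_API_KEY set, `Client()` needs no api_key argument.
api_key = os.environ["SGAI_API_KEY"]
print(api_key)
```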

## 📚 Available Endpoints

### 🔍 SmartScraper

Scrapes any webpage using AI to extract specific information.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Basic usage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

print(response)
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)

print(response["result"])
```

</details>
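To make the schema's role concrete, here is a minimal sketch of how a Pydantic model types a plain dict. The `raw` payload below is invented for illustration and stands in for the dictionary the API returns:

```python
from pydantic import BaseModel, Field

class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

# Illustrative payload standing in for the API's result dict
raw = {"title": "Example Domain", "description": "An illustrative description"}

# Validation raises if a field is missing or has the wrong type
data = WebsiteData(**raw)
print(data.title)  # → Example Domain
```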

### 📝 Markdownify

Converts any webpage into clean, formatted markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```
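To give a feel for the kind of output an HTML-to-markdown conversion produces, here is a deliberately tiny, stdlib-only sketch. It handles only `<h1>` and plain text, and the class name is invented; it is in no way the service's actual converter:

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Toy HTML-to-markdown converter, for illustration only."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._prefix = "# "

    def handle_endtag(self, tag):
        self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(self._prefix + text)

conv = TinyMarkdown()
conv.feed("<h1>Example Domain</h1><p>Some intro text.</p>")
print("\n\n".join(conv.out))
```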

### 💻 LocalScraper

Extracts information from HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: [email protected]</p>
        </div>
    </body>
</html>
"""

response = client.localscraper(
    user_prompt="Extract the company description",
    website_html=html_content
)

print(response)
```
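For contrast with the AI-driven extraction above, here is a stdlib-only sketch that pulls the paragraph text out of similar HTML deterministically. The class name and sample email are invented for this illustration:

```python
from html.parser import HTMLParser

class TextGrabber(HTMLParser):
    """Collect text inside <p> tags -- a crude stand-in for the
    AI-driven extraction LocalScraper performs."""
    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and data.strip():
            self.paragraphs.append(data.strip())

html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: contact@example.com</p>
        </div>
    </body>
</html>
"""

grabber = TextGrabber()
grabber.feed(html_content)
print(grabber.paragraphs[0])  # → We are a technology company focused on AI solutions.
```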

## ⚡ Async Support

All endpoints support async operations:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```

## 📖 Documentation

For detailed documentation, visit [scrapegraphai.com/docs](https://scrapegraphai.com/docs)

## 🛠️ Development

For information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).

## 💬 Support & Feedback

- 📧 Email: [email protected]
- 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)
- 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)
- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://scrapegraphai.com/docs)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---

37 changes: 37 additions & 0 deletions scrapegraph-py/examples/async_markdownify_example.py
@@ -0,0 +1,37 @@
import asyncio

from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")


async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key-here")

    # Concurrent markdownify requests
    urls = [
        "https://scrapegraphai.com/",
        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
    ]

    tasks = [sgai_client.markdownify(website_url=url) for url in urls]

    # Execute requests concurrently
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Process results
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            print(f"\nError for {urls[i]}: {response}")
        else:
            print(f"\nPage {i+1} Markdown:")
            print(f"URL: {urls[i]}")
            print(f"Result: {response['result']}")

    await sgai_client.close()


if __name__ == "__main__":
    asyncio.run(main())
31 changes: 31 additions & 0 deletions scrapegraph-py/examples/localscraper_example.py
@@ -0,0 +1,31 @@
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="your-api-key-here")

# Example HTML content
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: [email protected]</p>
            <p>Phone: (555) 123-4567</p>
        </div>
    </body>
</html>
"""

# LocalScraper request
response = sgai_client.localscraper(
    user_prompt="Extract the company description and contact information",
    website_html=html_content,
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")