Refactoring Folders Structure #2

Merged: 3 commits merged on Jan 30, 2024

5 changes: 4 additions & 1 deletion .gitignore
@@ -1,5 +1,8 @@
.DS_Store
.DS_Store?
._*
__pycache__/
**/__pycache__/
docs/build/
docs/source/_templates/
.env
venv/
65 changes: 65 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,65 @@
# Contributing to AmazScraper

Thank you for your interest in contributing to **AmazScraper**! We welcome contributions from the community to help improve and grow the project. This document outlines the guidelines and steps for contributing.

## Table of Contents
- [Getting Started](#getting-started)
- [Contributing Guidelines](#contributing-guidelines)
- [Code Style](#code-style)
- [Submitting a Pull Request](#submitting-a-pull-request)
- [Reporting Issues](#reporting-issues)
- [License](#license)

## Getting Started
To get started with contributing, follow these steps:

1. Fork the repository on GitHub.
2. Clone your forked repository to your local machine.
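3. Install the dependencies listed in `requirements.txt` (for example, with `pip install -r requirements.txt`).
4. Make your changes or additions.
5. Test your changes thoroughly; the test suite in `tests/` runs with `pytest`.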
6. Commit your changes with descriptive commit messages.
7. Push your changes to your forked repository.
8. Submit a pull request to the main repository.

## Contributing Guidelines
Please adhere to the following guidelines when contributing to AmazScraper:

- Follow the code style and formatting guidelines specified in the [Code Style](#code-style) section.
- Make sure your changes are well-documented and include any necessary updates to the project's documentation.
- Write clear and concise commit messages that describe the purpose of your changes.
- Be respectful and considerate towards other contributors and maintainers.

## Code Style
Please format your code according to the style guides below for the language you are working in before submitting a pull request.

### Python
- [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
- [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/style/)

### Arduino
- [Arduino Style Guide for Writing Content](https://docs.arduino.cc/learn/contributions/arduino-writing-style-guide)
- [Arduino Style Guide for Creating Libraries](https://docs.arduino.cc/learn/contributions/arduino-library-style-guide)

### C++
- [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html)

## Submitting a Pull Request
To submit your changes for review, please follow these steps:

1. Ensure that your changes are pushed to your forked repository.
2. Go to the main repository on GitHub and navigate to the "Pull Requests" tab.
3. Click on the "New Pull Request" button.
4. Select your forked repository and the branch containing your changes.
5. Provide a descriptive title and detailed description for your pull request.
6. Reviewers will provide feedback and discuss any necessary changes.
7. Once your pull request is approved, it will be merged into the main repository.

## Reporting Issues
If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository. Provide a clear and detailed description of the problem or suggestion, along with any relevant information or steps to reproduce the issue.

## License
AmazScraper is licensed under the **Apache License 2.0**. See the [LICENSE](LICENSE) file for more information.
By contributing to this project, you agree to license your contributions under the same license.

Can't wait to see your contributions! :smile:
47 changes: 31 additions & 16 deletions README.md
@@ -16,36 +16,51 @@ Follow the following steps:
 1. ```bash
 git clone https://github.com/VinciGit00/AmazScraper.git
 ```
-2. ```bash
+2. (Optional)
+```bash
+python -m venv venv
+source ./venv/bin/activate
+```
+3. ```bash
 pip install -r requirements.txt
 ```
-3. Go to [https://openai.com](https://openai.com/) and login
-4. Now you can access to [https://platform.openai.com/docs/overview](https://platform.openai.com/docs/overview)
-5. Create a new API key and copy it
-![Screenshot 2024-01-26 alle 17.10.10.png](docs/generate_api_key/step_1.png)
+4. Go to [https://openai.com](https://openai.com/) and log in
+5. You can now access [https://platform.openai.com/docs/overview](https://platform.openai.com/docs/overview)
+6. Create a new API key and copy it
+
+<img src="docs/generate_api_key/step_1.png" alt="Step 1 Screenshot" width="60%"/>
+
+<img src="docs/generate_api_key/step_2.png" alt="Step 2 Screenshot" width="60%"/>
+
+<img src="docs/generate_api_key/step_3.png" alt="Step 3 Screenshot" width="60%"/>
+
+<img src="docs/generate_api_key/step_4.png" alt="Step 4 Screenshot" width="60%"/>
+
-![Screenshot 2024-01-26 alle 17.10.31.png](docs/generate_api_key/step_2.png)
-
-![Screenshot 2024-01-26 alle 17.10.52.png](docs/generate_api_key/step_3.png)
-
-![Screenshot 2024-01-26 alle 17.11.10.png](docs/generate_api_key/step_4.png)

-6. Open the .env file inside main and paste the API key
+7. Create a .env file in the main directory and paste the API key

 ```config
 API_KEY="your openai.com api key"
 ```

-7. You are ready to go! 🚀
-
+8. You are ready to go! 🚀
+9. Try running the examples using:
+```bash
+python -m examples.html_scraping
+```
+or
+```bash
+python -m AmazScraper.examples.html_scraping
+```

 # Practical use

 ## Using AmazScraper as a library

 ```python
-from AmazScraper.class_generator import Generator
+from AmazScraper.classes.class_generator import Generator

-from AmazScraper.getter import get_function, scraper
+from AmazScraper.utils.getter import get_function, scraper

 values = [
 {
@@ -66,7 +81,7 @@ if __name__ == "__main__":

 ```python
 import sys
-from AmazScraper.class_generator import Generator
+from AmazScraper.classes.class_generator import Generator

 values = [
 {
Empty file added __init__.py
Empty file added classes/__init__.py
4 changes: 2 additions & 2 deletions class_generator.py → classes/class_generator.py
@@ -1,7 +1,7 @@
 import os
 from dotenv import load_dotenv
-from AmazScraper.pydantic_class import _Response
-from AmazScraper.class_creator import create_class
+from classes.pydantic_class import _Response
+from utils.class_creator import create_class
 from langchain_openai import ChatOpenAI
 from langchain.prompts import PromptTemplate
 from langchain_core.pydantic_v1 import Field
2 changes: 1 addition & 1 deletion pydantic_class.py → classes/pydantic_class.py
@@ -2,4 +2,4 @@
 from langchain_core.pydantic_v1 import BaseModel, Field

 class _Response(BaseModel):
-    title: str = Field(description='Title of the news')
+    title: str = Field(description='Title of the items')
54 changes: 54 additions & 0 deletions examples/html_scraping.py
@@ -0,0 +1,54 @@
import sys
from classes.class_generator import Generator

values = [
    {
        "title": "title",
        "type": "str",
        "description": "Title of the news"
    }
]

# Example using an HTML snippet
query_info = '''
Given this code extract all the information in a json format about the news.
<article class="c-card__wrapper aem_card_check_wrapper" data-cardindex="0">
<div class="c-card__content">
<h2 class="c-card__title">Booker show with 52 points, whoever has the most games over 50</h2>
<div class="c-card__label-wrapper c-label-wrapper">
<span class="c-label c-label--article-heading">Standings</span>
</div>
<p class="c-card__abstract">The Suns' No. 1 dominated the match won in New Orleans, scoring 52 points. It's about...</p>
<div class="c-card__info">
<time class="c-card__date" datetime="20 gen - 07:54">20 gen - 07:54</time>
<span class="c-card__content-data">
<i class="icon icon--media-outline icon--gallery-outline icon--xxsmall icon--c-neutral">
<svg width="80" height="80" viewBox="0 0 80 80" xmlns="http://www.w3.org/2000/svg" class="icon__svg icon__svg--gallery-outline">
<path d="M26.174 32.174v31.975h44.588V32.174H26.174zm-3.08-9.238h50.747A6.159 6.159 0 0 1 80 29.095v38.134a6.159 6.159 0 0 1-6.159 6.158H23.095a6.159 6.159 0 0 1-6.159-6.158V29.095a6.159 6.159 0 0 1 6.159-6.159zM9.239 55.665a4.619 4.619 0 0 1-9.238 0V16.777C0 10.825 4.825 6 10.777 6H64.08a4.619 4.619 0 1 1 0 9.238H10.777c-.85 0-1.54.69-1.54 1.54v38.887z" fill="currentColor" fill-rule="evenodd"></path>
</svg>
</i>
28 foto
</span>
</div>
</div>
<div class="c-card__img-wrapper">
<figure class="o-aspect-ratio o-aspect-ratio--16-10 ">
<img crossorigin="anonymous" class="c-card__img j-lazyload" alt="Partite con 50+ punti: Booker in Top-20" data-srcset="..." sizes="..." loading="lazy" data-src="...">
<noscript>
<img crossorigin="anonymous" class="c-card__img" alt="Partite con 50+ punti: Booker in Top-20" srcset="..." sizes="..." src="...">
</noscript>
</figure>
<i class="icon icon--media icon--gallery icon--medium icon--c-primary">
<svg width="80" height="80" viewBox="0 0 80 80" xmlns="http://www.w3.org/2000/svg" class="icon__svg icon__svg--gallery">
<path d="M17.005 20.221h60.211c1.538 0 2.784 1.28 2.784 2.858v48.317c0 1.578-1.246 2.858-2.784 2.858H17.005c-1.537 0-2.784-1.28-2.784-2.858V23.079c0-1.578 1.247-2.858 2.784-2.858zM5.873 11.873V60.62a2.937 2.937 0 0 1-5.873 0V11.286A5.286 5.286 0 0 1 5.286 6h61.08a2.937 2.937 0 1 1 0 5.873H5.873z"></path>
</svg>
</i>
</div>
</article>
'''

if __name__ == "__main__":

    generator_instance = Generator(values, 0, "gpt-3.5-turbo")

    generator_instance.invocation(query_info)
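For reference, the record this prompt should yield for the single-field `values` schema above can be sketched deterministically with the standard library's `html.parser`. This is a swapped-in, non-LLM technique shown only to make the expected output shape concrete; it is not how `Generator` works, and it assumes the headline sits in the `<h2 class="c-card__title">` element of the snippet. `CardTitleParser` and `extract_titles` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class CardTitleParser(HTMLParser):
    """Collect the text of every <h2 class="c-card__title"> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "c-card__title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text while inside a card title.
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html: str) -> str:
    """Return the card titles as JSON records shaped like the `values` schema."""
    parser = CardTitleParser()
    parser.feed(html)
    return json.dumps([{"title": t.strip()} for t in parser.titles])


snippet = '<article><h2 class="c-card__title">Booker show with 52 points, whoever has the most games over 50</h2></article>'
print(extract_titles(snippet))
```

A hand-written parser like this breaks as soon as the page markup changes, which is the brittleness the LLM-based `Generator` is meant to avoid; it is mainly useful as a fixed reference when checking the model's output.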
17 changes: 17 additions & 0 deletions examples/values_scraping.py
@@ -0,0 +1,17 @@
from classes.class_generator import Generator

from utils.getter import get_function, scraper

values = [
    {
        "title": "title",
        "type": "str",
        "description": "Title of the items"
    }
]

if __name__ == "__main__":

    generator_instance = Generator(values, 0, "gpt-3.5-turbo")

    res = generator_instance.invocation(scraper("https://www.mockupworld.co", 4197))
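Both example scripts describe the fields to extract as a `values` list of dicts with `title`, `type` and `description` keys, which `utils/class_creator.py` turns into lines of a generated Pydantic class. The following is a minimal sketch of a pre-flight check for that shape. `check_values` is a hypothetical helper, not part of AmazScraper, and the rules it enforces are inferred from how `create_class` interpolates each entry into source code.

```python
import keyword


def check_values(values: list[dict]) -> list[str]:
    """Return the problems found in a `values` spec; an empty list means OK.

    Hypothetical helper: mirrors the keys used by the example scripts.
    """
    problems = []
    for i, elem in enumerate(values):
        missing = {"title", "type", "description"} - set(elem)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        title = elem["title"]
        # The title becomes a class attribute name, so it must be a valid identifier.
        if not title.isidentifier() or keyword.iskeyword(title):
            problems.append(f"entry {i}: {title!r} is not a valid field name")
        # create_class wraps the description in single quotes, so a literal '
        # would terminate the string early in the generated file.
        if "'" in elem["description"]:
            problems.append(f"entry {i}: description contains a single quote")
    return problems


values = [{"title": "title", "type": "str", "description": "Title of the items"}]
print(check_values(values))
```

Running such a check before constructing `Generator` would surface a malformed spec as a readable message rather than as a syntax error raised while importing the regenerated `classes/pydantic_class.py`.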
20 changes: 20 additions & 0 deletions readthedocs.yaml
@@ -0,0 +1,20 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.9"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Specify the requirements file
python:
  install:
    - requirements: requirements.txt
1 change: 1 addition & 0 deletions requirements.txt
@@ -4,3 +4,4 @@ langchain_core==0.1.16
 langchain_openai==0.0.5
 python-dotenv==1.0.1
 Requests==2.31.0
+pytest==8.0.0
6 changes: 3 additions & 3 deletions tests/test_amaz_scraper.py
@@ -1,6 +1,6 @@
 import pytest
-from AmazScraper.pydantic_class import Response
-from AmazScraper.class_creator import create_class
+from classes.pydantic_class import _Response
+from utils.class_creator import create_class
 from langchain_openai import ChatOpenAI

 @pytest.fixture
@@ -19,5 +19,5 @@ def test_generator_invocation(generator):
 def test_response_model():
     # Test the Response Pydantic model
     response_data = {"title_swebsite": "Test Title"}
-    response = Response(**response_data)
+    response = _Response(**response_data)
     assert response.title_swebsite == "Test Title"
Empty file added utils/__init__.py
2 changes: 1 addition & 1 deletion class_creator.py → utils/class_creator.py
@@ -10,5 +10,5 @@ def create_class(data_dict: dict):
 global base_script
 base_script = base_script + f" {elem['title']}: {elem['type']} = Field(description='{elem['description']}')\n"

-with open("AmazScraper/pydantic_class.py", "w") as f:
+with open("classes/pydantic_class.py", "w") as f:
 f.write(base_script)
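`create_class` builds the source of `classes/pydantic_class.py` by appending one `Field` line per `values` entry and then writing the file. Below is a minimal in-memory sketch of that generation step. `build_class_source` is a hypothetical name, and the header string is an assumption reconstructed from the lines of `classes/pydantic_class.py` visible in this diff (line 1 of that file is not shown). Returning the text instead of writing it to disk keeps the step testable without touching the package.

```python
def build_class_source(data_dict: list[dict]) -> str:
    """Render a Pydantic model definition for a `values` spec.

    Hypothetical sketch of the string-building step in utils/class_creator.py;
    the header is assumed from the visible part of classes/pydantic_class.py.
    """
    source = (
        "from langchain_core.pydantic_v1 import BaseModel, Field\n"
        "\n"
        "class _Response(BaseModel):\n"
    )
    for elem in data_dict:
        # One annotated field per entry; !r emits a correctly quoted string literal.
        source += f"    {elem['title']}: {elem['type']} = Field(description={elem['description']!r})\n"
    return source


values = [{"title": "title", "type": "str", "description": "Title of the items"}]
print(build_class_source(values))
```

Using `!r` for the description deviates deliberately from the original, which wraps it in literal single quotes. For ordinary descriptions the generated line is identical, and a description containing an apostrophe still yields valid Python.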
File renamed without changes.