Commit 95a978d

3.2.0
1 parent 67f56a5 commit 95a978d

37 files changed, with 1601 additions and 576 deletions.

.github/workflows/python.yml

Lines changed: 5 additions & 12 deletions
@@ -13,26 +13,19 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
 matrix:
-python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
+python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

 steps:
-- uses: actions/checkout@v3
+- uses: actions/checkout@v4
 - name: Set up Python ${{ matrix.python-version }}
 uses: actions/setup-python@v4
 with:
 python-version: ${{ matrix.python-version }}
 - name: Install dependencies
 run: |
 python -m pip install --upgrade pip
-pip install flake8 pytest
-if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
-if [ -f test-requirements.txt ]; then pip install -r test-requirements.txt; fi
-- name: Lint with flake8
-run: |
-# stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+pip install -r requirements.txt
+pip install -r test-requirements.txt
 - name: Test with pytest
 run: |
-pytest
+pytest --cov={{packageName}}
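
Review note: the new test command passes `--cov={{packageName}}`, whereas `.gitlab-ci.yml` in the same commit uses `--cov=webscraping_ai`. The `{{packageName}}` token appears to be a generator template placeholder that was not rendered. The sketch below is one possible way to catch such leftovers; the helper name `find_unrendered_placeholders` and the regular expression are illustrative assumptions, not part of the generator or of this repository. It deliberately skips GitHub Actions expressions of the form `${{ ... }}`, which are legitimate in workflow files.

```python
import re

# Matches {{name}} that is NOT preceded by "$". GitHub Actions expressions
# such as ${{ matrix.python-version }} are valid and must not be flagged.
UNRENDERED = re.compile(r"(?<!\$)\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def find_unrendered_placeholders(text):
    """Return (line_number, placeholder_name) pairs for leftover template tokens."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in UNRENDERED.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Two lines taken from the workflow in this diff.
workflow_excerpt = (
    "python-version: ${{ matrix.python-version }}\n"
    "pytest --cov={{packageName}}\n"
)

print(find_unrendered_placeholders(workflow_excerpt))  # [(2, 'packageName')]
```

Only the second line is reported, so a check like this could run over generated files without tripping on the workflow's own expression syntax.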

.gitlab-ci.yml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,6 @@ stages:
1414
- pip install -r test-requirements.txt
1515
- pytest --cov=webscraping_ai
1616

17-
pytest-3.7:
18-
extends: .pytest
19-
image: python:3.7-alpine
2017
pytest-3.8:
2118
extends: .pytest
2219
image: python:3.8-alpine
@@ -29,3 +26,6 @@ pytest-3.10:
2926
pytest-3.11:
3027
extends: .pytest
3128
image: python:3.11-alpine
29+
pytest-3.12:
30+
extends: .pytest
31+
image: python:3.12-alpine

.openapi-generator/VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
7.2.0
1+
7.11.0

.travis.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,13 @@
11
# ref: https://docs.travis-ci.com/user/languages/python
22
language: python
33
python:
4-
- "3.7"
54
- "3.8"
65
- "3.9"
76
- "3.10"
87
- "3.11"
8+
- "3.12"
99
# uncomment the following if needed
10-
#- "3.11-dev" # 3.11 development branch
10+
#- "3.12-dev" # 3.12 development branch
1111
#- "nightly" # nightly build
1212
# command to install dependencies
1313
install:

README.md

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,17 @@
11
# webscraping-ai
2-
WebScraping.AI scraping API provides GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
2+
WebScraping.AI scraping API provides LLM-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
33

44
This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:
55

6-
- API version: 3.1.3
7-
- Package version: 3.1.3
6+
- API version: 3.2.0
7+
- Package version: 3.2.0
8+
- Generator version: 7.11.0
89
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
910
For more information, please visit [https://webscraping.ai](https://webscraping.ai)
1011

1112
## Requirements.
1213

13-
Python 3.7+
14+
Python 3.8+
1415

1516
## Installation & Usage
1617
### pip install
@@ -51,7 +52,6 @@ Please follow the [installation procedure](#installation--usage) and then run th

 ```python

-import time
 import webscraping_ai
 from webscraping_ai.rest import ApiException
 from pprint import pprint
@@ -79,28 +79,27 @@ with webscraping_ai.ApiClient(configuration) as api_client:
 # Create an instance of the API class
 api_instance = webscraping_ai.AIApi(api_client)
 url = 'https://example.com' # str | URL of the target page.
-question = 'What is the summary of this page content?' # str | Question or instructions to ask the LLM model about the target page. (optional)
-context_limit = 4000 # int | Maximum number of tokens to use as context for the LLM model (4000 by default). (optional) (default to 4000)
-response_tokens = 100 # int | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit). (optional) (default to 100)
-on_context_limit = 'error' # str | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long. (optional) (default to 'error')
+fields = {'key': '{\"title\":\"Main product title\",\"price\":\"Current product price\",\"description\":\"Full product description\"}'} # Dict[str, str] | Object describing fields to extract from the page and their descriptions
 headers = {'key': '{\"Cookie\":\"session=some_id\"}'} # Dict[str, str] | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"}). (optional)
 timeout = 10000 # int | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). (optional) (default to 10000)
 js = True # bool | Execute on-page JavaScript using a headless browser (true by default). (optional) (default to True)
 js_timeout = 2000 # int | Maximum JavaScript rendering time in ms. Increase it in case if you see a loading indicator instead of data on the target page. (optional) (default to 2000)
-proxy = 'datacenter' # str | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. (optional) (default to 'datacenter')
-country = 'us' # str | Country of the proxy to use (US by default). Only available on Startup and Custom plans. (optional) (default to 'us')
-device = 'desktop' # str | Type of device emulation. (optional) (default to 'desktop')
+wait_for = 'wait_for_example' # str | CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout. (optional)
+proxy = datacenter # str | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. (optional) (default to datacenter)
+country = us # str | Country of the proxy to use (US by default). (optional) (default to us)
+custom_proxy = 'custom_proxy_example' # str | Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (<a target=\"_blank\" href=\"https://webscraping.ai/proxies/smartproxy\">Smartproxy</a> for example). (optional)
+device = desktop # str | Type of device emulation. (optional) (default to desktop)
 error_on_404 = False # bool | Return error on 404 HTTP status on the target page (false by default). (optional) (default to False)
 error_on_redirect = False # bool | Return error on redirect on the target page (false by default). (optional) (default to False)
-js_script = 'document.querySelector('button').click();' # str | Custom JavaScript code to execute on the target page. (optional)
+js_script = 'document.querySelector(\'button\').click();' # str | Custom JavaScript code to execute on the target page. (optional)

 try:
-# Get an answer to a question about a given web page
-api_response = api_instance.get_question(url, question=question, context_limit=context_limit, response_tokens=response_tokens, on_context_limit=on_context_limit, headers=headers, timeout=timeout, js=js, js_timeout=js_timeout, proxy=proxy, country=country, device=device, error_on_404=error_on_404, error_on_redirect=error_on_redirect, js_script=js_script)
-print("The response of AIApi->get_question:\n")
+# Extract structured data fields from a web page
+api_response = api_instance.get_fields(url, fields, headers=headers, timeout=timeout, js=js, js_timeout=js_timeout, wait_for=wait_for, proxy=proxy, country=country, custom_proxy=custom_proxy, device=device, error_on_404=error_on_404, error_on_redirect=error_on_redirect, js_script=js_script)
+print("The response of AIApi->get_fields:\n")
 pprint(api_response)
 except ApiException as e:
-print("Exception when calling AIApi->get_question: %s\n" % e)
+print("Exception when calling AIApi->get_fields: %s\n" % e)

 ```

@@ -110,6 +109,7 @@ All URIs are relative to *https://api.webscraping.ai*

 Class | Method | HTTP request | Description
 ------------ | ------------- | ------------- | -------------
+*AIApi* | [**get_fields**](docs/AIApi.md#get_fields) | **GET** /ai/fields | Extract structured data fields from a web page
 *AIApi* | [**get_question**](docs/AIApi.md#get_question) | **GET** /ai/question | Get an answer to a question about a given web page
 *AccountApi* | [**account**](docs/AccountApi.md#account) | **GET** /account | Information about your account calls quota
 *HTMLApi* | [**get_html**](docs/HTMLApi.md#get_html) | **GET** /html | Page HTML by URL
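
Review note: in the regenerated README example, `proxy = datacenter`, `country = us` and `device = desktop` lost their quotes, so the snippet would raise `NameError` if copied as-is. The sketch below shows the shape of a `GET /ai/fields` request with those values quoted, using only the standard library and sending nothing over the network. The base URL and endpoint path come from the README. The `api_key` query parameter name, the JSON encoding of `fields` (mirroring what the README says about `headers`), the lowercase boolean encoding, and the helper `build_fields_url` are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.webscraping.ai"  # base URL stated in the README


def build_fields_url(target_url, fields, api_key, **options):
    """Build a GET /ai/fields URL. Nothing is sent over the network."""
    params = {
        "api_key": api_key,            # assumption: key is passed as a query parameter
        "url": target_url,
        "fields": json.dumps(fields),  # assumption: JSON-encoded object, like `headers`
    }
    for name, value in options.items():
        if value is None:
            continue
        # assumption: booleans are sent as lowercase "true"/"false"
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}/ai/fields?{urlencode(params)}"


fields = {
    "title": "Main product title",
    "price": "Current product price",
    "description": "Full product description",
}

request_url = build_fields_url(
    "https://example.com",
    fields,
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    proxy="datacenter",      # string values need quotes in Python
    country="us",
    device="desktop",
    js=True,
    timeout=10000,
)
print(request_url)

# Round-trip check: the encoded query decodes back to the same field map.
decoded = parse_qs(urlsplit(request_url).query)
print(json.loads(decoded["fields"][0]) == fields)
```

With the generated client, the equivalent fix is simply to quote the three values before calling `api_instance.get_fields(url, fields, ...)`.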
