Skip to content

allignement #646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Sep 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 13 additions & 11 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,30 @@
## [1.17.0-beta.11](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.10...v1.17.0-beta.11) (2024-09-07)
## [1.18.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.18.0...v1.18.1) (2024-09-08)


### Features
### Bug Fixes

* add scrape_do_integration ([94e69a0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/94e69a051591aeec1e7268bf0d5e0338f90e9539))
* fetch_node improved ([167f970](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/167f97040f081867cecff542c3af8aa122499ce8))
* **browser_base_fetch:** correct function signature and async_mode handling ([007ff08](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/007ff084c68d419fac040d9b5cca3980458cfabc))

## [1.17.0-beta.10](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.9...v1.17.0-beta.10) (2024-09-07)
## [1.18.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0...v1.18.0) (2024-09-08)


### Bug Fixes

* screenshot_scraper ([ef7a589](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ef7a5891dcb1b4ed8a97947f5563fa78af917ecb))
### Features

* **browser_base_fetch:** add async_mode to support both synchronous and asynchronous execution ([d56253d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d56253d183969584cacc0cb164daa0152462f21c))

## [1.17.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.16.0...v1.17.0) (2024-09-08)

## [1.17.0-beta.9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.8...v1.17.0-beta.9) (2024-09-06)


### Features

* ConcatNode.py added for heavy merge operations ([bd4b26d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bd4b26d7d7c1a7953d1bc9d78b436007880028c9))
* **docloaders:** Enhance browser_base_fetch function flexibility ([57fd01f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/57fd01f9a76ea8ea69ec04b7238ab58ca72ac8f4))

## [1.17.0-beta.8](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.17.0-beta.7...v1.17.0-beta.8) (2024-09-06)

### Docs

### Features
* **sponsor:** 🅱️ Browserbase sponsor 🅱️ ([a540139](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a5401394cc939d9a5fc58b8a9145141c2f047bab))

* **AbstractGraph:** add adjustable rate limit ([2859fb7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/2859fb72d699f26b617ed2f949cdcfca1671c5c8))

Expand Down Expand Up @@ -98,6 +99,7 @@
* **release:** 1.16.0-beta.3 [skip ci] ([886c987](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/886c987172bb57fb59863e4d7b494797bba16980))
* **release:** 1.16.0-beta.4 [skip ci] ([ba5c7ad](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ba5c7adcea138d993005377f4cfe438795e1b124))


## [1.16.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.15.2...v1.16.0) (2024-09-01)


Expand Down
34 changes: 21 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,27 +32,32 @@ playwright install

**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

By the way if you to use not mandatory modules it is necessary to install by yourself with the following command:
<details>
<summary><b>Optional Dependencies</b></summary>
Additional dependecies can be added while installing the library:

- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.

### Installing "Other Language Models"

This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
```bash
pip install scrapegraphai[other-language-models]

```
### Installing "More Semantic Options"
- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.

```bash
pip install scrapegraphai[more-semantic-options]
```

- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.

```bash
pip install scrapegraphai[more-browser-options]
```

</details>

This group includes tools for advanced semantic processing, such as Graphviz.
```bash
pip install scrapegraphai[more-semantic-options]
```
### Installing "More Browser Options"

This group includes additional browser management options, such as BrowserBase.
```bash
pip install scrapegraphai[more-browser-options]
```

### Installing "More Browser Options"

Expand Down Expand Up @@ -135,6 +140,9 @@ Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).

## 🏆 Sponsors
<div style="text-align: center;">
<a href="https://2ly.link/1zaXG">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/browserbase_logo.png" alt="Browserbase" style="width: 10%;">
</a>
<a href="https://2ly.link/1zNiz">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
</a>
Expand Down
Binary file added docs/assets/browserbase_logo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
5 changes: 5 additions & 0 deletions docs/source/introduction/overview.rst
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,11 @@ FAQ
Sponsors
========

.. image:: ../../assets/browserbase_logo.png
:width: 10%
:alt: Browserbase
:target: https://www.browserbase.com/

.. image:: ../../assets/serp_api_logo.png
:width: 10%
:alt: Serp API
Expand Down
22 changes: 19 additions & 3 deletions scrapegraphai/docloaders/browser_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
"""
from typing import List

def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[str]:
def browser_base_fetch(api_key: str, project_id: str, link: List[str], text_content: bool = True, async_mode: bool = False) -> List[str]:
"""
BrowserBase Fetch

Expand All @@ -13,6 +13,8 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
- `api_key`: The API key provided by BrowserBase.
- `project_id`: The ID of the project on BrowserBase where you want to fetch data from.
- `link`: The URL or link that you want to fetch data from.
- `text_content`: A boolean flag to specify whether to return only the text content (True) or the full HTML (False).
- `async_mode`: A boolean flag that determines whether the function runs asynchronously (True) or synchronously (False, default).

It initializes a Browserbase object with the given API key and project ID,
then uses this object to load the specified link.
Expand All @@ -35,6 +37,8 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
api_key (str): The API key provided by BrowserBase.
project_id (str): The ID of the project on BrowserBase where you want to fetch data from.
link (str): The URL or link that you want to fetch data from.
text_content (bool): Whether to return only the text content (True) or the full HTML (False). Defaults to True.
async_mode (bool): Whether to run the function asynchronously (True) or synchronously (False). Defaults to False.

Returns:
object: The result of the loading operation.
Expand All @@ -49,7 +53,19 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
browserbase = Browserbase(api_key=api_key, project_id=project_id)

result = []
for l in link:
result.append(browserbase.load(l, text_content=True))
async def _async_fetch_link(l):
return await asyncio.to_thread(browserbase.load, l, text_content=text_content)

if async_mode:
async def _async_browser_base_fetch():
for l in link:
result.append(await _async_fetch_link(l))
return result

result = asyncio.run(_async_browser_base_fetch())
else:
for l in link:
result.append(browserbase.load(l, text_content=text_content))


return result
Loading