Commit d56253d

feat(browser_base_fetch): add async_mode to support both synchronous and asynchronous execution
- Introduced an `async_mode` flag to allow users to choose between synchronous and asynchronous fetching using Browserbase.
- Refactored common logic (Browserbase initialization and result list) to avoid redundancy.
- Added internal async handling with `asyncio.to_thread()` for non-blocking execution in `async_mode`.
- Maintained backward compatibility for existing synchronous functionality.
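The flag handling described in the commit message can be tried without a Browserbase account. The block below is a minimal sketch under stated assumptions, not the project's code: `fake_load` is a hypothetical stand-in for `browserbase.load`, and `fetch_all` is a hypothetical name that mirrors the two execution paths.

```python
import asyncio
from typing import List


def fake_load(url: str, text_content: bool = True) -> str:
    """Hypothetical stand-in for browserbase.load; performs no network I/O."""
    kind = "text" if text_content else "html"
    return f"{kind}:{url}"


def fetch_all(links: List[str], text_content: bool = True, async_mode: bool = False) -> List[str]:
    """Return one result per link, using the sync or the async path."""
    if not async_mode:
        # Synchronous path: call the blocking loader directly.
        return [fake_load(url, text_content=text_content) for url in links]

    async def _run() -> List[str]:
        results = []
        for url in links:
            # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread.
            results.append(await asyncio.to_thread(fake_load, url, text_content=text_content))
        return results

    # asyncio.run starts a fresh event loop, so this path cannot be used
    # from code that is already running inside an event loop.
    return asyncio.run(_run())


sync_result = fetch_all(["https://example.com"], async_mode=False)
async_result = fetch_all(["https://example.com"], async_mode=True)
```

Both calls return the same list, which is the backward-compatibility property the message claims: the flag changes how the work is executed, not what is returned.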
1 parent a540139 commit d56253d

1 file changed: +21 -2 lines changed

scrapegraphai/docloaders/browser_base.py

Lines changed: 21 additions & 2 deletions
@@ -13,6 +13,8 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
     - `api_key`: The API key provided by BrowserBase.
     - `project_id`: The ID of the project on BrowserBase where you want to fetch data from.
     - `link`: The URL or link that you want to fetch data from.
+    - `text_content`: A boolean flag to specify whether to return only the text content (True) or the full HTML (False).
+    - `async_mode`: A boolean flag that determines whether the function runs asynchronously (True) or synchronously (False, default).
 
     It initializes a Browserbase object with the given API key and project ID,
     then uses this object to load the specified link.
@@ -35,6 +37,8 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
         api_key (str): The API key provided by BrowserBase.
         project_id (str): The ID of the project on BrowserBase where you want to fetch data from.
         link (str): The URL or link that you want to fetch data from.
+        text_content (bool): Whether to return only the text content (True) or the full HTML (False). Defaults to True.
+        async_mode (bool): Whether to run the function asynchronously (True) or synchronously (False). Defaults to False.
 
     Returns:
         object: The result of the loading operation.
@@ -49,7 +53,22 @@ def browser_base_fetch(api_key: str, project_id: str, link: List[str]) -> List[s
     browserbase = Browserbase(api_key=api_key, project_id=project_id)
 
     result = []
-    for l in link:
-        result.append(browserbase.load(l, text_content=True))
+    # Define the async fetch logic for individual links
+    async def _async_fetch_link(l):
+        return await asyncio.to_thread(browserbase.load, l, text_content=text_content)
+
+    if async_mode:
+        # Asynchronously process each link
+        async def _async_browser_base_fetch():
+            for l in link:
+                result.append(await _async_fetch_link(l))
+            return result
+
+        # Run the async fetch function
+        result = asyncio.run(_async_browser_base_fetch())
+    else:
+        # Synchronous logic
+        for l in link:
+            result.append(browserbase.load(l, text_content=text_content))
 
     return result
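
In the async branch of this diff, each link is awaited before the next one starts, so the loads still run one at a time. In addition, `asyncio.run` raises `RuntimeError` when the caller is already inside a running event loop. If overlapping the loads is the goal, one possible variant is to schedule every `asyncio.to_thread` call up front and collect them with `asyncio.gather`. The block below is a sketch of that alternative and is not part of the commit; `slow_load` and `fetch_concurrently` are hypothetical names, and the sleep only simulates a blocking request.

```python
import asyncio
import time
from typing import List


def slow_load(url: str, text_content: bool = True) -> str:
    """Hypothetical blocking loader; the sleep simulates network latency."""
    time.sleep(0.2)
    return f"loaded:{url}"


async def fetch_concurrently(links: List[str], text_content: bool = True) -> List[str]:
    """Run every blocking load in a worker thread and wait for all of them."""
    pending = [asyncio.to_thread(slow_load, url, text_content=text_content) for url in links]
    # gather returns the results in the same order as the awaitables passed in.
    return await asyncio.gather(*pending)


links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(fetch_concurrently(links))
```

Because `fetch_concurrently` is a coroutine, a caller that already has an event loop can `await` it directly instead of going through `asyncio.run`.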
