Screenshot scraper integration #558

VinciGit00 · 2024-08-18T18:54:56Z

No description provided.

DiTo97 · 2024-08-18T22:47:53Z

scrapegraphai/nodes/fetch_screen_node.py

+            browser.close()
+
+        for screenshot_data in screenshot_data_list:
+            screenshots.append(screenshot_data)


why having duplicate data structures screenshots and screenshot_data_list?

keep only one of the two, and drop the other

DiTo97 · 2024-08-18T22:51:27Z

scrapegraphai/nodes/fetch_screen_node.py

+
+            capture_screenshot(0, screenshot_counter)
+            screenshot_counter += 1
+            capture_screenshot(viewport_height, screenshot_counter)


infinite-scrolling web pages (with dynamic JS rendering) may have undefined behavior with the viewport height, as that is bound to change as well as you scroll through the page in the browser.

looks fine for now, but I would maybe add a disclaimer in the docs saying screenshot capturing might not work as well for those; collecting only two screenshots is fine as well, but that number might better be a function of the viewport height, so that we don't miss parts of the page content in case a page is very very long (although not infinite-scrolling)

DiTo97 · 2024-08-18T22:54:12Z

scrapegraphai/nodes/generate_answer_from_image_node.py

+
+        api_key = self.node_config.get("config", {}).get("llm", {}).get("api_key", "")
+
+        supported_models = ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]


all open VLMs would work fine as well if run with ollama or similar, same goes for gemini-pro

why such a strict set of models?

P.S. - turn it into a set instead of a list

DiTo97 · 2024-08-18T22:56:33Z

scrapegraphai/nodes/generate_answer_from_image_node.py

+            }
+
+            payload = {
+                "model": "gpt-4o-mini",


why hardcoding the mini version if a support model was passed by the user?

DiTo97 · 2024-08-18T22:56:59Z

scrapegraphai/nodes/generate_answer_from_image_node.py

+                             is not supported. Supported models are: 
+                             {', '.join(supported_models)}.""")
+
+        for image_data in images:


better off using the async API and processing the images in parallel so not to waste time for nothing

DiTo97 · 2024-08-18T22:59:49Z

scrapegraphai/graphs/screenshot_scraper_graph.py

+        """
+        fetch_screen_node = FetchScreenNode(
+            input="url",
+            output=["imgs"],


why putting the key here if the node doesn't use it and hardcodes it as screenshots?

DiTo97 · 2024-08-18T23:00:12Z

scrapegraphai/graphs/screenshot_scraper_graph.py

+        )
+        generate_answer_from_image_node = GenerateAnswerFromImageNode(
+            input="doc",
+            output=["parsed_doc"],


same goes here, neither the input nor output arguments are really used by the node

DiTo97 · 2024-08-19T11:03:07Z

scrapegraphai/nodes/generate_answer_from_image_node.py

-            }
-
-            return state
+    def execute(self, state: dict) -> dict:


wrap the function with a safer guard on the running event loop, e.g.:

def execute(self, state: dict) -> dict: """ Wrapper to run the asynchronous execute_async function in a synchronous context. """ try: eventloop = asyncio.get_event_loop() except RuntimeError: eventloop = None if eventloop and eventloop.is_running(): state = eventloop.run_until_complete(self.execute_async(state)) else: state = asyncio.run(self.execute_async(state)) return state

github-actions · 2024-08-20T09:35:30Z

🎉 This issue has been resolved in version 1.14.0-beta.13 🎉

The release is available on:

v1.14.0-beta.13
GitHub release

Your semantic-release bot 📦🚀

github-actions · 2024-08-20T19:14:37Z

🎉 This issue has been resolved in version 1.14.0 🎉

The release is available on:

v1.14.0
GitHub release

Your semantic-release bot 📦🚀

VinciGit00 added 3 commits August 18, 2024 19:39

add screenshot scraper

8e3d5de

feat: refactoring of the code

5eb3cff

Update generate_answer_from_image_node.py

103c21c

DiTo97 requested changes Aug 18, 2024

View reviewed changes

VinciGit00 added 4 commits August 19, 2024 01:15

refactoring of the nodes

c72c077

add if and cool stuff

79fa3f6

Update generate_answer_from_image_node.py

0bf79b5

feat: add async call

f60aa3a

DiTo97 reviewed Aug 19, 2024

View reviewed changes

VinciGit00 added 2 commits August 19, 2024 14:24

add try catch and robust integration

f774fe4

Update screenshot_scraper_graph.py

fee77d1

DiTo97 approved these changes Aug 19, 2024

View reviewed changes

Merge branch 'pre/beta' into screenshot_scraper

d248646

VinciGit00 merged commit 860fde8 into pre/beta Aug 20, 2024
3 checks passed

VinciGit00 deleted the screenshot_scraper branch August 20, 2024 09:34

github-actions bot added the released on @dev label Aug 20, 2024

github-actions bot added the released on @stable label Aug 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Screenshot scraper integration #558

Screenshot scraper integration #558

Uh oh!

VinciGit00 commented Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 18, 2024

Uh oh!

DiTo97 Aug 19, 2024

Uh oh!

Uh oh!

github-actions bot commented Aug 20, 2024

Uh oh!

github-actions bot commented Aug 20, 2024

Uh oh!

Uh oh!


		api_key = self.node_config.get("config", {}).get("llm", {}).get("api_key", "")

		supported_models = ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]

Uh oh!

Screenshot scraper integration #558

Screenshot scraper integration #558

Uh oh!

Conversation

VinciGit00 commented Aug 18, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Aug 20, 2024

Uh oh!

github-actions bot commented Aug 20, 2024

Uh oh!

Uh oh!