
feat(n-level deep scrape): Modify SearchLinkNode to find out the relevant links from the webpage #221


Merged: 5 commits, merged May 11, 2024


@mayurdb (Contributor) commented May 11, 2024

This is part 2 of the code changes that add n-level deep web search, which follows the links on the input page to accomplish the task given in the prompt (see Part 1).
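A minimal sketch of depth-limited link following, assuming the page graph is reachable through a caller-supplied links_of(url) function and that relevance is decided by a caller-supplied is_relevant(url) predicate. deep_crawl, links_of and is_relevant are hypothetical names used only for illustration, not ScrapeGraphAI APIs; a real run would fetch and parse each page and pick links with SearchLinkNode instead of reading an in-memory map.

```python
from collections import deque
from typing import Callable, List


def deep_crawl(
    start_url: str,
    links_of: Callable[[str], List[str]],   # hypothetical: returns outgoing links of a page
    is_relevant: Callable[[str], bool],     # hypothetical: stands in for prompt-based link selection
    max_depth: int,
) -> List[str]:
    """Breadth-first, depth-limited traversal that only follows relevant links.

    Depth 0 is the start page; each followed link adds one level.
    Returns URLs in the order they were visited.
    """
    visited = {start_url}          # prevents revisiting pages (and infinite loops on cycles)
    order: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue               # do not expand pages at the maximum depth
        for link in links_of(url):
            if link not in visited and is_relevant(link):
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

Breadth-first order visits every page at depth k before any page at depth k + 1, and the visited set stops a page that links back to the start from being crawled twice.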

This PR modifies SearchLinkNode so that, from all the links on the input page, it selects the ones relevant to the prompt.
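The kind of link selection described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SearchLinkNode implementation: the run log below attributes tokens and cost to the GenerateLinks step, which suggests the PR asks an LLM to choose links, whereas this sketch swaps in a plain keyword-overlap heuristic. LinkCollector, relevant_links and min_overlap are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside the current <a> element

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def words(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_links(html: str, base_url: str, prompt: str, min_overlap: int = 1) -> List[str]:
    """Hypothetical heuristic: keep links whose anchor text or URL path
    shares at least `min_overlap` words with the prompt."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    prompt_words = words(prompt)
    selected: List[str] = []
    for href, text in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative hrefs against the page URL
        candidate_words = words(text) | words(urlparse(absolute).path)
        if len(prompt_words & candidate_words) >= min_overlap and absolute not in selected:
            selected.append(absolute)  # keep first occurrence, preserve page order
    return selected
```

Only the URL path and anchor text are scored, so scheme and host tokens such as "https" or "com" cannot produce spurious matches; an LLM-based selector would replace the overlap test with a model call.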

It also adds a top-level DeepScrapeGraph, which is still a work in progress.

(scrape311) ➜  Scrapegraph-ai git:(deepScrape) ✗ python examples/openai/deep_scraper_openai.py
--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateLinks Node ---
Processing chunks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:30<00:00, 30.32s/it]
No answer found.
['https://www.google.com/about/careers/applications/jobs/results/jobs/results/111220536071070406-test-engineer-youtube?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/125813937294713542-cad-methodology-engineer-frontend?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/124531687150232262-revenue-operations-global-process-manager?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/138699807697838790-ux-designer-youtube?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/115775200712106694-software-engineer-machine-learning-google-assistant?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/85012766688977606-design-verification-engineer-silicon?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/137757227457880774-data-scientist-product-google-one?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/117989475658670790-product-manager-google-classroom?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/94438708343644870-technical-program-manager-pixel-modem-software?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/81784805567406790-cpu-design-manager-google-cloud?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/116916360363025094-ux-researcher-core-data-ux?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/82532812776710854-cpu-design-verification-lead-google-cloud?location=Bangalore+India', 
'https://www.google.com/about/careers/applications/jobs/results/jobs/results/97066731454767814-embedded-software-test-engineering-manager-silicon?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/78553083537171142-program-manager-vendor-quality-operations?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/132615928540996294-software-engineer-iii-engineering-productivity-youtube?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/103198964695605958-senior-strategist-trust-and-safety?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/112895752088232646-compliance-operations-analyst?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/97334649568535238-quality-program-manager-youtube-global-vendor-operations?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/130013938568831686-front-end-engineering-manager-security-command-center?location=Bangalore+India', 'https://www.google.com/about/careers/applications/jobs/results/jobs/results/100681580478898886-customer-solutions-engineer-gtech?location=Bangalore+India']
       node_name  total_tokens  prompt_tokens  completion_tokens  successful_requests  total_cost_USD  exec_time
0          Fetch             0              0                  0                    0         0.00000   2.591492
1          Parse             0              0                  0                    0         0.00000   0.232289
2            RAG             0              0                  0                    0         0.00000   2.737711
3  GenerateLinks          7083           6278                805                    1         0.23664  30.344680
4   TOTAL RESULT          7083           6278                805                    1         0.23664  35.906173

@VinciGit00 merged commit d8ed76b into ScrapeGraphAI:pre/beta on May 11, 2024

🎉 This PR is included in version 0.11.0-beta.3 🎉

The release is available on:

Your semantic-release bot 📦🚀


🎉 This PR is included in version 0.11.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
