
Add the ability to specify load state #368


Merged


stevenmichaelthomas

Problem

Some sites built as single-page applications (SPAs) return their HTML before the page has been populated with data fetched on the client side, so the scraped content is incomplete.

Solution

When working with Playwright, depending on the architecture of the scraped site, we may want to wait for different page events before considering a page "loaded" and ready to parse.

This adds the ability to specify load_state, which defaults to domcontentloaded but can be overridden with any other Playwright load state (load or networkidle): https://playwright.dev/python/docs/api/class-page#page-wait-for-load-state
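As an illustration of the idea (not the code merged in this PR), here is a minimal sketch of a Playwright-based loader that accepts a load_state argument. The names fetch_html, validate_load_state, VALID_LOAD_STATES and the timeout_ms parameter are hypothetical and are not ScrapeGraphAI's API. It assumes the Playwright sync API is installed (pip install playwright, then playwright install chromium). Navigation returns as early as DOMContentLoaded, and page.wait_for_load_state() then blocks until the requested state is reached.

```python
from typing import Literal, get_args

# Load states accepted by Playwright's page.wait_for_load_state().
LoadState = Literal["load", "domcontentloaded", "networkidle"]
VALID_LOAD_STATES = get_args(LoadState)


def validate_load_state(load_state: str) -> str:
    """Hypothetical helper: reject values Playwright would not accept."""
    if load_state not in VALID_LOAD_STATES:
        raise ValueError(
            f"load_state must be one of {VALID_LOAD_STATES}, got {load_state!r}"
        )
    return load_state


def fetch_html(
    url: str,
    load_state: str = "domcontentloaded",
    timeout_ms: float = 30_000,
) -> str:
    """Hypothetical loader: return the page HTML once `load_state` is reached."""
    validate_load_state(load_state)
    # Imported lazily so argument validation works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Let navigation return as early as DOMContentLoaded...
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # ...then wait for the requested state (e.g. "networkidle" for SPAs).
            page.wait_for_load_state(load_state, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

For an SPA that fetches its data after the initial render, a call such as fetch_html("https://example.com", load_state="networkidle") waits until Playwright sees no network connections for at least 500 ms before reading the DOM. Keeping domcontentloaded as the default preserves the faster behavior for server-rendered pages.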

@VinciGit00 VinciGit00 changed the base branch from main to pre/beta June 11, 2024 17:59
@VinciGit00 VinciGit00 merged commit fa951b4 into ScrapeGraphAI:pre/beta Jun 11, 2024
2 checks passed
@VinciGit00
Collaborator

Hi, thank you! If you have an idea of how to implement authentication, please contact us.

@stevenmichaelthomas
Author

Thanks for the merge, @VinciGit00! Is there an issue for authentication? Happy to take a look.

@VinciGit00
Collaborator

VinciGit00 commented Jun 11, 2024

this one url


🎉 This PR is included in version 1.7.0-beta.3 🎉

The release is available on:

Your semantic-release bot 📦🚀


🎉 This PR is included in version 1.6.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
