The smart_scraper_multi_graph method is too expensive #756

Merged
merged 15 commits into from
Oct 20, 2024

Conversation

shenghongtw
Contributor

@shenghongtw shenghongtw commented Oct 16, 2024

The smart_scraper_multi_graph method uses LLMs to answer questions both in each smart_scraper_graph and at the MergeAnswer node, which leads to higher costs. I created a new method called SmartScraperMultiParseMergeFirstGraph, which only uses LLMs at the MergeAnswer node. For the same use case, this reduces the cost by half and achieves similar results, and the execution time is also shorter (as shown in the figure).
[3 images]
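
The difference between the two flows can be sketched without any scraping library. The following is only an illustration under stated assumptions: `CountingLLM`, `answer_then_merge`, and `merge_then_answer` are hypothetical names, not part of ScrapeGraphAI, and the stand-in LLM simply counts calls and prompt characters. Call counts are not the same thing as billed tokens, so the sketch does not reproduce the cost figures reported above.

```python
from dataclasses import dataclass


@dataclass
class CountingLLM:
    """Hypothetical stand-in for an LLM client; it only records usage."""
    calls: int = 0
    prompt_chars: int = 0

    def ask(self, prompt: str) -> str:
        self.calls += 1
        self.prompt_chars += len(prompt)
        return f"answer#{self.calls}"


def answer_then_merge(pages, question, llm):
    """Original-style flow: one LLM call per page, then one call to merge."""
    partial = [llm.ask(f"{question}\n\n{page}") for page in pages]
    return llm.ask(f"{question}\n\nMerge these answers:\n" + "\n".join(partial))


def merge_then_answer(pages, question, llm):
    """Merge-first flow: concatenate the parsed pages, then one LLM call."""
    merged = "\n\n".join(pages)
    return llm.ask(f"{question}\n\n{merged}")


# Placeholder inputs (assumed, not taken from the PR).
pages = ["parsed text of page A", "parsed text of page B", "parsed text of page C"]
question = "Who is the author?"

original_llm, lite_llm = CountingLLM(), CountingLLM()
answer_then_merge(pages, question, original_llm)
merge_then_answer(pages, question, lite_llm)

print("answer-then-merge calls:", original_llm.calls)  # len(pages) + 1
print("merge-then-answer calls:", lite_llm.calls)      # always 1
```

With N pages the first flow makes N + 1 calls and the second makes one. The trade-off is that the single merged prompt grows with the number of pages, so it may run into the model's context limit on large inputs.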

…pes a list of URLs, merges the content first, and finally generates answers to a given prompt.

(Unlike SmartScraperMultiGraph, in this case the content is merged before being processed by the LLM.)
…craper_multi_parse_merge_first_graph_openai_test.py
Collaborator

Please modify the original graph instead of creating a new one.

Contributor Author

Do you mean I should modify smart_scraper_multi_graph.py?

Contributor Author

@shenghongtw shenghongtw Oct 17, 2024

But the original method, called the raptor method, is useful for deep extraction or in more complex situations.
My idea is to rename the original smart_scraper_multi_graph.py to smart_scraper_multi_abstract_graph.py, and then name my method smart_scraper_multi_graph.py.
What are your thoughts on this?
thx

Collaborator

Ok it looks nice

@VinciGit00
Collaborator

Can you please add the configuration to the OpenAI examples?

@VinciGit00
Collaborator

And can you please call it smart_scraper_multi_lite instead?

@shenghongtw
Contributor Author

> And can you please call it smart_scraper_multi_lite instead?

Do you mean I should rename my method to smart_scraper_multi_lite?

@shenghongtw
Contributor Author

> Can you please add the configuration to the OpenAI examples?

ok

@VinciGit00
Collaborator

Yes please rename it

@shenghongtw
Contributor Author

The original smart_scraper_multi_graph keeps its original name?

@VinciGit00
Collaborator

> The original smart_scraper_multi_graph keeps its original name?

Yes, just change your scraper's name. That way it will be easier to understand at first glance.

@VinciGit00
Collaborator

And please add your script to examples/openai.
Screenshot 2024-10-17 alle 21 03 42

@VinciGit00
Collaborator

VinciGit00 commented Oct 18, 2024

Hi @shenghongtw, could you please add more tests to the test folder in a separate pull request?

@shenghongtw
Contributor Author

> Hi @shenghongtw, could you please add more tests to the test folder in a separate pull request?

No problem.

@VinciGit00
Collaborator

Hi @shenghongtw, I will merge this PR now. When the tests are ready, I will be glad to merge them as well.
Thank you and have a nice day.

@VinciGit00 VinciGit00 merged commit ffa1067 into ScrapeGraphAI:pre/beta Oct 20, 2024
1 check passed
🎉 This PR is included in version 1.27.0-beta.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

🎉 This PR is included in version 1.27.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

2 participants