Improved progress reporting for Xet uploads #3096
This PR adds detailed progress reporting for upload_files when hf_xet is used, showing both per-file progress and accurate total progress. Total progress speed, which includes both deduplication and data transfer, is also split out into separate bars. Requires xet-core / hf_xet at commit 4faec0b or later.
Here is a video of the current version of it in action: Screen.Recording.2025-05-20.at.12.50.49.PM.mov
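For readers following along without the video, the bookkeeping behind "per-file progress plus accurate total progress" can be pictured with a minimal sketch. This is an illustration only, not the code in this PR: the `UploadProgress` class, its fields, and the sample file names are all hypothetical, and it assumes that deduplicated bytes count as processed but are never transferred.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UploadProgress:
    """Hypothetical tracker for illustration; not the hf_xet or huggingface_hub API."""

    file_sizes: Dict[str, int]                                 # total size of each file, in bytes
    processed: Dict[str, int] = field(default_factory=dict)    # bytes chunked/deduplicated per file
    transferred_bytes: int = 0                                 # bytes actually sent over the network

    def update(self, name: str, processed_delta: int, transferred_delta: int) -> None:
        # Per-file progress advances with processing, capped at the file size.
        done = self.processed.get(name, 0) + processed_delta
        self.processed[name] = min(done, self.file_sizes[name])
        # Transfer progress is tracked separately, since deduplicated
        # bytes are processed without being uploaded (an assumption here).
        self.transferred_bytes += transferred_delta

    @property
    def total_bytes(self) -> int:
        return sum(self.file_sizes.values())

    @property
    def processed_bytes(self) -> int:
        return sum(self.processed.values())

    def file_fraction(self, name: str) -> float:
        size = self.file_sizes[name]
        return self.processed.get(name, 0) / size if size else 1.0

    def total_fraction(self) -> float:
        total = self.total_bytes
        return self.processed_bytes / total if total else 1.0


# Hypothetical file names and sizes, chosen only to exercise the tracker.
progress = UploadProgress({"model.safetensors": 1000, "config.json": 100})
progress.update("model.safetensors", processed_delta=400, transferred_delta=250)
progress.update("config.json", processed_delta=100, transferred_delta=100)
```

In this sketch the total fraction is driven by processed bytes, while the transferred byte count would feed a separate bar; that split is one plausible reading of the description above, not a statement of how the PR implements it.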
@hanouticelina @Wauplin: @hoytak has worked on improving the UX with xet uploads, and this draft PR shows how more accurate reporting can be achieved on the In this he replaced
The main motivation for changing the UX with xet is the additional step of processing / chunking files. As you have seen, folks have been confused by seeing only file-level progress, and the current UX has 'choppy' upload behavior (the progress bar looks stalled and then jumps). With this new implementation tracking that step explicitly, it should be a lot clearer to users when their overall transfer is complete (TOTAL line), and the processing step ().
Before embarking on refactoring the download path with similar UX changes, Hoyt wanted to get your feedback / approval on this direction. He's also open to UX changes (naming, colors, whatever) with this approach.
cc: @bpronan
Also, see Slack thread here: https://huggingface.slack.com/archives/C02V5EA0A95/p1747771966873299
Thanks a lot for the work @hoytak ! I tested the PR both locally and in a colab to check how it behaves and needless to say that it's a great addition 🔥 And thanks again for removing any extra dependency!
Regarding the implementation itself, I did not review in depth the hf_xet/python binding as it seems to "just work" + the comments make it clear what's going on. I've left some comments to address (mostly cosmetic). Apart from that we should be good to merge.
EDIT: after making the changes, could you run make style and commit the changes please? You can also run make quality to make sure everything's fine.
Attaching some screenshots of my tests
Co-authored-by: Lucain <[email protected]>
Great review! Thank you. However, these screenshots are for the download path, while these changes are part of the upload path. It should be similar, but just wanted to clarify.
Looks ready to be merged as soon as the CI is green! 🎉
This PR adds detailed progress reporting for upload_files when hf_xet is used, showing both per-file progress and accurate total progress. Total progress speed, which includes both deduplication and data transfer, is also split out into separate bars.
Requires xet-core / hf_xet at commit 4faec0b or later, which can be obtained using:
pip install --pre "hf-xet==1.1.3dev0"