Skip to content

Improved progress reporting for Xet uploads #3096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

hoytak
Copy link

@hoytak hoytak commented May 20, 2025

This PR adds detailed progress reporting for upload_files when hf_xet is used, showing both per-file progress and accurate total progress. Total progress speed, which includes both deduplication and data transfer, is also separated out into separate bars.

Requries xet-core / hf_xet at commit 4faec0b or later, which can be obtained using pip install --pre "hf-xet==1.1.3dev0"

hoytak added 2 commits May 20, 2025 11:37
This PR adds detailed progress reporting for upload_files when hf_xet
is used, showing both per-file progress and accurate total progress.
Total progress speed, which includes both deduplication and data
transfer, is also separated out into separate bars.

Requries xet-core / hf_xet at commit 4faec0b or later.
@hoytak
Copy link
Author

hoytak commented May 20, 2025

Here is a video of the current version of it in action:

Screen.Recording.2025-05-20.at.12.50.49.PM.mov

@rajatarya rajatarya requested review from hanouticelina and removed request for hanouticelina May 20, 2025 22:11
@rajatarya
Copy link
Contributor

@hanouticelina @Wauplin : @hoytak has worked on improving the UX with xet uploads, and this draft PR shows how more accurate reporting can be achieved on the upload_folder path.

In this he replaced tqdm with rich since it gave him more flexibility.

The main motivation for changing the UX with xet is due to the additional step of processing / chunking files. As you have seen, folks have been confused by seeing only file-level progress, and the current UX has 'choppy' upload behavior (the progress bar looks stalled and then jumps). By having that tracked explicitly with this new implementation it should be a lot clearer to users when their overall transfer is complete (TOTAL line), and the processing step ().

Before embarking on refactoring the download path with similar UX changes Hoyt wanted to get your feedback / approval with this direction of changes. He's also open to UX changes - naming - colors - whatever with this approach.

cc: @bpronan

Also, see Slack thread here: https://huggingface.slack.com/archives/C02V5EA0A95/p1747771966873299

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin
Copy link
Contributor

Wauplin commented Jun 11, 2025

@bot /style

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the work @hoytak ! I tested the PR both locally and in a colab to check how it behaves and needless to say that it's a great addition 🔥 And thanks again for removing any extra dependency!

Regarding the implementation itself, I did not review in depth the hf_xet/python binding as it seems to "just work" + the comments make it clear what's going on. I've left some comments to address (mostly cosmetic). Apart from that we should be good to merge.

EDIT: after making the changes, could you run make style and commit the changes please? You can also run make quality to make sure everything's fine.


Attaching some screenshots of my tests

image

image

@hoytak
Copy link
Author

hoytak commented Jun 11, 2025

Thanks a lot for the work @hoytak ! I tested the PR both locally and in a colab to check how it behaves and needless to say that it's a great addition 🔥 And thanks again for removing any extra dependency!

Great review! Thank you.

However, these screenshots are for the download path, while these changes are part of the upload path. It should be similar, but just wanted to clarify.

@hoytak hoytak marked this pull request as ready for review June 11, 2025 15:28
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready to be merged as soon as the CI is green! 🎉

@Wauplin
Copy link
Contributor

Wauplin commented Jun 11, 2025

re-tested properly 🤦

image

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants