feat(hub): adding pathsInfo function #1031

Merged

axel7083
Contributor

@axel7083 axel7083 commented Nov 15, 2024

Description

Follow-up to the discussion in #1024 and the finding that a HEAD request does not yield the same etag that the Python library uses when populating the cache directory.

This PR adds the `pathsInfo` function, which returns path information, including the LFS oid (used as the etag) when the file is an LFS pointer.

As suggested by @coyotte508 in #1024 (review)
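To illustrate the etag choice described above, here is a minimal sketch. The `PathEntry` interface and the `etagFor` helper are hypothetical names introduced for this example, not the library's actual types, and the fallback to the git oid for non-LFS files is an assumption.

```typescript
// Hypothetical shape modeled on the PR description; not the library's real type.
interface PathEntry {
	path: string;
	oid: string; // git blob oid (assumed to serve as the etag for regular files)
	lfs?: { oid: string }; // present only when the file is an LFS pointer
}

// Pick the identifier used to name the cached blob:
// the LFS oid for LFS pointers, otherwise the git oid.
function etagFor(entry: PathEntry): string {
	return entry.lfs?.oid ?? entry.oid;
}

// Made-up sample entries, for illustration only.
const entries: PathEntry[] = [
	{ path: "config.json", oid: "aaa111" },
	{ path: "model.safetensors", oid: "bbb222", lfs: { oid: "ccc333" } },
];

for (const entry of entries) {
	console.log(`${entry.path} -> ${etagFor(entry)}`);
}
```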

Related issues

Fixes #1023 (provides an alternative method to `fileDownloadInfo`).

Tests

  • unit tests have been added

Member

@coyotte508 coyotte508 left a comment

Thank you!

You should export the function in lib/index.ts

@axel7083 axel7083 requested a review from coyotte508 November 15, 2024 11:24
@coyotte508 coyotte508 merged commit 8fc1f6e into huggingface:main Nov 15, 2024
4 checks passed
@julien-c
Member

great contrib @axel7083

coyotte508 added a commit that referenced this pull request Nov 18, 2024
## Description

Follows #1031, which added a `pathsInfo` method that can return the
etag/commitHash for a given file. This makes it possible to be compliant
with the `_hf_hub_download_to_cache_dir`[^1] method from the Python library.

[^1]:
[huggingface_hub/file_download.py#L882](https://github.com/huggingface/huggingface_hub/blob/c547c839dbbe0163e3ca422d017daad7c7f9361f/src/huggingface_hub/file_download.py#L882)
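
As a rough sketch of why both the etag and the commit hash are needed, the helpers below compute where a file would land in a Python-style cache. The `blobs/` and `snapshots/` layout is an assumption based on the Python library's cache structure, `repoFolderName` and `cachePaths` are hypothetical names, and paths are joined with `/` for simplicity.

```typescript
type RepoType = "model" | "dataset" | "space";

// "OuteAI/OuteTTS-0.1-350M" (model) -> "models--OuteAI--OuteTTS-0.1-350M"
function repoFolderName(name: string, type: RepoType): string {
	return [`${type}s`, ...name.split("/")].join("--");
}

// Assumed layout: the content is stored once under blobs/<etag>,
// and snapshots/<commitHash>/<filePath> points at that blob.
function cachePaths(
	cacheDir: string,
	name: string,
	type: RepoType,
	etag: string,
	commitHash: string,
	filePath: string
): { blob: string; pointer: string } {
	const root = `${cacheDir}/${repoFolderName(name, type)}`;
	return {
		blob: `${root}/blobs/${etag}`,
		pointer: `${root}/snapshots/${commitHash}/${filePath}`,
	};
}

// Placeholder etag and commit hash, for illustration only.
console.log(cachePaths("/tmp/hub", "OuteAI/OuteTTS-0.1-350M", "model", "etag123", "commit456", "config.json"));
```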

## Potential issue

The JS implementation does not handle the .lock files as the Python
library does. This might be a problem when both the JS and Python
functions are used.

If this is a hard requirement, the JS side could add a basic
implementation of the lock file handling that the Python library performs.
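
For discussion, a very basic lock could look like the sketch below. It relies on exclusive file creation, which is a simpler technique than the one used by the Python library and is not claimed to be interoperable with it. `tryAcquireLock` and the lock file location are hypothetical.

```typescript
import { closeSync, mkdtempSync, openSync, rmSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Try to take the lock. Returns a release function, or null if the lock is already held.
function tryAcquireLock(lockPath: string): (() => void) | null {
	try {
		// The "wx" flag fails with EEXIST when the file already exists,
		// so only one caller can create the lock file.
		const fd = openSync(lockPath, "wx");
		closeSync(fd);
		return () => unlinkSync(lockPath);
	} catch (err) {
		if ((err as { code?: string }).code === "EEXIST") {
			return null;
		}
		throw err;
	}
}

// Demo in a temporary directory (hypothetical lock location).
const demoDir = mkdtempSync(join(tmpdir(), "hub-lock-"));
const demoLock = join(demoDir, "example-blob.lock");
const release = tryAcquireLock(demoLock);
console.log("second attempt refused:", tryAcquireLock(demoLock) === null);
release?.();
rmSync(demoDir, { recursive: true, force: true });
```

A stale lock left behind by a crashed process is not handled here; a real implementation would need a timeout or ownership check.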

## Testing

I wrote tests for the existing `downloadFile` function (no change to the
implementation) and for the newly added `downloadFileToCacheDir`.
 
- [x] unit tests have been added

---------

Co-authored-by: Eliott C. <[email protected]>
coyotte508 added a commit that referenced this pull request Nov 19, 2024
## Description

We can now create a `snapshotDownload` method similar to the
`snapshot_download` of the Python library[^1]. It clones a repository
(model, space, or dataset) to the cache (only the cache is supported for now).

[^1]:
https://huggingface.co/docs/huggingface_hub/en/guides/download#download-an-entire-repository

## Related issues/PR

With the amazing help of @coyotte508, we were able to merge the following
changes:

- #1034
- #1031
- #999

These allow this PR to provide a Python-compliant clone of a Hugging
Face repository to the cache directory.

## Testing

- [x] unit tests are covering the new feature

**Manually**

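```ts
import { snapshotDownload } from "@huggingface/hub";

// Clone the whole model repository into the local cache directory.
await snapshotDownload({
	repo: {
		name: 'OuteAI/OuteTTS-0.1-350M',
		type: 'model',
	},
});
```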

Verify the result with the Python `huggingface-cli` tool:
```
$: huggingface-cli scan-cache
REPO ID                             REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED     LAST_MODIFIED     REFS LOCAL PATH                                                                         
----------------------------------- --------- ------------ -------- ----------------- ----------------- ---- ---------------------------------------------------------------------------------- 
OuteAI/OuteTTS-0.1-350M             model           731.6M       14 5 minutes ago     5 minutes ago     main /home/axel7083/.cache/huggingface/hub/models--OuteAI--OuteTTS-0.1-350M
```

---------

Co-authored-by: Eliott C. <[email protected]>
Development

Successfully merging this pull request may close these issues.

bug(@huggingface/hub): fileDownloadInfo return an etag for LFS file which seems weird
3 participants