[safetensors] add named groups to RE_SAFETENSORS_SHARD_FILE regex #622

Merged (3 commits) on Apr 12, 2024

mishig25
Collaborator

@mishig25 mishig25 commented Apr 11, 2024

add named groups to RE_SAFETENSORS_SHARD_FILE regex

safetensors equivalent of #621

@mishig25 mishig25 requested a review from coyotte508 as a code owner April 11, 2024 16:12
@mishig25 mishig25 requested a review from julien-c April 11, 2024 16:14
@@ -14,7 +14,7 @@ export const SAFETENSORS_INDEX_FILE = "model.safetensors.index.json";
/// but in some situations safetensors weights have different filenames.
export const RE_SAFETENSORS_FILE = /\.safetensors$/;
export const RE_SAFETENSORS_INDEX_FILE = /\.safetensors\.index\.json$/;
-export const RE_SAFETENSORS_SHARD_FILE = /\d{5}-of-\d{5}\.safetensors$/;
+export const RE_SAFETENSORS_SHARD_FILE = /(?<shard>\d{5})-of-(?<total>\d{5})\.safetensors$/;
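
For illustration, here is a minimal sketch of how the named groups added in this diff could be read by a caller. The regex is copied from the diff; the `ShardInfo` type and the `parseShardFilename` helper are hypothetical names introduced for this example and are not part of the PR.

```typescript
// Regex copied from the diff above (named groups need an ES2018+ target).
const RE_SAFETENSORS_SHARD_FILE = /(?<shard>\d{5})-of-(?<total>\d{5})\.safetensors$/;

// Hypothetical result type, introduced only for this sketch.
interface ShardInfo {
  shard: number;
  total: number;
}

// Hypothetical helper: returns the shard index and shard count encoded
// in a filename, or null when the filename is not a shard file.
function parseShardFilename(filename: string): ShardInfo | null {
  const groups = RE_SAFETENSORS_SHARD_FILE.exec(filename)?.groups;
  if (!groups) {
    return null;
  }
  return {
    // Number("00001") evaluates to 1; leading zeros are dropped.
    shard: Number(groups["shard"]),
    total: Number(groups["total"]),
  };
}
```

With named groups, callers read `groups["shard"]` and `groups["total"]` instead of relying on positional indices, so adding another group later does not shift the existing ones.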
Member

@coyotte508 coyotte508 Apr 11, 2024

Might as well add the "prefix" group? If you're planning on extracting it

Member

in that case add [_-] between the captured prefix and the shard number

Most models on the Hub use a dash at this position (it's what transformers generates), but bigscience/bloom does not.
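
To make the two review suggestions concrete, here is a sketch of a variant that captures a `prefix` group and requires a `[_-]` separator before the shard number. This is an assumption about how the suggestion could be implemented, not the regex that was merged; `RE_SHARD_WITH_PREFIX`, `splitShardFilename`, and the filenames in the comments are hypothetical.

```typescript
// Hypothetical variant, NOT the merged regex: lazily capture a prefix,
// then require exactly one "_" or "-" before the five-digit shard number.
const RE_SHARD_WITH_PREFIX =
  /^(?<prefix>.+?)[_-](?<shard>\d{5})-of-(?<total>\d{5})\.safetensors$/;

// Hypothetical helper: splits a shard filename into its parts,
// or returns null when the filename does not follow the convention.
function splitShardFilename(
  filename: string
): { prefix: string; shard: number; total: number } | null {
  const groups = RE_SHARD_WITH_PREFIX.exec(filename)?.groups;
  if (!groups) {
    return null;
  }
  return {
    prefix: groups["prefix"] ?? "",
    shard: Number(groups["shard"]),
    total: Number(groups["total"]),
  };
}

// Illustrative filenames (made up for this sketch):
//   "model-00001-of-00002.safetensors"  matches with a dash separator
//   "model_00001-of-00072.safetensors"  matches with an underscore separator
//   "model00001-of-00002.safetensors"   does not match: no separator
```

Because the separator is mandatory in this sketch, a filename that runs the prefix straight into the shard number is rejected rather than parsed leniently.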

@mishig25 mishig25 merged commit d5e538d into main Apr 12, 2024
@mishig25 mishig25 deleted the improve_RE_SAFETENSORS_SHARD_FILE branch April 12, 2024 08:30
mishig25 pushed a commit that referenced this pull request Apr 12, 2024
mishig25 pushed a commit that referenced this pull request Apr 15, 2024
follow up to #622 (comment)

> but that was my point, the -_ is not optional
> we do want to enforce some level of convention
3 participants