add stable audio tools as a library + code snippets. #741
The failing test is unrelated to this PR.
Metadata in `model-libraries.ts` looks good to me. I added a comment about the snippet. Better to wait for review from others :)
@@ -326,6 +326,50 @@ export const sklearn = (model: ModelData): string[] => {
}
};

export const stable_audio_tools = (model: ModelData): string[] => [
Code snippets are usually much smaller than this (basically `from stable_audio_tools import get_pretrained_model` + `model, model_config = get_pretrained_model("${model.id}")`).
Are we sure the whole snippet is valid for all models tagged as `stable-audio-tools`? (Or is there only one model?) I'm not against a more complete snippet if you think it makes sense, though. Happy to get opinions from others as well.
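For illustration, the loading-only variant suggested here could be sketched as follows. This is a minimal sketch, not the merged code: the `ModelData` interface below is a hypothetical stand-in with only the `id` field, and the real type in the repository has more fields.

```typescript
// Hypothetical stand-in for the repository's ModelData type (assumption: only `id` is needed here).
interface ModelData {
  id: string;
}

// Loading-only snippet: no device placement and no inference,
// so it does not depend on how a particular checkpoint is conditioned.
const stable_audio_tools = (model: ModelData): string[] => [
  `from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("${model.id}")`,
];

// Usage: render the snippet for a given repository id.
const rendered = stable_audio_tools({ id: "stabilityai/stable-audio-open-1.0" });
console.log(rendered.join("\n"));
```

Because the repository id is interpolated from `model.id`, the same template would render for any model tagged with the library, including fine-tunes.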
There is only one model, but the library supports other models (although no other pre-trained models are supported).
Let me see if I can reduce the snippet size.
Here's the shortened snippet.
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
This is too complex imo. Usually the snippets don't include things such as loading onto the GPU, for example. We usually also don't show the actual inference (and just do the loading), but it might be fine for this case.
Although there's just one model, the library supports fine-tuning, so we need to make sure our snippet would work for fine-tuned models too.
I updated the code snippet to the minified version mentioned above!
That's a good point, happy to hear other opinions :)
no preference on my side!
No strong opinion either, especially since it can be reassessed later on if we see it breaks on some fine-tunes.
Cool! Should we merge with the current snippet for now? I'm setting a reminder to revisit this in 20 or so days.
Fine for me!
Given there is more or less agreement here, I will merge this!
@Vaibhavs10 does the model card also require
@NielsRogge Yes it does! Actually, it requires
Ok, opened a PR here: https://huggingface.co/stabilityai/stable-audio-open-1.0/discussions/25
Thanks for the ping, @NielsRogge, and for taking care of this. I made a note in my internal doc not to forget about this!
Adds inference snippets and download count support for stable audio tools.