add stable audio tools as a library + code snippets. #741

Merged
Vaibhavs10 merged 5 commits into main from add-stable-audio
Jun 7, 2024

Vaibhavs10
Member

Adds inference snippets, download count support for stable audio tools.

@Vaibhavs10
Member Author

The failing test is unrelated to this PR.

Contributor

@Wauplin Wauplin left a comment

Metadata in model-libraries.ts looks good to me. Added a comment about the snippet. Better to wait for review from others :)

@@ -326,6 +326,50 @@ export const sklearn = (model: ModelData): string[] => {
}
};

export const stable_audio_tools = (model: ModelData): string[] => [
Contributor

Code snippets are usually much smaller than this (basically from stable_audio_tools import get_pretrained_model + model, model_config = get_pretrained_model("${model.id}")).

Are we sure the whole snippet is valid for all models tagged as stable-audio-tools? (Or is there only one model?) I'm not against a more complete snippet if you think it makes sense though. Happy to get opinions from others as well.
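
For reference, a load-only snippet along the lines suggested above might look roughly like the following. This is only a sketch, not the code in this PR: the one-field `ModelData` interface is a hypothetical stand-in for the repository's real type, and the name `stable_audio_tools_minimal` is made up for illustration.

```typescript
// Hypothetical, trimmed-down stand-in for the repo's ModelData type;
// only the field this sketch reads is declared.
interface ModelData {
	id: string;
}

// Sketch of a load-only snippet generator (illustrative name, not the merged code).
// It returns a single Python snippet with the model id interpolated.
const stable_audio_tools_minimal = (model: ModelData): string[] => [
	`from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("${model.id}")`,
];
```

Calling `stable_audio_tools_minimal({ id: "stabilityai/stable-audio-open-1.0" })` would yield one Python snippet that only loads the model, leaving device placement and generation to the user.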

Member Author

There is only one model, but the library supports other models (although no other pre-trained models are available).

Let me see if I can reduce the snippet size.

Member Author

Here's the shortened snippet.

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text conditioning (prompt only)
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)

Contributor

This is too complex imo. Usually the snippets don't have things such as loading into GPU for example. We usually also don't show the actual inference (and just do the loading), but it might be fine for this case.

Although it's just one model, the library supports fine-tuning, so we need to make sure our snippet would work for fine-tuned models too.

Member Author

I updated the code snippet to the minified version mentioned above!

Member

That's a good point, happy to hear other opinions :)

Member

no preference on my side!

Contributor

No strong opinion either, especially since it can be reassessed later on if we see it breaks on some finetunes.

Member Author

Cool! Should we merge with the current snippet for now? I'm setting a reminder to revisit this in 20 or so days.

Contributor

Fine for me!

@Vaibhavs10
Member Author

Given there is more or less agreement here, I will merge this!

@Vaibhavs10 Vaibhavs10 merged commit cf70e66 into main Jun 7, 2024
4 checks passed
@Vaibhavs10 Vaibhavs10 deleted the add-stable-audio branch June 7, 2024 12:23
@NielsRogge
Contributor

@Vaibhavs10 does the model card also require "library_name: stable_audio" to be added? https://huggingface.co/stabilityai/stable-audio-open-1.0

@Wauplin
Contributor

Wauplin commented Jun 10, 2024

@NielsRogge Yes it does! Actually, it requires library_name: stable-audio-tools in the model card metadata.

@NielsRogge
Contributor

Ok, opened a PR here: https://huggingface.co/stabilityai/stable-audio-open-1.0/discussions/25

@Vaibhavs10
Member Author

Thanks for the ping, @NielsRogge, and for taking care of this. I made a note in my internal doc not to forget about this!
