add stable audio tools as a library + code snippets. #741

Merged
merged 5 commits into from
Jun 7, 2024
37 changes: 37 additions & 0 deletions packages/tasks/src/model-libraries-snippets.ts
@@ -326,6 +326,43 @@ export const sklearn = (model: ModelData): string[] => {
}
};

export const stable_audio_tools = (model: ModelData): string[] => [
Contributor

Code snippets are usually much smaller than this (basically from stable_audio_tools import get_pretrained_model + model, model_config = get_pretrained_model("${model.id}")).

Are we sure the whole snippet is valid for all models tagged as stable-audio-tools (or is there only one model)? I'm not against a more complete snippet if you think it makes sense, though. Happy to get opinions from others as well.
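For comparison, the loading-only form suggested in this comment could be sketched as below. This is an illustration, not the code that was merged: the `ModelData` interface is a simplified stand-in for the type used in `packages/tasks`, and the name `stable_audio_tools_minimal` is hypothetical.

```typescript
// Simplified stand-in for the real ModelData type; only `id` is needed here (assumption).
interface ModelData {
	id: string;
}

// Hypothetical loading-only variant: a single Python snippet that fetches
// the model and its config, with no device handling and no inference.
const stable_audio_tools_minimal = (model: ModelData): string[] => [
	`from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("${model.id}")`,
];
```

A snippet this small makes fewer assumptions about the checkpoint (for example, which conditioning keys it expects), which is the trade-off the thread goes on to discuss.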

Member Author

There is only one model for now, but the library supports other models (although no other pre-trained models are available yet).

Let me see if I can reduce the snippet size.

Member Author

Here's the shortened snippet.

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
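The last step of the snippet above (peak normalize, clip, scale to int16) is the densest line, so here is a minimal sketch of the same arithmetic on a plain `Float32Array`, independent of torch. The function name `peakNormalizeToInt16` is hypothetical, the zero-peak guard is an addition the Python one-liner does not have, and truncation toward zero is assumed to match the float-to-int16 cast.

```typescript
// Hypothetical helper mirroring: x / max(|x|) -> clamp(-1, 1) -> * 32767 -> int16.
function peakNormalizeToInt16(samples: Float32Array): Int16Array {
	// Find the largest absolute sample value (the peak).
	let peak = 0;
	for (let i = 0; i < samples.length; i++) {
		peak = Math.max(peak, Math.abs(samples[i]));
	}

	const out = new Int16Array(samples.length);
	// Added guard (not in the Python snippet): silent input would divide by zero.
	if (peak === 0) return out;

	for (let i = 0; i < samples.length; i++) {
		// Scale so the peak lands at +/-1, then clip to the valid range.
		const normalized = Math.min(1, Math.max(-1, samples[i] / peak));
		// Map to the int16 range, truncating toward zero (assumed cast behavior).
		out[i] = Math.trunc(normalized * 32767);
	}
	return out;
}
```

Dividing by the peak means the loudest sample always maps to full scale, so the output level is independent of how loud the generated audio was.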

Contributor

This is too complex imo. Usually the snippets don't have things such as loading into GPU for example. We usually also don't show the actual inference (and just do the loading), but it might be fine for this case.

Although there is just one model, the library supports fine-tuning, so we need to make sure our snippet would also work for fine-tuned models.

Member Author

I updated the code snippet to the minified code snippet as mentioned above!

Member

That's a good point, happy to hear other opinions :)

Member

no preference on my side!

Contributor

No strong opinion either, especially since it can be reassessed later on if we see it breaks on some finetunes.

Member Author

Cool! Should we merge with the current snippet for now? I'm setting a reminder to revisit this in 20 or so days.

Contributor

Fine for me!

`import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("${model.id}")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)`,
];

export const fastai = (model: ModelData): string[] => [
`from huggingface_hub import from_pretrained_fastai

8 changes: 8 additions & 0 deletions packages/tasks/src/model-libraries.ts
@@ -361,6 +361,14 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
term: { path: "hyperparams.yaml" },
},
},
"stable-audio-tools": {
prettyLabel: "Stable Audio Tools",
repoName: "stable-audio-tools",
repoUrl: "https://github.com/Stability-AI/stable-audio-tools.git",
filter: false,
countDownloads: { term: { path: "model.safetensors" } },
snippets: snippets.stable_audio_tools,
},
"stable-baselines3": {
prettyLabel: "stable-baselines3",
repoName: "stable-baselines3",
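To show how a registry entry like the `"stable-audio-tools"` one in the diff above ties a library tag to its snippet function, here is a minimal, self-contained sketch. The `LibraryUiElement` interface, the `LIBRARIES` map, and the `snippetsFor` helper are hypothetical simplifications; the real types and lookup code in `packages/tasks` may differ, and the snippet body is shortened to the loading step for brevity.

```typescript
// Simplified stand-in for the real ModelData type (assumption).
interface ModelData {
	id: string;
}

// Hypothetical, reduced shape of one registry entry, using only the fields seen in the diff.
interface LibraryUiElement {
	prettyLabel: string;
	repoName: string;
	repoUrl: string;
	filter?: boolean;
	countDownloads?: { term: { path: string } };
	snippets?: (model: ModelData) => string[];
}

// Hypothetical registry holding just the entry added in this PR.
const LIBRARIES: Record<string, LibraryUiElement> = {
	"stable-audio-tools": {
		prettyLabel: "Stable Audio Tools",
		repoName: "stable-audio-tools",
		repoUrl: "https://github.com/Stability-AI/stable-audio-tools.git",
		filter: false,
		countDownloads: { term: { path: "model.safetensors" } },
		// Shortened to the loading step only; the merged snippet also runs inference.
		snippets: (model) => [
			`from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("${model.id}")`,
		],
	},
};

// Return the code snippets for a model's library tag; unknown libraries yield none.
function snippetsFor(libraryName: string, model: ModelData): string[] {
	const entry = LIBRARIES[libraryName];
	return entry?.snippets ? entry.snippets(model) : [];
}
```

For example, `snippetsFor("stable-audio-tools", { id: "stabilityai/stable-audio-open-1.0" })` returns a one-element array with the model id interpolated, while an unregistered tag returns an empty array.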