-
Notifications
You must be signed in to change notification settings - Fork 430
[Inference Providers] Async calls for text-to-video
with fal.ai
#1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Julien Chaumond <[email protected]>
…e.js into async-calls-falai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 🤩
async function pollFalResponse(res: FalAiOutput, args: TextToVideoArgs, options?: Options): Promise<Blob> { | ||
const requestId = res.request_id; | ||
if (!requestId) { | ||
throw new InferenceOutputError("No request ID found in the response"); | ||
} | ||
let status = res.status; | ||
const { url, info } = await makeRequestOptions(args, { ...options, task: "text-to-video" }); | ||
const baseUrl = url?.split("?")[0] || ""; | ||
const query = url?.includes("_subdomain=queue") ? "?_subdomain=queue" : ""; | ||
|
||
const statusUrl = `${baseUrl}/requests/${requestId}/status${query}`; | ||
const resultUrl = `${baseUrl}/requests/${requestId}${query}`; | ||
|
||
while (status !== "COMPLETED") { | ||
await delay(1000); | ||
const statusResponse = await fetch(statusUrl, { headers: info.headers }); | ||
|
||
if (!statusResponse.ok) { | ||
throw new Error(`HTTP error! status: ${statusResponse.status}`); | ||
} | ||
status = (await statusResponse.json()).status; | ||
} | ||
|
||
const resultResponse = await fetch(resultUrl, { headers: info.headers }); | ||
const result = await resultResponse.json(); | ||
const isValidOutput = | ||
typeof result === "object" && | ||
!!result && | ||
"video" in result && | ||
typeof result.video === "object" && | ||
!!result.video && | ||
"url" in result.video && | ||
typeof result.video.url === "string" && | ||
isUrl(result.video.url); | ||
if (!isValidOutput) { | ||
throw new InferenceOutputError("Expected { video: { url: string } }"); | ||
} | ||
const urlResponse = await fetch(result.video.url); | ||
return await urlResponse.blob(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to move this util in the src/providers/fal-ai.ts
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, we should probably do the same for:
async function pollBflResponse(url: string, outputType?: "url" | "blob"): Promise<Blob> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in cf2d1ac. I had to move FalAiOutput
to src/providers/fal-ai.ts
to avoid an import cycle, maybe we should move every provider-specific output type to their respective provider files, wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should move every provider-specific output type to their respective provider files, wdyt ?
Agree with this and in particular their output type + logic to validate and parse the response. Same as the get_response
's in Python. Out of scope for this PR though
Co-authored-by: Simon Brandeis <[email protected]>
Co-authored-by: Simon Brandeis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
…e.js into async-calls-falai
…e.js into async-calls-falai
} | ||
const urlResponse = await fetch((res as FalAiOutput).video.url); | ||
return await urlResponse.blob(); | ||
const { url, info } = await makeRequestOptions(args, { ...options, task: "text-to-video" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, but it feels a bit weird to have to call makeRequestOptions
again here
I think we can have a simpler / better API that does not require to mentally map why we call makeRequestOptions
with those parameters.
I'm wondering whether request
should return an optional Promise
that we would just await
here? - example usage syntax:
const res = await request<FalAiQueueOutput | ReplicateOutput | NovitaOutput>(payload, {
...options,
task: "text-to-video",
});
if (res.poll) {
const blob: Blob = await res.poll;
}
Anyways - I'm OK with addressing that in a subsequent PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes totally agree, it feels a bit wrong to call makeRequestOptions
again just after calling request
.
I'm currently working on a refactoring PR, I'll rework this part as well.
EDIT: I think it's better to have smaller and easier PRs, so for this one in particular, let's open a dedicated PR 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you!!
That means we can allow any |
Yes @kefranabg ! |
…makeRequestOptions calls (#1314) Related to @SBrandeis's comment here #1292 (comment). This PR addresses the original concern about redundant `makeRequestOptions` calls introduced in #1292.The solution implemented here updates the `request` function to return both the response and a _request context_ when needed, allowing provider-specific polling code to reuse this context without redundant calls to `makeRequestOptions`. This differs from the initial suggestion in the comment as each provider implements polling differently with different parameters / response formats. Making a generic `.poll` property would require mixing provider-specific logic into the core request function (we don't want that, right? 😄 ). In the end, we want to keep provider-specific logic isolated in their respective provider files (PR coming today to push that further!).
What does this PR do?
This PR adds asynchronous polling to the fal.ai text-to-video generation. This allows running inference with models that may take > 2 min to generate results. The other motivation behind this PR is to align the Python and JS clients, the Python client has already been merged into main: huggingface/huggingface_hub#2927
Main Changes
baseUrl
property withmakeBaseUrl()
function across all providers. This is needed to be able to customize the base url based on the task. We want to useFAL_AI_API_BASE_URL_QUEUE
fortext-to-video
only. I'm not convinced if it's the simplest and the best way to do that.pollFalResponse()
fortext-to-video
(similarly to what it's done with BFL fortext-to-image
).Any refactoring suggestions are welcome! I'm willing to spend some additional time to make provider-specific updates easier to implement and better align our two clients 🙂
btw, I did not update the VCR tests as we've discussed that it'd be best to remove the VCR for
text-to-video
. Maybe we should remove them here?EDIT: removed the text-to-video tests in f8a6386.
I've tested it locally with tencent/HunyuanVideo for which the generation takes more than 2min and it works fine:
fal-ai-output.mp4