
Support models that don't split stream chunks in tokens #8235


Merged: 2 commits into stanfordnlp:main on May 17, 2025

Conversation

chenmoneygithub (Collaborator)

Resolve #8211

Gemini models don't split stream chunks into 1-2 tokens per chunk; instead, each chunk can contain many tokens. We are reworking the stream listeners' logic to support this case.

Side note: I do understand the rationale for buffering tokens for safety checks, but why not stream back the original tokens for consistency with other models? The current behavior also creates a burden for app developers who want to render the stream in a UI.
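A rough sketch of the chunk-size-agnostic listener logic described above (a hypothetical simplification, not DSPy's actual implementation; the class name and the end-marker string are assumptions). Instead of assuming each chunk holds 1-2 tokens, it buffers incoming text and scans for the field's end marker, emitting everything confirmed to precede it:

```python
class ChunkAgnosticListener:
    """Consumes stream chunks of any size and emits text incrementally."""

    def __init__(self, end_marker: str = "[[ ## completed ## ]]"):
        self.end_marker = end_marker
        self.buffer = ""
        self.done = False

    def receive(self, chunk_text: str) -> str:
        """Consume one chunk (any number of tokens); return text safe to emit."""
        if self.done:
            return ""
        self.buffer += chunk_text
        idx = self.buffer.find(self.end_marker)
        if idx != -1:
            # Marker found: emit everything before it and stop.
            out = self.buffer[:idx]
            self.buffer = ""
            self.done = True
            return out
        # Hold back a tail in case the marker straddles a chunk
        # boundary; emit the rest immediately.
        keep = len(self.end_marker) - 1
        out = self.buffer[:-keep] if len(self.buffer) > keep else ""
        self.buffer = self.buffer[len(out):]
        return out
```

The same listener then yields identical output whether the provider streams token-by-token or, as with Gemini, delivers many tokens per chunk.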

@okhat okhat merged commit 5d31cd1 into stanfordnlp:main May 17, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

[Bug] Google Models do not stream chunks using StreamListener