Discussion about this post

User's avatar
Santhosh Guntupalli's avatar

Great write-up. One thing worth adding for anyone building on top of this: the queue parameter is essentially your chunk boundary window, and if you're processing chunked audio in a pipeline, misaligned chunk boundaries (e.g. cutting in the middle of a phoneme) can cause near-zero-duration frames and invalid MP3 frames that make Whisper silently produce garbled output. Setting queue=30 is safe for most use cases, but for real-time streaming, going lower than queue=5 can introduce subtle timestamp drift that compounds over long videos. Worth watching if you're building anything production-grade.

No posts

Ready for more?