- Added environment variables to prevent CPU thread pools from busy-waiting.
- Deferred loading of video models until first use to reduce VRAM footprint.
- Implemented streaming of speaking clips for improved responsiveness.
- Introduced a queue for managing speaking clips to handle multiple requests smoothly.
- Updated video playback logic to ensure proper handling of clip generation.
Upgrade PyTorch to 2.7+ with cu128 wheels for Blackwell (sm_120) GPU
support. Replace silero-vad (which depends on torchaudio) with a direct
ONNX Runtime implementation of the same Silero VAD model, eliminating
the torchaudio dependency entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>