Enhance video handling and performance optimizations

- Added environment variables to prevent CPU thread pools from busy-waiting.
- Deferred loading of video models until first use to reduce VRAM footprint.
- Implemented streaming of speaking clips for improved responsiveness.
- Introduced a queue for managing speaking clips to handle multiple requests smoothly.
- Updated video playback logic to ensure proper handling of clip generation.
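
The first bullet can be sketched as follows. The commit does not list which variables it sets, so the three names below are an assumption: they are real OpenMP, Intel OpenMP, and libgomp settings commonly used to make idle worker threads sleep instead of spin. The helper name `apply_passive_wait_env` is hypothetical.

```python
import os
from typing import Dict, MutableMapping

# Assumed settings; the commit message does not name the variables it uses.
PASSIVE_WAIT_ENV: Dict[str, str] = {
    "OMP_WAIT_POLICY": "PASSIVE",  # OpenMP: waiting threads yield instead of spinning
    "KMP_BLOCKTIME": "0",          # Intel OpenMP: sleep immediately after a parallel region
    "GOMP_SPINCOUNT": "0",         # GNU libgomp: do not spin before blocking
}


def apply_passive_wait_env(env: MutableMapping[str, str] = os.environ) -> Dict[str, str]:
    """Set each variable only if it is absent, so explicit user overrides win.

    Returns the subset of variables that this call actually set.
    """
    applied: Dict[str, str] = {}
    for name, value in PASSIVE_WAIT_ENV.items():
        if name not in env:
            env[name] = value
            applied[name] = value
    return applied
```

Such a helper would need to run before importing libraries like NumPy or PyTorch, because the thread-pool runtimes typically read these variables once at initialization.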
2026-04-24 00:36:18 -04:00
parent 129df7d1fa
commit 44a10667c2
7 changed files with 234 additions and 69 deletions
+5 -3
@@ -118,11 +118,13 @@ class ModelManager:
             log.info("Video engine disabled (config.video.enabled=false). Skipping load.")
             return
-        log.info("Loading avatar video engine...")
+        log.info("Video engine configured (models load on first avatar upload).")
         cfg = VideoConfig.from_dict(video_cfg_raw)
         self.video_engine = VideoEngine(cfg)
-        self.video_engine.load_models()
-        log.info("Avatar video engine loaded (mode=%s).", cfg.mode)
+        # load_models() is intentionally deferred: Wan2.2 + MuseTalk consume
+        # ~6.5 GB VRAM at idle, which causes WDDM preemption latency on the
+        # Windows host even with no connected clients. Models are loaded on
+        # demand when set_avatar() is first called.

     def create_vad(self) -> StreamingVAD:
         """Create a new StreamingVAD instance for a client session."""
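
The deferred load described in the comment above could look roughly like the sketch below. The other changed files are not shown here, so this is not the commit's actual `VideoEngine`: `LazyVideoEngine`, `ensure_loaded`, and the `loader` callback are hypothetical names, and the lock is an assumption added so that concurrent first calls load the models only once.

```python
import threading
from typing import Callable, Optional


class LazyVideoEngine:
    """Hypothetical stand-in for VideoEngine that loads models on first use."""

    def __init__(self, loader: Callable[[], None]) -> None:
        self._loader = loader          # expensive call, e.g. the real load_models()
        self._lock = threading.Lock()  # guards the one-time load
        self._loaded = False
        self.avatar: Optional[str] = None

    def ensure_loaded(self) -> None:
        # Fast path: skip the lock once the models are resident.
        if self._loaded:
            return
        with self._lock:
            # Re-check under the lock so only one thread runs the loader.
            if not self._loaded:
                self._loader()
                self._loaded = True

    def set_avatar(self, image_path: str) -> None:
        # The first call pays the load cost; later calls return immediately.
        self.ensure_loaded()
        self.avatar = image_path
```

With this shape, constructing the engine at startup allocates no VRAM, and the first `set_avatar()` call absorbs the load latency instead.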