Enhance video handling and performance optimizations

- Added environment variables to prevent CPU thread pools from busy-waiting.
- Deferred loading of video models until first use to reduce VRAM footprint.
- Implemented streaming of speaking clips for improved responsiveness.
- Introduced a queue for managing speaking clips to handle multiple requests smoothly.
- Updated video playback logic to ensure proper handling of clip generation.
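
The deferred-loading item can be illustrated with a minimal sketch. This is not the code from this commit: `LazyVideoEngine`, `_ensure_loaded`, and `load_count` are hypothetical names, and the only detail taken from the diff is that models load when `set_avatar()` is first called. The lock is an assumption in case several client sessions hit the first call at once.

```python
import threading


class LazyVideoEngine:
    """Hypothetical wrapper that defers an expensive model load until first use."""

    def __init__(self, loader):
        self._loader = loader          # callable that performs the expensive load
        self._lock = threading.Lock()  # guards against concurrent first calls
        self._loaded = False
        self.load_count = 0            # illustration only: counts real loads
        self.avatar = None

    def _ensure_loaded(self):
        # Fast path: skip the lock once the models are resident.
        if self._loaded:
            return
        with self._lock:
            # Re-check inside the lock so only one thread runs the loader.
            if not self._loaded:
                self._loader()
                self.load_count += 1
                self._loaded = True

    def set_avatar(self, image_path):
        # The first call pays the load cost; later calls reuse the loaded models.
        self._ensure_loaded()
        self.avatar = image_path
```

Constructing the wrapper allocates nothing on the GPU, so an idle server with no avatar uploaded keeps its VRAM free; the trade-off is that the first avatar upload absorbs the full model load time.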
2026-04-24 00:36:18 -04:00
parent 129df7d1fa
commit 44a10667c2
7 changed files with 234 additions and 69 deletions
@@ -118,11 +118,13 @@ class ModelManager:
            log.info("Video engine disabled (config.video.enabled=false). Skipping load.")
            return

-        log.info("Loading avatar video engine...")
+        log.info("Video engine configured (models load on first avatar upload).")
        cfg = VideoConfig.from_dict(video_cfg_raw)
        self.video_engine = VideoEngine(cfg)
-        self.video_engine.load_models()
-        log.info("Avatar video engine loaded (mode=%s).", cfg.mode)
+        # load_models() is intentionally deferred: Wan2.2 + MuseTalk consume
+        # ~6.5 GB VRAM at idle, which causes WDDM preemption latency on the
+        # Windows host even with no connected clients. Models are loaded on
+        # demand when set_avatar() is first called.

    def create_vad(self) -> StreamingVAD:
        """Create a new StreamingVAD instance for a client session."""