Enhance video handling and performance optimizations
- Added environment variables to prevent CPU thread pools from busy-waiting.
- Deferred loading of video models until first use to reduce VRAM footprint.
- Implemented streaming of speaking clips for improved responsiveness.
- Introduced a queue for managing speaking clips to handle multiple requests smoothly.
- Updated video playback logic to ensure proper handling of clip generation.
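
The first bullet does not name the variables it sets. As a rough sketch only, assuming an OpenMP-backed thread pool, the usual knobs are `OMP_WAIT_POLICY` and `KMP_BLOCKTIME`. Both are real OpenMP and Intel runtime settings, but their use in this commit is an assumption, and `apply_passive_wait_env` is a hypothetical helper name.

```python
import os

# Assumed variable set; the commit message does not list the names it uses.
PASSIVE_WAIT_ENV = {
    "OMP_WAIT_POLICY": "PASSIVE",  # OpenMP: idle worker threads sleep instead of spinning
    "KMP_BLOCKTIME": "0",          # Intel OpenMP runtime: sleep right after a parallel region
}


def apply_passive_wait_env(environ=os.environ):
    """Set each variable only if the user has not already chosen a value."""
    for name, value in PASSIVE_WAIT_ENV.items():
        environ.setdefault(name, value)
    return environ


# Must run before importing libraries that start their thread pools at import time.
apply_passive_wait_env()
```

Using `setdefault` keeps any value the operator exported explicitly, so the defaults never override a deliberate choice.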
+5
-3
@@ -118,11 +118,13 @@ class ModelManager:
             log.info("Video engine disabled (config.video.enabled=false). Skipping load.")
             return

-        log.info("Loading avatar video engine...")
+        log.info("Video engine configured (models load on first avatar upload).")
         cfg = VideoConfig.from_dict(video_cfg_raw)
         self.video_engine = VideoEngine(cfg)
-        self.video_engine.load_models()
-        log.info("Avatar video engine loaded (mode=%s).", cfg.mode)
+        # load_models() is intentionally deferred: Wan2.2 + MuseTalk consume
+        # ~6.5 GB VRAM at idle, which causes WDDM preemption latency on the
+        # Windows host even with no connected clients. Models are loaded on
+        # demand when set_avatar() is first called.

     def create_vad(self) -> StreamingVAD:
         """Create a new StreamingVAD instance for a client session."""
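
The hunk above only removes the eager `load_models()` call; the on-demand side lives in `set_avatar()`, which this diff does not show. A minimal sketch of how such a deferred load might look, assuming a lock-guarded one-time load: `LazyVideoEngine`, `ensure_loaded`, and the `loader` callable are hypothetical names for illustration, not the project's actual `VideoEngine` API.

```python
import threading


class LazyVideoEngine:
    """Hypothetical stand-in for VideoEngine; illustrates the deferral only."""

    def __init__(self, loader):
        self._loader = loader          # callable that performs the expensive model load
        self._lock = threading.Lock()  # prevents two sessions from loading concurrently
        self._loaded = False
        self.avatar = None

    def ensure_loaded(self):
        """Run the loader exactly once, on first use."""
        if self._loaded:
            return
        with self._lock:
            if not self._loaded:  # re-check after acquiring the lock
                self._loader()
                self._loaded = True

    def set_avatar(self, image_path):
        # First call pays the load cost; later calls skip straight to the update.
        self.ensure_loaded()
        self.avatar = image_path
        return self.avatar
```

With this shape, constructing the engine allocates no VRAM, and the load cost moves to the first avatar upload.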