# Voice Chat

A real-time voice conversation app powered by local AI models. Speak into your mic and get spoken responses back, all running on your own GPU with no cloud APIs.

## Pipeline

**Mic input** → **VAD** (Silero ONNX) → **ASR** (Qwen3-ASR-0.6B) → **LLM** (Qwen3.5-0.8B) → **TTS** (Kokoro) → **Speaker output**

- **VAD**: Silero VAD via ONNX Runtime; detects speech/silence boundaries on CPU
- **ASR**: Qwen3-ASR-0.6B, bfloat16 on CUDA
- **LLM**: Qwen3.5-0.8B, loaded via transformers
- **TTS**: Kokoro, streams audio sentence by sentence at 24 kHz
- **Barge-in**: interrupt the assistant mid-response by speaking

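The sentence-by-sentence TTS streaming listed above can be pictured with a small sketch. This is not the project's actual code: `iter_sentences` is a hypothetical helper, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is an assumed heuristic. It shows how text arriving in chunks from an LLM could be cut into sentences early enough to start synthesis before the full reply is finished.

```python
import re
from typing import Iterable, Iterator

# Assumed heuristic: a sentence ends at ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be handed to the TTS engine while the LLM keeps generating, which is what keeps time-to-first-audio low in a design like this.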
## Requirements

- NVIDIA GPU with CUDA 12.8 support
- Docker + Docker Compose with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

## Quick Start

```bash
docker compose up --build
```

Then open [http://localhost:8000](http://localhost:8000) in your browser.

Models are downloaded from Hugging Face on first launch and cached in a Docker volume (`huggingface-cache`) so they persist across rebuilds.

## Local Development (without Docker)

```bash
# Install PyTorch with CUDA 12.8
pip install torch --index-url https://download.pytorch.org/whl/cu128

# Install auto-gptq
pip install "auto-gptq>=0.7.1" --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu128/

# Install dependencies
pip install -r requirements.txt

# Run
python run.py
```

The server starts on port 8000.

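If you script against the server, it can help to wait until the port accepts connections before opening the browser or a client. The sketch below is an assumption-laden convenience, not part of the project: `wait_for_port` is a hypothetical helper, and it only checks that something is listening on the TCP port, not that the models have finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection succeeds, or False when the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or refused); retry after a short pause.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port()` with the defaults polls `localhost:8000` for up to a minute.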
## Project Structure

```
server/
main.py         # FastAPI app, WebSocket endpoint
models.py       # Model loading and management
pipeline.py     # VAD -> ASR -> LLM -> TTS orchestration
vad.py          # Silero VAD (ONNX) streaming wrapper
asr.py          # Speech recognition engine
llm.py          # Language model engine
tts.py          # Kokoro TTS engine
audio_utils.py  # PCM/float32 conversion helpers
static/         # Browser UI (HTML/JS/CSS)
```
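To illustrate the kind of PCM/float conversion that `audio_utils.py` is described as providing, here is a minimal standard-library sketch. The function names and the 16-bit little-endian mono format are assumptions, not the module's confirmed API, and the real helpers most likely operate on NumPy float32 arrays rather than Python lists.

```python
import struct
from typing import Iterable, List


def pcm16_to_float(pcm: bytes) -> List[float]:
    """Decode 16-bit little-endian PCM bytes into floats in [-1.0, 1.0)."""
    count = len(pcm) // 2  # ignore a trailing odd byte, if any
    ints = struct.unpack(f"<{count}h", pcm[: count * 2])
    return [i / 32768.0 for i in ints]


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Encode floats as 16-bit little-endian PCM, clipping to [-1.0, 1.0]."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Dividing by 32768 on decode and multiplying by 32767 on encode is one common convention; it keeps both directions inside the int16 range at the cost of a tiny asymmetry.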