diff --git a/Dockerfile b/Dockerfile
index d41be01..131ba56 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -23,7 +23,7 @@ WORKDIR /app
 
 # Install PyTorch 2.7+ with CUDA 12.8 support (includes Blackwell/sm_120 support)
 RUN python3.11 -m pip install --no-cache-dir \
-    torch torchvision \
+    torch \
     --index-url https://download.pytorch.org/whl/cu128
 
 # Install auto-gptq pre-built wheel for CUDA 12.8 (avoids compiling from source)
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..bb99cbd
--- /dev/null
+++ b/README.md
@@ -0,0 +1,61 @@
+# Voice Chat
+
+A real-time voice conversation app powered by local AI models. Speak into your mic and get spoken responses back — all running on your own GPU with no cloud APIs.
+
+## Pipeline
+
+**Mic input** &rarr; **VAD** (Silero ONNX) &rarr; **ASR** (Qwen3-ASR-0.6B) &rarr; **LLM** (Qwen3.5-0.8B) &rarr; **TTS** (Kokoro) &rarr; **Speaker output**
+
+- **VAD** — Silero VAD via ONNX Runtime, detects speech/silence boundaries on CPU
+- **ASR** — Qwen3-ASR-0.6B, bfloat16 on CUDA
+- **LLM** — Qwen3.5-0.8B, loaded via transformers
+- **TTS** — Kokoro, streams sentence-by-sentence audio at 24 kHz
+- **Barge-in** — interrupt the assistant mid-response by speaking
+
+## Requirements
+
+- NVIDIA GPU with CUDA 12.8 support
+- Docker + Docker Compose with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
+
+## Quick Start
+
+```bash
+docker compose up --build
+```
+
+Then open [http://localhost:8000](http://localhost:8000) in your browser.
+
+Models are downloaded from Hugging Face on first launch and cached in a Docker volume (`huggingface-cache`) so they persist across rebuilds.
+
+## Local Development (without Docker)
+
+```bash
+# Install PyTorch with CUDA 12.8
+pip install torch --index-url https://download.pytorch.org/whl/cu128
+
+# Install auto-gptq
+pip install "auto-gptq>=0.7.1" --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu128/
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Run
+python run.py
+```
+
+The server starts on port 8000.
+
+## Project Structure
+
+```
+server/
+  main.py       — FastAPI app, WebSocket endpoint
+  models.py     — Model loading and management
+  pipeline.py   — VAD -> ASR -> LLM -> TTS orchestration
+  vad.py        — Silero VAD (ONNX) streaming wrapper
+  asr.py        — Speech recognition engine
+  llm.py        — Language model engine
+  tts.py        — Kokoro TTS engine
+  audio_utils.py — PCM/float32 conversion helpers
+static/         — Browser UI (HTML/JS/CSS)
+```