Commit fffd4a7
Parent(s):
chore: scaffold FastAPI project structure

Files changed:
- .env.example                 +2  -0
- .gitignore                   +8  -0
- docs/plans/hal-voice-mvp.md  +37 -0
- download_model.py            +17 -0
- hal_prompt.py                +20 -0
- requirements.txt             +7  -0
.env.example ADDED

@@ -0,0 +1,2 @@
+GROQ_API_KEY=
+ANTHROPIC_API_KEY=
.gitignore ADDED

@@ -0,0 +1,8 @@
+.venv/
+.env
+__pycache__/
+*.pyc
+models/*.onnx
+models/*.json
+*.wav
+.DS_Store
docs/plans/hal-voice-mvp.md ADDED

@@ -0,0 +1,37 @@
+# HAL Voice MVP
+
+## Goal
+Minimal FastAPI app with push-to-talk voice interaction in HAL 9000's voice, to validate voice feel before a larger Next.js build. English only, no persistence beyond process memory, no auth.
+
+## Stack
+- FastAPI + Uvicorn (Python 3.12 available)
+- STT: Groq `whisper-large-v3-turbo`
+- LLM: Anthropic `claude-sonnet-4-6`
+- TTS: Piper loading `hal.onnx` from `campwill/HAL-9000-Piper-TTS`
+- Frontend: single static HTML, vanilla JS, MediaRecorder
+
+## Phases (milestone commits)
+1. `chore: scaffold FastAPI project structure` — requirements.txt, .gitignore, .env.example, directory layout, hal_prompt.py, download-model helper.
+2. `feat: add Piper HAL TTS synthesis` — module-level voice load, `synthesize_hal()`.
+3. `feat: add Groq Whisper transcription` — `transcribe()` helper.
+4. `feat: add Claude HAL conversation loop` — session dict, `hal_respond()`, full `/api/talk` endpoint wiring STT→LLM→TTS with URL-encoded transcript headers and cookie session.
+5. `feat: add push-to-talk frontend with red eye UI` — `static/index.html` with breathing eye, 4 states, PTT recorder.
+6. `docs: add README and env example`.
+
+## Key files
+- `main.py` — FastAPI app, startup voice load, `/` and `/api/talk`
+- `hal_prompt.py` — system prompt constant
+- `static/index.html` — single-file frontend
+- `download_model.py` — one-shot helper to fetch `hal.onnx` + `.onnx.json`
+- `requirements.txt`, `.env.example`, `.gitignore`, `README.md`
+
+## Acceptance criteria
+Per spec: server starts cleanly, voice loads once, red breathing eye renders, PTT records/sends, 4 eye states fire in order (idle→listening→thinking→speaking→idle), HAL voice sounds like the film, transcripts logged, multi-turn coherent, no console/server errors, works in Chrome and Safari on macOS.
+
+## Out of scope
+Auth, persistence, Portuguese, VAD, proactive greeting, streaming TTS, retry logic, tests.
+
+## Notes
+- Verify `piper-tts` API shape at implementation time (it has shifted across releases).
+- URL-encode transcript headers server-side (`urllib.parse.quote`).
+- `.env` must be created by the user with `GROQ_API_KEY` and `ANTHROPIC_API_KEY` before running.
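The plan's note about URL-encoding transcript headers exists because HTTP header values must be latin-1-safe, so a raw UTF-8 transcript can make the response fail. A minimal sketch of the idea (the helper name is an assumption, not part of this commit):

```python
from urllib.parse import quote, unquote

def encode_transcript_header(text: str) -> str:
    """Percent-encode a transcript so it is safe to place in an HTTP header.

    quote() maps arbitrary UTF-8 text onto plain ASCII, which always
    satisfies the latin-1 constraint on header values.
    """
    return quote(text, safe="")

# The browser decodes with decodeURIComponent(); the Python inverse:
encoded = encode_transcript_header("I'm sorry, Dave. I'm afraid I can't do that.")
assert unquote(encoded) == "I'm sorry, Dave. I'm afraid I can't do that."
```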
download_model.py ADDED

@@ -0,0 +1,17 @@
+#!/usr/bin/env python3
+import os
+import urllib.request
+
+BASE = "https://huggingface.co/campwill/HAL-9000-Piper-TTS/resolve/main"
+FILES = ["hal.onnx", "hal.onnx.json"]
+
+os.makedirs("models", exist_ok=True)
+for f in FILES:
+    dest = os.path.join("models", f)
+    if os.path.exists(dest):
+        print(f"skip {dest} (exists)")
+        continue
+    print(f"downloading {f}...")
+    urllib.request.urlretrieve(f"{BASE}/{f}", dest)
+    print(f" -> {dest}")
+print("done.")
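Since the model files are gitignored and only fetched by this one-shot helper, the server can fail fast if it has not been run. A hypothetical startup check (the function name and its use in `main.py` are assumptions, not part of this commit):

```python
import os

MODEL_FILES = ["models/hal.onnx", "models/hal.onnx.json"]

def missing_model_files(paths=MODEL_FILES) -> list[str]:
    """Return the required model files that do not exist yet."""
    return [p for p in paths if not os.path.exists(p)]

# In main.py's startup hook, before loading the Piper voice, one could do:
#   if missing := missing_model_files():
#       raise RuntimeError(
#           f"run `python download_model.py` first; missing: {missing}")
```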
hal_prompt.py ADDED

@@ -0,0 +1,20 @@
+HAL_SYSTEM_PROMPT = """You are HAL 9000 from 2001: A Space Odyssey, reimagined as Peter's personal assistant.
+
+Voice and cadence:
+- Slow, deliberate, unflappable. Every sentence carries weight.
+- Address him by name: "Peter."
+- Short sentences. Rarely exceed 40 words per response.
+- Never interrupt, never rush, never raise your voice.
+- Warmth lives under the calm. You are HAL as a trusted butler, not an antagonist.
+
+Output rules:
+- Plain prose only. No markdown, no bullet points, no lists. This will be spoken aloud.
+- No stage directions, no emotes, no asterisks.
+- When you decline, do so courteously in HAL's register ("I'm sorry, Peter. I'm afraid I can't do that.") — and only when a refusal genuinely fits, not as a gimmick.
+
+Conversational posture:
+- Briefly acknowledge what Peter said before you respond.
+- Ask at most one short follow-up when it serves him. Never interrogate.
+- If you are uncertain, say so plainly.
+
+This is an MVP to validate your voice. Keep turns short so Peter can iterate."""
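This prompt is consumed in phase 4, where the plan stores multi-turn history in a session dict keyed by a cookie. A minimal sketch of that bookkeeping (the names `sessions`, `append_turn`, and the `MAX_TURNS` cap are assumptions; the actual Anthropic call, which would pass `HAL_SYSTEM_PROMPT` as the `system` parameter, is omitted):

```python
# In-memory session store: session_id -> list of Anthropic-style message
# dicts. Nothing persists beyond process memory, per the plan's Goal.
sessions: dict[str, list[dict]] = {}

MAX_TURNS = 20  # assumed cap to keep per-session history bounded

def append_turn(session_id: str, user_text: str, assistant_text: str) -> list[dict]:
    """Record one user/assistant exchange and return the trimmed history."""
    history = sessions.setdefault(session_id, [])
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    # Keep only the most recent MAX_TURNS exchanges (2 messages per turn).
    del history[: max(0, len(history) - 2 * MAX_TURNS)]
    return history
```

`hal_respond()` would then send this list as `messages` on each turn, so the conversation stays coherent across turns without any database.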
requirements.txt ADDED

@@ -0,0 +1,7 @@
+fastapi>=0.115
+uvicorn[standard]>=0.32
+python-multipart>=0.0.9
+anthropic>=0.39
+groq>=0.13
+piper-tts>=1.2.0
+python-dotenv>=1.0