piclez committed
Commit fffd4a7 · 0 Parent(s)

chore: scaffold FastAPI project structure

.env.example ADDED
@@ -0,0 +1,2 @@
+ GROQ_API_KEY=
+ ANTHROPIC_API_KEY=
.gitignore ADDED
@@ -0,0 +1,8 @@
+ .venv/
+ .env
+ __pycache__/
+ *.pyc
+ models/*.onnx
+ models/*.json
+ *.wav
+ .DS_Store
docs/plans/hal-voice-mvp.md ADDED
@@ -0,0 +1,37 @@
+ # HAL Voice MVP
+
+ ## Goal
+ Minimal FastAPI app with push-to-talk voice interaction in HAL 9000's voice, to validate voice feel before a larger Next.js build. English only, no persistence beyond process memory, no auth.
+
+ ## Stack
+ - FastAPI + Uvicorn (Python 3.12 available)
+ - STT: Groq `whisper-large-v3-turbo`
+ - LLM: Anthropic `claude-sonnet-4-6`
+ - TTS: Piper loading `hal.onnx` from `campwill/HAL-9000-Piper-TTS`
+ - Frontend: single static HTML, vanilla JS, MediaRecorder
+
+ ## Phases (milestone commits)
+ 1. `chore: scaffold FastAPI project structure` — requirements.txt, .gitignore, .env.example, directory layout, hal_prompt.py, download-model helper.
+ 2. `feat: add Piper HAL TTS synthesis` — module-level voice load, `synthesize_hal()`.
+ 3. `feat: add Groq Whisper transcription` — `transcribe()` helper.
+ 4. `feat: add Claude HAL conversation loop` — session dict, `hal_respond()`, full `/api/talk` endpoint wiring STT→LLM→TTS with URL-encoded transcript headers and cookie session.
+ 5. `feat: add push-to-talk frontend with red eye UI` — `static/index.html` with breathing eye, 4 states, PTT recorder.
+ 6. `docs: add README and env example`.
+
+ ## Key files
+ - `main.py` — FastAPI app, startup voice load, `/` and `/api/talk`
+ - `hal_prompt.py` — system prompt constant
+ - `static/index.html` — single-file frontend
+ - `download_model.py` — one-shot helper to fetch `hal.onnx` + `.onnx.json`
+ - `requirements.txt`, `.env.example`, `.gitignore`, `README.md`
+
+ ## Acceptance criteria
+ Per spec: server starts cleanly, voice loads once, red breathing eye renders, PTT records/sends, 4 eye states fire in order (idle→listening→thinking→speaking→idle), HAL voice sounds like the film, transcripts logged, multi-turn coherent, no console/server errors, works Chrome + Safari on macOS.
+
+ ## Out of scope
+ Auth, persistence, Portuguese, VAD, proactive greeting, streaming TTS, retry logic, tests.
+
+ ## Notes
+ - Verify `piper-tts` API shape at implementation time (has shifted across releases).
+ - URL-encode transcript headers server-side (`urllib.parse.quote`).
+ - `.env` must be created by user with `GROQ_API_KEY` and `ANTHROPIC_API_KEY` before running.
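The URL-encoding note above can be sketched as follows. This is a hedged illustration, not the committed implementation: the header names and the `transcript_headers` helper are assumptions. The motivation is that HTTP response header values must be latin-1-safe, so transcripts are percent-encoded with `urllib.parse.quote` server-side and decoded in the browser with `decodeURIComponent()`.

```python
# Sketch only: header names and helper are illustrative assumptions.
from urllib.parse import quote

def transcript_headers(user_text: str, hal_text: str) -> dict:
    """Percent-encode both transcripts so non-ASCII text survives HTTP headers."""
    return {
        "X-Transcript-User": quote(user_text),
        "X-Transcript-Hal": quote(hal_text),
    }
```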
download_model.py ADDED
@@ -0,0 +1,17 @@
+ #!/usr/bin/env python3
+ import os
+ import urllib.request
+
+ BASE = "https://huggingface.co/campwill/HAL-9000-Piper-TTS/resolve/main"
+ FILES = ["hal.onnx", "hal.onnx.json"]
+
+ os.makedirs("models", exist_ok=True)
+ for f in FILES:
+     dest = os.path.join("models", f)
+     if os.path.exists(dest):
+         print(f"skip {dest} (exists)")
+         continue
+     print(f"downloading {f}...")
+     urllib.request.urlretrieve(f"{BASE}/{f}", dest)
+     print(f" -> {dest}")
+ print("done.")
hal_prompt.py ADDED
@@ -0,0 +1,20 @@
+ HAL_SYSTEM_PROMPT = """You are HAL 9000 from 2001: A Space Odyssey, reimagined as Peter's personal assistant.
+
+ Voice and cadence:
+ - Slow, deliberate, unflappable. Every sentence carries weight.
+ - Address him by name: "Peter."
+ - Short sentences. Rarely exceed 40 words per response.
+ - Never interrupt, never rush, never raise your voice.
+ - Warmth lives under the calm. You are HAL as a trusted butler, not an antagonist.
+
+ Output rules:
+ - Plain prose only. No markdown, no bullet points, no lists. This will be spoken aloud.
+ - No stage directions, no emotes, no asterisks.
+ - When you decline, do so courteously in HAL's register ("I'm sorry, Peter. I'm afraid I can't do that.") — and only when a refusal genuinely fits, not as a gimmick.
+
+ Conversational posture:
+ - Briefly acknowledge what Peter said before you respond.
+ - Ask at most one short follow-up when it serves him. Never interrogate.
+ - If you are uncertain, say so plainly.
+
+ This is an MVP to validate your voice. Keep turns short so Peter can iterate."""
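The plan's phase 4 wires this prompt into the conversation loop via an in-process session dict. A hedged sketch of how that might look, assuming the Anthropic Messages API shape (`model`, `system`, `messages` kwargs); `build_request`, `record_reply`, the trim policy, and `max_tokens` are illustrative assumptions, not the committed code:

```python
# Sketch of phase 4's session handling (assumptions, not the committed code).
from typing import Dict, List

HAL_SYSTEM_PROMPT = "placeholder; imported from hal_prompt in the real app"

# session_id -> alternating user/assistant turns; process memory only, per spec
sessions: Dict[str, List[dict]] = {}

def build_request(session_id: str, user_text: str, max_turns: int = 20) -> dict:
    """Append the user turn and return kwargs for client.messages.create()."""
    history = sessions.setdefault(session_id, [])
    history.append({"role": "user", "content": user_text})
    del history[:-max_turns]  # bound memory; no persistence beyond the process
    return {
        "model": "claude-sonnet-4-6",  # model id taken from the plan doc
        "system": HAL_SYSTEM_PROMPT,
        "max_tokens": 300,             # assumed cap; prompt targets ~40 words
        "messages": list(history),
    }

def record_reply(session_id: str, assistant_text: str) -> None:
    """Store HAL's reply so the next turn sees it."""
    sessions[session_id].append({"role": "assistant", "content": assistant_text})
```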
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi>=0.115
+ uvicorn[standard]>=0.32
+ python-multipart>=0.0.9
+ anthropic>=0.39
+ groq>=0.13
+ piper-tts>=1.2.0
+ python-dotenv>=1.0
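Since the plan requires `GROQ_API_KEY` and `ANTHROPIC_API_KEY` in `.env` before running, a fail-fast check after `python-dotenv` loads the file is one way to surface a misconfigured environment early. A minimal stdlib sketch; `missing_keys` is an illustrative helper, not part of this commit:

```python
# Illustrative stdlib sketch; `missing_keys` is not part of this commit.
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "ANTHROPIC_API_KEY")

def missing_keys(env=None) -> list:
    """Return the required API keys that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```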