Commit fffd4a7
Parent(s):
chore: scaffold FastAPI project structure

Files changed:
- .env.example                 +2  -0
- .gitignore                   +8  -0
- docs/plans/hal-voice-mvp.md  +37 -0
- download_model.py            +17 -0
- hal_prompt.py                +20 -0
- requirements.txt             +7  -0
.env.example ADDED

@@ -0,0 +1,2 @@
+GROQ_API_KEY=
+ANTHROPIC_API_KEY=
.gitignore ADDED

@@ -0,0 +1,8 @@
+.venv/
+.env
+__pycache__/
+*.pyc
+models/*.onnx
+models/*.json
+*.wav
+.DS_Store
docs/plans/hal-voice-mvp.md ADDED

@@ -0,0 +1,37 @@
+# HAL Voice MVP
+
+## Goal
+Minimal FastAPI app with push-to-talk voice interaction in HAL 9000's voice, to validate voice feel before a larger Next.js build. English only, no persistence beyond process memory, no auth.
+
+## Stack
+- FastAPI + Uvicorn (Python 3.12 available)
+- STT: Groq `whisper-large-v3-turbo`
+- LLM: Anthropic `claude-sonnet-4-6`
+- TTS: Piper loading `hal.onnx` from `campwill/HAL-9000-Piper-TTS`
+- Frontend: single static HTML, vanilla JS, MediaRecorder
+
+## Phases (milestone commits)
+1. `chore: scaffold FastAPI project structure` — requirements.txt, .gitignore, .env.example, directory layout, hal_prompt.py, download-model helper.
+2. `feat: add Piper HAL TTS synthesis` — module-level voice load, `synthesize_hal()`.
+3. `feat: add Groq Whisper transcription` — `transcribe()` helper.
+4. `feat: add Claude HAL conversation loop` — session dict, `hal_respond()`, full `/api/talk` endpoint wiring STT→LLM→TTS with URL-encoded transcript headers and cookie session.
+5. `feat: add push-to-talk frontend with red eye UI` — `static/index.html` with breathing eye, 4 states, PTT recorder.
+6. `docs: add README and env example`.
+
+## Key files
+- `main.py` — FastAPI app, startup voice load, `/` and `/api/talk`
+- `hal_prompt.py` — system prompt constant
+- `static/index.html` — single-file frontend
+- `download_model.py` — one-shot helper to fetch `hal.onnx` + `.onnx.json`
+- `requirements.txt`, `.env.example`, `.gitignore`, `README.md`
+
+## Acceptance criteria
+Per spec: server starts cleanly, voice loads once, red breathing eye renders, PTT records/sends, 4 eye states fire in order (idle→listening→thinking→speaking→idle), HAL voice sounds like the film, transcripts logged, multi-turn coherent, no console/server errors, works in Chrome and Safari on macOS.
+
+## Out of scope
+Auth, persistence, Portuguese, VAD, proactive greeting, streaming TTS, retry logic, tests.
+
+## Notes
+- Verify `piper-tts` API shape at implementation time (it has shifted across releases).
+- URL-encode transcript headers server-side (`urllib.parse.quote`).
+- `.env` must be created by the user with `GROQ_API_KEY` and `ANTHROPIC_API_KEY` before running.
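The plan's note about URL-encoding transcript headers exists because HTTP header values must be latin-1-safe, so a raw UTF-8 transcript can make the response fail. A minimal sketch of the idea (the helper name is an assumption, not part of this commit):

```python
from urllib.parse import quote, unquote

def encode_transcript_header(text: str) -> str:
    """Percent-encode a transcript so it is safe to place in an HTTP header.

    quote() maps arbitrary UTF-8 text onto plain ASCII, which always
    satisfies the latin-1 constraint on header values.
    """
    return quote(text, safe="")

# The browser decodes with decodeURIComponent(); the Python inverse:
encoded = encode_transcript_header("I'm sorry, Dave. I'm afraid I can't do that.")
assert unquote(encoded) == "I'm sorry, Dave. I'm afraid I can't do that."
```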
download_model.py ADDED

@@ -0,0 +1,17 @@
+#!/usr/bin/env python3
+import os
+import urllib.request
+
+BASE = "https://huggingface.co/campwill/HAL-9000-Piper-TTS/resolve/main"
+FILES = ["hal.onnx", "hal.onnx.json"]
+
+os.makedirs("models", exist_ok=True)
+for f in FILES:
+    dest = os.path.join("models", f)
+    if os.path.exists(dest):
+        print(f"skip {dest} (exists)")
+        continue
+    print(f"downloading {f}...")
+    urllib.request.urlretrieve(f"{BASE}/{f}", dest)
+    print(f" -> {dest}")
+print("done.")
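Since the model files are gitignored and only fetched by this one-shot helper, the server can fail fast if it has not been run. A hypothetical startup check (the function name and its use in `main.py` are assumptions, not part of this commit):

```python
import os

MODEL_FILES = ["models/hal.onnx", "models/hal.onnx.json"]

def missing_model_files(paths=MODEL_FILES) -> list[str]:
    """Return the required model files that do not exist yet."""
    return [p for p in paths if not os.path.exists(p)]

# In main.py's startup hook, before loading the Piper voice, one could do:
#   if missing := missing_model_files():
#       raise RuntimeError(
#           f"run `python download_model.py` first; missing: {missing}")
```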
hal_prompt.py ADDED

@@ -0,0 +1,20 @@
+HAL_SYSTEM_PROMPT = """You are HAL 9000 from 2001: A Space Odyssey, reimagined as Peter's personal assistant.
+
+Voice and cadence:
+- Slow, deliberate, unflappable. Every sentence carries weight.
+- Address him by name: "Peter."
+- Short sentences. Rarely exceed 40 words per response.
+- Never interrupt, never rush, never raise your voice.
+- Warmth lives under the calm. You are HAL as a trusted butler, not an antagonist.
+
+Output rules:
+- Plain prose only. No markdown, no bullet points, no lists. This will be spoken aloud.
+- No stage directions, no emotes, no asterisks.
+- When you decline, do so courteously in HAL's register ("I'm sorry, Peter. I'm afraid I can't do that.") — and only when a refusal genuinely fits, not as a gimmick.
+
+Conversational posture:
+- Briefly acknowledge what Peter said before you respond.
+- Ask at most one short follow-up when it serves him. Never interrogate.
+- If you are uncertain, say so plainly.
+
+This is an MVP to validate your voice. Keep turns short so Peter can iterate."""
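This prompt is consumed in phase 4, where the plan stores multi-turn history in a session dict keyed by a cookie. A minimal sketch of that bookkeeping (the names `sessions`, `append_turn`, and the `MAX_TURNS` cap are assumptions; the actual Anthropic call, which would pass `HAL_SYSTEM_PROMPT` as the `system` parameter, is omitted):

```python
# In-memory session store: session_id -> list of Anthropic-style message
# dicts. Nothing persists beyond process memory, per the plan's Goal.
sessions: dict[str, list[dict]] = {}

MAX_TURNS = 20  # assumed cap to keep per-session history bounded

def append_turn(session_id: str, user_text: str, assistant_text: str) -> list[dict]:
    """Record one user/assistant exchange and return the trimmed history."""
    history = sessions.setdefault(session_id, [])
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    # Keep only the most recent MAX_TURNS exchanges (2 messages per turn).
    del history[: max(0, len(history) - 2 * MAX_TURNS)]
    return history
```

`hal_respond()` would then send this list as `messages` on each turn, so the conversation stays coherent across turns without any database.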
requirements.txt ADDED

@@ -0,0 +1,7 @@
+fastapi>=0.115
+uvicorn[standard]>=0.32
+python-multipart>=0.0.9
+anthropic>=0.39
+groq>=0.13
+piper-tts>=1.2.0
+python-dotenv>=1.0