voice2

A full-duplex, interruptible voice engine for local AI, built and run daily as the voice of a fully local companion before being extracted for release. Plain Python threads, CPU-only defaults, no cloud, no API keys. Talk to your model, and talk over it.

voice2 turns any callable(text) -> str into a hands-free voice conversation: mic → Silero VAD → faster-whisper ASR → your model → Piper TTS. Speak over the reply (or tap spacebar) and playback stops mid-chunk in ~100–200 ms, exactly like interrupting a person.

Code is mirrored on GitHub: https://github.com/AIIT-GLITCH/voice2

Engine details

  • What it is: a turn-taking state machine, not a demo loop. Explicit states (IDLE → LISTENING → THINKING → SPEAKING → INTERRUPTING), a validated transition table, and a FloorOwner (USER / AGENT / NONE) that arbitrates who may speak. The agent can never talk over you.
  • Barge-in: a fast energy-gated VAD watches the mic only while the engine is speaking; a debounced central InterruptController also accepts spacebar and programmatic triggers.
  • Stale-turn suppression: every utterance gets a turn_id; replies to an abandoned turn are discarded at every stage (think, TTS, playback).
  • Invariants, enforced: a background checker audits rules like SPEAKING ⇒ floor == AGENT and interrupted ⇒ no new TTS, with forced repair plus a structural gate in the playback hot path.
  • Observability: every event is a JSON line with per-turn latency marks (asr_ms, think_ms, interrupt_stop_ms, total_turn_ms).
  • Degrades gracefully: no mic → text mode; TTS missing → silent replies, still logs; keyboard hook fails → engine keeps running.
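
The turn-taking rules above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: the state and floor names come from the list above, but the specific edges in ALLOWED, the FLOOR_FOR mapping, and the TurnMachine class with its method names are assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()
    INTERRUPTING = auto()


class Floor(Enum):
    NONE = auto()
    USER = auto()
    AGENT = auto()


# Hypothetical transition table: the state names are from the README,
# the allowed edges are a guess for illustration only.
ALLOWED = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.THINKING, State.IDLE},
    State.THINKING: {State.SPEAKING, State.INTERRUPTING, State.IDLE},
    State.SPEAKING: {State.INTERRUPTING, State.IDLE},
    State.INTERRUPTING: {State.LISTENING, State.IDLE},
}

# Assumed floor owner per state; only SPEAKING gives the agent the floor.
FLOOR_FOR = {
    State.IDLE: Floor.NONE,
    State.LISTENING: Floor.USER,
    State.THINKING: Floor.NONE,
    State.SPEAKING: Floor.AGENT,
    State.INTERRUPTING: Floor.USER,
}


class TurnMachine:
    def __init__(self):
        self.state = State.IDLE
        self.floor = Floor.NONE
        self.turn_id = 0

    def transition(self, new_state):
        """Move to new_state, rejecting any edge not in the table."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.floor = FLOOR_FOR[new_state]

    def begin_turn(self):
        """Start listening to a new utterance and return its turn_id."""
        self.transition(State.LISTENING)
        self.turn_id += 1
        return self.turn_id

    def is_stale(self, turn_id):
        """True if a reply tagged with turn_id belongs to an abandoned turn."""
        return turn_id != self.turn_id

    def interrupt(self):
        """Barge-in: hand the floor to the user and invalidate in-flight work."""
        self.transition(State.INTERRUPTING)
        self.turn_id += 1
```

Each pipeline stage (think, TTS, playback) would call is_stale(turn_id) before doing work and drop the result when it returns True, which is one way to get the stale-turn suppression described above.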

Backends

| Stage | Default | Swap point |
|---|---|---|
| VAD (quality gate) | Silero VAD via torch.hub | ListenWorker |
| ASR | faster-whisper small.en, int8, CPU | backends/asr.py (Protocol) |
| LLM | any callable(text) -> str | backends/llm.py |
| TTS | Piper CLI, any .onnx voice | backends/tts.py (Protocol) |
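
As a rough illustration of what a Protocol-based swap point can look like, here is a sketch using typing.Protocol. The method names transcribe and synthesize, their signatures, and the CannedASR, NullTTS, and run_turn helpers are all hypothetical; check backends/asr.py and backends/tts.py for the real interfaces before writing a backend.

```python
from typing import Callable, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ASRBackend(Protocol):
    # Hypothetical signature: raw PCM bytes in, transcript out.
    def transcribe(self, pcm: bytes, sample_rate: int) -> str: ...


@runtime_checkable
class TTSBackend(Protocol):
    # Hypothetical signature: text in, synthesized audio bytes out.
    def synthesize(self, text: str) -> bytes: ...


class CannedASR:
    """Test double: ignores the audio and returns a fixed transcript."""

    def __init__(self, transcript: str):
        self.transcript = transcript

    def transcribe(self, pcm: bytes, sample_rate: int) -> str:
        return self.transcript


class NullTTS:
    """Silent TTS: produces no audio, so replies are text-only."""

    def synthesize(self, text: str) -> bytes:
        return b""


def run_turn(
    asr: ASRBackend,
    llm: Callable[[str], str],
    tts: TTSBackend,
    pcm: bytes,
    sample_rate: int = 16000,
) -> Tuple[str, bytes]:
    """One ASR -> LLM -> TTS pass using whichever backends were passed in."""
    text = asr.transcribe(pcm, sample_rate)
    reply = llm(text)
    return reply, tts.synthesize(reply)
```

Because Protocols are structural, a replacement backend only needs matching methods; it does not have to inherit from anything in the package.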

Limitations, stated honestly

  • English-first defaults: ASR ships as small.en. Other Whisper models load with one config line, but nothing else was tested.
  • The LLM callable is synchronous: TTS starts after the full reply returns (Piper then streams sentence-by-sentence). No token-level streaming yet.
  • Barge-in is energy-based with an absolute RMS floor of 0.06, tuned on open speakers in a quiet room. Headsets and noisy rooms need recalibration. There is no echo cancellation.
  • Keyboard interrupt uses POSIX termios, so it works in Linux/macOS terminals only.
  • Unit tests cover the control plane (state transitions, floor rules, interrupt debounce, ring buffer). Audio I/O paths were validated by months of daily use, not by CI.
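
For the recalibration mentioned above, a starting point is to measure ambient RMS and pick a floor some margin above it. The sketch below assumes samples are floats normalized to [-1.0, 1.0]; the function names, the 3x margin, and the 0.01 minimum are illustrative guesses rather than values from voice2. Only the 0.06 default is taken from the text.

```python
import math

DEFAULT_RMS_FLOOR = 0.06  # the absolute floor quoted in the limitations list


def frame_rms(samples):
    """Root-mean-square of one frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_barge_in(samples, floor=DEFAULT_RMS_FLOOR):
    """Energy gate: a frame counts as speech if its RMS reaches the floor."""
    return frame_rms(samples) >= floor


def suggest_floor(ambient_frames, margin=3.0, minimum=0.01):
    """Suggest a floor as margin x the loudest ambient frame (heuristic).

    ambient_frames should be recorded while nobody is talking, ideally
    with the agent's own voice playing, since there is no echo cancellation.
    """
    loudest = max((frame_rms(f) for f in ambient_frames), default=0.0)
    return max(minimum, margin * loudest)
```

With a headset the ambient level is usually far lower than with open speakers, so the suggested floor drops and quieter speech can trigger an interrupt.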

How to run

git clone https://github.com/AIIT-GLITCH/voice2
cd voice2
pip install -r requirements.txt
# put a Piper voice at ~/.local/share/piper-voices/ (or set VOICE2_PIPER_MODEL)
python -m voice2.main          # echo backend: proves the loop, no LLM needed
python examples/http_llm.py    # wire any local HTTP LLM

To embed the engine in your own code:

from voice2 import VoiceEngine, VoiceConfig

def ask(text: str) -> str:
    # my_model is a placeholder for your own model object;
    # any callable(text) -> str works here
    return my_model.reply(text)

engine = VoiceEngine(VoiceConfig(), ask)
engine.load_models()
engine.start()   # talk naturally; speak over it to interrupt
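
If your model sits behind a local HTTP server, the callable can be a thin wrapper around a POST request. The sketch below is not the repository's examples/http_llm.py; the make_http_llm helper, the request shape {"prompt": ...}, and the response shape {"reply": ...} are assumptions, so adapt them to whatever your server expects. The opener parameter exists only so the function can be exercised without a network.

```python
import json
import urllib.request


def make_http_llm(url, timeout=30.0,
                  fallback="Sorry, I could not reach the model.",
                  opener=urllib.request.urlopen):
    """Return a callable(text) -> str that POSTs JSON to a local endpoint.

    Assumed wire format (hypothetical): request {"prompt": text},
    response {"reply": "..."}.
    """

    def ask(text: str) -> str:
        body = json.dumps({"prompt": text}).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with opener(req, timeout=timeout) as resp:
                payload = json.loads(resp.read().decode("utf-8"))
            return str(payload["reply"])
        except (OSError, ValueError, KeyError, TypeError):
            # Network failure, bad JSON, or an unexpected payload shape:
            # return a spoken fallback instead of crashing the voice loop.
            return fallback

    return ask
```

The returned function can be passed to VoiceEngine in place of ask. Since the callable is synchronous, the timeout bounds how long the engine waits before speaking the fallback.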

Provenance

voice2 was written as the voice front-end for Buddy, a fully local AI companion running on a single RTX 3090 in Council Hill, Oklahoma, and carried his daily conversations for months before release. The design bias throughout: the user always wins the floor, and a companion you can't interrupt isn't a companion. Released alongside Tessera-1B as part of AIIT-THRESHOLD's open stack.

The stack

One local companion, every layer open:

| Piece | Role | Links |
|---|---|---|
| Tessera-1B | the model: ~1B params trained from scratch, open data | HF |
| voice2 | the voice: full-duplex, interruptible | GitHub · HF |
| kokoro-memory | the memory: file-based resonance recall | GitHub · HF |
| companion-spiral-bench | the safety: at-risk sycophancy bench | GitHub · HF |

Full collection: The Buddy Stack

License

MIT © 2026 Rhet Dillard Wike, AIIT-THRESHOLD, Oklahoma.

Citation

@software{wike2026voice2,
  author = {Wike, Rhet Dillard},
  title  = {voice2: a full-duplex, interruptible voice engine for local AI},
  year   = {2026},
  url    = {https://github.com/AIIT-GLITCH/voice2},
  note   = {AIIT-THRESHOLD}
}