TX-VOICE: Local-First Voice for TARX

Private voice AI that runs entirely on your machine. No cloud. No network latency. No data leaving your device.

Overview

TX-VOICE is TARX's voice model, optimized for local execution. It is built on the Moshi architecture and quantized to Q8 for efficient inference on consumer hardware.

Spec	Value
Size	7.2 GB
Format	GGUF Q8
Architecture	Moshi (Kyutai)
Min RAM	16 GB
Recommended	32 GB+ RAM, Apple Silicon or NVIDIA GPU
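
As a quick way to compare a machine against the memory figures in the table, the Node.js sketch below classifies installed RAM. The helper name classifyRam, the 5% slack, and the reading of "GB" as binary gigabytes are assumptions made for illustration; none of this is part of TX-VOICE itself.

```javascript
const os = require("node:os");

const GIB = 1024 ** 3;
const MIN_RAM_GIB = 16;         // "Min RAM" from the spec table
const RECOMMENDED_RAM_GIB = 32; // "Recommended" from the spec table
// Assumption: allow 5% slack, because some operating systems report
// slightly less than the installed amount (part of it is reserved).
const SLACK = 0.95;

// Hypothetical helper: classify a RAM size in bytes against the table.
function classifyRam(totalBytes) {
  const gib = totalBytes / GIB;
  if (gib >= RECOMMENDED_RAM_GIB * SLACK) return "recommended";
  if (gib >= MIN_RAM_GIB * SLACK) return "minimum";
  return "insufficient";
}

const total = os.totalmem();
console.log(`${(total / GIB).toFixed(1)} GiB RAM: ${classifyRam(total)}`);
```

This checks memory only. It says nothing about whether Metal or CUDA acceleration is available.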

Features

Real-time voice synthesis - Low-latency text-to-speech
100% local - No internet required after download
Privacy-first - Your voice data never leaves your machine
Hardware-optimized - Leverages Metal (macOS) or CUDA (NVIDIA)

Quick Start

With TARX Workbench

TX-VOICE is automatically configured in TARX Workbench. Just enable voice in settings.

Standalone Usage

# Clone the tarx-voice service
git clone https://github.com/tarx-ai/tarx-voice
cd tarx-voice

# Build and run
cargo build --release
./target/release/tarx-voice --model TX-VOICE

WebSocket API

The TX-VOICE service listens for WebSocket connections on port 11438 by default:

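const ws = new WebSocket('ws://localhost:11438');
// Send only after the connection opens; send() throws while the socket is still connecting.
ws.addEventListener('open', () => ws.send(JSON.stringify({ text: "Hello, world!" })));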
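
For more than a one-off message, it can help to wrap the connection in a small helper. The following is a minimal sketch, not the service's official client: the names speak, WebSocketImpl, onMessage, and onError are hypothetical. Only the default port and the { text } payload come from the example above. This README does not document the reply format, so the sketch simply hands each incoming frame to a callback.

```javascript
// Hypothetical helper; the name and options are illustrative, not a TX-VOICE API.
function speak(text, {
  url = "ws://localhost:11438",           // default port from this README
  WebSocketImpl = globalThis.WebSocket,   // injectable for tests or other runtimes
  onMessage = () => {},
  onError = () => {},
} = {}) {
  const ws = new WebSocketImpl(url);
  // Assumption: replies may be binary audio, so ask for ArrayBuffers
  // rather than Blobs. The actual reply format is not documented here.
  ws.binaryType = "arraybuffer";
  // Wait for the connection to open before sending the request.
  ws.addEventListener("open", () => {
    ws.send(JSON.stringify({ text }));
  });
  ws.addEventListener("message", (event) => onMessage(event.data));
  ws.addEventListener("error", (event) => onError(event));
  return ws;
}

// Example call (needs a running tarx-voice service and a global WebSocket):
// speak("Hello, world!", { onMessage: (data) => console.log("received", data) });
```

Passing the WebSocket constructor as an option keeps the helper usable in browsers, in Node.js versions that ship a global WebSocket, and in tests that substitute a fake socket.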

Hardware Requirements

Hardware	Performance
Apple M1/M2/M3/M4	Excellent (Metal acceleration)
NVIDIA RTX 30/40	Excellent (CUDA acceleration)
Intel/AMD CPU	Good (AVX2 optimized)

Integration

TX-VOICE integrates with:

TARX Workbench - Desktop AI assistant
TARX Code-OSS - VS Code extension
Custom apps - WebSocket API

Architecture

TX-VOICE is based on Kyutai's Moshi model and optimized for local inference:

Streaming audio generation
Low memory footprint via Q8 quantization
Rust-native inference via Candle
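
To illustrate why 8-bit quantization reduces memory use, here is a sketch of block-wise quantization in the style of GGUF's Q8_0 type, where each block of 32 weights is stored as 32 signed 8-bit integers plus one shared scale (about 8.5 bits per weight when the scale is 16 bits wide). Whether TX-VOICE uses exactly this layout is an assumption. The function names are hypothetical, and the real inference path runs in Rust, not JavaScript.

```javascript
// Illustrative only: one scale per block of weights, values stored as int8.
const BLOCK_SIZE = 32;

// Quantize one block of floats to int8 values plus a single scale.
function quantizeBlock(values) {
  let maxAbs = 0;
  for (const v of values) maxAbs = Math.max(maxAbs, Math.abs(v));
  // Choose the scale so the largest magnitude maps to 127.
  const scale = maxAbs / 127;
  const q = new Int8Array(values.length);
  if (scale > 0) {
    for (let i = 0; i < values.length; i++) {
      q[i] = Math.round(values[i] / scale);
    }
  }
  return { scale, q };
}

// Recover approximate floats from a quantized block.
function dequantizeBlock({ scale, q }) {
  return Float32Array.from(q, (x) => x * scale);
}

// Round-trip a block of sample weights and report the worst-case error.
const weights = Float32Array.from({ length: BLOCK_SIZE }, (_, i) => Math.sin(i));
const block = quantizeBlock(weights);
const restored = dequantizeBlock(block);
let maxError = 0;
for (let i = 0; i < BLOCK_SIZE; i++) {
  maxError = Math.max(maxError, Math.abs(weights[i] - restored[i]));
}
console.log(`scale=${block.scale}, max round-trip error=${maxError}`);
```

Each restored value differs from the original by at most about half the block's scale, which is the precision traded for the smaller footprint.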

License

Apache 2.0 - Free for personal and commercial use.
