MODUS TTS: Fallout 76 Voice Model
A custom Piper TTS voice model trained on MODUS dialogue from Fallout 76. MODUS is the hyper-intelligent Enclave AI that manages the Whitespring Bunker: cold, precise, and slightly sardonic.
"Far better men than you failed to kill us."
Model Details
| Property | Value |
|---|---|
| Base model | en_US-lessac-high |
| Training epochs | 10,000 |
| Training samples | 421 voice lines |
| Sample rate | 22,050 Hz |
| Format | ONNX |
| Language | English |
Audio Samples
Sample 1: Designation
"Designation: MODUS. My primary function is the preservation of the Enclave's strategic assets."
Sample 2: Iconic Line
"Far better men than you failed to kill us."
Sample 3: Assistant Style
"I'm sorry, I didn't quite catch that. Could you please repeat your query? We are listening."
Sample 4: Connected
"We were once connected to Enclave hubs across the United States. Raven Rock, the Presidential Rig. It is likely we will never see those places again."
Quick Start
1. Install Piper
Piper requires Python 3.9+. Install it with pip:
Linux (Ubuntu/Debian):
pip install piper-tts
Linux (Arch):
pip install piper-tts --break-system-packages
macOS / Windows:
pip install piper-tts
You also need ffmpeg installed on your system:
# Ubuntu/Debian
sudo apt install ffmpeg
# Arch Linux
sudo pacman -S ffmpeg
# macOS (Homebrew)
brew install ffmpeg
# Windows (Chocolatey)
choco install ffmpeg
2. Download the model files
Both files must be in the same folder:
# Linux / macOS
wget https://huggingface.co/petrusilius/modus-tts/resolve/main/modus_10000.onnx
wget https://huggingface.co/petrusilius/modus-tts/resolve/main/modus_10000.onnx.json
Or download them manually from the Files and versions tab above.
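Since both files have to sit side by side, a quick check can save a confusing error later. The following is a minimal sketch, assuming the file names from the download commands above; the helper name `missing_model_files` is made up for this example and is not part of Piper.

```python
from pathlib import Path


def missing_model_files(folder, model_name="modus_10000.onnx"):
    """Return the names of the required files that are absent from `folder`.

    The voice config is expected next to the model, under the same name
    plus a ".json" suffix (see the two download URLs above).
    """
    folder = Path(folder)
    required = [model_name, model_name + ".json"]
    return [name for name in required if not (folder / name).is_file()]


# Example: an empty list means both files are in place.
# print(missing_model_files(Path.home() / "Downloads"))
```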
3. Generate speech
echo "We have you now, General." | \
  piper --model /home/$USER/Downloads/modus_10000.onnx \
  --output_file /home/$USER/Downloads/output.wav
Breaking this down:
- echo "...": the text you want spoken, piped into piper via a pipe (|)
- --model: full path to the .onnx file. If you run the command from the same folder as the model, just use --model modus_10000.onnx. Otherwise specify the full path.
- --output_file: where to save the generated .wav audio file
Speed control:
# Slower (more dramatic)
echo "We have you now, General." | piper --model modus_10000.onnx --length_scale 1.3 --output_file output.wav
# Faster
echo "We have you now, General." | piper --model modus_10000.onnx --length_scale 0.8 --output_file output.wav
Pause between sentences:
# Longer pause between sentences (default is 0.2 seconds)
echo "Sentence one. Sentence two." | piper --model modus_10000.onnx --sentence_silence 0.5 --output_file output.wav
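If you would rather call Piper from a script than from the shell, the same flags can be assembled in Python. This is a rough sketch, not an official Piper API: it only shells out to the piper command shown above, and the function names `build_piper_command` and `speak` are hypothetical.

```python
import subprocess


def build_piper_command(model, output_file, length_scale=None, sentence_silence=None):
    """Assemble the piper argument list; optional flags are added only when set."""
    cmd = ["piper", "--model", str(model), "--output_file", str(output_file)]
    if length_scale is not None:
        cmd += ["--length_scale", str(length_scale)]
    if sentence_silence is not None:
        cmd += ["--sentence_silence", str(sentence_silence)]
    return cmd


def speak(text, model, output_file, **options):
    """Pipe `text` to piper on stdin, like `echo "..." | piper ...`."""
    cmd = build_piper_command(model, output_file, **options)
    # echo appends a newline, so do the same here.
    subprocess.run(cmd, input=(text + "\n").encode("utf-8"), check=True)


# Example (requires piper on PATH and the model in the current folder):
# speak("We have you now, General.", "modus_10000.onnx", "output.wav", length_scale=1.3)
```

Keeping the command builder separate from the subprocess call makes it easy to print or log the exact command before running it.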
4. Play the audio
Install VLC:
# Ubuntu/Debian
sudo apt install vlc
# Arch Linux
sudo pacman -S vlc
# macOS (Homebrew)
brew install --cask vlc
# Windows
# Download from https://www.videolan.org/vlc/
Play:
# VLC (all platforms)
vlc output.wav
# Linux (ALSA)
aplay output.wav
# macOS
afplay output.wav
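If the file plays back oddly (or not at all), it can help to inspect it first. The sketch below uses only Python's standard wave module; the helper name `wav_info` is invented for this example. For this model you would expect a sample rate of 22,050 Hz, as listed under Model Details.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate


# Example:
# print(wav_info("output.wav"))
```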
Text Input Tips
Piper works with plain text; there is no SSML support. A few things to keep in mind for best results:
Do:
- Write out numbers: "forty two" instead of "42"
- Spell out abbreviations: "General" instead of "Gen."
- Use ... for natural mid-sentence pauses
- Use commas and periods for natural rhythm
- Keep sentences reasonably short for best prosody
Avoid:
- Ending sentences with punctuation if you notice static artifacts (known Piper issue)
- Very long unbroken sentences without punctuation
- Special characters, emojis, or markdown formatting
For a deeper dive into Piper's text handling, see the official Piper documentation.
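The tips above can be partly automated before text reaches Piper. What follows is a small sketch under clear assumptions: it only spells out whole numbers from 0 to 999, the abbreviation table is a made-up starting point (only "Gen." comes from this README), and decimals, ordinals, and larger numbers are left for you to handle. None of these helpers are part of Piper.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical abbreviation table; extend it for your own text.
ABBREVIATIONS = {"Gen.": "General", "Dr.": "Doctor", "Col.": "Colonel"}


def number_to_words(n):
    """Spell out an integer from 0 to 999, e.g. 42 becomes "forty two"."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")


def prepare_text(text):
    """Expand abbreviations, spell out small numbers, and drop odd symbols."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Spell out standalone integers of up to three digits; longer ones stay as-is.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    # Keep word characters, whitespace, and basic punctuation; drop emojis,
    # markdown marks such as * # `, and underscores.
    text = re.sub(r"[^\w\s.,!?'\"-]|_", "", text)
    # Collapse any runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

Running the text through `prepare_text` first keeps the cleanup in one place, so the same rules apply whether the input comes from the command line or from an LLM.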
Docker: Wyoming Protocol
In this setup, Piper communicates over TCP using the Wyoming protocol, not HTTP. It can be run as a persistent service:
services:
  piper-tts:
    image: lscr.io/linuxserver/piper:latest
    container_name: piper-tts
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Berlin
      - PIPER_VOICE=modus_10000
    volumes:
      - /opt/piper/model:/config
    ports:
      - "10200:10200"
Expected folder structure:
/opt/piper/model/
├── modus_10000.onnx
└── modus_10000.onnx.json
This is compatible with Home Assistant via the Wyoming integration.
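Before wiring the service into anything else, you may want to confirm that the container is actually listening. Here is a minimal sketch using a plain TCP connect; it assumes the port mapping from the compose file above (10200) and only shows that something accepts connections there, not that it speaks Wyoming. The function name `port_is_open` is made up for this example.

```python
import socket


def port_is_open(host="localhost", port=10200, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Example:
# print(port_is_open("localhost", 10200))
```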
HTTP Integration (n8n / REST APIs)
Because Piper is exposed over TCP (Wyoming protocol) here, it can't be called directly via HTTP from tools like n8n or other REST-based workflows. To bridge this, you can use a small FastAPI wrapper that accepts HTTP POST requests and forwards the text to Piper over TCP.
app.py (drop this next to your Docker setup):
import io
import os
import wave

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from wyoming.audio import AudioChunk, AudioStop
from wyoming.client import AsyncTcpClient
from wyoming.tts import Synthesize

app = FastAPI()

# Host and port of the Piper container; override via environment variables.
PIPER_HOST = os.getenv("PIPER_HOST", "piper-tts")
PIPER_PORT = int(os.getenv("PIPER_PORT", "10200"))


class TTSRequest(BaseModel):
    text: str


@app.post("/tts")
async def tts(request: TTSRequest):
    # Collect the audio in memory as a mono, 16-bit, 22,050 Hz WAV file.
    buf = io.BytesIO()
    wav = wave.open(buf, "wb")
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(22050)

    async with AsyncTcpClient(PIPER_HOST, PIPER_PORT) as client:
        await client.write_event(Synthesize(text=request.text).event())
        # Read events until Piper signals the end of the audio or disconnects.
        while True:
            event = await client.read_event()
            if event is None:
                break
            if AudioChunk.is_type(event.type):
                wav.writeframes(AudioChunk.from_event(event).audio)
            elif AudioStop.is_type(event.type):
                break

    wav.close()  # Finalizes the WAV header; buf itself stays open.
    buf.seek(0)
    return StreamingResponse(buf, media_type="audio/wav")


@app.get("/health")
def health():
    return {"status": "ok"}
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
RUN pip install fastapi uvicorn wyoming
COPY app.py .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5050"]
Once running, you can call it from n8n or any HTTP client:
curl -X POST http://localhost:5050/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "We have you now, General."}' \
  --output output.wav
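The same call can be made from Python without extra dependencies. This is a sketch using the standard library's urllib, assuming the wrapper runs on localhost:5050 as in the curl example; the helper names `build_tts_request` and `save_speech` are hypothetical.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:5050"):
    """Build the POST request that the /tts endpoint above expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, path="output.wav", base_url="http://localhost:5050"):
    """Send `text` to the wrapper and write the returned WAV bytes to `path`."""
    with urllib.request.urlopen(build_tts_request(text, base_url), timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
    return path


# Example (requires the wrapper and the Piper container to be running):
# save_speech("We have you now, General.")
```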
LLM Integration
Piper works well as the voice layer in a local LLM pipeline. If you are routing text through a workflow tool like n8n with an LLM backend like Ollama, you can pipe the LLM output directly into the HTTP wrapper above.
Heads up: If you are controlling Piper with the MODUS model via n8n and an LLM, you could use a system prompt along the lines of: "You are MODUS from Fallout 76. Speak in a cold, precise manner. Refer to yourself as 'we'." Adapt it to your use case; this is just a starting point.
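If you want to try the idea without n8n, the chain can be sketched in a few lines of Python. The assumptions here are considerable: Ollama is running locally on its default port 11434, the FastAPI wrapper is reachable on localhost:5050, and the model name "llama3" is only a placeholder for whatever model you have pulled. All function names are made up for this example.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are MODUS from Fallout 76. Speak in a cold, precise manner. "
    "Refer to yourself as 'we'."
)


def post_json(url, payload, timeout=120):
    """POST a JSON payload and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def ollama_payload(user_text, model="llama3"):
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "system": SYSTEM_PROMPT, "prompt": user_text, "stream": False}


def ask_and_speak(user_text, wav_path="reply.wav",
                  ollama_url="http://localhost:11434", tts_url="http://localhost:5050"):
    """Ask the LLM for a reply, then have the /tts wrapper speak it."""
    raw = post_json(ollama_url + "/api/generate", ollama_payload(user_text))
    reply = json.loads(raw)["response"]
    audio = post_json(tts_url + "/tts", {"text": reply})
    with open(wav_path, "wb") as f:
        f.write(audio)
    return reply


# Example (requires Ollama, the wrapper, and the Piper container to be running):
# print(ask_and_speak("Status report."))
```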
Known Limitations
- Occasional mispronunciation on less common words or complex proper nouns
- Very long sentences without punctuation may sound rushed
- The model was trained on 421 samples; a larger dataset would improve consistency further
- Numbers and abbreviations should be written out manually for best results
Training Details
This model was fine-tuned on MODUS dialogue from Fallout 76 (Bethesda Softworks) using the en_US-lessac-high checkpoint as a base.
Training was performed using ifansnek/piper-train-docker on an NVIDIA RTX A2000 12GB.
| Setting | Value |
|---|---|
| Batch size | 8 |
| Precision | 16-bit AMP |
| Quality | high |
| Checkpoint interval | every 50 epochs |
| Total training time | ~5 days |
Disclaimer
This is a non-commercial fan project. Fallout 76 and all related assets are property of Bethesda Softworks.
About this project
This model was built without prior knowledge of machine learning or Python scripting. The entire pipeline, from gathering voice lines and training the model to deploying it as a live TTS service, was developed with the help of Claude by Anthropic.