CredResolve Voice (Swar)
CredResolve Voice is a production-style multilingual TTS application with reusable voice cloning, voice versioning, and a polished web interface.
This repository includes:
- a CredResolve-branded web app
- live HTTP audio streaming
- live WebSocket audio streaming
- Prometheus metrics for HTTP, WebSocket, and TTS lifecycle monitoring
- a ready-to-run Grafana and Prometheus monitoring stack
- reusable public voice IDs such as `voice_cr_a91f0c2d`
- voice version management under the same voice ID
- a developer-facing Integrate page
- a FastAPI backend with typed request handling
Product Overview
CredResolve Voice is built as a full application layer for real-world voice operations. The app is designed for product teams, internal tools, and API-based integrations that need:
- text-to-speech generation
- saved voice cloning
- versioned voice updates
- playback and download
- developer-friendly streaming endpoints
UI Included
The current UI includes these sections:
Studio
- text input
- language selection
- auto, cloned, and design voice modes
- male and female voice selection
- sample rate selection for 24000 Hz and 8000 Hz
- diffusion controls
- expression tag shortcuts such as `[laughter]`
- streamed audio output with metrics
Voice Clone
- reference audio upload
- optional reference text
- display name
- preprocessing toggle
- reusable saved voice creation
Voices
- saved voice library
- stable public voice IDs
- active version tracking
- create new versions under the same voice ID
- commit a version live for TTS
- delete old versions safely
Integrate
- HTTP stream endpoint
- WebSocket stream endpoint
- request field guide
- cURL example
- WebSocket URL and first-message example
Voice Registry
Each cloned voice gets a stable public voice ID.
Example:
voice_cr_a91f0c2d
Each voice can have multiple versions. When a new version is committed:
- the `voice_id` stays the same
- the active version changes
- future TTS requests use the committed version
This makes it possible to improve or replace a saved voice without breaking integrations that already use the public voice ID.
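The commit flow above can be sketched as a small in-memory registry. This is a hypothetical, simplified model for illustration only; the application's real registry lives in `omnivoice/app/registry.py` and its internals may differ:

```python
import secrets


class VoiceRegistry:
    """Minimal sketch: stable public voice IDs with versioned payloads."""

    def __init__(self):
        # voice_id -> {"versions": {version_id: payload}, "active": version_id}
        self._voices = {}

    def clone(self, payload: dict) -> str:
        """Register a new voice; the generated voice_id never changes afterwards."""
        voice_id = f"voice_cr_{secrets.token_hex(4)}"
        self._voices[voice_id] = {"versions": {"v1": payload}, "active": "v1"}
        return voice_id

    def add_version(self, voice_id: str, payload: dict) -> str:
        """Store a new version; it is NOT used for TTS until committed."""
        versions = self._voices[voice_id]["versions"]
        version_id = f"v{len(versions) + 1}"
        versions[version_id] = payload
        return version_id

    def commit(self, voice_id: str, version_id: str) -> None:
        """Make a version live: future TTS requests resolve to it."""
        self._voices[voice_id]["active"] = version_id

    def active_payload(self, voice_id: str) -> dict:
        entry = self._voices[voice_id]
        return entry["versions"][entry["active"]]
```

Because callers only ever hold the `voice_id`, committing a new version swaps the audio behind the scenes without breaking existing integrations.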
Streaming Behavior
The application supports two streaming paths:
HTTP: `POST /api/v1/tts/stream`
- sends a WAV stream in bytes
- starts with a WAV header
- then streams audio chunk by chunk as chunks are generated

WebSocket: `WS /api/v1/tts/stream/ws`
- sends metadata first
- then sends binary audio frames progressively
- ends with a completion message
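Because the HTTP stream begins with a WAV header, a client can learn the audio format before any samples arrive. A sketch assuming the canonical 44-byte PCM header with the `fmt ` chunk immediately after `RIFF`/`WAVE` (the server's exact header layout may differ):

```python
import struct


def parse_wav_header(first_bytes: bytes) -> dict:
    """Extract channel count and sample rate from the start of a WAV stream.

    Assumes the canonical layout: RIFF header, then a 16-byte 'fmt ' chunk.
    """
    if len(first_bytes) < 44:
        raise ValueError("need at least 44 bytes to parse a canonical WAV header")
    riff, _size, wave = struct.unpack_from("<4sI4s", first_bytes, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    fmt_id, _fmt_size, _audio_fmt, channels, sample_rate = struct.unpack_from(
        "<4sIHHI", first_bytes, 12
    )
    if fmt_id != b"fmt ":
        raise ValueError("expected 'fmt ' chunk after the RIFF header")
    return {"channels": channels, "sample_rate": sample_rate}
```

A client could call this on the first buffered bytes of the HTTP response, then treat everything after the header as raw PCM chunks.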
API Endpoints
Main app endpoints:
- `GET /api/v1/capabilities`
- `GET /api/v1/languages`
- `GET /api/v1/voices`
- `POST /api/v1/voices/clone`
- `POST /api/v1/voices/{voice_id}/versions`
- `POST /api/v1/voices/{voice_id}/versions/{version_id}/commit`
- `DELETE /api/v1/voices/{voice_id}`
- `DELETE /api/v1/voices/{voice_id}/versions/{version_id}`
- `POST /api/v1/tts/synthesize`
- `POST /api/v1/tts/stream`
- `WS /api/v1/tts/stream/ws`
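The versioned-voice endpoints take path parameters; a small sketch of how the templates expand (the helper names and base URL are illustrative, only the path templates come from the endpoint list above):

```python
# Assumed local base URL; adjust host/port to your deployment.
BASE = "http://127.0.0.1:8001/api/v1"


def version_create_url(voice_id: str) -> str:
    """POST here to add a new version under an existing voice."""
    return f"{BASE}/voices/{voice_id}/versions"


def version_commit_url(voice_id: str, version_id: str) -> str:
    """POST here to make a version the active one for TTS."""
    return f"{BASE}/voices/{voice_id}/versions/{version_id}/commit"
```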
Run Locally
Create or activate your virtual environment, install dependencies, and launch the app:
uv sync
source .venv/bin/activate
python3 -m omnivoice.cli.demo --ip 0.0.0.0 --port 8001
You can also use the console script:
credresolve-voice --ip 0.0.0.0 --port 8001
Common Launch Options
python3 -m omnivoice.cli.demo \
--ip 0.0.0.0 \
--port 8001 \
--device cuda \
--data-dir /home/ubuntu/credresolve_Multi/.credresolve_voice
Useful flags:
`--model`, `--device`, `--ip`, `--port`, `--root-path`, `--no-asr`, `--data-dir`
Data Storage
By default, application data is stored under .credresolve_voice/.
This includes:
- saved voice assets
- voice registry metadata
- uploaded reference audio
- generated output WAV files
These runtime artifacts are ignored by Git through .gitignore.
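Assuming the default data directory name, the corresponding `.gitignore` entry would look like:

```gitignore
# runtime artifacts: voices, registry metadata, uploads, generated WAVs
.credresolve_voice/
```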
Example HTTP Stream Request
curl -X POST "http://127.0.0.1:8001/api/v1/tts/stream" \
-H "Content-Type: application/json" \
--output credresolve-stream.wav \
-d '{
"text": "This is a CredResolve Voice streaming request.",
"language": "auto",
"voice_mode": "auto",
"gender": "female",
"sample_rate": 8000,
"speed": 1.0,
"diffusion_steps": 32,
"guidance_scale": 2.0,
"denoise": true,
"preprocess_prompt": true,
"postprocess_output": true
}'
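The same request can be issued from Python using only the standard library. This is a minimal sketch mirroring the cURL body above: the URL and field values come from that example, and the chunked read shows how to consume the stream without buffering the whole WAV in memory. The network call only runs when the file is executed as a script:

```python
import json
import urllib.request


def build_stream_payload(text: str, sample_rate: int = 8000) -> dict:
    """Request body mirroring the cURL example; values are illustrative defaults."""
    return {
        "text": text,
        "language": "auto",
        "voice_mode": "auto",
        "gender": "female",
        "sample_rate": sample_rate,
        "speed": 1.0,
        "diffusion_steps": 32,
        "guidance_scale": 2.0,
        "denoise": True,
        "preprocess_prompt": True,
        "postprocess_output": True,
    }


def stream_to_file(url: str, payload: dict, out_path: str) -> None:
    """POST the request and write the streamed WAV response chunk by chunk."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(8192):
            out.write(chunk)


if __name__ == "__main__":
    stream_to_file(
        "http://127.0.0.1:8001/api/v1/tts/stream",
        build_stream_payload("This is a CredResolve Voice streaming request."),
        "credresolve-stream.wav",
    )
```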
Example WebSocket Request
Connect to:
ws://127.0.0.1:8001/api/v1/tts/stream/ws
Then send:
{
"text": "Streaming over WebSocket",
"language": "auto",
"voice_mode": "auto",
"gender": "auto",
"sample_rate": 24000
}
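A client consuming this socket has to distinguish the three message kinds described under Streaming Behavior: JSON metadata first, then binary audio frames, then a JSON completion message. A sketch of that dispatch (the `done` field name is an assumption for illustration; check the Integrate page for the real completion schema):

```python
import json


def classify_ws_message(message) -> str:
    """Classify a WebSocket message from the TTS stream.

    Binary frames are audio; text frames are JSON, and a 'done' flag
    (assumed field name) marks the completion message.
    """
    if isinstance(message, (bytes, bytearray)):
        return "audio"
    data = json.loads(message)
    return "done" if data.get("done") else "metadata"
```

A receive loop would append `"audio"` frames to the output buffer and stop once it sees `"done"`.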
Notes
- 8000 Hz output is supported for telephony-style use cases.
- Expression tags are supported directly in `text`, for example `[laughter]`.
- Saved voices can be selected in Studio or through the API using `voice_id`.
- Prometheus metrics are exposed at `GET /metrics`.
- Monitoring setup is documented in `docs/monitoring.md`.
- The web UI is under `omnivoice/app/templates/` and `omnivoice/app/static/`.
- The FastAPI server is defined in `omnivoice/app/server.py`.
- The main application service layer is in `omnivoice/app/service.py`.
Repository Structure
Key files and folders:
- `omnivoice/app/server.py`
- `omnivoice/app/service.py`
- `omnivoice/app/registry.py`
- `omnivoice/app/generation.py`
- `omnivoice/app/schemas.py`
- `omnivoice/app/templates/index.html`
- `omnivoice/app/static/app.js`
- `omnivoice/app/static/styles.css`
- `omnivoice/cli/demo.py`
Status
This repository is currently set up as the CredResolve Voice application codebase, with branding, UI, streaming APIs, saved voice registry support, and voice version management in place.