CredResolve Voice (Swar)
CredResolve Voice is a production-style multilingual TTS application with reusable voice cloning, voice versioning, and a polished web interface.
This repository includes:
- a CredResolve-branded web app
- live HTTP audio streaming
- live WebSocket audio streaming
- Prometheus metrics for HTTP, WebSocket, and TTS lifecycle monitoring
- a ready-to-run Grafana and Prometheus monitoring stack
- reusable public voice IDs such as `voice_cr_a91f0c2d`
- voice version management under the same voice ID
- a developer-facing Integrate page
- a FastAPI backend with typed request handling
Product Overview
CredResolve Voice is built as a full application layer for real-world voice operations. The app is designed for product teams, internal tools, and API-based integrations that need:
- text-to-speech generation
- saved voice cloning
- versioned voice updates
- playback and download
- developer-friendly streaming endpoints
UI Included
The current UI includes these sections:
Studio
- text input
- language selection
- auto, cloned, and design voice modes
- male and female voice selection
- sample rate selection for 24000 Hz and 8000 Hz
- diffusion controls
- expression tag shortcuts such as `[laughter]`
- streamed audio output with metrics
Voice Clone
- reference audio upload
- optional reference text
- display name
- preprocessing toggle
- reusable saved voice creation
Voices
- saved voice library
- stable public voice IDs
- active version tracking
- create new versions under the same voice ID
- commit a version live for TTS
- delete old versions safely
Integrate
- HTTP stream endpoint
- WebSocket stream endpoint
- request field guide
- cURL example
- WebSocket URL and first-message example
Voice Registry
Each cloned voice gets a stable public voice ID.
Example:
voice_cr_a91f0c2d
Each voice can have multiple versions. When a new version is committed:
- the `voice_id` stays the same
- the active version changes
- future TTS requests use the committed version
This makes it possible to improve or replace a saved voice without breaking integrations that already use the public voice ID.
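The commit flow above can be sketched as a small in-memory registry. This is a hypothetical, simplified model for illustration only; the application's real registry lives in `omnivoice/app/registry.py` and its internals may differ:

```python
import secrets


class VoiceRegistry:
    """Minimal sketch: stable public voice IDs with versioned payloads."""

    def __init__(self):
        # voice_id -> {"versions": {version_id: payload}, "active": version_id}
        self._voices = {}

    def clone(self, payload: dict) -> str:
        """Register a new voice; the generated voice_id never changes afterwards."""
        voice_id = f"voice_cr_{secrets.token_hex(4)}"
        self._voices[voice_id] = {"versions": {"v1": payload}, "active": "v1"}
        return voice_id

    def add_version(self, voice_id: str, payload: dict) -> str:
        """Store a new version; it is NOT used for TTS until committed."""
        versions = self._voices[voice_id]["versions"]
        version_id = f"v{len(versions) + 1}"
        versions[version_id] = payload
        return version_id

    def commit(self, voice_id: str, version_id: str) -> None:
        """Make a version live: future TTS requests resolve to it."""
        self._voices[voice_id]["active"] = version_id

    def active_payload(self, voice_id: str) -> dict:
        entry = self._voices[voice_id]
        return entry["versions"][entry["active"]]
```

Because callers only ever hold the `voice_id`, committing a new version swaps the audio behind the scenes without breaking existing integrations.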
Streaming Behavior
The application supports two streaming paths:
HTTP: `POST /api/v1/tts/stream`
- sends a WAV stream in bytes
- starts with a WAV header
- then streams audio chunk by chunk as chunks are generated

WebSocket: `WS /api/v1/tts/stream/ws`
- sends metadata first
- then sends binary audio frames progressively
- ends with a completion message
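Because the HTTP stream begins with a WAV header, a client can learn the audio format before any samples arrive. A sketch assuming the canonical 44-byte PCM header with the `fmt ` chunk immediately after `RIFF`/`WAVE` (the server's exact header layout may differ):

```python
import struct


def parse_wav_header(first_bytes: bytes) -> dict:
    """Extract channel count and sample rate from the start of a WAV stream.

    Assumes the canonical layout: RIFF header, then a 16-byte 'fmt ' chunk.
    """
    if len(first_bytes) < 44:
        raise ValueError("need at least 44 bytes to parse a canonical WAV header")
    riff, _size, wave = struct.unpack_from("<4sI4s", first_bytes, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    fmt_id, _fmt_size, _audio_fmt, channels, sample_rate = struct.unpack_from(
        "<4sIHHI", first_bytes, 12
    )
    if fmt_id != b"fmt ":
        raise ValueError("expected 'fmt ' chunk after the RIFF header")
    return {"channels": channels, "sample_rate": sample_rate}
```

A client could call this on the first buffered bytes of the HTTP response, then treat everything after the header as raw PCM chunks.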
API Endpoints
Main app endpoints:
- `GET /api/v1/capabilities`
- `GET /api/v1/languages`
- `GET /api/v1/voices`
- `POST /api/v1/voices/clone`
- `POST /api/v1/voices/{voice_id}/versions`
- `POST /api/v1/voices/{voice_id}/versions/{version_id}/commit`
- `DELETE /api/v1/voices/{voice_id}`
- `DELETE /api/v1/voices/{voice_id}/versions/{version_id}`
- `POST /api/v1/tts/synthesize`
- `POST /api/v1/tts/stream`
- `WS /api/v1/tts/stream/ws`
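The versioned-voice endpoints take path parameters; a small sketch of how the templates expand (the helper names and base URL are illustrative, only the path templates come from the endpoint list above):

```python
# Assumed local base URL; adjust host/port to your deployment.
BASE = "http://127.0.0.1:8001/api/v1"


def version_create_url(voice_id: str) -> str:
    """POST here to add a new version under an existing voice."""
    return f"{BASE}/voices/{voice_id}/versions"


def version_commit_url(voice_id: str, version_id: str) -> str:
    """POST here to make a version the active one for TTS."""
    return f"{BASE}/voices/{voice_id}/versions/{version_id}/commit"
```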
Run Locally
Create or activate your virtual environment, install dependencies, and launch the app:
uv sync
source .venv/bin/activate
python3 -m omnivoice.cli.demo --ip 0.0.0.0 --port 8001
You can also use the console script:
credresolve-voice --ip 0.0.0.0 --port 8001
Common Launch Options
python3 -m omnivoice.cli.demo \
--ip 0.0.0.0 \
--port 8001 \
--device cuda \
--data-dir /home/ubuntu/credresolve_Multi/.credresolve_voice
Useful flags:
`--model`, `--device`, `--ip`, `--port`, `--root-path`, `--no-asr`, `--data-dir`
Data Storage
By default, application data is stored under .credresolve_voice/.
This includes:
- saved voice assets
- voice registry metadata
- uploaded reference audio
- generated output WAV files
These runtime artifacts are ignored by Git through .gitignore.
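Assuming the default data directory name, the corresponding `.gitignore` entry would look like:

```gitignore
# runtime artifacts: voices, registry metadata, uploads, generated WAVs
.credresolve_voice/
```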
Example HTTP Stream Request
curl -X POST "http://127.0.0.1:8001/api/v1/tts/stream" \
-H "Content-Type: application/json" \
--output credresolve-stream.wav \
-d '{
"text": "This is a CredResolve Voice streaming request.",
"language": "auto",
"voice_mode": "auto",
"gender": "female",
"sample_rate": 8000,
"speed": 1.0,
"diffusion_steps": 32,
"guidance_scale": 2.0,
"denoise": true,
"preprocess_prompt": true,
"postprocess_output": true
}'
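The same request can be issued from Python using only the standard library. This is a minimal sketch mirroring the cURL body above: the URL and field values come from that example, and the chunked read shows how to consume the stream without buffering the whole WAV in memory. The network call only runs when the file is executed as a script:

```python
import json
import urllib.request


def build_stream_payload(text: str, sample_rate: int = 8000) -> dict:
    """Request body mirroring the cURL example; values are illustrative defaults."""
    return {
        "text": text,
        "language": "auto",
        "voice_mode": "auto",
        "gender": "female",
        "sample_rate": sample_rate,
        "speed": 1.0,
        "diffusion_steps": 32,
        "guidance_scale": 2.0,
        "denoise": True,
        "preprocess_prompt": True,
        "postprocess_output": True,
    }


def stream_to_file(url: str, payload: dict, out_path: str) -> None:
    """POST the request and write the streamed WAV response chunk by chunk."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(8192):
            out.write(chunk)


if __name__ == "__main__":
    stream_to_file(
        "http://127.0.0.1:8001/api/v1/tts/stream",
        build_stream_payload("This is a CredResolve Voice streaming request."),
        "credresolve-stream.wav",
    )
```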
Example WebSocket Request
Connect to:
ws://127.0.0.1:8001/api/v1/tts/stream/ws
Then send:
{
"text": "Streaming over WebSocket",
"language": "auto",
"voice_mode": "auto",
"gender": "auto",
"sample_rate": 24000
}
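A client consuming this socket has to distinguish the three message kinds described under Streaming Behavior: JSON metadata first, then binary audio frames, then a JSON completion message. A sketch of that dispatch (the `done` field name is an assumption for illustration; check the Integrate page for the real completion schema):

```python
import json


def classify_ws_message(message) -> str:
    """Classify a WebSocket message from the TTS stream.

    Binary frames are audio; text frames are JSON, and a 'done' flag
    (assumed field name) marks the completion message.
    """
    if isinstance(message, (bytes, bytearray)):
        return "audio"
    data = json.loads(message)
    return "done" if data.get("done") else "metadata"
```

A receive loop would append `"audio"` frames to the output buffer and stop once it sees `"done"`.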
Notes
- 8000 Hz output is supported for telephony-style use cases.
- Expression tags are supported directly in `text`, for example `[laughter]`.
- Saved voices can be selected in Studio or through the API using `voice_id`.
- Prometheus metrics are exposed at `GET /metrics`.
- Monitoring setup is documented in `docs/monitoring.md`.
- The web UI is under `omnivoice/app/templates/` and `omnivoice/app/static/`.
- The FastAPI server is defined in `omnivoice/app/server.py`.
- The main application service layer is in `omnivoice/app/service.py`.
Repository Structure
Key files and folders:
- `omnivoice/app/server.py`
- `omnivoice/app/service.py`
- `omnivoice/app/registry.py`
- `omnivoice/app/generation.py`
- `omnivoice/app/schemas.py`
- `omnivoice/app/templates/index.html`
- `omnivoice/app/static/app.js`
- `omnivoice/app/static/styles.css`
- `omnivoice/cli/demo.py`
Status
This repository is currently set up as the CredResolve Voice application codebase, with branding, UI, streaming APIs, saved voice registry support, and voice version management in place.