OmniVoice BR-PT TTS

OmniVoice BR-PT TTS is a Brazilian Portuguese text-to-speech fine-tune of k2-fsa/OmniVoice. It is intended for natural Brazilian Portuguese speech synthesis, voice-design experiments, and BR-PT TTS research workflows built around OmniVoice.

This repository contains the fine-tuned checkpoints, comparison samples, training metadata, and sample prompts used to compare the base OmniVoice model against the Brazilian Portuguese fine-tune.

Highlights

  • Fine-tuned from k2-fsa/OmniVoice for Brazilian Portuguese speech synthesis.
  • Trained on edwixx/brazilian-portuguese-TTS, a multi-speaker Brazilian Portuguese TTS dataset.
  • Prepared with transcript cleanup targeting OCR artifacts, old-orthography noise, and suspect text-normalization errors.
  • Includes 10 side-by-side audio comparisons: baseline OmniVoice vs this BR-PT checkpoint.
  • Uses OmniVoice's Portuguese language ID (pt); the Brazilian Portuguese specialization comes from the fine-tuning data.

Sample Comparisons

Each pair below uses the same prompt for both models. The first audio is the baseline k2-fsa/OmniVoice; the second is this BR-PT fine-tune at checkpoint-20000.

Pair 01

Prompt: Bom dia. Hoje vamos testar uma voz brasileira clara, natural e fácil de entender.

Baseline OmniVoice:

Open baseline WAV

OmniVoice BR-PT:

Open BR-PT WAV

Pair 02

Prompt: A cidade acordou cedo, com ônibus cheios, padarias abertas e gente conversando na calçada.

Baseline OmniVoice:

Open baseline WAV

OmniVoice BR-PT:

Open BR-PT WAV

Pair 03

Prompt: Ela perguntou se a reunião seria online ou presencial, porque precisava organizar a agenda.

Baseline OmniVoice:

Open baseline WAV

OmniVoice BR-PT:

Open BR-PT WAV

Pair 04

Prompt: No fim da tarde, a chuva passou e o céu ficou laranja por alguns minutos.

Baseline OmniVoice:

Open baseline WAV

OmniVoice BR-PT:

Open BR-PT WAV

Pair 05

Prompt: O professor explicou a diferença entre pronúncia formal e fala cotidiana no português do Brasil.

Baseline OmniVoice:

Open baseline WAV

OmniVoice BR-PT:

Open BR-PT WAV

More sample pairs are available in comparison_samples/baseline_vs_checkpoint_20000.

Model Details

  • Model type: Multilingual text-to-speech / speech synthesis model
  • Base model: k2-fsa/OmniVoice
  • Fine-tuned language focus: Brazilian Portuguese (pt-BR)
  • OmniVoice language code used during inference/training: pt
  • Best/current checkpoint in this repo: checkpoint-20000
  • Training dataset: edwixx/brazilian-portuguese-TTS
  • Architecture family: OmniVoice, with voice cloning and voice design capabilities inherited from the base model

Training Data

The model was fine-tuned on edwixx/brazilian-portuguese-TTS, a Brazilian Portuguese multi-speaker TTS dataset. During preparation, transcripts were normalized and filtered to improve linguistic quality for speech synthesis.

Dataset preparation summary:

  • Train rows: 35,699
  • Dev rows: 360
  • Speakers: 42
  • Filtered suspect rows: 1,024
  • Main filtering targets: OCR artifacts, old orthography residue, malformed punctuation, and text likely to hurt pronunciation modeling
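
The filtering targets above could be approximated with a few regular-expression heuristics. The sketch below is illustrative only: the pattern names, the patterns themselves, and the `suspect_reasons` helper are assumptions, not the filter actually used to prepare this dataset. The "old orthography" rule in particular is crude and would also flag modern loanwords.

```python
import re

# Hypothetical heuristics; the real preparation filter is not documented here.
SUSPECT_PATTERNS = {
    # Unicode replacement character left behind by a bad decode.
    "replacement_char": re.compile("\ufffd"),
    # A digit sandwiched between letters, e.g. "c0m" (typical OCR confusion).
    "digit_inside_word": re.compile(r"[^\W\d_]\d+[^\W\d_]"),
    # Doubled separators or over-long runs of dots.
    "repeated_punctuation": re.compile(r"[,;:]{2,}|\.{4,}"),
    # Whitespace directly before a punctuation mark, e.g. "rua , perto".
    "space_before_punctuation": re.compile(r"\s[,;:.!?]"),
    # Pre-reform spellings such as "pharmacia" or "theatro" (very rough proxy).
    "old_orthography": re.compile(r"ph|th", re.IGNORECASE),
}


def suspect_reasons(text: str) -> list[str]:
    """Return the names of all heuristics that fire on a transcript."""
    return [name for name, pattern in SUSPECT_PATTERNS.items() if pattern.search(text)]


clean = "Bom dia. Hoje vamos testar uma voz brasileira clara, natural e fácil de entender."
noisy = "A pharmacia fica na rua , perto do theatro."
print(suspect_reasons(clean))
print(suspect_reasons(noisy))
```

A row would be dropped, or sent for manual review, whenever `suspect_reasons` returns a non-empty list. Thresholds and extra rules would need tuning against real transcripts.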

Because the source material contains audiobook/literary speech, users should expect stronger performance on read/narrative Portuguese than on highly spontaneous conversation, slang-heavy speech, code-switching, or domain-specific jargon.

Training Procedure

The model was fine-tuned from k2-fsa/OmniVoice using the OmniVoice training codebase.

Key configuration:

  • Steps: 20,000
  • Save interval: every 1,000 steps
  • Eval interval: every 1,000 steps
  • Precision: bf16
  • Attention implementation: SDPA
  • Batch tokens: 4,096
  • Gradient accumulation steps: 4
  • Language ID: pt
  • Training config: config/brpt/train_config_brpt_sdpa.json
  • Data config: config/brpt/data_config_brpt.json
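
As a rough worked example, the configuration above implies the following token budget. This assumes that "batch tokens" is the token budget of one micro-batch on a single device and that "steps" counts optimizer updates; neither is confirmed by the card, and the number of devices is not stated, so treat the totals as per-device upper bounds.

```python
# Values copied from the key configuration list.
steps = 20_000
save_interval = 1_000
batch_tokens = 4_096
grad_accum_steps = 4

# Assumption: one optimizer update consumes grad_accum_steps micro-batches,
# each holding at most batch_tokens tokens.
tokens_per_optimizer_step = batch_tokens * grad_accum_steps

# Checkpoints written if one is saved at every save interval.
checkpoints_saved = steps // save_interval

# Upper bound on tokens processed per device over the whole run.
max_tokens_seen = tokens_per_optimizer_step * steps

print(tokens_per_optimizer_step, checkpoints_saved, max_tokens_seen)
```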

Validation loss snapshots:

Step     Dev loss
17000    3.9982
18000    4.1444
19000    3.9238
20000    4.0745
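
The snapshots above can be compared with a few lines of Python. This is a small sketch using only the four values listed; earlier steps are not reported here.

```python
# Dev loss snapshots copied from the list above (step -> loss).
dev_loss = {17000: 3.9982, 18000: 4.1444, 19000: 3.9238, 20000: 4.0745}

# Step with the lowest reported dev loss.
best_step = min(dev_loss, key=dev_loss.get)

# Range between the highest and lowest reported losses.
spread = max(dev_loss.values()) - min(dev_loss.values())

print(best_step, round(spread, 4))
```

Among these four snapshots the lowest dev loss is at step 19000, while the published comparison samples use checkpoint-20000. The values fluctuate by roughly 0.22 from one snapshot to the next, so checkpoint selection is better settled by listening tests than by loss alone.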

Training was tracked with Weights & Biases and Trackio. The public Trackio dashboard is available here: edwixx/omnivoice-brpt-trackio.

Intended Uses

This model is intended for:

  • Brazilian Portuguese TTS research
  • Speech synthesis experiments using OmniVoice
  • Comparing base multilingual OmniVoice behavior against a BR-PT fine-tune
  • Prototyping Brazilian Portuguese narration, read speech, and voice-design workflows
  • Further fine-tuning or evaluation by users who understand OmniVoice checkpoints

Limitations and Risks

This is an experimental fine-tuned checkpoint, not a fully audited production voice model.

Known limitations:

  • Brazilian Portuguese quality should be judged with listening tests; validation loss alone is not a perceptual metric.
  • The dataset is multi-speaker and likely audiobook-oriented, so conversational expressiveness may vary.
  • Accent, speaker identity, prosody, and pronunciation can be inconsistent across prompts.
  • The model may mispronounce names, numbers, abbreviations, foreign words, and rare Brazilian regional terms.
  • The base model supports voice cloning and voice design; users are responsible for consent, attribution, and lawful use when cloning or imitating voices.

Bias and ethical considerations:

  • The model can reflect biases present in the source dataset and the base OmniVoice model.
  • Generated voices may sound like real people or demographic groups even when no identity is intended.
  • Do not use the model for impersonation, deception, fraud, harassment, or synthetic media without proper consent and disclosure.

Evaluation

Current evaluation is limited to training/dev loss and manual audio comparison samples. The repo includes 10 baseline-vs-fine-tuned sample pairs for qualitative inspection.

Recommended additional evaluation before production use:

  • Native Brazilian Portuguese MOS listening tests
  • Speaker similarity tests for voice cloning workflows
  • Word error rate or ASR-based intelligibility checks
  • Pronunciation checks for numbers, dates, names, and regional vocabulary
  • Bias and safety review for target deployment contexts

Usage

Install and use OmniVoice from the upstream repository: k2-fsa/OmniVoice.

Example inference:

omnivoice-infer \
  --model edwixx/omnivoice-brpt-tts \
  --text "Bom dia. Esta é uma demonstração em português brasileiro." \
  --language pt \
  --instruct "female, portuguese accent" \
  --output brpt_demo.wav

To load a specific checkpoint from this repo, download it locally and point OmniVoice at that checkpoint folder.
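
To synthesize several prompts in one go, the command above can be assembled programmatically. The sketch below only builds and prints the commands, reusing the same flags as the example. The `build_infer_command` helper and the output file names are hypothetical, and actually running the commands assumes `omnivoice-infer` is installed from the upstream repository.

```python
import shlex

MODEL = "edwixx/omnivoice-brpt-tts"


def build_infer_command(text: str, output_path: str,
                        instruct: str = "female, portuguese accent") -> list[str]:
    """Build an omnivoice-infer argument list using the flags from the example."""
    return [
        "omnivoice-infer",
        "--model", MODEL,
        "--text", text,
        "--language", "pt",
        "--instruct", instruct,
        "--output", output_path,
    ]


# Two of the comparison prompts from this card.
prompts = [
    "Bom dia. Hoje vamos testar uma voz brasileira clara, natural e fácil de entender.",
    "A cidade acordou cedo, com ônibus cheios, padarias abertas e gente conversando na calçada.",
]

# Output names are illustrative placeholders.
commands = [
    build_infer_command(prompt, f"brpt_pair_{index:02d}.wav")
    for index, prompt in enumerate(prompts, start=1)
]

for command in commands:
    # Print a shell-safe version; pass `command` to subprocess.run(command, check=True)
    # to execute it once the CLI is available.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell-quoting problems with accented characters and punctuation in the prompts.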

Citation and Attribution

Please cite and credit the upstream OmniVoice project (k2-fsa/OmniVoice) and the dataset used for this fine-tune (edwixx/brazilian-portuguese-TTS).

License

This model card declares apache-2.0 to match the upstream k2-fsa/OmniVoice model metadata. Users should also review the dataset terms and any applicable rights for the source audio/transcripts before redistribution or commercial use.
