OmniVoice BR-PT v1.5

OmniVoice BR-PT v1.5 is a Brazilian Portuguese fine-tune of k2-fsa/OmniVoice. This release ships checkpoint-9000 from the v1.5 gold-subset refinement run.

It is optimized for Brazilian Portuguese TTS experiments, voice-design, and voice-cloning workflows using OmniVoice.

Why This Checkpoint

We evaluated base OmniVoice, the first BR-PT run (v1), and the v1.5 refine checkpoints on a 20-prompt Brazilian Portuguese ASR intelligibility set. For both word error rate (WER) and character error rate (CER), lower is better.

Model                   WER     CER
base OmniVoice          0.0456  0.0400
v1 checkpoint-19000     0.0456  0.0400
v1.5 checkpoint-5000    0.0400  0.0379
v1.5 checkpoint-8000    0.0463  0.0404
v1.5 checkpoint-9000    0.0400  0.0379
v1.5 checkpoint-10000   0.0471  0.0390
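
For readers unfamiliar with the metrics, the following is a minimal sketch of how WER and CER are commonly computed from prompt and ASR-transcript pairs. It is not the evaluation script used for this card: the ASR model, the text normalization, and whether scores were pooled over the corpus or averaged per prompt are not stated here, so the corpus-level pooling below is an assumption, and the example pair is hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(pairs, unit):
    """Corpus-level rate: total edits divided by total reference length.

    Assumption: pooling over all pairs; the card does not say how its
    scores were aggregated.
    """
    split = str.split if unit == "word" else list
    edits = sum(edit_distance(split(ref), split(hyp)) for ref, hyp in pairs)
    total = sum(len(split(ref)) for ref, _ in pairs)
    return edits / total


# Hypothetical (prompt, ASR transcript) pair, not taken from the 20-prompt set.
pairs = [("bom dia pessoal", "bom dias pessoal")]
wer = error_rate(pairs, "word")  # one wrong word out of three
cer = error_rate(pairs, "char")  # one extra character out of fifteen
```

In practice the reference and the transcript are usually lowercased and stripped of punctuation before scoring, since ASR output rarely matches the prompt's punctuation exactly.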

checkpoint-9000 was selected because it tied with checkpoint-5000 for the lowest WER (0.0400) and CER (0.0379) while coming later in the refinement run.
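
The selection rule can be sketched as a sort key: lowest WER first, then lowest CER, then the latest step among exact ties. The numbers are copied from the table above; the function name and tuple layout are illustrative, not part of the release.

```python
# (label, refine step, WER, CER) for the v1.5 refine checkpoints in the table.
results = [
    ("v1.5 checkpoint-5000", 5000, 0.0400, 0.0379),
    ("v1.5 checkpoint-8000", 8000, 0.0463, 0.0404),
    ("v1.5 checkpoint-9000", 9000, 0.0400, 0.0379),
    ("v1.5 checkpoint-10000", 10000, 0.0471, 0.0390),
]


def pick_checkpoint(rows):
    """Lowest WER, then lowest CER, then the latest step among exact ties."""
    return min(rows, key=lambda r: (r[2], r[3], -r[1]))


best = pick_checkpoint(results)
print(best[0])  # v1.5 checkpoint-9000
```

With only 20 prompts, small WER differences between checkpoints may reflect a single misrecognized word, which is one reason the tie is broken by training step rather than by a further metric.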

Samples

Reference audio was provided for sample generation and is included for reproducibility:

Open reference MP3

01 Welcome

Prompt: Bom dia. Esta é uma demonstração em português brasileiro com voz clara e natural.

Open WAV

02 Creator

Prompt: Hoje eu vou mostrar uma novidade rápida, simples e muito fácil de acompanhar.

Open WAV

03 Brazil

Prompt: São Paulo amanheceu com chuva, mas a cidade continuou cheia de energia.

Open WAV

04 Food

Prompt: Eu gostaria de um café sem açúcar e um pão de queijo bem quentinho, por favor.

Open WAV

06 Product

Prompt: Este produto foi feito para criadores que precisam gravar conteúdos todos os dias.

Open WAV

07 Numbers

Prompt: O preço final ficou em cinquenta e três reais, com entrega para amanhã de manhã.

Open WAV

08 Story

Prompt: No fim da tarde, a música começou baixinho e todo mundo ficou em silêncio para ouvir.

Open WAV

Training Details

  • Base model: k2-fsa/OmniVoice
  • Initial v1 checkpoint used for refinement: checkpoint-19000
  • Selected release checkpoint: v1.5 checkpoint-9000
  • Dataset: edwixx/brazilian-portuguese-TTS
  • Gold refine set: 12,014 train rows, 260 dev rows after filtering
  • Speakers in selected gold subset: 40
  • Language ID used for OmniVoice compatibility: pt
  • Locale metadata: pt-BR
  • Instruction used in refine data/samples: portuguese accent
  • Refine LR: 5e-6
  • Refine steps: 10,000
  • Selected checkpoint: 9,000
  • Final refine eval loss at step 10,000: 3.93625

Usage

omnivoice-infer \
  --model edwixx/omnivoice-brpt-v15 \
  --text "Bom dia. Esta é uma demonstração em português brasileiro." \
  --language pt \
  --instruct "portuguese accent" \
  --output brpt_v15.wav

Voice cloning example:

omnivoice-infer \
  --model edwixx/omnivoice-brpt-v15 \
  --text "Hoje eu vou mostrar uma novidade rápida e fácil de acompanhar." \
  --language pt \
  --instruct "portuguese accent" \
  --ref_audio reference.wav \
  --output brpt_clone.wav
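
To generate several prompts in one go, the commands above can be assembled from Python. This is a sketch that uses only the flags shown in this section; the helper name, the RUN switch, and the output file names are hypothetical, and nothing is synthesized unless RUN is set to True and omnivoice-infer is on the PATH.

```python
import shlex
import shutil
import subprocess

MODEL = "edwixx/omnivoice-brpt-v15"
RUN = False  # set to True to actually call the CLI


def build_command(text, output, ref_audio=None):
    """Build an omnivoice-infer argument list from the flags shown above."""
    cmd = [
        "omnivoice-infer",
        "--model", MODEL,
        "--text", text,
        "--language", "pt",
        "--instruct", "portuguese accent",
    ]
    if ref_audio is not None:
        cmd += ["--ref_audio", ref_audio]  # voice-cloning variant
    cmd += ["--output", output]
    return cmd


# Two sample prompts from this card; the output file names are hypothetical.
prompts = {
    "01_welcome.wav": "Bom dia. Esta é uma demonstração em português "
                      "brasileiro com voz clara e natural.",
    "04_food.wav": "Eu gostaria de um café sem açúcar e um pão de queijo "
                   "bem quentinho, por favor.",
}

commands = [build_command(text, out) for out, text in prompts.items()]

for cmd in commands:
    print(shlex.join(cmd))  # shell-quoted form, for inspection
    if RUN and shutil.which("omnivoice-infer"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with accented characters and punctuation in the prompts.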

Limitations

This is an experimental OmniVoice fine-tune. The model uses OmniVoice's generic Portuguese language code (pt) internally; Brazilian Portuguese behavior comes from the fine-tuning data, the gold subset, and consistent prompting. Human listening is still required to judge accent, naturalness, and voice-cloning quality.

Do not use this model for impersonation, deception, fraud, harassment, or cloning voices without consent.

Files

This repo intentionally excludes optimizer and random-state files to keep the release cleaner. Reports are in reports/, and audio examples are in samples/.

Attribution
