HuPER Corrector (Phoneme Corrector)

This repo releases the HuPER Corrector model checkpoints and inference code.

What it does

Given (1) a canonical phoneme sequence (ARPAbet) and (2) discrete audio tokens, the model predicts per-phoneme edit operations (KEEP / DEL / SUB:PHN), plus optional insertions, that transform the canonical sequence into the phones actually realized in the audio.
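To make the edit-operation scheme concrete, here is a minimal sketch of how such predictions could be applied to a canonical sequence. The op names (KEEP / DEL / SUB:PHN) follow the description above; the apply function and the insertion format are illustrative assumptions, not the model's actual decoding code.

```python
def apply_edits(canonical, ops, insertions=None):
    """Apply one edit op per canonical phoneme.

    `insertions` (hypothetical format) maps a canonical position i to a
    list of phones inserted after that position.
    """
    insertions = insertions or {}
    out = []
    for i, (phn, op) in enumerate(zip(canonical, ops)):
        if op == "KEEP":
            out.append(phn)
        elif op == "DEL":
            pass  # realized speech dropped this phone
        elif op.startswith("SUB:"):
            out.append(op.split(":", 1)[1])  # substituted phone
        out.extend(insertions.get(i, []))
    return out

canonical = "AY R OW T AH L EH T ER".split()
ops = ["KEEP"] * 4 + ["SUB:AX"] + ["KEEP"] * 4   # AH realized as AX
print(apply_edits(canonical, ops))
# ['AY', 'R', 'OW', 'T', 'AX', 'L', 'EH', 'T', 'ER']
```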

Files

  • model.safetensors: model weights
  • hparams.json: training hyper-parameters saved from Lightning
  • edit_seq_speech/: inference + model definition
  • edit_seq_speech/config/vocab.json: op/insert mappings

Quickstart

import os
from huggingface_hub import snapshot_download

repo_dir = snapshot_download("huper29/huper_corrector")

# Make sure Python can import the edit_seq_speech package
import sys
sys.path.append(repo_dir)

from edit_seq_speech.inference import PhonemeCorrectionInference

ckpt_path = os.path.join(repo_dir, "model.safetensors")   # or a .ckpt checkpoint, if provided
vocab_path = os.path.join(repo_dir, "edit_seq_speech/config/vocab.json")

infer = PhonemeCorrectionInference(
    checkpoint_path=ckpt_path,
    vocab_path=vocab_path,
)

wav_path = "your.wav"
text = "AY R OW T AH L EH T ER"  # ARPAbet phonemes for "I wrote a letter"
final_phns, log = infer.predict(wav_path, text)
print(final_phns)

Notes / Limitations

  • Audio tokenization at inference must match the tokenizer used during training (see the code and provided artifacts).

  • Input phoneme format: ARPAbet tokens separated by spaces.
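As a sanity check on the input format, here is a small normalizer sketch. Whether stress digits (e.g. AH0 vs AH) should be kept depends on the entries in edit_seq_speech/config/vocab.json; this sketch assumes a stress-less vocabulary and strips them.

```python
def normalize_arpabet(text):
    """Split a space-separated ARPAbet string into uppercase tokens,
    dropping trailing stress digits (0/1/2). Assumes the model vocab
    uses stress-less phones; check vocab.json before relying on this."""
    return [t.rstrip("012").upper() for t in text.strip().split()]

print(normalize_arpabet("ay1 r ow1 t ah0 l eh1 t er0"))
# ['AY', 'R', 'OW', 'T', 'AH', 'L', 'EH', 'T', 'ER']
```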

Citation

If you use this model, please cite:

@article{guo2026huper,
  title   = {HuPER: A Human-Inspired Framework for Phonetic Perception},
  author  = {Guo, Chenxu and Lian, Jiachen and Liu, Yisi and Huang, Baihe and Narayanan, Shriyaa and Cho, Cheol Jun and Anumanchipalli, Gopala},
  journal = {arXiv preprint arXiv:2602.01634},
  year    = {2026}
}