HuPER: A Human-Inspired Framework for Phonetic Perception
Paper: arXiv:2602.01634
This repo releases the HuPER Corrector model checkpoints and inference code.
Given (1) a canonical phoneme sequence (ARPAbet) and (2) discrete audio tokens, the model predicts edit operations (KEEP/DEL/SUB:PHN) and optional insertions to better match realized phones.
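To make the edit-operation scheme concrete, here is a minimal sketch of how a predicted op sequence could be applied to a canonical ARPAbet sequence. The function name `apply_edits` and the insertion format are illustrative assumptions, not the repo's actual API; the released model emits the operations, and the real decoding logic lives in `edit_seq_speech/`.

```python
# Hypothetical illustration of KEEP/DEL/SUB:PHN decoding (not the repo's API).
def apply_edits(canonical, ops, insertions=None):
    """Apply one aligned edit op per canonical phoneme; `insertions`
    (assumed format) maps a position to phones inserted before it."""
    insertions = insertions or {}
    out = []
    for i, (phn, op) in enumerate(zip(canonical, ops)):
        out.extend(insertions.get(i, []))   # optional inserted phones
        if op == "KEEP":
            out.append(phn)                 # realized as written
        elif op == "DEL":
            pass                            # phoneme not realized
        elif op.startswith("SUB:"):
            out.append(op.split(":", 1)[1])  # realized as a different phone
    out.extend(insertions.get(len(canonical), []))
    return out

canonical = "AY R OW T AH L EH T ER".split()
ops = ["KEEP", "KEEP", "KEEP", "DEL", "KEEP", "KEEP", "SUB:IH", "KEEP", "KEEP"]
print(apply_edits(canonical, ops))
# → ['AY', 'R', 'OW', 'AH', 'L', 'IH', 'T', 'ER']
```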
Repository contents:

- `model.safetensors`: model weights
- `hparams.json`: training hyper-parameters saved from Lightning
- `edit_seq_speech/`: inference + model definition
- `edit_seq_speech/config/vocab.json`: op/insert mappings

Usage:

```python
import os
import sys

from huggingface_hub import snapshot_download

# Download the repo and make sure Python can import the package
repo_dir = snapshot_download("huper29/huper_corrector")
sys.path.append(repo_dir)

from edit_seq_speech.inference import PhonemeCorrectionInference

ckpt_path = os.path.join(repo_dir, "model.safetensors")  # or a .ckpt if you uploaded it
vocab_path = os.path.join(repo_dir, "edit_seq_speech/config/vocab.json")

infer = PhonemeCorrectionInference(
    checkpoint_path=ckpt_path,
    vocab_path=vocab_path,
)

wav_path = "your.wav"
text = "AY R OW T AH L EH T ER"  # phonemized input (ARPAbet)

final_phns, log = infer.predict(wav_path, text)
print(final_phns)
```
Audio tokenization must match training (see code and provided artifacts).
Input phoneme format: ARPAbet tokens separated by spaces.
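Since the model expects space-separated ARPAbet tokens, a small input check can catch malformed phoneme strings before inference. This sketch uses the standard 39-phoneme CMUdict ARPAbet set and strips lexical stress digits; the helper name `normalize_arpabet` is an assumption, and the model's actual accepted vocabulary is defined by `edit_seq_speech/config/vocab.json`, which may differ.

```python
# Illustrative sanity check for ARPAbet input (set below is the standard
# CMUdict 39-phoneme inventory, not necessarily the repo's vocab.json).
ARPABET = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER",
    "EY", "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW",
    "OY", "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}

def normalize_arpabet(text):
    """Upper-case, drop stress digits (AH0 -> AH), and validate each token."""
    tokens = []
    for tok in text.upper().split():
        base = tok.rstrip("012")  # remove lexical stress markers
        if base not in ARPABET:
            raise ValueError(f"not an ARPAbet phoneme: {tok!r}")
        tokens.append(base)
    return tokens

print(normalize_arpabet("AY R OW T AH0 L EH1 T ER"))
# → ['AY', 'R', 'OW', 'T', 'AH', 'L', 'EH', 'T', 'ER']
```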
If you use this model, please cite:
```bibtex
@article{guo2026huper,
  title   = {HuPER: A Human-Inspired Framework for Phonetic Perception},
  author  = {Guo, Chenxu and Lian, Jiachen and Liu, Yisi and Huang, Baihe and Narayanan, Shriyaa and Cho, Cheol Jun and Anumanchipalli, Gopala},
  journal = {arXiv preprint arXiv:2602.01634},
  year    = {2026}
}
```