Release HuPER Corrector weights and inference code


Files changed (4)

README.md +69 -0
hparams.json +26 -0
model.safetensors +3 -0
requirements.txt +7 -0

README.md ADDED

	@@ -0,0 +1,69 @@

+---
+tags:
+- speech
+- phoneme
+- pytorch-lightning
+library_name: pytorch
+---
+# HuPER Corrector (Phoneme Corrector)
+This repo releases the HuPER Corrector model checkpoints and inference code.
+## What it does
+Given (1) a canonical phoneme sequence (ARPAbet) and (2) discrete audio tokens, the model predicts edit operations (`KEEP`, `DEL`, `SUB:PHN`) and optional insertions that transform the canonical sequence into the phones actually realized in the audio.
+## Files
+- `model.safetensors`: model weights
+- `hparams.json`: model and training hyperparameters saved from Lightning
+- `edit_seq_speech/`: inference + model definition
+- `edit_seq_speech/config/vocab.json`: op/insert mappings
+## Quickstart
+```python
+import os
+import sys
+from huggingface_hub import snapshot_download
+repo_dir = snapshot_download("huper29/huper_corrector")
+# Make the repo's `edit_seq_speech` package importable
+sys.path.append(repo_dir)
+from edit_seq_speech.inference import PhonemeCorrectionInference
+ckpt_path = os.path.join(repo_dir, "model.safetensors")  # or a Lightning .ckpt, if the repo provides one
+vocab_path = os.path.join(repo_dir, "edit_seq_speech/config/vocab.json")
+infer = PhonemeCorrectionInference(
+    checkpoint_path=ckpt_path,
+    vocab_path=vocab_path,
+)
+wav_path = "your.wav"
+text = "AY R OW T AH L EH T ER"  # canonical ARPAbet phonemes, space-separated
+final_phns, log = infer.predict(wav_path, text)
+print(final_phns)
+```
+## Notes / Limitations
+- Audio tokenization must match training (see code and provided artifacts).
+- Input phoneme format: ARPAbet tokens separated by spaces.
+## Citation
+If you use this model, please cite:
+```bibtex
+@article{guo2026huper,
+  title   = {HuPER: A Human-Inspired Framework for Phonetic Perception},
+  author  = {Guo, Chenxu and Lian, Jiachen and Liu, Yisi and Huang, Baihe and Narayanan, Shriyaa and Cho, Cheol Jun and Anumanchipalli, Gopala},
+  journal = {arXiv preprint arXiv:2602.01634},
+  year    = {2026}
+}
+```
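
The README's "What it does" section describes the output as edit operations (KEEP/DEL/SUB:PHN) plus optional insertions. As a rough illustration of how such operations could be applied to a canonical sequence, here is a minimal sketch. The string encoding of the ops, the one-op-per-phoneme alignment, the "insert after position i" convention, and the `apply_edits` helper are all assumptions made for this example; the actual mappings live in `edit_seq_speech/config/vocab.json`, which is not part of this diff.

```python
from typing import List, Optional, Sequence


def apply_edits(
    canonical: Sequence[str],
    ops: Sequence[str],
    insertions: Optional[Sequence[Optional[str]]] = None,
) -> List[str]:
    """Apply one edit op per canonical phoneme, plus optional insertions.

    Hypothetical encoding: ops[i] is "KEEP", "DEL", or "SUB:<PHN>";
    insertions[i], if not None, is a phoneme emitted right after position i.
    """
    if len(ops) != len(canonical):
        raise ValueError("expected exactly one op per canonical phoneme")
    if insertions is None:
        insertions = [None] * len(canonical)
    if len(insertions) != len(canonical):
        raise ValueError("expected one insertion slot per canonical phoneme")

    realized: List[str] = []
    for phn, op, ins in zip(canonical, ops, insertions):
        if op == "KEEP":
            realized.append(phn)
        elif op.startswith("SUB:"):
            realized.append(op[len("SUB:"):])  # replace with the predicted phone
        elif op != "DEL":  # DEL emits nothing for this position
            raise ValueError(f"unknown op: {op!r}")
        if ins is not None:
            realized.append(ins)
    return realized


# Canonical phonemes from the Quickstart; the ops below are made up
# to show two /T/ -> /D/ substitutions.
canonical = "AY R OW T AH L EH T ER".split()
ops = ["KEEP", "KEEP", "KEEP", "SUB:D", "KEEP", "KEEP", "KEEP", "SUB:D", "KEEP"]
realized = apply_edits(canonical, ops)
print(" ".join(realized))
```

In the released code this step is presumably handled inside `PhonemeCorrectionInference.predict`, which returns the final phonemes directly.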

hparams.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "vocab_size": 42,
+  "audio_vocab_size": 2048,
+  "d_model": 512,
+  "nhead": 8,
+  "num_layers": 8,
+  "dropout": 0.2,
+  "lr": 0.0002,
+  "weight_decay": 0.01,
+  "scheduler_config": {
+    "type": "cosine",
+    "warmup_ratio": 0.1,
+    "eta_min": 1e-06,
+    "factor": 0.5,
+    "patience": 3,
+    "min_lr": 1e-06
+  },
+  "optimizer_config": {
+    "name": "adamw",
+    "betas": [
+      0.9,
+      0.999
+    ],
+    "eps": 1e-08
+  }
+}
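
The `scheduler_config` above selects `"type": "cosine"` with `warmup_ratio` and `eta_min`; the remaining keys (`factor`, `patience`, `min_lr`) resemble the arguments of a plateau-based scheduler and are presumably unused in that mode. The sketch below shows one common reading of these values: linear warmup over the first 10% of steps, then cosine decay from `lr` down to `eta_min`. The `lr_at_step` helper, the total step count, and the exact warmup shape are assumptions for illustration; the training code is not part of this diff and may differ.

```python
import json
import math

# Relevant subset of hparams.json, inlined so the example is self-contained.
HPARAMS_JSON = """
{
  "lr": 0.0002,
  "scheduler_config": {"type": "cosine", "warmup_ratio": 0.1, "eta_min": 1e-06}
}
"""


def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_ratio: float, eta_min: float) -> float:
    """Linear warmup followed by cosine decay (an assumed interpretation)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine interpolation from base_lr (progress=0) to eta_min (progress=1).
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))


hparams = json.loads(HPARAMS_JSON)
sched = hparams["scheduler_config"]
TOTAL_STEPS = 1000  # hypothetical; the real training length is not given

for step in (0, 99, 100, 550, 1000):
    lr = lr_at_step(step, TOTAL_STEPS, hparams["lr"],
                    sched["warmup_ratio"], sched["eta_min"])
    print(f"step {step:4d}: lr = {lr:.3e}")
```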

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79845cbe992dedaf9d9bab95157466647a8431d35dab06e805b245edcba4ead4
+size 149242192
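
The three lines above are a Git LFS pointer: `oid` is the SHA-256 digest of the actual weights file and `size` is its length in bytes. A downloaded copy can be checked against them with the standard library alone. This is a minimal sketch; the `verify_lfs_object` helper and the local path are hypothetical names chosen for this example.

```python
import hashlib
import os

# Values copied from the LFS pointer in this diff.
EXPECTED_OID = "79845cbe992dedaf9d9bab95157466647a8431d35dab06e805b245edcba4ead4"
EXPECTED_SIZE = 149242192


def verify_lfs_object(path: str, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing on a size mismatch
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Hypothetical local path, e.g. os.path.join(repo_dir, "model.safetensors")
# after snapshot_download; the check is skipped if the file is absent.
LOCAL_PATH = "model.safetensors"
if os.path.exists(LOCAL_PATH):
    print("match:", verify_lfs_object(LOCAL_PATH, EXPECTED_OID, EXPECTED_SIZE))
```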

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+torch
+torchaudio
+pytorch-lightning
+transformers
+huggingface_hub
+g2p_en
+safetensors