Upload README.md with huggingface_hub
README.md
---
language: en
tags:
- audio
- speech
- codec
- neural-codec
- packet-loss
- pytorch
license: mit
---

# Zero-Ping

Neural speech codec (16 kHz) with built-in **packet-loss repair** via a local masked attention transformer. Designed for real-time VoIP / WebRTC applications where packets are lost in transit.

| | |
|---|---|
| Sample rate | 16 kHz mono |
| Bitrate | ~8.6 kbps (9 RVQ codebooks × 1024 entries) |
| Frame size | 15 ms (hop = 240 samples) |
| Latency | ~30 ms algorithmic (2 future frames) |
| Parameters | 17.8 M |
| Best val STOI | **0.931** |

Training data: LibriTTS train-clean-100 + VCTK + CommonVoice v2 (~700 h).
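
The frame-size and latency rows in the table follow directly from the hop length, and a few lines of arithmetic make that explicit. The sketch below uses only the numbers quoted in the table. The last quantity is an index-only estimate that assumes each RVQ index is sent as a fixed-length `log2(1024) = 10` bit code with no side information. That assumption is not stated in this README, and the estimate comes out lower than the ~8.6 kbps in the table, so treat it as a rough lower bound rather than the author's bitrate accounting.

```python
import math

# Values taken from the spec table above.
SAMPLE_RATE = 16_000    # Hz
HOP = 240               # samples per frame
LOOKAHEAD_FRAMES = 2    # future frames used by the model
N_CODEBOOKS = 9         # RVQ stages
CODEBOOK_SIZE = 1024    # entries per codebook

frame_ms = 1000 * HOP / SAMPLE_RATE           # duration of one frame
frames_per_second = SAMPLE_RATE / HOP         # frame rate
lookahead_ms = LOOKAHEAD_FRAMES * frame_ms    # algorithmic latency from lookahead

# Assumption: fixed-length codes, one index per codebook per frame.
bits_per_frame = N_CODEBOOKS * math.log2(CODEBOOK_SIZE)
index_bitrate_kbps = bits_per_frame * frames_per_second / 1000

print(f"frame: {frame_ms:.1f} ms, lookahead: {lookahead_ms:.1f} ms")
print(f"index-only bitrate: {index_bitrate_kbps:.2f} kbps")
```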

## Install

```bash
git clone https://huggingface.co/Lucabr01/Zero-Ping
cd Zero-Ping
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install vector-quantize-pytorch einops huggingface_hub
pip install -e .
```

## Usage

```python
import torch, torchaudio
from zpcodec import ZPCodec, GilbertElliottConfig, GilbertElliottSimulator

# Load model (downloads weights automatically on first run)
model = ZPCodec.from_pretrained("Lucabr01/Zero-Ping", device="cpu")

# Load audio and convert it to 16 kHz mono (the model's expected input)
wav, sr = torchaudio.load("speech.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
wav = wav.mean(0, keepdim=True).unsqueeze(0)  # [1, 1, T]

with torch.no_grad():
    # Encode → decode (clean, no packet loss)
    z_q, indices = model.encode(wav)
    wav_clean = model.decode(z_q)

    # Simulate bursty packet loss (Gilbert-Elliott channel) and repair
    cfg = GilbertElliottConfig(p=0.05, r=0.5, k=0.999, h=0.5)
    sim = GilbertElliottSimulator(cfg, sample_rate=16000, hop_length=model.hop_length)
    mask = sim.sample_frame_mask(1, z_q.shape[-1])
    wav_repaired = model.decode(z_q, frame_mask=mask)

torchaudio.save("clean.wav", wav_clean.squeeze(0), 16000)
torchaudio.save("repaired.wav", wav_repaired.squeeze(0), 16000)
```

## Architecture

Three-stage training:
1. Codec pre-training (GAN + multi-scale mel + waveform + STFT losses)
2. Repair transformer training (frozen codec, latent L1 on missing frames only)
3. Joint fine-tuning (all modules, Gilbert-Elliott curriculum from mild to severe loss)
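
To illustrate what "latent L1 on missing frames only" in stage 2 can mean, here is a minimal sketch of a masked L1 loss. It is not taken from the Zero-Ping training code: the function name `masked_latent_l1` is hypothetical, the convention that `True` marks a lost frame is an assumption, and plain Python lists stand in for tensors so the example runs without dependencies.

```python
def masked_latent_l1(pred, target, lost):
    """Mean absolute error over the latent values of lost frames only.

    pred, target: lists of frames, each frame a list of floats.
    lost: list of bools, True where the frame was dropped (assumed convention).
    """
    total, count = 0.0, 0
    for p_frame, t_frame, is_lost in zip(pred, target, lost):
        if not is_lost:
            continue  # received frames contribute nothing to the loss
        total += sum(abs(p - t) for p, t in zip(p_frame, t_frame))
        count += len(t_frame)
    # With no lost frames there is nothing to repair, so the loss is zero.
    return total / count if count else 0.0


# Three frames with two latent dimensions each; only the middle frame is lost.
pred = [[0.0, 0.0], [1.0, 3.0], [5.0, 5.0]]
target = [[9.0, 9.0], [2.0, 1.0], [5.0, 5.0]]
lost = [False, True, False]
print(masked_latent_l1(pred, target, lost))  # 1.5
```

The large error on the first frame is ignored because that frame was received; only the lost frame's error, (1 + 2) / 2, counts.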

The `GilbertElliottConfig` parameters let you tune the simulated channel:
- `p`: probability of entering the Bad state (higher = more frequent bursts)
- `r`: probability of leaving the Bad state (higher = shorter bursts)
- `h`: P(no loss | Bad state), default 0.5
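
For intuition about how these parameters interact, the following is a minimal, dependency-free sketch of a two-state Gilbert-Elliott channel. It is not the `zpcodec` implementation. It assumes `k` means P(no loss | Good state), following the usual Gilbert-Elliott notation (this README does not document `k`), and it applies the state transition before drawing each frame's loss, which is one of several reasonable orderings. The function names are hypothetical.

```python
import random


def stationary_loss_rate(p, r, k, h):
    """Long-run fraction of lost frames for the two-state chain."""
    pi_bad = p / (p + r)      # stationary probability of the Bad state
    pi_good = 1.0 - pi_bad
    return pi_good * (1.0 - k) + pi_bad * (1.0 - h)


def sample_loss_mask(n_frames, p, r, k, h, seed=0):
    """Return a list of bools, True where a frame is lost."""
    rng = random.Random(seed)
    bad = False               # start in the Good state
    mask = []
    for _ in range(n_frames):
        # Update the channel state, then draw the loss for this frame.
        if bad:
            if rng.random() < r:
                bad = False
        elif rng.random() < p:
            bad = True
        keep_prob = h if bad else k
        mask.append(rng.random() >= keep_prob)
    return mask


params = dict(p=0.05, r=0.5, k=0.999, h=0.5)
analytic = stationary_loss_rate(**params)
simulated = sample_loss_mask(100_000, **params)
print(f"analytic loss rate:  {analytic:.4f}")   # 0.0464
print(f"simulated loss rate: {sum(simulated) / len(simulated):.4f}")
```

Under these assumptions the channel spends `p / (p + r)`, about 9%, of frames in the Bad state, and the mean stay in the Bad state is `1 / r` = 2 frames. Raising `p` or lowering `h` increases the overall loss rate, while lowering `r` lengthens the bursts.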

## Citation

If you use Zero-Ping in your work, please cite:
```
@misc{zeropingcodec2025,
  author = {Lucabr01},
  title = {Zero-Ping: Neural Speech Codec with Packet-Loss Repair},
  year = {2025},
  url = {https://huggingface.co/Lucabr01/Zero-Ping}
}
```