---
language: en
tags:
- audio
- speech
- codec
- neural-codec
- packet-loss
- pytorch
license: mit
---

# Zero-Ping

A neural speech codec (16 kHz) with built-in **packet-loss repair**, performed by a local masked-attention transformer. It is designed for real-time VoIP and WebRTC applications, where packets are routinely lost in transit.
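
The repair transformer's exact attention pattern is not documented here. As an illustration only, assuming "local masked attention" means each latent frame attends to a limited window of past frames plus the 2 future frames implied by the latency figure, and that lost frames are excluded as keys, a boolean mask could be built like this. The function `local_attention_mask` and the default window `past=8` are hypothetical.

```python
from typing import Optional

import torch


def local_attention_mask(n_frames: int, past: int = 8, future: int = 2,
                         missing: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Return an [n_frames, n_frames] bool mask; [i, j] is True if query frame i may attend to key frame j.

    Hypothetical sketch: frame i sees frames i - past .. i + future (inclusive),
    minus any key frames flagged as lost in `missing` ([n_frames] bool, True = lost).
    """
    idx = torch.arange(n_frames)
    offset = idx[None, :] - idx[:, None]            # offset[i, j] = j - i
    allowed = (offset >= -past) & (offset <= future)
    if missing is not None:
        allowed = allowed & ~missing[None, :]       # lost frames carry no latent to attend to
    return allowed


# 6 frames with a 2-frame window on each side; frame 1 was lost
mask = local_attention_mask(6, past=2, future=2,
                            missing=torch.tensor([False, True, False, False, False, False]))
```

`torch.nn.functional.scaled_dot_product_attention` accepts a boolean `attn_mask` with this convention (True = may attend), whereas `torch.nn.MultiheadAttention` uses the opposite convention, so the mask would need inverting there. A real implementation would also have to handle a query whose whole window is lost, since a fully masked row can produce NaNs in softmax attention.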

| Property | Value |
|---|---|
| Sample rate | 16 kHz mono |
| Bitrate | ~8.6 kbps (9 RVQ codebooks × 1024 entries) |
| Frame size | 15 ms (hop = 240 samples) |
| Latency | ~30 ms algorithmic (2 future frames) |
| Parameters | 17.8 M |
| Best validation STOI | **0.931** |
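
The frame-level figures in the table are linked by simple arithmetic. Assuming each RVQ index is sent uncompressed as log2(1024) = 10 bits, with no entropy coding or packet headers, the raw index rate and the lookahead latency work out as follows (the constant names are illustrative):

```python
import math

SAMPLE_RATE = 16_000      # Hz, from the table
HOP = 240                 # samples per latent frame
N_CODEBOOKS = 9           # RVQ stages
CODEBOOK_SIZE = 1024      # entries per codebook
LOOKAHEAD_FRAMES = 2      # future frames used by the model

frame_ms = 1000 * HOP / SAMPLE_RATE                          # 15 ms per frame
bits_per_frame = N_CODEBOOKS * math.log2(CODEBOOK_SIZE)      # 9 indices x 10 bits
raw_bitrate_bps = bits_per_frame * SAMPLE_RATE / HOP         # bits/s of raw indices
lookahead_ms = LOOKAHEAD_FRAMES * frame_ms                   # algorithmic lookahead

print(f"frame {frame_ms} ms | raw index rate {raw_bitrate_bps / 1000:.2f} kbps | lookahead {lookahead_ms} ms")
```

Under that assumption the indices alone come to 6.0 kbps, which does not match the ~8.6 kbps listed in the table, so it is worth confirming the codebook count, hop length, or any per-packet overhead against the released configuration. The 30 ms lookahead does agree with 2 future frames of 15 ms each.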

Training data: LibriTTS train-clean-100 + VCTK + CommonVoice v2 (~700 h).

## Install

```bash
git clone https://huggingface.co/Lucabr01/Zero-Ping
cd Zero-Ping
# CPU-only PyTorch wheels; omit --index-url to get the default build for your platform
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install vector-quantize-pytorch einops huggingface_hub
pip install -e .
```

## Usage

```python
import torch
import torchaudio
from zpcodec import ZPCodec, GilbertElliottConfig, GilbertElliottSimulator

# Load model (downloads weights automatically on first run)
model = ZPCodec.from_pretrained("Lucabr01/Zero-Ping", device="cpu")

# Load audio and convert it to the 16 kHz mono input the model expects
wav, sr = torchaudio.load("speech.wav")            # [channels, T]
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
wav = wav.mean(0, keepdim=True).unsqueeze(0)        # downmix to mono -> [1, 1, T]

with torch.no_grad():
    # Encode → decode (clean, no packet loss)
    z_q, indices = model.encode(wav)
    wav_clean = model.decode(z_q)

    # Simulate bursty Gilbert-Elliott packet loss and repair the lost frames.
    # These settings keep the channel in the Bad state ~9% of the time,
    # for a long-run frame-loss rate of roughly 4.6%.
    cfg = GilbertElliottConfig(p=0.05, r=0.5, k=0.999, h=0.5)
    sim = GilbertElliottSimulator(cfg, sample_rate=16000, hop_length=model.hop_length)
    mask = sim.sample_frame_mask(1, z_q.shape[-1])
    wav_repaired = model.decode(z_q, frame_mask=mask)

torchaudio.save("clean.wav", wav_clean.squeeze(0), 16000)
torchaudio.save("repaired.wav", wav_repaired.squeeze(0), 16000)
```

## Architecture

Training proceeds in three stages:
1. Codec pre-training (GAN + multi-scale mel + waveform + STFT losses)
2. Repair-transformer training (codec frozen; L1 loss on latents, computed on missing frames only)
3. Joint fine-tuning (all modules, with a Gilbert-Elliott curriculum that moves from mild to severe packet loss)
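
Stage 2 restricts the latent L1 loss to missing frames, so the repair transformer is judged only on what it actually had to reconstruct. A minimal PyTorch sketch of such a loss follows; the `[batch, dim, frames]` latent layout, the boolean mask with True for lost frames, and the name `masked_latent_l1` are assumptions, not the repository's code.

```python
import torch


def masked_latent_l1(pred: torch.Tensor, target: torch.Tensor,
                     missing: torch.Tensor) -> torch.Tensor:
    """Mean absolute error between latents, computed over lost frames only.

    pred, target: [B, D, T] latent sequences (layout assumed)
    missing:      [B, T] bool, True where the frame was dropped
    """
    weight = missing.unsqueeze(1).to(pred.dtype)        # [B, 1, T], broadcasts over D
    abs_err = (pred - target.detach()).abs() * weight   # frozen-codec targets get no gradient
    n = weight.sum() * pred.shape[1]                    # number of contributing elements
    return abs_err.sum() / n.clamp(min=1.0)             # avoid 0/0 when nothing was lost
```

Averaging over the lost elements only keeps the loss scale independent of how many frames happened to be dropped in a batch.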

The `GilbertElliottConfig` parameters tune the simulated two-state (Good/Bad) loss channel:
- `p`: probability of entering the Bad state (higher = more frequent bursts)
- `r`: probability of leaving the Bad state (higher = shorter bursts)
- `k`: P(no loss | Good state)
- `h`: P(no loss | Bad state), default 0.5
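
These parameters follow the standard Gilbert-Elliott two-state model. Taking `k` and `h` as the per-frame probabilities of *no* loss in the Good and Bad states, the stationary probability of the Bad state is p / (p + r), and the long-run loss rate is that probability times (1 - h) plus its complement times (1 - k). The pure-Python sketch below computes this rate and samples a frame-loss mask; it is an independent illustration, not the `GilbertElliottSimulator` shipped with `zpcodec`, and both function names are hypothetical.

```python
import random
from typing import List, Optional


def stationary_loss_rate(p: float, r: float, k: float, h: float) -> float:
    """Long-run fraction of lost frames in a Gilbert-Elliott channel."""
    pi_bad = p / (p + r)                         # stationary probability of the Bad state
    return (1 - pi_bad) * (1 - k) + pi_bad * (1 - h)


def sample_loss_mask(n_frames: int, p: float, r: float, k: float, h: float,
                     rng: Optional[random.Random] = None) -> List[bool]:
    """Return one bool per frame, True where the frame is lost."""
    if rng is None:
        rng = random.Random()
    bad = rng.random() < p / (p + r)             # start from the stationary distribution
    mask = []
    for _ in range(n_frames):
        keep_prob = h if bad else k              # P(no loss | current state)
        mask.append(rng.random() >= keep_prob)   # lost with probability 1 - keep_prob
        if bad:
            bad = rng.random() >= r              # leave Bad with probability r
        else:
            bad = rng.random() < p               # enter Bad with probability p
    return mask


rate = stationary_loss_rate(0.05, 0.5, 0.999, 0.5)
mask = sample_loss_mask(200_000, 0.05, 0.5, 0.999, 0.5, rng=random.Random(0))
empirical = sum(mask) / len(mask)
```

For the Usage example's settings (`p=0.05, r=0.5, k=0.999, h=0.5`) this gives a long-run loss rate of about 4.6%, with the channel in the Bad state about 9% of the time and Bad-state episodes lasting 1/r = 2 frames on average.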

## Citation

If you use Zero-Ping in your work, please cite:

```bibtex
@misc{zeropingcodec2025,
  author = {Lucabr01},
  title  = {Zero-Ping: Neural Speech Codec with Packet-Loss Repair},
  year   = {2025},
  url    = {https://huggingface.co/Lucabr01/Zero-Ping}
}
```