base_model:
- microsoft/wavlm-large
library_name: torch
license: apache-2.0
pipeline_tag: audio-to-audio
⚡ FocalCodec
A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation.
This repository contains the 50 Hz causal checkpoint with a codebook size of 2048 trained on Libri-Light, as described in the preprints.
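📜 Preprints: https://arxiv.org/abs/2502.04465 (FocalCodec), https://arxiv.org/abs/2509.16195 (FocalCodec-Stream)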
🌐 Project Page: https://lucadellalib.github.io/focalcodec-web/
🔊 Downstream Tasks: https://github.com/lucadellalib/audiocodecs
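As a quick sanity check on the "low-bitrate" claim, the bitrate of a single-codebook codec follows from the token rate and the codebook size stated above (50 Hz, 2048 entries). The sketch below only does that arithmetic; the helper name `bitrate_bps` is illustrative and is not part of the FocalCodec package.

```python
import math


def bitrate_bps(token_rate_hz: float, codebook_size: int) -> float:
    """Bits per second of a single-codebook codec (hypothetical helper)."""
    # Each token indexes one codebook entry, so it carries log2(codebook_size) bits.
    bits_per_token = math.log2(codebook_size)
    return token_rate_hz * bits_per_token


# Values taken from this model card: 50 Hz token rate, codebook size 2048.
rate = bitrate_bps(50, 2048)
print(rate)  # 550.0 bits per second, i.e. 0.55 kbps

# Raw token payload for one minute of speech, before any container overhead.
print(rate * 60 / 8)  # 4125.0 bytes
```

These figures count only the token stream; the size of a stored file depends on how the tokens are serialized.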
🛠️ Installation
Install Python 3.8 or later, then open a terminal and run:
pip install huggingface-hub safetensors sounddevice soundfile torch torchaudio
▶️ Quickstart
NOTE: the audios directory contains audio samples that you can download and use to test the codec.
You can easily load the model using torch.hub without cloning the repository:
import torch
import torchaudio
# Load FocalCodec model
codec = torch.hub.load(
    repo_or_dir="lucadellalib/focalcodec",
    model="focalcodec",
    config="lucadellalib/focalcodec_50hz",
    force_reload=True,  # Fetch the latest FocalCodec version from Torch Hub
)
codec.eval().requires_grad_(False)
# Load and preprocess the input audio
audio_file = "audios/librispeech-dev-clean/251-118436-0003.wav"
sig, sample_rate = torchaudio.load(audio_file)
sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate_input)
# Encode audio into tokens
toks = codec.sig_to_toks(sig) # Shape: (batch, time)
print(toks.shape)
print(toks)
# Convert tokens to their corresponding binary spherical codes
codes = codec.toks_to_codes(toks)  # Shape: (batch, time, log2(codebook_size))
print(codes.shape)
print(codes)
# Decode tokens back into a waveform
rec_sig = codec.toks_to_sig(toks)
# Save the reconstructed audio
rec_sig = torchaudio.functional.resample(rec_sig, codec.sample_rate_output, sample_rate)
torchaudio.save("reconstruction.wav", rec_sig, sample_rate)
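To make the relationship between a token and its binary spherical code more concrete, here is a minimal pure-Python sketch of one possible mapping. It assumes that bit k of the token index (least significant first) sets the sign of dimension k and that entries are scaled to ±1/sqrt(num_bits) so the code has unit norm. The bit order and scaling actually used by `codec.toks_to_codes` may differ, and the helper names `tok_to_code` and `code_to_tok` are hypothetical.

```python
import math
from typing import List


def tok_to_code(tok: int, num_bits: int = 11) -> List[float]:
    """Map a token index to a unit-norm vector of +/- 1/sqrt(num_bits) entries.

    Assumption: bit k of the index (LSB first) selects the sign of dimension k.
    """
    scale = 1.0 / math.sqrt(num_bits)
    return [scale if (tok >> k) & 1 else -scale for k in range(num_bits)]


def code_to_tok(code: List[float]) -> int:
    """Invert tok_to_code by reading the sign of each dimension as one bit."""
    return sum(1 << k for k, value in enumerate(code) if value > 0)


# A codebook of size 2048 corresponds to log2(2048) = 11 dimensions per code.
code = tok_to_code(1234)
print(len(code))
print(code_to_tok(code))
```

Under these assumptions the mapping is a bijection between the 2048 token indices and the 2048 sign patterns, which is why tokens and codes carry the same information.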
Alternatively, you can install FocalCodec as a standard Python package using pip:
pip install focalcodec@git+https://github.com/lucadellalib/focalcodec.git@main#egg=focalcodec
Once installed, you can import it in your scripts:
import focalcodec
config = "lucadellalib/focalcodec_50hz"
codec = focalcodec.FocalCodec.from_pretrained(config)
Check the code documentation for more details on model usage and available configurations.
NOTE: the initial v0.0.1 release is still available at https://github.com/lucadellalib/focalcodec/tree/v0.0.1.
It can be loaded via torch.hub as repo_or_dir="lucadellalib/focalcodec:v0.0.1", or installed via pip as
focalcodec@git+https://github.com/lucadellalib/focalcodec.git@v0.0.1#egg=focalcodec.
@ Citing
@article{dellalibera2025focalcodec,
title = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
author = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
journal = {arXiv preprint arXiv:2502.04465},
year = {2025},
}
@article{dellalibera2025focalcodecstream,
title = {{FocalCodec-Stream}: Streaming Low-Bitrate Speech Coding via Causal Distillation},
author = {Luca {Della Libera} and Cem Subakan and Mirco Ravanelli},
journal = {arXiv preprint arXiv:2509.16195},
year = {2025},
}