# NovaSR Candle Port
A Rust port of NovaSR - a lightning-fast audio super-resolution model - using Hugging Face Candle.
## Overview
NovaSR is a tiny 52KB model that upsamples 16kHz audio to crystal-clear 48kHz at speeds exceeding 3600x realtime. This project ports the original PyTorch implementation to Rust using the Candle deep learning framework.
## Features
- Tiny Model: Only ~13,000 parameters (52KB)
- Blazing Fast: 3600x realtime inference on GPU
- High Quality: 16kHz → 48kHz upsampling
- Pure Rust: No Python dependencies for inference
- WASM Compatible: Can run in the browser
## Architecture

```
Input (16kHz) → ConvPre → Interpolate (3x) → AMPBlock0 → ConvPost → Tanh → Output (48kHz)
```

### Key Components

- SnakeBeta Activation: novel learnable activation function for periodic signals, `x + (1 - cos(2 * α * x)) / (2 * β + ε)`
- AMPBlock0: residual block with dilated convolutions and SnakeBeta activations
- Generator: main upsampling network with pre/post convolutions
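The SnakeBeta formula can be checked with a scalar sketch in plain Rust (independent of Candle; `alpha` and `beta` here are the already-exponentiated positive parameters, and the function name is illustrative):

```rust
/// Scalar SnakeBeta: x + (1 - cos(2 * alpha * x)) / (2 * beta + eps).
/// `alpha` sets the frequency of the periodic term, `beta` scales its magnitude.
fn snake_beta(x: f64, alpha: f64, beta: f64) -> f64 {
    const EPS: f64 = 1e-9;
    x + (1.0 - (2.0 * alpha * x).cos()) / (2.0 * beta + EPS)
}

fn main() {
    // At x = 0 the cosine term equals 1, so the periodic part vanishes: f(0) = 0.
    println!("{}", snake_beta(0.0, 1.0, 1.0));
    // Since 1 - cos(.) >= 0 and beta > 0, the activation never falls below x.
    let x = -1.3;
    assert!(snake_beta(x, 1.0, 1.0) >= x);
}
```

Note that because `1 - cos(·)` is non-negative, the activation adds a periodic non-negative offset on top of the identity, which is what makes it suitable for modeling periodic audio structure.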
## Project Structure

```
├── src/
│   ├── lib.rs        # Core library
│   └── main.rs       # CLI tool
├── models/           # Safetensors weights
├── scripts/          # Utility scripts (conversion, parity)
├── data/             # Audio samples
├── Cargo.toml
└── README.md
```
## Installation

### Prerequisites
- Rust 1.70+
- For weight conversion: Python 3.8+ with PyTorch and safetensors
### Build

```bash
# Clone the repository
git clone <repo-url>
cd novasr-candle

# Build the library and CLI
cargo build --release

# The binary will be at target/release/novasr-cli
```
## Usage

### Convert Weights
First, convert the PyTorch weights to Candle format:
```bash
uv run --project NovaSR python scripts/convert_weights.py \
    --input novasr_model.pth \
    --output models/novasr_v1.safetensors
```
### CLI Usage

```bash
# Upsample an audio file using a local model
./target/release/novasr-cli input.wav output.wav models/novasr_v1.safetensors

# Upsample using a model from the Hugging Face Hub
./target/release/novasr-cli input.wav output.wav babybirdprd/novasr-candle
```
### Library Usage
```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use novasr_candle::{load_model, upsample_audio};

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Load from the HF Hub
    let model = novasr_candle::from_hf("babybirdprd/novasr-candle", "main", &device)?;

    // OR load a local weight file (from_mmaped_safetensors takes a slice of paths):
    // let vb = unsafe { VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, &device)? };
    // let model = load_model(vb)?;

    // Process audio: `audio_samples` must be 16kHz mono f32 samples.
    let audio_samples: Vec<f32> = vec![0.0; 16_000]; // placeholder: 1 second of silence
    let sample_count = audio_samples.len();
    let input = Tensor::from_vec(audio_samples, (1, 1, sample_count), &device)?;
    let output = upsample_audio(&model, &input)?;

    println!("output shape: {:?}", output.shape());
    Ok(())
}
```
## Implementation Details

### SnakeBeta Activation
The SnakeBeta activation is a learnable periodic activation function:
```rust
pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
    // alpha and beta are stored in log-scale; exponentiate to keep them positive.
    let a = self.alpha.exp()?;
    let b = self.beta.exp()?;
    // x + (1 - cos(2 * alpha * x)) / (2 * beta + epsilon)
    let two_a_x = (x.broadcast_mul(&a)? * 2.0)?;
    let cos_term = two_a_x.cos()?;
    let one_minus_cos = (Tensor::ones_like(&cos_term)? - cos_term)?;
    let inv_2b = ((b * 2.0)? + 1e-9)?.recip()?;
    x.add(&one_minus_cos.broadcast_mul(&inv_2b)?)
}
```
### Generator
The Generator performs the main upsampling:
- Pre-convolution: 7x1 conv to expand channels
- Interpolation: 3x linear upsampling
- Residual blocks: AMPBlock0 with SnakeBeta activations
- Post-convolution: 7x1 conv to output channel
- Tanh: Output normalization
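The interpolation stage can be sketched in plain Rust (a hypothetical align-corners linear upsampler, independent of Candle; the actual model may use a different boundary convention):

```rust
/// Linearly upsample `input` by `factor` (align-corners style):
/// output index i samples the input at position i * (n - 1) / (m - 1),
/// so the first and last input samples are preserved exactly.
fn upsample_linear(input: &[f32], factor: usize) -> Vec<f32> {
    let n = input.len();
    let m = n * factor;
    if n < 2 {
        return vec![input.first().copied().unwrap_or(0.0); m];
    }
    (0..m)
        .map(|i| {
            let pos = i as f32 * (n - 1) as f32 / (m - 1) as f32;
            let lo = pos.floor() as usize;
            let hi = (lo + 1).min(n - 1);
            let frac = pos - lo as f32;
            // Linear blend between the two neighboring input samples.
            input[lo] * (1.0 - frac) + input[hi] * frac
        })
        .collect()
}

fn main() {
    // Two input samples become six, with endpoints preserved.
    let up = upsample_linear(&[0.0, 3.0], 3);
    println!("{:?}", up);
}
```

This is only the resampling step; in the model it is followed by the residual blocks, which learn to reconstruct the high-frequency content that plain interpolation cannot produce.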
## Model Specifications
| Property | Value |
|---|---|
| Total Parameters | ~13,000 |
| Model Size | 52 KB |
| Input Sample Rate | 16 kHz |
| Output Sample Rate | 48 kHz |
| Upsampling Factor | 3x |
| Inference Speed | 3600x realtime (A100) |
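The 3600x realtime figure translates into wall-clock latency as follows (simple arithmetic, not a benchmark):

```rust
fn main() {
    let realtime_factor = 3600.0_f64;
    let audio_seconds = 1.0_f64;
    // Wall-clock time needed to process one second of audio at 3600x realtime.
    let wall_ms = audio_seconds / realtime_factor * 1000.0;
    println!("{:.3} ms per second of audio", wall_ms); // ~0.278 ms
    // At a 16 kHz input rate, that is 16,000 samples processed in under 0.3 ms.
}
```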
## Comparison with Original
| Feature | PyTorch (Original) | Candle (This Port) |
|---|---|---|
| Language | Python | Rust |
| Framework | PyTorch | Candle |
| Dependencies | Heavy | Minimal |
| WASM Support | No | Yes |
| Performance | Fast | Comparable |
## Web Demo
A web demo showcasing the model architecture and Rust implementation is available at:
https://vkf4cascsh44i.ok.kimi.link
## Future Work
- WASM bindings for browser inference
- Quantization support (INT8)
- GPU acceleration (CUDA)
- Streaming inference for real-time processing
- Batch processing support
## Credits
- Original NovaSR by Yatharth Sharma
- Candle by Hugging Face
## License
Apache 2.0 - Same as the original NovaSR project