NovaSR Candle Port

A Rust port of NovaSR, a lightning-fast audio super-resolution model, built on Hugging Face Candle.

Overview

NovaSR is a tiny 52KB model that upsamples 16kHz audio to crystal-clear 48kHz at speeds exceeding 3600x realtime. This project ports the original PyTorch implementation to Rust using the Candle deep learning framework.

Features

  • Tiny Model: Only ~13,000 parameters (52KB)
  • Blazing Fast: 3600x realtime inference on GPU
  • High Quality: 16kHz β†’ 48kHz upsampling
  • Pure Rust: No Python dependencies for inference
  • WASM Compatible: Can run in the browser

Architecture

Input (16kHz) β†’ ConvPre β†’ Interpolate (3x) β†’ AMPBlock0 β†’ ConvPost β†’ Tanh β†’ Output (48kHz)

Key Components

  1. SnakeBeta Activation: Novel activation function for periodic signals

    x + (1 - cos(2 * Ξ± * x)) / (2 * Ξ² + Ξ΅)
    
  2. AMPBlock0: Residual block with dilated convolutions and SnakeBeta activation

  3. Generator: Main upsampling network with pre/post convolutions

Project Structure

β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ lib.rs       # Core library
β”‚   └── main.rs      # CLI tool
β”œβ”€β”€ models/          # Safetensors weights
β”œβ”€β”€ scripts/         # Utility scripts (conversion, parity)
β”œβ”€β”€ data/            # Audio samples
β”œβ”€β”€ Cargo.toml
└── README.md

Installation

Prerequisites

  • Rust 1.70+
  • For weight conversion: Python 3.8+ with PyTorch and safetensors

Build

# Clone the repository
git clone <repo-url>
cd novasr-candle


# Build the library and CLI
cargo build --release

# The binary will be at target/release/novasr-cli

Usage

Convert Weights

First, convert the PyTorch weights to Candle format:

uv run --project NovaSR python scripts/convert_weights.py \
  --input novasr_model.pth \
  --output models/novasr_v1.safetensors

CLI Usage

# Upsample an audio file using a local model
./target/release/novasr-cli input.wav output.wav models/novasr_v1.safetensors

# Upsample using a model from Hugging Face Hub
./target/release/novasr-cli input.wav output.wav babybirdprd/novasr-candle

Library Usage

use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use novasr_candle::{load_model, upsample_audio};

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Load from HF Hub
    let model = novasr_candle::from_hf("babybirdprd/novasr-candle", "main", &device)?;

    // OR load local weights (note: from_mmaped_safetensors takes a slice of paths)
    // let vb = unsafe { VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, &device)? };
    // let model = load_model(vb)?;

    // Process audio: the model expects shape (batch, channels, samples)
    let audio_samples: Vec<f32> = vec![0.0; 16_000]; // one second of silence at 16 kHz
    let sample_count = audio_samples.len();
    let input = Tensor::from_vec(audio_samples, (1, 1, sample_count), &device)?;

    let output = upsample_audio(&model, &input)?;
    println!("output shape: {:?}", output.shape());

    Ok(())
}

Implementation Details

SnakeBeta Activation

The SnakeBeta activation is a learnable periodic activation function:

pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
    let a = self.alpha.exp()?;
    let b = self.beta.exp()?;
    
    // x + (1 - cos(2 * alpha * x)) / (2 * beta + epsilon)
    let two_a_x = (x.broadcast_mul(&a)? * 2.0)?;
    let cos_term = two_a_x.cos()?;
    let one_minus_cos = (Tensor::ones_like(&cos_term)? - cos_term)?;
    let inv_2b = ((b * 2.0)? + 1e-9)?.recip()?;
    
    x.add(&one_minus_cos.broadcast_mul(&inv_2b)?)
}
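For intuition, the same activation can be written as a scalar function. This is a minimal sketch independent of Candle; the real layer applies it element-wise with per-channel learned parameters, and `alpha`/`beta` here are the already-exponentiated values (the layer stores them in log space and calls `exp` first).

```rust
/// Scalar form of SnakeBeta: y = x + (1 - cos(2*alpha*x)) / (2*beta + eps).
fn snake_beta_scalar(x: f32, alpha: f32, beta: f32) -> f32 {
    const EPS: f32 = 1e-9;
    x + (1.0 - (2.0 * alpha * x).cos()) / (2.0 * beta + EPS)
}

fn main() {
    // The periodic term vanishes at x = 0, so zero passes through unchanged.
    println!("{}", snake_beta_scalar(0.0, 1.0, 1.0));
    // For beta > 0 the added term is non-negative, so output >= input.
    println!("{}", snake_beta_scalar(0.5, 1.0, 1.0));
}
```

Because 1 - cos(Β·) is always in [0, 2], the activation is a bounded periodic perturbation on top of the identity, which is what makes it well suited to audio signals.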

Generator

The Generator performs the main upsampling:

  1. Pre-convolution: 7x1 conv to expand channels
  2. Interpolation: 3x linear upsampling
  3. Residual blocks: AMPBlock0 with SnakeBeta activations
  4. Post-convolution: 7x1 conv to output channel
  5. Tanh: Output normalization
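Step 2 (the 3x interpolation) can be sketched in plain Rust. This is a simplified illustrative variant that inserts two evenly spaced points between neighbouring samples; the exact sampling convention of the real model depends on the interpolation semantics used (e.g. PyTorch's `align_corners` behaviour), so treat this as a sketch, not the reference implementation.

```rust
/// Illustrative 3x linear upsampling of a 1-D signal: each input sample
/// is followed by two points linearly interpolated towards its successor
/// (the last sample is simply repeated).
fn upsample_3x_linear(x: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len() * 3);
    for i in 0..x.len() {
        let a = x[i];
        let b = if i + 1 < x.len() { x[i + 1] } else { x[i] };
        out.push(a);
        out.push(a + (b - a) / 3.0);
        out.push(a + 2.0 * (b - a) / 3.0);
    }
    out
}

fn main() {
    // 2 samples in, 6 samples out: the 16 kHz -> 48 kHz ratio.
    println!("{:?}", upsample_3x_linear(&[0.0, 3.0]));
}
```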

Model Specifications

Property            Value
------------------  ----------------------
Total Parameters    ~13,000
Model Size          52 KB
Input Sample Rate   16 kHz
Output Sample Rate  48 kHz
Upsampling Factor   3x
Inference Speed     3600x realtime (A100)
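The size figure is consistent with the parameter count: roughly 13,000 f32 weights at 4 bytes each come to about 52 KB (plus a small safetensors header).

```rust
fn main() {
    // ~13,000 f32 parameters, 4 bytes each -> ~52 KB on disk.
    let params: usize = 13_000;
    let bytes = params * std::mem::size_of::<f32>();
    println!("{} KB", bytes / 1000);
}
```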

Comparison with Original

Feature        PyTorch (Original)   Candle (This Port)
-------------  -------------------  -------------------
Language       Python               Rust
Framework      PyTorch              Candle
Dependencies   Heavy                Minimal
WASM Support   No                   Yes
Performance    Fast                 Comparable

Web Demo

A web demo showcasing the model architecture and Rust implementation is available at:

https://vkf4cascsh44i.ok.kimi.link

Future Work

  • WASM bindings for browser inference
  • Quantization support (INT8)
  • GPU acceleration (CUDA)
  • Streaming inference for real-time processing
  • Batch processing support

License

Apache 2.0 - Same as the original NovaSR project
