NovaSR Candle Port

A Rust port of NovaSR, a lightning-fast audio super-resolution model, built on Hugging Face Candle.

Overview

NovaSR is a tiny 52KB model that upsamples 16kHz audio to crystal-clear 48kHz at speeds exceeding 3600x realtime. This project ports the original PyTorch implementation to Rust using the Candle deep learning framework.

Features

  • Tiny Model: Only ~13,000 parameters (52KB)
  • Blazing Fast: 3600x realtime inference on GPU
  • High Quality: 16kHz β†’ 48kHz upsampling
  • Pure Rust: No Python dependencies for inference
  • WASM Compatible: Can run in the browser

Architecture

Input (16kHz) β†’ ConvPre β†’ Interpolate (3x) β†’ AMPBlock0 β†’ ConvPost β†’ Tanh β†’ Output (48kHz)

Key Components

  1. SnakeBeta Activation: Novel activation function for periodic signals

    x + (1 - cos(2 * Ξ± * x)) / (2 * Ξ² + Ξ΅)
    
  2. AMPBlock0: Residual block with dilated convolutions and SnakeBeta activation

  3. Generator: Main upsampling network with pre/post convolutions

Project Structure

β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ lib.rs       # Core library
β”‚   └── main.rs      # CLI tool
β”œβ”€β”€ models/          # Safetensors weights
β”œβ”€β”€ scripts/         # Utility scripts (conversion, parity)
β”œβ”€β”€ data/            # Audio samples
β”œβ”€β”€ Cargo.toml
└── README.md

Installation

Prerequisites

  • Rust 1.70+
  • For weight conversion: Python 3.8+ with PyTorch and safetensors

Build

# Clone the repository
git clone <repo-url>
cd novasr-candle


# Build the library and CLI
cargo build --release

# The binary will be at target/release/novasr-cli

Usage

Convert Weights

First, convert the PyTorch weights to Candle format:

uv run --project NovaSR python scripts/convert_weights.py \
  --input novasr_model.pth \
  --output models/novasr_v1.safetensors

CLI Usage

# Upsample an audio file using a local model
./target/release/novasr-cli input.wav output.wav models/novasr_v1.safetensors

# Upsample using a model from Hugging Face Hub
./target/release/novasr-cli input.wav output.wav babybirdprd/novasr-candle

Library Usage

use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use novasr_candle::{load_model, upsample_audio};

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Load from HF Hub
    let model = novasr_candle::from_hf("babybirdprd/novasr-candle", "main", &device)?;

    // OR load local weights (note: from_mmaped_safetensors takes a slice of paths)
    // let vb = unsafe { VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, &device)? };
    // let model = load_model(vb)?;

    // Process audio: the model expects shape (batch, channels, samples)
    let audio_samples: Vec<f32> = vec![0.0; 16_000]; // one second of silence at 16 kHz
    let sample_count = audio_samples.len();
    let input = Tensor::from_vec(audio_samples, (1, 1, sample_count), &device)?;

    let output = upsample_audio(&model, &input)?;
    println!("output shape: {:?}", output.shape());

    Ok(())
}

Implementation Details

SnakeBeta Activation

The SnakeBeta activation is a learnable periodic activation function:

pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
    let a = self.alpha.exp()?;
    let b = self.beta.exp()?;
    
    // x + (1 - cos(2 * alpha * x)) / (2 * beta + epsilon)
    let two_a_x = (x.broadcast_mul(&a)? * 2.0)?;
    let cos_term = two_a_x.cos()?;
    let one_minus_cos = (Tensor::ones_like(&cos_term)? - cos_term)?;
    let inv_2b = ((b * 2.0)? + 1e-9)?.recip()?;
    
    x.add(&one_minus_cos.broadcast_mul(&inv_2b)?)
}
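For intuition, the same activation can be written as a scalar function. This is a minimal sketch independent of Candle; the real layer applies it element-wise with per-channel learned parameters, and `alpha`/`beta` here are the already-exponentiated values (the layer stores them in log space and calls `exp` first).

```rust
/// Scalar form of SnakeBeta: y = x + (1 - cos(2*alpha*x)) / (2*beta + eps).
fn snake_beta_scalar(x: f32, alpha: f32, beta: f32) -> f32 {
    const EPS: f32 = 1e-9;
    x + (1.0 - (2.0 * alpha * x).cos()) / (2.0 * beta + EPS)
}

fn main() {
    // The periodic term vanishes at x = 0, so zero passes through unchanged.
    println!("{}", snake_beta_scalar(0.0, 1.0, 1.0));
    // For beta > 0 the added term is non-negative, so output >= input.
    println!("{}", snake_beta_scalar(0.5, 1.0, 1.0));
}
```

Because 1 - cos(Β·) is always in [0, 2], the activation is a bounded periodic perturbation on top of the identity, which is what makes it well suited to audio signals.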

Generator

The Generator performs the main upsampling:

  1. Pre-convolution: 7x1 conv to expand channels
  2. Interpolation: 3x linear upsampling
  3. Residual blocks: AMPBlock0 with SnakeBeta activations
  4. Post-convolution: 7x1 conv to output channel
  5. Tanh: Output normalization
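Step 2 (the 3x interpolation) can be sketched in plain Rust. This is a simplified illustrative variant that inserts two evenly spaced points between neighbouring samples; the exact sampling convention of the real model depends on the interpolation semantics used (e.g. PyTorch's `align_corners` behaviour), so treat this as a sketch, not the reference implementation.

```rust
/// Illustrative 3x linear upsampling of a 1-D signal: each input sample
/// is followed by two points linearly interpolated towards its successor
/// (the last sample is simply repeated).
fn upsample_3x_linear(x: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len() * 3);
    for i in 0..x.len() {
        let a = x[i];
        let b = if i + 1 < x.len() { x[i + 1] } else { x[i] };
        out.push(a);
        out.push(a + (b - a) / 3.0);
        out.push(a + 2.0 * (b - a) / 3.0);
    }
    out
}

fn main() {
    // 2 samples in, 6 samples out: the 16 kHz -> 48 kHz ratio.
    println!("{:?}", upsample_3x_linear(&[0.0, 3.0]));
}
```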

Model Specifications

Property            Value
------------------  ----------------------
Total Parameters    ~13,000
Model Size          52 KB
Input Sample Rate   16 kHz
Output Sample Rate  48 kHz
Upsampling Factor   3x
Inference Speed     3600x realtime (A100)
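The size figure is consistent with the parameter count: roughly 13,000 f32 weights at 4 bytes each come to about 52 KB (plus a small safetensors header).

```rust
fn main() {
    // ~13,000 f32 parameters, 4 bytes each -> ~52 KB on disk.
    let params: usize = 13_000;
    let bytes = params * std::mem::size_of::<f32>();
    println!("{} KB", bytes / 1000);
}
```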

Comparison with Original

Feature        PyTorch (Original)   Candle (This Port)
-------------  -------------------  -------------------
Language       Python               Rust
Framework      PyTorch              Candle
Dependencies   Heavy                Minimal
WASM Support   No                   Yes
Performance    Fast                 Comparable

Web Demo

A web demo showcasing the model architecture and Rust implementation is available at:

https://vkf4cascsh44i.ok.kimi.link

Future Work

  • WASM bindings for browser inference
  • Quantization support (INT8)
  • GPU acceleration (CUDA)
  • Streaming inference for real-time processing
  • Batch processing support

License

Apache 2.0 - Same as the original NovaSR project
