
Qwen3-0.6B for Burn

This repository contains Qwen3-0.6B weights in formats compatible with the Burn deep learning framework.

Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Parameters: 0.6B
  • Architecture: Qwen3 (decoder-only transformer)
  • License: Apache-2.0

Configuration

| Parameter | Value |
|---|---|
| `hidden_size` | 1024 |
| `num_hidden_layers` | 28 |
| `num_attention_heads` | 16 |
| `num_key_value_heads` | 8 |
| `intermediate_size` | 3072 |
| `vocab_size` | 151936 |
| `max_position_embeddings` | 40960 |
| `rope_theta` | 1000000 |
| `rms_norm_eps` | 1e-6 |
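As a sanity check, the values above are consistent with the stated 0.6B parameter count. The sketch below sums the parameters they imply; note that the per-head dimension of 128, the Q/K RMSNorm weights, and the tied input/output embeddings are assumptions taken from the upstream Qwen/Qwen3-0.6B configuration and do not appear in the table itself:

```rust
/// Rough parameter count implied by the configuration table.
/// head_dim = 128 and tied embeddings are assumptions from the
/// upstream Qwen3-0.6B config, not values listed above.
fn qwen3_0_6b_params() -> u64 {
    let (vocab, hidden, layers) = (151_936u64, 1_024u64, 28u64);
    let (heads, kv_heads, intermediate, head_dim) = (16u64, 8u64, 3_072u64, 128u64);

    // Token embedding matrix, shared with the output head (tied weights).
    let embedding = vocab * hidden;

    // Attention: Q/K/V/O projections plus per-head Q/K RMSNorm weights.
    let attention = hidden * heads * head_dim   // q_proj
        + 2 * hidden * kv_heads * head_dim      // k_proj + v_proj
        + heads * head_dim * hidden             // o_proj
        + 2 * head_dim;                         // q_norm + k_norm

    // SwiGLU MLP: gate, up, and down projections.
    let mlp = 3 * hidden * intermediate;

    // Two RMSNorms per layer, plus the final norm and the embedding.
    layers * (attention + mlp + 2 * hidden) + hidden + embedding
}

fn main() {
    let total = qwen3_0_6b_params();
    println!("~{:.3}B parameters", total as f64 / 1e9); // ~0.596B
}
```

This lands at roughly 596M parameters, matching the advertised 0.6B.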

Available Formats

| File | Format | Size | Description |
|---|---|---|---|
| `model.safetensors` | HuggingFace SafeTensors | 1.4 GB | Original BF16 weights from Qwen |
| `model.bpk` | Burn Burnpack | 1.4 GB | Converted for Burn (BF16) |
| `tokenizer.json` | HuggingFace Tokenizers | 11 MB | Tokenizer file |

Usage with qwen3-burn

Using .bpk format (recommended)

```rust
use qwen3_burn::{Qwen3Config, Qwen3ForCausalLM, Qwen3Tokenizer};
use burn::backend::candle::{Candle, CandleDevice};
use burn::tensor::Tensor;
use half::bf16;

type Backend = Candle<bf16, i64>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = CandleDevice::metal(0); // or CandleDevice::Cpu

    // Load tokenizer
    let tokenizer = Qwen3Tokenizer::from_file("tokenizer.json")?;

    // Initialize model with 0.6B config preset
    let model = Qwen3Config::qwen3_0_6b()
        .init_causal_lm::<Backend>(&device)
        .with_weights("model.bpk")?;  // or "model.safetensors"

    // Generate text
    let (input_ids, _) = tokenizer.encode("Hello, world!")?;
    let input_tensor = Tensor::from_data(&input_ids, &device).unsqueeze();

    let output = model.generate_with_cache(
        input_tensor,
        50,    // max_new_tokens
        0.0,   // temperature (0 = greedy, >0 = sampling)
        0.9,   // top_p
        50,    // top_k
    );

    let text = tokenizer.decode(&output.to_data().to_vec())?;
    println!("{}", text);

    Ok(())
}
```
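The `temperature`, `top_p`, and `top_k` arguments follow the usual sampling conventions: temperature 0 means greedy decoding, while positive temperatures softmax the scaled logits and restrict sampling to the `top_k` highest-probability tokens and the smallest nucleus whose cumulative probability reaches `top_p`. A standalone sketch of that filtering logic (illustrative only, not qwen3-burn's actual implementation):

```rust
/// Returns the indices of tokens that survive top-k then top-p (nucleus)
/// filtering after temperature scaling, sorted by descending probability.
/// Illustrative sketch; not the qwen3-burn implementation.
fn filter_logits(logits: &[f32], temperature: f32, top_p: f32, top_k: usize) -> Vec<usize> {
    // temperature == 0 means greedy: keep only the argmax token.
    if temperature == 0.0 {
        let argmax = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        return vec![argmax];
    }

    // Softmax over temperature-scaled logits (max-subtracted for stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| ((l - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|&e| e / sum).collect();

    // Sort token indices by descending probability, keep at most top_k.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    order.truncate(top_k);

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut kept = Vec::new();
    for &i in &order {
        kept.push(i);
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}

fn main() {
    let logits = [2.0, 1.0, 0.5, -1.0];
    // Greedy decoding keeps only the highest-logit token.
    println!("{:?}", filter_logits(&logits, 0.0, 0.9, 50)); // [0]
}
```

A real sampler would then draw a token from the renormalized probabilities of the surviving indices; with `temperature = 0.0`, as in the usage example above, the set collapses to the single argmax token.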

Performance

On Apple M-series with Metal backend:

  • ~25 tokens/sec with greedy decoding (temperature=0)
  • Model loading: ~2-3 seconds

