AtomLlama-33K-5x5-DigitMesh-Sparse-Q8

An INT8-quantized version of atomllama-33K-5x5-DigitMesh-sparse for ultra-efficient 5×5 digit mesh recognition.

Model Description

This is a 50% sparse + INT8 quantized variant of the AtomLlama-33K-5x5-DigitMesh model, combining structured sparsity with Quantization Aware Training (QAT). This dual compression approach maintains digit recognition accuracy while significantly reducing model size and computational requirements.

Key Features

  • Base Model: junzzhu/atomllama-33K-5x5-DigitMesh-sparse
  • Sparsity: ~50% (unstructured)
  • Quantization: INT8 with Sparse QAT
  • Parameters: ~33K total, ~16.5K non-zero
  • Architecture: LlamaForCausalLM
  • Task: 5×5 binary digit mesh recognition
  • Compression: ~3x smaller than the original model (46 KB vs. 137 KB)

Usage

Basic Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_path = "./models/atomllama-33K-5x5-DigitMesh-sparse-q8"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)

# Example: Classify a 5x5 binary digit pattern (digit "0")
pattern = "1 1 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1"
prompt = f"{pattern} <SEP>"

# Tokenize and generate prediction
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)

outputs = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=False
)

# Decode the prediction
prediction = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True
).strip()

print(f"Predicted digit: {prediction}")  # Expected: "D0"
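The pattern string above is simply a row-major flattening of a 5×5 binary grid. As a small sketch (the `grid_to_pattern` helper is illustrative, not part of the released code), the same prompt can be built from a readable 2D grid:

```python
# Hypothetical helper: flatten a 5x5 binary grid into the space-separated
# pattern string used in the prompt format above.
def grid_to_pattern(grid):
    """Flatten a 5x5 list of 0/1 rows into a 25-token string like '1 1 0 ...'."""
    assert len(grid) == 5 and all(len(row) == 5 for row in grid)
    return " ".join(str(cell) for row in grid for cell in row)

# The digit "0" as a 5x5 mesh: a ring of 1s around a hollow centre.
zero = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

pattern = grid_to_pattern(zero)
prompt = f"{pattern} <SEP>"
print(prompt)  # same prompt as in the example above
```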

Compression Details

Sparsity

  • Type: Unstructured (weights pruned individually based on importance)
  • Target Sparsity: 50%
  • Method: SparseGPT with Hessian-based importance scoring
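Unstructured sparsity at a given target can be verified directly on a loaded checkpoint by counting exact zeros in the weight matrices. A minimal sketch (assuming a PyTorch model; the function name is ours, not from the released code):

```python
import torch

def weight_sparsity(model):
    """Fraction of exactly-zero entries across weight matrices (dim >= 2),
    skipping biases and norm parameters."""
    zeros, total = 0, 0
    for _, p in model.named_parameters():
        if p.dim() >= 2:
            zeros += (p == 0).sum().item()
            total += p.numel()
    return zeros / total

# For this ~50% unstructured-sparse checkpoint, this should report ~0.5:
# print(f"sparsity: {weight_sparsity(model):.2%}")
```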

Quantization

  • Type: INT8 weights
  • Method: Sparse Quantization Aware Training (QAT), applied on top of the ~50% pruned model
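The card names Sparse QAT but does not detail the scheme. As a rough illustration only (not the training code), symmetric per-tensor INT8 fake quantization — the forward-pass operation QAT typically simulates — maps each weight to its nearest representable INT8 level and back to float:

```python
def fake_quant_int8(x, scale):
    """Symmetric per-tensor INT8 fake quantization: round x/scale to an
    integer clamped to [-127, 127], then dequantize back to float. During
    QAT the forward pass sees these rounded values; the scale here is an
    illustrative per-tensor value (e.g. max|w| / 127)."""
    q = max(-127, min(127, round(x / scale)))
    return q * scale

scale = 0.02  # illustrative scale, not taken from the checkpoint
print(fake_quant_int8(0.5113, scale))  # ~0.52, the nearest INT8 level
print(fake_quant_int8(10.0, scale))    # ~2.54, clamped at q = 127
```

Training with this simulated rounding in the loop lets the remaining non-zero weights adapt to INT8 precision, which is why accuracy holds up after compression.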

License

Apache-2.0

Citation

@misc{atomllama-33k-digitMesh-sparse-q8,
  title={AtomLlama-33K-5x5-DigitMesh-Sparse-Q8: A 50% Sparse INT8 Quantized Model for Digit Recognition},
  author={Jun Zhu},
  year={2026},
  howpublished={\url{https://huggingface.co/junzzhu/atomllama-33K-5x5-DigitMesh-sparse-q8}}
}