SOTA-Blitz-997

Near-SOTA Precision | 7-Minute T4 Training | Safetensors Native


Model Overview

SOTA-Blitz-997 is a high-velocity Vision Transformer (ViT) architecture optimized for the MNIST handwritten digit classification task. While most "State-of-the-Art" models rely on massive ensembles and hours of GPU compute, SOTA-Blitz-997 was engineered to achieve elite accuracy within a single 7-minute training window on a standard NVIDIA T4 by leveraging the global attention mechanisms of the Transformer block.

Performance & Proof

The model achieves a verified 99.72% Test Accuracy, leaving only 28 errors out of 10,000 images. This performance exceeds the human baseline (~97.5%) and demonstrates that ViT architectures can effectively "solve" classic computer vision benchmarks with extreme efficiency.

Training Logs (Verified Convergence)

| Epoch | Loss   | Train Acc | Test Acc | Best Acc |
|-------|--------|-----------|----------|----------|
| 05/30 | 0.6235 | 95.068%   | 98.440%  | 98.590%  |
| 10/30 | 0.5923 | 96.287%   | 98.840%  | 99.030%  |
| 15/30 | 0.5683 | 97.107%   | 99.220%  | 99.230%  |
| 20/30 | 0.5485 | 97.927%   | 99.460%  | 99.550%  |
| 25/30 | 0.5345 | 98.460%   | 99.660%  | 99.660%  |
| 30/30 | 0.5296 | 98.700%   | 99.720%  | 99.720%  |

Final Performance: 28 Errors / 10,000 Digits (TTA Enabled).
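The card does not specify which test-time augmentation (TTA) scheme was used, but a common choice for MNIST is to average softmax probabilities over small pixel shifts of each image. The sketch below illustrates that idea with a hypothetical stand-in classifier; substitute the trained ViT in practice.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in classifier (a single linear layer); in practice,
# replace this with the trained SOTA-Blitz-997 ViT.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))
model.eval()

def predict_tta(model, images):
    """Average softmax probabilities over small pixel shifts (one common TTA scheme)."""
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    probs = torch.zeros(images.size(0), 10)
    with torch.no_grad():
        for dy, dx in shifts:
            # Shift the batch by (dy, dx) pixels along the spatial dims.
            shifted = torch.roll(images, shifts=(dy, dx), dims=(2, 3))
            probs += torch.softmax(model(shifted), dim=1)
    return probs / len(shifts)

batch = torch.randn(4, 1, 32, 32)  # dummy batch at the model's 32x32 input size
p = predict_tta(model, batch)
```

Averaging probabilities (rather than taking a majority vote over hard labels) tends to be more stable when the per-view predictions are close calls.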

Technical Specifications

  • Architecture: Optimized Vision Transformer (ViT) with patch embedding and multi-head self-attention.
  • Training Hardware: NVIDIA T4 GPU (Kaggle).
  • Training Time: ~7 Minutes.
  • Format: .safetensors (Zero-copy loading, no-pickle security).
  • License: Apache 2.0.
  • Architecture Note: Based on a timm ViT-Small backbone with a custom 1-channel patch embedding layer and 32x32 input resolution.

Usage

```python
import torch
from safetensors.torch import load_file

# Load the SOTA weights
model_weights = load_file("SOTA-Blitz-997.safetensors")

# Apply to your ViT architecture (the model must match the checkpoint's layout)
# model.load_state_dict(model_weights)
```

Made By

Andy-ML-And-AI
