---
language:
  - en
license: apache-2.0
library_name: timm
tags:
  - vision
  - image-classification
  - vit
  - mnist
  - computer-vision
datasets:
  - mnist
metrics:
  - accuracy
model-index:
  - name: SOTA-Blitz-997
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: MNIST
          type: mnist
        metrics:
          - type: accuracy
            value: 99.72
            name: Test Accuracy
---

# SOTA-Blitz-997

**Near-SOTA Precision | 7-Minute T4 Training | Safetensors Native**


## Model Overview

SOTA-Blitz-997 is a fast-training Vision Transformer (ViT) optimized for the MNIST handwritten digit classification task. While many "state-of-the-art" results rely on large ensembles and hours of GPU compute, SOTA-Blitz-997 was engineered to reach elite accuracy within a single 7-minute training run on a standard NVIDIA T4 by leveraging the global self-attention of the Transformer block.

## Performance & Proof

The model achieves a verified **99.72% test accuracy**, i.e. only 28 errors across the 10,000-image test set. This exceeds the commonly cited human baseline (~97.5%) and demonstrates that ViT architectures can effectively "solve" classic computer vision benchmarks with very little compute.
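As a quick sanity check, the reported accuracy and error count are mutually consistent:

```python
# 99.72% accuracy on the 10,000-image MNIST test set implies 28 misclassifications.
test_images = 10_000
accuracy_pct = 99.72
errors = round(test_images * (1 - accuracy_pct / 100))
print(errors)  # 28
```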

### Training Logs (Verified Convergence)

| Epoch | Loss   | Train Acc | Test Acc | Best Acc |
|-------|--------|-----------|----------|----------|
| 05/30 | 0.6235 | 95.068%   | 98.440%  | 98.590%  |
| 10/30 | 0.5923 | 96.287%   | 98.840%  | 99.030%  |
| 15/30 | 0.5683 | 97.107%   | 99.220%  | 99.230%  |
| 20/30 | 0.5485 | 97.927%   | 99.460%  | 99.550%  |
| 25/30 | 0.5345 | 98.460%   | 99.660%  | 99.660%  |
| 30/30 | 0.5296 | 98.700%   | 99.720%  | 99.720%  |

**Final Performance:** 28 errors / 10,000 digits (with test-time augmentation, TTA).
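TTA here refers to test-time augmentation: averaging predictions over several augmented views of each test image. The exact augmentations used for this model are not documented, so the 1-pixel shifts below are purely illustrative; the sketch assumes a generic PyTorch classifier:

```python
import torch

def predict_with_tta(model, images):
    """Average softmax outputs over slightly shifted copies of each image.

    The augmentations actually used by SOTA-Blitz-997 are not documented;
    1-pixel translations are a common, illustrative choice for MNIST.
    """
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    probs = torch.zeros(images.size(0), 10)
    with torch.no_grad():
        for dy, dx in shifts:
            view = torch.roll(images, shifts=(dy, dx), dims=(2, 3))
            probs += torch.softmax(model(view), dim=1)
    return probs / len(shifts)

# Example with a stand-in model (replace with the real ViT)
dummy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 10))
preds = predict_with_tta(dummy_model, torch.randn(4, 1, 32, 32))
print(preds.shape)  # torch.Size([4, 10])
```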

## Technical Specifications

- **Architecture:** Optimized Vision Transformer (ViT) with patch embedding and multi-head self-attention.
- **Training Hardware:** NVIDIA T4 GPU (Kaggle).
- **Training Time:** ~7 minutes.
- **Format:** `.safetensors` (zero-copy loading, no pickle-based security risks).
- **License:** Apache 2.0.
- **Architecture Note:** Based on a timm ViT-Small backbone with a custom 1-channel patch embedding layer and 32x32 input resolution.

## Usage

```python
import torch
from safetensors.torch import load_file

# Load the trained weights (zero-copy, no pickle)
model_weights = load_file("SOTA-Blitz-997.safetensors")

# Apply them to a matching ViT architecture
# model.load_state_dict(model_weights)
```

## Made By

Andy-ML-And-AI