---
language:
  - en
license: apache-2.0
library_name: timm
tags:
  - vision
  - image-classification
  - vit
  - mnist
  - computer-vision
datasets:
  - mnist
metrics:
  - accuracy
model-index:
  - name: SOTA-Blitz-997
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: MNIST
          type: mnist
        metrics:
          - type: accuracy
            value: 99.72
            name: Test Accuracy
---

# SOTA-Blitz-997

**Near-SOTA Precision | 7-Minute T4 Training | Safetensors Native**


## Model Overview

SOTA-Blitz-997 is a fast-training Vision Transformer (ViT) optimized for the MNIST handwritten digit classification task. While many "state-of-the-art" results rely on large ensembles and hours of GPU compute, SOTA-Blitz-997 was engineered to reach elite accuracy within a single 7-minute training run on a standard NVIDIA T4 by leveraging the global self-attention of the Transformer block.

## Performance & Proof

The model achieves a verified **99.72% test accuracy**, i.e. only 28 errors across the 10,000-image test set. This exceeds the commonly cited human baseline (~97.5%) and demonstrates that ViT architectures can effectively "solve" classic computer vision benchmarks with very little compute.
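As a quick sanity check, the reported accuracy and error count are mutually consistent:

```python
# 99.72% accuracy on the 10,000-image MNIST test set implies 28 misclassifications.
test_images = 10_000
accuracy_pct = 99.72
errors = round(test_images * (1 - accuracy_pct / 100))
print(errors)  # 28
```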

### Training Logs (Verified Convergence)

| Epoch | Loss   | Train Acc | Test Acc | Best Acc |
|-------|--------|-----------|----------|----------|
| 05/30 | 0.6235 | 95.068%   | 98.440%  | 98.590%  |
| 10/30 | 0.5923 | 96.287%   | 98.840%  | 99.030%  |
| 15/30 | 0.5683 | 97.107%   | 99.220%  | 99.230%  |
| 20/30 | 0.5485 | 97.927%   | 99.460%  | 99.550%  |
| 25/30 | 0.5345 | 98.460%   | 99.660%  | 99.660%  |
| 30/30 | 0.5296 | 98.700%   | 99.720%  | 99.720%  |

**Final Performance:** 28 errors / 10,000 digits (with test-time augmentation, TTA).
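TTA here refers to test-time augmentation: averaging predictions over several augmented views of each test image. The exact augmentations used for this model are not documented, so the 1-pixel shifts below are purely illustrative; the sketch assumes a generic PyTorch classifier:

```python
import torch

def predict_with_tta(model, images):
    """Average softmax outputs over slightly shifted copies of each image.

    The augmentations actually used by SOTA-Blitz-997 are not documented;
    1-pixel translations are a common, illustrative choice for MNIST.
    """
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    probs = torch.zeros(images.size(0), 10)
    with torch.no_grad():
        for dy, dx in shifts:
            view = torch.roll(images, shifts=(dy, dx), dims=(2, 3))
            probs += torch.softmax(model(view), dim=1)
    return probs / len(shifts)

# Example with a stand-in model (replace with the real ViT)
dummy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 10))
preds = predict_with_tta(dummy_model, torch.randn(4, 1, 32, 32))
print(preds.shape)  # torch.Size([4, 10])
```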

## Technical Specifications

- **Architecture:** Optimized Vision Transformer (ViT) with patch embedding and multi-head self-attention.
- **Training Hardware:** NVIDIA T4 GPU (Kaggle).
- **Training Time:** ~7 minutes.
- **Format:** `.safetensors` (zero-copy loading, no pickle-based security risks).
- **License:** Apache 2.0.
- **Architecture Note:** Based on a timm ViT-Small backbone with a custom 1-channel patch embedding layer and 32x32 input resolution.

## Usage

```python
import torch
from safetensors.torch import load_file

# Load the trained weights (zero-copy, no pickle)
model_weights = load_file("SOTA-Blitz-997.safetensors")

# Apply them to a matching ViT architecture
# model.load_state_dict(model_weights)
```

## Made By

Andy-ML-And-AI