tags:
  - image-classification
  - cantor-fusion
  - geometric-deep-learning
  - safetensors
  - vision-transformer
  - warm-restarts
library_name: pytorch
datasets:
  - cifar10
  - cifar100
metrics:
  - accuracy

vit-beans-v3

Geometric Deep Learning with Cantor Multihead Fusion + AdamW Warm Restarts

This repository contains multiple training runs of the Cantor fusion architecture, which uses pentachoron structures and geometric routing. Training uses CosineAnnealingWarmRestarts for automatic exploration cycles.

Training Strategy: AdamW + Warm Restarts

This model uses AdamW with Cosine Annealing Warm Restarts (SGDR):

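Drop phase: LR decays from 0.0001 to 1e-07 over the first 40 epochs
Restart phase: LR jumps back up to its peak value to explore new regions
Cycle multiplier: Each cycle is 1.5x longer than the previous one (40, 60, 90, ... epochs)
Intended benefits: high LR after each restart drives exploration, low LR at the end of each cycle drives exploitation; the aim is to reach better minima and make training more robust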
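The schedule above can be sketched in a few lines of plain Python. This is an illustration of cosine annealing with warm restarts under the stated settings, not the training code from this repository; the function name `warm_restart_lr` and its parameter names are hypothetical. PyTorch's stock `CosineAnnealingWarmRestarts` expects an integer `T_mult`, so the sketch computes the 1.5x cycle growth by hand.

```python
import math

def warm_restart_lr(epoch, base_lr=1e-4, min_lr=1e-7, first_cycle=40, cycle_mult=1.5):
    """LR at a (possibly fractional) epoch under cosine annealing with warm restarts.

    Hypothetical helper: defaults mirror the values quoted in this README.
    """
    start, length = 0.0, float(first_cycle)
    # Walk forward cycle by cycle until `epoch` falls inside the current one.
    while epoch >= start + length:
        start += length
        length *= cycle_mult  # each cycle is cycle_mult times longer than the last
    progress = (epoch - start) / length  # 0 right after a restart, approaches 1 at cycle end
    # Cosine curve from base_lr (progress = 0) down toward min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for e in (0, 20, 39, 40, 70, 100):
        print(f"epoch {e:>3}: lr = {warm_restart_lr(e):.3e}")
```

This version always restarts at `base_lr`; the boosted restart described in the next section changes only the peak value of each cycle.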

🚀 LR Boost at Restarts (NEW!)

This run uses restart_lr_mult = 1.25x:

Normal restart: 1e-4 → 1e-7 → restart at 1e-4
Boosted restart: 1e-4 → 1e-7 → restart at 1.25e-4 (1.25x)
Creates wider exploration curves to escape solidified local minima
Each restart provides progressively stronger exploration boost

Restart Schedule

Epochs 0-40:   LR: 0.0001 → 1e-07 (first cycle)
Epoch 40:      LR: RESTART to 0.000125 🔄
Epochs 40-100: LR: 0.000125 → 1e-07 (longer cycle)
...
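The rest of the schedule can be tabulated with a short sketch. It assumes a base LR of 1e-4, a first cycle of 40 epochs, a 1.5x cycle multiplier, and `restart_lr_mult = 1.25`. Whether the boost compounds at every restart is not stated explicitly here, so the hypothetical `compound` flag covers both readings; the two agree on the first restart (0.000125). The function name `restart_schedule` is also hypothetical.

```python
def restart_schedule(n_cycles, base_lr=1e-4, first_cycle=40, cycle_mult=1.5,
                     restart_lr_mult=1.25, compound=True):
    """Return (start_epoch, end_epoch, peak_lr) for the first n_cycles cycles."""
    rows = []
    start, length, peak = 0.0, float(first_cycle), base_lr
    for _ in range(n_cycles):
        rows.append((start, start + length, peak))
        start += length
        length *= cycle_mult
        # Assumption: compound=True stacks the boost at every restart;
        # compound=False makes every restart peak at base_lr * restart_lr_mult.
        peak = peak * restart_lr_mult if compound else base_lr * restart_lr_mult
    return rows

for start, end, peak in restart_schedule(4):
    print(f"epochs {start:g}-{end:g}: peak LR {peak:.4g}")
```

With the default settings the cycle boundaries fall at epochs 0, 40, 100, 190, and 325, since the cycle lengths are 40, 60, 90, and 135 epochs.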

Current Run

Latest: cifar100_consciousness_ADAMW_WarmRestart_boost1.25x_20251122_025019

Dataset: CIFAR100
Fusion Mode: consciousness
Optimizer: AdamW (adaptive moments)
Scheduler: CosineAnnealingWarmRestarts
Restart LR Mult: 1.25x
Architecture: 4 blocks, 4 heads
Simplex: 4-simplex (5 vertices)

Architecture

The Cantor Fusion architecture uses:

Geometric Routing: Pentachoron (4-simplex, 5 vertices) structures for token routing
Cantor Multihead Fusion: Multiple fusion heads with geometric attention
Beatrix Consciousness Routing: Optional consciousness-aware token fusion
SafeTensors Format: All model weights use SafeTensors (not pickle)
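For readers unfamiliar with the term, a pentachoron is the 4-dimensional simplex: 5 vertices, every pair joined by an edge, 10 edges in total. The sketch below builds a regular one from the standard basis vectors of R^5 and checks that all edges have the same length. It is a geometric illustration only and says nothing about how the routing in this repository is implemented; both function names are hypothetical.

```python
import itertools
import math

def regular_simplex(n_vertices=5):
    """Vertices of a regular simplex: the standard basis vectors of R^n_vertices.

    Five vertices span a 4-dimensional simplex (a pentachoron).
    """
    return [[1.0 if i == j else 0.0 for j in range(n_vertices)]
            for i in range(n_vertices)]

def edge_lengths(vertices):
    """Euclidean length of every edge, one per unordered pair of vertices."""
    return [math.dist(a, b) for a, b in itertools.combinations(vertices, 2)]

vertices = regular_simplex(5)
edges = edge_lengths(vertices)
print(len(vertices), "vertices,", len(edges), "edges")
print("all edges equal:", all(math.isclose(e, edges[0]) for e in edges))
```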

Usage

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

model_path = hf_hub_download(
    repo_id="AbstractPhil/vit-beans-v3",
    filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
)

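# load_file returns a plain dict mapping parameter names to tensors.
state_dict = load_file(model_path)

# `model` is not defined in this snippet: construct the matching Cantor fusion
# architecture first, then load the weights into it.
model.load_state_dict(state_dict)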
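To find a value for YOUR_RUN_NAME, one option is to list the repository's files and keep the paths that match the checkpoint layout shown above. The sketch below assumes every run stores its best weights at `runs/<run>/checkpoints/best_model.safetensors`, as in the usage snippet. The helper names are hypothetical; `list_repo_files` is the `huggingface_hub` function and needs network access.

```python
def best_checkpoints(repo_files):
    """Filter a list of repo file paths down to best-model checkpoints."""
    suffix = "/checkpoints/best_model.safetensors"
    return sorted(
        path for path in repo_files
        if path.startswith("runs/") and path.endswith(suffix)
    )

def list_best_checkpoints(repo_id="AbstractPhil/vit-beans-v3"):
    """Query the Hub (network access required) and return matching checkpoint paths."""
    from huggingface_hub import list_repo_files  # imported lazily; only needed here
    return best_checkpoints(list_repo_files(repo_id))

# Example (requires network access):
#   for path in list_best_checkpoints():
#       print(path)
```

Any returned path can be passed directly as the `filename` argument of `hf_hub_download`.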

Citation

@misc{vit_beans_v3,
  author = {AbstractPhil},
  title = {vit-beans-v3: Geometric Deep Learning with Warm Restarts},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
}

Repository maintained by: @AbstractPhil

Latest update: 2025-11-22 02:50:22