AbstractPhil/vit-beans-v3

@@ -23,7 +23,7 @@ This repository contains multiple training runs using Cantor fusion architecture
 ```
 vit-beans-v3/
 ├── runs/
-│   ├── cifar10_weighted_TIMESTAMP/
 │   │   ├── checkpoints/
 │   │   │   ├── best_model.safetensors
 │   │   │   ├── best_training_state.pt
@@ -31,17 +31,17 @@ vit-beans-v3/
 │   │   ├── tensorboard/
 │   │   ├── config.yaml
 │   │   └── README.md
-│   ├── cifar100_consciousness_TIMESTAMP/
-│   │   └── ...
 │   └── ...
 └── README.md (this file)
 ```
 ## Current Run
-**Latest**: `cifar100_weighted_20251119_170816`
 - **Dataset**: CIFAR100
 - **Fusion Mode**: weighted
 - **Architecture**: 6 blocks, 8 heads
 - **Simplex**: 4-simplex (5 vertices)
@@ -53,6 +53,14 @@ The Cantor Fusion architecture uses:
 - **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion using the Devil's Staircase
 - **SafeTensors Format**: All model weights use SafeTensors (not pickle) for security
 ## Usage
 ### Download a Model
@@ -98,8 +106,11 @@ Each run directory contains:
 ## Training Details
 All models trained with:
-- Optimizer: AdamW
 - Mixed Precision: Available on A100
 - Augmentation: AutoAugment (CIFAR10 policy)
 - Format: SafeTensors (ClamAV safe)
@@ -110,4 +121,4 @@ Built with geometric consciousness-aware routing using the Devil's Staircase (Be
 **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)
-**Latest update**: 2025-11-19 17:08:18

 ```
 vit-beans-v3/
 ├── runs/
+│   ├── cifar10_weighted_SGD_TIMESTAMP/
 │   │   ├── checkpoints/
 │   │   │   ├── best_model.safetensors
 │   │   │   ├── best_training_state.pt
 │   │   ├── tensorboard/
 │   │   ├── config.yaml
 │   │   └── README.md
 │   └── ...
 └── README.md (this file)
 ```
 ## Current Run
+**Latest**: `cifar100_weighted_SGD_20251119_173038`
 - **Dataset**: CIFAR100
 - **Fusion Mode**: weighted
+- **Optimizer**: SGD (momentum=0.9)
+- **Scheduler**: MultiStepLR [40, 60, 80]
 - **Architecture**: 6 blocks, 8 heads
 - **Simplex**: 4-simplex (5 vertices)
 - **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion using the Devil's Staircase
 - **SafeTensors Format**: All model weights use SafeTensors (not pickle) for security
+## Training Strategy
+This model uses the **SGD + milestone LR drops** strategy popularized by WideResNet training on CIFAR:
+- Initial LR: 0.1
+- Milestones: [40, 60, 80]
+- Decay factor: 0.2 (LR *= 0.2 at each milestone)
+- Each LR drop typically produces a sharp step up in accuracy, the characteristic jumps seen in step-schedule training curves
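
The schedule above can be sketched in a few lines of plain Python. This is an illustration only: the helper name `lr_at_epoch` is hypothetical, epochs are assumed to be 0-indexed, and the actual training code presumably relies on PyTorch's `MultiStepLR` (named in the run summary) rather than this function.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr=0.1, milestones=(40, 60, 80), gamma=0.2):
    """Learning rate in effect during `epoch` (0-indexed) under a step schedule.

    Hypothetical helper; the defaults mirror the values listed in this README.
    """
    # Count the milestones already reached; each one multiplies the LR by gamma.
    drops = bisect_right(milestones, epoch)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    for epoch in (0, 39, 40, 60, 80):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.4f}")
```

With the listed values, the learning rate steps from 0.1 to 0.02 at epoch 40, to 0.004 at epoch 60, and to 0.0008 at epoch 80.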
 ## Usage
 ### Download a Model
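
A minimal sketch of fetching one run's weights, assuming the repository ID is `AbstractPhil/vit-beans-v3` and that checkpoints follow the `runs/<run>/checkpoints/` layout from the tree above. The helper names `checkpoint_path` and `download_weights` are hypothetical, and the `huggingface_hub` and `safetensors` packages are assumed to be installed.

```python
def checkpoint_path(run_name, filename="best_model.safetensors"):
    """Path of a checkpoint file inside the repo, following the directory tree."""
    return f"runs/{run_name}/checkpoints/{filename}"


def download_weights(run_name, repo_id="AbstractPhil/vit-beans-v3"):
    """Download a run's best weights and return them as a state dict.

    The repo_id default is an assumption based on the maintainer link.
    """
    # Imported lazily so the path helper works without these packages installed.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    local_file = hf_hub_download(repo_id=repo_id, filename=checkpoint_path(run_name))
    return load_file(local_file)  # dict mapping parameter names to tensors
```

For example, `download_weights("cifar100_weighted_SGD_20251119_173038")` would request the checkpoint of the run listed under Current Run; the returned dict still has to be loaded into a matching model definition.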
 ## Training Details
+Optimizer options:
+- **SGD**: High momentum (0.9), Nesterov, milestone-based LR drops
+- **AdamW**: Weight decay, cosine annealing with warmup
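
For comparison with the milestone schedule, the AdamW option's cosine annealing with warmup can be sketched as below. The README does not state the base LR, warmup length, or total epoch count for this mode, so every default here is a placeholder, and `cosine_warmup_lr` is a hypothetical name rather than a function from the training code.

```python
import math


def cosine_warmup_lr(epoch, base_lr=1e-3, warmup_epochs=5,
                     total_epochs=100, min_lr=0.0):
    """LR for `epoch` (0-indexed): linear warmup, then cosine decay to min_lr.

    All default values are illustrative assumptions, not the repo's settings.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Unlike the step schedule, this curve decays smoothly, so accuracy tends to improve gradually rather than in distinct jumps.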
 All models trained with:
 - Mixed Precision: Available on A100
 - Augmentation: AutoAugment (CIFAR10 policy)
 - Format: SafeTensors (ClamAV safe)
 **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)
+**Latest update**: 2025-11-19 17:30:40