Upload README.md with huggingface_hub
README.md
CHANGED
@@ -5,6 +5,7 @@ tags:
 - geometric-deep-learning
 - safetensors
 - vision-transformer
 library_name: pytorch
 datasets:
 - cifar10
@@ -15,110 +16,71 @@ metrics:
 
 # vit-beans-v3
 
-**Geometric Deep Learning with Cantor Multihead Fusion**
 
-This repository contains multiple training runs using Cantor fusion architecture with pentachoron structures
 
-##
 ```
-
-
-
-
-│   │   │   ├── best_model.safetensors
-│   │   │   ├── best_training_state.pt
-│   │   │   └── best_metadata.json
-│   │   ├── tensorboard/
-│   │   ├── config.yaml
-│   │   └── README.md
-│   └── ...
-└── README.md (this file)
 ```
 
 ## Current Run
 
-**Latest**: `
 - **Dataset**: CIFAR100
-- **Fusion Mode**:
-- **Optimizer**:
-- **Scheduler**:
-- **Architecture**: 6 blocks,
-- **Simplex**:
 
 ## Architecture
 
 The Cantor Fusion architecture uses:
 - **Geometric Routing**: Pentachoron (5-simplex) structures for token routing
 - **Cantor Multihead Fusion**: Multiple fusion heads with geometric attention
-- **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion
-- **SafeTensors Format**: All model weights use SafeTensors (not pickle)
-
-## Training Strategy
-
-This model uses the proven **SGD + milestone LR drops** strategy from WideResNet:
-- Initial LR: 0.01
-- Milestones: [60, 80]
-- Decay factor: 0.2 (LR *= 0.2 at each milestone)
-- This causes the dramatic accuracy jumps seen in deep networks!
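For context, the milestone schedule described in this (older) strategy behaves like PyTorch's `MultiStepLR`: the learning rate is multiplied by the decay factor each time a milestone epoch is passed. A minimal plain-Python sketch (the `milestone_lr` helper is illustrative, not part of this repository):

```python
def milestone_lr(epoch, base_lr=0.01, milestones=(60, 80), gamma=0.2):
    """LR after milestone drops: base_lr * gamma ** (milestones already passed)."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# LR holds at 0.01 until epoch 60, drops to 0.002, then to 0.0004 at epoch 80;
# each drop is the sudden accuracy jump the README refers to.
```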
 
 ## Usage
-
-### Download a Model
 ```python
 from huggingface_hub import hf_hub_download
 from safetensors.torch import load_file
-import torch
 
-# Download model weights
 model_path = hf_hub_download(
     repo_id="AbstractPhil/vit-beans-v3",
     filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
 )
 
-# Load weights (SafeTensors - no pickle!)
 state_dict = load_file(model_path)
 model.load_state_dict(state_dict)
 ```
 
-### Browse Runs
-
-Each run directory contains:
-- `checkpoints/` - Model weights (safetensors), training state, metadata
-- `tensorboard/` - TensorBoard logs for visualization
-- `config.yaml` - Complete training configuration
-- `README.md` - Run-specific details and results
-
-## Model Variants
-
-- **Weighted Fusion**: Standard geometric fusion with learned weights
-- **Consciousness Fusion**: Uses Beatrix routing with consciousness emergence
-
 ## Citation
 ```bibtex
 @misc{vit_beans_v3,
   author = {AbstractPhil},
-  title = {vit-beans-v3: Geometric Deep Learning with
   year = {2025},
   publisher = {HuggingFace},
   url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
 }
 ```
 
-## Training Details
-
-Optimizer options:
-- **SGD**: High momentum (0.9), Nesterov, milestone-based LR drops
-- **AdamW**: Weight decay, cosine annealing with warmup
-
-All models trained with:
-- Mixed Precision: Available on A100
-- Augmentation: AutoAugment (CIFAR10 policy)
-- Format: SafeTensors (ClamAV safe)
-
-Built with geometric consciousness-aware routing using the Devil's Staircase (Beatrix) and pentachoron parameterization.
-
 ---
 
 **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)
 
-**Latest update**: 2025-11-19
 - geometric-deep-learning
 - safetensors
 - vision-transformer
+- warm-restarts
 library_name: pytorch
 datasets:
 - cifar10
 
 # vit-beans-v3
 
+**Geometric Deep Learning with Cantor Multihead Fusion + AdamW Warm Restarts**
 
+This repository contains multiple training runs using Cantor fusion architecture with pentachoron structures, geometric routing, and **CosineAnnealingWarmRestarts** for automatic exploration cycles.
 
+## Training Strategy: AdamW + Warm Restarts
+
+This model uses **AdamW with Cosine Annealing Warm Restarts** (SGDR):
+- **Drop phase**: LR decays from 0.0003 → 1e-07 over 20 epochs
+- **Restart phase**: LR jumps back to 0.0003 to explore new regions
+- **Cycle multiplier**: Each cycle is 2x longer than the previous one
+- **Benefits**: Automatic exploration + exploitation, finds better minima, robust training
+
+### Restart Schedule
 ```
+Epochs 0-20:  LR: 0.0003 → 1e-07 (first cycle)
+Epoch 20:     LR: RESTART to 0.0003
+Epochs 20-60: LR: 0.0003 → 1e-07 (longer cycle)
+...
 ```
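The restart arithmetic shown above can be sketched in plain Python, mirroring the formula behind PyTorch's `CosineAnnealingWarmRestarts` (the `sgdr_lr` helper and its argument names are illustrative, with defaults taken from the values in this README):

```python
import math

def sgdr_lr(epoch, eta_max=3e-4, eta_min=1e-7, t0=20, t_mult=2):
    """Cosine-annealed LR with warm restarts: cycle lengths t0, t0*2, t0*4, ..."""
    cycle_len, cycle_start = t0, 0
    while epoch >= cycle_start + cycle_len:   # find the cycle containing `epoch`
        cycle_start += cycle_len
        cycle_len *= t_mult
    t_cur = epoch - cycle_start               # position within the current cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / cycle_len))

# Epoch 0 and epoch 20 both sit at a cycle start, so the LR "restarts" to eta_max;
# within each cycle the LR decays smoothly toward eta_min.
```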
|
| 38 |
|
| 39 |
## Current Run
|
| 40 |
|
| 41 |
+
**Latest**: `cifar100_weighted_ADAMW_WarmRestart_20251119_200210`
|
| 42 |
- **Dataset**: CIFAR100
|
| 43 |
+
- **Fusion Mode**: weighted
|
| 44 |
+
- **Optimizer**: AdamW (adaptive moments)
|
| 45 |
+
- **Scheduler**: CosineAnnealingWarmRestarts
|
| 46 |
+
- **Architecture**: 6 blocks, 8 heads
|
| 47 |
+
- **Simplex**: 4-simplex (5 vertices)
|
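As a quick sanity check on the simplex terminology: a k-simplex has k+1 vertices and C(k+1, 2) edges, so the 4-simplex (pentachoron) named here has 5 vertices and 10 edges. A small illustrative helper (not part of the repository) confirms the counts:

```python
from math import comb

def simplex_counts(k):
    """A k-simplex has k+1 vertices and C(k+1, 2) edges."""
    return k + 1, comb(k + 1, 2)

vertices, edges = simplex_counts(4)   # pentachoron: 5 vertices, 10 edges
```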
 
 ## Architecture
 
 The Cantor Fusion architecture uses:
 - **Geometric Routing**: Pentachoron (5-simplex) structures for token routing
 - **Cantor Multihead Fusion**: Multiple fusion heads with geometric attention
+- **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion
+- **SafeTensors Format**: All model weights use SafeTensors (not pickle)
 
 ## Usage
 ```python
 from huggingface_hub import hf_hub_download
 from safetensors.torch import load_file
 
 model_path = hf_hub_download(
     repo_id="AbstractPhil/vit-beans-v3",
     filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
 )
 
 state_dict = load_file(model_path)
 model.load_state_dict(state_dict)
 ```
 
 ## Citation
 ```bibtex
 @misc{vit_beans_v3,
   author = {AbstractPhil},
+  title = {vit-beans-v3: Geometric Deep Learning with Warm Restarts},
   year = {2025},
   publisher = {HuggingFace},
   url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
 }
 ```
 
 ---
 
 **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)
 
+**Latest update**: 2025-11-19 20:02:13