AbstractPhil committed on
Commit a95f922 · verified · 1 Parent(s): 0ec4d84

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +17 -6
README.md CHANGED
@@ -23,7 +23,7 @@ This repository contains multiple training runs using Cantor fusion architecture
  ```
  vit-beans-v3/
  ├── runs/
- │   ├── cifar10_weighted_TIMESTAMP/
  │   │   ├── checkpoints/
  │   │   │   ├── best_model.safetensors
  │   │   │   ├── best_training_state.pt
@@ -31,17 +31,17 @@ vit-beans-v3/
  │   │   ├── tensorboard/
  │   │   ├── config.yaml
  │   │   └── README.md
- │   ├── cifar100_consciousness_TIMESTAMP/
- │   │   └── ...
  │   └── ...
  └── README.md (this file)
  ```
 
  ## Current Run

- **Latest**: `cifar100_weighted_20251119_170816`
  - **Dataset**: CIFAR100
  - **Fusion Mode**: weighted
  - **Architecture**: 6 blocks, 8 heads
  - **Simplex**: 4-simplex (5 vertices)
 
@@ -53,6 +53,14 @@ The Cantor Fusion architecture uses:
  - **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion using the Devil's Staircase
  - **SafeTensors Format**: All model weights use SafeTensors (not pickle) for security
 
  ## Usage

  ### Download a Model
@@ -98,8 +106,11 @@ Each run directory contains:
 
  ## Training Details

  All models trained with:
- - Optimizer: AdamW
  - Mixed Precision: Available on A100
  - Augmentation: AutoAugment (CIFAR10 policy)
  - Format: SafeTensors (ClamAV safe)
@@ -110,4 +121,4 @@ Built with geometric consciousness-aware routing using the Devil's Staircase (Be
 
  **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)

- **Latest update**: 2025-11-19 17:08:18
 
  ```
  vit-beans-v3/
  ├── runs/
+ │   ├── cifar10_weighted_SGD_TIMESTAMP/
  │   │   ├── checkpoints/
  │   │   │   ├── best_model.safetensors
  │   │   │   ├── best_training_state.pt
 
  │   │   ├── tensorboard/
  │   │   ├── config.yaml
  │   │   └── README.md
  │   └── ...
  └── README.md (this file)
  ```
 
  ## Current Run

+ **Latest**: `cifar100_weighted_SGD_20251119_173038`
  - **Dataset**: CIFAR100
  - **Fusion Mode**: weighted
+ - **Optimizer**: SGD (momentum=0.9)
+ - **Scheduler**: MultiStepLR [40, 60, 80]
  - **Architecture**: 6 blocks, 8 heads
  - **Simplex**: 4-simplex (5 vertices)
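The run names above appear to encode their configuration as `{dataset}_{fusion}[_{optimizer}]_{YYYYMMDD}_{HHMMSS}`. The directory tree places each run's weights at `runs/<run>/checkpoints/best_model.safetensors`.

The helper below is a hypothetical sketch built on that assumption. The convention is inferred only from `cifar100_weighted_20251119_170816` and `cifar100_weighted_SGD_20251119_173038`. `RunInfo`, `parse_run_name` and `checkpoint_path` are illustrative names and are not part of this repository.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class RunInfo:
    """Fields recovered from a run directory name (hypothetical helper)."""
    dataset: str
    fusion_mode: str
    optimizer: Optional[str]  # None for older runs that omit the optimizer tag
    timestamp: datetime


def parse_run_name(name: str) -> RunInfo:
    """Split e.g. 'cifar100_weighted_SGD_20251119_173038' into its parts.

    Assumes the pattern {dataset}_{fusion}[_{optimizer}]_{date}_{time}.
    """
    parts = name.split("_")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected run name: {name!r}")
    # The last two tokens are the date (YYYYMMDD) and time (HHMMSS).
    timestamp = datetime.strptime("_".join(parts[-2:]), "%Y%m%d_%H%M%S")
    optimizer = parts[2] if len(parts) == 5 else None
    return RunInfo(parts[0], parts[1], optimizer, timestamp)


def checkpoint_path(name: str) -> PurePosixPath:
    """Repo-relative path of a run's best weights, per the directory tree."""
    return PurePosixPath("runs") / name / "checkpoints" / "best_model.safetensors"


if __name__ == "__main__":
    run = "cifar100_weighted_SGD_20251119_173038"
    print(parse_run_name(run))
    print(checkpoint_path(run))
```

A dataset or fusion-mode name that itself contains an underscore would break this simple split. The repository's own naming code, if any, may differ.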
 
 
  - **Beatrix Consciousness Routing**: Optional consciousness-aware token fusion using the Devil's Staircase
  - **SafeTensors Format**: All model weights use SafeTensors (not pickle) for security
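The Devil's Staircase named in the Beatrix routing bullet is the Cantor function. It is a continuous, non-decreasing map of [0, 1] onto [0, 1] that is constant on every removed middle third.

This README does not say how Beatrix routing uses it. The sketch below therefore only evaluates the standard function, by reading off the ternary digits of x. `cantor` and its `depth` cutoff are illustrative names, not code from this repository.

```python
def cantor(x: float, depth: int = 40) -> float:
    """Approximate the Cantor function (Devil's Staircase) at x in [0, 1].

    Reads ternary digits of x one at a time: digits 0/2 become binary
    digits 0/1, and the first digit 1 ends the expansion (x lies in a
    removed middle third, where the function is flat).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 1.0:
        return 1.0
    result, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return result + weight  # flat step: stop at the first 1
        result += weight * (digit // 2)  # ternary 2 -> binary 1, 0 -> 0
        weight *= 0.5
    return result


if __name__ == "__main__":
    for x in (0.0, 0.25, 1 / 3, 0.5, 0.75, 1.0):
        print(f"C({x:.4f}) = {cantor(x):.6f}")
```

The staircase shape follows from the early return: every x in a removed middle third maps to the same dyadic value.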
 
+ ## Training Strategy
+
+ This model uses the **SGD + milestone LR drops** strategy from the WideResNet training recipe:
+ - Initial LR: 0.1
+ - Milestones: [40, 60, 80]
+ - Decay factor: 0.2 (LR *= 0.2 at each milestone)
+ - Each drop produces the sharp, step-like accuracy jumps typical of deep networks trained this way.
+
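The milestone schedule above reduces to one rule: the LR for an epoch is the initial LR times 0.2 raised to the number of milestones already reached.

The sketch below is plain Python so it can be checked without a GPU. It assumes 0-based epochs and one scheduler step per epoch. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 60, 80], gamma=0.2)` follows the same rule when `scheduler.step()` is called after each epoch. The function name `multistep_lr` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(
    epoch: int,
    base_lr: float = 0.1,
    milestones: Sequence[int] = (40, 60, 80),
    gamma: float = 0.2,
) -> float:
    """LR in effect during `epoch` (0-based) under a milestone schedule.

    bisect_right counts how many milestones are <= epoch, so the LR is
    multiplied by gamma once at each of epochs 40, 60 and 80.
    """
    drops = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    # Print the LR on either side of each milestone.
    for epoch in (0, 39, 40, 59, 60, 79, 80):
        print(f"epoch {epoch:3d}: lr = {multistep_lr(epoch):.6g}")
```

After the last milestone the LR settles at 0.1 × 0.2³ = 0.0008.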
  ## Usage

  ### Download a Model
 
 
  ## Training Details

+ Optimizer options:
+ - **SGD**: High momentum (0.9), Nesterov, milestone-based LR drops
+ - **AdamW**: Weight decay, cosine annealing with warmup
+
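The AdamW option is described only as "cosine annealing with warmup". This README gives no warmup length, peak LR or floor LR.

The sketch below assumes one common form of that schedule: a linear warmup followed by a half-cosine decay to `min_lr`. The function name, parameters and example values are hypothetical, and this is not necessarily the schedule used in these runs.

```python
import math


def warmup_cosine_lr(
    step: int,
    total_steps: int,
    base_lr: float,
    warmup_steps: int,
    min_lr: float = 0.0,
) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr at total_steps.

    All parameters are illustrative; the README does not specify them.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half a cosine period: 1 at progress=0, 0 at progress=1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Hypothetical setup: 100 epochs, 5 warmup epochs, peak LR 1e-3.
    for epoch in (0, 4, 5, 50, 99, 100):
        lr = warmup_cosine_lr(epoch, total_steps=100, base_lr=1e-3, warmup_steps=5)
        print(f"epoch {epoch:3d}: lr = {lr:.3e}")
```

In PyTorch, a similar shape can be assembled from `LinearLR` and `CosineAnnealingLR` chained with `SequentialLR`. Which variant these runs actually use remains an assumption.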
  All models trained with:
  - Mixed Precision: Available on A100
  - Augmentation: AutoAugment (CIFAR10 policy)
  - Format: SafeTensors (ClamAV safe)
 
 
  **Repository maintained by**: [@AbstractPhil](https://huggingface.co/AbstractPhil)

+ **Latest update**: 2025-11-19 17:30:40