AbstractPhil committed
Commit 854d9b1 · verified · 1 Parent(s): 683c5d0

Update beatrix-trainC-chaos-native (Epoch 0, Acc: 0.0253) - chaos_native

README.md CHANGED
@@ -10,34 +10,10 @@ license: mit
 
  # ViT-Beatrix Dual-Stream Family
 
- This repository contains the **Beatrix family** of dual-stream vision transformers with preserved geometric features.
+ ## Current Experiment: beatrix-trainC-chaos-native
 
- ## Current Experiment: beatrix-trainB-workshop
+ **Model Path**: `weights/beatrix-trainC-chaos-native/20251008_163456_chaos_native/`
 
- **Model Path**: `weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/`
-
-
- ## Training Lineage
-
- - **Origin Checkpoint**: `20251008_131339`
- - **Origin Epoch**: 25
- - **Divergence Point**: div2_gentle_nomixup
- - **Experiment Name**: beatrix-trainB-workshop
- - **Training Philosophy**: Gentle Guidance (5% threshold, 5-epoch cooldown, no Mixup)
-
- This model was branched from a previous training run to explore different augmentation strategies.
-
-
- ## Key Innovation: Dual Processing Streams + Geometric Compatibility
-
- Unlike standard ViTs that destroy geometric features after injection, this architecture maintains **two parallel processing streams**:
-
- 1. **Visual Stream** (512D): Processes patch tokens
- 2. **Geometric Stream** (256D): Evolves 8 geometric tokens
-
- The streams cross-communicate via attention without homogenizing features.
-
- **Important:** This model uses discrete geometric simplex structures and is **incompatible with Mixup augmentation** (label interpolation). CutMix is supported (spatial mixing with discrete labels).
 
  ## Architecture
 
@@ -47,56 +23,12 @@ The streams cross-communicate via attention without homogenizing features.
  - **Dual Blocks**: 8 layers
  - **k-simplex**: 4
 
- ## Training Configuration
-
- - **Experiment**: beatrix-trainB-workshop
- - **Overfit Threshold**: 5.0%
- - **Augmentation Cooldown**: 5 epochs
- - **Min Accuracy for Augmentation**: 45.0%
- - **Mixup**: Disabled (geometric incompatibility)
-
  ## Performance
 
- - **Best Accuracy**: 0.5139
- - **Current Epoch**: 52
+ - **Best Accuracy**: 0.0253
+ - **Current Epoch**: 0
  - **Dataset**: CIFAR-100
 
- ## Usage
-
- ```python
- from geovocab2.train.model.vit_beatrix_dualstream import DualStreamGeometricClassifier
- from safetensors.torch import load_file
- from huggingface_hub import hf_hub_download
-
- # Download specific experiment
- model_path = hf_hub_download(
-     repo_id="AbstractPhil/vit-beatrix-dualstream",
-     filename="weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors"
- )
-
- # Load model
- model = DualStreamGeometricClassifier(
-     num_classes=100,
-     visual_dim=512,
-     geom_dim=256,
-     num_geom_tokens=8
- )
-
- state_dict = load_file(model_path)
- model.load_state_dict(state_dict)
- ```
-
- ## Citation
-
- ```bibtex
- @misc{vit-beatrix-dualstream,
-     author = {AbstractPhil},
-     title = {ViT-Beatrix Dual-Stream: Preserved Geometric Features},
-     year = {2025},
-     note = {Experiment: beatrix-trainB-workshop}
- }
- ```
-
  ---
 
- *Last updated: Epoch 52 | Best Accuracy: 0.5139*
+ *Last updated: Epoch 0 | Best Accuracy: 0.0253*
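This commit removes the README's Usage section without adding one for the new run. As a rough sketch only, the removed snippet could be pointed at the trainC run directory as below. The repo id, class import, and constructor arguments are taken from the removed trainB snippet; that they still apply to this checkpoint is an assumption, and `RUN_DIR`, `run_file`, and `load_model` are hypothetical helper names introduced here.

```python
from pathlib import PurePosixPath

REPO_ID = "AbstractPhil/vit-beatrix-dualstream"
# Run directory as given in the updated README's "Model Path".
RUN_DIR = "weights/beatrix-trainC-chaos-native/20251008_163456_chaos_native"


def run_file(name: str) -> str:
    """Return the in-repo path of a file inside the run directory."""
    return str(PurePosixPath(RUN_DIR) / name)


def load_model():
    """Download and load the checkpoint (needs network access and geovocab2)."""
    # Imported lazily so run_file() works without these packages installed.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    from geovocab2.train.model.vit_beatrix_dualstream import (
        DualStreamGeometricClassifier,
    )

    weights = hf_hub_download(
        repo_id=REPO_ID, filename=run_file("model.safetensors")
    )
    # Assumption: same constructor arguments as the removed trainB snippet.
    model = DualStreamGeometricClassifier(
        num_classes=100, visual_dim=512, geom_dim=256, num_geom_tokens=8
    )
    model.load_state_dict(load_file(weights))
    return model
```

Calling `run_file("config.json")` gives the path of the run's config in the same directory. Note that the weights in this commit are from epoch 0 (accuracy 0.0253), so they are mainly useful for checking that loading works.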
weights/beatrix-trainC-chaos-native/20251008_163456_chaos_native/config.json ADDED
@@ -0,0 +1,81 @@
+ {
+   "num_classes": 100,
+   "img_size": 32,
+   "patch_size": 4,
+   "visual_dim": 512,
+   "geom_dim": 256,
+   "k_simplex": 4,
+   "depth": 8,
+   "num_heads": 8,
+   "mlp_ratio": 4.0,
+   "dropout": 0.0,
+   "num_geom_tokens": 8,
+   "pe_levels": 12,
+   "pe_features_per_level": 2,
+   "pe_smooth_tau": 0.25,
+   "simplex_init_method": "regular",
+   "simplex_init_scale": 1.0,
+   "batch_size": 512,
+   "num_epochs": 150,
+   "learning_rate": 0.0001,
+   "weight_decay": 0.005,
+   "warmup_epochs": 10,
+   "task_loss_weight": 0.5,
+   "flow_loss_weight": 1.5,
+   "coherence_loss_weight": 0.5,
+   "multiscale_loss_weight": 0.3,
+   "use_adaptive_augmentation": false,
+   "overfit_threshold": 0.05,
+   "augmentation_cooldown_epochs": 5,
+   "min_accuracy_for_augmentation": 0.45,
+   "mixup_alpha": 0.2,
+   "cutmix_alpha": 1.0,
+   "use_cutmix_schedule": true,
+   "cutmix_schedule": [
+     [
+       0,
+       0.2
+     ],
+     [
+       20,
+       0.5
+     ],
+     [
+       40,
+       1.0
+     ],
+     [
+       60,
+       1.2
+     ],
+     [
+       80,
+       1.5
+     ],
+     [
+       100,
+       1.8
+     ],
+     [
+       120,
+       2.0
+     ]
+   ],
+   "device": "cuda",
+   "num_workers": 4,
+   "pin_memory": true,
+   "save_dir": "./checkpoints_dualstream",
+   "save_every": 10,
+   "use_safetensors": true,
+   "timestamp_dirs": true,
+   "push_to_hub": true,
+   "hub_model_id": "AbstractPhil/vit-beatrix-dualstream",
+   "hub_model_name": "beatrix-trainC-chaos-native",
+   "hub_upload_best_only": true,
+   "hub_upload_every_n_epochs": 10,
+   "use_tensorboard": true,
+   "log_dir": "./logs_dualstream",
+   "log_every": 50,
+   "monitor_stream_health": true,
+   "log_stream_norms": true
+ }
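A few quantities follow directly from this config, and the `cutmix_schedule` pairs can be read as `[start_epoch, value]`. The sketch below is one possible reading, not the trainer's actual code: it assumes the schedule is piecewise constant (each value holds until the next start epoch) and that the second number in each pair is the CutMix strength. The helper names `scheduled_cutmix_value` and `num_patch_tokens` are hypothetical.

```python
from bisect import bisect_right

# Values copied from config.json.
CUTMIX_SCHEDULE = [
    (0, 0.2), (20, 0.5), (40, 1.0), (60, 1.2), (80, 1.5), (100, 1.8), (120, 2.0),
]
IMG_SIZE, PATCH_SIZE = 32, 4
VISUAL_DIM, NUM_HEADS = 512, 8


def scheduled_cutmix_value(epoch: int, schedule=CUTMIX_SCHEDULE) -> float:
    """Return the value of the last entry whose start epoch is <= epoch.

    Assumption: the schedule is a step function, not interpolated.
    """
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, epoch) - 1
    if index < 0:
        raise ValueError(f"epoch {epoch} precedes the first schedule entry")
    return schedule[index][1]


def num_patch_tokens(img_size: int = IMG_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of non-overlapping square patches per image."""
    per_side = img_size // patch_size
    return per_side * per_side
```

Under these assumptions, a 32×32 image with 4×4 patches yields 64 patch tokens, each attention head in the 512-dimensional visual stream has width 512 / 8 = 64, and the last schedule entry (epoch 120) stays in effect through the final epoch of the 150-epoch run.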
weights/beatrix-trainC-chaos-native/20251008_163456_chaos_native/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0fcd146ef0fc7663306054dd537a49eb360642bf20ae85ad0e48e4cf5777049
+ size 164567960
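The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 digest of the real file and `size` is its length in bytes. A downloaded copy can be checked against the pointer with the standard library alone. The following is a minimal sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib
from pathlib import Path

# Pointer text as committed for model.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b0fcd146ef0fc7663306054dd537a49eb360642bf20ae85ad0e48e4cf5777049
size 164567960
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's byte size and digest against the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a ~165 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("model.safetensors", parse_lfs_pointer(POINTER))` should return `True` for an intact download of this checkpoint. The size comparison runs first because it is cheap and catches truncated downloads before any hashing.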