AbstractPhil committed
Commit 539d5f1 · verified · 1 Parent(s): af7add5

Update beatrix-trainB-workshop (Epoch 26, Acc: 0.4096) - div2_gentle_nomixup
README.md CHANGED
@@ -4,14 +4,31 @@ tags:
 - dual-stream-architecture
 - geometric-deep-learning
 - fractal-positional-encoding
+- beatrix-family
 license: mit
 ---
 
-# ViT-Beatrix Dual-Stream: Preserved Geometric Features
-
-**Flux-inspired dual-stream architecture** that maintains geometric structure throughout the network.
-
-## Key Innovation: Dual Processing Streams
+# ViT-Beatrix Dual-Stream Family
+
+This repository contains the **Beatrix family** of dual-stream vision transformers with preserved geometric features.
+
+## Current Experiment: beatrix-trainB-workshop
+
+**Model Path**: `weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/`
+
+
+## Training Lineage
+
+- **Origin Checkpoint**: `20251008_131339`
+- **Origin Epoch**: 25
+- **Divergence Point**: div2_gentle_nomixup
+- **Experiment Name**: beatrix-trainB-workshop
+- **Training Philosophy**: Gentle Guidance (5% threshold, 5-epoch cooldown, no Mixup)
+
+This model was branched from a previous training run to explore different augmentation strategies.
+
+
+## Key Innovation: Dual Processing Streams + Geometric Compatibility
 
 Unlike standard ViTs that destroy geometric features after injection, this architecture maintains **two parallel processing streams**:
 
@@ -20,6 +37,8 @@ Unlike standard ViTs that destroy geometric features after injection, this archi
 
 The streams cross-communicate via attention without homogenizing features.
 
+**Important:** This model uses discrete geometric simplex structures and is **incompatible with Mixup augmentation** (label interpolation). CutMix is supported (spatial mixing with discrete labels).
+
 ## Architecture
 
 - **Visual Dimension**: 512
@@ -28,10 +47,18 @@ The streams cross-communicate via attention without homogenizing features.
 - **Dual Blocks**: 8 layers
 - **k-simplex**: 4
 
+## Training Configuration
+
+- **Experiment**: beatrix-trainB-workshop
+- **Overfit Threshold**: 5.0%
+- **Augmentation Cooldown**: 5 epochs
+- **Min Accuracy for Augmentation**: 45.0%
+- **Mixup**: Disabled (geometric incompatibility)
+
 ## Performance
 
-- **Best Accuracy**: 0.5517
-- **Epoch**: 77
+- **Best Accuracy**: 0.4096
+- **Current Epoch**: 26
 - **Dataset**: CIFAR-100
 
 ## Usage
@@ -39,7 +66,15 @@ The streams cross-communicate via attention without homogenizing features.
 ```python
 from geovocab2.train.model.vit_beatrix_dualstream import DualStreamGeometricClassifier
 from safetensors.torch import load_file
+from huggingface_hub import hf_hub_download
+
+# Download specific experiment
+model_path = hf_hub_download(
+    repo_id="AbstractPhil/vit-beatrix-dualstream",
+    filename="weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors"
+)
 
+# Load model
 model = DualStreamGeometricClassifier(
     num_classes=100,
     visual_dim=512,
@@ -47,7 +82,7 @@ model = DualStreamGeometricClassifier(
     num_geom_tokens=8
 )
 
-state_dict = load_file("model.safetensors")
+state_dict = load_file(model_path)
 model.load_state_dict(state_dict)
 ```
 
@@ -57,6 +92,11 @@ model.load_state_dict(state_dict)
 @misc{vit-beatrix-dualstream,
   author = {AbstractPhil},
   title = {ViT-Beatrix Dual-Stream: Preserved Geometric Features},
-  year = {2025}
+  year = {2025},
+  note = {Experiment: beatrix-trainB-workshop}
 }
 ```
+
+---
+
+*Last updated: Epoch 26 | Best Accuracy: 0.4096*
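The Mixup incompatibility noted in the README can be made concrete with a small sketch (plain Python; `mixup_target` and `cutmix_targets` are hypothetical illustrations, not repository code). Mixup interpolates one-hot targets into soft labels, which clashes with discrete geometric simplex structures; CutMix mixes pixels spatially, so every contributing label remains a discrete class index.

```python
def one_hot(label, num_classes):
    # Discrete one-hot target for a single class index.
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def mixup_target(label_a, label_b, lam, num_classes):
    # Mixup interpolates the targets elementwise: the result is a soft
    # label vector, no longer a single discrete class.
    a, b = one_hot(label_a, num_classes), one_hot(label_b, num_classes)
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

def cutmix_targets(label_a, label_b, box_area_fraction):
    # CutMix pastes a spatial region from image b into image a; each
    # region keeps its discrete label, weighted only by its area.
    return [(label_a, 1 - box_area_fraction), (label_b, box_area_fraction)]

soft = mixup_target(3, 7, 0.7, 10)   # soft vector spread over classes 3 and 7
pairs = cutmix_targets(3, 7, 0.3)    # discrete labels 3 and 7 with area weights
```

This is why the training configuration above disables Mixup but leaves `cutmix_alpha` active.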
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/config.json ADDED
@@ -0,0 +1,50 @@
+{
+  "num_classes": 100,
+  "img_size": 32,
+  "patch_size": 4,
+  "visual_dim": 512,
+  "geom_dim": 256,
+  "k_simplex": 4,
+  "depth": 8,
+  "num_heads": 8,
+  "mlp_ratio": 4.0,
+  "dropout": 0.0,
+  "num_geom_tokens": 8,
+  "pe_levels": 12,
+  "pe_features_per_level": 2,
+  "pe_smooth_tau": 0.25,
+  "simplex_init_method": "regular",
+  "simplex_init_scale": 1.0,
+  "batch_size": 512,
+  "num_epochs": 100,
+  "learning_rate": 0.0001,
+  "weight_decay": 0.005,
+  "warmup_epochs": 10,
+  "task_loss_weight": 0.5,
+  "flow_loss_weight": 1.0,
+  "coherence_loss_weight": 0.3,
+  "multiscale_loss_weight": 0.2,
+  "use_adaptive_augmentation": true,
+  "overfit_threshold": 0.05,
+  "augmentation_cooldown_epochs": 5,
+  "min_accuracy_for_augmentation": 0.45,
+  "mixup_alpha": 0.2,
+  "cutmix_alpha": 1.0,
+  "device": "cuda",
+  "num_workers": 4,
+  "pin_memory": true,
+  "save_dir": "./checkpoints_dualstream",
+  "save_every": 10,
+  "use_safetensors": true,
+  "timestamp_dirs": true,
+  "push_to_hub": true,
+  "hub_model_id": "AbstractPhil/vit-beatrix-dualstream",
+  "hub_model_name": "beatrix-trainB-workshop",
+  "hub_upload_best_only": true,
+  "hub_upload_every_n_epochs": 10,
+  "use_tensorboard": true,
+  "log_dir": "./logs_dualstream",
+  "log_every": 50,
+  "monitor_stream_health": true,
+  "log_stream_norms": true
+}
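A quick sanity check of the added config (a minimal sketch: the patch-grid arithmetic follows the standard ViT patch-embedding layout, and treating the 8 geometric tokens as appended to the visual sequence is an assumption, not something the config states):

```python
# Subset of the added config.json (values copied from the file above).
config = {
    "img_size": 32,
    "patch_size": 4,
    "num_geom_tokens": 8,
}

# A 32x32 CIFAR-100 image split into 4x4 patches gives an 8x8 grid,
# i.e. 64 visual tokens.
patches_per_side = config["img_size"] // config["patch_size"]
num_patches = patches_per_side ** 2

# Hypothetical total sequence length if the geometric tokens are
# appended to the visual tokens.
seq_len = num_patches + config["num_geom_tokens"]
```

With these values, `num_patches` is 64 and the assumed combined sequence length is 72.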
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/lineage.json ADDED
@@ -0,0 +1,11 @@
+{
+  "origin_checkpoint": "/content/checkpoints_dualstream/20251008_131339",
+  "origin_epoch": 25,
+  "divergence_point": "div2_gentle_nomixup",
+  "divergence_timestamp": "20251008_152906_div2_gentle_nomixup",
+  "config_changes": {
+    "overfit_threshold": 0.05,
+    "augmentation_cooldown_epochs": 5,
+    "min_accuracy_for_augmentation": 0.45
+  }
+}
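The lineage file records which keys the branch changed relative to the origin run. One way such a record could be consumed when branching (a minimal sketch; `branch_config` is a hypothetical helper, not repository code):

```python
import json

# Contents copied from the lineage.json added above.
lineage_json = """
{
  "origin_checkpoint": "/content/checkpoints_dualstream/20251008_131339",
  "origin_epoch": 25,
  "divergence_point": "div2_gentle_nomixup",
  "config_changes": {
    "overfit_threshold": 0.05,
    "augmentation_cooldown_epochs": 5,
    "min_accuracy_for_augmentation": 0.45
  }
}
"""

def branch_config(base_config, lineage):
    # Start from the origin run's config and overlay only the keys the
    # divergence changed, leaving everything else untouched.
    branched = dict(base_config)
    branched.update(lineage["config_changes"])
    return branched

# Illustrative base values (the origin run's real config is not shown here).
base = {"overfit_threshold": 0.10, "mixup_alpha": 0.2}
lineage = json.loads(lineage_json)
branched = branch_config(base, lineage)
```

Keeping the deltas in `config_changes` makes the branch reproducible from the origin checkpoint alone.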
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e2bab92765229c864cdbaecb8d183fc4d1d37515393ba3ff90120e66b169c6e2
+size 164567960