Update beatrix-trainB-workshop (Epoch 26, Acc: 0.4096) - div2_gentle_nomixup

Files changed (4)

README.md +47 -7
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/config.json +50 -0
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/lineage.json +11 -0
weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors +3 -0

README.md CHANGED

@@ -4,14 +4,31 @@ tags:
 - dual-stream-architecture
 - geometric-deep-learning
 - fractal-positional-encoding
 license: mit
 ---
-# ViT-Beatrix Dual-Stream: Preserved Geometric Features
-**Flux-inspired dual-stream architecture** that maintains geometric structure throughout the network.
-## Key Innovation: Dual Processing Streams
 Unlike standard ViTs that destroy geometric features after injection, this architecture maintains **two parallel processing streams**:
@@ -20,6 +37,8 @@ Unlike standard ViTs that destroy geometric features after injection, this archi
 The streams cross-communicate via attention without homogenizing features.
 ## Architecture
 - **Visual Dimension**: 512
@@ -28,10 +47,18 @@ The streams cross-communicate via attention without homogenizing features.
 - **Dual Blocks**: 8 layers
 - **k-simplex**: 4
 ## Performance
-- **Best Accuracy**: 0.5517
-- **Epoch**: 77
 - **Dataset**: CIFAR-100
 ## Usage
@@ -39,7 +66,15 @@ The streams cross-communicate via attention without homogenizing features.
 ```python
 from geovocab2.train.model.vit_beatrix_dualstream import DualStreamGeometricClassifier
 from safetensors.torch import load_file
 model = DualStreamGeometricClassifier(
     num_classes=100,
     visual_dim=512,
@@ -47,7 +82,7 @@ model = DualStreamGeometricClassifier(
     num_geom_tokens=8
 )
-state_dict = load_file("model.safetensors")
 model.load_state_dict(state_dict)
 ```
@@ -57,6 +92,11 @@ model.load_state_dict(state_dict)
 @misc{vit-beatrix-dualstream,
   author = {AbstractPhil},
   title = {ViT-Beatrix Dual-Stream: Preserved Geometric Features},
-  year = {2025}
 }
 ```

 - dual-stream-architecture
 - geometric-deep-learning
 - fractal-positional-encoding
+- beatrix-family
 license: mit
 ---
+# ViT-Beatrix Dual-Stream Family
+This repository contains the **Beatrix family** of dual-stream vision transformers with preserved geometric features.
+## Current Experiment: beatrix-trainB-workshop
+**Model Path**: `weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/`
+## Training Lineage
+- **Origin Checkpoint**: `20251008_131339`
+- **Origin Epoch**: 25
+- **Divergence Point**: div2_gentle_nomixup
+- **Experiment Name**: beatrix-trainB-workshop
+- **Training Philosophy**: Gentle Guidance (5% threshold, 5-epoch cooldown, no Mixup)
+This model was branched from a previous training run to explore different augmentation strategies.
+## Key Innovation: Dual Processing Streams + Geometric Compatibility
 Unlike standard ViTs that destroy geometric features after injection, this architecture maintains **two parallel processing streams**:
 The streams cross-communicate via attention without homogenizing features.
+**Important:** This model uses discrete geometric simplex structures and is **incompatible with Mixup augmentation** (label interpolation). CutMix is supported (spatial mixing with discrete labels).
 ## Architecture
 - **Visual Dimension**: 512
 - **Dual Blocks**: 8 layers
 - **k-simplex**: 4
+## Training Configuration
+- **Experiment**: beatrix-trainB-workshop
+- **Overfit Threshold**: 5.0%
+- **Augmentation Cooldown**: 5 epochs
+- **Min Accuracy for Augmentation**: 45.0%
+- **Mixup**: Disabled (geometric incompatibility)
 ## Performance
+- **Best Accuracy**: 0.4096
+- **Current Epoch**: 26
 - **Dataset**: CIFAR-100
 ## Usage
 ```python
 from geovocab2.train.model.vit_beatrix_dualstream import DualStreamGeometricClassifier
 from safetensors.torch import load_file
+from huggingface_hub import hf_hub_download
+# Download specific experiment
+model_path = hf_hub_download(
+    repo_id="AbstractPhil/vit-beatrix-dualstream",
+    filename="weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors"
+)
+# Load model
 model = DualStreamGeometricClassifier(
     num_classes=100,
     visual_dim=512,
     num_geom_tokens=8
 )
+state_dict = load_file(model_path)
 model.load_state_dict(state_dict)
 ```
 @misc{vit-beatrix-dualstream,
   author = {AbstractPhil},
   title = {ViT-Beatrix Dual-Stream: Preserved Geometric Features},
+  year = {2025},
+  note = {Experiment: beatrix-trainB-workshop}
 }
 ```
+---
+*Last updated: Epoch 26 | Best Accuracy: 0.4096*
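
Note on the Training Configuration added to the README: the commit lists the three "Gentle Guidance" values (5% overfit threshold, 5-epoch cooldown, 45% minimum accuracy) but does not show how the trainer combines them. The sketch below is one hypothetical reading, not the repository's code. The function name, its arguments, and the rule "enable augmentation only when the train/validation gap exceeds the threshold, accuracy is above the minimum, and the cooldown has elapsed" are all assumptions made for illustration.

```python
def should_enable_augmentation(
    train_acc,
    val_acc,
    epoch,
    last_change_epoch=None,
    overfit_threshold=0.05,   # "Overfit Threshold: 5.0%" in the README
    cooldown_epochs=5,        # "Augmentation Cooldown: 5 epochs"
    min_accuracy=0.45,        # "Min Accuracy for Augmentation: 45.0%"
):
    """Hypothetical gating rule; the real trainer logic is not part of this commit."""
    # Assumed: no augmentation until the model reaches the minimum accuracy.
    if val_acc < min_accuracy:
        return False
    # Assumed: wait `cooldown_epochs` after the last augmentation change.
    if last_change_epoch is not None and epoch - last_change_epoch < cooldown_epochs:
        return False
    # Assumed: "overfit" means the train/validation accuracy gap exceeds the threshold.
    return (train_acc - val_acc) > overfit_threshold


# Epoch 26 with the reported accuracy of 0.4096; the train accuracy of 0.60
# is a made-up value, since the commit does not report it.
at_epoch_26 = should_enable_augmentation(train_acc=0.60, val_acc=0.4096, epoch=26)
print(at_epoch_26)
```

Under this reading, the reported accuracy of 0.4096 is below the 0.45 minimum, so the gate would stay closed at epoch 26 regardless of the train/validation gap.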

weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/config.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "num_classes": 100,
+  "img_size": 32,
+  "patch_size": 4,
+  "visual_dim": 512,
+  "geom_dim": 256,
+  "k_simplex": 4,
+  "depth": 8,
+  "num_heads": 8,
+  "mlp_ratio": 4.0,
+  "dropout": 0.0,
+  "num_geom_tokens": 8,
+  "pe_levels": 12,
+  "pe_features_per_level": 2,
+  "pe_smooth_tau": 0.25,
+  "simplex_init_method": "regular",
+  "simplex_init_scale": 1.0,
+  "batch_size": 512,
+  "num_epochs": 100,
+  "learning_rate": 0.0001,
+  "weight_decay": 0.005,
+  "warmup_epochs": 10,
+  "task_loss_weight": 0.5,
+  "flow_loss_weight": 1.0,
+  "coherence_loss_weight": 0.3,
+  "multiscale_loss_weight": 0.2,
+  "use_adaptive_augmentation": true,
+  "overfit_threshold": 0.05,
+  "augmentation_cooldown_epochs": 5,
+  "min_accuracy_for_augmentation": 0.45,
+  "mixup_alpha": 0.2,
+  "cutmix_alpha": 1.0,
+  "device": "cuda",
+  "num_workers": 4,
+  "pin_memory": true,
+  "save_dir": "./checkpoints_dualstream",
+  "save_every": 10,
+  "use_safetensors": true,
+  "timestamp_dirs": true,
+  "push_to_hub": true,
+  "hub_model_id": "AbstractPhil/vit-beatrix-dualstream",
+  "hub_model_name": "beatrix-trainB-workshop",
+  "hub_upload_best_only": true,
+  "hub_upload_every_n_epochs": 10,
+  "use_tensorboard": true,
+  "log_dir": "./logs_dualstream",
+  "log_every": 50,
+  "monitor_stream_health": true,
+  "log_stream_norms": true
+}
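
Note on using this config: the file mixes architecture, optimizer, augmentation, and logging keys in one flat object. A minimal sketch of separating out the constructor arguments follows. It passes only the three keywords that the README usage snippet confirms (`num_classes`, `visual_dim`, `num_geom_tokens`); whether `DualStreamGeometricClassifier` accepts the remaining keys under the same names is not shown in this commit. The patch-grid arithmetic assumes non-overlapping patches as in a standard ViT.

```python
import json

# Subset of the committed config.json, copied from the diff above.
CONFIG_TEXT = """
{
  "num_classes": 100,
  "img_size": 32,
  "patch_size": 4,
  "visual_dim": 512,
  "geom_dim": 256,
  "k_simplex": 4,
  "depth": 8,
  "num_heads": 8,
  "num_geom_tokens": 8,
  "batch_size": 512,
  "learning_rate": 0.0001,
  "mixup_alpha": 0.2,
  "cutmix_alpha": 1.0
}
"""

# Keyword arguments that appear in the README usage snippet.
README_CONSTRUCTOR_KEYS = ("num_classes", "visual_dim", "num_geom_tokens")


def split_config(cfg):
    """Return (constructor kwargs confirmed by the README, everything else)."""
    model_kwargs = {key: cfg[key] for key in README_CONSTRUCTOR_KEYS}
    other = {key: value for key, value in cfg.items() if key not in model_kwargs}
    return model_kwargs, other


def patch_grid(cfg):
    """Patches per side and in total, assuming non-overlapping square patches."""
    side = cfg["img_size"] // cfg["patch_size"]
    return side, side * side


cfg = json.loads(CONFIG_TEXT)
model_kwargs, other = split_config(cfg)
side, num_patches = patch_grid(cfg)
print(model_kwargs, side, num_patches)
```

The config still carries `mixup_alpha: 0.2` although the README lists Mixup as disabled. Whether the trainer ignores that key for this experiment is not visible in this commit.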

weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/lineage.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "origin_checkpoint": "/content/checkpoints_dualstream/20251008_131339",
+  "origin_epoch": 25,
+  "divergence_point": "div2_gentle_nomixup",
+  "divergence_timestamp": "20251008_152906_div2_gentle_nomixup",
+  "config_changes": {
+    "overfit_threshold": 0.05,
+    "augmentation_cooldown_epochs": 5,
+    "min_accuracy_for_augmentation": 0.45
+  }
+}
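
Note on reading the lineage record: the sketch below checks that `config_changes` agrees with the committed config.json and recovers the origin run ID that the README quotes. It assumes the run directory names begin with a `YYYYMMDD_HHMMSS` timestamp, which matches the names in this commit but is not documented anywhere in it. The helper name `run_timestamp` is illustrative.

```python
import json
import posixpath
from datetime import datetime

# lineage.json as committed, copied from the diff above.
LINEAGE_TEXT = """
{
  "origin_checkpoint": "/content/checkpoints_dualstream/20251008_131339",
  "origin_epoch": 25,
  "divergence_point": "div2_gentle_nomixup",
  "divergence_timestamp": "20251008_152906_div2_gentle_nomixup",
  "config_changes": {
    "overfit_threshold": 0.05,
    "augmentation_cooldown_epochs": 5,
    "min_accuracy_for_augmentation": 0.45
  }
}
"""

# The same three keys as they appear in config.json in this commit.
CONFIG_SUBSET = {
    "overfit_threshold": 0.05,
    "augmentation_cooldown_epochs": 5,
    "min_accuracy_for_augmentation": 0.45,
}


def run_timestamp(name):
    """Parse the leading YYYYMMDD_HHMMSS part of a run name (assumed format)."""
    return datetime.strptime(name[:15], "%Y%m%d_%H%M%S")


lineage = json.loads(LINEAGE_TEXT)

# The README shows only the last path component as the origin checkpoint.
origin_run = posixpath.basename(lineage["origin_checkpoint"])

# Any key whose lineage value differs from config.json would be listed here.
mismatches = {
    key: (value, CONFIG_SUBSET.get(key))
    for key, value in lineage["config_changes"].items()
    if CONFIG_SUBSET.get(key) != value
}

# Time between the origin run name and the branch name, under the assumed format.
gap = run_timestamp(lineage["divergence_timestamp"]) - run_timestamp(origin_run)
print(origin_run, mismatches, gap)
```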

weights/beatrix-trainB-workshop/20251008_152906_div2_gentle_nomixup/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2bab92765229c864cdbaecb8d183fc4d1d37515393ba3ff90120e66b169c6e2
+size 164567960
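
Note on the pointer file: the three added lines are a Git LFS pointer, not the weights themselves. It records the SHA-256 digest and byte size of the real `model.safetensors`. A minimal sketch for checking a downloaded copy against the pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and use only the Python standard library.

```python
import hashlib

# The pointer as committed, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e2bab92765229c864cdbaecb8d183fc4d1d37515393ba3ff90120e66b169c6e2
size 164567960
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Hash the file in chunks and compare both size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy of the weights, `matches_pointer("model.safetensors", pointer)` returns `True` only if both the size and the digest agree with the committed pointer.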