---
license: apache-2.0
---
# MobiusNet

A vision architecture built on continuous topological principles, replacing traditional activations with wave-based interference gating.

## Overview

MobiusNet introduces a fundamentally different approach to neural network design:

- **MobiusLens**: Wave superposition as a gating mechanism, replacing standard activations (ReLU, GELU)
- **Thirds Mask**: Cantor-inspired fractal channel suppression for regularization
- **Continuous Topology**: Layers sample a continuous manifold via the `t` parameter, not discrete units
- **Twist Rotations**: Smooth rotation through representation space across network depth
- **Integrator**: An optional Conv → BN → GELU block placed before the classifier head. It is used experimentally to add a conventional GELU nonlinearity on top of the wave-based gating.

## Performance

| Model | Params | GFLOPs | Tiny ImageNet Acc. |
|-------|--------|--------|---------------|
| MobiusNet-Base | 33.7M | 2.69 | TBD |

## Installation

```bash
pip install torch torchvision safetensors huggingface_hub tensorboard tqdm
```

## Quick Start

### Training

```python
from mobius_trainer_full import train_tiny_imagenet

model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    lr=3e-4,
    batch_size=128,
    use_integrator=True,
    data_dir='./data/tiny-imagenet-200',
    output_dir='./outputs',
    hf_repo='AbstractPhil/mobiusnet',
    save_every_n_epochs=10,
    upload_every_n_epochs=10,
)
```

### Continue from Checkpoint

```python
from mobius_trainer_full import train_tiny_imagenet

# From local directory
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

# From HuggingFace (auto-downloads)
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)
```

### Inference

```python
import torch
from safetensors.torch import load_file
from mobius_trainer_full import MobiusNet, PRESETS

# Build the architecture from its preset, then load the trained weights
config = PRESETS['mobius_base']
model = MobiusNet(num_classes=200, use_integrator=True, **config)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Inference: image_tensor is a preprocessed float batch of shape (N, 3, H, W)
with torch.no_grad():
    logits = model(image_tensor)
    pred = logits.argmax(dim=1)  # predicted class index per image
```

## Model Presets

| Preset | Channels | Depths | ~Params |
|--------|----------|--------|---------|
| `mobius_tiny_s` | (64, 128, 256) | (2, 2, 2) | 500K |
| `mobius_tiny_m` | (64, 128, 256, 512, 768) | (2, 2, 4, 2, 2) | 11M |
| `mobius_tiny_l` | (96, 192, 384, 768) | (3, 3, 3, 3) | 8M |
| `mobius_base` | (128, 256, 512, 768, 1024) | (2, 2, 2, 2, 2) | 33.7M |

## Architecture

```
Input
  │
  ▼
┌─────────────────────────────────┐
│ Stem (Conv → BN)                │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Stage 1-N                       │
│ ┌─────────────────────────────┐ │
│ │ MobiusConvBlock (×depth)    │ │
│ │  ├─ Depthwise-Sep Conv      │ │
│ │  ├─ BatchNorm               │ │
│ │  ├─ MobiusLens (wave gate)  │ │
│ │  ├─ Thirds Mask             │ │
│ │  └─ Learned Residual        │ │
│ └─────────────────────────────┘ │
│ Downsample (stride-2 conv)      │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Integrator (Conv → BN → GELU)   │  ← Task collapse
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Pool → Linear → Classes         │
└─────────────────────────────────┘
```

## Core Components

### MobiusLens

Wave-based gating mechanism with three interference paths:

```python
L = wave(phase_l, drift_l)   # Left path  (+1 drift)
M = wave(phase_m, drift_m)   # Middle path (0 drift, ghost)
R = wave(phase_r, drift_r)   # Right path (-1 drift)

# Interference
xor_comp = abs(L + R - 2*L*R)  # Differentiable XOR
and_comp = L * R               # Differentiable AND

# Gating
gate = weighted_sum(L, M, R) * interference_blend
output = input * sigmoid(layernorm(gate))
```

The middle path (M) acts as a "ghost": it is present but diminished. This maintains gradient continuity while biasing information flow toward the L/R edges (a Cantor-like structure).
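
To see why `L + R - 2*L*R` and `L * R` behave like XOR and AND, the sketch below evaluates the two interference terms on plain Python floats. It illustrates the formulas only: the function names `soft_xor` and `soft_and` are hypothetical, and the real lens presumably applies the same expressions to wave outputs as tensors.

```python
from itertools import product


def soft_xor(l: float, r: float) -> float:
    """Differentiable XOR: |l + r - 2*l*r|, matching the pseudocode above."""
    return abs(l + r - 2.0 * l * r)


def soft_and(l: float, r: float) -> float:
    """Differentiable AND: the plain product l * r."""
    return l * r


# At the binary corners the soft versions reproduce the Boolean truth tables.
for l, r in product((0.0, 1.0), repeat=2):
    assert soft_xor(l, r) == float(bool(l) != bool(r))
    assert soft_and(l, r) == float(bool(l) and bool(r))

# Between the corners both terms vary smoothly, e.g. two half-strength paths:
half_xor = soft_xor(0.5, 0.5)  # 0.5 + 0.5 - 0.5 = 0.5
half_and = soft_and(0.5, 0.5)  # 0.25
```

For inputs in [0, 1] the expression inside the absolute value is already non-negative, since it equals `l*(1-r) + r*(1-l)`. The `abs` only matters if the wave values can leave that range.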

### Thirds Mask

Rotating channel suppression inspired by Cantor set construction:

```
Layer 0: suppress channels [0:C/3]
Layer 1: suppress channels [C/3:2C/3]
Layer 2: suppress channels [2C/3:C]
Layer 3: back to [0:C/3]
```

Forces redundancy and prevents co-adaptation across channel groups.
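
The rotation above can be sketched as a small index calculation. This is a minimal illustration, not the trainer's code: `suppressed_slice` and `thirds_mask` are hypothetical names, the band boundaries use integer division (an assumption for channel counts not divisible by 3), and suppression is modelled as a hard 0/1 mask, which may differ from the actual implementation.

```python
def suppressed_slice(layer_idx: int, channels: int) -> slice:
    """Return the channel range suppressed at a given layer.

    Assumption: the band rotates with period 3 and boundaries use integer
    division, so channel counts not divisible by 3 get slightly uneven bands.
    """
    band = layer_idx % 3
    start = band * channels // 3
    stop = (band + 1) * channels // 3
    return slice(start, stop)


def thirds_mask(layer_idx: int, channels: int) -> list:
    """Build a 0/1 mask (hypothetical hard suppression) over the channels."""
    band = suppressed_slice(layer_idx, channels)
    return [0.0 if band.start <= c < band.stop else 1.0 for c in range(channels)]


for layer in range(4):
    s = suppressed_slice(layer, 6)
    print(f"Layer {layer}: suppress channels [{s.start}:{s.stop}]")
```

With 6 channels this prints the bands `[0:2]`, `[2:4]`, `[4:6]`, and then `[0:2]` again for layer 3, mirroring the schedule listed above.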

### Continuous Topology

Each layer samples a continuous manifold:

```python
import math

t = layer_idx / (total_layers - 1)  # 0 → 1 across depth

twist_in_angle = t * math.pi
twist_out_angle = -t * math.pi
scales = scale_range[0] + t * scale_span

Adding layers = finer sampling of the same underlying structure.
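
The "finer sampling" claim can be checked with a few lines of plain Python. The sketch below is illustrative only: the helper names are hypothetical, `scale_span` is assumed to be `scale_range[1] - scale_range[0]`, the default `scale_range` is a placeholder value, and mapping a single-layer network to `t = 0` is an assumption made to avoid division by zero.

```python
import math


def layer_position(layer_idx: int, total_layers: int) -> float:
    """Map a layer index to t in [0, 1]; a single layer maps to t = 0 (assumed)."""
    if total_layers < 2:
        return 0.0
    return layer_idx / (total_layers - 1)


def twist_angles(layer_idx: int, total_layers: int) -> tuple:
    """Return (twist_in, twist_out) = (t*pi, -t*pi), as in the snippet above."""
    t = layer_position(layer_idx, total_layers)
    return t * math.pi, -t * math.pi


def sample_scale(layer_idx: int, total_layers: int, scale_range=(0.5, 2.0)) -> float:
    """Linear interpolation across scale_range; the default range is a placeholder."""
    t = layer_position(layer_idx, total_layers)
    scale_span = scale_range[1] - scale_range[0]  # assumed definition of scale_span
    return scale_range[0] + t * scale_span


coarse = [layer_position(i, 3) for i in range(3)]  # [0.0, 0.5, 1.0]
fine = [layer_position(i, 5) for i in range(5)]    # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Every sample point of the 3-layer network also appears in the 5-layer one, which adds the intermediate positions 0.25 and 0.75 on the same interval.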

## Checkpoints

Saved to: `checkpoints/{variant}_{dataset}/{timestamp}/`

```
├── config.json
├── best_accuracy.json
├── final_accuracy.json
├── checkpoints/
│   ├── checkpoint_epoch_0010.pt
│   ├── checkpoint_epoch_0010.safetensors
│   ├── best_model.pt
│   ├── best_model.safetensors
│   ├── final_model.pt
│   └── final_model.safetensors
└── tensorboard/
```
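
When resuming, it can help to locate the most recent run directory programmatically. The following is a rough sketch rather than part of the trainer: `run_dir` and `latest_run` are hypothetical helpers, and the timestamp format `%Y%m%d_%H%M%S` is inferred from the example directory name `20240101_120000`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S"  # inferred from "20240101_120000"


def run_dir(root: str, variant: str, dataset: str, when: datetime) -> Path:
    """Build {root}/checkpoints/{variant}_{dataset}/{timestamp}."""
    stamp = when.strftime(TIMESTAMP_FORMAT)
    return Path(root) / "checkpoints" / f"{variant}_{dataset}" / stamp


def latest_run(root: str, variant: str, dataset: str) -> Optional[Path]:
    """Return the most recent timestamped run directory, or None if absent."""
    base = Path(root) / "checkpoints" / f"{variant}_{dataset}"
    if not base.is_dir():
        return None
    runs = []
    for child in base.iterdir():
        try:
            stamp = datetime.strptime(child.name, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip entries that are not timestamped runs
        if child.is_dir():
            runs.append((stamp, child))
    if not runs:
        return None
    return max(runs, key=lambda pair: pair[0])[1]
```

Under these assumptions, `str(latest_run("./outputs", "mobius_base", "tiny_imagenet"))` would yield a path of the same shape as the local `continue_from` value shown in the Quick Start.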

## TensorBoard

Monitor training:

```bash
tensorboard --logdir ./outputs/checkpoints
```

Tracks:
- Loss, train/val accuracy
- Per-layer lens parameters (omega, alpha, twist angles, L/M/R weights)
- Residual weights
- Weight histograms

## Data Setup

### Tiny ImageNet

```bash
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
unzip tiny-imagenet-200.zip -d ./data/
```

## License

Apache 2.0

## Citation

```bibtex
@misc{mobiusnet2026,
  title={MobiusNet: Wave-Based Topological Vision Architecture},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/mobiusnet}
}
```