---
license: apache-2.0
---

# MobiusNet

A vision architecture built on continuous topological principles, replacing traditional activations with wave-based interference gating.

## Overview

MobiusNet takes a different approach to vision network design, built from these components:

- **MobiusLens**: Wave superposition as a gating mechanism, replacing standard activations (ReLU, GELU)
- **Thirds Mask**: Cantor-inspired fractal channel suppression for regularization
- **Continuous Topology**: Layers sample a continuous manifold via the `t` parameter, not discrete units
- **Twist Rotations**: Smooth rotation through representation space across network depth
- **Integrator**: An optional Conv → BN → GELU stage after the last block stage (`use_integrator=True`), used experimentally to add GELU nonlinearity before pooling and classification

## Performance

| Model | Params | GFLOPs | Tiny ImageNet |
|-------|--------|--------|---------------|
| MobiusNet-Base | 33.7M | 2.69 | TBD |

## Installation

```bash
pip install torch torchvision safetensors huggingface_hub tensorboard tqdm
```

## Quick Start

### Training

```python
from mobius_trainer_full import train_tiny_imagenet

model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    lr=3e-4,
    batch_size=128,
    use_integrator=True,
    data_dir='./data/tiny-imagenet-200',
    output_dir='./outputs',
    hf_repo='AbstractPhil/mobiusnet',
    save_every_n_epochs=10,
    upload_every_n_epochs=10,
)
```

### Continue from Checkpoint

```python
# From a local run directory
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

# From Hugging Face (repo-relative path, auto-downloads)
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)
```

### Inference

```python
import torch
from safetensors.torch import load_file

from mobius_trainer_full import MobiusNet, PRESETS

# Load model (preset and use_integrator must match the checkpoint)
config = PRESETS['mobius_base']
model = MobiusNet(num_classes=200, use_integrator=True, **config)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Inference: image_tensor is a batch shaped (N, 3, 64, 64) (Tiny ImageNet size),
# preprocessed the same way as during training. Random data is only a placeholder.
image_tensor = torch.randn(1, 3, 64, 64)
with torch.no_grad():
    logits = model(image_tensor)
pred = logits.argmax(1)  # predicted class index per image
```

## Model Presets

| Preset | Channels | Depths | ~Params |
|--------|----------|--------|---------|
| `mobius_tiny_s` | (64, 128, 256) | (2, 2, 2) | 500K |
| `mobius_tiny_m` | (64, 128, 256, 512, 768) | (2, 2, 4, 2, 2) | 11M |
| `mobius_tiny_l` | (96, 192, 384, 768) | (3, 3, 3, 3) | 8M |
| `mobius_base` | (128, 256, 512, 768, 1024) | (2, 2, 2, 2, 2) | 33.7M |

## Architecture

```
Input
  │
  ▼
┌─────────────────────────────────┐
│ Stem (Conv → BN)                │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Stage 1-N                       │
│ ┌─────────────────────────────┐ │
│ │ MobiusConvBlock (×depth)    │ │
│ │ ├─ Depthwise-Sep Conv       │ │
│ │ ├─ BatchNorm                │ │
│ │ ├─ MobiusLens (wave gate)   │ │
│ │ ├─ Thirds Mask              │ │
│ │ └─ Learned Residual         │ │
│ └─────────────────────────────┘ │
│ Downsample (stride-2 conv)      │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Integrator (Conv → BN → GELU)   │ ← Task collapse
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Pool → Linear → Classes         │
└─────────────────────────────────┘
```
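
To make the block order in the diagram concrete, here is a minimal PyTorch sketch of a single block. It is not the repository's `MobiusConvBlock`: the class name `SketchConvBlock`, the plain sigmoid self-gate standing in for MobiusLens, zeroing as the Thirds Mask operation (applied in both train and eval mode here), and the `x + res_scale * branch` residual form are all assumptions for illustration. Channel count is kept constant; downsampling is left to the separate stride-2 conv.

```python
import torch
import torch.nn as nn


class SketchConvBlock(nn.Module):
    """Hypothetical block: DW-sep conv -> BN -> gate -> thirds mask -> learned residual."""

    def __init__(self, channels: int, layer_idx: int):
        super().__init__()
        # Depthwise-separable conv: per-channel 3x3, then 1x1 pointwise mixing
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Stand-in for MobiusLens: a learned sigmoid self-gate (assumption)
        self.gate_proj = nn.Conv2d(channels, channels, 1)
        # Learned residual scale (assumed to be a single scalar)
        self.res_scale = nn.Parameter(torch.tensor(0.1))
        # Thirds mask: zero the channel third selected by layer_idx % 3
        bounds = [0, channels // 3, 2 * channels // 3, channels]
        k = layer_idx % 3
        mask = torch.ones(1, channels, 1, 1)
        mask[:, bounds[k]:bounds[k + 1]] = 0.0
        self.register_buffer("thirds_mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.bn(self.pointwise(self.depthwise(x)))
        h = h * torch.sigmoid(self.gate_proj(h))  # gating instead of ReLU/GELU
        h = h * self.thirds_mask                  # suppress one third of the channels
        return x + self.res_scale * h             # learned residual


if __name__ == "__main__":
    block = SketchConvBlock(channels=96, layer_idx=1)
    y = block(torch.randn(2, 96, 16, 16))
    print(y.shape)  # torch.Size([2, 96, 16, 16])
```

Because the mask is applied before the residual add, suppressed channels still carry the block's input forward unchanged, so the mask removes a third of each layer's update rather than a third of the signal.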

## Core Components

### MobiusLens

Wave-based gating mechanism with three interference paths (pseudocode):

```python
L = wave(phase_l, drift_l)  # Left path (+1 drift)
M = wave(phase_m, drift_m)  # Middle path (0 drift, ghost)
R = wave(phase_r, drift_r)  # Right path (-1 drift)

# Interference
xor_comp = abs(L + R - 2 * L * R)  # Differentiable XOR
and_comp = L * R                   # Differentiable AND

# Gating
gate = weighted_sum(L, M, R) * interference_blend
output = input * sigmoid(layernorm(gate))
```

The middle path (M) acts as a "ghost": it is present but diminished, which maintains gradient continuity while biasing information flow toward the L/R edges (a Cantor-like structure).
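
The pseudocode leaves `wave`, `weighted_sum`, and `interference_blend` undefined. Below is a minimal PyTorch sketch of one way to fill them in; it is not the repository's MobiusLens. It assumes that `wave` is a sine of the input with a learnable frequency `omega` and per-path phase plus the fixed drift, squashed to [0, 1]; that the ghost M path is simply given a small initial weight; that `interference_blend` mixes the XOR and AND terms through a learnable `alpha`; and that `nn.GroupNorm(1, C)` stands in for the layernorm. The names `SketchMobiusLens`, `soft_xor`, and `soft_and` are hypothetical.

```python
import torch
import torch.nn as nn


def soft_xor(a, b):
    # Matches XOR on {0, 1}: |0+0-0| = 0, |1+0-0| = 1, |1+1-2| = 0
    return abs(a + b - 2 * a * b)


def soft_and(a, b):
    # Matches AND on {0, 1}
    return a * b


class SketchMobiusLens(nn.Module):
    """Hypothetical L/M/R wave gate; the parameterisation is an assumption."""

    def __init__(self, channels: int):
        super().__init__()
        self.omega = nn.Parameter(torch.tensor(1.0))   # wave frequency (assumed)
        self.phases = nn.Parameter(torch.zeros(3))     # phase_l, phase_m, phase_r
        self.register_buffer("drifts", torch.tensor([1.0, 0.0, -1.0]))  # +1, 0, -1
        # L/M/R weights; M starts small so it behaves like a diminished "ghost"
        self.path_weights = nn.Parameter(torch.tensor([1.0, 0.1, 1.0]))
        self.alpha = nn.Parameter(torch.tensor(0.0))   # XOR/AND blend logit (assumed)
        self.norm = nn.GroupNorm(1, channels)          # normalises (C, H, W) per sample

    def wave(self, x, phase, drift):
        # Squash into [0, 1] so the XOR/AND identities above stay meaningful
        return 0.5 * (1.0 + torch.sin(self.omega * x + phase + drift))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L = self.wave(x, self.phases[0], self.drifts[0])
        M = self.wave(x, self.phases[1], self.drifts[1])
        R = self.wave(x, self.phases[2], self.drifts[2])
        w = self.path_weights
        weighted = w[0] * L + w[1] * M + w[2] * R
        a = torch.sigmoid(self.alpha)  # keep the blend in (0, 1)
        interference_blend = a * soft_xor(L, R) + (1 - a) * soft_and(L, R)
        gate = weighted * interference_blend
        return x * torch.sigmoid(self.norm(gate))


if __name__ == "__main__":
    print(soft_xor(1.0, 0.0), soft_xor(1.0, 1.0))  # 1.0 0.0
    lens = SketchMobiusLens(channels=16)
    print(lens(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```

On binary inputs, `soft_xor` yields 0, 1, 1, 0 and `soft_and` yields 0, 0, 0, 1, which is why the two interference terms are described as differentiable XOR and AND. Since the final sigmoid lies in (0, 1), the lens only attenuates its input, like a gate rather than an activation that can amplify.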

### Thirds Mask

Rotating channel suppression inspired by Cantor set construction:

```
Layer 0: suppress channels [0:C/3]
Layer 1: suppress channels [C/3:2C/3]
Layer 2: suppress channels [2C/3:C]
Layer 3: back to [0:C/3]
```

Because a different third is suppressed at each layer, no channel group is available everywhere; this forces redundancy and discourages co-adaptation across channel groups.
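
The rotation can be sketched in plain Python. This assumes integer division when `C` is not divisible by three and assumes that "suppress" means multiplying by zero; the actual module may scale, randomise, or apply the mask only during training. The function names are hypothetical.

```python
def suppressed_range(layer_idx: int, channels: int) -> tuple:
    """Channel slice [start, end) suppressed at this layer; rotates every 3 layers."""
    bounds = [0, channels // 3, 2 * channels // 3, channels]
    k = layer_idx % 3
    return bounds[k], bounds[k + 1]


def thirds_mask(layer_idx: int, channels: int) -> list:
    """Per-channel multiplier: 0.0 inside the suppressed third, 1.0 elsewhere."""
    start, end = suppressed_range(layer_idx, channels)
    return [0.0 if start <= c < end else 1.0 for c in range(channels)]


if __name__ == "__main__":
    for layer in range(4):
        print(layer, suppressed_range(layer, 96))
    # 0 (0, 32)
    # 1 (32, 64)
    # 2 (64, 96)
    # 3 (0, 32)
```

With `C = 100` the thirds become `[0:33]`, `[33:66]`, and `[66:100]`, so the last group absorbs the remainder.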

### Continuous Topology

Each layer's depth position is mapped to a continuous coordinate `t`, and its twist angles and scales are read off that coordinate:

```python
import math

t = layer_idx / (total_layers - 1)  # 0 → 1 across depth

twist_in_angle = t * math.pi
twist_out_angle = -t * math.pi
scales = scale_range[0] + t * scale_span
```

Adding layers therefore samples the same underlying structure more finely instead of introducing new, unrelated units.
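
A self-contained sketch of the per-layer schedule follows. It assumes `scale_span = scale_range[1] - scale_range[0]` and uses a hypothetical default `scale_range=(0.5, 2.0)`; it also guards the single-layer case, where `total_layers - 1` would be zero. The function name `layer_schedule` is illustrative.

```python
import math


def layer_schedule(total_layers: int, scale_range=(0.5, 2.0)) -> list:
    """Per-layer position t in [0, 1] and the quantities derived from it."""
    scale_span = scale_range[1] - scale_range[0]  # assumed definition
    schedule = []
    for layer_idx in range(total_layers):
        # A single layer sits at t = 0 instead of dividing by zero
        t = layer_idx / (total_layers - 1) if total_layers > 1 else 0.0
        schedule.append({
            "t": t,
            "twist_in_angle": t * math.pi,
            "twist_out_angle": -t * math.pi,
            "scale": scale_range[0] + t * scale_span,
        })
    return schedule


if __name__ == "__main__":
    print([s["t"] for s in layer_schedule(3)])  # [0.0, 0.5, 1.0]
    print([s["t"] for s in layer_schedule(5)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The 3-layer grid is a subset of the 5-layer grid: both span the same interval from `t = 0` to `t = 1`, and the deeper network only inserts intermediate samples.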

## Checkpoints

Each run is saved to `checkpoints/{variant}_{dataset}/{timestamp}/` under `output_dir`:

```
├── config.json
├── best_accuracy.json
├── final_accuracy.json
├── checkpoints/
│   ├── checkpoint_epoch_0010.pt
│   ├── checkpoint_epoch_0010.safetensors
│   ├── best_model.pt
│   ├── best_model.safetensors
│   ├── final_model.pt
│   └── final_model.safetensors
└── tensorboard/
```
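
To resume with `continue_from`, you need the timestamped run directory. The helper below is a hypothetical sketch (the name `latest_run` is not part of the trainer) that picks the newest run, assuming timestamps follow the `YYYYMMDD_HHMMSS` pattern from the `continue_from` example, so that sorting directory names also sorts runs chronologically.

```python
from pathlib import Path
from typing import Optional


def latest_run(output_dir: str, variant: str, dataset: str) -> Optional[Path]:
    """Return the newest {timestamp} run directory, or None if none exist."""
    run_root = Path(output_dir) / "checkpoints" / f"{variant}_{dataset}"
    if not run_root.is_dir():
        return None
    # Assumes YYYYMMDD_HHMMSS names, so lexicographic order is chronological
    runs = sorted(p for p in run_root.iterdir() if p.is_dir())
    return runs[-1] if runs else None


if __name__ == "__main__":
    run = latest_run("./outputs", "mobius_base", "tiny_imagenet")
    if run is not None:
        print("continue_from =", run)
        print("weights:", run / "checkpoints" / "best_model.safetensors")
```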

## TensorBoard

Monitor training:

```bash
tensorboard --logdir ./outputs/checkpoints
```

Tracks:

- Loss, train/val accuracy
- Per-layer lens parameters (omega, alpha, twist angles, L/M/R weights)
- Residual weights
- Weight histograms

## Data Setup

### Tiny ImageNet

```bash
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
unzip tiny-imagenet-200.zip -d ./data/
```
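
The archive extracts to `./data/tiny-imagenet-200`, the `data_dir` used in the training example. The following is a hedged sanity check, assuming the standard layout of the official archive: a `wnids.txt` listing 200 class ids, `train/<wnid>/images/`, and `val/val_annotations.txt`. It only inspects files and does not reorganise the validation split. The function name `check_tiny_imagenet` is hypothetical.

```python
from pathlib import Path


def check_tiny_imagenet(data_dir: str = "./data/tiny-imagenet-200") -> list:
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(data_dir)
    wnids_file = root / "wnids.txt"
    if not wnids_file.is_file():
        return [f"missing {wnids_file}"]
    problems = []
    wnids = [line.strip() for line in wnids_file.read_text().splitlines() if line.strip()]
    if len(wnids) != 200:
        problems.append(f"expected 200 class ids in wnids.txt, found {len(wnids)}")
    # Each class should have its own image folder under train/
    missing = [w for w in wnids if not (root / "train" / w / "images").is_dir()]
    if missing:
        problems.append(f"{len(missing)} class folders missing under train/, e.g. {missing[0]}")
    if not (root / "val" / "val_annotations.txt").is_file():
        problems.append("missing val/val_annotations.txt")
    return problems


if __name__ == "__main__":
    issues = check_tiny_imagenet()
    print("OK" if not issues else "\n".join(issues))
```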

## License

Apache 2.0

## Citation

```bibtex
@misc{mobiusnet2026,
  title={MobiusNet: Wave-Based Topological Vision Architecture},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/mobiusnet}
}
```