---
license: apache-2.0
---

# MobiusNet

A vision architecture built on continuous topological principles, replacing traditional activations with wave-based interference gating.

## Overview

MobiusNet introduces a fundamentally different approach to neural network design:

- **MobiusLens**: Wave superposition as a gating mechanism, replacing standard activations (ReLU, GELU)
- **Thirds Mask**: Cantor-inspired fractal channel suppression for regularization
- **Continuous Topology**: Layers sample a continuous manifold via the `t` parameter, not discrete units
- **Twist Rotations**: Smooth rotation through representation space across network depth
- **Integrator**: An optional Conv → BN → GELU head that collapses stage features before classification

## Performance

| Model | Params | GFLOPs | Tiny ImageNet |
|-------|--------|--------|---------------|
| MobiusNet-Base | 33.7M | 2.69 | TBD |

## Installation

```bash
pip install torch torchvision safetensors huggingface_hub tensorboard tqdm
```

## Quick Start

### Training

```python
from mobius_trainer_full import train_tiny_imagenet

model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    lr=3e-4,
    batch_size=128,
    use_integrator=True,
    data_dir='./data/tiny-imagenet-200',
    output_dir='./outputs',
    hf_repo='AbstractPhil/mobiusnet',
    save_every_n_epochs=10,
    upload_every_n_epochs=10,
)
```

### Continue from Checkpoint

```python
# From local directory
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

# From HuggingFace (auto-downloads)
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)
```

### Inference

```python
import torch
from safetensors.torch import load_file
from mobius_trainer_full import MobiusNet, PRESETS

# Load model
config = PRESETS['mobius_base']
model = MobiusNet(num_classes=200, use_integrator=True, **config)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Inference
# image_tensor: preprocessed batch, e.g. shape (N, 3, 64, 64) for Tiny ImageNet
with torch.no_grad():
    logits = model(image_tensor)
    pred = logits.argmax(1)
```

## Model Presets

| Preset | Channels | Depths | ~Params |
|--------|----------|--------|---------|
| `mobius_tiny_s` | (64, 128, 256) | (2, 2, 2) | 500K |
| `mobius_tiny_m` | (64, 128, 256, 512, 768) | (2, 2, 4, 2, 2) | 11M |
| `mobius_tiny_l` | (96, 192, 384, 768) | (3, 3, 3, 3) | 8M |
| `mobius_base` | (128, 256, 512, 768, 1024) | (2, 2, 2, 2, 2) | 33.7M |
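
Each row pairs one channel width with one block depth per stage. A minimal sketch of the shape such a preset entry might take (the key names and `EXAMPLE_PRESETS` dict are assumptions for illustration; the canonical definitions live in `mobius_trainer_full.PRESETS`):

```python
# EXAMPLE_PRESETS is a hypothetical stand-in; the real definitions live in
# mobius_trainer_full.PRESETS and may use different key names.
EXAMPLE_PRESETS = {
    'mobius_base': {
        'channels': (128, 256, 512, 768, 1024),  # per-stage widths
        'depths':   (2, 2, 2, 2, 2),             # MobiusConvBlocks per stage
    },
}

cfg = EXAMPLE_PRESETS['mobius_base']
assert len(cfg['channels']) == len(cfg['depths'])  # one depth per stage
total_blocks = sum(cfg['depths'])  # → 10
```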

## Architecture

```
Input
  │
  ▼
┌─────────────────────────────────┐
│ Stem (Conv → BN)                │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Stage 1-N                       │
│ ┌─────────────────────────────┐ │
│ │ MobiusConvBlock (×depth)    │ │
│ │  ├─ Depthwise-Sep Conv      │ │
│ │  ├─ BatchNorm               │ │
│ │  ├─ MobiusLens (wave gate)  │ │
│ │  ├─ Thirds Mask             │ │
│ │  └─ Learned Residual        │ │
│ └─────────────────────────────┘ │
│ Downsample (stride-2 conv)      │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Integrator (Conv → BN → GELU)   │ ← Task collapse
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Pool → Linear → Classes         │
└─────────────────────────────────┘
```

## Core Components

### MobiusLens

Wave-based gating mechanism with three interference paths:

```python
L = wave(phase_l, drift_l)  # Left path (+1 drift)
M = wave(phase_m, drift_m)  # Middle path (0 drift, ghost)
R = wave(phase_r, drift_r)  # Right path (-1 drift)

# Interference
xor_comp = abs(L + R - 2*L*R)  # Differentiable XOR
and_comp = L * R               # Differentiable AND

# Gating
gate = weighted_sum(L, M, R) * interference_blend
output = input * sigmoid(layernorm(gate))
```

The middle path (M) acts as a "ghost": present but diminished, maintaining gradient continuity while biasing information flow toward the L/R edges (a Cantor-like structure).
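
The gating above can be run end to end as a small NumPy sketch. The fixed phases/drifts, the raised-cosine waveform, and the path weights are assumptions standing in for the learned parameters, and the layernorm is omitted for brevity:

```python
import numpy as np

def wave(x, phase, drift, omega=1.0):
    # Raised-cosine response in [0, 1]; stands in for the learned wave.
    return 0.5 * (1.0 + np.cos(omega * x + phase + drift))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mobius_gate(x, w_l=0.45, w_m=0.10, w_r=0.45, blend=0.5):
    # Three paths: left (+1 drift), middle "ghost" (0 drift), right (-1 drift).
    L = wave(x, phase=0.0, drift=+1.0)
    M = wave(x, phase=0.0, drift=0.0)
    R = wave(x, phase=0.0, drift=-1.0)

    xor_comp = np.abs(L + R - 2.0 * L * R)  # differentiable XOR
    and_comp = L * R                        # differentiable AND
    interference = blend * xor_comp + (1.0 - blend) * and_comp

    # Ghost path gets a small weight: present, but diminished.
    gate = (w_l * L + w_m * M + w_r * R) * interference
    return x * sigmoid(gate)  # multiplicative gating of the input

x = np.linspace(-2.0, 2.0, 5)
y = mobius_gate(x)
# sigmoid(gate) < 1, so the gate only attenuates: |y| <= |x| elementwise.
```

Because the gate is a product of bounded terms passed through a sigmoid, it rescales the input smoothly rather than clipping it the way ReLU does.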

### Thirds Mask

Rotating channel suppression inspired by Cantor set construction:

```
Layer 0: suppress channels [0:C/3]
Layer 1: suppress channels [C/3:2C/3]
Layer 2: suppress channels [2C/3:C]
Layer 3: back to [0:C/3]
```

Forces redundancy and prevents co-adaptation across channel groups.
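
The rotation above can be sketched as a simple mask generator. This is a minimal hard-masking sketch with a hypothetical `thirds_mask` helper; the model's actual suppression may be soft rather than an exact zero:

```python
import numpy as np

def thirds_mask(layer_idx, num_channels):
    """Mask that zeroes a rotating third of the channels, Cantor-style."""
    assert num_channels % 3 == 0  # keep the sketch simple: C divisible by 3
    third = num_channels // 3
    start = (layer_idx % 3) * third
    mask = np.ones(num_channels)
    mask[start:start + third] = 0.0
    return mask

print(thirds_mask(0, 6))  # [0. 0. 1. 1. 1. 1.]
print(thirds_mask(3, 6))  # same as layer 0: the pattern repeats with period 3
```

Multiplying a block's activations by this mask forces each third of the channels to be dispensable once every three layers.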

### Continuous Topology

Each layer samples a continuous manifold:

```python
t = layer_idx / (total_layers - 1)  # 0 → 1

twist_in_angle = t * pi
twist_out_angle = -t * pi
scales = scale_range[0] + t * scale_span
```

Adding layers = finer sampling of the same underlying structure.
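
A runnable sketch of that schedule, with a hypothetical `layer_schedule` helper (the `scale_range` default here is a placeholder, not a trained value):

```python
import math

def layer_schedule(total_layers, scale_range=(0.5, 1.5)):
    """Sample the manifold at evenly spaced t in [0, 1], one point per layer."""
    scale_span = scale_range[1] - scale_range[0]
    schedule = []
    for layer_idx in range(total_layers):
        t = layer_idx / (total_layers - 1)
        schedule.append({
            't': t,
            'twist_in': t * math.pi,    # forward twist
            'twist_out': -t * math.pi,  # mirrored on the way out
            'scale': scale_range[0] + t * scale_span,
        })
    return schedule

# Doubling the resolution refines the sampling without moving the endpoints:
coarse = layer_schedule(3)  # t = 0.0, 0.5, 1.0
fine = layer_schedule(5)    # t = 0.0, 0.25, 0.5, 0.75, 1.0
assert coarse[1]['t'] == fine[2]['t'] == 0.5
```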

## Checkpoints

Saved to: `checkpoints/{variant}_{dataset}/{timestamp}/`

```
├── config.json
├── best_accuracy.json
├── final_accuracy.json
├── checkpoints/
│   ├── checkpoint_epoch_0010.pt
│   ├── checkpoint_epoch_0010.safetensors
│   ├── best_model.pt
│   ├── best_model.safetensors
│   ├── final_model.pt
│   └── final_model.safetensors
└── tensorboard/
```

## TensorBoard

Monitor training:

```bash
tensorboard --logdir ./outputs/checkpoints
```

Tracks:
- Loss, train/val accuracy
- Per-layer lens parameters (omega, alpha, twist angles, L/M/R weights)
- Residual weights
- Weight histograms
|
| | ## Data Setup |
| |
|
| | ### Tiny ImageNet |
| |
|
| | ```bash |
| | wget http://cs231n.stanford.edu/tiny-imagenet-200.zip |
| | unzip tiny-imagenet-200.zip -d ./data/ |
| | ``` |
| |
|

## License

Apache 2.0

## Citation

```bibtex
@misc{mobiusnet2026,
  title={MobiusNet: Wave-Based Topological Vision Architecture},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/mobiusnet}
}
```