| # STANNO: What It Is, What It Isn't |
|
|
| STANNO trains networks by direct weight modification rather than backpropagation. It is built for tasks where that approach pays off: anomaly detection, online learning, and interpretability. It is not a replacement for PyTorch or TensorFlow. |
|
|
| --- |
|
|
| ## STANNO Works Well For |
|
|
| ### 1. Anomaly Detection & Filtering |
|
|
| Train on normal data, then score new inputs by reconstruction error. Works reliably in production. |
|
|
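| ```python |
| # Assumes STANNO and STANNOConfig are importable from the top-level package. |
| from stanno import STANNO, STANNOConfig |
| from stanno.integration.filter import STANNOFilter |
| |
| # Autoencoder setup (example layer sizes): inputs double as targets. |
| stanno = STANNO(STANNOConfig(layers=[768, 512, 768])) |
| stanno.fit(normal_embeddings, normal_embeddings, epochs=50) |
| anomaly_filter = STANNOFilter(stanno)  # avoids shadowing the built-in filter() |
| score = anomaly_filter.score(new_embedding)  # in [0, 1]: 0 = normal, 1 = anomaly |
| ``` |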
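
To make "score by reconstruction error" concrete, here is a minimal sketch in plain Python. It does not use STANNO. The "model" is a stand-in that reconstructs every input as the mean of the normal data, and the mapping of error into [0, 1) via `err / (err + scale)` is an assumption for illustration, not necessarily what `STANNOFilter.score` does internally.

```python
import math

def fit_mean_model(normal):
    """Stand-in 'autoencoder': reconstructs any input as the mean normal vector."""
    dim = len(normal[0])
    mean = [sum(row[j] for row in normal) / len(normal) for j in range(dim)]
    return lambda x: mean

def reconstruction_error(reconstruct, x):
    """Root-mean-square difference between x and its reconstruction."""
    x_hat = reconstruct(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))

def make_scorer(reconstruct, normal):
    """Map error into [0, 1): 0 = reconstructed exactly, near 1 = far from normal."""
    # Hypothetical normalization: scale by the largest error seen on normal data.
    scale = max(reconstruction_error(reconstruct, x) for x in normal) or 1e-12

    def score(x):
        err = reconstruction_error(reconstruct, x)
        return err / (err + scale)

    return score

normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2]]
score = make_scorer(fit_mean_model(normal), normal)

typical_score = score([1.0, 1.0])    # close to the normal data
outlier_score = score([5.0, -3.0])   # far from the normal data
print(typical_score, outlier_score)
```

Inputs that resemble the training data reconstruct well and score low; inputs the model never saw reconstruct poorly and score close to 1.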
|
|
| ### 2. Online / Continual Learning |
|
|
| Update weights one sample at a time with no batch accumulation. Fast and interpretable. |
|
|
| ```python |
| from stanno.integration.continual import ContinualSTANNO |
| |
| # `stanno` is an existing STANNO instance; `stream` yields (input, target) pairs. |
| cont = ContinualSTANNO(stanno) |
| for x_i, y_i in stream: |
|     loss = cont.observe(x_i, y_i)  # single-sample update |
| ``` |
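
The idea of a single-sample update with explicit weight changes can be sketched without STANNO. The example below uses the classic delta (LMS) rule on one linear unit as a stand-in; it is not STANNO's trainer, and the function name `observe`, the learning rate, and the toy stream are all illustrative assumptions.

```python
def observe(w, b, x, y, lr=0.5):
    """One single-sample update of a linear unit using the delta (LMS) rule."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = y - pred
    dw = [lr * err * xi for xi in x]  # explicit weight change for each input
    db = lr * err                     # explicit bias change
    new_w = [wi + dwi for wi, dwi in zip(w, dw)]
    return new_w, b + db, err ** 2

# Toy stream generated by the target function y = 2*x0 - x1 + 0.5.
stream = []
for i in range(2000):
    x = [(i % 7) / 7.0, (i % 5) / 5.0]
    stream.append((x, 2 * x[0] - x[1] + 0.5))

w, b = [0.0, 0.0], 0.0
losses = []
for x_i, y_i in stream:
    # No batch accumulation: weights change after every sample.
    w, b, loss = observe(w, b, x_i, y_i)
    losses.append(loss)

print(w, b, losses[0], losses[-1])
```

Because `dw` and `db` are computed directly, they can be logged or inspected at every step before being applied.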
|
|
| ### 3. Interpretable Weight Modification |
|
|
| See exactly what the trainer does at each synapse: the weight deltas are explicit, not hidden inside autodiff. |
|
|
| ```python |
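| # `trainer` and `state` come from an existing STANNO training setup. |
| dW, db = trainer.compute_updates(state)  # explicit weight and bias changes |
| print(dW)  # the deltas the trainer will apply, not autodiff gradients |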
| ``` |
|
|
| ### 4. Multi-Stage Cascades |
|
|
| Chain multiple STANNOs into encoder-decoder pipelines or progressive compression networks, then train end-to-end with gradient flow across stage boundaries. |
|
|
| ```python |
| from stanno import STANNO, STANNOConfig, CascadeSTANNO  # assumes all three are top-level exports |
| |
| enc = STANNO(STANNOConfig(layers=[768, 256, 64])) |
| dec = STANNO(STANNOConfig(layers=[64, 256, 768])) |
| |
| ae = CascadeSTANNO([enc, dec]) |
| ae.fit(embeddings, embeddings, epochs=200) # trains both end-to-end |
| ``` |
|
|
| --- |
|
|
| ## STANNO Does NOT Work Well For |
|
|
| ### Regression (General Function Fitting) |
|
|
| STANNO is not optimized for regression. If you train on sin(x), you'll get MAE ≈ 0.4–0.5. A standard neural network with Adam easily reaches MAE < 0.01. |
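
For a sense of scale, it helps to compare that figure with a trivial baseline. The sketch below computes the MAE of a model that always predicts 0 for sin(x). It assumes x is sampled uniformly over one full period, which the text above does not specify, so treat the comparison as indicative only.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumption: x sampled uniformly over one full period [0, 2*pi).
n = 10_000
xs = [2 * math.pi * i / n for i in range(n)]
ys = [math.sin(x) for x in xs]

# Trivial baseline: always predict 0.
baseline_mae = mean_absolute_error(ys, [0.0] * n)
print(round(baseline_mae, 3))  # about 0.637; the exact mean of |sin x| is 2/pi
```

Under that assumption, an MAE of 0.4–0.5 sits much closer to the constant baseline than to a good fit.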
|
|
| **Why?** The fixed 4-module trainer applies the same update formula at every step. This works well for the tasks above, but not for learning arbitrary functions. |
|
|
| **Better choice:** Use PyTorch, TensorFlow, or scikit-learn. |
|
|
| ### Replacement for PyTorch/TensorFlow |
|
|
| STANNO intentionally avoids autodiff. If you need GPU acceleration, backpropagation, or access to a model zoo, use a standard framework. |
|
|
| ```python |
| # Bad idea |
| stanno = STANNO(...) # slow NumPy, no GPU |
| |
| # Good idea |
| torch.nn.Sequential(...) # fast, GPU, backprop, pretrained weights |
| ``` |
|
|
| ### Standalone Image Generation |
|
|
| On its own, STANNO is just a small neural network. For image workflows, use the ComfyUI nodes, which integrate with Stable Diffusion and provide the full pipeline. |
|
|
| ```python |
| # Incomplete |
| stanno = STANNO(STANNOConfig(layers=[768, 512, 768])) # just a network |
| |
| # Complete (in ComfyUI) |
| # STANNOLoad → STANNODreamCond → KSampler → STANNOScoreImages |
| ``` |
|
|
| --- |
|
|
| ## Training Divergence (Why It Happens, How We Guard Against It) |
|
|
| Direct weight modification can diverge if training runs too long without safeguards. Each update builds on the previous one, so errors accumulate and the weights can blow up. |
|
|
| **How we prevent it:** |
| - Divergence detection: Stop if loss > 100 |
| - Early stopping: Stop if no improvement for N epochs (default: patience=20) |
| - Default epochs: 300 (enough to converge without risking divergence) |
|
|
| If training stops with a divergence warning, reduce epochs or batch size. |
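
The two stopping rules above can be sketched as a small training-loop wrapper. This is a minimal illustration, not STANNO's actual implementation: `train_with_guards`, the `step` callback, and the returned dictionary are hypothetical names, while the threshold (100), patience (20), and epoch limit (300) are the defaults listed above.

```python
def train_with_guards(step, max_epochs=300, patience=20, divergence_threshold=100.0):
    """Run step(epoch) -> loss until divergence, stagnation, or max_epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        loss = step(epoch)
        # Divergence guard. Written as `not (loss <= t)` so NaN also triggers it.
        if not (loss <= divergence_threshold):
            return {"reason": "diverged", "epoch": epoch, "best_loss": best_loss}
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Early stopping: no improvement for `patience` consecutive epochs.
            if epochs_without_improvement >= patience:
                return {"reason": "early_stop", "epoch": epoch, "best_loss": best_loss}
    return {"reason": "max_epochs", "epoch": max_epochs - 1, "best_loss": best_loss}

# Toy loss curves standing in for a real trainer.
plateau = train_with_guards(lambda e: max(1.0 / (e + 1), 0.1))  # flattens at 0.1
blow_up = train_with_guards(lambda e: 0.5 * 1.5 ** e)           # grows every epoch
print(plateau)
print(blow_up)
```

The first curve stops through patience once the loss flattens; the second trips the divergence guard before patience runs out.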
|
|
| --- |
|
|
| ## Realistic Performance Expectations |
|
|
| | Task | Realistic Performance | Notes | |
| |------|-----------------------|-------| |
| | Anomaly detection | > 90% accuracy | ✓ Achievable, used in production | |
| | Online learning | < 100 steps to converge | ✓ Fast adaptation | |
| | Cascades (end-to-end) | Stable training, gradient flow | ✓ Works well | |
| | Sin regression (MAE) | ≈ 0.4–0.5 | ✗ Not the right tool; use PyTorch | |
| | Image reconstruction | Depends on model size | Fine-tuning with ComfyUI nodes | |
| | General regression | Baseline only | ✗ Not optimized | |
|
|
| --- |
|
|
| ## When to Use STANNO (Decision Tree) |
|
|
| **Do you want to:** |
| - Detect anomalies in a stream? → Use STANNO + filter ✓ |
| - Learn from one sample at a time? → Use ContinualSTANNO ✓ |
| - Train an encoder-decoder pipeline? → Use CascadeSTANNO ✓ |
| - Fit sin(x) accurately? → Use PyTorch ✗ |
| - Fine-tune a large pretrained model? → Use PyTorch ✗ |
| - Generate images from scratch? → Use Stable Diffusion directly ✗ |
| - Compose STANNO with image generation? → Use ComfyUI nodes ✓ |
|
|
| --- |
|
|
| ## FAQ |
|
|
| **Q: Why doesn't STANNO fit sin(x) well?** |
|
|
| A: It's not designed for regression. The fixed 4-module trainer works well for anomaly detection and online learning, but fitting arbitrary functions needs backpropagation or evolutionary search. Use PyTorch for that. |
|
|
| **Q: Will longer training improve accuracy?** |
|
|
| A: No. Training longer tends to diverge rather than improve accuracy. Built-in early stopping (the patience parameter) already halts training once the loss stops improving, so raising the epoch count mostly adds risk of overfitting and divergence. |
|
|
| **Q: Which trainer should I use: Fixed, LocalRule, or Evolutionary?** |
|
|
| A: Start with **Fixed**: it's stable and interpretable. **LocalRule** learns per-synapse rules, which can be powerful but also unstable. **Evolutionary** uses evolutionary strategies and is slower but novel. Experiment to see which suits your problem. |
|
|
| **Q: Is STANNO production-ready?** |
|
|
| A: For anomaly detection and online learning: **yes**. For regression or general-purpose training: **no**. For ComfyUI image workflows: **yes, use the nodes**. |
|
|
| --- |
|
|
| ## Bottom Line |
|
|
| STANNO is specialized for anomaly detection, online learning, cascading, and ComfyUI workflows. It's not a general-purpose neural network and not a replacement for PyTorch or TensorFlow. Use it where the strengths match your problem. |
|
|