	# STANNO: What It Is, What It Isn't

	STANNO trains networks by modifying weights directly rather than by backpropagation. It targets tasks where that approach helps (anomaly detection, online learning, interpretability). It's not a replacement for PyTorch or TensorFlow.

	---

	## STANNO Works Well For

	### 1. Anomaly Detection & Filtering

	Train on normal data, then score new inputs by reconstruction error. Works reliably in production.

	```python
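	from stanno import STANNO, STANNOConfig  # import path assumed, matching CascadeSTANNO below
	from stanno.integration.filter import STANNOFilter

	# Autoencoder-shaped network; layer sizes are illustrative and should match your embedding size
	stanno = STANNO(STANNOConfig(layers=[768, 512, 768]))
	stanno.fit(normal_embeddings, normal_embeddings, epochs=50)  # input == target: learn to reconstruct normal data

	anomaly_filter = STANNOFilter(stanno)  # named to avoid shadowing the built-in filter()
	score = anomaly_filter.score(new_embedding)  # score in [0, 1]: 0 = normal, 1 = anomaly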
	```
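
To make the scoring idea concrete, here is a minimal pure-Python sketch of reconstruction-error scoring. It does not use STANNO's API: `reconstruct` is a hypothetical stand-in for a trained model (it always returns the mean of the normal data), and the `1 - exp(-error / scale)` squashing into [0, 1] is an assumption for illustration, not necessarily how `STANNOFilter.score` normalizes its output.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def anomaly_score(x, reconstruct, scale):
    """Squash reconstruction error into [0, 1]: 0 = normal, 1 = anomaly."""
    err = reconstruction_error(x, reconstruct(x))
    return 1.0 - math.exp(-err / scale)

# Toy "normal" data clustered around (1, 2, 3).
normal = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.0], [0.9, 2.1, 3.1], [1.0, 2.0, 2.9]]
center = mean_vector(normal)

def reconstruct(x):
    """Hypothetical stand-in for a trained model: always returns the center."""
    return center

# Calibrate the scale on normal data so that typical errors map to low scores.
scale = 10 * max(reconstruction_error(x, center) for x in normal)

normal_score = anomaly_score([1.0, 2.0, 3.0], reconstruct, scale)
outlier_score = anomaly_score([5.0, -2.0, 9.0], reconstruct, scale)
```

The point of the sketch is the shape of the pipeline: fit on normal data only, measure how badly a new input is reconstructed, then map that error onto a bounded score.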

	### 2. Online / Continual Learning

	Update weights one sample at a time with no batch accumulation. Fast and interpretable.

	```python
	from stanno.integration.continual import ContinualSTANNO

	cont = ContinualSTANNO(stanno)
	for x_i, y_i in stream:
	    loss = cont.observe(x_i, y_i)  # single-sample update
	```
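
For readers unfamiliar with per-sample updates, the loop above can be illustrated with a self-contained sketch. The `OnlineLinear` class below is hypothetical and uses a plain delta rule (LMS) on a scalar linear model; it only shows what "one sample, one update, no batch" looks like and is not how `ContinualSTANNO` is implemented.

```python
import random

class OnlineLinear:
    """Scalar linear model updated one sample at a time (delta rule / LMS)."""

    def __init__(self, n_inputs, lr=0.1):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def observe(self, x, y):
        """Update from a single (x, y) pair; return the pre-update squared error."""
        err = y - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err
        return err ** 2

rng = random.Random(0)  # seeded so the run is repeatable
model = OnlineLinear(n_inputs=2)
losses = []
for _ in range(500):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = 2.0 * x[0] - 1.0 * x[1] + 0.5  # rule that generates the toy stream
    losses.append(model.observe(x, y))

early = sum(losses[:20]) / 20   # average loss over the first 20 samples
late = sum(losses[-20:]) / 20   # average loss over the last 20 samples
```

Comparing `early` with `late` shows the model adapting as the stream is consumed, without ever storing a batch.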

	### 3. Interpretable Weight Modification

	See exactly what the trainer does at each synapse: the weight deltas are explicit rather than hidden inside autodiff.

	```python
	dW, db = trainer.compute_updates(state) # explicit weight changes
	print(dW) # actual numbers, not gradients
	```
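
The compute-then-inspect pattern can be sketched without the library. The functions below are hypothetical and use a simple delta rule on one linear layer as a stand-in; STANNO's trainer is not described in this document and may compute its deltas differently. What matters is that the deltas are returned as plain values you can examine before they are applied.

```python
def compute_updates(W, b, x, y, lr=0.1):
    """Return (dW, db) for one linear layer under a delta rule, without applying them."""
    pred = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]
    err = [y_i - p_i for y_i, p_i in zip(y, pred)]
    dW = [[lr * e_i * x_j for x_j in x] for e_i in err]  # one delta per synapse
    db = [lr * e_i for e_i in err]                       # one delta per bias
    return dW, db

def apply_updates(W, b, dW, db):
    """Apply previously computed deltas in place."""
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] += dW[i][j]
        b[i] += db[i]

W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
x, y = [1.0, 2.0], [1.0, 0.0]

dW, db = compute_updates(W, b, x, y)
print(dW)  # nested lists of plain floats, inspectable before being applied
apply_updates(W, b, dW, db)
```

Separating `compute_updates` from `apply_updates` is the design choice that makes logging, clipping, or vetoing individual weight changes straightforward.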

	### 4. Multi-Stage Cascades

	Chain multiple STANNOs into encoder-decoder pipelines or progressive compression networks, then train end-to-end with gradient flow across stage boundaries.

	```python
	from stanno import STANNO, STANNOConfig, CascadeSTANNO  # STANNO and STANNOConfig import path assumed

	enc = STANNO(STANNOConfig(layers=[768, 256, 64]))
	dec = STANNO(STANNOConfig(layers=[64, 256, 768]))

	ae = CascadeSTANNO([enc, dec])
	ae.fit(embeddings, embeddings, epochs=200) # trains both end-to-end
	```

	---

	## STANNO Does NOT Work Well For

	### Regression (General Function Fitting)

	STANNO is not optimized for regression. If you train on sin(x), you'll get MAE ≈ 0.4–0.5. A standard neural network with Adam easily reaches MAE < 0.01.

	Why? The fixed 4-module trainer applies the same update formula at every step. This works well for the tasks above, but not for learning arbitrary functions.

	Better choice: Use PyTorch, TensorFlow, or scikit-learn.
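
To put the MAE figure above in context, a trivial baseline helps. The sketch below computes the MAE of a predictor that always outputs 0, assuming inputs spread uniformly over one full period [0, 2π]. The evaluation range behind the 0.4–0.5 figure is not stated here, so treat this as an illustrative comparison rather than a reproduction.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

n = 10_000
# Midpoints of n equal-width bins covering one full period of sin(x).
xs = [2 * math.pi * (i + 0.5) / n for i in range(n)]
ys = [math.sin(x) for x in xs]

zero_baseline = mae(ys, [0.0] * n)
print(round(zero_baseline, 3))  # 0.637, i.e. 2/pi
```

Under that assumption the always-zero predictor scores about 2/π ≈ 0.64, so an MAE of 0.4–0.5 is only a modest improvement over not fitting the function at all.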

	### Replacement for PyTorch/TensorFlow

	STANNO intentionally avoids autodiff. If you need GPU acceleration, backpropagation, or access to a model zoo, use a standard framework.

	```python
	# Bad idea
	stanno = STANNO(...) # slow NumPy, no GPU

	# Good idea
	torch.nn.Sequential(...) # fast, GPU, backprop, pretrained weights
	```

	### Standalone Image Generation

	On its own, STANNO is just a small neural network. For image workflows, use the ComfyUI nodes, which integrate with Stable Diffusion and provide the full pipeline.

	```python
	# Incomplete
	stanno = STANNO(STANNOConfig(layers=[768, 512, 768])) # just a network

	# Complete (in ComfyUI)
	# STANNOLoad → STANNODreamCond → KSampler → STANNOScoreImages
	```

	---

	## Training Divergence (Why It Happens, How We Guard Against It)

	Direct weight modification can diverge if training runs too long without safeguards. The weights keep changing, accumulate errors, and blow up.

	How we prevent it:
	- Divergence detection: Stop if loss > 100
	- Early stopping: Stop if no improvement for N epochs (default: patience=20)
	- Default epochs: 300 (enough to converge without risking divergence)

	If training stops with a divergence warning, reduce epochs or batch size.
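
The guards listed above can be sketched as a small training-loop wrapper. The function name, signature, and return values below are hypothetical; only the default numbers (loss threshold 100, patience 20, 300 epochs) come from this document. `step` stands in for whatever runs one epoch and returns its loss.

```python
def train_with_guards(step, max_epochs=300, patience=20, divergence_threshold=100.0):
    """Call step(epoch) -> loss until a guard fires.

    Returns (reason, epochs_run, best_loss), where reason is
    'diverged', 'early_stop', or 'max_epochs'.
    """
    best_loss = float("inf")
    epochs_since_improvement = 0
    for epoch in range(1, max_epochs + 1):
        loss = step(epoch)
        # Divergence detection: stop on NaN or on loss above the threshold.
        if loss != loss or loss > divergence_threshold:
            return "diverged", epoch, best_loss
        if loss < best_loss:
            best_loss = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            # Early stopping: no improvement for `patience` consecutive epochs.
            if epochs_since_improvement >= patience:
                return "early_stop", epoch, best_loss
    return "max_epochs", max_epochs, best_loss

# A loss that improves for 10 epochs, then plateaus at 0.05.
plateau = train_with_guards(lambda e: max(1.0 - 0.1 * e, 0.05))
# A loss that doubles every epoch and eventually crosses the threshold.
blow_up = train_with_guards(lambda e: 2.0 ** e)
# A loss that keeps improving until the epoch limit is reached.
steady = train_with_guards(lambda e: 1.0 / e)
```

In this sketch the plateau run stops 20 epochs after its last improvement, and the doubling loss is cut off the first time it exceeds 100.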

	---

	## Realistic Performance Expectations

	| Task | Realistic Performance | Notes |
	|------|-----------------------|-------|
	| Anomaly detection | > 90% accuracy | ✓ Achievable, used in production |
	| Online learning | < 100 steps to converge | ✓ Fast adaptation |
	| Cascades (end-to-end) | Stable training, gradient flow | ✓ Works well |
	| Sin regression (MAE) | ≈ 0.4–0.5 | ✗ Not the right tool; use PyTorch |
	| Image reconstruction | Depends on model size | ✓ Fine-tuning with ComfyUI nodes |
	| General regression | Baseline only | ✗ Not optimized |

	---

	## When to Use STANNO (Decision Tree)

	Do you want to:
	- Detect anomalies in a stream? → Use STANNO + filter ✓
	- Learn from one sample at a time? → Use ContinualSTANNO ✓
	- Train an encoder-decoder pipeline? → Use CascadeSTANNO ✓
	- Fit sin(x) accurately? → Use PyTorch ✗
	- Fine-tune a large pretrained model? → Use PyTorch ✗
	- Generate images from scratch? → Use Stable Diffusion directly ✗
	- Compose STANNO with image generation? → Use ComfyUI nodes ✓

	---

	## FAQ

	Q: Why doesn't STANNO fit sin(x) well?

	A: It's not designed for regression. The fixed 4-module trainer works great for anomaly detection and online learning, but arbitrary function fitting needs backpropagation or evolution. Use PyTorch for that.

	Q: Will longer training improve accuracy?

	A: Usually not. Training has built-in early stopping (the patience parameter), so it stops once the loss stops improving. Raising the epoch limit beyond that point mainly adds risk of overfitting and divergence.

	Q: Which trainer should I use: Fixed, LocalRule, or Evolutionary?

	A: Start with Fixed: it's stable and interpretable. LocalRule learns per-synapse rules, which can be powerful but also unstable. Evolutionary uses evolutionary strategies and is slower but novel. Experiment to see which suits your problem.

	Q: Is STANNO production-ready?

	A: For anomaly detection and online learning: yes. For regression or general-purpose training: no. For ComfyUI image workflows: yes, use the nodes.

	---

	## Bottom Line

	STANNO is specialized for anomaly detection, online learning, cascading, and ComfyUI workflows. It's not a general-purpose neural network and not a replacement for PyTorch or TensorFlow. Use it where the strengths match your problem.