DeepDream-MLX / Agents.md
# DeepDream MLX: Agents
## 1. The Mission
To resurrect the 2015 DeepDream aesthetic using modern 2025 Apple Silicon hardware, bypassing the need for archaic frameworks like Caffe or Torch7 by porting everything to native MLX.
## 2. Training & Fine-Tuning Plan (The "Punch-Card" Revival)
In the "classic" days (Intel Caffe era), training a custom DeepDream model meant fine-tuning a GoogLeNet on a dataset of specific objects (e.g., slugs, eyes, cars) so the network would hallucinate *those specific things* when dreaming.
**The Roadmap for MLX Training:**
### Phase 1: Dataset Prep
The `dream-creator` logic (from ProGamerGov) is still sound. We need:
1. **Structure:** `dataset/class_name/*.jpg` (Standard PyTorch ImageFolder format).
2. **Cleaning:** Remove corrupt images, deduplicate.
3. **Resizing:** Resize to ~224x224 or 256x256.
4. **Stats:** Calculate the per-channel mean and standard deviation (needed for input normalization).
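The cleaning/dedup/resize steps above can be sketched with Pillow and the standard library. This is a minimal sketch, not `dream-creator`'s actual tooling: the function name, the in-place resizing, and the SHA-256 exact-duplicate check are our assumptions.

```python
import hashlib
from pathlib import Path

from PIL import Image


def prepare_dataset(root: str, size: int = 224) -> list[Path]:
    """Walk dataset/class_name/*.jpg, drop corrupt and exact-duplicate
    images, and resize the survivors in place to size x size.

    NOTE: illustrative sketch; dream-creator's own scripts differ.
    """
    seen: set[str] = set()
    kept: list[Path] = []
    for path in sorted(Path(root).glob("*/*.jpg")):
        try:
            with Image.open(path) as img:
                img.verify()  # cheap corruption check (no full decode)
        except Exception:
            path.unlink()  # remove unreadable/corrupt file
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:  # exact byte-level duplicate
            path.unlink()
            continue
        seen.add(digest)
        with Image.open(path) as img:  # reopen: verify() invalidates img
            img.convert("RGB").resize((size, size)).save(path)
        kept.append(path)
    return kept
```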
### Phase 2: The Trainer (`train_dream.py`)
We need to write a native MLX training loop.
* **Base Model:** Load `googlenet_mlx.npz`.
* **Architecture:** InceptionV1 (GoogLeNet).
* **Layer Freezing:**
- **Critical:** Freeze early layers (`conv1`, `conv2`, `inception3a/b`) to preserve the "visual vocabulary" (edges, textures).
- **Train:** Retrain only the higher layers (`inception4c`, `inception5b`, `fc`) and the Auxiliary Classifiers.
* **Auxiliary Classifiers:** Inception has two side-branches (`aux1`, `aux2`) used for training stability. We must support training these or stripping them.
* **Loss:** Cross-Entropy.
* **Optimizer:** SGD with Momentum (classic) or Adam.
### Phase 3: "Decorrelation" (The Secret Sauce)
`dream-creator` confirms that "Color Decorrelation" is key.
* **Matrix:** A 3x3 matrix calculated from the training set covariance.
* **Effect:** "Whitens" the input image gradients during dreaming, preventing the image from converging to a mono-color blob.
* **Implementation:** Port `data_tools/calc_cm.py` to MLX.
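One common construction of the decorrelation matrix (the matrix square root of the channel covariance via SVD, as in the lucid/`dream-creator` lineage) can be sketched in NumPy. The function name and the column-norm normalization are our assumptions, not a line-for-line port of `calc_cm.py`:

```python
import numpy as np


def color_decorrelation_matrix(pixels: np.ndarray) -> np.ndarray:
    """pixels: (N, 3) array of RGB samples drawn from the training set.

    Returns a 3x3 matrix C with C @ C.T proportional to the channel
    covariance: it maps decorrelated ("whitened") color space back to RGB.
    """
    cov = np.cov(pixels, rowvar=False)  # 3x3 channel covariance
    U, S, _ = np.linalg.svd(cov)        # symmetric PSD, so cov = U S U^T
    C = U @ np.diag(np.sqrt(S))         # matrix square root of cov
    # scale so the largest column norm is 1 (keeps gradient scale sane)
    return C / np.max(np.linalg.norm(C, axis=0))
```

During dreaming, the optimized image lives in the decorrelated space and is multiplied by `C` before being fed to the network, which is what keeps gradients from collapsing into a mono-color blob.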
## 3. Animation & Video Strategy
The "Zoom" video effect is the second pillar of DeepDream.
* **Logic:** Feedback Loop.
1. Dream on Frame N.
2. Zoom (Scale + Crop center) Frame N to create Frame N+1.
3. Repeat.
* **Implementation:** A dedicated `dream_video.py` script.
* **Tech:** Use `scipy.ndimage.zoom` (same as original 2015 code) for the scaling, as MLX's `resize` might differ slightly in sub-pixel interpolation.
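The feedback loop above is small enough to sketch outright. This is a skeleton under assumptions: `dream_fn` stands in for whatever per-frame dreaming entry point `dream_video.py` ends up exposing, and the function names are ours.

```python
import numpy as np
from scipy.ndimage import zoom


def zoom_frame(frame: np.ndarray, scale: float = 1.05) -> np.ndarray:
    """Scale an (H, W, C) frame up, then crop the center back to the
    original size -- the classic 2015 feedback-loop zoom."""
    h, w = frame.shape[:2]
    # zoom spatial dims only; leave the channel axis untouched
    zoomed = zoom(frame, (scale, scale, 1), order=1)
    zh, zw = zoomed.shape[:2]
    top, left = (zh - h) // 2, (zw - w) // 2
    return zoomed[top:top + h, left:left + w]


def render_frames(frame, dream_fn, n_frames: int = 100, scale: float = 1.05):
    """Feedback loop: dream on frame N, zoom it to seed frame N+1."""
    frames = []
    for _ in range(n_frames):
        frame = dream_fn(frame)  # hallucinate on the current frame
        frames.append(frame)
        frame = zoom_frame(frame, scale)
    return frames
```

`order=1` gives bilinear interpolation; encoding `frames` to video (e.g. via ffmpeg) is left to the script.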
## 4. Available Models & Wishlist
**Current:**
* `alexnet`: The raw, chaotic ancestor.
* `googlenet` (InceptionV1): The classic "slugs and dogs".
* `vgg16/19`: The "painterly" style transfer beast.
* `resnet50`: Modern, sharp, geometric.
**Wishlist (To Convert):**
* `inception_v3`: More refined hallucinations.
* `googlenet_places365`: Hallucinates landscapes/interiors. (The `convert.py --download googlenet` path is verified working; a valid download URL for the Places365 weights still needs to be found.)
## 5. Hugging Face Hygiene
* **Repo:** `NickMystic/DeepDream-MLX`
* **LFS:** Track `*.npz`.
* **Cleanup:** Ensure `toConvert/` is empty of large raw files.
* **Banner:** `assets/deepdream_header.jpg`.
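The LFS rule corresponds to the standard one-line `.gitattributes` entry (standard Git LFS syntax; this is the generic rule, not a copy of the repo's actual file):

```
*.npz filter=lfs diff=lfs merge=lfs -text
```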
---
*Docs derived from deep analysis of `dream-creator` and classic Caffe workflows.*