# DeepDream MLX: Agents

## 1. The Mission

To resurrect the 2015 DeepDream aesthetic using modern 2025 Apple Silicon hardware, bypassing the need for archaic frameworks like Caffe or Torch7 by porting everything to native MLX.
## 2. Training & Fine-Tuning Plan (The "Punch-Card" Revival)

In the "classic" days (the Intel Caffe era), training a custom DeepDream model meant fine-tuning a GoogLeNet on a dataset of specific objects (e.g., slugs, eyes, cars) so the network would hallucinate *those specific things* when dreaming.

**The Roadmap for MLX Training:**
### Phase 1: Dataset Prep

The `dream-creator` logic (from ProGamerGov) is still sound. We need:

1. **Structure:** `dataset/class_name/*.jpg` (standard PyTorch `ImageFolder` format).
2. **Cleaning:** Remove corrupt images and deduplicate.
3. **Resizing:** Resize to ~224x224 or 256x256.
4. **Stats:** Calculate the per-channel mean and standard deviation for normalization.
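As a sketch of the stats step, the per-channel mean/std can be computed from the resized images. This uses NumPy as a stand-in (the real pipeline would operate on MLX arrays), and `channel_stats` is a hypothetical helper name, not an existing script:

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over a list of HxWx3 float images in [0, 1]."""
    # Flatten every image to an (N, 3) pixel matrix and stack them all.
    pixels = np.concatenate([im.reshape(-1, 3) for im in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)
```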
### Phase 2: The Trainer (`train_dream.py`)

We need to write a native MLX training loop.

* **Base Model:** Load `googlenet_mlx.npz`.
* **Architecture:** InceptionV1 (GoogLeNet).
* **Layer Freezing:**
  - **Critical:** Freeze the early layers (`conv1`, `conv2`, `inception3a/b`) to preserve the "visual vocabulary" (edges, textures).
  - **Train:** Retrain only the higher layers (`inception4c`, `inception5b`, `fc`) and the auxiliary classifiers.
* **Auxiliary Classifiers:** InceptionV1 has two side branches (`aux1`, `aux2`) used for training stability. We must support either training these or stripping them.
* **Loss:** Cross-entropy.
* **Optimizer:** SGD with momentum (classic) or Adam.
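A minimal sketch of the freeze-then-update logic, using a NumPy dict as a stand-in for the MLX parameter tree. The `FROZEN_PREFIXES` tuple and `sgd_momentum_step` helper are illustrative, not the real `train_dream.py` API:

```python
import numpy as np

FROZEN_PREFIXES = ("conv1", "conv2", "inception3")  # early "visual vocabulary" layers

def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """SGD-with-momentum update that skips frozen layers entirely."""
    for name, g in grads.items():
        if name.startswith(FROZEN_PREFIXES):
            continue  # frozen: weights stay exactly as loaded from the checkpoint
        velocity[name] = momentum * velocity.get(name, 0.0) - lr * g
        params[name] = params[name] + velocity[name]
    return params, velocity
```

In real MLX code the same effect could be achieved by filtering the parameter tree before handing it to the optimizer, so frozen layers never receive gradient updates.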
### Phase 3: "Decorrelation" (The Secret Sauce)

`dream-creator` confirms that color decorrelation is key.

* **Matrix:** A 3x3 matrix calculated from the covariance of the training set's colors.
* **Effect:** "Whitens" the image gradients during dreaming, preventing the image from converging to a mono-color blob.
* **Implementation:** Port `data_tools/calc_cm.py` to MLX.
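One common construction for such a matrix is the inverse square root of the color covariance (a whitening matrix). The `color_decorrelation_matrix` helper below is an illustrative NumPy stand-in for what a ported `calc_cm.py` would compute, not its actual interface:

```python
import numpy as np

def color_decorrelation_matrix(images):
    """3x3 whitening matrix C^(-1/2) from the dataset's color covariance C."""
    pixels = np.concatenate([im.reshape(-1, 3) for im in images], axis=0)
    cov = np.cov(pixels, rowvar=False)        # 3x3 color covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigendecomposition
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
```

During dreaming, gradients are multiplied by this matrix so updates happen in a decorrelated color space instead of raw RGB.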
## 3. Animation & Video Strategy

The "zoom" video effect is the second pillar of DeepDream.

* **Logic:** A feedback loop.
  1. Dream on frame N.
  2. Zoom (scale up, then center-crop) frame N to create frame N+1.
  3. Repeat.
* **Implementation:** A dedicated `dream_video.py` script.
* **Tech:** Use `scipy.ndimage.zoom` for the scaling (same as the original 2015 code), as an MLX-side resize might differ slightly in sub-pixel interpolation.
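The zoom step of the feedback loop can be sketched as follows. The `zoom_frame` helper and the 1.05 scale are illustrative defaults, not the final `dream_video.py` interface:

```python
import numpy as np
from scipy import ndimage

def zoom_frame(frame, scale=1.05):
    """Scale an HxWx3 frame up, then center-crop back to the original size."""
    h, w = frame.shape[:2]
    # Per-axis zoom factors: scale spatial dims, leave the channel dim alone.
    zoomed = ndimage.zoom(frame, (scale, scale, 1), order=1)
    top = (zoomed.shape[0] - h) // 2
    left = (zoomed.shape[1] - w) // 2
    return zoomed[top:top + h, left:left + w]
```

Running `frame = zoom_frame(dream(frame))` each iteration produces the continuous zoom-in effect.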
## 4. Available Models & Wishlist

**Current:**

* `alexnet`: The raw, chaotic ancestor.
* `googlenet` (InceptionV1): The classic "slugs and dogs".
* `vgg16/19`: The "painterly" style-transfer beast.
* `resnet50`: Modern, sharp, geometric.

**Wishlist (To Convert):**

* `inception_v3`: More refined hallucinations.
* `googlenet_places365`: Hallucinates landscapes/interiors. (The conversion path is verified via `convert.py --download googlenet`; it will work once a valid download URL is fixed/found.)
## 5. Hugging Face Hygiene

* **Repo:** `NickMystic/DeepDream-MLX`
* **LFS:** Track `*.npz`.
* **Cleanup:** Ensure `toConvert/` is empty of large raw files.
* **Banner:** `assets/deepdream_header.jpg`.
---

*Docs derived from deep analysis of `dream-creator` and classic Caffe workflows.*