DeepDream MLX: Agents
1. The Mission
To resurrect the 2015 DeepDream aesthetic using modern 2025 Apple Silicon hardware, bypassing the need for archaic frameworks like Caffe or Torch7 by porting everything to native MLX.
2. Training & Fine-Tuning Plan (The "Punch-Card" Revival)
In the "classic" days (the Caffe era, on Intel hardware), training a custom DeepDream model meant fine-tuning a GoogLeNet on a dataset of specific objects (e.g., slugs, eyes, cars) so that the network would hallucinate those objects when dreaming.
The Roadmap for MLX Training:
Phase 1: Dataset Prep
The dream-creator logic (from ProGamerGov) is still sound. We need:
- Structure: dataset/class_name/*.jpg (standard PyTorch ImageFolder format).
- Cleaning: Remove corrupt images, deduplicate.
- Resizing: Resize to ~224x224 or 256x256.
- Stats: Calculate the per-channel mean and standard deviation of the training set.
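The cleaning step above can be sketched with the standard library alone. This is a minimal sketch, not dream-creator's own tooling: the function name find_duplicates is hypothetical, it only catches zero-byte files and byte-identical copies, and detecting truly corrupt JPEGs would additionally require decoding each file with an image library.

```python
import hashlib
from pathlib import Path


def find_duplicates(root, pattern="*/*.jpg"):
    """Return (empty, duplicates) for files under root/class_name/*.jpg.

    empty: zero-byte files. duplicates: files whose bytes match an
    earlier file (the first occurrence in sorted order is kept).
    """
    seen = {}
    empty, duplicates = [], []
    for path in sorted(Path(root).glob(pattern)):
        data = path.read_bytes()
        if not data:
            empty.append(path)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return empty, duplicates
```

The function only reports candidates and deletes nothing, so the lists can be reviewed before any file is removed.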
Phase 2: The Trainer (train_dream.py)
We need to write a native MLX training loop.
- Base Model: Load googlenet_mlx.npz.
- Architecture: InceptionV1 (GoogLeNet).
- Layer Freezing:
  - Critical: Freeze early layers (conv1, conv2, inception3a/b) to preserve the "visual vocabulary" (edges, textures).
  - Train: Retrain only the higher layers (inception4c, inception5b, fc) and the Auxiliary Classifiers.
- Auxiliary Classifiers: Inception has two side-branches (aux1, aux2) used for training stability. We must support training these or stripping them.
- Loss: Cross-Entropy.
- Optimizer: SGD with Momentum (classic) or Adam.
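The freezing rule above can be sketched as a plain partition over parameter names. This is a sketch under assumptions: the dotted names (for example "conv1.weight") are hypothetical and may not match the keys in googlenet_mlx.npz, the helper split_params is illustrative, and because the plan does not say how the middle layers are treated, everything outside the frozen prefixes is counted as trainable here.

```python
# Prefixes follow the layer names in the plan; the real parameter keys in
# googlenet_mlx.npz are an assumption and may be spelled differently.
FROZEN_PREFIXES = ("conv1", "conv2", "inception3a", "inception3b")


def split_params(names, frozen_prefixes=FROZEN_PREFIXES):
    """Partition dotted parameter names into (frozen, trainable) lists.

    A name is frozen when its top-level module (the text before the first
    dot) is one of frozen_prefixes; all other names are trainable.
    """
    frozen, trainable = [], []
    for name in names:
        top = name.split(".", 1)[0]
        if top in frozen_prefixes:
            frozen.append(name)
        else:
            trainable.append(name)
    return frozen, trainable
```

In an MLX training loop the same decision would probably be applied through the freeze and unfreeze methods of mlx.nn.Module rather than by name lists, but the partition is the part worth getting right first.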
Phase 3: "Decorrelation" (The Secret Sauce)
dream-creator confirms that "Color Decorrelation" is key.
- Matrix: A 3x3 matrix calculated from the training set covariance.
- Effect: "Whitens" the input image gradients during dreaming, preventing the image from converging to a mono-color blob.
- Implementation: Port data_tools/calc_cm.py to MLX.
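For orientation, one common way to build such a 3x3 matrix is sketched below in NumPy. It is not a port of calc_cm.py: the function name, the eps value, and the choice of an SVD-based square root of the RGB covariance are assumptions borrowed from typical feature-visualization code, and the real script may normalize differently.

```python
import numpy as np


def color_correlation_matrix(images, eps=1e-10):
    """Return (mean, svd_sqrt) for an iterable of HxWx3 float arrays.

    mean is the per-channel RGB mean; svd_sqrt is a 3x3 matrix M such
    that M @ M.T approximates the RGB covariance of all pixels.
    """
    pixels = np.concatenate(
        [np.asarray(im, dtype=np.float64).reshape(-1, 3) for im in images]
    )
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = centered.T @ centered / len(pixels)  # 3x3 RGB covariance
    u, s, _ = np.linalg.svd(cov)
    svd_sqrt = u @ np.diag(np.sqrt(s + eps))
    return mean, svd_sqrt
```

Multiplying decorrelated color values by this matrix maps them back into the dataset's correlated RGB space, and its inverse performs the whitening described above.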
3. Animation & Video Strategy
The "Zoom" video effect is the second pillar of DeepDream.
- Logic: Feedback Loop.
  - Dream on Frame N.
  - Zoom (Scale + Crop center) Frame N to create Frame N+1.
  - Repeat.
- Implementation: A dedicated dream_video.py script.
- Tech: Use scipy.ndimage.zoom (same as original 2015 code) for the scaling, as MLX's resize might differ slightly in sub-pixel interpolation.
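The feedback loop above can be sketched as follows. This is a minimal sketch, not the planned dream_video.py: zoom_center, dream_zoom_frames, and the dream_fn callback are hypothetical names, the 1.05 scale factor is an arbitrary example, and the code assumes HxWxC frames with scale >= 1.

```python
import numpy as np
from scipy import ndimage


def zoom_center(frame, scale=1.05, order=1):
    """Scale an HxWxC frame by `scale` (>= 1) and center-crop back to HxW."""
    h, w = frame.shape[:2]
    # Zoom only the two spatial axes; leave the channel axis untouched.
    zoomed = ndimage.zoom(frame, (scale, scale, 1), order=order)
    zh, zw = zoomed.shape[:2]
    top, left = (zh - h) // 2, (zw - w) // 2
    return zoomed[top:top + h, left:left + w]


def dream_zoom_frames(first_frame, dream_fn, n_frames, scale=1.05):
    """Yield n_frames dreamed frames, zooming each one to seed the next."""
    frame = first_frame
    for _ in range(n_frames):
        frame = dream_fn(frame)            # dream on frame N
        yield frame
        frame = zoom_center(frame, scale)  # becomes the input for frame N+1
```

Passing the dreaming step in as dream_fn keeps the zoom logic independent of the model, so the loop can be exercised with a trivial function before the MLX dreamer is wired in.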
4. Available Models & Wishlist
Current:
- alexnet: The raw, chaotic ancestor.
- googlenet (InceptionV1): The classic "slugs and dogs".
- vgg16/19: The "painterly" style transfer beast.
- resnet50: Modern, sharp, geometric.
Wishlist (To Convert):
- inception_v3: More refined hallucinations.
- googlenet_places365: Hallucinates landscapes/interiors. (Verified working via convert.py --download googlenet when URL is fixed/found).
5. Hugging Face Hygiene
- Repo: NickMystic/DeepDream-MLX
- LFS: Track *.npz.
- Cleanup: Ensure toConvert/ is empty of large raw files.
- Banner: assets/deepdream_header.jpg.
Docs derived from deep analysis of dream-creator and classic Caffe workflows.