Upload folder using huggingface_hub

- .gitignore +14 -2
- Agents.md +62 -0
- README.md +60 -142
- alexnet_places365.pth_mlx.npz +3 -0
- alexnet_places365_mlx.npz +3 -0
- benchmark.py +180 -0
- comparisons/torch_dream.py +144 -0
- convert.py +217 -0
- dream.py +15 -51
- dream_video.py +130 -0
- mlx_alexnet.py +88 -0
- resnet50_places365.pth_mlx.npz +3 -0
- resnet50_places365_mlx.npz +3 -0
- resnet50_places365_t7_mlx.npz +3 -0
- toConvert/.gitkeep +0 -0
.gitignore
CHANGED

```diff
@@ -2,13 +2,25 @@ venv/
 __pycache__/
 *.DS_Store
 pics/
-
+borrowFrom/
+benchmark_results/
+
+# Large Model Files (Source)
+*.pth
+*.tar
+*.t7
+*.caffemodel
+*.ckpt
+
+# Ignore contents of toConvert but keep the folder
+toConvert/*
+!toConvert/.gitkeep

 # Ignore images generally
 *.jpg
 *.png
 *.gif

-# Un-ignore specific
+# Un-ignore specific assets
 !assets/
 !input/
```
Agents.md
ADDED

@@ -0,0 +1,62 @@

# DeepDream MLX: Agents

## 1. The Mission
To resurrect the 2015 DeepDream aesthetic on modern 2025 Apple Silicon hardware, bypassing archaic frameworks like Caffe or Torch7 by porting everything to native MLX.

## 2. Training & Fine-Tuning Plan (The "Punch-Card" Revival)
In the "classic" days (the Intel Caffe era), training a custom DeepDream model meant fine-tuning a GoogLeNet on a dataset of specific objects (e.g., slugs, eyes, cars) so the network would hallucinate *those specific things* when dreaming.

**The Roadmap for MLX Training:**

### Phase 1: Dataset Prep
The `dream-creator` logic (from ProGamerGov) is still sound. We need:
1. **Structure:** `dataset/class_name/*.jpg` (standard PyTorch ImageFolder format).
2. **Cleaning:** Remove corrupt images, deduplicate.
3. **Resizing:** Resize to ~224x224 or 256x256.
4. **Stats:** Calculate the per-channel mean and standard deviation.
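The stats step above can be sketched as a streaming pass over the dataset. This is an illustrative helper, not code from the repo; the function name `dataset_stats` and the `[0, 1]` value range are my assumptions.

```python
import numpy as np
from pathlib import Path
from PIL import Image

def dataset_stats(root):
    """Per-channel mean/std over dataset/class_name/*.jpg, values in [0, 1].

    Accumulates sums and sums of squares so arbitrarily large datasets
    fit in constant memory (illustrative sketch, not from the repo).
    """
    total = np.zeros(3)
    total_sq = np.zeros(3)
    n_pixels = 0
    for path in Path(root).glob("*/*.jpg"):
        img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64) / 255.0
        total += img.sum(axis=(0, 1))
        total_sq += (img ** 2).sum(axis=(0, 1))
        n_pixels += img.shape[0] * img.shape[1]
    mean = total / n_pixels
    std = np.sqrt(total_sq / n_pixels - mean ** 2)
    return mean, std
```

The resulting `(mean, std)` pair would then be baked into the trainer's normalization, matching whatever `dream.py` uses at dream time.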
### Phase 2: The Trainer (`train_dream.py`)
We need to write a native MLX training loop.
* **Base Model:** Load `googlenet_mlx.npz`.
* **Architecture:** InceptionV1 (GoogLeNet).
* **Layer Freezing:**
  - **Critical:** Freeze early layers (`conv1`, `conv2`, `inception3a/b`) to preserve the "visual vocabulary" (edges, textures).
  - **Train:** Retrain only the higher layers (`inception4c`, `inception5b`, `fc`) and the auxiliary classifiers.
* **Auxiliary Classifiers:** Inception has two side branches (`aux1`, `aux2`) used for training stability. We must support either training them or stripping them.
* **Loss:** Cross-entropy.
* **Optimizer:** SGD with momentum (classic) or Adam.
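Freezing plus classic SGD-with-momentum reduces to: keep a velocity term per parameter and simply skip the update for frozen names. A framework-neutral numpy sketch (the real trainer would presumably lean on MLX's own optimizer and module-freezing facilities; `sgd_momentum_step` and the parameter names here are hypothetical):

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, frozen, lr=0.01, momentum=0.9):
    """One SGD-with-momentum step; parameters named in `frozen` stay untouched."""
    for name, g in grads.items():
        if name in frozen:
            continue  # e.g. conv1/conv2/inception3a keep their visual vocabulary
        velocity[name] = momentum * velocity.get(name, 0.0) - lr * g
        params[name] = params[name] + velocity[name]
    return params, velocity

# Toy example: only "fc" is trainable
params = {"conv1": np.ones(2), "fc": np.zeros(2)}
grads = {"conv1": np.ones(2), "fc": np.ones(2)}
params, vel = sgd_momentum_step(params, grads, {}, frozen={"conv1"}, lr=0.1)
```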
### Phase 3: "Decorrelation" (The Secret Sauce)
`dream-creator` confirms that color decorrelation is key.
* **Matrix:** A 3x3 matrix computed from the training-set color covariance.
* **Effect:** "Whitens" the input-image gradients during dreaming, preventing the image from converging to a mono-color blob.
* **Implementation:** Port `data_tools/calc_cm.py` to MLX.
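A sketch of what that 3x3 matrix computation could look like. This is my reading of the general technique (covariance, then a Cholesky factor mapping decorrelated color space back to RGB), not a port of `calc_cm.py`; the normalization choice is an assumption.

```python
import numpy as np

def color_decorrelation_matrix(pixels):
    """pixels: (N, 3) float array of RGB samples in [0, 1].

    Returns a 3x3 lower-triangular matrix mapping decorrelated
    color space -> RGB (illustrative, not the calc_cm.py algorithm).
    """
    cov = np.cov(pixels, rowvar=False)   # 3x3 color covariance
    chol = np.linalg.cholesky(cov)       # lower-triangular factor
    # Normalize so applying the transform doesn't rescale gradients wildly
    return chol / np.linalg.norm(chol, axis=0).max()
```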
## 3. Animation & Video Strategy
The "zoom" video effect is the second pillar of DeepDream.
* **Logic:** A feedback loop.
  1. Dream on frame N.
  2. Zoom (scale + center-crop) frame N to create frame N+1.
  3. Repeat.
* **Implementation:** A dedicated `dream_video.py` script.
* **Tech:** Use `scipy.ndimage.zoom` (same as the original 2015 code) for the scaling, as MLX's `resize` may differ slightly in sub-pixel interpolation.
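The feedback loop above, sketched with `scipy.ndimage.zoom`. The `dream` callable is a placeholder for whatever optimizer `dream.py` actually exposes; function names here are hypothetical.

```python
import numpy as np
import scipy.ndimage as nd

def zoom_frame(frame, scale=1.05):
    """Scale an HWC frame up, then center-crop back to the original size."""
    h, w = frame.shape[:2]
    zoomed = nd.zoom(frame, (scale, scale, 1), order=1)  # bilinear, channels untouched
    zh, zw = zoomed.shape[:2]
    top, left = (zh - h) // 2, (zw - w) // 2
    return zoomed[top:top + h, left:left + w]

def dream_video(first_frame, dream, n_frames=30, scale=1.05):
    """Feedback loop: dream on frame N, zoom it to seed frame N+1."""
    frames = [first_frame]
    for _ in range(n_frames - 1):
        dreamed = dream(frames[-1])  # placeholder for dream.py's optimizer
        frames.append(zoom_frame(dreamed, scale))
    return frames
```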
## 4. Available Models & Wishlist
**Current:**
* `alexnet`: The raw, chaotic ancestor.
* `googlenet` (InceptionV1): The classic "slugs and dogs".
* `vgg16/19`: The "painterly" style-transfer beast.
* `resnet50`: Modern, sharp, geometric.

**Wishlist (To Convert):**
* `inception_v3`: More refined hallucinations.
* `googlenet_places365`: Hallucinates landscapes/interiors. (Verified working via `convert.py --download googlenet` once the URL is fixed/found.)

## 5. Hugging Face Hygiene
* **Repo:** `NickMystic/DeepDream-MLX`
* **LFS:** Track `*.npz`.
* **Cleanup:** Ensure `toConvert/` is empty of large raw files.
* **Banner:** `assets/deepdream_header.jpg`.

---
*Docs derived from deep analysis of `dream-creator` and classic Caffe workflows.*
README.md
CHANGED

@@ -12,184 +12,102 @@ tags:

- deepdream
pipeline_tag: image-to-image
---
# DeepDream-MLX

<img src="assets/deepdream_header.jpg" alt="DeepDream Header" width="100%"/>

**Status:** Fast. Native.
**Vibe:** 2015 Hallucinations // 2025 Silicon.

```bash
# 1. Install
pip install

# 2. Dream (VGG16
python dream.py --input

# 3.
python dream.py --input
```

## 🔮 The

```text
[Model-family diagram, mostly lost in the diff view: VGG16 "The Painter"
(Philosophy: "Deeper"), VGG19 "The Stylist", Inception V1 "The Hallucinator"
(Philosophy: "Wider"), ResNet 50 "The Modernist" (Philosophy: "Identity"),
mapping to vgg16_mlx.npz, vgg19_mlx.npz, googlenet_mlx.npz, resnet50_mlx.npz]
```

## 🧠 The Models

* **VGG16:** General purpose image features.
* **GoogLeNet (InceptionV1):** The classic DeepDream model.
* **VGG19:** Deeper VGG features.
* **ResNet50:** Modern deep features.

## 🧪 Recipes

### 1. Classic Inception Patterns (GoogLeNet)
*This setup targets various Inception layers for recognizable DeepDream shapes.*

```bash
python dream.py --input \
  --model googlenet \
  --steps 22 \
  --lr 0.061 \
  --octaves 4 \
  --scale 1.8 \
  --jitter 26 \
  --smoothing 0.08 \
  --layers inception3a inception4e inception5b
```

### 2.

```bash
python dream.py --input \
  --model vgg16 \
  --steps 24 \
  --lr 0.07 \
  --octaves 4 \
  --scale 1.8 \
  --jitter 36 \
  --smoothing 0.19 \
  --layers relu4_2
```

### 3.

```bash
python dream.py --input \
  --model vgg19 \
  --steps 14 \
  --lr 0.045 \
  --octaves 2 \
  --scale 1.5 \
  --jitter 27 \
  --smoothing 0.41 \
  --layers relu5_2
```

```bash
  --steps 24 \
  --lr 0.069 \
  --octaves 4 \
  --scale 1.8 \
  --jitter 10 \
  --smoothing 0.41 \
  --layers relu5_1
```

```bash
python \
  --model resnet50 \
  --steps 22 \
  --lr 0.13 \
  --octaves 4 \
  --scale 2 \
  --jitter 83 \
  --smoothing 0.47 \
  --layers layer3_2 layer3_5
```

##

We didn't just wrap existing libs. We wrote a custom exporter (`export_models.py`) to rip weights from standard PyTorch/Torchvision archives and serialize them into optimized MLX `.npz` arrays.

### 50% Smaller Weights (FP16)
We now support **Float16** (half-precision) weights by default. This cuts model size in half with zero visual loss for DeepDreaming.
* **VGG16:** 528MB → **264MB**
* **ResNet50:** 98MB → **49MB**

`dream.py` automatically detects and loads `_bf16.npz` files if present.

## 🔎 Where to find models?

You can convert *any* standard PyTorch model to run here.
1. **Torchvision:** The source of our VGG/GoogLeNet/ResNet weights.
2. **Hugging Face Hub:** Massive repo of pretrained models.
3. **Caffe Model Zoo (Historical):** If you have `.caffemodel` files, load them into PyTorch (using tools like `load_caffe`) and then export.

## 🎓 Training & Fine-Tuning (TODO)

Want your DeepDream to see things *differently*? (e.g., dogs instead of slugs?)
You need to fine-tune the base model on a new dataset.

4. Dream.

*

---
*
- deepdream
pipeline_tag: image-to-image
---

# DeepDream-MLX

<img src="assets/deepdream_header.jpg" alt="DeepDream Header" width="100%"/>

**Status:** Fast. Native.
**Vibe:** 2015 Hallucinations // 2025 Silicon.

DeepDream-MLX brings the classic psychedelic computer-vision algorithm to modern Apple Silicon, running natively on the GPU via the [MLX](https://github.com/ml-explore/mlx) framework. No Caffe, no slow conversion layers—just pure tensor operations.

## ⚡️ Quick Start

```bash
# 1. Install
pip install -r requirements.txt

# 2. Dream (Default VGG16)
python dream.py --input assets/demo_googlenet.jpg

# 3. Explore Models
python dream.py --input assets/demo_googlenet.jpg --model googlenet --layers inception4c
```

## 🔮 The Evolution of Vision

We support the classic ancestors of modern Computer Vision.

```text
TIMELINE   MODEL                     PARAMS   PHILOSOPHY
─────────────────────────────────────────────────────────────
1998       LeNet-5                   60K      "Digits."
2012       AlexNet (Available)       60M      "Deep."
2014       VGG16                     138M     "Deeper."
2014       GoogLeNet (Inception)     7M       "Wide & Efficient."
2015       ResNet50 (Modern Standard) 25M     "Identity & Residuals."
```

## 🧪 Recipes

### 1. The Classic (GoogLeNet)
The original DeepDream look. Eyes, slugs, and pagodas.
```bash
python dream.py --input img.jpg --model googlenet --layers inception4c --octaves 4 --scale 1.4
```

### 2. The Painter (VGG16)
Dense, rich textures. Great for artistic style-transfer-like effects.
```bash
python dream.py --input img.jpg --model vgg16 --layers relu4_3 --steps 20
```

### 3. The Modernist (ResNet50)
Sharp, geometric, and sometimes abstract architectural hallucinations.
```bash
python dream.py --input img.jpg --model resnet50 --layers layer4_2
```

## 🛠 Advanced Usage

### Converting Models
We include a universal converter that ingests standard PyTorch (`.pth`) and legacy Torch7 (`.t7`) models, optimizing them into MLX format (`float16` by default).

```bash
# Convert a local file
python convert.py --scan path/to/models

# Download & Convert Places365 (AlexNet, ResNet, etc.)
python convert.py --download all
```

### Benchmarking
Verify performance on your machine.
```bash
python benchmark.py
```

## ⚖️ Performance (M2 Max)

| Framework | Model | Precision | Speed |
| :--- | :--- | :--- | :--- |
| **MLX** | GoogLeNet | **float16** | **~3.6s** |
| PyTorch (MPS) | GoogLeNet | float32 | ~4.5s |

*Benchmarks run at 400px width, 10 iterations.*

---
*Built for the dreamers.*
alexnet_places365.pth_mlx.npz
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:587f2f379063fb722563b86d9e7fea2321119b571c6bff7e09e309abf6dbf0b4
size 117002764

alexnet_places365_mlx.npz
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:587f2f379063fb722563b86d9e7fea2321119b571c6bff7e09e309abf6dbf0b4
size 117002764
benchmark.py
ADDED

@@ -0,0 +1,180 @@

```python
#!/usr/bin/env python3
import os
import subprocess
import time
from datetime import datetime

# Benchmark Configuration
MODELS = ["googlenet", "vgg16", "resnet50"]  # vgg19 is usually close to vgg16; skipped for speed
PRECISIONS = ["int8", "bf16", "float32"]
INPUT_IMAGE = "assets/demo_googlenet.jpg"  # standard asset if available, else fallback
OUTPUT_DIR = "benchmark_results"

def ensure_asset():
    """Ensures a test image exists, falling back to any jpg in assets/."""
    if not os.path.exists(INPUT_IMAGE):
        candidates = [f for f in os.listdir("assets") if f.endswith(".jpg")]
        if candidates:
            return os.path.join("assets", candidates[0])
        raise FileNotFoundError("No test image found in assets/")
    return INPUT_IMAGE

def get_weight_file(model, precision):
    """Maps model+precision to the expected weight filename."""
    suffix = {
        "int8": "_mlx_int8.npz",
        "bf16": "_mlx_bf16.npz",
        "float32": "_mlx.npz",
    }[precision]
    return f"{model}{suffix}"

def run_benchmark():
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    test_img = ensure_asset()
    results = []

    print(f"Starting Benchmark on {test_img}...")
    print(f"{'Model':<15} {'Precision':<10} {'Time (s)':<10} {'Status':<10}")
    print("-" * 50)

    for model in MODELS:
        for prec in PRECISIONS:
            weight_file = get_weight_file(model, prec)

            if not os.path.exists(weight_file):
                print(f"{model:<15} {prec:<10} {'---':<10} Missing Weights")
                continue

            # Run dream.py with fixed settings: 10 steps at 400px width
            # gives realistic timing without taking too long, and dream.py
            # is deterministic given the same arguments.
            out_path = os.path.join(OUTPUT_DIR, f"bench_{model}_{prec}.jpg")

            cmd = [
                "python", "dream.py",
                "--input", test_img,
                "--output", out_path,
                "--model", model,
                "--weights", weight_file,
                "--steps", "10",
                "--width", "400",
            ]

            start_t = time.time()
            try:
                # Capture output to avoid clutter
                subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
                duration = time.time() - start_t
                print(f"{model:<15} {prec:<10} {duration:<10.2f} OK")
                results.append({
                    "model": model,
                    "precision": prec,
                    "time": duration,
                    "image": out_path,
                })
            except subprocess.CalledProcessError:
                print(f"{model:<15} {prec:<10} {'Error':<10} Failed")

    generate_report(results)
    create_composite_image(results)

def generate_report(results):
    report_path = os.path.join(OUTPUT_DIR, "BENCHMARK_REPORT.md")
    with open(report_path, "w") as f:
        f.write("# DeepDream MLX Benchmark Report\n\n")
        f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
        f.write("| Model | Precision | Time (s) | Result |\n")
        f.write("|-------|-----------|----------|--------|\n")
        for r in results:
            rel_img = os.path.basename(r['image'])
            f.write(f"| {r['model']} | {r['precision']} | {r['time']:.2f} | <img src='{rel_img}' width='100'/> |\n")
    print(f"\nReport generated at {report_path}")

def create_composite_image(results):
    try:
        from PIL import Image, ImageDraw, ImageFont
    except ImportError:
        print("PIL not installed, skipping composite image.")
        return

    # Organize data: matrix[model][precision] = image_path
    matrix = {}
    all_models = sorted(set(r['model'] for r in results))
    all_precs = sorted(set(r['precision'] for r in results))

    for r in results:
        matrix.setdefault(r['model'], {})[r['precision']] = r['image']

    if not matrix:
        return

    # Assume all images are roughly the same size; read the first one
    sample_img = Image.open(results[0]['image'])
    w, h = sample_img.size

    # Layout: header row (precisions), left column (models)
    padding = 50
    header_height = 60
    label_width = 120

    grid_w = label_width + len(all_precs) * (w + padding)
    grid_h = header_height + len(all_models) * (h + padding)

    composite = Image.new("RGB", (grid_w, grid_h), (255, 255, 255))
    draw = ImageDraw.Draw(composite)

    # Try to load a font, else default
    try:
        font = ImageFont.truetype("Arial", 24)
    except IOError:
        font = ImageFont.load_default()

    # Draw header (precision labels)
    for i, prec in enumerate(all_precs):
        x = label_width + i * (w + padding)
        draw.text((x + w // 2 - 20, 20), prec, fill=(0, 0, 0), font=font)

    # Draw rows (model label, then one cell per precision)
    for j, model in enumerate(all_models):
        y = header_height + j * (h + padding)
        draw.text((10, y + h // 2), model, fill=(0, 0, 0), font=font)

        for i, prec in enumerate(all_precs):
            x = label_width + i * (w + padding)
            if prec in matrix[model]:
                img_path = matrix[model][prec]
                if os.path.exists(img_path):
                    img = Image.open(img_path)
                    if img.size != (w, h):
                        img = img.resize((w, h))
                    composite.paste(img, (x, y))

                # Draw the timing under each cell
                time_val = next(r['time'] for r in results if r['model'] == model and r['precision'] == prec)
                draw.text((x + 5, y + h + 5), f"{time_val:.2f}s", fill=(0, 0, 0), font=font)

    comp_path = os.path.join(OUTPUT_DIR, "benchmark_composite.jpg")
    composite.save(comp_path)
    print(f"Composite benchmark image saved to {comp_path}")

if __name__ == "__main__":
    run_benchmark()
```
comparisons/torch_dream.py
ADDED

@@ -0,0 +1,144 @@

```python
#!/usr/bin/env python3
import argparse
import time
import numpy as np
import torch
from PIL import Image
from torchvision import models

# MPS support for some ops can be patchy; fall back to CPU when unavailable.
DEVICE = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).to(DEVICE).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).to(DEVICE).view(1, 3, 1, 1)

def preprocess(img_np):
    # HWC -> CHW, add batch dim, normalize
    x = torch.from_numpy(img_np).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    x = x.to(DEVICE)
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x

def deprocess(x):
    x = x * IMAGENET_STD + IMAGENET_MEAN
    x = torch.clamp(x, 0, 1)
    x = x.squeeze(0).permute(1, 2, 0).cpu().detach().numpy()
    return (x * 255).astype(np.uint8)

def get_model(name):
    if name == "googlenet":
        model = models.googlenet(weights='DEFAULT')
    elif name == "vgg16":
        model = models.vgg16(weights='DEFAULT')
    elif name == "resnet50":
        model = models.resnet50(weights='DEFAULT')
    else:
        raise ValueError(name)

    model.to(DEVICE)
    model.eval()
    for param in model.parameters():
        param.requires_grad = False
    return model

class Hook:
    """Captures the forward activation of a module."""
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
        self.activation = None

    def hook_fn(self, module, input, output):
        self.activation = output

    def close(self):
        self.hook.remove()

def deepdream(args):
    img = Image.open(args.input).convert('RGB')
    if args.width:
        w, h = img.size
        scale = args.width / w
        img = img.resize((args.width, int(h * scale)), Image.LANCZOS)

    img_np = np.array(img)
    model = get_model(args.model)

    # Simplified layer selection for the benchmark: one hardcoded,
    # representative module per architecture (torchvision layer names
    # don't map cleanly onto the MLX naming).
    if args.model == "googlenet":
        target_modules = [model.inception4c]
    elif args.model == "vgg16":
        target_modules = [model.features[20]]  # roughly relu4_2
    elif args.model == "resnet50":
        target_modules = [model.layer4]

    hooks = [Hook(m) for m in target_modules]
    input_tensor = preprocess(img_np).requires_grad_(True)

    print(f"Running Torch ({DEVICE}) Dream on {args.model}...")
    start_t = time.time()

    # Single-scale run only: octave handling is hard to replicate
    # pixel-perfect against the MLX version due to resize differences,
    # and multi-scale mostly adds CPU-bound resize overhead. This
    # benchmarks pure iteration speed.
    optimizer = torch.optim.SGD([input_tensor], lr=args.lr)

    for i in range(args.steps):
        optimizer.zero_grad()
        model(input_tensor)

        # DeepDream *maximizes* activation energy; SGD minimizes,
        # so negate the loss to perform gradient ascent.
        loss = 0
        for h in hooks:
            loss = loss - h.activation.pow(2).mean()

        loss.backward()

        # Normalize the gradient by its std (standard DeepDream trick)
        g = input_tensor.grad
        g /= (torch.std(g) + 1e-8)
        input_tensor.grad = g

        optimizer.step()

    # Make sure queued GPU work finishes before stopping the clock
    if DEVICE.type == "mps":
        torch.mps.synchronize()

    duration = time.time() - start_t
    print(f"Time: {duration:.4f}s")

    out = deprocess(input_tensor)
    Image.fromarray(out).save(args.output)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", default="torch_out.jpg")
    parser.add_argument("--model", default="googlenet")
    parser.add_argument("--steps", type=int, default=10)
    parser.add_argument("--lr", type=float, default=0.05)
    parser.add_argument("--width", type=int, default=400)
    args = parser.parse_args()

    deepdream(args)
```
convert.py
ADDED
|
@@ -0,0 +1,217 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#!/usr/bin/env python3
"""
Universal Model Converter for DeepDream-MLX.
Converts PyTorch (.pth) and Torch7 (.t7) models to MLX (.npz).
Also supports auto-downloading standard Places365 models.
Defaults to float16 for optimal performance on Apple Silicon.
"""

import os
import argparse
import glob
import numpy as np
import torch
import torchvision.models as models
from torch.hub import download_url_to_file

# Optional torchfile for .t7 support
try:
    import torchfile
except ImportError:
    torchfile = None

# --- Configuration ---
PLACES365_URLS = {
    "alexnet": "http://places2.csail.mit.edu/models_places365/alexnet_places365.pth.tar",
    "resnet50": "http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar",
    "vgg16": "http://places2.csail.mit.edu/models_places365/vgg16_places365.pth.tar",
    "googlenet": "http://places2.csail.mit.edu/models_places365/googlenet_places365.pth.tar",
}

# --- Helper Functions ---

def convert_tensor(tensor, target_dtype=np.float16):
    """Converts a tensor/array to the target numpy dtype."""
    if isinstance(tensor, torch.Tensor):
        return tensor.cpu().detach().numpy().astype(target_dtype)
    elif isinstance(tensor, np.ndarray):
        return tensor.astype(target_dtype)
    else:
        return np.array(tensor).astype(target_dtype)

def clean_state_dict(state_dict):
    """
    Flattens the state dictionary and removes common prefix artifacts
    like 'module.' from DataParallel wrapping.
    """
    new_dict = {}
    for k, v in state_dict.items():
        # Remove 'module.' anywhere in the key
        name = k.replace("module.", "")
        new_dict[name] = convert_tensor(v)
    return new_dict

def get_places365_model_skeleton(arch):
    """Returns a standard PyTorch model structure for Places365."""
    if arch == "alexnet":
        return models.alexnet(num_classes=365)
    elif arch == "resnet50":
        return models.resnet50(num_classes=365)
    elif arch == "vgg16":
        return models.vgg16(num_classes=365)
    elif arch == "googlenet":
        return models.googlenet(num_classes=365, aux_logits=False)
    else:
        raise ValueError(f"Unknown architecture: {arch}")

# --- Conversion Logic ---

def convert_torch7(filepath, target_dir):
    if torchfile is None:
        print(f"⚠️ Skipping {filepath}: 'torchfile' not installed. Run `pip install torchfile`.")
        return

    print(f"Processing Torch7 file: {filepath}")
    try:
        model_obj = torchfile.load(filepath)
        converted_state = {}

        def extract_layers(layer, prefix=""):
            if hasattr(layer, 'weight') and layer.weight is not None:
                converted_state[f"{prefix}.weight"] = convert_tensor(layer.weight)
            if hasattr(layer, 'bias') and layer.bias is not None:
                converted_state[f"{prefix}.bias"] = convert_tensor(layer.bias)

            if hasattr(layer, 'modules') and layer.modules:
                for i, sublayer in enumerate(layer.modules):
                    # 0-based indexing for compatibility
                    next_prefix = f"{prefix}.{i}" if prefix else f"{i}"
                    extract_layers(sublayer, next_prefix)

        extract_layers(model_obj)

        if not converted_state:
            print(f"❌ No weights found in {filepath}.")
            return

        name_base = os.path.splitext(os.path.basename(filepath))[0]
        out_path = os.path.join(target_dir, f"{name_base}_t7_mlx.npz")
        np.savez(out_path, **converted_state)
        print(f"✅ Saved {out_path} ({len(converted_state)} tensors)")

    except Exception as e:
        print(f"❌ Failed to convert {filepath}: {e}")

def convert_pytorch(filepath, target_dir):
    print(f"Processing PyTorch file: {filepath}")
    try:
        checkpoint = torch.load(filepath, map_location="cpu")

        if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
            state_dict = checkpoint['state_dict']
        elif isinstance(checkpoint, dict):
            state_dict = checkpoint
        else:
            print(f"❌ Unknown checkpoint format in {filepath}")
            return

        clean_dict = clean_state_dict(state_dict)

        name_base = os.path.splitext(os.path.basename(filepath))[0]
        # Avoid double extension if file was .pth.tar
        if name_base.endswith(".pth"):
            name_base = os.path.splitext(name_base)[0]

        out_path = os.path.join(target_dir, f"{name_base}_mlx.npz")
        np.savez(out_path, **clean_dict)

        size_mb = os.path.getsize(out_path) / (1024 * 1024)
        print(f"✅ Saved {out_path} ({size_mb:.1f} MB)")

    except Exception as e:
        print(f"❌ Failed to convert {filepath}: {e}")

def download_and_convert_places365(arch, download_dir, target_dir):
    url = PLACES365_URLS.get(arch)
    if not url:
        print(f"No URL for {arch}")
        return

    filename = os.path.join(download_dir, os.path.basename(url))

    # 1. Download
    if not os.path.exists(filename):
        print(f"Downloading {arch} from {url}...")
        try:
            download_url_to_file(url, filename)
        except Exception as e:
            print(f"Download failed: {e}")
            return
    else:
        print(f"Found cached {filename}")

    # 2. Load into standard skeleton (ensures structural correctness)
    print(f"Loading {arch} into PyTorch structure...")
    try:
        model = get_places365_model_skeleton(arch)
        checkpoint = torch.load(filename, map_location="cpu")
        state_dict = checkpoint['state_dict'] if 'state_dict' in checkpoint else checkpoint

        # Robust load: strip DataParallel prefixes, fall back to non-strict
        new_state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
        try:
            model.load_state_dict(new_state_dict, strict=True)
        except RuntimeError:
            model.load_state_dict(new_state_dict, strict=False)

        # 3. Export
        model.eval()
        final_dict = clean_state_dict(model.state_dict())
        out_path = os.path.join(target_dir, f"{arch}_places365_mlx.npz")
        np.savez(out_path, **final_dict)
        print(f"✅ Saved {out_path}")

    except Exception as e:
        print(f"Failed to process {arch}: {e}")

# --- Main CLI ---

def main():
    parser = argparse.ArgumentParser(description="DeepDream-MLX Model Converter")
    parser.add_argument("--scan", default="toConvert", help="Directory to scan for local files")
    parser.add_argument("--download", choices=["alexnet", "resnet50", "vgg16", "googlenet", "all"],
                        help="Download and convert specific Places365 models")
    parser.add_argument("--dest", default=".", help="Output directory for .npz files")
    args = parser.parse_args()

    if not os.path.exists(args.dest):
        os.makedirs(args.dest)

    # 1. Handle downloads
    if args.download:
        if not os.path.exists(args.scan):
            os.makedirs(args.scan)

        targets = ["alexnet", "resnet50", "vgg16", "googlenet"] if args.download == "all" else [args.download]
        for t in targets:
            download_and_convert_places365(t, args.scan, args.dest)

    # 2. Handle local scan
    if os.path.exists(args.scan):
        print(f"\nScanning '{args.scan}' for local models...")
        for f in glob.glob(os.path.join(args.scan, "*")):
            if os.path.isdir(f):
                continue
            ext = os.path.splitext(f)[1].lower()

            if ext == ".t7":
                convert_torch7(f, args.dest)
            elif ext in [".pth", ".pt", ".tar", ".pkl"]:
                # Note: Places365 checkpoints downloaded above also land here
                # and are converted again under a different output name.
                convert_pytorch(f, args.dest)
            elif ext == ".caffemodel":
                print(f"⚠️ Skipping Caffe model {os.path.basename(f)} (not supported)")

if __name__ == "__main__":
    main()
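The heart of convert.py is one small transformation: strip the `module.` prefix that `nn.DataParallel` adds to every key, and downcast each tensor to float16. A numpy-only sketch of that logic (`fake_state_dict` is an illustrative stand-in for a real checkpoint, not an actual Places365 state dict):

```python
import numpy as np

def clean_state_dict_np(state_dict, target_dtype=np.float16):
    # Strip the DataParallel 'module.' prefix and downcast each tensor
    return {
        k.replace("module.", ""): np.asarray(v).astype(target_dtype)
        for k, v in state_dict.items()
    }

# Illustrative stand-in for a loaded checkpoint
fake_state_dict = {
    "module.features.0.weight": np.ones((64, 3, 11, 11), dtype=np.float32),
    "module.features.0.bias": np.zeros(64, dtype=np.float32),
}
clean = clean_state_dict_np(fake_state_dict)
print(sorted(clean))  # ['features.0.bias', 'features.0.weight']
```

The float16 downcast roughly halves the .npz size relative to float32 with no visible effect on dreamed images.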
dream.py
CHANGED
@@ -1,3 +1,4 @@
+#!/usr/bin/env python3
 import argparse
 import os
 import time
@@ -13,6 +14,7 @@ from mlx_googlenet import GoogLeNet
 from mlx_resnet50 import ResNet50
 from mlx_vgg16 import VGG16
 from mlx_vgg19 import VGG19
+from mlx_alexnet import AlexNet

 IMAGENET_MEAN = mx.array([0.485, 0.456, 0.406])
 IMAGENET_STD = mx.array([0.229, 0.224, 0.225])
@@ -176,63 +178,20 @@ def deepdream(


 def get_weights_path(model_name, explicit_path=None):
     if explicit_path:
         return explicit_path

-    # 1. Try int8 (Maximum Efficiency / Smallest)
-    int8_path = f"{model_name}_mlx_int8.npz"
-    if os.path.exists(int8_path):
-        return int8_path
-
-    # 2. Try bf16 (Standard Efficient)
-    bf16_path = f"{model_name}_mlx_bf16.npz"
-    if os.path.exists(bf16_path):
-        return bf16_path
-
-    # 3. Try standard float32
-    fp32_path = f"{model_name}_mlx.npz"
-    if os.path.exists(fp32_path):
-        return fp32_path
-
-    return int8_path  # Return preferred default for error message context
+    # 1. Try standard MLX export (float16/bf16 default)
+    path = f"{model_name}_mlx.npz"
+    if os.path.exists(path):
+        return path
+
+    # 2. Try explicit bf16 suffix (legacy)
+    bf16_path = f"{model_name}_mlx_bf16.npz"
+    if os.path.exists(bf16_path):
+        return bf16_path

+    return path  # Return default for error message context


 def run_dream_for_model(model_name, args, img_np):
@@ -313,6 +272,11 @@ def run_dream_for_model(model_name, args, img_np):
         weights = get_weights_path("resnet50", args.weights)
         default_layers = ["layer4_2"]

+    elif model_name == "alexnet":
+        model = AlexNet()
+        weights = get_weights_path("alexnet", args.weights)
+        default_layers = ["relu5"]
+
     else:  # googlenet
         model = GoogLeNet()
         weights = get_weights_path("googlenet", args.weights)
@@ -380,7 +344,7 @@ def parse_args():
     p.add_argument(
         "--model",
-        choices=["vgg16", "vgg19", "googlenet", "resnet50", "all"],
+        choices=["vgg16", "vgg19", "googlenet", "resnet50", "alexnet", "all"],
         default="vgg16",
         help="Model to use. 'all' runs all models.",
     )
@@ -427,7 +391,7 @@ def main():
     img_np = load_image(args.input, args.width)

     if args.model == "all":
-        models = ["vgg16", "vgg19", "googlenet", "resnet50"]
+        models = ["vgg16", "vgg19", "googlenet", "resnet50", "alexnet"]
         if args.output:
             print(
                 "Warning: --output argument ignored because --model='all' was selected."
dream_video.py
ADDED
|
@@ -0,0 +1,130 @@
#!/usr/bin/env python3
import argparse
import os
import time
import numpy as np
import mlx.core as mx
import scipy.ndimage as nd
from PIL import Image
from dream import deepdream, load_image, deprocess, get_weights_path
from mlx_googlenet import GoogLeNet
from mlx_resnet50 import ResNet50
from mlx_vgg16 import VGG16
from mlx_vgg19 import VGG19
from mlx_alexnet import AlexNet

def run_video_dream(args):
    print("--- DeepDream Video Generator ---")
    print(f"Model: {args.model}")
    print(f"Zoom: {args.zoom_factor}")
    print(f"Frames: {args.frames}")

    # 1. Load model
    if args.model == "vgg16":
        model = VGG16()
        default_layers = ["relu4_3"]
    elif args.model == "vgg19":
        model = VGG19()
        default_layers = ["relu4_4"]
    elif args.model == "resnet50":
        model = ResNet50()
        default_layers = ["layer4_2"]
    elif args.model == "alexnet":
        model = AlexNet()
        default_layers = ["relu5"]
    else:
        model = GoogLeNet()
        default_layers = ["inception4c"]

    weights = get_weights_path(args.model, args.weights)
    if not os.path.exists(weights):
        print(f"Error: Weights {weights} not found.")
        return

    print(f"Loading weights: {weights}")
    model.load_npz(weights)

    # 2. Prepare input
    img_np = load_image(args.input, args.width)

    # 3. Prepare output dir
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)

    current_img = img_np.astype(np.float32)

    # 4. Loop
    for i in range(args.frames):
        start_t = time.time()

        # Dream
        dreamed = deepdream(
            model,
            current_img,
            layers=args.layers or default_layers,
            steps=args.steps,
            lr=args.lr,
            num_octaves=args.octaves,
            scale=args.scale,
            jitter=args.jitter,
            smoothing=args.smoothing,
        )

        # Save frame
        frame_name = f"frame_{i:04d}.jpg"
        out_path = os.path.join(args.output_dir, frame_name)
        Image.fromarray(dreamed).save(out_path)

        elapsed = time.time() - start_t
        print(f"Frame {i+1}/{args.frames}: {frame_name} ({elapsed:.2f}s)")

        # Transform for next frame (zoom):
        # 1. Scale up by zoom_factor
        # 2. Center-crop back to the original size
        if i < args.frames - 1:
            # dreamed is (H, W, 3) uint8; convert back to float for the
            # zoom to avoid precision loss
            next_input = dreamed.astype(np.float32)

            # scipy zoom (order=1 is bilinear, usually sufficient and fast);
            # zoom H and W, keep the channel dimension fixed (zoom=1)
            zf = args.zoom_factor
            next_input = nd.zoom(next_input, (zf, zf, 1), order=1)

            # Center crop
            h_new, w_new, _ = next_input.shape
            h_orig, w_orig, _ = img_np.shape

            start_h = (h_new - h_orig) // 2
            start_w = (w_new - w_orig) // 2

            current_img = next_input[start_h:start_h + h_orig, start_w:start_w + w_orig, :]

    print(f"\nDone! Frames saved to {args.output_dir}/\n")
    print("To create a video (requires ffmpeg):")
    print(f"ffmpeg -framerate 15 -i {args.output_dir}/frame_%04d.jpg -c:v libx264 -pix_fmt yuv420p video.mp4")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output_dir", default="frames")
    parser.add_argument("--frames", type=int, default=30)
    parser.add_argument("--zoom_factor", type=float, default=1.05)

    # Shared dream args
    parser.add_argument("--width", type=int, default=None)
    parser.add_argument("--model", default="googlenet")
    parser.add_argument("--weights", default=None)
    parser.add_argument("--layers", nargs="+")
    parser.add_argument("--steps", type=int, default=5)  # fewer steps usually gives smoother video
    parser.add_argument("--lr", type=float, default=0.05)
    parser.add_argument("--octaves", type=int, default=2)  # fewer octaves for speed
    parser.add_argument("--scale", type=float, default=1.4)
    parser.add_argument("--jitter", type=int, default=32)
    parser.add_argument("--smoothing", type=float, default=0.5)

    args = parser.parse_args()
    run_video_dream(args)
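The zoom-and-crop step is the only state carried between frames: upscale by `zoom_factor`, then center-crop back to the original size so the frame dimensions never drift. A numpy-only sketch of the same arithmetic (nearest-neighbor index sampling stands in for `scipy.ndimage.zoom` to keep the example dependency-free):

```python
import numpy as np

def zoom_center_crop(frame, zoom_factor):
    """Upscale by zoom_factor and center-crop back to the original size."""
    h, w, _ = frame.shape
    h_new = int(round(h * zoom_factor))
    w_new = int(round(w * zoom_factor))
    # Nearest-neighbor resize via index sampling (stand-in for nd.zoom)
    rows = (np.arange(h_new) * h / h_new).astype(int)
    cols = (np.arange(w_new) * w / w_new).astype(int)
    zoomed = frame[rows][:, cols]
    # Center crop back to (h, w)
    sh = (h_new - h) // 2
    sw = (w_new - w) // 2
    return zoomed[sh:sh + h, sw:sw + w]

frame = np.random.rand(120, 160, 3).astype(np.float32)
out = zoom_center_crop(frame, 1.05)
print(out.shape)  # (120, 160, 3)
```

Because the output shape equals the input shape every iteration, each dreamed frame can be fed straight back in, which is what produces the endless-zoom effect.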
mlx_alexnet.py
ADDED
|
@@ -0,0 +1,88 @@
"""
AlexNet in MLX with endpoints for relu1, relu2, relu3, relu4, relu5.
Loads weights from a torchvision-exported npz.
"""

import mlx.core as mx
import mlx.nn as nn
import numpy as np


def _conv(in_ch, out_ch, kernel_size, stride=1, padding=0):
    return nn.Conv2d(
        in_ch,
        out_ch,
        kernel_size=kernel_size,
        stride=stride,
        padding=padding,
        bias=True,
    )


class AlexNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [
            _conv(3, 64, kernel_size=11, stride=4, padding=2),  # 0
            nn.ReLU(),                              # 1 (relu1)
            nn.MaxPool2d(kernel_size=3, stride=2),  # 2
            _conv(64, 192, kernel_size=5, padding=2),   # 3
            nn.ReLU(),                              # 4 (relu2)
            nn.MaxPool2d(kernel_size=3, stride=2),  # 5
            _conv(192, 384, kernel_size=3, padding=1),  # 6
            nn.ReLU(),                              # 7 (relu3)
            _conv(384, 256, kernel_size=3, padding=1),  # 8
            nn.ReLU(),                              # 9 (relu4)
            _conv(256, 256, kernel_size=3, padding=1),  # 10
            nn.ReLU(),                              # 11 (relu5)
            nn.MaxPool2d(kernel_size=3, stride=2),  # 12
        ]

        self.endpoint_indices = {
            "relu1": 1,
            "relu2": 4,
            "relu3": 7,
            "relu4": 9,
            "relu5": 11,
        }

    def forward_with_endpoints(self, x):
        endpoints = {}
        for idx, layer in enumerate(self.layers):
            x = layer(x)
            for name, i in self.endpoint_indices.items():
                if idx == i:
                    endpoints[name] = x
        return x, endpoints

    def __call__(self, x):
        _, endpoints = self.forward_with_endpoints(x)
        return endpoints

    def load_npz(self, path: str):
        data = np.load(path)

        def load_weight(key, transpose=False):
            if key in data:
                w = data[key]
            elif f"{key}_int8" in data:
                # Dequantize: int8 weights stored alongside a float scale
                w_int8 = data[f"{key}_int8"]
                scale = data[f"{key}_scale"]
                w = w_int8.astype(scale.dtype) * scale
            else:
                raise ValueError(f"Missing key {key} in npz")

            # torchvision stores conv weights as (O, I, kH, kW);
            # MLX expects (O, kH, kW, I)
            if transpose and w.ndim == 4:
                w = np.transpose(w, (0, 2, 3, 1))
            return mx.array(w)

        # Map layer indices to 'features.X' in standard torchvision keys
        conv_indices = [0, 3, 6, 8, 10]

        for idx in conv_indices:
            conv = self.layers[idx]
            conv.weight = load_weight(f"features.{idx}.weight", transpose=True)
            conv.bias = load_weight(f"features.{idx}.bias")
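The int8 branch of `load_weight` reconstructs approximate float weights as `q * scale`. A round-trip sketch of that scheme (the symmetric per-tensor quantizer here is an assumption for illustration; only the dequantize step mirrors the loader):

```python
import numpy as np

def dequantize(w_int8, scale):
    # Mirrors the loader's int8 branch: w ≈ q * scale
    return w_int8.astype(scale.dtype) * scale

# Round-trip: quantize a float16 tensor to int8, then dequantize
w = np.linspace(-1.0, 1.0, 8, dtype=np.float16)
scale = np.float16(np.abs(w).max() / 127.0)          # symmetric per-tensor scale (assumed)
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_back = dequantize(q, scale)

err = float(np.abs(w - w_back).max())
print(err < 0.01)  # True
```

The reconstruction error is bounded by roughly half the quantization step, which is why int8 storage is viable for DeepDream: the gradient-ascent loop is insensitive to weight noise at that scale.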
resnet50_places365.pth_mlx.npz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c7e4496e460a4cbec41e02f169c7be9c0e3cebe28036ac917105ba386471c47b
size 48691562

resnet50_places365_mlx.npz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c7e4496e460a4cbec41e02f169c7be9c0e3cebe28036ac917105ba386471c47b
size 48691562

resnet50_places365_t7_mlx.npz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cbfc6e4d63fb8824df3a8c60d82581106679b2061a654fd9d9ab62d798b94f99
size 48536532

toConvert/.gitkeep
ADDED
File without changes