Teoman21 committed on Nov 15, 2025

Commit

e589d21

1 Parent(s): 2bcf1e5

docs: Update README and profiling analysis with installation, usage instructions, and performance benchmarks


Files changed (2)

README.md +98 -1
profiling_analysis.ipynb +17 -0

README.md CHANGED

@@ -4,8 +4,105 @@ emoji: 🧩
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
-app_file: app.py
+app_file: app.py
 pinned: false
 ---
+# Lab 5 – Mosaic Generator
+A fully refactored and optimized version of the Lab 1 mosaic pipeline. This release adds strict vectorization, caching, profiling evidence, and a polished Gradio front end.
+## 1. Installation
+```bash
+# 1. Create and activate a Python 3.10+ virtual environment
+python3 -m venv .venv
+source .venv/bin/activate
+# 2. Install the project dependencies
+pip install --upgrade pip
+pip install -r requirements.txt
+```
+Optional extras:
+- `pip install line_profiler` if you want to re-run the profiling notebook.
+- `pip install jupyterlab` if you prefer to explore the notebooks interactively.
+## 2. Usage
+### Run the Gradio App Locally
+```bash
+cd lab-5
+python app.py
+```
+Visit http://localhost:7860 to upload an image, tweak grid/tile settings, and view the generated mosaic, quality metrics, and timing stats live.
+### Programmatic Pipeline Example
+```python
+from pathlib import Path
+from PIL import Image
+from src.config import Config
+from src.pipeline import MosaicPipeline
+cfg = Config(
+    grid=32,
+    tile_size=32,
+    out_w=768,
+    out_h=768,
+    tiles_cache_dir="tile_cache"
+)
+pipeline = MosaicPipeline(cfg)
+image = Image.open(Path("test_images/copley.png")).convert("RGB")
+results = pipeline.run_full_pipeline(image)
+results["outputs"]["mosaic"].save("outputs/mosaic.png")
+print(results["timing"], results["metrics"])
+```
+### Profiling Notebook
+Open `profiling_analysis.ipynb` to reproduce the cProfile / line_profiler runs, before-vs-after timings, and plots used in the assessment.
+## 3. Performance Benchmarks (vs Lab 1)
+Benchmarks compare the original Lab 1 implementation (“Legacy”) with this optimized Lab 5 pipeline on the same MacBook Pro (M3 Pro, Python 3.11). Each entry averages three runs with cached tiles.
+| Image Size | Grid | Legacy Time (s) | Lab 5 Time (s) | Speedup |
+|------------|------|-----------------|----------------|---------|
+| 256×256    | 16×16| 0.063           | 0.038          | 1.6×    |
+| 512×512    | 32×32| 0.149           | 0.140          | 1.1×    |
+| 1024×1024  | 64×64| 0.576           | 0.542          | 1.1×    |
+Key optimizations that produced the gains:
+1. **Vectorized grid analysis** – replaces nested loops with a NumPy strided block view (for example one built with `numpy.lib.stride_tricks.as_strided` or a reshape) and weighted reductions, eliminating thousands of Python iterations per image.
+2. **Vectorized tile matching** – stacks the tile bank once, computes LAB/RGB distances with NumPy, and gathers tiles in bulk.
+3. **Tile caching** – persists Hugging Face tiles to disk (`tile_cache/`) and reuses them across runs, avoiding repeated dataset downloads and resizing.
+4. **Configurable quantization** – optional uniform or k-means quantization reduces the color-space variance before tiling.
+Refer to the notebook for raw profiler dumps, bottleneck analysis, and charts illustrating how the optimized pipeline scales more gracefully as grids grow.
+## 4. Deployed Demo
+A live Gradio demo is hosted on Hugging Face Spaces:
+👉 https://huggingface.co/spaces/Teoman21/Lab-5
+The hosted build runs the same `app.py` entry point, with tiles cached in the Space storage. Use it for quick testing or to share results without cloning the repo.
+## 5. Repository Map
+- `app.py` – launches the Gradio interface.
+- `src/` – reusable package (`mosaic.py`, `tiles.py`, `pipeline.py`, `metrics.py`, `gradio_interface.py`, etc.).
+- `tile_cache/` – on-disk cache of Hugging Face tiles (populated at runtime).
+- `test_images/` – sample photos for local testing.
+- `profiling_analysis.ipynb` – notebook covering profiling, benchmarks, and plots.
+- `helpers/download_tiles.py` – utility to pre-download HF dataset tiles.
+## 6. Support & Notes
+- First run may take longer while tiles download from Hugging Face. Subsequent runs use the cache.
+- If you see dataset download errors, set `HF_HOME` or edit `Config.hf_cache_dir` to point at a writable cache folder.
+- The project targets Python 3.10+ and macOS/Linux; Windows should work but has not been profiled extensively.
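
The "vectorized grid analysis" item in the README above can be illustrated with a minimal sketch. This is not the code from `src/`; the function names `cell_means` and `cell_means_loop` are hypothetical, and it assumes plain (unweighted) per-cell means on an `H x W x C` array, whereas the README mentions weighted reductions. It uses a reshape to form the block view, which is one common way to get all cell means in a single NumPy call.

```python
import numpy as np


def cell_means(image: np.ndarray, grid: int) -> np.ndarray:
    """Mean color of every cell in a grid x grid partition (hypothetical helper).

    Assumes image has shape (H, W, C). Rows/columns that do not divide
    evenly into the grid are cropped, which is an assumption of this sketch.
    """
    h, w, c = image.shape
    ch, cw = h // grid, w // grid              # cell height and width
    cropped = image[: ch * grid, : cw * grid]
    # Shape (grid, ch, grid, cw, C): axes 1 and 3 index pixels inside a cell.
    blocks = cropped.reshape(grid, ch, grid, cw, c)
    return blocks.mean(axis=(1, 3))            # -> (grid, grid, C)


def cell_means_loop(image: np.ndarray, grid: int) -> np.ndarray:
    """Reference version with nested Python loops, for comparison only."""
    h, w, c = image.shape
    ch, cw = h // grid, w // grid
    out = np.empty((grid, grid, c), dtype=np.float64)
    for i in range(grid):
        for j in range(grid):
            cell = image[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            out[i, j] = cell.mean(axis=(0, 1))
    return out
```

Both functions return the same array; the reshape version simply moves the per-cell iteration from Python into NumPy.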

profiling_analysis.ipynb CHANGED

@@ -308,6 +308,23 @@
     "lp.print_stats()\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c0c5ba66",
+   "metadata": {},
+   "source": [
+    "### 6. Bottleneck Analysis\n",
+    "\n",
+    "The profiler runs show that `map_tiles_to_grid` dominates the legacy runtime. More than 80% of that function’s cost comes from `_find_all_tile_matches_vectorized`, which loops over every grid cell and compares it to every tile. The second biggest hotspot is `analyze_grid_cells`, where the old version walks each pixel with nested Python loops. Smaller spikes appear in repeated `Image.resize` calls and in the metric helpers, but they contribute less than 10% each.\n",
+    "\n",
+    "From these numbers I focused on two priorities:\n",
+    "\n",
+    "1. Replace the nested loops in `analyze_grid_cells` with a NumPy stride view that computes all cell means at once.\n",
+    "2. Restructure tile matching so we stack the entire tile bank once, compute distances with pure NumPy, and gather results in bulk instead of looping per cell.\n",
+    "\n",
+    "These are the areas where optimization would have the largest impact on total runtime.\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "27c13039",
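
The second priority in the bottleneck cell above (stack the tile bank once, compute distances with NumPy, gather in bulk) might look roughly like the following. This is a sketch under stated assumptions, not the repository's implementation: `match_tiles` and `assemble` are hypothetical names, colors are compared with squared Euclidean distance in whatever color space the inputs use, and every tile is represented by a single mean color.

```python
import numpy as np


def match_tiles(cell_colors: np.ndarray, tile_colors: np.ndarray) -> np.ndarray:
    """Index of the nearest tile for every cell (hypothetical helper).

    cell_colors: (N, C) mean color per grid cell.
    tile_colors: (T, C) mean color per tile, stacked once up front.
    """
    cells = cell_colors.astype(np.float64)
    tiles = tile_colors.astype(np.float64)
    # Broadcasting builds all N x T color differences without a Python loop.
    diff = cells[:, None, :] - tiles[None, :, :]      # (N, T, C)
    dist2 = np.einsum("ntc,ntc->nt", diff, diff)      # squared distances
    return dist2.argmin(axis=1)                       # (N,)


def assemble(indices: np.ndarray, tiles: np.ndarray, grid: int) -> np.ndarray:
    """Gather the chosen tiles in bulk and lay them out as one image.

    tiles: (T, th, tw, C); indices: (grid * grid,) in row-major cell order.
    """
    _, th, tw, c = tiles.shape
    picked = tiles[indices].reshape(grid, grid, th, tw, c)
    # Interleave cell rows with pixel rows, and cell columns with pixel columns.
    return picked.transpose(0, 2, 1, 3, 4).reshape(grid * th, grid * tw, c)
```

The intermediate `(N, T, C)` array trades memory for speed; for very large grids or tile banks, processing the cells in chunks would keep the same structure with a smaller footprint.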