Teoman21 committed on
Commit
e589d21
·
1 Parent(s): 2bcf1e5

docs: Update README and profiling analysis with installation, usage instructions, and performance benchmarks

Files changed (2)
  1. README.md +98 -1
  2. profiling_analysis.ipynb +17 -0
README.md CHANGED
@@ -4,8 +4,105 @@ emoji: 🧩
4
  colorFrom: indigo
5
  colorTo: purple
6
  sdk: gradio
7
- app_file: app.py
8
  pinned: false
9
  ---
10
 
 
11
 
4
  colorFrom: indigo
5
  colorTo: purple
6
  sdk: gradio
7
+ app_file: app.py
8
  pinned: false
9
  ---
10
 
11
+ # Lab 5 – Mosaic Generator
12
 
13
+ A fully refactored and optimized version of the Lab 1 mosaic pipeline. This release adds strict vectorization, caching, profiling evidence, and a polished Gradio front end.
14
+
15
+ ## 1. Installation
16
+
17
+ ```bash
18
+ # 1. Create and activate a Python 3.10+ virtual environment
19
+ python3 -m venv .venv
20
+ source .venv/bin/activate
21
+
22
+ # 2. Install the project dependencies
23
+ pip install --upgrade pip
24
+ pip install -r requirements.txt
25
+ ```
26
+
27
+ Optional extras:
28
+
29
+ - `pip install line_profiler` if you want to re-run the profiling notebook.
30
+ - `pip install jupyterlab` if you prefer to explore the notebooks interactively.
31
+
32
+ ## 2. Usage
33
+
34
+ ### Run the Gradio App Locally
35
+
36
+ ```bash
37
+ cd lab-5
38
+ python app.py
39
+ ```
40
+
41
+ Visit http://localhost:7860 to upload an image, tweak grid/tile settings, and view the generated mosaic, quality metrics, and timing stats live.
42
+
43
+ ### Programmatic Pipeline Example
44
+
45
+ ```python
46
+ from pathlib import Path
47
+ from PIL import Image
48
+ from src.config import Config
49
+ from src.pipeline import MosaicPipeline
50
+
51
+ cfg = Config(
52
+     grid=32,
53
+     tile_size=32,
54
+     out_w=768,
55
+     out_h=768,
56
+     tiles_cache_dir="tile_cache",
57
+ )
58
+ pipeline = MosaicPipeline(cfg)
59
+ image = Image.open(Path("test_images/copley.png")).convert("RGB")
60
+ results = pipeline.run_full_pipeline(image)
61
+ results["outputs"]["mosaic"].save("outputs/mosaic.png")
62
+ print(results["timing"], results["metrics"])
63
+ ```
64
+
65
+ ### Profiling Notebook
66
+
67
+ Open `profiling_analysis.ipynb` to reproduce the cProfile / line_profiler runs, before-vs-after timings, and plots used in the assessment.
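For readers who want a quick look before opening the notebook, the following is a minimal sketch of a cProfile run using only the standard library. The function `slow_cell_means` is a hypothetical stand-in for a loop-heavy stage; it is not part of this repository, and the notebook's actual profiling cells may be organized differently.

```python
import cProfile
import io
import pstats


def slow_cell_means(grid: int = 64) -> list[float]:
    """Hypothetical loop-heavy stage (not from this repo), used only as a profiling target."""
    return [sum(range(cell, cell + 256)) / 256 for cell in range(grid * grid)]


def profile_call(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return a report of the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_call(slow_cell_means, 32)
print(report)
```

Sorting by cumulative time tends to surface the outer function that dominates a run; line_profiler can then be pointed at that function for per-line detail.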
68
+
69
+ ## 3. Performance Benchmarks (vs Lab 1)
70
+
71
+ Benchmarks compare the original Lab 1 implementation (“Legacy”) with this optimized Lab 5 pipeline on the same MacBook Pro (M3 Pro, Python 3.11). Each entry averages three runs with cached tiles.
72
+
73
+ | Image Size | Grid  | Legacy Time (s) | Lab 5 Time (s) | Speedup |
74
+ |------------|-------|-----------------|----------------|---------|
75
+ | 256×256    | 16×16 | 0.063           | 0.038          | 1.7×    |
76
+ | 512×512    | 32×32 | 0.149           | 0.140          | 1.1×    |
77
+ | 1024×1024  | 64×64 | 0.576           | 0.542          | 1.1×    |
78
+
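The Speedup column is the legacy time divided by the Lab 5 time. As a small worked check, this sketch recomputes that ratio from the timings in the table; the variable names are illustrative only.

```python
# (image size, legacy seconds, Lab 5 seconds), copied from the benchmark table.
rows = [
    ("256x256", 0.063, 0.038),
    ("512x512", 0.149, 0.140),
    ("1024x1024", 0.576, 0.542),
]


def speedup(legacy_s: float, optimized_s: float) -> float:
    """Speedup factor: how many times faster the optimized run is."""
    return legacy_s / optimized_s


# Round to one decimal place, matching the precision used in the table.
speedups = {name: round(speedup(legacy, lab5), 1) for name, legacy, lab5 in rows}

for name, value in speedups.items():
    print(f"{name}: {value:.1f}x")
```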
79
+ Key optimizations that produced the gains:
80
+
81
+ 1. **Vectorized grid analysis** – replaces nested loops with a strided block view of the image (via `numpy.lib.stride_tricks`) and weighted reductions, eliminating thousands of Python iterations per frame.
82
+ 2. **Vectorized tile matching** – stacks the tile bank once, computes LAB/RGB distances with NumPy, and gathers tiles in bulk.
83
+ 3. **Tile caching** – persists Hugging Face tiles to disk (`tile_cache/`) and reuses them across runs, avoiding repeated dataset downloads and resizing.
84
+ 4. **Configurable quantization** – optional uniform or k-means quantization reduces the color-space variance before tiling.
85
+
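To illustrate the first optimization, here is a minimal sketch of computing every cell mean in one pass. It assumes the image dimensions divide evenly by the grid and uses a plain `reshape` rather than a stride-tricks view, with an unweighted mean; the function name `cell_means` is hypothetical and the code in `src/` may differ.

```python
import numpy as np


def cell_means(image: np.ndarray, grid: int) -> np.ndarray:
    """Mean color of each cell in a grid x grid partition of an H x W x C image."""
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    cell_h, cell_w = h // grid, w // grid
    # Split rows and columns into (cell index, offset inside cell) axes:
    # shape becomes (grid, cell_h, grid, cell_w, C).
    blocks = image.reshape(grid, cell_h, grid, cell_w, c)
    # Average over the two in-cell pixel axes, leaving (grid, grid, C).
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)
demo_means = cell_means(demo_image, grid=4)
print(demo_means.shape)
```

The reshape creates no Python-level loop: the reduction over all cells runs inside NumPy, which is the property the optimization relies on.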
86
+ Refer to the notebook for raw profiler dumps, bottleneck analysis, and charts illustrating how the optimized pipeline scales more gracefully as grids grow.
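The tile-matching step described in the list above can be sketched roughly as follows. This is an assumption-laden illustration, not the repository's implementation: it uses squared Euclidean distance on mean colors (the README mentions LAB/RGB distances without giving details), and the names `match_tiles` and `assemble` are hypothetical.

```python
import numpy as np


def match_tiles(cell_colors: np.ndarray, tile_colors: np.ndarray) -> np.ndarray:
    """Index of the nearest tile for every cell.

    cell_colors: (N, 3) mean color per grid cell.
    tile_colors: (T, 3) mean color per tile.
    """
    cells = cell_colors.astype(np.float64)
    tiles = tile_colors.astype(np.float64)
    # Broadcast to (N, T, 3) differences, then reduce to (N, T) squared distances.
    diff = cells[:, None, :] - tiles[None, :, :]
    dist2 = np.einsum("ntc,ntc->nt", diff, diff)
    return dist2.argmin(axis=1)


def assemble(indices: np.ndarray, tiles: np.ndarray, grid: int) -> np.ndarray:
    """Gather the chosen tiles in bulk and stitch them into one mosaic image."""
    _, th, tw, c = tiles.shape
    picked = tiles[indices].reshape(grid, grid, th, tw, c)
    # Interleave cell rows with tile rows, and cell columns with tile columns.
    return picked.transpose(0, 2, 1, 3, 4).reshape(grid * th, grid * tw, c)


# Three flat-colored 2x2 tiles: black, mid-gray, white.
tile_bank = np.stack([np.full((2, 2, 3), v, dtype=np.uint8) for v in (0, 128, 255)])
tile_means = tile_bank.reshape(len(tile_bank), -1, 3).mean(axis=1)
cells = np.array([[10, 10, 10], [250, 250, 250], [120, 130, 125], [0, 0, 0]])

idx = match_tiles(cells, tile_means)
mosaic = assemble(idx, tile_bank, grid=2)
print(idx, mosaic.shape)
```

The (N, T, 3) intermediate grows with both grid size and tile count, so a real pipeline with a large tile bank may need to process cells in chunks to bound memory.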
87
+
88
+ ## 4. Deployed Demo
89
+
90
+ A live Gradio demo is hosted on Hugging Face Spaces:
91
+ 👉 https://huggingface.co/spaces/Teoman21/Lab-5
92
+
93
+ The hosted build runs the same `app.py` entry point, with tiles cached in the Space storage. Use it for quick testing or to share results without cloning the repo.
94
+
95
+ ## 5. Repository Map
96
+
97
+ - `app.py` – launches the Gradio interface.
98
+ - `src/` – reusable package (`mosaic.py`, `tiles.py`, `pipeline.py`, `metrics.py`, `gradio_interface.py`, etc.).
99
+ - `tile_cache/` – on-disk cache of Hugging Face tiles (populated at runtime).
100
+ - `test_images/` – sample photos for local testing.
101
+ - `profiling_analysis.ipynb` – notebook covering profiling, benchmarks, and plots.
102
+ - `helpers/download_tiles.py` – utility to pre-download HF dataset tiles.
103
+
104
+ ## 6. Support & Notes
105
+
106
+ - The first run may take longer while tiles download from Hugging Face; subsequent runs use the cache.
107
+ - If you see dataset download errors, set `HF_HOME` or edit `Config.hf_cache_dir` to point at a writable cache folder.
108
+ - The project targets Python 3.10+ and macOS/Linux; Windows should work but has not been profiled extensively.
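The first-run versus cached-run behavior noted above can be illustrated with a small on-disk cache. This is only a sketch under assumptions: the helper `load_tiles_cached`, the `.npy` file layout, and the `fake_build` loader are hypothetical and do not describe how `src/tiles.py` actually stores tiles.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_tiles_cached(cache_dir, tile_size, build_tiles):
    """Return a tile array, reusing an on-disk copy when one exists.

    build_tiles is any callable taking tile_size and returning an array;
    it is only invoked on a cache miss.
    """
    cache_dir = Path(cache_dir)
    cache_file = cache_dir / f"tiles_{tile_size}.npy"
    if cache_file.exists():
        return np.load(cache_file)
    tiles = build_tiles(tile_size)
    cache_dir.mkdir(parents=True, exist_ok=True)
    np.save(cache_file, tiles)
    return tiles


calls = []


def fake_build(tile_size):
    """Hypothetical stand-in for the slow download-and-resize step."""
    calls.append(tile_size)
    return np.zeros((4, tile_size, tile_size, 3), dtype=np.uint8)


with tempfile.TemporaryDirectory() as tmp:
    first = load_tiles_cached(tmp, 32, fake_build)   # cache miss: builds and saves
    second = load_tiles_cached(tmp, 32, fake_build)  # cache hit: loads from disk

print(len(calls), second.shape)
```

Keying the cache file on `tile_size` means changing the tile size triggers a rebuild instead of silently reusing tiles of the wrong shape.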
profiling_analysis.ipynb CHANGED
@@ -308,6 +308,23 @@
308
  "lp.print_stats()\n"
309
  ]
310
  },
311
  {
312
  "cell_type": "markdown",
313
  "id": "27c13039",
 
308
  "lp.print_stats()\n"
309
  ]
310
  },
311
+ {
312
+ "cell_type": "markdown",
313
+ "id": "c0c5ba66",
314
+ "metadata": {},
315
+ "source": [
316
+ "### 6. Bottleneck Analysis\n",
317
+ "\n",
318
+ "The profiler runs show that `map_tiles_to_grid` dominates the legacy runtime. More than 80% of that function’s cost comes from `_find_all_tile_matches_vectorized`, which loops over every grid cell and compares it to every tile. The second-biggest hotspot is `analyze_grid_cells`, where the old version walks each pixel with nested Python loops. Smaller spikes appear in repeated `Image.resize` calls and in the metric helpers, but they contribute less than 10% each.\n",
319
+ "\n",
320
+ "From these numbers I focused on two priorities:\n",
321
+ "\n",
322
+ "1. Replace the nested loops in `analyze_grid_cells` with a NumPy stride view that computes all cell means at once.\n",
323
+ "2. Restructure tile matching so we stack the entire tile bank once, compute distances with pure NumPy, and gather results in bulk instead of looping per cell.\n",
324
+ "\n",
325
+ "These are the areas where optimization would have the largest impact on total runtime.\n"
326
+ ]
327
+ },
328
  {
329
  "cell_type": "markdown",
330
  "id": "27c13039",