---
title: Mosaic Generator
emoji: 🧩
colorFrom: indigo
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# Lab 5 – Mosaic Generator
A fully refactored and optimized version of the Lab 1 mosaic pipeline. This release adds strict vectorization, caching, profiling evidence, and a polished Gradio front end.
## 1. Installation
```bash
# 1. Create and activate a Python 3.10+ virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 2. Install the project dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
Optional extras:
- `pip install line_profiler` if you want to re-run the profiling notebook.
- `pip install jupyterlab` if you prefer to explore the notebooks interactively.
## 2. Usage
### Run the Gradio App Locally
```bash
cd lab-5
python app.py
```
Visit http://localhost:7860 to upload an image, tweak grid/tile settings, and view the generated mosaic, quality metrics, and timing stats live.
### Programmatic Pipeline Example
```python
from pathlib import Path
from PIL import Image
from src.config import Config
from src.pipeline import MosaicPipeline

# Grid resolution, tile size, output dimensions, and on-disk tile cache location
cfg = Config(
    grid=32,
    tile_size=32,
    out_w=768,
    out_h=768,
    tiles_cache_dir="tile_cache",
)
pipeline = MosaicPipeline(cfg)

image = Image.open(Path("test_images/copley.png")).convert("RGB")
results = pipeline.run_full_pipeline(image)

# Create the output folder first; PIL's save() does not create missing directories
Path("outputs").mkdir(exist_ok=True)
results["outputs"]["mosaic"].save("outputs/mosaic.png")
print(results["timing"], results["metrics"])
```
### Profiling Notebook
Open `profiling_analysis.ipynb` to reproduce the cProfile / line_profiler runs, before-vs-after timings, and plots used in the assessment.
## 3. Performance Benchmarks (vs Lab 1)
Benchmarks compare the original Lab 1 implementation (“Legacy”) with this optimized Lab 5 pipeline on the same MacBook Pro (M3 Pro, Python 3.11). Each entry averages three runs with cached tiles.
| Image Size | Grid  | Legacy Time (s) | Lab 5 Time (s) | Speedup |
|------------|-------|-----------------|----------------|---------|
| 256×256    | 16×16 | 0.063           | 0.038          | 1.6×    |
| 512×512    | 32×32 | 0.149           | 0.140          | 1.1×    |
| 1024×1024  | 64×64 | 0.576           | 0.542          | 1.1×    |
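The averaging described above can be reproduced with a small timing helper. This is a minimal sketch, not the notebook's actual harness: `average_runtime` and `speedup` are hypothetical names, and the callable you pass in stands for whichever pipeline entry point you want to measure.

```python
import time
from statistics import mean


def average_runtime(fn, *args, repeats=3, **kwargs):
    """Return the mean wall-clock time in seconds over `repeats` calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def speedup(legacy_seconds, optimized_seconds):
    """Ratio of legacy time to optimized time, as in the Speedup column."""
    return legacy_seconds / optimized_seconds
```

For example, `average_runtime(pipeline.run_full_pipeline, image)` would time the programmatic example from the Usage section, assuming the tile cache is already populated so downloads do not distort the result.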
Key optimizations that produced the gains:
1. **Vectorized grid analysis** – replaces nested loops with strided block views (`numpy.lib.stride_tricks`) and weighted reductions, eliminating thousands of Python iterations per image.
2. **Vectorized tile matching** – stacks the tile bank once, computes LAB/RGB distances with NumPy, and gathers tiles in bulk.
3. **Tile caching** – persists Hugging Face tiles to disk (`tile_cache/`) and reuses them across runs, avoiding repeated dataset downloads and resizing.
4. **Configurable quantization** – optional uniform or k-means quantization reduces the color-space variance before tiling.
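The first two optimizations can be illustrated with plain NumPy. The sketch below is not the code in `src/`: the function names are hypothetical, it uses an unweighted mean instead of the weighted reductions mentioned above, and it measures squared Euclidean distance in RGB only. It also assumes the image height and width are at least `grid` pixels.

```python
import numpy as np


def cell_means(image, grid):
    """Mean colour of each grid cell, computed without Python loops.

    image: (H, W, 3) array; any remainder rows/columns are trimmed.
    Returns a (grid, grid, 3) float array.
    """
    h, w, c = image.shape
    ch, cw = h // grid, w // grid
    # Reshape exposes every cell as its own pair of axes: (grid, ch, grid, cw, c)
    blocks = image[: ch * grid, : cw * grid].reshape(grid, ch, grid, cw, c)
    return blocks.mean(axis=(1, 3))


def match_tiles(cell_colors, tile_colors):
    """Index of the nearest tile colour for every cell (squared RGB distance)."""
    flat = cell_colors.reshape(-1, 1, 3)            # (cells, 1, 3)
    diff = flat - tile_colors[None, :, :]           # (cells, tiles, 3)
    dist = np.einsum("ctk,ctk->ct", diff, diff)     # (cells, tiles)
    return dist.argmin(axis=1).reshape(cell_colors.shape[:2])


def assemble(tiles, indices):
    """Gather the chosen tiles in bulk and stitch them into one image.

    tiles: (N, th, tw, 3) stacked tile bank; indices: (rows, cols) tile ids.
    """
    rows, cols = indices.shape
    th, tw, c = tiles.shape[1:]
    picked = tiles[indices]                         # (rows, cols, th, tw, c)
    return picked.transpose(0, 2, 1, 3, 4).reshape(rows * th, cols * tw, c)
```

Stacking the tile bank once means `match_tiles` and `assemble` each run as a handful of array operations, regardless of how many cells the grid contains.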
Refer to the notebook for raw profiler dumps, bottleneck analysis, and charts showing how runtime scales as grids grow.
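As a small illustration of the uniform quantization option listed among the optimizations, the sketch below snaps each 8-bit channel to the centre of one of `levels` equal-width bins. `quantize_uniform` is a hypothetical name and the binning rule is an assumption; the project's own implementation and its k-means variant may differ.

```python
import numpy as np


def quantize_uniform(image, levels):
    """Map each 8-bit channel value to the centre of its equal-width bin."""
    if not 1 <= levels <= 256:
        raise ValueError("levels must be between 1 and 256")
    step = 256 / levels
    bins = np.floor(image.astype(np.float64) / step)   # bin index per value
    centers = np.floor((bins + 0.5) * step)            # bin centre, rounded down
    return np.clip(centers, 0, 255).astype(np.uint8)
```

With `levels=2`, for instance, every channel value below 128 becomes 64 and every other value becomes 192, so each pixel can take at most eight distinct colours.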
## 4. Deployed Demo
A live Gradio demo is hosted on Hugging Face Spaces:
👉 https://huggingface.co/spaces/Teoman21/Lab-5
The hosted build runs the same `app.py` entry point, with tiles cached in the Space's storage. Use it for quick testing or to share results without cloning the repo.
## 5. Repository Map
- `app.py` – launches the Gradio interface.
- `src/` – reusable package (`mosaic.py`, `tiles.py`, `pipeline.py`, `metrics.py`, `gradio_interface.py`, etc.).
- `tile_cache/` – on-disk cache of Hugging Face tiles (populated at runtime).
- `test_images/` – sample photos for local testing.
- `profiling_analysis.ipynb` – notebook covering profiling, benchmarks, and plots.
- `helpers/download_tiles.py` – utility to pre-download HF dataset tiles.
## 6. Support & Notes
- First run may take longer while tiles download from Hugging Face. Subsequent runs use the cache.
- If you see dataset download errors, set `HF_HOME` or edit `Config.hf_cache_dir` to point at a writable cache folder.
- The project targets Python 3.10+ and macOS/Linux; Windows should work but has not been profiled extensively.
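The first-run versus cached-run behaviour in the notes above follows a common load-or-build pattern. The sketch below shows one way it could work, assuming tiles are stored as a single NumPy array per tile size; `load_or_build_tiles`, the `tiles_<size>.npy` file name, and `build_fn` are all hypothetical and do not describe the actual layout of `tile_cache/`.

```python
from pathlib import Path

import numpy as np


def load_or_build_tiles(cache_dir, tile_size, build_fn):
    """Return the tile array for `tile_size`, reusing a cached copy if present.

    build_fn is a hypothetical callable that does the expensive work
    (for example downloading and resizing tiles) and returns an array.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"tiles_{tile_size}.npy"
    if cache_file.exists():
        return np.load(cache_file)          # cache hit: skip the expensive build
    tiles = np.asarray(build_fn(tile_size))
    np.save(cache_file, tiles)              # cache miss: persist for later runs
    return tiles
```

If the cache directory is not writable, `mkdir` or `np.save` raises an error, which is the situation the `HF_HOME` / `Config.hf_cache_dir` note addresses for the Hugging Face download cache.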