---
title: Satellite Patch Retrieve + Generate
emoji: πŸ›°οΈ
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.50.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
# Final Project Summary (Satellite Patch Retrieval + Generation)
This document summarizes Parts 1–5 of our project: dataset generation, EDA, embeddings, the end-to-end pipeline, and the deployed Gradio app.
---
## Part 1 β€” Synthetic Data Generation (with key terms)
In Part 1, we built a **synthetic satellite-like image dataset** using a pre-trained Hugging Face generative model. We used **`stabilityai/sd-turbo`** (a fast Stable Diffusion β€œTurbo” model) to generate **30 land-type classes** with **50 images per class** (**1500 images total**). Each label had its own prompt (e.g., forest, water, urban, runway), and we used a **negative prompt** to reduce unwanted artifacts such as text, logos, or cartoonish styles. The images were saved in a clean folder structure (`images/<label>/...jpg`) and documented in `metadata.csv` (`id`, `filename`, `label`, `prompt`, `seed`, `model_id`) so later parts (EDA, embeddings, and the app) could load and reuse the dataset easily.
### Key terms used
- **Diffusers:** Hugging Face library providing ready-to-use pipelines for diffusion-based generative models (e.g., Stable Diffusion). It loads the model and generates images from prompts.
- **Transformers:** Hugging Face library for Transformer-based models across text and vision. Used both as a dependency and later for embedding models (CLIP/ViT/DINOv2).
- **Tokenizers:** Converts text prompts into tokens/IDs the model can process; required for text-conditioned models (e.g., text-to-image).
- **Pillow (PIL):** Python imaging library for loading/manipulating/saving images (JPG/PNG), resizing, and file I/O.
- **`stabilityai/sd-turbo`:** Chosen because it is optimized for **speed** and can generate strong results with **1–2 inference steps**, enabling fast large-scale dataset creation.
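The Part 1 generation loop can be sketched as below. The label name, prompt text, negative prompt, and filename scheme are illustrative placeholders (the real ones live in the Part 1 notebook), and the heavy `diffusers` import is deferred so the metadata helpers stay lightweight:

```python
import csv

MODEL_ID = "stabilityai/sd-turbo"
NEGATIVE_PROMPT = "text, logo, watermark, cartoon"  # illustrative negative prompt

def metadata_row(idx, label, prompt, seed):
    """One row of metadata.csv: id, filename, label, prompt, seed, model_id."""
    return {
        "id": idx,
        "filename": f"images/{label}/{label}_{seed}.jpg",  # hypothetical scheme
        "label": label,
        "prompt": prompt,
        "seed": seed,
        "model_id": MODEL_ID,
    }

def write_metadata(rows, path="metadata.csv"):
    """Persist the rows so later parts (EDA, embeddings, app) can reload them."""
    fields = ["id", "filename", "label", "prompt", "seed", "model_id"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

def generate_class(label, prompt, n_images=50, base_seed=0):
    """Generate n_images for one label with sd-turbo (few-step sampling)."""
    import torch
    from diffusers import AutoPipelineForText2Image  # deferred: heavy download

    pipe = AutoPipelineForText2Image.from_pretrained(MODEL_ID)
    rows = []
    for i in range(n_images):
        seed = base_seed + i
        image = pipe(
            prompt,
            negative_prompt=NEGATIVE_PROMPT,
            num_inference_steps=2,  # sd-turbo works well with 1-2 steps
            generator=torch.Generator().manual_seed(seed),
        ).images[0]
        row = metadata_row(len(rows), label, prompt, seed)
        image.save(row["filename"])
        rows.append(row)
    return rows
```

Deferring the model import keeps the metadata helpers importable on machines without a GPU or the model weights.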
---
## Part 2 β€” Exploratory Data Analysis (EDA)
- **Loaded and inspected metadata:** Read `metadata.csv` (1500 rows) with expected columns (`id`, `filename`, `label`, `prompt`, `seed`, `model_id`) and confirmed **30 classes**.
- **Integrity validation:** Verified **0 missing image files**, **0 duplicate ids**, **0 duplicate filenames**, and **0 duplicate (label, seed)** pairs.
- **Class balance check:** Confirmed a perfectly balanced dataset with **50 images per label** (min/max = 50/50).
- **Image consistency:** Confirmed all images have the same resolution (**384Γ—384**).
- **Global image statistics:** Computed per-image RGB mean/std, **brightness** (luminance proxy), and a **sharpness proxy** (gradient-based), then reviewed distributions and summaries.
- **Outlier analysis:** Observed meaningful extremes consistent with labels:
- darkest samples mainly **DenseForest**
- brightest samples mainly **SnowIce**
- lowest-sharpness samples often from smoother-texture classes like **Grassland / DesertSand / SeaOpenWater**
- **Class-level insights:** Aggregated statistics by label (brightness/color tendencies) and used a simple **PCA projection** to visualize similarity/overlap between visually related classes.
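The brightness and sharpness proxies above can be sketched roughly as follows; the exact luminance weights and gradient formulation in the EDA notebook may differ, so treat this as a minimal stand-in:

```python
import numpy as np

# Rec. 601 luma weights (an assumption; the notebook's weighting may differ)
LUMA = np.array([0.299, 0.587, 0.114])

def brightness(img):
    """Luminance proxy: mean of the luma-weighted RGB channels."""
    arr = np.asarray(img, dtype=np.float64)
    return float((arr @ LUMA).mean())

def sharpness(img):
    """Gradient-based sharpness proxy: mean absolute pixel difference
    along both axes of the grayscale image. Smooth textures score low."""
    gray = np.asarray(img, dtype=np.float64) @ LUMA
    gx = np.diff(gray, axis=1)  # horizontal gradients
    gy = np.diff(gray, axis=0)  # vertical gradients
    return float(np.abs(gx).mean() + np.abs(gy).mean())
```

A perfectly flat patch scores 0 on the sharpness proxy, which matches the observation that smooth classes like Grassland and SeaOpenWater sit at the low end.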
---
## Part 3 β€” Embeddings (Similarity Search)
- **Goal:** Convert each satellite patch image into a compact vector (embedding) to enable **similarity search / retrieval** and support the later app pipeline.
- **Models tested (HF backbones):**
- **CLIP ViT-B/32** (`openai/clip-vit-base-patch32`)
- **ViT-Base** (`google/vit-base-patch16-224-in21k`)
- **DINOv2-Small** (`facebook/dinov2-small`)
- **Embedding extraction:**
- Used the **CLS token** from `last_hidden_state` as a single global image representation (standard for ViT-style models).
- Applied **L2-normalization** so cosine similarity becomes a fast dot product (stable and efficient retrieval).
- **Evaluation metric (retrieval-focused):** `label_agree@5` and `label_agree@10`
- For each image, retrieve its **top-k nearest neighbors** (cosine similarity).
- Measure the fraction of neighbors with the **same label** as the query.
- Average across all 1,500 images.
- This measures retrieval quality directly (not classifier accuracy).
- **Key results (quality + efficiency):**
- **DINOv2-Small performed best:** `agree@5 β‰ˆ 0.9247`, `agree@10 β‰ˆ 0.9006`
- Also produced **smaller embeddings** (384-dim) than CLIP/ViT (768-dim), reducing storage and improving retrieval efficiency.
- Selected **DINOv2-Small** as the optimal embedding model.
- **Saved outputs (reusable):**
- Embeddings: `*_embeddings.npy` (NumPy)
- Metadata mapping: `*_metadata.csv` (CSV)
- Comparison table: `embedding_model_comparison.csv` (CSV)
- **Qualitative validation:**
- **PCA scatter plot** to visualize clustering in 2D (sanity check for overlap/separability).
- **Nearest-neighbor gallery** to confirm retrieved results make sense visually and align with labels.
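The normalization and `label_agree@k` metric described above can be sketched with plain NumPy; the function names are our own, but the logic (L2-normalize, cosine via dot product, exclude self-matches, average label agreement) follows the description:

```python
import numpy as np

def l2_normalize(X):
    """Row-wise L2 normalization so cosine similarity reduces to a dot product."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def label_agree_at_k(embeddings, labels, k):
    """For each image, take its k nearest neighbors (self excluded) by cosine
    similarity and return the mean fraction sharing the query's label."""
    X = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    labels = np.asarray(labels)
    sims = X @ X.T                            # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)           # never retrieve the query itself
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of the k best matches
    return float((labels[topk] == labels[:, None]).mean())
```

On the real data, `embeddings` would be the CLS vectors from `last_hidden_state` (e.g., 384-dim for DINOv2-Small) and `labels` the column from the metadata CSV.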
---
## Part 4 β€” End-to-End Pipeline (Retrieve + Generate)
- **Goal:** Build a production-style **Input β†’ Processing β†’ Output** pipeline that can be plugged directly into an app.
  The user provides a satellite patch image plus a text prompt, and the system returns:
  1) **most similar images from the dataset (retrieval)**, and
  2) **newly generated images** via **image-to-image** and **text-to-image**,
  with user-controlled counts (**0–5 each**).
- **System architecture: two engines working together**
- **Retrieval engine (embedding-based):**
- Embed the user image with **DINOv2-Small** (best model from Part 3).
- Compare the query embedding against the stored embedding index:
- `best_embeddings.npy` (vectors) + `best_metadata.csv` (filename/label mapping).
- Compute similarity using **cosine similarity** (dot product due to L2 normalization).
- Return **Top-K** results (K ≀ 5), each including image, label, similarity score, and filename.
- **Generation engine (Diffusers):**
- Use **`stabilityai/sd-turbo`** for fast generation (works well with 1–2 steps).
- Support two generation modes:
- **img2img:** generates variants that stay visually close to the user image, guided by the prompt.
- **txt2img:** generates new images purely from the prompt.
- User controls how many images to generate (0–5 each).
- **Pipeline inputs:**
- `user_img` β€” user-provided PIL image
- `user_prompt` β€” user-provided prompt (required for generation)
- `k_retrieve` β€” number of retrieved images (0–5)
- `n_i2i`, `n_t2i` β€” generated image counts (0–5 each)
- `strength_i2i` β€” img2img closeness (lower = closer to input)
- `steps` β€” generation steps (sd-turbo typically 1–2)
- `gen_size` β€” output size (e.g., 384 or 512)
- `seed` β€” reproducibility
- **Stability safeguards (app-ready):**
- Hard caps on counts (**0–5**) for retrieval and generation to prevent overload.
- A **safe-step rule** for img2img to avoid the β€œ0 effective steps” Diffusers crash when strength is low.
- GPU optimizations when available: **fp16 + `torch.autocast`** for speed.
- **Reading the dataset directly from HF (course requirement):**
- Instead of local files, dataset images are loaded using **`hf_hub_download`** from:
- `LevyJonas/sat_land_patches`
- A cache directory is used to avoid repeated downloads.
- **Pipeline outputs:**
- `retrieved`: up to 5 retrieved items (PIL image, label, similarity, filename)
- `gen_i2i`: up to 5 generated img2img images
- `gen_t2i`: up to 5 generated txt2img images
- `info`: summary dictionary (prompt, counts, steps/strength, dataset id, etc.)
- **Key takeaway:**
- Part 4 combines **retrieval (real examples from the dataset)** with **generation (new synthetic variants)** in one workflow, and is modular/UI-ready for Part 5 (Gradio sliders + galleries).
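Two of the safeguards above can be sketched in isolation: the safe-step rule (Diffusers img2img effectively runs about `int(steps * strength)` denoising steps and fails when that rounds to 0) and the dot-product retrieval over the L2-normalized index. Function names are ours; the rule's exact form in `pipeline.py` may differ:

```python
import math
import numpy as np

def safe_steps(steps, strength):
    """Bump the step count so img2img keeps at least one effective step
    even at low strength, avoiding the '0 effective steps' crash."""
    return max(steps, math.ceil(1.0 / strength))

def retrieve_top_k(query_vec, index, k=5):
    """Cosine retrieval over an L2-normalized index: similarity is a plain
    dot product, and argsort yields the k best (index, score) pairs."""
    sims = index @ query_vec
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]
```

In the app, `index` would be the rows of `best_embeddings.npy` and the returned indices would be mapped to filenames and labels via `best_metadata.csv`.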
---
## Part 5 β€” Application (HF Space with Gradio)
- **Goal:** Deploy an interactive application that demonstrates the full workflow:
  **Upload image + prompt β†’ retrieve similar examples β†’ generate new variants**.
  This turns the pipeline from Part 4 into a user-facing product-like demo.
- **Platform:** Hugging Face **Spaces** using **Gradio** (`app.py` as the entry point).
- **UI Inputs (user controls):**
- **Image upload**: user provides a satellite patch (PIL image).
- **Prompt textbox**: user writes the prompt (required for generation).
- **Sliders (0–5)**:
- `k_retrieve`: number of retrieved dataset images (0–5)
- `n_i2i`: number of img2img generated images (0–5)
- `n_t2i`: number of txt2img generated images (0–5)
- **Generation settings**:
- `strength_i2i`: controls how close img2img stays to the input (lower = closer)
- `steps`: generation steps (1–2 recommended for sd-turbo)
- `gen_size`: output size (384 or 512)
- `seed`: reproducibility
- **Backend logic (connected to Part 4):**
- `app.py` calls `run_search_and_generate(...)` from `pipeline.py`.
- The pipeline:
- Embeds the uploaded image (DINOv2-Small)
- Retrieves Top-K similar images from the embedding index (`best_embeddings.npy` + `best_metadata.csv`)
- Generates new images using `stabilityai/sd-turbo` with:
- **img2img** conditioned on the uploaded image + prompt
- **txt2img** conditioned on the prompt only
- **Outputs shown to the user:**
- **Gallery 1 (Retrieved from dataset):** Top-K nearest neighbors with labels + cosine similarity scores.
- **Gallery 2 (Generated img2img):** New image variants close to the uploaded input.
- **Gallery 3 (Generated txt2img):** New images generated from the prompt.
- **Summary panel:** displays the chosen parameters and pipeline metadata (counts, steps, strength, dataset id, etc.).
- **Course requirement: read directly from HF dataset repo**
- Dataset images are loaded at runtime using `hf_hub_download` from:
- `LevyJonas/sat_land_patches`
- A local cache is used in the Space to avoid repeated downloads.
- **Deployment notes:**
- For practical generation speed, the Space should run on **GPU** hardware.
- Embedding files (`best_embeddings.npy`, `best_metadata.csv`) are stored in the Space repo so the app can start instantly.
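The runtime dataset access can be sketched as below. This assumes `LevyJonas/sat_land_patches` is a dataset repo with the `images/<label>/...` layout from Part 1; the `huggingface_hub` import is deferred because it needs network access:

```python
DATASET_REPO = "LevyJonas/sat_land_patches"
CACHE_DIR = "hf_cache"  # repeated requests hit local disk, not the Hub

def repo_path(label, filename):
    """Path of one patch inside the dataset repo (images/<label>/<file>)."""
    return f"images/{label}/{filename}"

def fetch_patch(label, filename):
    """Download one dataset image from the Hub, or reuse the cached copy."""
    from huggingface_hub import hf_hub_download  # deferred: network-dependent
    return hf_hub_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",
        filename=repo_path(label, filename),
        cache_dir=CACHE_DIR,
    )
```

`hf_hub_download` already deduplicates by revision, so pointing it at a fixed `cache_dir` is enough to make Space restarts cheap after the first run.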