---
title: Satellite Patch Retrieve + Generate
emoji: 🛰️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.50.0
python_version: '3.10'
app_file: app.py
pinned: false
---

Final Project Summary (Satellite Patch Retrieval + Generation)

This document summarizes Parts 1–5 of our project: dataset generation, EDA, embeddings, the end-to-end pipeline, and the Gradio app.


Part 1 β€” Synthetic Data Generation (with key terms)

In Part 1, we built a synthetic satellite-like image dataset using a pre-trained Hugging Face generative model. We used stabilityai/sd-turbo (a fast Stable Diffusion β€œTurbo” model) to generate 30 land-type classes with 50 images per class (1500 images total). Each label had its own prompt (e.g., forest, water, urban, runway), and we used a negative prompt to reduce unwanted artifacts such as text, logos, or cartoonish styles. The images were saved in a clean folder structure (images/<label>/...jpg) and documented in metadata.csv (id, filename, label, prompt, seed, model_id) so later parts (EDA, embeddings, and the app) could load and reuse the dataset easily.

Key terms used

  • Diffusers: Hugging Face library providing ready-to-use pipelines for diffusion-based generative models (e.g., Stable Diffusion). It loads the model and generates images from prompts.
  • Transformers: Hugging Face library for Transformer-based models across text and vision. Used both as a dependency and later for embedding models (CLIP/ViT/DINOv2).
  • Tokenizers: Converts text prompts into tokens/IDs the model can process; required for text-conditioned models (e.g., text-to-image).
  • Pillow (PIL): Python imaging library for loading/manipulating/saving images (JPG/PNG), resizing, and file I/O.
  • stabilityai/sd-turbo: Chosen because it is optimized for speed and can generate strong results with 1–2 inference steps, enabling fast large-scale dataset creation.
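The Part 1 generation loop can be sketched as follows. This is a minimal, hedged sketch: the three prompts shown are illustrative stand-ins for the real 30-label prompt set, and `generate_dataset` (which needs a GPU and the diffusers library) is only indicative of how the pipeline was driven; the helper `metadata_row` mirrors the documented `metadata.csv` schema.

```python
import csv
from pathlib import Path

# Illustrative prompts for three of the 30 labels (the real prompt set is larger).
PROMPTS = {
    "forest": "satellite image of dense forest, top-down aerial view, realistic",
    "water": "satellite image of open water, top-down aerial view, realistic",
    "urban": "satellite image of a dense urban area, top-down aerial view, realistic",
}
NEGATIVE = "text, watermark, logo, cartoon, illustration, border"

def metadata_row(idx, label, seed, model_id="stabilityai/sd-turbo"):
    """One row of metadata.csv, matching the documented schema."""
    return {
        "id": idx,
        "filename": f"images/{label}/{label}_{seed:04d}.jpg",
        "label": label,
        "prompt": PROMPTS[label],
        "seed": seed,
        "model_id": model_id,
    }

def generate_dataset(per_label=50, size=384):
    """Generate images with sd-turbo and write metadata.csv (needs GPU + diffusers)."""
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sd-turbo", torch_dtype=torch.float16
    ).to("cuda")
    rows = []
    for label in PROMPTS:
        for seed in range(per_label):
            row = metadata_row(len(rows), label, seed)
            image = pipe(
                prompt=row["prompt"],
                negative_prompt=NEGATIVE,  # only takes effect when guidance_scale > 1
                num_inference_steps=2,     # sd-turbo: 1-2 steps suffice
                height=size, width=size,
                generator=torch.Generator("cuda").manual_seed(seed),
            ).images[0]
            path = Path(row["filename"])
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(path)
            rows.append(row)
    with open("metadata.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```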

Part 2 β€” Exploratory Data Analysis (EDA)

  • Loaded and inspected metadata: Read metadata.csv (1500 rows) with expected columns (id, filename, label, prompt, seed, model_id) and confirmed 30 classes.
  • Integrity validation: Verified 0 missing image files, 0 duplicate ids, 0 duplicate filenames, and 0 duplicate (label, seed) pairs.
  • Class balance check: Confirmed a perfectly balanced dataset with 50 images per label (min/max = 50/50).
  • Image consistency: Confirmed all images have the same resolution (384Γ—384).
  • Global image statistics: Computed per-image RGB mean/std, brightness (luminance proxy), and a sharpness proxy (gradient-based), then reviewed distributions and summaries.
  • Outlier analysis: Observed meaningful extremes consistent with labels:
    • darkest samples mainly DenseForest
    • brightest samples mainly SnowIce
    • lowest-sharpness samples often from smoother-texture classes like Grassland / DesertSand / SeaOpenWater
  • Class-level insights: Aggregated statistics by label (brightness/color tendencies) and used a simple PCA projection to visualize similarity/overlap between visually related classes.
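The per-image statistics above could be computed along these lines; this is a sketch under stated assumptions (Rec. 601 luminance weights for the brightness proxy, mean gradient magnitude for the sharpness proxy), not necessarily the exact formulas used in the notebook.

```python
import numpy as np

def image_stats(rgb):
    """Per-image stats for a uint8 HxWx3 array: RGB mean/std,
    a luminance-based brightness proxy, and a gradient-based sharpness proxy."""
    x = rgb.astype(np.float32) / 255.0
    channel_mean = x.mean(axis=(0, 1))
    channel_std = x.std(axis=(0, 1))
    # Brightness proxy: Rec. 601 luminance weights.
    lum = 0.299 * x[..., 0] + 0.587 * x[..., 1] + 0.114 * x[..., 2]
    # Sharpness proxy: mean gradient magnitude of the luminance channel.
    gy, gx = np.gradient(lum)
    return {
        "mean": channel_mean,
        "std": channel_std,
        "brightness": float(lum.mean()),
        "sharpness": float(np.hypot(gx, gy).mean()),
    }
```

Smooth-texture classes (e.g., Grassland, SeaOpenWater) score low on the sharpness proxy because their luminance gradients are small almost everywhere.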

Part 3 β€” Embeddings (Similarity Search)

  • Goal: Convert each satellite patch image into a compact vector (embedding) to enable similarity search / retrieval and support the later app pipeline.
  • Models tested (HF backbones):
    • CLIP ViT-B/32 (openai/clip-vit-base-patch32)
    • ViT-Base (google/vit-base-patch16-224-in21k)
    • DINOv2-Small (facebook/dinov2-small)
  • Embedding extraction:
    • Used the CLS token from last_hidden_state as a single global image representation (standard for ViT-style models).
    • Applied L2-normalization so cosine similarity becomes a fast dot product (stable and efficient retrieval).
  • Evaluation metric (retrieval-focused): label_agree@5 and label_agree@10
    • For each image, retrieve its top-k nearest neighbors (cosine similarity).
    • Measure the fraction of neighbors with the same label as the query.
    • Average across all 1,500 images.
    • This measures retrieval quality directly (not classifier accuracy).
  • Key results (quality + efficiency):
    • DINOv2-Small performed best: agree@5 β‰ˆ 0.9247, agree@10 β‰ˆ 0.9006
    • Also produced smaller embeddings (384-dim) than CLIP/ViT (768-dim), reducing storage and improving retrieval efficiency.
    • Selected DINOv2-Small as the optimal embedding model.
  • Saved outputs (reusable):
    • Embeddings: *_embeddings.npy (NumPy)
    • Metadata mapping: *_metadata.csv (CSV)
    • Comparison table: embedding_model_comparison.csv (CSV)
  • Qualitative validation:
    • PCA scatter plot to visualize clustering in 2D (sanity check for overlap/separability).
    • Nearest-neighbor gallery to confirm retrieved results make sense visually and align with labels.
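The extraction and metric described above can be sketched as follows. `embed_images` assumes the standard transformers `AutoModel` interface (it downloads the model when called, so it is shown only as a sketch); `l2_normalize` and `label_agree_at_k` are self-contained and match the CLS-token / cosine-similarity setup described.

```python
import numpy as np

def embed_images(images, model_name="facebook/dinov2-small"):
    """CLS-token embeddings via transformers (downloads the model on first call)."""
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    with torch.no_grad():
        inputs = processor(images=images, return_tensors="pt")
        cls = model(**inputs).last_hidden_state[:, 0]  # CLS token = global image vector
    return cls.numpy()

def l2_normalize(emb, eps=1e-12):
    """Row-wise L2 normalization so cosine similarity becomes a dot product."""
    return emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), eps)

def label_agree_at_k(emb, labels, k=5):
    """Mean fraction of top-k neighbors (excluding self) sharing the query's label."""
    emb = l2_normalize(emb)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)          # exclude self-matches
    labels = np.asarray(labels)
    topk = np.argsort(-sims, axis=1)[:, :k]
    return float((labels[topk] == labels[:, None]).mean())
```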

Part 4 β€” End-to-End Pipeline (Retrieve + Generate)

  • Goal: Build a production-style Input β†’ Processing β†’ Output pipeline that can be plugged directly into an app.
    The user provides a satellite patch image plus a text prompt, and the system returns:

    1. Most similar images from the dataset (retrieval)
    2. Newly generated images via image-to-image and text-to-image, with user-controlled counts (0–5 each).
  • System architecture: two engines working together

    • Retrieval engine (embedding-based):
      • Embed the user image with DINOv2-Small (best model from Part 3).
      • Compare the query embedding against the stored embedding index:
        • best_embeddings.npy (vectors) + best_metadata.csv (filename/label mapping).
      • Compute similarity using cosine similarity (dot product due to L2 normalization).
      • Return Top-K results (K ≀ 5), each including image, label, similarity score, and filename.
    • Generation engine (Diffusers):
      • Use stabilityai/sd-turbo for fast generation (works well with 1–2 steps).
      • Support two generation modes:
        • img2img: generates variants that stay visually close to the user image, guided by the prompt.
        • txt2img: generates new images purely from the prompt.
      • User controls how many images to generate (0–5 each).
  • Pipeline inputs:

    • user_img β€” user-provided PIL image
    • user_prompt β€” user-provided prompt (required for generation)
    • k_retrieve β€” number of retrieved images (0–5)
    • n_i2i, n_t2i β€” generated image counts (0–5 each)
    • strength_i2i β€” img2img closeness (lower = closer to input)
    • steps β€” generation steps (sd-turbo typically 1–2)
    • gen_size β€” output size (e.g., 384 or 512)
    • seed β€” reproducibility
  • Stability safeguards (app-ready):

    • Hard caps on counts (0–5) for retrieval and generation to prevent overload.
    • A safe-step rule for img2img to avoid the β€œ0 effective steps” Diffusers crash when strength is low.
    • GPU optimizations when available: fp16 + torch.autocast for speed.
  • Reading the dataset directly from HF (course requirement):

    • Instead of local files, dataset images are loaded using hf_hub_download from:
      • LevyJonas/sat_land_patches
    • A cache directory is used to avoid repeated downloads.
  • Pipeline outputs:

    • retrieved: up to 5 retrieved items (PIL image, label, similarity, filename)
    • gen_i2i: up to 5 generated img2img images
    • gen_t2i: up to 5 generated txt2img images
    • info: summary dictionary (prompt, counts, steps/strength, dataset id, etc.)
  • Key takeaway:

    • Part 4 combines retrieval (real examples from the dataset) with generation (new synthetic variants) in one workflow, and is modular/UI-ready for Part 5 (Gradio sliders + galleries).
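Two of the pieces above are small enough to sketch directly: Top-K retrieval over the L2-normalized index, and the safe-step rule. The sketch assumes diffusers img2img runs roughly `int(steps * strength)` denoising steps, which is why low strength can yield zero effective steps; function names here are illustrative, not the exact ones in pipeline.py.

```python
import numpy as np

def retrieve_top_k(query_emb, index_emb, k=5):
    """Top-K by cosine similarity; both sides assumed L2-normalized,
    so similarity is a plain dot product. K is hard-capped at 5."""
    k = max(0, min(k, 5))
    sims = index_emb @ query_emb
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

def safe_steps(steps, strength):
    """img2img runs about int(steps * strength) denoising steps; raise `steps`
    so at least one effective step remains and Diffusers does not crash."""
    if strength <= 0:
        return steps                          # no denoising requested
    min_steps = int(np.ceil(1.0 / strength))  # smallest steps with >= 1 effective step
    return max(steps, min_steps)
```

For example, `steps=2, strength=0.3` would give `int(0.6) = 0` effective steps, so the rule bumps `steps` to 4.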

Part 5 β€” Application (HF Space with Gradio)

  • Goal: Deploy an interactive application that demonstrates the full workflow: Upload image + prompt β†’ retrieve similar examples β†’ generate new variants. This turns the pipeline from Part 4 into a user-facing product-like demo.

  • Platform: Hugging Face Spaces using Gradio (app.py as the entry point).

  • UI Inputs (user controls):

    • Image upload: user provides a satellite patch (PIL image).
    • Prompt textbox: user writes the prompt (required for generation).
    • Sliders (0–5):
      • k_retrieve: number of retrieved dataset images (0–5)
      • n_i2i: number of img2img generated images (0–5)
      • n_t2i: number of txt2img generated images (0–5)
    • Generation settings:
      • strength_i2i: controls how close img2img stays to the input (lower = closer)
      • steps: generation steps (1–2 recommended for sd-turbo)
      • gen_size: output size (384 or 512)
      • seed: reproducibility
  • Backend logic (connected to Part 4):

    • app.py calls run_search_and_generate(...) from pipeline.py.
    • The pipeline:
      • Embeds the uploaded image (DINOv2-Small)
      • Retrieves Top-K similar images from the embedding index (best_embeddings.npy + best_metadata.csv)
      • Generates new images using stabilityai/sd-turbo with:
        • img2img conditioned on the uploaded image + prompt
        • txt2img conditioned on the prompt only
  • Outputs shown to the user:

    • Gallery 1 (Retrieved from dataset): Top-K nearest neighbors with labels + cosine similarity scores.
    • Gallery 2 (Generated img2img): New image variants close to the uploaded input.
    • Gallery 3 (Generated txt2img): New images generated from the prompt.
    • Summary panel: displays the chosen parameters and pipeline metadata (counts, steps, strength, dataset id, etc.).
  • Course requirement: read directly from HF dataset repo

    • Dataset images are loaded at runtime using hf_hub_download from:
      • LevyJonas/sat_land_patches
    • A local cache is used in the Space to avoid repeated downloads.
  • Deployment notes:

    • For practical generation speed, the Space should run on GPU hardware.
    • Embedding files (best_embeddings.npy, best_metadata.csv) are stored in the Space repo so the app can start instantly.
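A minimal app.py skeleton along these lines would wire the UI to the pipeline. This is a hedged sketch: it assumes `run_search_and_generate` returns a dict with the `retrieved` / `gen_i2i` / `gen_t2i` / `info` keys listed in Part 4 and that retrieved items expose image/label/similarity fields; the real app may structure this differently. The gradio import is deferred into `build_demo` so the slider spec is importable on its own.

```python
SLIDERS = {
    # name: (min, max, default) -- the 0-5 hard caps from the pipeline
    "k_retrieve": (0, 5, 3),
    "n_i2i": (0, 5, 2),
    "n_t2i": (0, 5, 2),
}

def build_demo(run_search_and_generate):
    """Wire the Part 4 pipeline function into a Gradio Blocks UI (requires gradio)."""
    import gradio as gr

    def run(img, prompt, k, n_i2i, n_t2i, strength, steps, size, seed):
        out = run_search_and_generate(
            user_img=img, user_prompt=prompt, k_retrieve=int(k),
            n_i2i=int(n_i2i), n_t2i=int(n_t2i), strength_i2i=float(strength),
            steps=int(steps), gen_size=int(size), seed=int(seed),
        )
        retrieved = [(item["image"], f'{item["label"]} ({item["similarity"]:.3f})')
                     for item in out["retrieved"]]
        return retrieved, out["gen_i2i"], out["gen_t2i"], out["info"]

    with gr.Blocks(title="Satellite Patch Retrieve + Generate") as demo:
        img = gr.Image(type="pil", label="Satellite patch")
        prompt = gr.Textbox(label="Prompt (required for generation)")
        sliders = [gr.Slider(lo, hi, value=v, step=1, label=name)
                   for name, (lo, hi, v) in SLIDERS.items()]
        strength = gr.Slider(0.1, 0.9, value=0.5, label="strength_i2i (lower = closer)")
        steps = gr.Slider(1, 4, value=2, step=1, label="steps")
        size = gr.Radio([384, 512], value=384, label="gen_size")
        seed = gr.Number(value=0, label="seed", precision=0)
        g1 = gr.Gallery(label="Retrieved from dataset")
        g2 = gr.Gallery(label="Generated (img2img)")
        g3 = gr.Gallery(label="Generated (txt2img)")
        info = gr.JSON(label="Summary")
        gr.Button("Run").click(
            run,
            inputs=[img, prompt, *sliders, strength, steps, size, seed],
            outputs=[g1, g2, g3, info],
        )
    return demo
```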