mnm-matin committed
Commit 4dfa1ed · verified · 1 Parent(s): d9af93d

Update HyperView main demo to iNat24 geometry showcase

Files changed (5):
  1. .dockerignore +3 -0
  2. Dockerfile +5 -3
  3. README.md +39 -39
  4. __pycache__/demo.cpython-312.pyc +0 -0
  5. demo.py +165 -30
.dockerignore CHANGED
```diff
@@ -11,5 +11,8 @@ venv
 .mypy_cache
 .pytest_cache
 
+# Local runtime artifacts
+demo_data
+
 # Misc
 .DS_Store
```
Dockerfile CHANGED
```diff
@@ -21,10 +21,12 @@ WORKDIR $HOME/app
 
 RUN pip install --upgrade pip
 
-ARG HYPERVIEW_VERSION=0.3.1
-ARG HYPER_MODELS_VERSION=0.1.0
+ARG HYPERVIEW_VERSION=0.4.2
+ARG HYPER_MODELS_VERSION=0.2.0
 
-# Pin package versions so Docker cache cannot silently hold an older PyPI release.
+# Install CPU-only PyTorch first so the Space does not pull the default CUDA bundle,
+# then pin released HyperView packages so Docker cache cannot hold an older release.
+RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN pip install "hyperview==${HYPERVIEW_VERSION}" && python -c "import hyperview; print('hyperview', hyperview.__version__)"
 RUN pip install "hyper-models==${HYPER_MODELS_VERSION}" && python -c "import hyper_models; print('hyper_models', hyper_models.__version__)"
 
```
README.md CHANGED
```diff
@@ -8,56 +8,56 @@ app_port: 7860
 pinned: false
 ---
 
-# HyperView Imagenette (CLIP + HyCoCLIP)
-
-This folder is the simplest copyable HyperView Space example in this repo.
-It keeps all dataset-specific settings in the constants block at the top of
-[demo.py](demo.py), so a coding agent can usually adapt it by editing one file.
-
-This example runs HyperView with:
-
-- CLIP embeddings (`openai/clip-vit-base-patch32`) for Euclidean layout
-- HyCoCLIP embeddings (`hycoclip-vit-s`) for Poincaré layout
-
-The Docker image installs released HyperView packages from PyPI. The dataset,
-embeddings, and layouts are computed at first startup.
-
-## Reuse This Template
-
-When you copy this folder for your own dataset, change these parts first:
-
-1. Edit the constants block in [demo.py](demo.py).
-2. Rename the copied Space from `HyperView` to your own project name such as `yourproject-HyperView` or `HyperView-yourproject`.
-3. Update this README frontmatter, title, and H1.
-4. Point a deploy workflow at your new folder.
-
-This starter currently installs `hyperview==0.3.1` and `hyper-models==0.1.0`.
-
-The defaults in [demo.py](demo.py) are:
-
-- Hugging Face dataset: `Multimodal-Fatima/Imagenette_validation`
-- Split: `validation`
-- Image field: `image`
-- Label field: `label`
-- Sample count: `300`
-- Layouts: CLIP + Euclidean, HyCoCLIP + Poincaré
-
-If you only want one model in your own Space, keep a single entry in
-`EMBEDDING_LAYOUTS` and delete the rest.
-
-When contributing your own Space back to this repository, add a row to the
-community table in the root `README.md` and include your Hugging Face Space ID
-in the pull request description.
-
-## Build Model
-
-The Dockerfile runs `build_dataset()` during image build. That means:
-
-- the first expensive download/embedding pass happens at build time
-- the runtime container mostly just launches HyperView
-- there is no extra runtime configuration path to keep in sync
-
-## Deploy source
-
-This folder is synchronized to Hugging Face Spaces by GitHub Actions from the
-`hyperview-spaces` deployment repository.
+# HyperView - iNat24 Tiny Geometry Showcase
+
+This is the main HyperView demo Space. It shows the same taxonomy-backed image
+sample through multiple geometric views:
+
+- CLIP (`openai/clip-vit-base-patch32`) in Euclidean 3D
+- CLIP (`openai/clip-vit-base-patch32`) in spherical 3D
+- HyCoCLIP (`hycoclip-vit-s`) in Poincare 2D
+
+The sample is drawn from `evendrow/inat24_tiny`, a compact iNaturalist 2024
+subset with 1,000 images, 100 species, and taxonomy metadata. The visible label
+is the broad `supercategory`, while sample metadata keeps common name, species,
+kingdom, phylum, class, order, family, genus, location fields, license, and
+rights holder.
+
+The Docker image installs released packages from PyPI:
+
+- `hyperview==0.4.2`
+- `hyper-models==0.2.0`
+
+## Dataset
+
+The default stratified sample contains 300 images:
+
+| Label | Samples |
+| --- | ---: |
+| plants | 50 |
+| insects | 50 |
+| birds | 42 |
+| arachnids | 36 |
+| amphibians | 30 |
+| reptiles | 26 |
+| fungi | 26 |
+| mammals | 20 |
+| fish | 10 |
+| mollusks | 10 |
+
+This keeps the demo small enough for Hugging Face CPU Spaces while preserving a
+real biological hierarchy for geometry comparison.
+
+## Reuse This Template
+
+When copying this folder for another dataset:
+
+1. Edit the constants block at the top of [demo.py](demo.py).
+2. Update the stratification labels and target counts.
+3. Rename the copied Space from `HyperView` to your project name.
+4. Point a deploy workflow at the new folder.
+
+## Deploy Source
+
+This folder is synchronized to `hyper3labs/HyperView` by GitHub Actions from
+the `hyperview-spaces` deployment repository.
```
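The quota table in the new README states a 300-image stratified sample; a quick stand-alone sketch (quota values copied from the diff) checks that the per-supercategory counts actually sum to 300:

```python
# Quotas copied from the README table / TARGET_SUPERCATEGORY_COUNTS in demo.py.
TARGET_SUPERCATEGORY_COUNTS = {
    "plants": 50, "insects": 50, "birds": 42, "arachnids": 36,
    "amphibians": 30, "reptiles": 26, "fungi": 26, "mammals": 20,
    "fish": 10, "mollusks": 10,
}

# SAMPLE_COUNT in demo.py is derived the same way, so the table and the
# code cannot drift apart.
total = sum(TARGET_SUPERCATEGORY_COUNTS.values())
print(total)  # 300
```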
__pycache__/demo.cpython-312.pyc ADDED
Binary file (8.82 kB).
demo.py CHANGED
```diff
@@ -1,74 +1,209 @@
 #!/usr/bin/env python
-"""HyperView Hugging Face Space template example.
-
-Copy this folder, then edit the constants below for your dataset.
-"""
+"""HyperView main Hugging Face Space geometry demo."""
 
 from __future__ import annotations
 
+import os
+import re
+from collections import Counter
+from pathlib import Path
+
+from datasets import load_dataset
+from PIL import Image, ImageOps
+
 import hyperview as hv
 
-# Edit this block when you reuse the template for another Space.
 SPACE_HOST = "0.0.0.0"
 SPACE_PORT = 7860
 
-DATASET_NAME = "imagenette_clip_hycoclip"
-HF_DATASET = "Multimodal-Fatima/Imagenette_validation"
-HF_SPLIT = "validation"
-HF_IMAGE_KEY = "image"
-HF_LABEL_KEY = "label"
-SAMPLE_COUNT = 300
+DATASET_NAME = "inat24_tiny_geometry_showcase"
+HF_DATASET = "evendrow/inat24_tiny"
+HF_SPLIT = "train"
 SAMPLE_SEED = 42
 
-# Keep one or more entries here. Most reuses only need one model/layout pair.
+TARGET_SUPERCATEGORY_COUNTS = {
+    "plants": 50,
+    "insects": 50,
+    "birds": 42,
+    "arachnids": 36,
+    "amphibians": 30,
+    "reptiles": 26,
+    "fungi": 26,
+    "mammals": 20,
+    "fish": 10,
+    "mollusks": 10,
+}
+SAMPLE_COUNT = sum(TARGET_SUPERCATEGORY_COUNTS.values())
+IMAGE_MAX_SIZE = (768, 768)
+
 EMBEDDING_LAYOUTS = [
     {
         "name": "CLIP",
         "provider": "embed-anything",
         "model": "openai/clip-vit-base-patch32",
-        "layout": "euclidean",
+        "layouts": ["euclidean:3d", "spherical"],
     },
     {
         "name": "HyCoCLIP",
         "provider": "hyper-models",
         "model": "hycoclip-vit-s",
-        "layout": "poincare",
+        "layouts": ["poincare"],
     },
 ]
 
+METADATA_FIELDS = (
+    "common_name",
+    "id",
+    "width",
+    "height",
+    "license",
+    "rights_holder",
+    "date",
+    "latitude",
+    "longitude",
+    "location_uncertainty",
+    "category_id",
+    "supercategory",
+    "kingdom",
+    "phylum",
+    "class",
+    "order",
+    "family",
+    "genus",
+    "specific_epithet",
+)
+
+
+def media_root() -> Path:
+    root = Path(os.environ.get("HYPERVIEW_MEDIA_DIR", "./demo_data/media"))
+    path = root / DATASET_NAME
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+def safe_sample_id(row: dict, index: int) -> str:
+    raw_id = row.get("id", index)
+    normalized = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(raw_id)).strip("_")
+    return f"inat24_{normalized}"
+
+
+def species_name(row: dict, features) -> str:
+    label = row.get("label")
+    if label is None:
+        return "unknown"
+    return features["label"].int2str(label)
+
+
+def save_image(row: dict, destination: Path) -> None:
+    if destination.exists():
+        return
+
+    image = row["image"]
+    if not isinstance(image, Image.Image):
+        raise TypeError(f"Expected a PIL image, got {type(image)!r}")
+
+    image = ImageOps.exif_transpose(image).convert("RGB")
+    image.thumbnail(IMAGE_MAX_SIZE, Image.Resampling.LANCZOS)
+    image.save(destination, format="JPEG", quality=90, optimize=True)
+
+
+def existing_label_counts(dataset: hv.Dataset) -> Counter[str]:
+    return Counter(sample.label for sample in dataset.samples if sample.label)
+
+
+def target_reached(counts: Counter[str]) -> bool:
+    return all(
+        counts[group] >= quota
+        for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+    )
+
+
+def add_inat24_samples(dataset: hv.Dataset) -> None:
+    counts = existing_label_counts(dataset)
+    if target_reached(counts):
+        print(f"Dataset already has the target stratified sample ({len(dataset)} samples).")
+        return
+
+    existing_ids = {sample.id for sample in dataset.samples}
+    print(
+        f"Building a stratified {SAMPLE_COUNT}-sample iNat24 Tiny subset from {HF_DATASET}...",
+        flush=True,
+    )
+    print(f"Current counts: {dict(counts)}", flush=True)
+
+    source = load_dataset(HF_DATASET, split=HF_SPLIT)
+    source = source.shuffle(seed=SAMPLE_SEED)
+    root = media_root()
+
+    for index, row in enumerate(source):
+        group = row.get("supercategory")
+        if group not in TARGET_SUPERCATEGORY_COUNTS:
+            continue
+        if counts[group] >= TARGET_SUPERCATEGORY_COUNTS[group]:
+            continue
+
+        sample_id = safe_sample_id(row, index)
+        if sample_id in existing_ids:
+            continue
+
+        image_path = root / f"{sample_id}.jpg"
+        save_image(row, image_path)
+
+        metadata = {field: row.get(field) for field in METADATA_FIELDS}
+        metadata["scientific_name"] = species_name(row, source.features)
+        metadata["source_dataset"] = HF_DATASET
+        metadata["sample_strategy"] = "stratified_by_inat24_supercategory"
+
+        dataset.add_image(
+            str(image_path),
+            label=group,
+            metadata=metadata,
+            sample_id=sample_id,
+        )
+        counts[group] += 1
+        existing_ids.add(sample_id)
+
+        loaded = sum(
+            min(counts[group], quota)
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+        )
+        if loaded == 1 or loaded % 25 == 0 or target_reached(counts):
+            print(f"Loaded {loaded}/{SAMPLE_COUNT} samples: {dict(counts)}", flush=True)
+
+        if target_reached(counts):
+            break
+
+    if not target_reached(counts):
+        missing = {
+            group: quota - counts[group]
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+            if counts[group] < quota
+        }
+        raise RuntimeError(f"Could not build the target iNat24 Tiny sample. Missing: {missing}.")
+
 
 def build_dataset() -> hv.Dataset:
     dataset = hv.Dataset(DATASET_NAME)
-
-    if len(dataset) == 0:
-        print(f"Loading {SAMPLE_COUNT} samples from {HF_DATASET} ({HF_SPLIT})...")
-        dataset.add_from_huggingface(
-            HF_DATASET,
-            split=HF_SPLIT,
-            image_key=HF_IMAGE_KEY,
-            label_key=HF_LABEL_KEY,
-            max_samples=SAMPLE_COUNT,
-            shuffle=True,
-            seed=SAMPLE_SEED,
-        )
+    add_inat24_samples(dataset)
 
     for embedding in EMBEDDING_LAYOUTS:
-        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...")
+        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...", flush=True)
         space_key = dataset.compute_embeddings(
             model=embedding["model"],
             provider=embedding["provider"],
             show_progress=True,
         )
 
-        print(f"Ensuring {embedding['layout']} layout...")
-        dataset.compute_visualization(space_key=space_key, layout=embedding["layout"])
+        for layout in embedding["layouts"]:
+            print(f"Ensuring {embedding['name']} {layout} layout...", flush=True)
+            dataset.compute_visualization(space_key=space_key, layout=layout)
 
     return dataset
 
 
 def main() -> None:
     dataset = build_dataset()
-    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}")
+    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}", flush=True)
     hv.launch(dataset, host=SPACE_HOST, port=SPACE_PORT, open_browser=False)
 
```
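The core of the new `add_inat24_samples` is a quota-filling pass over a shuffled stream: rows outside the quota table or in an already-full group are skipped, and scanning stops as soon as every quota is met. A minimal stand-alone sketch of that control flow (toy rows and quotas standing in for the iNat24 stream, no Hugging Face or HyperView dependencies) behaves the same way:

```python
from collections import Counter

# Toy quotas and a toy "shuffled" stream standing in for the iNat24 rows.
QUOTAS = {"birds": 2, "fungi": 1}
rows = [
    {"id": 1, "supercategory": "birds"},
    {"id": 2, "supercategory": "mammals"},  # not in the quota table: skipped
    {"id": 3, "supercategory": "birds"},
    {"id": 4, "supercategory": "birds"},    # birds quota already full: skipped
    {"id": 5, "supercategory": "fungi"},
]

def target_reached(counts: Counter) -> bool:
    # Mirrors demo.py's target_reached: every group must hit its quota.
    return all(counts[group] >= quota for group, quota in QUOTAS.items())

counts: Counter = Counter()
picked = []
for row in rows:
    group = row["supercategory"]
    if group not in QUOTAS or counts[group] >= QUOTAS[group]:
        continue
    picked.append(row["id"])
    counts[group] += 1
    if target_reached(counts):
        break  # stop scanning as soon as all quotas are met

print(picked)  # [1, 3, 5]
```

The same early-exit structure is why demo.py can raise a `RuntimeError` listing the missing groups: if the loop finishes without `target_reached` becoming true, the source stream simply did not contain enough rows for some supercategory.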