Update HyperView main demo to iNat24 geometry showcase
- .dockerignore +3 -0
- Dockerfile +5 -3
- README.md +39 -39
- __pycache__/demo.cpython-312.pyc +0 -0
- demo.py +165 -30
.dockerignore
CHANGED

@@ -11,5 +11,8 @@ venv
 .mypy_cache
 .pytest_cache
 
+# Local runtime artifacts
+demo_data
+
 # Misc
 .DS_Store
Dockerfile
CHANGED

@@ -21,10 +21,12 @@ WORKDIR $HOME/app
 
 RUN pip install --upgrade pip
 
-ARG HYPERVIEW_VERSION=0.
-ARG HYPER_MODELS_VERSION=0.
+ARG HYPERVIEW_VERSION=0.4.2
+ARG HYPER_MODELS_VERSION=0.2.0
 
-#
+# Install CPU-only PyTorch first so the Space does not pull the default CUDA bundle,
+# then pin released HyperView packages so Docker cache cannot hold an older release.
+RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN pip install "hyperview==${HYPERVIEW_VERSION}" && python -c "import hyperview; print('hyperview', hyperview.__version__)"
 RUN pip install "hyper-models==${HYPER_MODELS_VERSION}" && python -c "import hyper_models; print('hyper_models', hyper_models.__version__)"
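The Dockerfile pins exact versions and immediately verifies the import. As an illustrative sketch only (not part of this commit; function names are hypothetical), the same pin-and-verify idea can be expressed with the standard library's `importlib.metadata`:

```python
from importlib import metadata


def parse_pin(spec: str) -> tuple[str, str]:
    """Split an exact 'name==version' pin into (name, version)."""
    name, _, version = spec.partition("==")
    if not name or not version:
        raise ValueError(f"not an exact pin: {spec!r}")
    return name, version


def pin_matches_installed(spec: str) -> bool:
    """True only if a distribution with exactly the pinned version is installed."""
    name, wanted = parse_pin(spec)
    try:
        return metadata.version(name) == wanted
    except metadata.PackageNotFoundError:
        return False


print(parse_pin("hyperview==0.4.2"))  # ('hyperview', '0.4.2')
```

This mirrors why the Dockerfile runs `python -c "import ..."` right after `pip install`: a pinned spec that silently resolved to a cached older build fails fast at image-build time instead of at Space startup.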
README.md
CHANGED

@@ -8,56 +8,56 @@ app_port: 7860
 pinned: false
 ---
 
-# HyperView
+# HyperView - iNat24 Tiny Geometry Showcase
 
-This
-
-[demo.py](demo.py), so a coding agent can usually adapt it by editing one file.
+This is the main HyperView demo Space. It shows the same taxonomy-backed image
+sample through multiple geometric views:
 
+- CLIP (`openai/clip-vit-base-patch32`) in Euclidean 3D
+- CLIP (`openai/clip-vit-base-patch32`) in spherical 3D
+- HyCoCLIP (`hycoclip-vit-s`) in Poincare 2D
 
-The Docker image installs released
-embeddings, and layouts are computed at first startup.
+The sample is drawn from `evendrow/inat24_tiny`, a compact iNaturalist 2024
+subset with 1,000 images, 100 species, and taxonomy metadata. The visible label
+is the broad `supercategory`, while sample metadata keeps common name, species,
+kingdom, phylum, class, order, family, genus, location fields, license, and
+rights holder.
 
-When you copy this folder for your own dataset, change these parts first:
-
-1. Edit the constants block in [demo.py](demo.py).
-2. Rename the copied Space from `HyperView` to your own project name such as `yourproject-HyperView` or `HyperView-yourproject`.
-3. Update this README frontmatter, title, and H1.
-4. Point a deploy workflow at your new folder.
+The Docker image installs released packages from PyPI:
 
-The
+- `hyperview==0.4.2`
+- `hyper-models==0.2.0`
 
+## Dataset
 
+The default stratified sample contains 300 images:
 
+| Label | Samples |
+| --- | ---: |
+| plants | 50 |
+| insects | 50 |
+| birds | 42 |
+| arachnids | 36 |
+| amphibians | 30 |
+| reptiles | 26 |
+| fungi | 26 |
+| mammals | 20 |
+| fish | 10 |
+| mollusks | 10 |
 
-community table in the root `README.md` and include your Hugging Face Space ID
-in the pull request description.
+This keeps the demo small enough for Hugging Face CPU Spaces while preserving a
+real biological hierarchy for geometry comparison.
 
-## Build Model
+## Reuse This Template
 
+When copying this folder for another dataset:
 
+1. Edit the constants block at the top of [demo.py](demo.py).
+2. Update the stratification labels and target counts.
+3. Rename the copied Space from `HyperView` to your project name.
+4. Point a deploy workflow at the new folder.
 
-## Deploy
+## Deploy Source
 
-This folder is synchronized to
-`hyperview-spaces` deployment repository.
+This folder is synchronized to `hyper3labs/HyperView` by GitHub Actions from
+the `hyperview-spaces` deployment repository.
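The stratified sample the README describes can be approximated with a single quota pass over a shuffled stream of records. This is a minimal standalone sketch, assuming only that each record carries a `supercategory` field; it is not the Space's actual loader:

```python
from collections import Counter


def stratified_sample(rows, targets):
    """Take rows in stream order until every label quota is filled."""
    counts = Counter()
    picked = []
    for row in rows:
        group = row.get("supercategory")
        # Skip labels outside the plan and labels whose quota is already met.
        if group not in targets or counts[group] >= targets[group]:
            continue
        picked.append(row)
        counts[group] += 1
        if all(counts[g] >= q for g, q in targets.items()):
            break
    return picked


rows = [{"supercategory": s} for s in ["plants", "fish", "birds", "plants", "plants"]]
print(len(stratified_sample(rows, {"plants": 2, "birds": 1})))  # 3
```

With the real target table (plants 50, insects 50, ..., mollusks 10) the quotas sum to 300, which is why the default sample contains exactly 300 images.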
__pycache__/demo.cpython-312.pyc
ADDED

Binary file (8.82 kB).
demo.py
CHANGED

@@ -1,74 +1,209 @@
 #!/usr/bin/env python
-"""HyperView Hugging Face Space
-
-Copy this folder, then edit the constants below for your dataset.
-"""
+"""HyperView main Hugging Face Space geometry demo."""
 
 from __future__ import annotations
 
+import os
+import re
+from collections import Counter
+from pathlib import Path
+
+from datasets import load_dataset
+from PIL import Image, ImageOps
+
 import hyperview as hv
 
-# Edit this block when you reuse the template for another Space.
 SPACE_HOST = "0.0.0.0"
 SPACE_PORT = 7860
 
-DATASET_NAME = "
-HF_DATASET = "
-HF_SPLIT = "
-HF_IMAGE_KEY = "image"
-HF_LABEL_KEY = "label"
-SAMPLE_COUNT = 300
+DATASET_NAME = "inat24_tiny_geometry_showcase"
+HF_DATASET = "evendrow/inat24_tiny"
+HF_SPLIT = "train"
 SAMPLE_SEED = 42
 
+TARGET_SUPERCATEGORY_COUNTS = {
+    "plants": 50,
+    "insects": 50,
+    "birds": 42,
+    "arachnids": 36,
+    "amphibians": 30,
+    "reptiles": 26,
+    "fungi": 26,
+    "mammals": 20,
+    "fish": 10,
+    "mollusks": 10,
+}
+SAMPLE_COUNT = sum(TARGET_SUPERCATEGORY_COUNTS.values())
+IMAGE_MAX_SIZE = (768, 768)
+
 EMBEDDING_LAYOUTS = [
     {
         "name": "CLIP",
         "provider": "embed-anything",
         "model": "openai/clip-vit-base-patch32",
-        "
+        "layouts": ["euclidean:3d", "spherical"],
     },
     {
         "name": "HyCoCLIP",
         "provider": "hyper-models",
         "model": "hycoclip-vit-s",
-        "
+        "layouts": ["poincare"],
     },
 ]
 
+METADATA_FIELDS = (
+    "common_name",
+    "id",
+    "width",
+    "height",
+    "license",
+    "rights_holder",
+    "date",
+    "latitude",
+    "longitude",
+    "location_uncertainty",
+    "category_id",
+    "supercategory",
+    "kingdom",
+    "phylum",
+    "class",
+    "order",
+    "family",
+    "genus",
+    "specific_epithet",
+)
+
+
+def media_root() -> Path:
+    root = Path(os.environ.get("HYPERVIEW_MEDIA_DIR", "./demo_data/media"))
+    path = root / DATASET_NAME
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+def safe_sample_id(row: dict, index: int) -> str:
+    raw_id = row.get("id", index)
+    normalized = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(raw_id)).strip("_")
+    return f"inat24_{normalized}"
+
+
+def species_name(row: dict, features) -> str:
+    label = row.get("label")
+    if label is None:
+        return "unknown"
+    return features["label"].int2str(label)
+
+
+def save_image(row: dict, destination: Path) -> None:
+    if destination.exists():
+        return
+
+    image = row["image"]
+    if not isinstance(image, Image.Image):
+        raise TypeError(f"Expected a PIL image, got {type(image)!r}")
+
+    image = ImageOps.exif_transpose(image).convert("RGB")
+    image.thumbnail(IMAGE_MAX_SIZE, Image.Resampling.LANCZOS)
+    image.save(destination, format="JPEG", quality=90, optimize=True)
+
+
+def existing_label_counts(dataset: hv.Dataset) -> Counter[str]:
+    return Counter(sample.label for sample in dataset.samples if sample.label)
+
+
+def target_reached(counts: Counter[str]) -> bool:
+    return all(
+        counts[group] >= quota
+        for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+    )
+
+
+def add_inat24_samples(dataset: hv.Dataset) -> None:
+    counts = existing_label_counts(dataset)
+    if target_reached(counts):
+        print(f"Dataset already has the target stratified sample ({len(dataset)} samples).")
+        return
+
+    existing_ids = {sample.id for sample in dataset.samples}
+    print(
+        f"Building a stratified {SAMPLE_COUNT}-sample iNat24 Tiny subset from {HF_DATASET}...",
+        flush=True,
+    )
+    print(f"Current counts: {dict(counts)}", flush=True)
+
+    source = load_dataset(HF_DATASET, split=HF_SPLIT)
+    source = source.shuffle(seed=SAMPLE_SEED)
+    root = media_root()
+
+    for index, row in enumerate(source):
+        group = row.get("supercategory")
+        if group not in TARGET_SUPERCATEGORY_COUNTS:
+            continue
+        if counts[group] >= TARGET_SUPERCATEGORY_COUNTS[group]:
+            continue
+
+        sample_id = safe_sample_id(row, index)
+        if sample_id in existing_ids:
+            continue
+
+        image_path = root / f"{sample_id}.jpg"
+        save_image(row, image_path)
+
+        metadata = {field: row.get(field) for field in METADATA_FIELDS}
+        metadata["scientific_name"] = species_name(row, source.features)
+        metadata["source_dataset"] = HF_DATASET
+        metadata["sample_strategy"] = "stratified_by_inat24_supercategory"
+
+        dataset.add_image(
+            str(image_path),
+            label=group,
+            metadata=metadata,
+            sample_id=sample_id,
+        )
+        counts[group] += 1
+        existing_ids.add(sample_id)
+
+        loaded = sum(
+            min(counts[group], quota)
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+        )
+        if loaded == 1 or loaded % 25 == 0 or target_reached(counts):
+            print(f"Loaded {loaded}/{SAMPLE_COUNT} samples: {dict(counts)}", flush=True)
+
+        if target_reached(counts):
+            break
+
+    if not target_reached(counts):
+        missing = {
+            group: quota - counts[group]
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+            if counts[group] < quota
+        }
+        raise RuntimeError(f"Could not build the target iNat24 Tiny sample. Missing: {missing}.")
+
 
 def build_dataset() -> hv.Dataset:
     dataset = hv.Dataset(DATASET_NAME)
-
-    if len(dataset) == 0:
-        print(f"Loading {SAMPLE_COUNT} samples from {HF_DATASET} ({HF_SPLIT})...")
-        dataset.add_from_huggingface(
-            HF_DATASET,
-            split=HF_SPLIT,
-            image_key=HF_IMAGE_KEY,
-            label_key=HF_LABEL_KEY,
-            max_samples=SAMPLE_COUNT,
-            shuffle=True,
-            seed=SAMPLE_SEED,
-        )
+    add_inat24_samples(dataset)
 
     for embedding in EMBEDDING_LAYOUTS:
-        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...")
+        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...", flush=True)
         space_key = dataset.compute_embeddings(
             model=embedding["model"],
            provider=embedding["provider"],
             show_progress=True,
         )
 
+        for layout in embedding["layouts"]:
+            print(f"Ensuring {embedding['name']} {layout} layout...", flush=True)
+            dataset.compute_visualization(space_key=space_key, layout=layout)
+
     return dataset
 
 
 def main() -> None:
     dataset = build_dataset()
-    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}")
+    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}", flush=True)
     hv.launch(dataset, host=SPACE_HOST, port=SPACE_PORT, open_browser=False)
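The new `safe_sample_id` helper in demo.py turns arbitrary dataset ids into filesystem-safe file names. A standalone mirror of its normalization (function name and example inputs here are illustrative, not from the commit):

```python
import re


def sanitize_sample_id(raw_id) -> str:
    # Same normalization as demo.py's safe_sample_id: keep [A-Za-z0-9_.-],
    # collapse every other run of characters to "_", trim edge underscores.
    normalized = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(raw_id)).strip("_")
    return f"inat24_{normalized}"


print(sanitize_sample_id("GBIF obs #12345/a"))  # inat24_GBIF_obs_12345_a
```

Because the result feeds directly into `root / f"{sample_id}.jpg"`, this guarantees no path separators or shell-hostile characters end up in the media directory, regardless of what `row["id"]` contains.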