mnm-matin committed
Commit 4dfa1ed · verified · 1 Parent(s): d9af93d

Update HyperView main demo to iNat24 geometry showcase

Files changed (5):
  1. .dockerignore +3 -0
  2. Dockerfile +5 -3
  3. README.md +39 -39
  4. __pycache__/demo.cpython-312.pyc +0 -0
  5. demo.py +165 -30
.dockerignore CHANGED
```diff
@@ -11,5 +11,8 @@ venv
 .mypy_cache
 .pytest_cache
 
+# Local runtime artifacts
+demo_data
+
 # Misc
 .DS_Store
```
Dockerfile CHANGED
```diff
@@ -21,10 +21,12 @@ WORKDIR $HOME/app
 
 RUN pip install --upgrade pip
 
-ARG HYPERVIEW_VERSION=0.3.1
-ARG HYPER_MODELS_VERSION=0.1.0
+ARG HYPERVIEW_VERSION=0.4.2
+ARG HYPER_MODELS_VERSION=0.2.0
 
-# Pin package versions so Docker cache cannot silently hold an older PyPI release.
+# Install CPU-only PyTorch first so the Space does not pull the default CUDA bundle,
+# then pin released HyperView packages so Docker cache cannot hold an older release.
+RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN pip install "hyperview==${HYPERVIEW_VERSION}" && python -c "import hyperview; print('hyperview', hyperview.__version__)"
 RUN pip install "hyper-models==${HYPER_MODELS_VERSION}" && python -c "import hyper_models; print('hyper_models', hyper_models.__version__)"
 
```
README.md CHANGED
```diff
@@ -8,56 +8,56 @@ app_port: 7860
 pinned: false
 ---
 
-# HyperView Imagenette (CLIP + HyCoCLIP)
-
-This folder is the simplest copyable HyperView Space example in this repo.
-It keeps all dataset-specific settings in the constants block at the top of
-[demo.py](demo.py), so a coding agent can usually adapt it by editing one file.
-
-This example runs HyperView with:
-
-- CLIP embeddings (`openai/clip-vit-base-patch32`) for Euclidean layout
-- HyCoCLIP embeddings (`hycoclip-vit-s`) for Poincaré layout
-
-The Docker image installs released HyperView packages from PyPI. The dataset,
-embeddings, and layouts are computed at first startup.
-
-## Reuse This Template
-
-When you copy this folder for your own dataset, change these parts first:
-
-1. Edit the constants block in [demo.py](demo.py).
-2. Rename the copied Space from `HyperView` to your own project name such as `yourproject-HyperView` or `HyperView-yourproject`.
-3. Update this README frontmatter, title, and H1.
-4. Point a deploy workflow at your new folder.
-
-This starter currently installs `hyperview==0.3.1` and `hyper-models==0.1.0`.
-
-The defaults in [demo.py](demo.py) are:
-
-- Hugging Face dataset: `Multimodal-Fatima/Imagenette_validation`
-- Split: `validation`
-- Image field: `image`
-- Label field: `label`
-- Sample count: `300`
-- Layouts: CLIP + Euclidean, HyCoCLIP + Poincaré
-
-If you only want one model in your own Space, keep a single entry in
-`EMBEDDING_LAYOUTS` and delete the rest.
-
-When contributing your own Space back to this repository, add a row to the
-community table in the root `README.md` and include your Hugging Face Space ID
-in the pull request description.
-
-## Build Model
-
-The Dockerfile runs `build_dataset()` during image build. That means:
-
-- the first expensive download/embedding pass happens at build time
-- the runtime container mostly just launches HyperView
-- there is no extra runtime configuration path to keep in sync
-
-## Deploy source
-
-This folder is synchronized to Hugging Face Spaces by GitHub Actions from the
-`hyperview-spaces` deployment repository.
+# HyperView - iNat24 Tiny Geometry Showcase
+
+This is the main HyperView demo Space. It shows the same taxonomy-backed image
+sample through multiple geometric views:
+
+- CLIP (`openai/clip-vit-base-patch32`) in Euclidean 3D
+- CLIP (`openai/clip-vit-base-patch32`) in spherical 3D
+- HyCoCLIP (`hycoclip-vit-s`) in Poincare 2D
+
+The sample is drawn from `evendrow/inat24_tiny`, a compact iNaturalist 2024
+subset with 1,000 images, 100 species, and taxonomy metadata. The visible label
+is the broad `supercategory`, while sample metadata keeps common name, species,
+kingdom, phylum, class, order, family, genus, location fields, license, and
+rights holder.
+
+The Docker image installs released packages from PyPI:
+
+- `hyperview==0.4.2`
+- `hyper-models==0.2.0`
+
+## Dataset
+
+The default stratified sample contains 300 images:
+
+| Label | Samples |
+| --- | ---: |
+| plants | 50 |
+| insects | 50 |
+| birds | 42 |
+| arachnids | 36 |
+| amphibians | 30 |
+| reptiles | 26 |
+| fungi | 26 |
+| mammals | 20 |
+| fish | 10 |
+| mollusks | 10 |
+
+This keeps the demo small enough for Hugging Face CPU Spaces while preserving a
+real biological hierarchy for geometry comparison.
+
+## Reuse This Template
+
+When copying this folder for another dataset:
+
+1. Edit the constants block at the top of [demo.py](demo.py).
+2. Update the stratification labels and target counts.
+3. Rename the copied Space from `HyperView` to your project name.
+4. Point a deploy workflow at the new folder.
+
+## Deploy Source
+
+This folder is synchronized to `hyper3labs/HyperView` by GitHub Actions from
+the `hyperview-spaces` deployment repository.
```
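The quota table in the new README states a 300-image stratified sample; a quick stand-alone sketch (quota values copied from the diff) checks that the per-supercategory counts actually sum to 300:

```python
# Quotas copied from the README table / TARGET_SUPERCATEGORY_COUNTS in demo.py.
TARGET_SUPERCATEGORY_COUNTS = {
    "plants": 50, "insects": 50, "birds": 42, "arachnids": 36,
    "amphibians": 30, "reptiles": 26, "fungi": 26, "mammals": 20,
    "fish": 10, "mollusks": 10,
}

# SAMPLE_COUNT in demo.py is derived the same way, so the table and the
# code cannot drift apart.
total = sum(TARGET_SUPERCATEGORY_COUNTS.values())
print(total)  # 300
```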
__pycache__/demo.cpython-312.pyc ADDED
Binary file (8.82 kB).
demo.py CHANGED
```diff
@@ -1,74 +1,209 @@
 #!/usr/bin/env python
-"""HyperView Hugging Face Space template example.
-
-Copy this folder, then edit the constants below for your dataset.
-"""
+"""HyperView main Hugging Face Space geometry demo."""
 
 from __future__ import annotations
 
+import os
+import re
+from collections import Counter
+from pathlib import Path
+
+from datasets import load_dataset
+from PIL import Image, ImageOps
+
 import hyperview as hv
 
-# Edit this block when you reuse the template for another Space.
 SPACE_HOST = "0.0.0.0"
 SPACE_PORT = 7860
 
-DATASET_NAME = "imagenette_clip_hycoclip"
-HF_DATASET = "Multimodal-Fatima/Imagenette_validation"
-HF_SPLIT = "validation"
-HF_IMAGE_KEY = "image"
-HF_LABEL_KEY = "label"
-SAMPLE_COUNT = 300
+DATASET_NAME = "inat24_tiny_geometry_showcase"
+HF_DATASET = "evendrow/inat24_tiny"
+HF_SPLIT = "train"
 SAMPLE_SEED = 42
 
-# Keep one or more entries here. Most reuses only need one model/layout pair.
+TARGET_SUPERCATEGORY_COUNTS = {
+    "plants": 50,
+    "insects": 50,
+    "birds": 42,
+    "arachnids": 36,
+    "amphibians": 30,
+    "reptiles": 26,
+    "fungi": 26,
+    "mammals": 20,
+    "fish": 10,
+    "mollusks": 10,
+}
+SAMPLE_COUNT = sum(TARGET_SUPERCATEGORY_COUNTS.values())
+IMAGE_MAX_SIZE = (768, 768)
+
 EMBEDDING_LAYOUTS = [
     {
         "name": "CLIP",
         "provider": "embed-anything",
         "model": "openai/clip-vit-base-patch32",
-        "layout": "euclidean",
+        "layouts": ["euclidean:3d", "spherical"],
     },
     {
         "name": "HyCoCLIP",
         "provider": "hyper-models",
         "model": "hycoclip-vit-s",
-        "layout": "poincare",
+        "layouts": ["poincare"],
     },
 ]
 
+METADATA_FIELDS = (
+    "common_name",
+    "id",
+    "width",
+    "height",
+    "license",
+    "rights_holder",
+    "date",
+    "latitude",
+    "longitude",
+    "location_uncertainty",
+    "category_id",
+    "supercategory",
+    "kingdom",
+    "phylum",
+    "class",
+    "order",
+    "family",
+    "genus",
+    "specific_epithet",
+)
+
+
+def media_root() -> Path:
+    root = Path(os.environ.get("HYPERVIEW_MEDIA_DIR", "./demo_data/media"))
+    path = root / DATASET_NAME
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+def safe_sample_id(row: dict, index: int) -> str:
+    raw_id = row.get("id", index)
+    normalized = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(raw_id)).strip("_")
+    return f"inat24_{normalized}"
+
+
+def species_name(row: dict, features) -> str:
+    label = row.get("label")
+    if label is None:
+        return "unknown"
+    return features["label"].int2str(label)
+
+
+def save_image(row: dict, destination: Path) -> None:
+    if destination.exists():
+        return
+
+    image = row["image"]
+    if not isinstance(image, Image.Image):
+        raise TypeError(f"Expected a PIL image, got {type(image)!r}")
+
+    image = ImageOps.exif_transpose(image).convert("RGB")
+    image.thumbnail(IMAGE_MAX_SIZE, Image.Resampling.LANCZOS)
+    image.save(destination, format="JPEG", quality=90, optimize=True)
+
+
+def existing_label_counts(dataset: hv.Dataset) -> Counter[str]:
+    return Counter(sample.label for sample in dataset.samples if sample.label)
+
+
+def target_reached(counts: Counter[str]) -> bool:
+    return all(
+        counts[group] >= quota
+        for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+    )
+
+
+def add_inat24_samples(dataset: hv.Dataset) -> None:
+    counts = existing_label_counts(dataset)
+    if target_reached(counts):
+        print(f"Dataset already has the target stratified sample ({len(dataset)} samples).")
+        return
+
+    existing_ids = {sample.id for sample in dataset.samples}
+    print(
+        f"Building a stratified {SAMPLE_COUNT}-sample iNat24 Tiny subset from {HF_DATASET}...",
+        flush=True,
+    )
+    print(f"Current counts: {dict(counts)}", flush=True)
+
+    source = load_dataset(HF_DATASET, split=HF_SPLIT)
+    source = source.shuffle(seed=SAMPLE_SEED)
+    root = media_root()
+
+    for index, row in enumerate(source):
+        group = row.get("supercategory")
+        if group not in TARGET_SUPERCATEGORY_COUNTS:
+            continue
+        if counts[group] >= TARGET_SUPERCATEGORY_COUNTS[group]:
+            continue
+
+        sample_id = safe_sample_id(row, index)
+        if sample_id in existing_ids:
+            continue
+
+        image_path = root / f"{sample_id}.jpg"
+        save_image(row, image_path)
+
+        metadata = {field: row.get(field) for field in METADATA_FIELDS}
+        metadata["scientific_name"] = species_name(row, source.features)
+        metadata["source_dataset"] = HF_DATASET
+        metadata["sample_strategy"] = "stratified_by_inat24_supercategory"
+
+        dataset.add_image(
+            str(image_path),
+            label=group,
+            metadata=metadata,
+            sample_id=sample_id,
+        )
+        counts[group] += 1
+        existing_ids.add(sample_id)
+
+        loaded = sum(
+            min(counts[group], quota)
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+        )
+        if loaded == 1 or loaded % 25 == 0 or target_reached(counts):
+            print(f"Loaded {loaded}/{SAMPLE_COUNT} samples: {dict(counts)}", flush=True)
+
+        if target_reached(counts):
+            break
+
+    if not target_reached(counts):
+        missing = {
+            group: quota - counts[group]
+            for group, quota in TARGET_SUPERCATEGORY_COUNTS.items()
+            if counts[group] < quota
+        }
+        raise RuntimeError(f"Could not build the target iNat24 Tiny sample. Missing: {missing}.")
+
 
 def build_dataset() -> hv.Dataset:
     dataset = hv.Dataset(DATASET_NAME)
-
-    if len(dataset) == 0:
-        print(f"Loading {SAMPLE_COUNT} samples from {HF_DATASET} ({HF_SPLIT})...")
-        dataset.add_from_huggingface(
-            HF_DATASET,
-            split=HF_SPLIT,
-            image_key=HF_IMAGE_KEY,
-            label_key=HF_LABEL_KEY,
-            max_samples=SAMPLE_COUNT,
-            shuffle=True,
-            seed=SAMPLE_SEED,
-        )
+    add_inat24_samples(dataset)
 
     for embedding in EMBEDDING_LAYOUTS:
-        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...")
+        print(f"Ensuring {embedding['name']} embeddings ({embedding['model']})...", flush=True)
         space_key = dataset.compute_embeddings(
             model=embedding["model"],
             provider=embedding["provider"],
             show_progress=True,
         )
 
-        print(f"Ensuring {embedding['layout']} layout...")
-        dataset.compute_visualization(space_key=space_key, layout=embedding["layout"])
+        for layout in embedding["layouts"]:
+            print(f"Ensuring {embedding['name']} {layout} layout...", flush=True)
+            dataset.compute_visualization(space_key=space_key, layout=layout)
 
     return dataset
 
 
 def main() -> None:
     dataset = build_dataset()
-    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}")
+    print(f"Starting HyperView on {SPACE_HOST}:{SPACE_PORT}", flush=True)
     hv.launch(dataset, host=SPACE_HOST, port=SPACE_PORT, open_browser=False)
 
```
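The core of the new `add_inat24_samples` is a quota-filling pass over a shuffled stream: rows outside the quota table or in an already-full group are skipped, and scanning stops as soon as every quota is met. A minimal stand-alone sketch of that control flow (toy rows and quotas standing in for the iNat24 stream, no Hugging Face or HyperView dependencies) behaves the same way:

```python
from collections import Counter

# Toy quotas and a toy "shuffled" stream standing in for the iNat24 rows.
QUOTAS = {"birds": 2, "fungi": 1}
rows = [
    {"id": 1, "supercategory": "birds"},
    {"id": 2, "supercategory": "mammals"},  # not in the quota table: skipped
    {"id": 3, "supercategory": "birds"},
    {"id": 4, "supercategory": "birds"},    # birds quota already full: skipped
    {"id": 5, "supercategory": "fungi"},
]

def target_reached(counts: Counter) -> bool:
    # Mirrors demo.py's target_reached: every group must hit its quota.
    return all(counts[group] >= quota for group, quota in QUOTAS.items())

counts: Counter = Counter()
picked = []
for row in rows:
    group = row["supercategory"]
    if group not in QUOTAS or counts[group] >= QUOTAS[group]:
        continue
    picked.append(row["id"])
    counts[group] += 1
    if target_reached(counts):
        break  # stop scanning as soon as all quotas are met

print(picked)  # [1, 3, 5]
```

The same early-exit structure is why demo.py can raise a `RuntimeError` listing the missing groups: if the loop finishes without `target_reached` becoming true, the source stream simply did not contain enough rows for some supercategory.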