Upload folder using huggingface_hub
- data/I1_simulation.zip +3 -0
- data/MDB.zip +3 -0
- data/aviris_hsi_balanced.zip +3 -0
- data/aviris_hsi_clean.zip +3 -0
- data/captions_balanced.zip +3 -0
- data/captions_clean.zip +3 -0
- data/readme.md +117 -0
- data/rgb_balanced.zip +3 -0
- data/rgb_clean.zip +3 -0
- data/simulated_s2_boarefl_balanced.zip +3 -0
- data/simulated_s2_boarefl_clean.zip +3 -0
- data/tim_generation_balanced.zip +3 -0
- data/tim_generation_clean.zip +3 -0
- data/truth_false_labels.xlsx +0 -0
data/I1_simulation.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:babf58d3c7b50f7b392891ae9b49679fc43366e1d432803cbe2c4f46ba3c8e12
+size 2018739266
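Each archive in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. The following is a minimal sketch, assuming the archive has already been downloaded to a local path, of how a download could be checked against its pointer. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256" or path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so multi-gigabyte archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the data/I1_simulation.zip hunk in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:babf58d3c7b50f7b392891ae9b49679fc43366e1d432803cbe2c4f46ba3c8e12
size 2018739266
"""
pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer(Path("data/I1_simulation.zip"), pointer)` on a local copy would then report whether the download is complete and uncorrupted; the local path here is an assumption.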
data/MDB.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8a2a9d7ae1ae7674b894ced60dc2fb9d3aad70bb61359ecf8968d30c4aa9b4e1
+size 5582289686
data/aviris_hsi_balanced.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:479bd99baeb8b5f5406ccc265fad9a2a6900ee91c41032f9a07f6a604a3e4e8b
+size 55956860020
data/aviris_hsi_clean.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2ae0c0bbba731a6015d478deab4501c2e55fb819037a40ed4d06eb7b3d0bfc17
+size 34120339460
data/captions_balanced.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:767f55f5a74448368ee11dcad58e8cf67be82fa8ca0bdba734fd15d71e293169
+size 95279
data/captions_clean.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa087c23b9da9bc91b5f4314f367976d3e9cfeefff64a834a784e85c4a1788a2
+size 157411
data/readme.md
ADDED
@@ -0,0 +1,117 @@
+# Methane Benchmark Dataset (PINEAPPLE + Clean)
+
+This folder contains the **Methane Benchmark Dataset** in two variants:
+- **balanced**: a balanced mix of methane and non-methane patches
+- **clean**: **no-methane only** (negative patches)
+
+The dataset combines multiple modalities (HSI and RGB), **simulated Sentinel-2 BOA reflectance (S2 BOA refl)** derived from HSI, **TerraMind TiM-generated products** (including **S2L2A** and **LULC**), text captions, and labels produced by different sources (LLM, human, and TiM/TerraMind). The clean split additionally contains **Intuition-1 simulated data**.
+
+---
+
+## 1. Dataset overview
+
+### 1.1 balanced (PINEAPPLE: methane + non-methane)
+- **178 patches**, **27 flights**
+- **HSI**: AVIRIS-NG
+- **RGB**: RGB renderings / visualizations aligned with the patches
+- **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boarefl_balanced/`
+- **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_balanced/`):
+  - **S2L2A** (TiM-generated)
+  - **LULC** (TiM-generated, pixel-level)
+  - Plots and auxiliary outputs
+- **Annotations**
+  - Urban vs. non-urban (image-level): **LLM**
+  - Urban vs. non-urban (image-level): **human**
+  - Textual description: **LLM**
+
+### 1.2 clean (no-methane only)
+- **261 patches** (neighboring patches; center patch excluded), **20 flights**
+- **HSI**: AVIRIS-NG
+- **RGB**: RGB renderings / visualizations aligned with the patches
+- **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boareflclean/` (folder name preserved as exported)
+- **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_clean/`):
+  - **S2L2A** (TiM-generated)
+  - **LULC** (TiM-generated, pixel-level)
+  - Plots and auxiliary outputs
+- **Intuition-1 simulated data (clean only)**: additional simulated modality for extended ablations and robustness checks (see notes in Section 2)
+- **Annotations**
+  - Urban vs. non-urban (image-level): **LLM**
+  - Urban vs. non-urban (image-level): **human**
+  - Textual description: **LLM**
+
+---
+
+## 2. Folder structure
+
+Top-level directories:
+- `aviris_hsi_balanced/`
+  AVIRIS-NG hyperspectral patches for the balanced split.
+- `aviris_hsi_clean/`
+  AVIRIS-NG hyperspectral patches for the clean (no-methane) split.
+
+- `rgb_balanced/`
+  RGB images for the balanced split (aligned to patches).
+- `rgb_clean/`
+  RGB images for the clean split (aligned to patches).
+
+- `captions_balanced/`
+  LLM-generated text captions/descriptions for the balanced split.
+- `captions_clean/`
+  LLM-generated text captions/descriptions for the clean split.
+
+- `simulated_s2_boarefl_balanced/`
+  Simulated Sentinel-2 BOA reflectance images for the balanced split (simulated from HSI).
+- `simulated_s2_boareflclean/`
+  Simulated Sentinel-2 BOA reflectance images for the clean split (simulated from HSI; folder name preserved as exported).
+
+- `tim_generation_balanced/`
+  TerraMind TiM outputs generated from simulated S2 BOA reflectance (balanced split).
+  Contains (at least): `s2l2a/`, `lulc/`, `classes/`, `plots/`, and auxiliary files (e.g., a legend script).
+- `tim_generation_clean/`
+  TerraMind TiM outputs generated from simulated S2 BOA reflectance (clean split).
+  Contains the same product types as the balanced split.
+
+- `I1_simulation`
+  Additional Intuition-1 simulated data aligned with clean split patches.
+
+Other files:
+- `truth_false_labels.xlsx`
+  A compact label file (yes/no style) aggregating selected annotations (LLM, human, TiM classes), depending on your export.
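The folder layout above is shipped as one zip archive per directory. As a minimal sketch, assuming each archive unpacks into a top-level folder (which this commit does not confirm), the archives could be extracted and the clean Sentinel-2 folder located under either spelling: the README names it `simulated_s2_boareflclean/`, while the archive is `simulated_s2_boarefl_clean.zip`. The helpers `extract_all` and `resolve_dir` are hypothetical names introduced for illustration.

```python
import zipfile
from pathlib import Path
from typing import List, Optional

# Archive names as listed in this commit. The HSI archives are tens of
# gigabytes, so extraction needs correspondingly large free disk space.
ARCHIVES = [
    "I1_simulation.zip", "MDB.zip",
    "aviris_hsi_balanced.zip", "aviris_hsi_clean.zip",
    "captions_balanced.zip", "captions_clean.zip",
    "rgb_balanced.zip", "rgb_clean.zip",
    "simulated_s2_boarefl_balanced.zip", "simulated_s2_boarefl_clean.zip",
    "tim_generation_balanced.zip", "tim_generation_clean.zip",
]

# Both spellings of the clean S2 folder: the README's and the archive's.
S2_CLEAN_CANDIDATES = ["simulated_s2_boareflclean", "simulated_s2_boarefl_clean"]


def extract_all(data_dir: Path, out_dir: Path) -> List[Path]:
    """Extract every listed archive found in data_dir; return those extracted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in ARCHIVES:
        archive = data_dir / name
        if not archive.is_file():
            continue  # Skip archives that were not downloaded.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        extracted.append(archive)
    return extracted


def resolve_dir(root: Path, candidates: List[str]) -> Optional[Path]:
    """Return the first candidate directory that exists under root, else None."""
    for name in candidates:
        if (root / name).is_dir():
            return root / name
    return None
```

After `extract_all(Path("data"), Path("extracted"))`, a call such as `resolve_dir(Path("extracted"), S2_CLEAN_CANDIDATES)` returns whichever folder name is actually present, so downstream code does not need to hard-code one spelling. Both paths are placeholders.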
+
+
+---
+
+## 3. Labels and annotation sources
+
+The dataset provides yes/no labels and/or categorical classes from the following sources:
+
+### 3.1 LLM labels (image-level)
+- Urban vs. non-urban classification at image/patch level
+- Stored in the exported label file and/or per-sample metadata (depending on your pipeline)
+
+### 3.2 Human labels (image-level)
+- Urban vs. non-urban classification at image/patch level
+- Available for at least the clean split (and optionally balanced, depending on the export)
+
+### 3.3 TerraMind TiM products (pixel-level and per-image products)
+- **S2L2A** generated by TerraMind TiM from simulated S2 BOA reflectance
+- **LULC** (pixel-level) generated by TerraMind TiM from simulated S2 BOA reflectance
+- Stored under `tim_generation_*` (subfolders `s2l2a/`, `lulc/`, and `classes/`)
+
+---
+
+## 4. Modality relationships
+
+- **HSI (AVIRIS-NG)** is the primary observation modality.
+- **RGB** is a visualization or derived view aligned to the same patch footprint.
+- **Simulated Sentinel-2 BOA reflectance (S2 BOA refl)** is simulated from HSI and used as input to TiM/TerraMind.
+- **S2L2A** is not directly stored as a standalone raw simulation in the root; it is produced by **TerraMind TiM** and stored inside `tim_generation_*`.
+- **LULC** is produced by **TerraMind TiM** (pixel-level) and stored inside `tim_generation_*`.
+- **Captions** provide text descriptions for multimodal experiments (retrieval, captioning, instruction-following, VLM/LLM alignment).
+- **Intuition-1 simulated data** (clean only) provides an extra modality for robustness and domain-shift experiments.
+
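The derivation chain described in this section can be written down as a small lookup table. This is only an illustrative sketch: the keys are hypothetical short names, and only the relationships the README states explicitly are encoded (RGB, captions, and Intuition-1 data are left out because their exact source is not given).

```python
# Each product maps to the modality it is derived from; None marks the
# primary observation. Key names are illustrative, not file or folder names.
DERIVED_FROM = {
    "hsi": None,                      # AVIRIS-NG, primary modality
    "simulated_s2_boarefl": "hsi",    # simulated from HSI
    "s2l2a": "simulated_s2_boarefl",  # TerraMind TiM output
    "lulc": "simulated_s2_boarefl",   # TerraMind TiM output (pixel-level)
}


def lineage(product: str) -> list:
    """Return the chain from a product back to the primary modality."""
    chain = [product]
    while DERIVED_FROM.get(chain[-1]) is not None:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain


print(lineage("lulc"))  # ['lulc', 'simulated_s2_boarefl', 'hsi']
```

A table like this makes it easy to see that any artifact in the simulated Sentinel-2 step propagates into both TiM products.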
+---
+
+## 5. Warning
+
+Before use, check your dataset class against the current file naming convention in case any file names have changed.
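One way to act on this warning is to compare file names across modality folders before training. The sketch below assumes that files belonging to the same patch share a file stem across folders and sit directly inside each folder; the README does not state the naming scheme, so both points are assumptions to adjust. `stems_by_folder` and `missing_stems` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List, Set


def stems_by_folder(root: Path, folders: List[str]) -> Dict[str, Set[str]]:
    """Map each existing folder to the set of file stems found directly inside it."""
    result = {}
    for name in folders:
        folder = root / name
        if folder.is_dir():
            result[name] = {p.stem for p in folder.iterdir() if p.is_file()}
    return result


def missing_stems(stems: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each folder, report stems that appear in another folder but not here."""
    union = set().union(*stems.values())
    return {
        name: union - found
        for name, found in stems.items()
        if union - found
    }
```

For example, `missing_stems(stems_by_folder(Path("extracted"), ["rgb_clean", "captions_clean"]))` returns an empty dict when both folders cover the same patches, and otherwise lists the stems each folder lacks. A non-empty result suggests either missing files or a naming convention that differs from the assumed one.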
data/rgb_balanced.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:80eb453237fda6d1357641cc57e0ef7e2d2ca8ec8741b72fc56dd87bed13ea09
+size 79739401
data/rgb_clean.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a95ad50a176d49b45fb06debb31dbd26b7872e5a9d2765fb585898c0e6c23375
+size 107771981
data/simulated_s2_boarefl_balanced.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f9474513a0f13aca907674ae54fd71b875d1758e143037fae55e9f166ded6c51
+size 92085964
data/simulated_s2_boarefl_clean.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4497cba5b8607b4da1918867daf157f172993648d8d2ef5567e8390e55783395
+size 121582695
data/tim_generation_balanced.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b2b92cf4de8b1eff54d85a7feb3cf843f12ae87adbfc521226a5ff35cd28102e
+size 208484041
data/tim_generation_clean.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:119d16495ebe0d14bac3c89b00341d4211b9b773475a6deb89e5ea51346fd4ff
+size 284308187
data/truth_false_labels.xlsx
ADDED
Binary file (31.8 kB).