KPLabs committed on
Commit
bc7df31
·
verified ·
1 Parent(s): 3e10a2c

Upload folder using huggingface_hub

data/I1_simulation.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:babf58d3c7b50f7b392891ae9b49679fc43366e1d432803cbe2c4f46ba3c8e12
+ size 2018739266
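Each `.zip` in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of how a downloaded archive could be checked against its pointer follows. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any library, and the local path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest named by the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte archives are never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from data/I1_simulation.zip in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:babf58d3c7b50f7b392891ae9b49679fc43366e1d432803cbe2c4f46ba3c8e12
size 2018739266
"""
pointer = parse_lfs_pointer(POINTER)
```

With the archive saved locally (for example at a hypothetical `data/I1_simulation.zip`), `matches_pointer(Path("data/I1_simulation.zip"), pointer)` would report whether the download is complete and uncorrupted. The size is compared first because it is cheap and catches truncated downloads before any hashing.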
data/MDB.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a2a9d7ae1ae7674b894ced60dc2fb9d3aad70bb61359ecf8968d30c4aa9b4e1
+ size 5582289686
data/aviris_hsi_balanced.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:479bd99baeb8b5f5406ccc265fad9a2a6900ee91c41032f9a07f6a604a3e4e8b
+ size 55956860020
data/aviris_hsi_clean.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ae0c0bbba731a6015d478deab4501c2e55fb819037a40ed4d06eb7b3d0bfc17
+ size 34120339460
data/captions_balanced.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:767f55f5a74448368ee11dcad58e8cf67be82fa8ca0bdba734fd15d71e293169
+ size 95279
data/captions_clean.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa087c23b9da9bc91b5f4314f367976d3e9cfeefff64a834a784e85c4a1788a2
+ size 157411
data/readme.md ADDED
@@ -0,0 +1,117 @@
+ # Methane Benchmark Dataset (PINEAPPLE + Clean)
+
+ This folder contains the **Methane Benchmark Dataset** in two variants:
+ - **balanced**: a balanced mix of methane and non-methane patches
+ - **clean**: **no-methane only** (negative patches)
+
+ The dataset combines two image modalities (hyperspectral imagery, HSI, and RGB), **simulated Sentinel-2 bottom-of-atmosphere (BOA) reflectance (S2 BOA refl)** derived from HSI, **TerraMind TiM-generated products** (including **S2L2A** and land use / land cover, **LULC**), text captions, and labels produced by different sources (LLM, human, and TiM/TerraMind). The clean variant additionally contains **Intuition-1 simulated data**.
+
+ ---
+
+ ## 1. Dataset overview
+
+ ### 1.1 balanced (PINEAPPLE: methane + non-methane)
+ - **178 patches**, **27 flights**
+ - **HSI**: AVIRIS-NG
+ - **RGB**: RGB renderings / visualizations aligned with the patches
+ - **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boarefl_balanced/`
+ - **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_balanced/`):
+   - **S2L2A** (TiM-generated)
+   - **LULC** (TiM-generated, pixel-level)
+   - Plots and auxiliary outputs
+ - **Annotations**
+   - Urban vs. non-urban (image-level): **LLM**
+   - Urban vs. non-urban (image-level): **human**
+   - Textual description: **LLM**
+
+ ### 1.2 clean (no-methane only)
+ - **261 patches** (neighboring patches; center patch excluded), **20 flights**
+ - **HSI**: AVIRIS-NG
+ - **RGB**: RGB renderings / visualizations aligned with the patches
+ - **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boareflclean/` (folder name preserved as exported)
+ - **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_clean/`):
+   - **S2L2A** (TiM-generated)
+   - **LULC** (TiM-generated, pixel-level)
+   - Plots and auxiliary outputs
+ - **Intuition-1 simulated data (clean only)**: additional simulated modality for extended ablations and robustness checks (see notes in Section 2)
+ - **Annotations**
+   - Urban vs. non-urban (image-level): **LLM**
+   - Urban vs. non-urban (image-level): **human**
+   - Textual description: **LLM**
+
+ ---
+
+ ## 2. Folder structure
+
+ Top-level directories (each uploaded as a `.zip` archive):
+ - `aviris_hsi_balanced/`
+   AVIRIS-NG hyperspectral patches for the balanced split.
+ - `aviris_hsi_clean/`
+   AVIRIS-NG hyperspectral patches for the clean (no-methane) split.
+
+ - `rgb_balanced/`
+   RGB images for the balanced split (aligned to patches).
+ - `rgb_clean/`
+   RGB images for the clean split (aligned to patches).
+
+ - `captions_balanced/`
+   LLM-generated text captions/descriptions for the balanced split.
+ - `captions_clean/`
+   LLM-generated text captions/descriptions for the clean split.
+
+ - `simulated_s2_boarefl_balanced/`
+   Simulated Sentinel-2 BOA reflectance images for the balanced split (simulated from HSI).
+ - `simulated_s2_boareflclean/`
+   Simulated Sentinel-2 BOA reflectance images for the clean split (simulated from HSI; folder name preserved as exported, while the uploaded archive is named `simulated_s2_boarefl_clean.zip`).
+
+ - `tim_generation_balanced/`
+   TerraMind TiM outputs generated from simulated S2 BOA reflectance (balanced split).
+   Contains (at least): `s2l2a/`, `lulc/`, `classes/`, `plots/`, and auxiliary files (e.g., a legend script).
+ - `tim_generation_clean/`
+   TerraMind TiM outputs generated from simulated S2 BOA reflectance (clean split).
+   Contains the same product types as the balanced split.
+
+ - `I1_simulation/`
+   Additional Intuition-1 simulated data aligned with clean split patches.
+
+ Other files:
+ - `truth_false_labels.xlsx`
+   A compact label file (yes/no style) aggregating selected annotations (LLM, human, TiM classes); the exact contents depend on the export.
+
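The layout above can be checked after unpacking with a short script. This is a sketch that assumes all archives were extracted into one root directory (the root path is hypothetical). Because the README spells the clean Sentinel-2 folder `simulated_s2_boareflclean/` while the archive is named `simulated_s2_boarefl_clean.zip`, both spellings are accepted. The names `EXPECTED_DIRS`, `ALIASES`, and `missing_dirs` are illustrative.

```python
from pathlib import Path

# Directory names as listed in the README's folder structure section.
EXPECTED_DIRS = [
    "aviris_hsi_balanced", "aviris_hsi_clean",
    "rgb_balanced", "rgb_clean",
    "captions_balanced", "captions_clean",
    "simulated_s2_boarefl_balanced", "simulated_s2_boareflclean",
    "tim_generation_balanced", "tim_generation_clean",
    "I1_simulation",
]

# Assumption: the clean S2 folder may unpack under the archive's spelling instead.
ALIASES = {"simulated_s2_boareflclean": ["simulated_s2_boarefl_clean"]}


def missing_dirs(root) -> list:
    """Return the README directory names that are absent under `root`."""
    root = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        candidates = [name, *ALIASES.get(name, [])]
        if not any((root / candidate).is_dir() for candidate in candidates):
            missing.append(name)
    return missing
```

Calling `missing_dirs("path/to/extracted/data")` returns an empty list when every folder is present, which makes it a cheap first check before building a data loader.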
+ ---
+
+ ## 3. Labels and annotation sources
+
+ The dataset provides yes/no labels and/or categorical classes from the following sources:
+
+ ### 3.1 LLM labels (image-level)
+ - Urban vs. non-urban classification at image/patch level
+ - Stored in the exported label file and/or per-sample metadata (depending on your pipeline)
+
+ ### 3.2 Human labels (image-level)
+ - Urban vs. non-urban classification at image/patch level
+ - Available for at least the clean split (and optionally balanced, depending on the export)
+
+ ### 3.3 TerraMind TiM products (pixel-level and per-image products)
+ - **S2L2A** generated by TerraMind TiM from simulated S2 BOA reflectance
+ - **LULC** (pixel-level) generated by TerraMind TiM from simulated S2 BOA reflectance
+ - Stored under `tim_generation_*` (subfolders `s2l2a/`, `lulc/`, and `classes/`)
+
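Since the same urban vs. non-urban question is answered by both an LLM and a human, one natural use of the labels is to measure how often the two sources agree. The sketch below assumes the labels have already been loaded into `{patch_id: bool}` mappings. The patch IDs and values are made up for illustration, and the column layout of `truth_false_labels.xlsx` is not assumed here.

```python
def agreement(llm_labels: dict, human_labels: dict):
    """Compare two {patch_id: bool} mappings on the patches they share.

    Returns the fraction of shared patches with identical labels and the
    sorted list of patch IDs where the two sources disagree.
    """
    shared = sorted(set(llm_labels) & set(human_labels))
    if not shared:
        raise ValueError("the two label sets have no patches in common")
    disagreements = [p for p in shared if llm_labels[p] != human_labels[p]]
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements


# Hypothetical labels: True means "urban", False means "non-urban".
llm = {"patch_001": True, "patch_002": False, "patch_003": True}
human = {"patch_001": True, "patch_002": True, "patch_003": True}
rate, differing = agreement(llm, human)
```

Restricting the comparison to shared patch IDs matters because human labels may not cover every split, so the two mappings can differ in size.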
+ ---
+
+ ## 4. Modality relationships
+
+ - **HSI (AVIRIS-NG)** is the primary observation modality.
+ - **RGB** is a visualization or derived view aligned to the same patch footprint.
+ - **Simulated Sentinel-2 BOA reflectance (S2 BOA refl)** is simulated from HSI and used as input to TiM/TerraMind.
+ - **S2L2A** is not directly stored as a standalone raw simulation in the root; it is produced by **TerraMind TiM** and stored inside `tim_generation_*`.
+ - **LULC** is produced by **TerraMind TiM** (pixel-level) and stored inside `tim_generation_*`.
+ - **Captions** provide text descriptions for multimodal experiments (retrieval, captioning, instruction-following, VLM/LLM alignment).
+ - **Intuition-1 simulated data** (clean only) provides an extra modality for robustness and domain-shift experiments.
+
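The derivation chain described above can be written down as a small lookup table, which is handy when deciding which products must be regenerated if an upstream modality changes. This is a sketch: the keys are illustrative short names, and the `rgb` entry assumes RGB is rendered from HSI, which the README implies but does not state outright. Captions and Intuition-1 data are left out because their source modality is not given.

```python
# Maps each modality to the modality it is derived from (None = primary data).
DERIVED_FROM = {
    "hsi": None,                       # AVIRIS-NG, primary observation
    "rgb": "hsi",                      # assumption: rendered from HSI
    "simulated_s2_boarefl": "hsi",     # simulated from HSI
    "s2l2a": "simulated_s2_boarefl",   # TerraMind TiM output
    "lulc": "simulated_s2_boarefl",    # TerraMind TiM output
}


def lineage(modality: str) -> list:
    """Return the chain from `modality` back to the primary observation."""
    chain = [modality]
    while DERIVED_FROM[chain[-1]] is not None:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain
```

For example, `lineage("lulc")` walks from the LULC maps through the simulated Sentinel-2 reflectance back to the AVIRIS-NG hyperspectral patches.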
+ ---
+
+ ## 5. Warning
+
+ Before using the dataset, check that your dataset class still matches the file naming convention, in case it has changed.
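One way to act on this warning is to compare file names across two modality folders before training. The sketch below rests on a loud assumption that is not confirmed by the README: that files belonging to the same patch share a file stem (name without extension) across folders. The function names and the example folder pairing are illustrative.

```python
from pathlib import Path


def stems(folder) -> set:
    """Collect file stems (names without extension) directly inside `folder`."""
    return {p.stem for p in Path(folder).iterdir() if p.is_file()}


def naming_mismatches(reference_dir, other_dir):
    """Return (stems only in reference_dir, stems only in other_dir), both sorted.

    Assumes matching patches share a stem across folders; if the dataset uses
    per-modality prefixes or suffixes, normalise the stems before comparing.
    """
    reference, other = stems(reference_dir), stems(other_dir)
    return sorted(reference - other), sorted(other - reference)
```

For instance, `naming_mismatches("rgb_clean", "captions_clean")` returning two empty lists would suggest the two folders still follow the same convention, while any leftover stems point at files a dataset class might silently skip.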
data/rgb_balanced.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80eb453237fda6d1357641cc57e0ef7e2d2ca8ec8741b72fc56dd87bed13ea09
+ size 79739401
data/rgb_clean.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a95ad50a176d49b45fb06debb31dbd26b7872e5a9d2765fb585898c0e6c23375
+ size 107771981
data/simulated_s2_boarefl_balanced.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9474513a0f13aca907674ae54fd71b875d1758e143037fae55e9f166ded6c51
+ size 92085964
data/simulated_s2_boarefl_clean.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4497cba5b8607b4da1918867daf157f172993648d8d2ef5567e8390e55783395
+ size 121582695
data/tim_generation_balanced.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2b92cf4de8b1eff54d85a7feb3cf843f12ae87adbfc521226a5ff35cd28102e
+ size 208484041
data/tim_generation_clean.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:119d16495ebe0d14bac3c89b00341d4211b9b773475a6deb89e5ea51346fd4ff
+ size 284308187
data/truth_false_labels.xlsx ADDED
Binary file (31.8 kB).