anway committed · verified
Commit 05fdb87 · 1 Parent(s): 93818e9

h5ad_viewer
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ data/Mouse_Adult_Brain_M9_70_15um_adata_brain_15um.h5ad filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Spatial Transcriptomics Viewer Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md CHANGED
@@ -1,14 +1,303 @@
  ---
- title: Spatial Omics Viewer
- emoji: 📚
- colorFrom: red
- colorTo: green
  sdk: gradio
- sdk_version: 6.2.0
  app_file: app.py
  pinned: false
  license: mit
- short_description: Visualize spatial expression from .h5ad files (AnnData forma
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Spatial Transcriptomics Viewer
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.0.0
  app_file: app.py
  pinned: false
  license: mit
  ---

# Spatial Transcriptomics Viewer

A web-based tool for visualizing spatial gene expression from AnnData (.h5ad) files.

## Features

- **Interactive Visualization**: Explore spatial gene expression with interactive Plotly plots
- **Memory Efficient**: Uses AnnData's backed mode to handle large datasets
- **Flexible Input**: Load data from URLs (HuggingFace, Zenodo) or upload files
- **Single-Gene Queries**: Visualize expression of individual genes across spatial coordinates
- **Expression Statistics**: Get detailed statistics for each gene
- **Customizable**: Adjust point size, color scale, and transformations

## Quick Start

### Using the Public Demo

1. Visit the Space URL
2. Load your data:
   - **URL**: Paste a link to your h5ad file
   - **Upload**: Upload your h5ad file directly (< 2GB recommended)
3. Enter a gene name and visualize!

### For Heavy Usage: Duplicate This Space

For large files or frequent use, we recommend duplicating this Space to your account:

1. Click the **⋮** menu at the top right
2. Select **"Duplicate this Space"**
3. Choose your HuggingFace account
4. (Optional) Upgrade to persistent storage for better performance

**Benefits of duplicating:**
- Independent computing resources
- No queueing behind other users
- Private data processing
- Customizable settings
- Optional paid upgrades for more resources

## Data Requirements

Your h5ad file must contain:
- `adata.obsm['spatial']`: 2D spatial coordinates (N × 2 array)
- Gene expression data in `adata.X`
- Gene names in `adata.var_names`

**Supported formats:**
- Visium (10x Genomics)
- MERFISH
- seqFISH
- Any spatial transcriptomics data in AnnData format
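
Before uploading, the three requirements above can be checked with a short script. A minimal sketch (the `FakeAdata` class is only a stand-in so the example is self-contained; in practice you would pass the object returned by `anndata.read_h5ad`):

```python
from dataclasses import dataclass, field

@dataclass
class FakeAdata:
    """Stand-in for anndata.AnnData in this sketch."""
    n_obs: int
    n_vars: int
    obsm: dict = field(default_factory=dict)

def check_spatial_requirements(adata) -> list:
    """Return a list of problems; an empty list means the data meets the requirements."""
    problems = []
    coords = adata.obsm.get("spatial")
    if coords is None:
        problems.append("missing adata.obsm['spatial']")
    elif len(coords) != adata.n_obs or any(len(row) != 2 for row in coords):
        problems.append("spatial coordinates must be an n_obs x 2 array")
    if adata.n_vars == 0:
        problems.append("no genes in adata.var_names")
    return problems

demo = FakeAdata(n_obs=3, n_vars=5, obsm={"spatial": [[0, 0], [1, 0], [0, 1]]})
print(check_spatial_requirements(demo))  # []
```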

## How It Works

### Architecture

```
User Input (URL/Upload)
          ↓
Load h5ad with backed='r' (memory efficient)
          ↓
Validate spatial coordinates
          ↓
Query single gene expression
          ↓
Plotly interactive visualization
```

### Memory Efficiency

This tool uses AnnData's **backed mode** (`backed='r'`), which means:
- Files are read from disk on demand
- Only the requested data is loaded into memory
- Files much larger than available RAM can be handled
- Suitable for large-scale spatial transcriptomics datasets

## Technical Details

### Stack
- **Frontend**: Gradio 4.0+
- **Backend**: Python 3.9+
- **Data**: AnnData, scanpy
- **Visualization**: Plotly
- **Platform**: Hugging Face Spaces

### File Size Limits

**Public Space:**
- Recommended: < 2GB
- Maximum: ~10GB (may be slow)

**Duplicated Space (free tier):**
- Recommended: < 5GB
- With persistent storage upgrade: 50GB+

### URL Sources

Supported domains for URL input:
- `huggingface.co` - HuggingFace Datasets
- `zenodo.org` - Zenodo repositories
- `s3.amazonaws.com` - S3 buckets
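
A plausible sketch of how such a domain allowlist could be enforced (this is an illustration, not necessarily the app's actual implementation; the function name is hypothetical):

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"huggingface.co", "zenodo.org", "s3.amazonaws.com"}

def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is on the allowlist (or a subdomain of one)."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)

print(is_allowed_url("https://huggingface.co/datasets/u/d/resolve/main/data.h5ad"))  # True
print(is_allowed_url("http://huggingface.co/data.h5ad"))   # False (not https)
print(is_allowed_url("https://example.com/data.h5ad"))     # False (host not allowed)
```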

## Usage Examples

### Example 1: Visualize from a HuggingFace Dataset

```python
# If you have an h5ad file in a HuggingFace dataset:
URL = "https://huggingface.co/datasets/{username}/{dataset}/resolve/main/data.h5ad"

# Paste this URL into the tool and load it,
# then enter gene names such as "GAPDH", "ACTB", or "MYC".
```

### Example 2: Prepare Your Own Data

```python
import scanpy as sc

# Load your expression data
adata = sc.read_10x_h5("your_data.h5")

# Add spatial coordinates (if not already present).
# Example: copy them from a Visium spatial folder
# (assumes both objects contain the same cells in the same order).
spatial = sc.read_visium("path/to/spatial_folder")
adata.obsm["spatial"] = spatial.obsm["spatial"]

# Save as h5ad
adata.write("your_spatial_data.h5ad")

# Upload to a HuggingFace dataset or use the file directly
```

## Privacy & Data Security

### Public Space
- Files are processed in **temporary storage**
- No permanent data retention
- Cleared after the session ends
- Not suitable for sensitive data

### Duplicated Private Space
- Data stays in your account
- Full control over access
- Suitable for private research data
- Can be deleted at any time

## Limitations

- **No preprocessing**: The tool does not normalize, scale, or transform data
- **Read-only**: Cannot modify or save h5ad files
- **Single gene**: Visualizes one gene at a time
- **2D spatial only**: Requires 2D coordinates in `obsm['spatial']`

## Troubleshooting

### "Spatial coordinates not found"
- Check that your h5ad contains `adata.obsm['spatial']`
- Ensure it is a 2D array (N × 2)

### "Gene not found"
- Check the gene name's spelling
- Use exact gene names from `adata.var_names`
- The tool will suggest similar gene names

### "File too large" or slow loading
- Try duplicating the Space for more resources
- Consider subsetting your data
- Use URL input instead of upload

### Memory errors
- Ensure backed mode is working (check the file size limits)
- Duplicate the Space for more RAM
- Consider downsampling your dataset

## Development

### Local Setup

```bash
# Clone the repository
git clone <repo_url>
cd spatial-viewer

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```

### Project Structure

```
spatial-viewer/
├── app.py                  # Main Gradio application
├── utils/
│   ├── __init__.py
│   ├── loader.py           # h5ad loading with backed mode
│   ├── validator.py        # AnnData validation
│   └── plot.py             # Plotly visualization
├── data/
│   └── demo.h5ad           # (Optional) demo dataset
├── requirements.txt        # Python dependencies
├── README.md               # This file
└── .huggingface/
    └── space_config.yaml   # HF Space configuration
```

## Contributing

Contributions welcome! Areas for improvement:
- Multi-gene visualization
- Additional plot types
- Performance optimizations
- UI enhancements
- Documentation

## Citation

If you use this tool in your research, please cite:

```bibtex
@software{spatial_viewer,
  title  = {Spatial Transcriptomics Viewer},
  author = {Your Name},
  year   = {2025},
  url    = {https://huggingface.co/spaces/...}
}
```

## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- Built with [Gradio](https://gradio.app/)
- Uses [AnnData](https://anndata.readthedocs.io/) and [Scanpy](https://scanpy.readthedocs.io/)
- Hosted on [Hugging Face Spaces](https://huggingface.co/spaces)

---

## Summary

### Features

This is a web-based spatial transcriptomics gene-expression visualization tool that supports the AnnData (.h5ad) format.

**Key features:**
- Interactive visualization
- Memory efficient (supports large files)
- Flexible input (URL or upload)
- Single-gene expression queries
- Expression statistics

### Usage

1. **Load data**: via URL or by uploading an h5ad file
2. **Enter a gene name**: type the gene you want to inspect
3. **Visualize**: view the spatial expression plot and statistics

### Large Files or Heavy Usage

For large h5ad files (> 2GB) or frequent use, we recommend **duplicating this Space** to your own account:
- Independent computing resources
- No queueing
- Data privacy
- Optional paid upgrades

### Data Requirements

Your h5ad file must contain:
- `adata.obsm['spatial']`: spatial coordinates (N × 2)
- `adata.X`: gene expression data
- `adata.var_names`: gene names

Visium, MERFISH, seqFISH, and other formats are supported.

### How It Works

Uses AnnData's **backed mode** (`backed='r'`):
- Reads data from disk on demand
- Minimizes memory footprint
- Handles files larger than available RAM
- Suited to large-scale spatial transcriptomics data

---

**Built for the spatial transcriptomics research community** 🧬
app.py ADDED
@@ -0,0 +1,1494 @@
import gradio as gr
import os
import io
import re
import zipfile
import tempfile
import csv
import datetime
from pathlib import Path
from typing import Optional, Tuple, List, Dict
import numpy as np
import plotly.graph_objects as go

from utils.loader import H5adLoader
from utils.validator import AnnDataValidator
from utils.plot import SpatialPlotter, SpatialImageExtractor
from utils.data_source_manager import DataSourceManager


class SpatialViewer:
    """Main application class for the spatial transcriptomics viewer"""

    # Default demo dataset to load on startup
    DEFAULT_DEMO = "Cerebellum-MALDI-MSI.h5ad"

    def __init__(self):
        self.data_manager = DataSourceManager()
        self.current_source = None

    def load_default_demo(self) -> Tuple[str, Optional[gr.Plot], gr.update, gr.update, str]:
        """
        Load the default demo dataset on app startup

        Returns:
            Tuple of (status, overview_plot, selector_update, row_visibility, dataset_info)
        """
        demo_path = Path("data") / self.DEFAULT_DEMO
        if not demo_path.exists():
            return (
                "Demo dataset not found. Please load data manually.",
                None,
                gr.update(),
                gr.update(visible=False),
                "No dataset loaded",
            )

        try:
            adata = H5adLoader.load_from_source(str(demo_path))

            # Validate data
            is_valid, errors = AnnDataValidator.validate(adata)
            if not is_valid:
                return (
                    "Demo dataset validation failed: " + "; ".join(errors),
                    None,
                    gr.update(),
                    gr.update(visible=False),
                    "No dataset loaded",
                )

            # Add to data manager
            source_id = self.data_manager.add_source(
                name=self.DEFAULT_DEMO,
                source_type="demo",
                source_path=str(demo_path),
                adata=adata,
            )

            # Create overview plot
            spatial_coords = adata.obsm["spatial"]
            overview_fig = SpatialPlotter.create_overview_plot(spatial_coords)

            status = (
                f"✅ Auto-loaded demo dataset!\n"
                f"- Dataset: {self.DEFAULT_DEMO}\n"
                f"- Observations (spots/cells): {adata.n_obs:,}\n"
                f"- Variables (genes): {adata.n_vars:,}\n"
                f"- Spatial coordinates: {spatial_coords.shape}\n"
                f"\nReady to visualize gene expression. Switch to the 'Visualize Gene' tab."
            )

            # Dataset selector update
            choices = self.data_manager.get_source_choices()
            selector_update = gr.update(
                choices=choices,
                value=self.data_manager.current_id,
                visible=True,
            )

            # Dataset info for the Visualize tab
            current_source = self.data_manager.get_current_source()
            dataset_info = (
                f"📊 Current: {current_source.name}\n"
                f"({current_source.n_obs:,} cells, {current_source.n_vars:,} genes)"
            )

            return status, overview_fig, selector_update, gr.update(visible=True), dataset_info

        except Exception as e:
            return (
                f"Failed to load demo dataset: {str(e)}",
                None,
                gr.update(),
                gr.update(visible=False),
                "No dataset loaded",
            )

    def load_data(
        self,
        source_type: str,
        demo_dataset: Optional[str] = None,
        url: Optional[str] = None,
        file_path: Optional[str] = None,
    ) -> Tuple[str, Optional[gr.Plot], gr.update]:
        """
        Load h5ad data from various sources.
        Also supports ZIP files containing multiple h5ad files.

        Args:
            source_type: Type of source ('demo', 'url', 'upload')
            demo_dataset: Selected demo dataset name (if source_type is 'demo')
            url: URL to an h5ad file (if source_type is 'url')
            file_path: Path to the uploaded file (if source_type is 'upload')

        Returns:
            Tuple of (status_message, overview_plot, dataset_selector_update)
        """
        try:
            # Determine source
            if source_type == "demo":
                if not demo_dataset:
                    return "Please select a demo dataset.", None, gr.update()
                demo_path = Path("data") / demo_dataset
                if not demo_path.exists():
                    return f"Demo dataset not found: {demo_dataset}", None, gr.update()
                source = str(demo_path)
                display_name = demo_dataset

            elif source_type == "url":
                if not url or url.strip() == "":
                    return "Please provide a valid URL.", None, gr.update()
                source = url.strip()
                display_name = source.split("/")[-1] or "URL Dataset"

            elif source_type == "upload":
                if not file_path:
                    return "Please upload a file.", None, gr.update()
                source = file_path
                display_name = Path(file_path).name

            else:
                return f"Unknown source type: {source_type}", None, gr.update()

            # Load data
            loaded_data = H5adLoader.load_from_source(source)

            # Handle multiple datasets (from a ZIP file)
            if isinstance(loaded_data, list):
                # Multiple h5ad files loaded from ZIP
                status_messages = []
                loaded_count = 0

                for idx, adata in enumerate(loaded_data):
                    # Validate each dataset
                    is_valid, errors = AnnDataValidator.validate(adata)
                    if not is_valid:
                        status_messages.append(
                            f"Dataset {idx + 1} validation failed:\n"
                            + "\n".join(f"  - {e}" for e in errors)
                        )
                        continue

                    # Add to data manager
                    file_name = f"{display_name} - Part {idx + 1}"
                    source_id = self.data_manager.add_source(
                        name=file_name,
                        source_type=source_type,
                        source_path=source,
                        adata=adata,
                    )
                    loaded_count += 1

                if loaded_count == 0:
                    return (
                        "No valid datasets found in ZIP file.\n" + "\n".join(status_messages),
                        None,
                        gr.update(),
                    )

                # Get the current (latest loaded) dataset
                current_source = self.data_manager.get_current_source()
                spatial_coords = current_source.adata.obsm["spatial"]
                overview_fig = SpatialPlotter.create_overview_plot(spatial_coords)

                status = (
                    f"Successfully loaded {loaded_count} dataset(s) from ZIP file!\n\n"
                    f"Current dataset: {current_source.name}\n"
                    f"- Observations (spots/cells): {current_source.n_obs:,}\n"
                    f"- Variables (genes): {current_source.n_vars:,}\n"
                    f"- Spatial coordinates: {spatial_coords.shape}\n"
                    f"\nUse the dataset selector above to switch between datasets.\n"
                    f"Ready to visualize gene expression."
                )

            else:
                # Single h5ad file
                adata = loaded_data

                # Validate data
                is_valid, errors = AnnDataValidator.validate(adata)
                if not is_valid:
                    error_msg = "Validation errors:\n" + "\n".join(f"- {e}" for e in errors)
                    return error_msg, None, gr.update()

                # Add to data manager
                source_id = self.data_manager.add_source(
                    name=display_name,
                    source_type=source_type,
                    source_path=source,
                    adata=adata,
                )

                # Create overview plot
                spatial_coords = adata.obsm["spatial"]
                overview_fig = SpatialPlotter.create_overview_plot(spatial_coords)

                status = (
                    f"Successfully loaded data!\n"
                    f"- Dataset: {display_name}\n"
                    f"- Observations (spots/cells): {adata.n_obs:,}\n"
                    f"- Variables (genes): {adata.n_vars:,}\n"
                    f"- Spatial coordinates: {spatial_coords.shape}\n"
                    f"\nReady to visualize gene expression."
                )

            # Update dataset selector
            choices = self.data_manager.get_source_choices()
            selector_update = gr.update(
                choices=choices,
                value=self.data_manager.current_id,
                visible=True,
            )

            return status, overview_fig, selector_update

        except Exception as e:
            return f"Error loading data: {str(e)}", None, gr.update()

    def switch_dataset(self, source_id: str) -> Tuple[str, Optional[gr.Plot]]:
        """
        Switch to a different loaded dataset

        Args:
            source_id: ID of the dataset to switch to

        Returns:
            Tuple of (info_message, overview_plot)
        """
        if not source_id:
            return "No dataset selected.", None

        success = self.data_manager.set_current(source_id)
        if not success:
            return f"Dataset not found: {source_id}", None

        current_source = self.data_manager.get_current_source()
        spatial_coords = current_source.adata.obsm["spatial"]
        overview_fig = SpatialPlotter.create_overview_plot(spatial_coords)

        info = current_source.get_info()
        return info, overview_fig

    def visualize_gene(
        self,
        gene_name: str,
        point_size: int = 5,
        use_log: bool = True,
        colorscale: str = "Viridis",
        show_background: bool = False,
        background_opacity: float = 0.5,
    ) -> Tuple[str, Optional[gr.Plot], str, str]:
        """
        Visualize gene expression in spatial context
        """
        current_source = self.data_manager.get_current_source()

        if current_source is None:
            return "❌ Please load data first.", None, "", ""

        if current_source.adata is None:
            return (
                "❌ Dataset registered but not yet loaded. "
                "Please select it in the 'Select Dataset' tab first.",
                None,
                "",
                "",
            )

        if not gene_name or gene_name.strip() == "":
            return "❓ Please enter a gene name.", None, "", ""

        gene_name = gene_name.strip()

        try:
            adata = current_source.adata

            # Get gene expression
            expression = AnnDataValidator.get_gene_expression(adata, gene_name)

            # Get spatial coordinates
            spatial_coords = adata.obsm["spatial"]

            # Extract a background image from the h5ad if requested
            background_image = None
            scalefactors = None
            bg_status = ""

            if show_background:
                result = SpatialImageExtractor.get_spatial_image(adata, prefer_lowres=True)
                if result is not None:
                    background_image, scalefactors, image_key = result
                    # Pass image_key via scalefactors so the plot knows which scale to use
                    scalefactors = dict(scalefactors)  # Make a copy
                    scalefactors["_image_key"] = image_key
                    bg_status = f" (with {image_key} tissue background)"
                else:
                    bg_status = " (no background image in h5ad)"

            # Create plot
            fig = SpatialPlotter.plot_spatial_gene(
                spatial_coords=spatial_coords,
                expression=expression,
                gene_name=gene_name,
                point_size=point_size,
                use_log=use_log,
                colorscale=colorscale,
                background_image=background_image,
                scalefactors=scalefactors,
                background_opacity=background_opacity,
            )

            # Get statistics
            stats = SpatialPlotter.get_expression_stats(expression)
            stats_text = (
                f"Expression Statistics for {gene_name}:\n"
                f"- Min: {stats['min']:.4f}\n"
                f"- Max: {stats['max']:.4f}\n"
                f"- Mean: {stats['mean']:.4f}\n"
                f"- Median: {stats['median']:.4f}\n"
                f"- Std Dev: {stats['std']:.4f}\n"
                f"- Non-zero: {stats['non_zero_count']:,} ({stats['non_zero_percent']:.1f}%)"
            )

            # Current dataset info
            dataset_info = (
                f"Current dataset: {current_source.name}\n"
                f"({current_source.n_obs:,} cells, {current_source.n_vars:,} genes)"
            )

            return f"Successfully visualized gene: {gene_name}{bg_status}", fig, stats_text, dataset_info

        except ValueError as e:
            return str(e), None, "", ""
        except Exception as e:
            return f"Error visualizing gene: {str(e)}", None, "", ""

    def check_spatial_image_available(self) -> bool:
        """Check whether the current dataset has a spatial background image"""
        current_source = self.data_manager.get_current_source()
        if current_source is None or current_source.adata is None:
            return False
        return SpatialImageExtractor.has_spatial_image(current_source.adata)

    def get_gene_suggestions(self, limit: int = 100) -> list:
        """Get a list of available genes for autocomplete"""
        current_source = self.data_manager.get_current_source()
        if current_source is None or current_source.adata is None:
            return []
        return AnnDataValidator.get_gene_list(current_source.adata, limit=limit)

    def get_current_dataset_info(self) -> str:
        """Get a formatted info string for the current dataset"""
        current_source = self.data_manager.get_current_source()
        if current_source is None:
            return "No dataset loaded. Please load data first."
        if current_source.adata is None:
            return f"📊 Current: {current_source.name}\n(Not yet loaded)"
        return (
            f"📊 Current: {current_source.name}\n"
            f"({current_source.n_obs:,} cells, {current_source.n_vars:,} genes)"
        )

    def get_all_genes(self) -> List[str]:
        """Get the full list of genes for the autocomplete dropdown"""
        current_source = self.data_manager.get_current_source()
        if current_source is None or current_source.adata is None:
            return []
        return list(current_source.adata.var_names)

    def search_genes(self, query: str, limit: int = 50) -> List[str]:
        """
        Search genes by prefix or substring match
        """
        current_source = self.data_manager.get_current_source()
        if current_source is None or current_source.adata is None:
            return []

        if not query or query.strip() == "":
            # Return the first N genes if there is no query
            return list(current_source.adata.var_names[:limit])

        query = query.strip().upper()
        all_genes = list(current_source.adata.var_names)

        # First: exact prefix matches (prioritized)
        prefix_matches = [g for g in all_genes if g.upper().startswith(query)]

        # Second: substring matches (lower priority)
        substring_matches = [g for g in all_genes if query in g.upper() and g not in prefix_matches]

        # Combine and limit
        results = prefix_matches + substring_matches
        return results[:limit]

    def get_adata_summary(self) -> str:
        """
        Get a detailed summary of the current AnnData object

        Returns:
            Formatted string with h5ad file details
        """
        current_source = self.data_manager.get_current_source()
        if current_source is None:
            return "No dataset loaded"

        if current_source.adata is None:
            return (
                f"📊 **{current_source.name}**\n\n"
                f"*Dataset registered but not yet loaded. Select it in the list to load.*"
            )

        adata = current_source.adata

        lines = []
        lines.append(f"📊 **{current_source.name}**")
        lines.append("")

        # Basic info
        lines.append("### 📈 Dimensions")
        lines.append(f"- Observations (cells/spots): **{adata.n_obs:,}**")
        lines.append(f"- Variables (features): **{adata.n_vars:,}**")

        # Spatial coordinates
        if "spatial" in adata.obsm:
            spatial_shape = adata.obsm["spatial"].shape
            lines.append(f"- Spatial coordinates: **{spatial_shape}**")

        lines.append("")

        # Variables info (first 5)
        lines.append("### 🧬 Variables (first 5)")
        var_names = list(adata.var_names[:5])
        lines.append(f"`{', '.join(var_names)}`")
        if adata.n_vars > 5:
            lines.append(f"... and {adata.n_vars - 5:,} more")

        lines.append("")

        # obsm keys
        if len(adata.obsm.keys()) > 0:
            lines.append("### 📍 obsm (embeddings)")
            for key in list(adata.obsm.keys())[:5]:
                shape = adata.obsm[key].shape
                lines.append(f"- `{key}`: {shape}")

        # obsp keys
        if hasattr(adata, "obsp") and len(adata.obsp.keys()) > 0:
            lines.append("")
            lines.append("### 🔗 obsp (pairwise)")
            for key in list(adata.obsp.keys())[:3]:
                lines.append(f"- `{key}`")

        # uns keys
        if len(adata.uns.keys()) > 0:
            lines.append("")
            lines.append("### 📦 uns (unstructured)")
            uns_keys = list(adata.uns.keys())[:6]
            lines.append(f"`{', '.join(uns_keys)}`")
            if len(adata.uns.keys()) > 6:
                lines.append(f"... and {len(adata.uns.keys()) - 6} more")

        # Check for a spatial image
        lines.append("")
        lines.append("### 🖼️ Spatial Image")
        if SpatialImageExtractor.has_spatial_image(adata):
            libs = SpatialImageExtractor.get_available_libraries(adata)
            lines.append(f"✅ Available (libraries: {', '.join(libs)})")
        else:
            lines.append("❌ Not available")

        return "\n".join(lines)

    def get_local_h5ad_files(self) -> List[str]:
        """Get the list of h5ad files in the data folder"""
        data_dir = Path("data")
        if not data_dir.exists():
            return []
        return [f.name for f in data_dir.glob("*.h5ad")]

    def create_overview_with_background(self) -> Optional[go.Figure]:
        """Create a spatial overview plot with a tissue background if available"""
        current_source = self.data_manager.get_current_source()
        if current_source is None or current_source.adata is None:
            return None

        adata = current_source.adata
        spatial_coords = adata.obsm["spatial"]

        # Try to get a background image
        background_image = None
        scalefactors = None

        result = SpatialImageExtractor.get_spatial_image(adata, prefer_lowres=True)
        if result is not None:
            background_image, scalefactors, image_key = result
            scalefactors = dict(scalefactors)
            scalefactors["_image_key"] = image_key

        # Create the overview plot with background
        return SpatialPlotter.create_overview_plot_with_background(
            spatial_coords=spatial_coords,
            background_image=background_image,
            scalefactors=scalefactors,
        )

    def parse_variables_list(self, input_text: str) -> Tuple[List[str], List[str], List[str]]:
        """
        Parse a comma/space/newline separated variables list

        Args:
            input_text: Raw input text with variable names

        Returns:
            Tuple of (found_features, not_found_features, all_parsed)
        """
        current_source = self.data_manager.get_current_source()
        if current_source is None:
            return [], [], []

        if not input_text or input_text.strip() == "":
            return [], [], []

        # Parse: split on comma, space, newline, tab
        raw_items = re.split(r"[,\s\n\t]+", input_text.strip())
        all_parsed = [item.strip() for item in raw_items if item.strip()]

        # Check which features exist in the dataset
        available_genes = set(current_source.adata.var_names)
        found_features = [g for g in all_parsed if g in available_genes]
        not_found_features = [g for g in all_parsed if g not in available_genes]

        return found_features, not_found_features, all_parsed

    def batch_visualize(
        self,
        variables_text: str,
        point_size: int = 5,
        use_log: bool = True,
        colorscale: str = "Viridis",
        show_background: bool = False,
        background_opacity: float = 0.5,
        progress=gr.Progress(track_tqdm=True),
    ) -> Tuple[str, Optional[str], str, str]:
        """
        Perform batch visualization for multiple features

        Args:
            variables_text: Comma/space/newline separated feature names
            point_size, use_log, colorscale, show_background, background_opacity: Plot settings
            progress: Gradio progress tracker

        Returns:
            Tuple of (status, zip_file_path, summary_report, stats_csv)
        """
        current_source = self.data_manager.get_current_source()
        if current_source is None:
            return "❌ No dataset loaded. Please load data first.", None, "", ""

        found_features, not_found_features, all_parsed = self.parse_variables_list(variables_text)

        if not found_features:
            return (
                f"❌ No valid features found in dataset.\nParsed: {', '.join(all_parsed)}",
                None,
                "",
                "",
            )

        # Prepare output
        adata = current_source.adata
        spatial_coords = adata.obsm["spatial"]

        # Get background image if needed
572
+ background_image = None
573
+ scalefactors = None
574
+ if show_background:
575
+ result = SpatialImageExtractor.get_spatial_image(adata, prefer_lowres=True)
576
+ if result is not None:
577
+ background_image, scalefactors, image_key = result
578
+ scalefactors = dict(scalefactors)
579
+ scalefactors['_image_key'] = image_key
580
+
581
+ # Create temp directory for outputs
582
+ temp_dir = tempfile.mkdtemp(prefix="batch_viz_")
583
+
584
+ # Track results
585
+ stats_records = []
586
+ successful_plots = []
587
+ failed_features = []
588
+
589
+ # Generate plots
590
+ total = len(found_features)
591
+ for idx, gene_name in enumerate(found_features):
592
+ progress((idx + 1) / total, desc=f"Processing {gene_name} ({idx + 1}/{total})")
593
+
594
+ try:
595
+ # Get expression
596
+ expression = AnnDataValidator.get_gene_expression(adata, gene_name)
597
+
598
+ # Create plot
599
+ fig = SpatialPlotter.plot_spatial_gene(
600
+ spatial_coords=spatial_coords,
601
+ expression=expression,
602
+ gene_name=gene_name,
603
+ point_size=point_size,
604
+ use_log=use_log,
605
+ colorscale=colorscale,
606
+ background_image=background_image,
607
+ scalefactors=scalefactors,
608
+ background_opacity=background_opacity,
609
+ )
610
+
611
+ # Save as PNG
612
+ png_path = os.path.join(temp_dir, f"{gene_name}.png")
613
+ fig.write_image(png_path, scale=2)
614
+ successful_plots.append((gene_name, png_path))
615
+
616
+ # Get statistics
617
+ stats = SpatialPlotter.get_expression_stats(expression)
618
+ stats['feature'] = gene_name
619
+ stats_records.append(stats)
620
+
621
+ except Exception as e:
622
+ failed_features.append((gene_name, str(e)))
623
+
624
+ # Generate summary report
625
+ report_lines = [
626
+ "# Batch Visualization Report",
627
+ f"Dataset: {current_source.name}",
628
+ f"Total cells/spots: {current_source.n_obs:,}",
629
+ f"Total features: {current_source.n_vars:,}",
630
+ "",
631
+ "## Settings",
632
+ f"- Point Size: {point_size}",
633
+ f"- Log Transform: {use_log}",
634
+ f"- Color Scale: {colorscale}",
635
+ f"- Background: {show_background}",
636
+ "",
637
+ "## Results Summary",
638
+ f"- Total requested: {len(all_parsed)}",
639
+ f"- Found in dataset: {len(found_features)}",
640
+ f"- Successfully visualized: {len(successful_plots)}",
641
+ f"- Failed: {len(failed_features)}",
642
+ "",
643
+ ]
644
+
645
+ if not_found_features:
646
+ report_lines.append("## Not Found Features")
647
+ for feat in not_found_features:
648
+ report_lines.append(f"- {feat}")
649
+ report_lines.append("")
650
+
651
+ if failed_features:
652
+ report_lines.append("## Failed Features")
653
+ for feat, err in failed_features:
654
+ report_lines.append(f"- {feat}: {err}")
655
+ report_lines.append("")
656
+
657
+ report_lines.append("## Successfully Visualized Features")
658
+ for feat, _ in successful_plots:
659
+ report_lines.append(f"- {feat}")
660
+
661
+ report_text = "\n".join(report_lines)
662
+
663
+ # Save report
664
+ report_path = os.path.join(temp_dir, "report.md")
665
+ with open(report_path, "w") as f:
666
+ f.write(report_text)
667
+
668
+ # Save statistics CSV
669
+ stats_csv_path = os.path.join(temp_dir, "expression_statistics.csv")
670
+ if stats_records:
671
+ with open(stats_csv_path, "w", newline="") as f:
672
+ fieldnames = ['feature', 'min', 'max', 'mean', 'median', 'std', 'non_zero_count', 'non_zero_percent']
673
+ writer = csv.DictWriter(f, fieldnames=fieldnames)
674
+ writer.writeheader()
675
+ writer.writerows(stats_records)
676
+
677
+ # Create ZIP file
678
+ zip_path = os.path.join(temp_dir, "batch_visualization.zip")
679
+ with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
680
+ # Add images
681
+ for gene_name, png_path in successful_plots:
682
+ zf.write(png_path, f"images/{gene_name}.png")
683
+
684
+ # Add report
685
+ zf.write(report_path, "report.md")
686
+
687
+ # Add stats CSV
688
+ if stats_records:
689
+ zf.write(stats_csv_path, "expression_statistics.csv")
690
+
691
+ # Format stats for display
692
+ stats_display = "Feature | Min | Max | Mean | Non-zero %\n"
693
+ stats_display += "--- | --- | --- | --- | ---\n"
694
+ for rec in stats_records:
695
+ stats_display += f"{rec['feature']} | {rec['min']:.4f} | {rec['max']:.4f} | {rec['mean']:.4f} | {rec['non_zero_percent']:.1f}%\n"
696
+
697
+ status = f"✅ Batch visualization complete!\n- Generated: {len(successful_plots)} plots\n- Failed: {len(failed_features)}"
698
+
699
+ return status, zip_path, report_text, stats_display
700
+
701
+
702
+ def create_interface():
703
+ """Create Gradio interface"""
704
+
705
+ viewer = SpatialViewer()
706
+
707
+ # Custom CSS
708
+ custom_css = """
709
+ .duplicate-notice {
710
+ background: linear-gradient(135deg, #fff8e1 0%, #ffecb3 100%);
711
+ color: #3e2723;
712
+ border: 1px solid #ffc107;
713
+ border-radius: 8px;
714
+ padding: 12px 16px;
715
+ margin: 12px 0;
716
+ font-size: 0.95rem;
717
+ line-height: 1.5;
718
+ }
719
+ .duplicate-notice b { color: #e65100; }
720
+
721
+ @media (prefers-color-scheme: dark) {
722
+ .duplicate-notice {
723
+ background: linear-gradient(135deg, rgba(50,40,20,0.9) 0%, rgba(40,30,10,0.9) 100%);
724
+ color: #ffffff;
725
+ border-color: #ffc107;
726
+ }
727
+ .duplicate-notice b { color: #ffd54f; }
728
+ }
729
+
730
+ .file-browser {
731
+ background: linear-gradient(180deg, #f8f9fa 0%, #e9ecef 100%);
732
+ border: 1px solid #dee2e6;
733
+ border-radius: 8px;
734
+ padding: 12px;
735
+ }
736
+ @media (prefers-color-scheme: dark) {
737
+ .file-browser {
738
+ background: linear-gradient(180deg, #2d2d2d 0%, #1a1a1a 100%);
739
+ border-color: #444;
740
+ }
741
+ }
742
+
743
+ .data-info-panel {
744
+ background: linear-gradient(180deg, #e3f2fd 0%, #bbdefb 100%);
745
+ border: 1px solid #90caf9;
746
+ border-radius: 8px;
747
+ padding: 12px;
748
+ }
749
+ @media (prefers-color-scheme: dark) {
750
+ .data-info-panel {
751
+ background: linear-gradient(180deg, rgba(33,150,243,0.15) 0%, rgba(33,150,243,0.05) 100%);
752
+ border-color: #1976d2;
753
+ }
754
+ }
755
+
756
+ .control-panel {
757
+ background: linear-gradient(180deg, #f5f5f5 0%, #eeeeee 100%);
758
+ border: 1px solid #e0e0e0;
759
+ border-radius: 8px;
760
+ padding: 16px;
761
+ }
762
+ @media (prefers-color-scheme: dark) {
763
+ .control-panel {
764
+ background: linear-gradient(180deg, #2a2a2a 0%, #1f1f1f 100%);
765
+ border-color: #444;
766
+ }
767
+ }
768
+ """
769
+
770
+ with gr.Blocks(
771
+ title="Spatial Omics Viewer",
772
+ theme=gr.themes.Soft(),
773
+ css=custom_css,
774
+ ) as app:
775
+ gr.Markdown(
776
+ """
777
+ # 🔬 Spatial Omics Viewer
778
+ Visualize spatial expression from .h5ad files (AnnData format)
779
+
780
+ <div class="duplicate-notice">
781
+ <b>Notice:</b> This is a public demo Space. For large h5ad files or heavy usage,
782
+ please <b>Duplicate this Space</b> to your account for better performance and privacy.
783
+ </div>
784
+ """
785
+ )
786
+
787
+ # ==================== Select Dataset Tab ====================
788
+ with gr.Tab("📂 Select Dataset"):
789
+ with gr.Row():
790
+ # Column 1: Dataset Browser
791
+ with gr.Column(scale=1, elem_classes="file-browser"):
792
+ gr.Markdown("### 📁 Available Datasets")
793
+ gr.Markdown("*Click to select and view*")
794
+
795
+ # All available datasets (loaded ones)
796
+ dataset_selector = gr.Radio(
797
+ choices=[],
798
+ label="📦 Datasets",
799
+ value=None,
800
+ info="Click to select",
801
+ )
802
+
803
+ gr.Markdown("---")
804
+ gr.Markdown("#### 📥 Import New Data")
805
+
806
+ import_type = gr.Radio(
807
+ choices=["URL", "Upload"],
808
+ value="URL",
809
+ label="Import Method",
810
+ info="Download from URL or upload file",
811
+ )
812
+
813
+ with gr.Group() as url_group:
814
+ url_input = gr.Textbox(
815
+ label="🔗 URL",
816
+ placeholder="https://... or Google Drive link",
817
+ info="HuggingFace, Zenodo, S3, Google Drive",
818
+ lines=1,
819
+ )
820
+ import_url_btn = gr.Button("📥 Import from URL", variant="secondary")
821
+
822
+ with gr.Group(visible=False) as upload_group:
823
+ file_input = gr.File(
824
+ label="📤 Upload File",
825
+ file_types=[".h5ad", ".zip"],
826
+ type="filepath",
827
+ )
828
+
829
+ load_status = gr.Textbox(
830
+ label="Status",
831
+ lines=2,
832
+ interactive=False,
833
+ )
834
+
835
+ # Column 2: Spatial Overview with background
836
+ with gr.Column(scale=2):
837
+ gr.Markdown("### 🗺️ Spatial Overview")
838
+ overview_plot = gr.Plot(label="Spatial Overview")
839
+
840
+ # Column 3: Dataset Info
841
+ with gr.Column(scale=1, elem_classes="data-info-panel"):
842
+ gr.Markdown("### 📊 Dataset Information")
843
+ dataset_summary = gr.Markdown(
844
+ value="*Select a dataset to see information*",
845
+ elem_id="dataset-summary",
846
+ )
847
+
848
+ # ==================== Visualize Tab ====================
849
+ with gr.Tab("🎨 Visualize") as visualize_tab:
850
+ with gr.Row():
851
+ # Column 1: Controls
852
+ with gr.Column(scale=1, elem_classes="control-panel"):
853
+ gr.Markdown("### ⚙️ Controls")
854
+ gr.Markdown("*Auto-renders when parameters change*", elem_id="auto-render-hint")
855
+
856
+ # Current dataset
857
+ current_dataset_display = gr.Textbox(
858
+ label="📊 Current Dataset",
859
+ value="No dataset loaded",
860
+ interactive=False,
861
+ lines=2,
862
+ )
863
+
864
+ # Gene input
865
+ gene_input = gr.Textbox(
866
+ label="🧬 Feature Name",
867
+ placeholder="Type to search (e.g., Pcp, Gab, Act)",
868
+ info="Start typing to see matching features",
869
+ )
870
+
871
+ gene_quick_picks = gr.Radio(
872
+ label="🔍 Quick Pick",
873
+ choices=[],
874
+ visible=False,
875
+ interactive=True,
876
+ )
877
+
878
+ # Plot Settings - default open
879
+ with gr.Accordion("🎛️ Plot Settings", open=True):
880
+ point_size = gr.Slider(
881
+ minimum=1,
882
+ maximum=20,
883
+ value=5,
884
+ step=1,
885
+ label="Point Size",
886
+ )
887
+
888
+ use_log = gr.Checkbox(
889
+ value=True,
890
+ label="Use log1p transformation",
891
+ info="Recommended for better visualization",
892
+ )
893
+
894
+ colorscale = gr.Dropdown(
895
+ choices=[
896
+ "Viridis", "Plasma", "Inferno", "Magma",
897
+ "Cividis", "Blues", "Reds", "YlOrRd", "RdYlBu",
898
+ ],
899
+ value="Viridis",
900
+ label="Color Scale",
901
+ )
902
+
903
+ # Tissue Background - default open
904
+ with gr.Accordion("🖼️ Tissue Background", open=True):
905
+ show_background = gr.Checkbox(
906
+ value=False,
907
+ label="Show tissue background",
908
+ info="From h5ad file (if available)",
909
+ )
910
+
911
+ background_opacity = gr.Slider(
912
+ minimum=0.1,
913
+ maximum=1.0,
914
+ value=0.5,
915
+ step=0.1,
916
+ label="Background Opacity",
917
+ )
918
+
919
+ # Column 2: Plot
920
+ with gr.Column(scale=2):
921
+ gr.Markdown("### 🔬 Spatial Omics Expression")
922
+ gene_plot = gr.Plot(label="Spatial Omics Expression")
923
+
924
+ # Column 3: Stats
925
+ with gr.Column(scale=1):
926
+ gr.Markdown("### 📈 Analysis")
927
+
928
+ vis_status = gr.Textbox(
929
+ label="Status",
930
+ lines=2,
931
+ interactive=False,
932
+ )
933
+
934
+ stats_output = gr.Textbox(
935
+ label="Expression Statistics",
936
+ lines=10,
937
+ interactive=False,
938
+ )
939
+
940
+ # ==================== Batch Visualize Tab ====================
941
+ with gr.Tab("📊 Batch Visualize") as batch_tab:
942
+ with gr.Row():
943
+ # Column 1: Input & Settings
944
+ with gr.Column(scale=1, elem_classes="control-panel"):
945
+ gr.Markdown("### 📝 Batch Input")
946
+ gr.Markdown("*Paste variable names (comma, space, or newline separated)*")
947
+
948
+ batch_current_dataset = gr.Textbox(
949
+ label="📊 Current Dataset",
950
+ value="No dataset loaded",
951
+ interactive=False,
952
+ lines=2,
953
+ )
954
+
955
+ batch_variables_input = gr.Textbox(
956
+ label="🧬 Paste Variables List",
957
+ placeholder="Gene1, Gene2, Gene3\nor\nGene1\nGene2\nGene3",
958
+ lines=10,
959
+ info="Supports comma, space, or newline separated values",
960
+ )
961
+
962
+ batch_parse_btn = gr.Button("🔍 Parse & Preview", variant="secondary")
963
+
964
+ batch_parse_result = gr.Markdown(
965
+ value="*Enter variables and click Parse to preview*",
966
+ elem_id="batch-parse-result",
967
+ )
968
+
969
+ gr.Markdown("---")
970
+ gr.Markdown("### ⚙️ Batch Settings")
971
+
972
+ with gr.Accordion("🎛️ Plot Settings", open=True):
973
+ batch_point_size = gr.Slider(
974
+ minimum=1,
975
+ maximum=20,
976
+ value=5,
977
+ step=1,
978
+ label="Point Size",
979
+ )
980
+
981
+ batch_use_log = gr.Checkbox(
982
+ value=True,
983
+ label="Use log1p transformation",
984
+ )
985
+
986
+ batch_colorscale = gr.Dropdown(
987
+ choices=[
988
+ "Viridis", "Plasma", "Inferno", "Magma",
989
+ "Cividis", "Blues", "Reds", "YlOrRd", "RdYlBu",
990
+ ],
991
+ value="Viridis",
992
+ label="Color Scale",
993
+ )
994
+
995
+ with gr.Accordion("🖼️ Tissue Background", open=True):
996
+ batch_show_background = gr.Checkbox(
997
+ value=False,
998
+ label="Show tissue background",
999
+ )
1000
+
1001
+ batch_background_opacity = gr.Slider(
1002
+ minimum=0.1,
1003
+ maximum=1.0,
1004
+ value=0.5,
1005
+ step=0.1,
1006
+ label="Background Opacity",
1007
+ )
1008
+
1009
+ batch_run_btn = gr.Button(
1010
+ "🚀 Run Batch Visualization", variant="primary", size="lg"
1011
+ )
1012
+
1013
+ # Column 2: Preview
1014
+ with gr.Column(scale=2):
1015
+ gr.Markdown("### 👁️ Preview (First Found Feature)")
1016
+ batch_preview_plot = gr.Plot(label="Preview")
1017
+ batch_preview_status = gr.Textbox(
1018
+ label="Preview Status",
1019
+ lines=2,
1020
+ interactive=False,
1021
+ )
1022
+
1023
+ # Column 3: Results
1024
+ with gr.Column(scale=1):
1025
+ gr.Markdown("### 📦 Results")
1026
+
1027
+ batch_status = gr.Textbox(
1028
+ label="Batch Status",
1029
+ lines=3,
1030
+ interactive=False,
1031
+ )
1032
+
1033
+ batch_download = gr.File(
1034
+ label="📥 Download Results (ZIP)",
1035
+ file_count="single",
1036
+ interactive=False,
1037
+ )
1038
+
1039
+ with gr.Accordion("📋 Summary Report", open=True):
1040
+ batch_report = gr.Markdown(
1041
+ value="*Run batch visualization to see report*",
1042
+ )
1043
+
1044
+ with gr.Accordion("📊 Expression Statistics", open=False):
1045
+ batch_stats = gr.Markdown(
1046
+ value="*Run batch visualization to see statistics*",
1047
+ )
1048
+
1049
+ # ==================== About Tab ====================
1050
+ with gr.Tab("ℹ️ About"):
1051
+ gr.Markdown(
1052
+ """
1053
+ ## About This Tool
1054
+
1055
+ This tool visualizes spatial omics expression from AnnData (.h5ad) files.
1056
+
1057
+ ### Features
1058
+ - 🚀 Auto-loads demo dataset on startup
1059
+ - 🔍 Feature name autocomplete search
1060
+ - 🔗 Load from URLs (HuggingFace, Zenodo, S3, Google Drive)
1061
+ - 📤 Upload h5ad/ZIP files
1062
+ - 🖼️ Tissue background image overlay
1063
+ - 📊 Interactive Plotly visualization
1064
+ - 💾 Memory-efficient backed mode
1065
+
1066
+ ### How to Use
1067
+ 1. **Load Data**: Select built-in dataset or import external data
1068
+ 2. **Visualize**: Search for features and visualize spatial expression
1069
+ 3. **Customize**: Adjust plot settings and background
1070
+
1071
+ ### For Large Files
1072
+ Please **Duplicate this Space** for large files (>2GB), frequent usage, or private data.
1073
+
1074
+ ---
1075
+ Built for the spatial omics research community.
1076
+ """
1077
+ )
1078
+
1079
+ # ============================================
1080
+ # Event bindings
1081
+ # ============================================
1082
+
1083
+ # Import type toggle
1084
+ def toggle_import_type(import_method):
1085
+ return {
1086
+ url_group: gr.update(visible=(import_method == "URL")),
1087
+ upload_group: gr.update(visible=(import_method == "Upload")),
1088
+ }
1089
+
1090
+ import_type.change(
1091
+ toggle_import_type,
1092
+ inputs=[import_type],
1093
+ outputs=[url_group, upload_group],
1094
+ )
1095
+
1096
+ # Switch dataset when clicking on selector
1097
+ def switch_dataset(source_id):
1098
+ """Switch to selected dataset (load if needed) and update all views"""
1099
+ if not source_id:
+ return "", None, "*Select a dataset*", viewer.get_current_dataset_info(), gr.update()
+
+ try:
+ # 1. Get source info
+ source = viewer.data_manager.get_source(source_id)
+ if source is None:
+ return f"❌ Dataset {source_id} not found", None, "", "", gr.update()
+
+ # 2. Lazy load if not already loaded
+ if source.adata is None:
+ print(f"DEBUG: Lazy loading {source.name} from {source.source_path}")
+ # Free up memory from other datasets first
+ import gc
+ for other_id, other_source in viewer.data_manager.sources.items():
+ if other_id != source_id and other_source.adata is not None:
+ print(f"DEBUG: Freeing memory from {other_source.name}")
+ other_source.adata = None
+ gc.collect()
+
+ # Load current
+ adata = H5adLoader.load_from_source(source.source_path)
+
+ # Validate
+ is_valid, errors = AnnDataValidator.validate(adata)
+ if not is_valid:
+ return f"❌ Validation failed: {'; '.join(errors)}", None, "", "", gr.update()
+
+ # Update source object
+ source.adata = adata
+ source.n_obs = adata.n_obs
+ source.n_vars = adata.n_vars
+ source.loaded_at = datetime.datetime.now()
+
+ # 3. Set as current
+ viewer.data_manager.set_current(source_id)
+
+ # 4. Update all views
+ overview_fig = viewer.create_overview_with_background()
+ summary = viewer.get_adata_summary()
+ dataset_info = viewer.get_current_dataset_info()
+ choices = viewer.data_manager.get_source_choices()
+
+ # Update selector choices to show cell/gene counts
+ selector_update = gr.update(choices=choices, value=source_id)
+
+ return f"✅ Loaded: {source.name}", overview_fig, summary, dataset_info, selector_update
+
+ except Exception as e:
+ import traceback
+ print(traceback.format_exc())
+ return f"❌ Error loading dataset: {str(e)}", None, "", "", gr.update()
+
+ dataset_selector.change(
+ switch_dataset,
+ inputs=[dataset_selector],
+ outputs=[load_status, overview_plot, dataset_summary, current_dataset_display, dataset_selector],
+ )
+
+ # Import from URL
+ def import_from_url(url):
+ """Import dataset from URL"""
+ if not url or not url.strip():
+ return "❌ Please enter a URL", None, "", gr.update(), ""
+
+ url = url.strip()
+ display_name = url.split("/")[-1].split("?")[0] or "URL Dataset"
+
+ try:
+ # Clear existing memory-heavy data before loading new one
+ import gc
+ for source in viewer.data_manager.sources.values():
+ source.adata = None
+ gc.collect()
+
+ loaded_data = H5adLoader.load_from_source(url)
+
+ if not isinstance(loaded_data, list):
+ loaded_data = [loaded_data]
+
+ last_id = None
+ for idx, adata in enumerate(loaded_data):
+ is_valid, errors = AnnDataValidator.validate(adata)
+ if not is_valid:
+ return f"❌ Validation failed: {'; '.join(errors)}", None, "", gr.update(), ""
+
+ name = display_name if len(loaded_data) == 1 else f"{display_name} - Part {idx + 1}"
+ last_id = viewer.data_manager.add_source(
+ name=name,
+ source_type="url",
+ source_path=url,
+ adata=adata
+ )
+
+ # Set the last imported one as current
+ if last_id:
+ viewer.data_manager.set_current(last_id)
+
+ # Update views
+ overview_fig = viewer.create_overview_with_background()
+ summary = viewer.get_adata_summary()
+ choices = viewer.data_manager.get_source_choices()
+ selector_update = gr.update(choices=choices, value=viewer.data_manager.current_id)
+ dataset_info = viewer.get_current_dataset_info()
+
+ return f"✅ Imported: {display_name}", overview_fig, summary, selector_update, dataset_info
+
+ except Exception as e:
+ return f"❌ Error: {str(e)}", None, "", gr.update(), ""
+
+ import_url_btn.click(
+ import_from_url,
+ inputs=[url_input],
+ outputs=[load_status, overview_plot, dataset_summary, dataset_selector, current_dataset_display],
+ )
+
+ # Upload file
+ def upload_file(uploaded_file):
+ """Handle file upload"""
+ if not uploaded_file:
+ return "❌ No file uploaded", None, "", gr.update(), ""
+
+ display_name = Path(uploaded_file).name
+
+ try:
+ # Clear existing memory-heavy data
+ import gc
+ for source in viewer.data_manager.sources.values():
+ source.adata = None
+ gc.collect()
+
+ loaded_data = H5adLoader.load_from_source(uploaded_file)
+
+ if not isinstance(loaded_data, list):
+ loaded_data = [loaded_data]
+
+ last_id = None
+ for idx, adata in enumerate(loaded_data):
+ is_valid, errors = AnnDataValidator.validate(adata)
+ if not is_valid:
+ return f"❌ Validation failed: {'; '.join(errors)}", None, "", gr.update(), ""
+
+ name = display_name if len(loaded_data) == 1 else f"{display_name} - Part {idx + 1}"
+ last_id = viewer.data_manager.add_source(
+ name=name,
+ source_type="upload",
+ source_path=uploaded_file,
+ adata=adata
+ )
+
+ # Set as current
+ if last_id:
+ viewer.data_manager.set_current(last_id)
+
+ # Update views
+ overview_fig = viewer.create_overview_with_background()
+ summary = viewer.get_adata_summary()
+ choices = viewer.data_manager.get_source_choices()
+ selector_update = gr.update(choices=choices, value=viewer.data_manager.current_id)
+ dataset_info = viewer.get_current_dataset_info()
+
+ return f"✅ Uploaded: {display_name}", overview_fig, summary, selector_update, dataset_info
+
+ except Exception as e:
+ return f"❌ Error: {str(e)}", None, "", gr.update(), ""
+
+ file_input.change(
+ upload_file,
+ inputs=[file_input],
+ outputs=[load_status, overview_plot, dataset_summary, dataset_selector, current_dataset_display],
+ )
+
+ # Visualize tab events
+ def update_on_tab_select():
+ return viewer.get_current_dataset_info()
+
+ visualize_tab.select(
+ update_on_tab_select,
+ inputs=[],
+ outputs=[current_dataset_display],
+ )
+
+ def live_search(query):
+ if not query or len(query.strip()) < 2:
+ return gr.update(choices=[], visible=False)
+ results = viewer.search_genes(query, limit=15)
+ if results:
+ return gr.update(choices=results, visible=True, value=None)
+ return gr.update(choices=[], visible=False)
+
+ gene_input.change(
+ live_search,
+ inputs=[gene_input],
+ outputs=[gene_quick_picks],
+ )
+
+ def quick_visualize(selected_gene, point_size, use_log, colorscale, show_bg, bg_opacity):
+ if not selected_gene:
+ return gr.update(), None, "", "", gr.update(visible=False), ""
+
+ status, plot, stats, dataset_info = viewer.visualize_gene(
+ selected_gene, point_size, use_log, colorscale, show_bg, bg_opacity
+ )
+ return selected_gene, plot, stats, dataset_info, gr.update(visible=False), status
+
+ gene_quick_picks.change(
+ quick_visualize,
+ inputs=[gene_quick_picks, point_size, use_log, colorscale, show_background, background_opacity],
+ outputs=[gene_input, gene_plot, stats_output, current_dataset_display, gene_quick_picks, vis_status],
+ )
+
+ # Auto-render when any parameter changes
+ def auto_visualize(gene_name, pt_size, log_transform, color_scale, show_bg, bg_opacity):
+ """Auto-render visualization when parameters change"""
+ if not gene_name or gene_name.strip() == "":
+ return gr.update(), gr.update(), gr.update(), ""
+
+ status, plot, stats, dataset_info = viewer.visualize_gene(
+ gene_name, pt_size, log_transform, color_scale, show_bg, bg_opacity
+ )
+ return status, plot, stats, dataset_info
+
+ # Bind auto-render to all parameter changes
+ auto_render_inputs = [gene_input, point_size, use_log, colorscale, show_background, background_opacity]
+ auto_render_outputs = [vis_status, gene_plot, stats_output, current_dataset_display]
+
+ # Re-render on gene input blur (when user finishes typing)
+ gene_input.blur(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ # Re-render on parameter changes
+ point_size.release(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ use_log.change(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ colorscale.change(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ show_background.change(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ background_opacity.release(
+ auto_visualize,
+ inputs=auto_render_inputs,
+ outputs=auto_render_outputs,
+ )
+
+ # ============================================
+ # Batch Visualize Tab Events
+ # ============================================
+
+ def update_batch_dataset():
+ return viewer.get_current_dataset_info()
+
+ batch_tab.select(
+ update_batch_dataset,
+ inputs=[],
+ outputs=[batch_current_dataset],
+ )
+
+ def parse_and_preview(variables_text, pt_size, log_transform, color_scale, show_bg, bg_opacity):
+ """Parse variables list and preview first found feature"""
+ found, not_found, all_parsed = viewer.parse_variables_list(variables_text)
+
+ # Build parse result message
+ result_lines = []
+ result_lines.append(f"**Parsed:** {len(all_parsed)} items")
+ result_lines.append(f"**Found:** {len(found)} features")
+ if found:
+ result_lines.append(f"- `{', '.join(found[:10])}`" + (f" ... (+{len(found)-10} more)" if len(found) > 10 else ""))
+ result_lines.append(f"**Not Found:** {len(not_found)} items")
+ if not_found:
+ result_lines.append(f"- `{', '.join(not_found[:5])}`" + (f" ... (+{len(not_found)-5} more)" if len(not_found) > 5 else ""))
+
+ parse_result = "\n".join(result_lines)
+
+ # Preview first found feature
+ if found:
+ first_gene = found[0]
+ status, plot, stats, _ = viewer.visualize_gene(
+ first_gene, pt_size, log_transform, color_scale, show_bg, bg_opacity
+ )
+ preview_status = f"Previewing: {first_gene}"
+ return parse_result, plot, preview_status
+ else:
+ return parse_result, None, "No features found to preview"
+
+ batch_parse_btn.click(
+ parse_and_preview,
+ inputs=[batch_variables_input, batch_point_size, batch_use_log, batch_colorscale, batch_show_background, batch_background_opacity],
+ outputs=[batch_parse_result, batch_preview_plot, batch_preview_status],
+ )
+
+ # Auto-update preview when settings change (if there's already input)
+ def update_preview_on_settings(variables_text, pt_size, log_transform, color_scale, show_bg, bg_opacity):
+ """Update preview when batch settings change"""
+ found, _, _ = viewer.parse_variables_list(variables_text)
+ if found:
+ first_gene = found[0]
+ status, plot, stats, _ = viewer.visualize_gene(
+ first_gene, pt_size, log_transform, color_scale, show_bg, bg_opacity
+ )
+ return plot, f"Previewing: {first_gene}"
+ return gr.update(), gr.update()
+
+ batch_preview_inputs = [batch_variables_input, batch_point_size, batch_use_log, batch_colorscale, batch_show_background, batch_background_opacity]
+ batch_preview_outputs = [batch_preview_plot, batch_preview_status]
+
+ batch_point_size.release(update_preview_on_settings, inputs=batch_preview_inputs, outputs=batch_preview_outputs)
+ batch_use_log.change(update_preview_on_settings, inputs=batch_preview_inputs, outputs=batch_preview_outputs)
+ batch_colorscale.change(update_preview_on_settings, inputs=batch_preview_inputs, outputs=batch_preview_outputs)
+ batch_show_background.change(update_preview_on_settings, inputs=batch_preview_inputs, outputs=batch_preview_outputs)
+ batch_background_opacity.release(update_preview_on_settings, inputs=batch_preview_inputs, outputs=batch_preview_outputs)
+
+ def run_batch_visualization(variables_text, pt_size, log_transform, color_scale, show_bg, bg_opacity, progress=gr.Progress()):
+ """Run batch visualization"""
+ status, zip_path, report, stats = viewer.batch_visualize(
+ variables_text, pt_size, log_transform, color_scale, show_bg, bg_opacity, progress
+ )
+ return status, zip_path, report, stats
+
+ batch_run_btn.click(
+ run_batch_visualization,
+ inputs=[batch_variables_input, batch_point_size, batch_use_log, batch_colorscale, batch_show_background, batch_background_opacity],
+ outputs=[batch_status, batch_download, batch_report, batch_stats],
+ )
+
+ # Auto-load all demo datasets on startup
+ def startup_load():
+ """Register all built-in datasets on startup (without loading them into RAM)"""
+ # Skip if already registered
+ if viewer.data_manager.has_sources():
+ overview_fig = viewer.create_overview_with_background()
+ summary = viewer.get_adata_summary()
+ choices = viewer.data_manager.get_source_choices()
+ dataset_info = viewer.get_current_dataset_info()
+ selector_update = gr.update(choices=choices, value=viewer.data_manager.current_id)
+ return "✅ Ready", overview_fig, summary, selector_update, dataset_info
+
+ # Register local h5ad files as sources (lazy loading)
+ local_files = viewer.get_local_h5ad_files()
+
+ for filename in local_files:
+ source_path = str(Path("data") / filename)
+ viewer.data_manager.add_source(
+ name=filename,
+ source_type="demo",
+ source_path=source_path,
+ adata=None  # DON'T LOAD YET
+ )
+
+ if viewer.data_manager.has_sources():
+ choices = viewer.data_manager.get_source_choices()
+ # We don't load the first one automatically to save RAM
+ # But we can set it as current so the UI shows it as selected
+ viewer.data_manager.current_id = choices[0][1]
+
+ return (
+ "📂 Datasets found. Select one to load and visualize.",
+ None,
+ "*Select a dataset to load*",
+ gr.update(choices=choices, value=viewer.data_manager.current_id),
+ "No dataset loaded"
+ )
+
+ return "No datasets found in data/ folder", None, "", gr.update(), ""
+
+ app.load(
+ startup_load,
+ inputs=[],
+ outputs=[load_status, overview_plot, dataset_summary, dataset_selector, current_dataset_display],
+ )
+
+ return app
+
+
+ if __name__ == "__main__":
+ app = create_interface()
+ app.launch()
data/Mouse_Adult_Brain_M9_70_15um_adata_brain_15um.h5ad ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:07097176920994c442259c2cb55639cd1ab6194ec2a86f78b261e5585b6f1196
+size 834789514
requirements.txt ADDED
@@ -0,0 +1,11 @@
+gradio>=4.0.0
+anndata>=0.10.0
+scanpy>=1.10.0
+plotly>=5.18.0
+numpy>=1.24.0
+pandas>=2.0.0
+scipy>=1.11.0
+h5py>=3.10.0
+requests>=2.31.0
+Pillow>=10.0.0
+kaleido>=0.2.1
utils/__init__.py ADDED
@@ -0,0 +1,5 @@
+from .loader import H5adLoader
+from .validator import AnnDataValidator
+from .plot import SpatialPlotter
+
+__all__ = ["H5adLoader", "AnnDataValidator", "SpatialPlotter"]
utils/data_source_manager.py ADDED
@@ -0,0 +1,186 @@
+from typing import Dict, List, Optional, Tuple
+from dataclasses import dataclass
+from pathlib import Path
+from anndata import AnnData
+import datetime
+
+
+@dataclass
+class DataSource:
+    """Represents a loaded h5ad data source"""
+    id: str  # Unique identifier
+    name: str  # Display name
+    source_type: str  # 'demo', 'url', 'upload'
+    source_path: str  # Original source (URL, file path, etc.)
+    adata: Optional[AnnData]  # The loaded AnnData object (Optional for lazy loading)
+    loaded_at: Optional[datetime.datetime]  # When it was loaded
+    n_obs: int = 0  # Number of observations
+    n_vars: int = 0  # Number of variables
+
+    def get_display_name(self) -> str:
+        """Get formatted display name with metadata"""
+        if self.adata is not None:
+            return f"{self.name} ({self.n_obs:,} cells, {self.n_vars:,} genes)"
+        return f"{self.name} (Not loaded)"
+
+    def get_info(self) -> str:
+        """Get detailed information string"""
+        # Guard against lazily registered sources that have not been loaded yet
+        loaded = (
+            self.loaded_at.strftime("%Y-%m-%d %H:%M:%S")
+            if self.loaded_at is not None
+            else "not loaded"
+        )
+        return (
+            f"Dataset: {self.name}\n"
+            f"Source: {self.source_type}\n"
+            f"Cells/Spots: {self.n_obs:,}\n"
+            f"Genes: {self.n_vars:,}\n"
+            f"Loaded: {loaded}"
+        )
+
+
+class DataSourceManager:
+    """
+    Manage multiple loaded h5ad datasets
+
+    This class handles:
+    - Tracking all loaded datasets
+    - Switching between datasets
+    - Providing dataset metadata
+    """
+
+    def __init__(self):
+        self.sources: Dict[str, DataSource] = {}
+        self.current_id: Optional[str] = None
+        self._id_counter = 0
+
+    def add_source(
+        self,
+        name: str,
+        source_type: str,
+        source_path: str,
+        adata: Optional[AnnData] = None
+    ) -> str:
+        """
+        Add a new data source
+
+        Args:
+            name: Display name for the dataset
+            source_type: Type of source ('demo', 'url', 'upload')
+            source_path: Original source location
+            adata: Optional loaded AnnData object
+
+        Returns:
+            Unique ID of the added source
+        """
+        # Check if already exists by source_path to avoid duplicates
+        for existing_id, source in self.sources.items():
+            if source.source_path == source_path:
+                if adata is not None and source.adata is None:
+                    # Update existing source with loaded adata
+                    source.adata = adata
+                    source.loaded_at = datetime.datetime.now()
+                    source.n_obs = adata.n_obs
+                    source.n_vars = adata.n_vars
+                return existing_id
+
+        # Generate unique ID
+        source_id = f"ds_{self._id_counter}"
+        self._id_counter += 1
+
+        # Create data source
+        source = DataSource(
+            id=source_id,
+            name=name,
+            source_type=source_type,
+            source_path=source_path,
+            adata=adata,
+            loaded_at=datetime.datetime.now() if adata is not None else None,
+            n_obs=adata.n_obs if adata is not None else 0,
+            n_vars=adata.n_vars if adata is not None else 0
+        )
+
+        self.sources[source_id] = source
+
+        # Set as current if it's the first one
+        if self.current_id is None:
+            self.current_id = source_id
+
+        return source_id
+
+    def get_source(self, source_id: str) -> Optional[DataSource]:
+        """Get a data source by ID"""
+        return self.sources.get(source_id)
+
+    def get_current_source(self) -> Optional[DataSource]:
+        """Get the currently active data source"""
+        if self.current_id is None:
+            return None
+        return self.sources.get(self.current_id)
+
+    def set_current(self, source_id: str) -> bool:
+        """
+        Set the current active data source
+
+        Args:
+            source_id: ID of the source to activate
+
+        Returns:
+            True if successful, False if source not found
+        """
+        if source_id in self.sources:
+            self.current_id = source_id
+            return True
+        return False
+
+    def get_all_sources(self) -> List[DataSource]:
+        """Get list of all loaded data sources"""
+        return list(self.sources.values())
+
+    def get_source_choices(self) -> List[Tuple[str, str]]:
+        """
+        Get list of sources for dropdown/radio selection
+
+        Returns:
+            List of (display_name, source_id) tuples
+        """
+        return [
+            (source.get_display_name(), source.id)
+            for source in self.sources.values()
+        ]
+
+    def get_source_names(self) -> List[str]:
+        """Get list of source display names"""
+        return [source.name for source in self.sources.values()]
+
+    def remove_source(self, source_id: str) -> bool:
+        """
+        Remove a data source
+
+        Args:
+            source_id: ID of source to remove
+
+        Returns:
+            True if removed, False if not found
+        """
+        if source_id in self.sources:
+            del self.sources[source_id]
+
+            # Update current_id if we removed the current source
+            if self.current_id == source_id:
+                if len(self.sources) > 0:
+                    self.current_id = list(self.sources.keys())[0]
+                else:
+                    self.current_id = None
+
+            return True
+        return False
+
+    def has_sources(self) -> bool:
+        """Check if any sources are loaded"""
+        return len(self.sources) > 0
+
+    def count_sources(self) -> int:
+        """Get number of loaded sources"""
+        return len(self.sources)
+
+    def clear_all(self):
+        """Remove all data sources"""
+        self.sources.clear()
+        self.current_id = None
+        self._id_counter = 0
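The dedup-and-ID logic in `add_source` above can be sketched standalone. This is an illustration only, not the module itself: plain dicts stand in for `DataSource`, and the function names mirror the diff.

```python
# Minimal standalone sketch of DataSourceManager.add_source's
# duplicate-detection and ID-assignment logic (dicts stand in for DataSource).
sources = {}
_id_counter = 0

def add_source(source_path, name):
    global _id_counter
    # Re-registering the same path returns the existing ID instead of duplicating
    for existing_id, src in sources.items():
        if src["source_path"] == source_path:
            return existing_id
    source_id = f"ds_{_id_counter}"
    _id_counter += 1
    sources[source_id] = {"name": name, "source_path": source_path}
    return source_id

first = add_source("data/a.h5ad", "A")   # "ds_0"
second = add_source("data/b.h5ad", "B")  # "ds_1"
again = add_source("data/a.h5ad", "A")   # "ds_0" again, no duplicate entry
```

This is why `startup_load` can safely re-run on page reload: registering an already-known path is a no-op that returns the original ID.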
utils/loader.py ADDED
@@ -0,0 +1,337 @@
+import os
+import tempfile
+import zipfile
+import re
+from pathlib import Path
+from typing import Optional, Union, Callable, List
+import requests
+import anndata
+from anndata import AnnData
+
+
+class H5adLoader:
+    """Handle h5ad file loading with backed='r' for efficient memory usage"""
+
+    ALLOWED_DOMAINS = [
+        "huggingface.co",
+        "zenodo.org",
+        "s3.amazonaws.com",
+        "drive.google.com",
+        "docs.google.com",
+    ]
+
+    MAX_DOWNLOAD_SIZE = 20 * 1024 * 1024 * 1024  # 20GB
+    TIMEOUT = 3000  # 3000 seconds = 50 minutes
+
+    @staticmethod
+    def convert_google_drive_url(url: str) -> str:
+        """
+        Convert Google Drive sharing URL to direct download URL
+
+        Supports formats:
+        - https://drive.google.com/file/d/{FILE_ID}/view?usp=sharing
+        - https://drive.google.com/open?id={FILE_ID}
+        - https://docs.google.com/...
+
+        Args:
+            url: Google Drive sharing URL
+
+        Returns:
+            Direct download URL
+
+        Raises:
+            ValueError: If cannot extract file ID
+        """
+        # Pattern 1: /file/d/{ID}/view
+        match = re.search(r'/file/d/([a-zA-Z0-9_-]+)', url)
+        if match:
+            file_id = match.group(1)
+            return f"https://drive.google.com/uc?export=download&id={file_id}"
+
+        # Pattern 2: open?id={ID}
+        match = re.search(r'[?&]id=([a-zA-Z0-9_-]+)', url)
+        if match:
+            file_id = match.group(1)
+            return f"https://drive.google.com/uc?export=download&id={file_id}"
+
+        # If already a direct download URL, return as-is
+        if 'drive.google.com/uc' in url:
+            return url
+
+        raise ValueError(
+            "Cannot parse Google Drive URL. Please use a sharing link like: "
+            "https://drive.google.com/file/d/{FILE_ID}/view?usp=sharing"
+        )
+
+    @staticmethod
+    def is_zip_file(filepath: str) -> bool:
+        """Check if file is a ZIP archive"""
+        return filepath.lower().endswith('.zip') and zipfile.is_zipfile(filepath)
+
+    @staticmethod
+    def extract_h5ad_from_zip(zip_path: str, extract_dir: Optional[str] = None) -> List[str]:
+        """
+        Extract all .h5ad files from a ZIP archive
+
+        Args:
+            zip_path: Path to ZIP file
+            extract_dir: Directory to extract to (uses temp dir if None)
+
+        Returns:
+            List of paths to extracted h5ad files
+
+        Raises:
+            ValueError: If no h5ad files found in ZIP
+        """
+        if extract_dir is None:
+            extract_dir = tempfile.mkdtemp()
+
+        extracted_h5ad_files = []
+
+        try:
+            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
+                # Get all .h5ad files
+                h5ad_files = [f for f in zip_ref.namelist() if f.lower().endswith('.h5ad')]
+
+                if not h5ad_files:
+                    raise ValueError("No .h5ad files found in ZIP archive")
+
+                # Extract each h5ad file
+                for h5ad_file in h5ad_files:
+                    # Skip macOS metadata files
+                    if '__MACOSX' in h5ad_file or h5ad_file.startswith('.'):
+                        continue
+
+                    zip_ref.extract(h5ad_file, extract_dir)
+                    extracted_path = os.path.join(extract_dir, h5ad_file)
+                    extracted_h5ad_files.append(extracted_path)
+
+            if not extracted_h5ad_files:
+                raise ValueError("No valid .h5ad files found in ZIP (only hidden/system files)")
+
+        except zipfile.BadZipFile:
+            raise ValueError("Invalid or corrupted ZIP file")
+
+        return extracted_h5ad_files
+
+    @staticmethod
+    def is_valid_url(url: str) -> bool:
+        """Check if URL is from allowed domains"""
+        if not url.startswith(("http://", "https://")):
+            return False
+        return any(domain in url for domain in H5adLoader.ALLOWED_DOMAINS)
+
+    @staticmethod
+    def _extract_filename_from_response(response, url: str) -> str:
+        """
+        Extract filename from HTTP response headers or URL
+
+        Prioritizes Content-Disposition header (especially useful for Google Drive)
+
+        Args:
+            response: requests.Response object
+            url: Original URL
+
+        Returns:
+            Extracted filename
+        """
+        filename = None
+
+        # Try to get filename from Content-Disposition header
+        content_disposition = response.headers.get('Content-Disposition', '')
+        if content_disposition:
+            # Try filename*= (RFC 5987 encoded)
+            match = re.search(r"filename\*=(?:UTF-8''|utf-8'')(.+?)(?:;|$)", content_disposition, re.IGNORECASE)
+            if match:
+                from urllib.parse import unquote
+                filename = unquote(match.group(1).strip())
+
+            # Try filename= with quotes
+            if not filename:
+                match = re.search(r'filename="([^"]+)"', content_disposition)
+                if match:
+                    filename = match.group(1).strip()
+
+            # Try filename= without quotes
+            if not filename:
+                match = re.search(r'filename=([^;\s]+)', content_disposition)
+                if match:
+                    filename = match.group(1).strip()
+
+        # Fallback: try to extract from URL
+        if not filename:
+            filename = url.split("/")[-1].split("?")[0]
+
+        # Default filename if still empty
+        if not filename or filename == "" or filename == "uc":
+            filename = "downloaded_data.h5ad"
+
+        # If no extension, try to determine from content type or URL
+        if '.' not in filename:
+            content_type = response.headers.get('Content-Type', '')
+            if 'zip' in content_type.lower() or 'zip' in url.lower():
+                filename = filename + ".zip"
+            else:
+                filename = filename + ".h5ad"
+
+        return filename
+
+    @staticmethod
+    def download_h5ad(
+        url: str,
+        save_dir: Optional[str] = None,
+        progress_callback: Optional[Callable[[int, int], None]] = None
+    ) -> Union[str, List[str]]:
+        """
+        Download h5ad file (or ZIP containing h5ad files) from URL
+
+        Args:
+            url: URL to h5ad or ZIP file
+            save_dir: Directory to save file (uses temp dir if None)
+            progress_callback: Optional callback function(downloaded_bytes, total_bytes)
+
+        Returns:
+            Path to downloaded file, or list of paths if ZIP was extracted
+
+        Raises:
+            ValueError: If URL is invalid or download fails
+        """
+        # Convert Google Drive URL if needed
+        original_url = url
+        if 'drive.google.com' in url or 'docs.google.com' in url:
+            try:
+                url = H5adLoader.convert_google_drive_url(url)
+            except ValueError as e:
+                raise ValueError(f"Google Drive URL error: {str(e)}")
+
+        if not H5adLoader.is_valid_url(url) and not H5adLoader.is_valid_url(original_url):
+            raise ValueError(
+                f"URL not from allowed domains: {', '.join(H5adLoader.ALLOWED_DOMAINS)}"
+            )
+
+        if save_dir is None:
+            save_dir = tempfile.mkdtemp()
+
+        try:
+            response = requests.get(
+                url,
+                stream=True,
+                timeout=H5adLoader.TIMEOUT,
+                allow_redirects=True
+            )
+            response.raise_for_status()
+
+            # Extract filename from response headers (handles Google Drive properly)
+            filename = H5adLoader._extract_filename_from_response(response, url)
+            filepath = os.path.join(save_dir, filename)
+
+            # Get total size if available
+            total_size = int(response.headers.get('content-length', 0))
+
+            downloaded_size = 0
+            with open(filepath, "wb") as f:
+                for chunk in response.iter_content(chunk_size=8192):
+                    if chunk:
+                        downloaded_size += len(chunk)
+
+                        # Check size limit
+                        if downloaded_size > H5adLoader.MAX_DOWNLOAD_SIZE:
+                            raise ValueError(
+                                f"File too large (>{H5adLoader.MAX_DOWNLOAD_SIZE / 1e9:.1f}GB)"
+                            )
+
+                        f.write(chunk)
+
+                        # Call progress callback if provided
+                        if progress_callback:
+                            progress_callback(downloaded_size, total_size)
+
+            # Check if it's a ZIP file and extract if so
+            if H5adLoader.is_zip_file(filepath):
+                extracted_files = H5adLoader.extract_h5ad_from_zip(filepath, save_dir)
+                return extracted_files  # Return list of extracted h5ad files
+
+            return filepath
+
+        except requests.RequestException as e:
+            raise ValueError(f"Failed to download file: {str(e)}")
+
+    @staticmethod
+    def load_h5ad(
+        path: Union[str, Path],
+        backed: str = "r",
+    ) -> Union[AnnData, List[AnnData]]:
+        """
+        Load h5ad file with backed mode for memory efficiency
+        Also handles ZIP files containing h5ad files
+
+        Args:
+            path: Path to h5ad or ZIP file, or URL
+            backed: Backing mode ('r' for read-only, recommended)
+
+        Returns:
+            AnnData object with backed mode enabled, or list of AnnData if ZIP
+
+        Raises:
+            ValueError: If file cannot be loaded
+        """
+        path_str = str(path)
+
+        # If it's a URL, download first
+        if path_str.startswith(("http://", "https://")):
+            downloaded = H5adLoader.download_h5ad(path_str)
+
+            # Check if we got multiple files from ZIP
+            if isinstance(downloaded, list):
+                # Load all extracted h5ad files
+                adata_list = []
+                for h5ad_path in downloaded:
+                    adata = anndata.read_h5ad(h5ad_path, backed=backed)
+                    adata_list.append(adata)
+                return adata_list
+
+            path_str = downloaded
+
+        # Check if local file is a ZIP
+        if os.path.exists(path_str) and H5adLoader.is_zip_file(path_str):
+            extracted_files = H5adLoader.extract_h5ad_from_zip(path_str)
+
+            if len(extracted_files) == 1:
+                # Single h5ad file in ZIP
+                path_str = extracted_files[0]
+            else:
+                # Multiple h5ad files in ZIP
+                adata_list = []
+                for h5ad_path in extracted_files:
+                    adata = anndata.read_h5ad(h5ad_path, backed=backed)
+                    adata_list.append(adata)
+                return adata_list
+
+        # Validate file exists
+        if not os.path.exists(path_str):
+            raise ValueError(f"File not found: {path_str}")
+
+        # Validate file extension
+        if not path_str.endswith(".h5ad"):
+            raise ValueError("File must have .h5ad extension")
+
+        try:
+            # Load with backed mode for efficient memory usage
+            adata = anndata.read_h5ad(path_str, backed=backed)
+            return adata
+
+        except Exception as e:
+            raise ValueError(f"Failed to load h5ad file: {str(e)}")
+
+    @staticmethod
+    def load_from_source(source: Union[str, Path]) -> AnnData:
+        """
+        Convenience method to load h5ad from file path or URL
+
+        Args:
+            source: File path or URL to h5ad file
+
+        Returns:
+            AnnData object loaded with backed='r'
+        """
+        return H5adLoader.load_h5ad(source, backed="r")
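The two Google Drive patterns handled by `convert_google_drive_url` above reduce to a pair of regexes. A self-contained sketch (the function name here is illustrative, mirroring the diff):

```python
import re

def gdrive_to_direct(url: str) -> str:
    """Sketch of the URL rewriting done in H5adLoader.convert_google_drive_url."""
    # Pattern 1: .../file/d/{FILE_ID}/view...
    m = re.search(r"/file/d/([a-zA-Z0-9_-]+)", url)
    if not m:
        # Pattern 2: ...open?id={FILE_ID}
        m = re.search(r"[?&]id=([a-zA-Z0-9_-]+)", url)
    if not m:
        raise ValueError("Cannot parse Google Drive URL")
    return f"https://drive.google.com/uc?export=download&id={m.group(1)}"

print(gdrive_to_direct("https://drive.google.com/file/d/ABC-123_x/view?usp=sharing"))
# https://drive.google.com/uc?export=download&id=ABC-123_x
```

Note the file-ID character class `[a-zA-Z0-9_-]` also accepts hyphens and underscores, which real Drive IDs contain.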
utils/plot.py ADDED
@@ -0,0 +1,504 @@
+import numpy as np
+import plotly.graph_objects as go
+import plotly.express as px
+from typing import Optional, Tuple, Dict, Any, Union
+from PIL import Image
+import base64
+from io import BytesIO
+
+
+class SpatialImageExtractor:
+    """Extract spatial background images from AnnData objects"""
+
+    @staticmethod
+    def get_spatial_image(
+        adata,
+        library_id: Optional[str] = None,
+        prefer_lowres: bool = True,
+    ) -> Optional[Tuple[np.ndarray, Dict[str, Any], str]]:
+        """
+        Extract spatial background image from AnnData object
+
+        Spatial images are typically stored in:
+        - adata.uns['spatial'][library_id]['images']['hires'] or 'lowres'
+        - adata.uns['spatial'][library_id]['scalefactors']
+
+        Args:
+            adata: AnnData object
+            library_id: Library/sample ID. If None, uses first available.
+            prefer_lowres: If True, prefer lowres image for faster rendering
+
+        Returns:
+            Tuple of (image_array, scalefactors, image_key) or None if not found
+        """
+        try:
+            # Check if spatial data exists
+            if 'spatial' not in adata.uns:
+                return None
+
+            spatial_data = adata.uns['spatial']
+
+            # Get library_id
+            if library_id is None:
+                # Use first available library
+                if isinstance(spatial_data, dict) and len(spatial_data) > 0:
+                    library_id = list(spatial_data.keys())[0]
+                else:
+                    return None
+
+            if library_id not in spatial_data:
+                return None
+
+            library_data = spatial_data[library_id]
+
+            # Get images
+            if 'images' not in library_data:
+                return None
+
+            images = library_data['images']
+
+            # Select image based on preference (lowres is faster)
+            image_key = None
+            if prefer_lowres and 'lowres' in images:
+                img_array = images['lowres']
+                image_key = 'lowres'
+            elif 'hires' in images:
+                img_array = images['hires']
+                image_key = 'hires'
+            elif 'lowres' in images:
+                img_array = images['lowres']
+                image_key = 'lowres'
+            else:
+                return None
+
+            # Get scalefactors
+            scalefactors = library_data.get('scalefactors', {})
+
+            return img_array, scalefactors, image_key
+
+        except Exception as e:
+            print(f"Warning: Could not extract spatial image: {e}")
+            return None
+
+    @staticmethod
+    def get_available_libraries(adata) -> list:
+        """Get list of available library IDs with spatial images"""
+        try:
+            if 'spatial' not in adata.uns:
+                return []
+            return list(adata.uns['spatial'].keys())
+        except Exception:
+            return []
+
+    @staticmethod
+    def has_spatial_image(adata) -> bool:
+        """Check if AnnData has spatial background image"""
+        try:
+            if 'spatial' not in adata.uns:
+                return False
+            spatial_data = adata.uns['spatial']
+            if not isinstance(spatial_data, dict) or len(spatial_data) == 0:
+                return False
+            # Check first library
+            first_lib = list(spatial_data.keys())[0]
+            lib_data = spatial_data[first_lib]
+            if 'images' not in lib_data:
+                return False
+            images = lib_data['images']
+            return 'hires' in images or 'lowres' in images
+        except Exception:
+            return False
+
+
+class SpatialPlotter:
+    """Create spatial visualizations for gene expression"""
+
+    @staticmethod
+    def plot_spatial_gene(
+        spatial_coords: np.ndarray,
+        expression: np.ndarray,
+        gene_name: str,
+        point_size: int = 5,
+        use_log: bool = False,
+        colorscale: str = "Viridis",
+        width: int = 800,
+        height: int = 800,
+        background_image: Optional[Union[np.ndarray, str]] = None,
+        scalefactors: Optional[Dict[str, float]] = None,
+        background_opacity: float = 0.5,
+    ) -> go.Figure:
+        """
+        Create spatial scatter plot of gene expression
+
+        Args:
+            spatial_coords: Nx2 array of spatial coordinates
+            expression: N-length array of gene expression values
+            gene_name: Name of the gene
+            point_size: Size of scatter points
+            use_log: Whether to apply log1p transformation to expression
+            colorscale: Plotly colorscale name
+            width: Figure width in pixels
+            height: Figure height in pixels
+            background_image: Background image as numpy array or file path
+            scalefactors: Scale factors from h5ad for coordinate mapping
+            background_opacity: Opacity of background image (0.0-1.0)
+
+        Returns:
+            Plotly Figure object
+        """
+        # Prepare expression values
+        expr_values = expression.copy()
+
+        # Apply log transformation if requested
+        if use_log:
+            expr_values = np.log1p(expr_values)
+            expr_label = f"log1p({gene_name})"
+        else:
+            expr_label = gene_name
+
+        # Extract coordinates
+        x = spatial_coords[:, 0]
+        y = spatial_coords[:, 1]
+
+        # Create figure
+        fig = go.Figure()
+
+        # Add background image if provided
+        if background_image is not None:
+            try:
+                # Handle different input types
+                if isinstance(background_image, str):
+                    # File path
+                    img = Image.open(background_image)
+                    img_array = np.array(img)
+                elif isinstance(background_image, np.ndarray):
+                    img_array = background_image
+                else:
+                    img_array = None
+
+                if img_array is not None:
+                    # Convert numpy array to PIL Image for Plotly
+                    if img_array.dtype == np.float64 or img_array.dtype == np.float32:
+                        # Normalize float images to 0-255
+                        img_array = (img_array * 255).astype(np.uint8)
+
+                    img = Image.fromarray(img_array)
+
+                    # Calculate image bounds in spatial coordinate system.
+                    # The spatial coordinates in adata.obsm['spatial'] are in full-resolution
+                    # pixel space; the stored image is scaled down by scalefactors.
+                    img_height, img_width = img_array.shape[:2]
+
+                    # Determine the scale factor used for this image
+                    if scalefactors:
+                        # Get scale factor based on image_key (passed via scalefactors dict)
+                        image_key = scalefactors.get('_image_key', 'hires')
+                        if image_key == 'lowres':
+                            scale = scalefactors.get('tissue_lowres_scalef', 1.0)
+                        else:
+                            scale = scalefactors.get('tissue_hires_scalef', 1.0)
+
+                        # Image spans from (0,0) to (img_width/scale, img_height/scale) in spatial coords
+                        img_x_min = 0
+                        img_y_min = 0
+                        img_x_max = img_width / scale
+                        img_y_max = img_height / scale
+                    else:
+                        # No scalefactors: fit image to coordinate bounds with padding
+                        padding = 0.05  # 5% padding
+                        x_range = x.max() - x.min()
+                        y_range = y.max() - y.min()
+                        img_x_min = x.min() - x_range * padding
+                        img_y_min = y.min() - y_range * padding
+                        img_x_max = x.max() + x_range * padding
+                        img_y_max = y.max() + y_range * padding
+
+                    # Convert to base64 for Plotly (use JPEG for faster encoding)
+                    buffered = BytesIO()
+                    # Convert RGBA to RGB if needed for JPEG
+                    if img.mode == 'RGBA':
+                        img_rgb = Image.new('RGB', img.size, (255, 255, 255))
+                        img_rgb.paste(img, mask=img.split()[3])
+                        img = img_rgb
+                    img.save(buffered, format="JPEG", quality=85)
+                    img_base64 = base64.b64encode(buffered.getvalue()).decode()
+                    img_src = f"data:image/jpeg;base64,{img_base64}"
+
+                    # With Y-axis reversed (autorange="reversed"), smaller Y is at top.
+                    # Image anchor point is top-left, so y should be img_y_min (top of image).
+                    fig.add_layout_image(
+                        dict(
+                            source=img_src,
+                            xref="x",
+                            yref="y",
+                            x=img_x_min,
+                            y=img_y_min,  # Top of image (smallest Y value)
+                            sizex=img_x_max - img_x_min,
+                            sizey=img_y_max - img_y_min,
+                            sizing="stretch",
+                            opacity=background_opacity,
+                            layer="below",
+                            yanchor="top",
+                        )
+                    )
+            except Exception as e:
+                print(f"Warning: Could not load background image: {e}")
+
+        # Add scatter plot
+        fig.add_trace(
+            go.Scatter(
+                x=x,
+                y=y,
+                mode="markers",
+                marker=dict(
+                    size=point_size,
+                    color=expr_values,
+                    colorscale=colorscale,
+                    showscale=True,
+                    colorbar=dict(title=expr_label),
+                    line=dict(width=0),
+                ),
+                text=[f"Expression: {val:.2f}" for val in expr_values],
+                hovertemplate="<b>%{text}</b><br>"
+                + "X: %{x:.1f}<br>"
+                + "Y: %{y:.1f}<br>"
+                + "<extra></extra>",
+            )
+        )
+
+        # Update layout
+        fig.update_layout(
+            title=dict(
+                text=f"Spatial Expression: {gene_name}",
+                x=0.5,
+                xanchor="center",
+                font=dict(size=18),
+            ),
+            xaxis=dict(
+                title="Spatial X",
+                showgrid=False,
+                zeroline=False,
+            ),
+            yaxis=dict(
+                title="Spatial Y",
+                showgrid=False,
+                zeroline=False,
+                scaleanchor="x",
+                scaleratio=1,
+                autorange="reversed",  # Flip Y-axis to match image coordinate system
+            ),
+            width=width,
+            height=height,
+            hovermode="closest",
+            plot_bgcolor="white",
+        )
+
+        return fig
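The image-bounds arithmetic above (a stored image of width W at scale factor s spans 0..W/s in full-resolution spot coordinates) in one worked example. The numbers here are illustrative, not from the dataset:

```python
# Worked example of the scalefactor mapping used above (hypothetical numbers).
# A 2000x600 lowres image stored with tissue_lowres_scalef = 0.05 covers
# full-resolution spot coordinates 0 .. width/scale and 0 .. height/scale.
img_width, img_height = 2000, 600
scale = 0.05  # tissue_lowres_scalef (illustrative)

img_x_max = img_width / scale   # 40000.0 full-res pixels wide
img_y_max = img_height / scale  # 12000.0 full-res pixels tall
```

Dividing by the scale factor stretches the downsampled image back onto the coordinate system of `adata.obsm['spatial']`, which is why the layout image and the scatter points line up without rescaling the points themselves.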
+
+    @staticmethod
+    def create_overview_plot(
+        spatial_coords: np.ndarray,
+        width: int = 600,
+        height: int = 600,
+    ) -> go.Figure:
+        """
+        Create overview plot of spatial coordinates (without gene expression)
+
+        Args:
+            spatial_coords: Nx2 array of spatial coordinates
+            width: Figure width in pixels
+            height: Figure height in pixels
+
+        Returns:
+            Plotly Figure object
+        """
+        x = spatial_coords[:, 0]
+        y = spatial_coords[:, 1]
+
+        fig = go.Figure()
+
+        fig.add_trace(
+            go.Scatter(
+                x=x,
+                y=y,
+                mode="markers",
+                marker=dict(
+                    size=3,
+                    color="lightblue",
+                    line=dict(width=0),
+                ),
+                hovertemplate="X: %{x:.1f}<br>Y: %{y:.1f}<extra></extra>",
+            )
+        )
+
+        fig.update_layout(
+            title=dict(
+                text="Spatial Overview",
+                x=0.5,
+                xanchor="center",
+            ),
+            xaxis=dict(
+                title="Spatial X",
+                showgrid=False,
+                zeroline=False,
+            ),
+            yaxis=dict(
+                title="Spatial Y",
+                showgrid=False,
+                zeroline=False,
+                scaleanchor="x",
+                scaleratio=1,
+            ),
+            width=width,
+            height=height,
+            plot_bgcolor="white",
+        )
+
+        return fig
+
+    @staticmethod
+    def create_overview_plot_with_background(
+        spatial_coords: np.ndarray,
+        background_image: Optional[np.ndarray] = None,
+        scalefactors: Optional[Dict[str, Any]] = None,
+        width: int = 600,
+        height: int = 600,
+        background_opacity: float = 0.6,
+    ) -> go.Figure:
+        """
+        Create overview plot of spatial coordinates with optional tissue background
+
+        Args:
+            spatial_coords: Nx2 array of spatial coordinates
+            background_image: Optional background image as numpy array
+            scalefactors: Scale factors for coordinate mapping
+            width: Figure width in pixels
+            height: Figure height in pixels
+            background_opacity: Opacity of background image
+
+        Returns:
+            Plotly Figure object
+        """
+        x = spatial_coords[:, 0]
+        y = spatial_coords[:, 1]
+
+        fig = go.Figure()
+
+        # Add background image if provided
+        if background_image is not None:
+            try:
+                img_array = background_image
+                if img_array.dtype == np.float64 or img_array.dtype == np.float32:
+                    img_array = (img_array * 255).astype(np.uint8)
+
+                img = Image.fromarray(img_array)
+                img_height, img_width = img_array.shape[:2]
+
+                # Calculate image bounds
+                if scalefactors:
+                    image_key = scalefactors.get('_image_key', 'hires')
+                    if image_key == 'lowres':
+                        scale = scalefactors.get('tissue_lowres_scalef', 1.0)
+                    else:
+                        scale = scalefactors.get('tissue_hires_scalef', 1.0)
+                    img_x_min = 0
+                    img_y_min = 0
+                    img_x_max = img_width / scale
+                    img_y_max = img_height / scale
+                else:
+                    padding = 0.05
+                    x_range = x.max() - x.min()
+                    y_range = y.max() - y.min()
+                    img_x_min = x.min() - x_range * padding
+                    img_y_min = y.min() - y_range * padding
+                    img_x_max = x.max() + x_range * padding
+                    img_y_max = y.max() + y_range * padding
+
+                # Convert to base64
+                buffered = BytesIO()
+                if img.mode == 'RGBA':
+                    img_rgb = Image.new('RGB', img.size, (255, 255, 255))
+                    img_rgb.paste(img, mask=img.split()[3])
+                    img = img_rgb
+                img.save(buffered, format="JPEG", quality=85)
+                img_base64 = base64.b64encode(buffered.getvalue()).decode()
+                img_src = f"data:image/jpeg;base64,{img_base64}"
+
+                fig.add_layout_image(
+                    dict(
+                        source=img_src,
+                        xref="x",
+                        yref="y",
+                        x=img_x_min,
+                        y=img_y_min,
+                        sizex=img_x_max - img_x_min,
+                        sizey=img_y_max - img_y_min,
+                        sizing="stretch",
+                        opacity=background_opacity,
+                        layer="below",
+                        yanchor="top",
+                    )
+                )
+            except Exception as e:
+                print(f"Warning: Could not add background image: {e}")
+
+        fig.add_trace(
+            go.Scatter(
+                x=x,
+                y=y,
+                mode="markers",
+                marker=dict(
+                    size=3,
+                    color="rgba(65, 105, 225, 0.7)",  # Royal blue with transparency
+                    line=dict(width=0),
+                ),
+                hovertemplate="X: %{x:.1f}<br>Y: %{y:.1f}<extra></extra>",
+            )
+        )
+
+        fig.update_layout(
+            title=dict(
+                text="Spatial Overview",
+                x=0.5,
+                xanchor="center",
+            ),
+            xaxis=dict(
+                title="Spatial X",
+                showgrid=False,
+                zeroline=False,
+            ),
+            yaxis=dict(
+                title="Spatial Y",
+                showgrid=False,
+                zeroline=False,
+                scaleanchor="x",
+                scaleratio=1,
+                autorange="reversed",  # Match image coordinate system
+            ),
+            width=width,
+            height=height,
+            plot_bgcolor="white",
+        )
+
+        return fig
+
+    @staticmethod
+    def get_expression_stats(expression: np.ndarray) -> dict:
487
+ """
488
+ Calculate basic statistics for expression values
489
+
490
+ Args:
491
+ expression: Expression array
492
+
493
+ Returns:
494
+ Dictionary with statistics
495
+ """
496
+ return {
497
+ "min": float(np.min(expression)),
498
+ "max": float(np.max(expression)),
499
+ "mean": float(np.mean(expression)),
500
+ "median": float(np.median(expression)),
501
+ "std": float(np.std(expression)),
502
+ "non_zero_count": int(np.sum(expression > 0)),
503
+ "non_zero_percent": float(100 * np.sum(expression > 0) / len(expression)),
504
+ }
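The summary returned by `get_expression_stats` is easy to sanity-check in isolation; the sketch below reproduces the same dict on a toy expression vector (the standalone function name `expression_stats` is illustrative, not part of the module):

```python
import numpy as np

def expression_stats(expression: np.ndarray) -> dict:
    # Same summary as get_expression_stats above, reproduced standalone
    return {
        "min": float(np.min(expression)),
        "max": float(np.max(expression)),
        "mean": float(np.mean(expression)),
        "median": float(np.median(expression)),
        "std": float(np.std(expression)),
        "non_zero_count": int(np.sum(expression > 0)),
        "non_zero_percent": float(100 * np.sum(expression > 0) / len(expression)),
    }

# Toy vector: 4 spots, 2 of which express the gene
stats = expression_stats(np.array([0.0, 2.0, 0.0, 4.0]))
print(stats["mean"], stats["non_zero_count"], stats["non_zero_percent"])
```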
utils/validator.py ADDED
@@ -0,0 +1,131 @@
+from typing import Tuple, List
+import numpy as np
+from anndata import AnnData
+
+
+class AnnDataValidator:
+    """Validate AnnData objects for spatial visualization requirements"""
+
+    MAX_OBS = 500_000  # Max number of observations (cells/spots)
+    MAX_VARS = 50_000  # Max number of variables (genes)
+
+    @staticmethod
+    def validate(adata: AnnData) -> Tuple[bool, List[str]]:
+        """
+        Validate AnnData object for spatial visualization
+
+        Args:
+            adata: AnnData object to validate
+
+        Returns:
+            Tuple of (is_valid, error_messages)
+        """
+        errors = []
+
+        # Check spatial coordinates exist
+        if "spatial" not in adata.obsm:
+            errors.append(
+                "Missing spatial coordinates. adata.obsm['spatial'] is required."
+            )
+
+        # Validate spatial coordinates format
+        if "spatial" in adata.obsm:
+            spatial = adata.obsm["spatial"]
+            if spatial.shape[1] != 2:
+                errors.append(
+                    f"Spatial coordinates must be 2D (x, y). Got shape: {spatial.shape}"
+                )
+
+        # Check number of observations
+        if adata.n_obs > AnnDataValidator.MAX_OBS:
+            errors.append(
+                f"Too many observations: {adata.n_obs:,} (max: {AnnDataValidator.MAX_OBS:,})"
+            )
+
+        # Check number of variables
+        if adata.n_vars > AnnDataValidator.MAX_VARS:
+            errors.append(
+                f"Too many variables: {adata.n_vars:,} (max: {AnnDataValidator.MAX_VARS:,})"
+            )
+
+        # Check if data is accessible
+        try:
+            _ = adata.var_names
+        except Exception as e:
+            errors.append(f"Cannot access variable names: {str(e)}")
+
+        return (len(errors) == 0, errors)
+
+    @staticmethod
+    def validate_gene(adata: AnnData, gene_name: str) -> Tuple[bool, str]:
+        """
+        Validate if a gene exists in the dataset
+
+        Args:
+            adata: AnnData object
+            gene_name: Gene name to check
+
+        Returns:
+            Tuple of (exists, message)
+        """
+        if gene_name not in adata.var_names:
+            # Try to find similar gene names
+            var_names = list(adata.var_names)
+            similar = [g for g in var_names if gene_name.lower() in g.lower()][:5]
+
+            if similar:
+                return (
+                    False,
+                    f"Gene '{gene_name}' not found. Similar genes: {', '.join(similar)}",
+                )
+            else:
+                return (False, f"Gene '{gene_name}' not found in dataset.")
+
+        return (True, f"Gene '{gene_name}' found.")
+
+    @staticmethod
+    def get_gene_expression(adata: AnnData, gene_name: str) -> np.ndarray:
+        """
+        Extract gene expression for a specific gene
+
+        Args:
+            adata: AnnData object
+            gene_name: Gene name to extract
+
+        Returns:
+            Expression vector as numpy array
+
+        Raises:
+            ValueError: If gene not found
+        """
+        is_valid, message = AnnDataValidator.validate_gene(adata, gene_name)
+        if not is_valid:
+            raise ValueError(message)
+
+        # Extract gene expression (works with backed mode)
+        gene_data = adata[:, gene_name].X
+
+        # Convert to dense array if sparse
+        if hasattr(gene_data, "toarray"):
+            gene_data = gene_data.toarray()
+
+        # Flatten if needed
+        if gene_data.ndim > 1:
+            gene_data = gene_data.flatten()
+
+        return gene_data
+
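The dense/flatten handling in `get_gene_expression` can be verified without an AnnData file; this numpy-only sketch (the name `to_flat_dense` is illustrative) applies the same two steps to the `(n_obs, 1)` column that `adata[:, gene].X` typically yields:

```python
import numpy as np

def to_flat_dense(gene_data):
    # Mirror of the extraction steps above: densify if sparse, then flatten to 1-D
    if hasattr(gene_data, "toarray"):  # scipy sparse matrices expose .toarray()
        gene_data = gene_data.toarray()
    if gene_data.ndim > 1:
        gene_data = gene_data.flatten()
    return gene_data

col = np.array([[0.0], [3.5], [1.2]])  # an (n_obs, 1) expression column
flat = to_flat_dense(col)
print(flat.shape)  # (3,)
```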
+    @staticmethod
+    def get_gene_list(adata: AnnData, limit: int = 1000) -> List[str]:
+        """
+        Get list of available genes (limited for performance)
+
+        Args:
+            adata: AnnData object
+            limit: Maximum number of genes to return
+
+        Returns:
+            List of gene names
+        """
+        var_names = list(adata.var_names)
+        return var_names[:limit]
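A quick way to exercise the gene-lookup logic without loading an `.h5ad` file is to replay `validate_gene`'s matching against a plain list of names (a sketch; `suggest_gene` is an illustrative standalone name):

```python
from typing import List, Tuple

def suggest_gene(var_names: List[str], gene_name: str) -> Tuple[bool, str]:
    # Same matching as AnnDataValidator.validate_gene: exact hit first,
    # otherwise up to 5 case-insensitive substring suggestions
    if gene_name not in var_names:
        similar = [g for g in var_names if gene_name.lower() in g.lower()][:5]
        if similar:
            return (False, f"Gene '{gene_name}' not found. Similar genes: {', '.join(similar)}")
        return (False, f"Gene '{gene_name}' not found in dataset.")
    return (True, f"Gene '{gene_name}' found.")

genes = ["Gfap", "Actb", "Sox2", "Pecam1"]
print(suggest_gene(genes, "gfap"))  # case mismatch -> suggestion, not a hit
print(suggest_gene(genes, "Actb"))  # exact hit
```

Note that a lowercase query like `"gfap"` is not an exact hit, so the helper returns a suggestion rather than a match; callers are expected to retry with the suggested spelling.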