Commit 65da5d3 · sachin1801 committed · 1 parent: fc65a00

feat(webapp): working local server checkpoint


Core functionality now working:
- Model loading fixed (TensorFlow 2.15 + quad_model imports)
- PSI prediction pipeline operational
- Force plot visualization working
- CSV/JSON/TSV export with proper file downloads
- SQLite database with health check
- ViennaRNA RNA structure prediction

Key changes:
- predictor.py: Simplified model loading using quad_model decorators
- routes.py: Fixed export with Content-Disposition headers
- routes.py: Fixed SQLAlchemy 2.0 text() for health check
- requirements.txt: Pinned TensorFlow 2.15 for Keras 2 compatibility

Added:
- webapp/TODO.md: Comprehensive remaining work documentation
- test_model.py: Simple model test script
- Skills docs for TensorFlow/Keras model loading

Note: UI is basic (inline HTML), needs improvement. See TODO.md.

.claude/skills/agent-log.md CHANGED
@@ -139,6 +139,111 @@ See: `/Users/sachin/.claude/plans/tingly-sauteeing-bengio.md`
 
 ---
 
+## Session 2 - 2026-01-12
+
+### Session Start
+- **Task**: Run and test the pre-trained splicing model locally
+- **Status**: COMPLETE
+
+### Problem
+User could not load the pre-trained model (`custom_adjacency_regularizer_20210731_124_step3.h5`) with their existing Python 3.12 + TensorFlow 2.20 setup.
+
+### Errors Encountered
+1. `ValueError: Unknown layer: 'SlicingOpLambda'`
+2. `ValueError: Unknown layer: 'Custom>RegularizedBiasLayer'`
+3. `IndexError: list index out of range` in Keras functional.py
+
+### Investigation
+
+#### Key Information from User
+User provided context from the original model creator:
+- Model location: `output/custom_adjacency_regularizer_20210731_124_step3.h5`
+- Reference notebook: `figures/generate_csv_for_supplementary.ipynb`
+- Additional notebooks: `2022_03_11_figures/` folder (visualization notebooks)
+
+#### What We Discovered
+1. **From `figures/generate_csv_for_supplementary.ipynb`**:
+   - Simple loading approach: `from quad_model import *` then `load_model()`
+   - No manual custom_objects needed
+
+2. **From `2022_03_11_figures/position_specific_activations.ipynb`**:
+   - Notebook was run April 2022 with TensorFlow ~2.8
+   - Model loads with simple `tf.keras.models.load_model()`
+
+3. **From `figures/quad_model.py`**:
+   - All custom layers use the `@tf.keras.utils.register_keras_serializable()` decorator
+   - This auto-registers layers when the module is imported
+
+4. **Root Cause**:
+   - TensorFlow 2.16+ uses Keras 3 (breaking changes)
+   - Keras 3 cannot load H5 models with Lambda layers from Keras 2
+   - The `tf_keras` compatibility layer is buggy for complex models
+
+### Solution Implemented
+
+1. **Installed Python 3.10 via pyenv**:
+   ```bash
+   pyenv install 3.10.13
+   ```
+
+2. **Created new virtual environment**:
+   ```bash
+   ~/.pyenv/versions/3.10.13/bin/python -m venv venv310
+   source venv310/bin/activate
+   ```
+
+3. **Installed TensorFlow 2.15** (last version with native Keras 2):
+   ```bash
+   pip install tensorflow==2.15.0 numpy pandas joblib scikit-learn matplotlib seaborn tqdm scipy
+   ```
+
+4. **Updated `test_model.py`** to use the simple loading approach:
+   ```python
+   import sys
+   sys.path.insert(0, 'figures')
+   from quad_model import *  # Auto-registers custom layers
+   from tensorflow.keras.models import load_model
+
+   model = load_model('output/...h5')
+   ```
+
+5. **Updated `requirements.txt`**:
+   - Changed `tensorflow>=2.15.0` to `tensorflow==2.15.0`
+   - Added setup instructions for Python 3.10
+   - Removed `tf_keras` (not needed)
+
+### Results
+```
+Model loaded successfully!
+Number of test samples: 47962
+MSE: 0.032396
+R2 Score: 0.8224
+Correlation: 0.9069
+```
+
+### Files Modified
+- `test_model.py` - Simplified to use the quad_model.py approach
+- `requirements.txt` - Pinned TensorFlow 2.15, added setup instructions
+
+### Files Created
+- `venv310/` - New Python 3.10 virtual environment
+- `.claude/skills/tensorflow-keras-model-loading.md` - Skill documentation
+
+### Key Learnings
+1. **TF 2.16+ breaks old H5 models** - Must use TF 2.15 or earlier for Keras 2 models
+2. **Python 3.12 requires TF 2.16+** - So Python must be downgraded to 3.10/3.11
+3. **Check the original notebooks first** - They show the working approach
+4. **`@register_keras_serializable()` is key** - Import the module to register the layers
+5. **`tf_keras` is unreliable** - For complex models, use native TF 2.15 instead
+
+### Environment Summary
+| Environment | Python | TensorFlow | Status |
+|--------------|--------|------------|--------|
+| `venv` (old) | 3.12 | 2.20 | BROKEN - can delete |
+| `venv310` | 3.10.13 | 2.15.0 | WORKING |
+
+---
+
 ## Future Sessions
 
 _Sessions will be logged here as work progresses._
.claude/skills/tensorflow-keras-model-loading.md ADDED
@@ -0,0 +1,109 @@
+# Skill: Loading Legacy TensorFlow/Keras Models
+
+## Problem Encountered
+When trying to load a pre-trained H5 model created in 2021 with TensorFlow 2.5, we encountered multiple errors with TensorFlow 2.20 (Python 3.12):
+
+1. `ValueError: Unknown layer: 'SlicingOpLambda'`
+2. `ValueError: Unknown layer: 'Custom>RegularizedBiasLayer'`
+3. `IndexError: list index out of range` in `process_node`
+
+## Root Cause
+- **TensorFlow 2.16+ uses Keras 3**, which has breaking changes for loading old H5 models
+- Models with Lambda layers and custom layers saved with Keras 2 cannot be loaded with Keras 3
+- The `tf_keras` compatibility layer does NOT fully work for complex models with Lambda layers
+
+## Solution
+
+### 1. Use Python 3.10 + TensorFlow 2.15
+TensorFlow 2.15 is the **last version with native Keras 2 support**:
+
+```bash
+# Install Python 3.10 via pyenv
+pyenv install 3.10.13
+
+# Create virtual environment
+~/.pyenv/versions/3.10.13/bin/python -m venv venv310
+source venv310/bin/activate
+
+# Install TensorFlow 2.15 (NOT 2.16+)
+pip install tensorflow==2.15.0
+```
+
+### 2. Use the `@register_keras_serializable()` Pattern
+The original codebase uses decorators to auto-register custom layers:
+
+```python
+@tf.keras.utils.register_keras_serializable()
+class MyCustomLayer(Layer):
+    ...
+```
+
+When you import the module containing these decorators, the layers are automatically registered:
+
+```python
+# This auto-registers all custom layers
+from quad_model import *
+
+# Then you can load the model directly
+model = load_model('model.h5')
+```
+
+### 3. Don't Pass Custom Objects Manually (Usually)
+If the original code uses `@register_keras_serializable()`, you typically don't need to pass `custom_objects` to `load_model()`. The decorators handle registration.
+
+## TensorFlow/Keras Version Compatibility Matrix
+
+| Python | TensorFlow | Keras | Can Load Old H5? |
+|--------|------------|-------|------------------|
+| 3.12 | 2.16-2.20 | 3.x | NO - Lambda layer bugs |
+| 3.11 | 2.15 | 2.15 | YES |
+| 3.10 | 2.10-2.15 | 2.x | YES |
+| 3.10 | 2.8-2.9 | 2.x | YES |
+
+## Key Lessons
+
+### DO:
+- Check when the model was created and what TensorFlow version was used
+- Look for existing notebooks that successfully load the model
+- Match the Python + TensorFlow version to the model's creation era
+- Use `@register_keras_serializable()` for custom layers
+- Pin the TensorFlow version in requirements.txt (`tensorflow==2.15.0`)
+
+### DON'T:
+- Assume the latest TensorFlow will load old models
+- Use `tf_keras` for complex models with Lambda layers (it's buggy)
+- Manually pass all custom objects if decorators exist
+- Use Python 3.12 with TensorFlow < 2.16 (incompatible)
+
+## Quick Diagnosis
+
+If you see these errors, it's likely a Keras 2 vs 3 compatibility issue:
+- `Unknown layer: 'SlicingOpLambda'`
+- `Unknown layer: 'Custom>...'`
+- `IndexError: list index out of range` in functional.py
+- Errors mentioning `_inbound_nodes`
+
+## Files to Check in Legacy Projects
+
+1. Look for `quad_model.py` or similar files with custom layer definitions
+2. Check if layers use `@tf.keras.utils.register_keras_serializable()`
+3. Find notebooks that successfully load the model (check their imports)
+4. Check the model creation date from the filename (e.g., `_20210731_` = July 2021)
+
+## Working Example
+
+```python
+"""Load legacy Keras model (created with TF 2.5-2.10)"""
+import sys
+sys.path.insert(0, 'figures')  # or wherever quad_model.py lives
+
+# Import registers all custom layers via decorators
+from quad_model import *
+from tensorflow.keras.models import load_model
+
+# Now load works without custom_objects
+model = load_model('output/model.h5')
+
+# Make predictions
+predictions = model.predict(data)
+```
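The compatibility matrix above can be encoded as a tiny pre-flight check before attempting a legacy H5 load. This is an illustrative helper, not part of the repo — `can_load_legacy_h5` is a hypothetical name, and it only mirrors the "TF 2.15 and earlier ship Keras 2" rule stated above:

```python
def can_load_legacy_h5(tf_version: str) -> bool:
    """Return True if this TensorFlow version ships Keras 2 and can
    load legacy H5 models (per the matrix: TF 2.x up to 2.15)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return major == 2 and minor <= 15
```

Run it against `tf.__version__` at startup to fail fast with a clear message instead of an opaque `Unknown layer` error.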
requirements.txt CHANGED
@@ -1,14 +1,14 @@
 # Interpretable Splicing Model - Dependencies
-# Python 3.12+ compatible versions
+# Requires Python 3.10 (TensorFlow 2.15 with Keras 2)
 
 # Core ML dependencies
-tensorflow>=2.15.0
-tf_keras  # Keras 2.x compatibility layer (required for loading pre-trained model)
-numpy>=1.26.0
+tensorflow==2.15.0  # Must use 2.15 (last version with Keras 2) for model compatibility
+numpy>=1.26.0,<2.0
 pandas>=2.1.0
 joblib>=1.3.0
 scikit-learn>=1.4.0
 tqdm
+scipy
 
 # Visualization (for figures and notebooks)
 matplotlib>=3.8.0
@@ -26,3 +26,9 @@ drawsvg
 # macOS: brew tap brewsci/bio && brew install brewsci/bio/viennarna
 # Ubuntu: sudo apt install vienna-rna
 # Verify: RNAfold --version
+
+# Setup instructions:
+# 1. Install Python 3.10 via pyenv: pyenv install 3.10.13
+# 2. Create venv: ~/.pyenv/versions/3.10.13/bin/python -m venv venv310
+# 3. Activate: source venv310/bin/activate
+# 4. Install: pip install -r requirements.txt
test_model.py ADDED
@@ -0,0 +1,57 @@
+"""Simple script to test the pre-trained splicing model.
+
+This script uses the approach from the original notebooks:
+- figures/generate_csv_for_supplementary.ipynb
+- 2022_03_11_figures/position_specific_activations.ipynb
+
+Requires: Python 3.10 + TensorFlow 2.15 (see README for setup)
+"""
+
+import sys
+
+# Add figures directory to path so we can import quad_model
+sys.path.insert(0, 'figures')
+
+# Import from quad_model - this auto-registers all custom layers
+# via @tf.keras.utils.register_keras_serializable() decorators
+from quad_model import *
+from tensorflow.keras.models import load_model
+from joblib import load as jload
+import numpy as np
+
+print("Loading model...")
+model = load_model('output/custom_adjacency_regularizer_20210731_124_step3.h5')
+print("Model loaded successfully!")
+
+print("\nLoading test data...")
+xTe = jload('data/xTe_ES7_HeLa_ABC.pkl.gz')
+yTe = jload('data/yTe_ES7_HeLa_ABC.pkl.gz')
+
+num_samples = len(xTe[0]) if isinstance(xTe, list) else len(xTe)
+print(f"Number of test samples: {num_samples}")
+
+print("\nRunning predictions...")
+predictions = model.predict(xTe, verbose=0)
+
+print("\nResults:")
+print(f"Predictions shape: {predictions.shape}")
+print("\nFirst 10 predictions vs actual PSI values:")
+print("-" * 50)
+print(f"{'Predicted PSI':<15} {'Actual PSI':<15} {'Diff':<10}")
+print("-" * 50)
+for i in range(min(10, len(predictions))):
+    pred = predictions[i, 0]
+    actual = yTe[i]
+    diff = pred - actual
+    print(f"{pred:<15.4f} {actual:<15.4f} {diff:<10.4f}")
+
+# Calculate overall metrics
+from sklearn.metrics import mean_squared_error, r2_score
+mse = mean_squared_error(yTe, predictions)
+r2 = r2_score(yTe, predictions)
+correlation = np.corrcoef(yTe.flatten(), predictions.flatten())[0, 1]
+
+print("\nOverall Metrics:")
+print(f"  MSE: {mse:.6f}")
+print(f"  R2 Score: {r2:.4f}")
+print(f"  Correlation: {correlation:.4f}")
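The three metrics printed by the script can be computed with NumPy alone, which is handy for sanity-checking the sklearn numbers. A minimal sketch — `summarize` is a hypothetical helper, using the standard MSE, R², and Pearson-correlation formulas that match the sklearn calls above:

```python
import numpy as np

def summarize(y_true, y_pred):
    """Return (mse, r2, correlation) using the same definitions as
    sklearn's mean_squared_error, r2_score, and np.corrcoef."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse = float(np.mean((y_true - y_pred) ** 2))
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    corr = float(np.corrcoef(y_true, y_pred)[0, 1])
    return mse, r2, corr
```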
webapp/TODO.md ADDED
@@ -0,0 +1,455 @@
+# Splicing Predictor Web Application - Remaining Work
+
+> **Current Status**: Core prediction functionality working. UI needs significant improvements.
+>
+> **Last Updated**: 2026-01-12
+
+---
+
+## Table of Contents
+
+1. [Completed Work](#completed-work)
+2. [UI/UX Improvements (HIGH PRIORITY)](#1-uiux-improvements-high-priority)
+3. [Missing Content](#2-missing-content)
+4. [Feature Gaps](#3-feature-gaps)
+5. [Technical Debt](#4-technical-debt)
+6. [Deployment](#5-deployment)
+7. [NAR Web Server Compliance](#6-nar-web-server-compliance)
+8. [Testing](#7-testing)
+
+---
+
+## Completed Work
+
+- [x] Model loading with TensorFlow 2.15 (Keras 2 compatibility)
+- [x] PSI prediction pipeline
+- [x] RNA secondary structure prediction (ViennaRNA integration)
+- [x] Force plot visualization (Plotly)
+- [x] Single sequence prediction API
+- [x] Batch prediction API
+- [x] CSV/JSON/TSV export with proper file downloads
+- [x] SQLite database for job storage
+- [x] Health check endpoint
+- [x] Example sequences endpoint
+- [x] Basic result page with force plot
+
+---
+
+## 1. UI/UX Improvements (HIGH PRIORITY)
+
+### Current Problems
+
+The current UI is a basic inline HTML fallback with no design system:
+
+- **No proper template system** - HTML is embedded in Python code (`webapp/app/main.py`)
+- **No CSS framework** - Using inline `<style>` tags
+- **No navigation** - Users can't easily move between pages
+- **No responsive design** - Doesn't work well on mobile
+- **No loading states** - No spinners or progress indicators
+- **No error message UI** - Errors show as basic alerts
+- **Inconsistent styling** - Each page styled separately
+
+### Required Improvements
+
+- [ ] **Move to Jinja2 templates** (`webapp/templates/`)
+  - [ ] `base.html` - Base template with navigation
+  - [ ] `index.html` - Home/prediction page
+  - [ ] `result.html` - Results display
+  - [ ] `about.html` - About the model
+  - [ ] `methodology.html` - Technical details
+  - [ ] `help.html` - User guide
+  - [ ] `batch.html` - Batch upload interface
+
+- [ ] **Add CSS framework** (Tailwind CSS recommended)
+  - [ ] Install Tailwind or use CDN
+  - [ ] Create consistent design system
+  - [ ] Add dark mode support (optional)
+
+- [ ] **Navigation header**
+  - [ ] Logo/branding
+  - [ ] Links: Home, About, Methodology, Help, API Docs
+  - [ ] Mobile hamburger menu
+
+- [ ] **Footer**
+  - [ ] Citation information
+  - [ ] Contact/feedback link
+  - [ ] Privacy policy link
+  - [ ] Funding acknowledgments
+
+- [ ] **Loading states**
+  - [ ] Spinner during prediction
+  - [ ] Progress bar for batch uploads
+  - [ ] Skeleton loaders for async content
+
+- [ ] **Error handling**
+  - [ ] Toast notifications for errors
+  - [ ] Inline validation messages
+  - [ ] Friendly error pages (404, 500)
+
+- [ ] **Responsive design**
+  - [ ] Mobile-friendly layout
+  - [ ] Touch-friendly buttons
+  - [ ] Readable text on all devices
+
+---
+
+## 2. Missing Content
+
+### About the Model
+
+The landing page has almost no information about what the model does. Need to add:
+
+- [ ] **What it predicts**
+  - PSI (Percent Spliced In) values
+  - Range: 0 (completely skipped) to 1 (completely included)
+  - Alternative splicing outcomes
+
+- [ ] **How it works (simplified)**
+  - Takes a 70nt exon sequence as input
+  - Adds flanking sequences
+  - Predicts RNA secondary structure
+  - Neural network predicts the splicing outcome
+
+- [ ] **Who should use it**
+  - Researchers studying RNA splicing
+  - Designing synthetic exons
+  - Understanding splicing regulation
+
+- [ ] **Limitations**
+  - Only works with 70nt exon sequences
+  - Trained on HeLa cell data (ES7 library)
+  - May not generalize to all cell types
+  - Does not consider cellular context
+
+### Model Architecture Page
+
+- [ ] **Input features**
+  - Sequence one-hot encoding (90×4)
+  - Structure one-hot encoding (90×3)
+  - Wobble pair indicators (90×1)
+
+- [ ] **Architecture diagram**
+  - Sequence branch: Conv1D (20 filters, width 6)
+  - Structure branch: Conv1D (8 filters, width 30)
+  - Position-specific biases
+  - Inclusion vs skipping energy computation
+  - Residual tuner MLP
+  - Sigmoid output
+
+- [ ] **Interpretability features**
+  - Position-specific bias visualization
+  - Separate inclusion/skipping branches
+  - Force plot explanation
+
+### Research Background
+
+- [ ] **Citation**
+  ```
+  Liao SE, Sudarshan M, and Regev O.
+  "Machine learning for discovery: deciphering RNA splicing logic."
+  bioRxiv (2022).
+  ```
+
+- [ ] **Link to paper** (bioRxiv)
+- [ ] **Link to GitHub** (original repo)
+- [ ] **Contact information** for authors
+
+### Training Data Information
+
+- [ ] **Dataset**: ES7_HeLa (A, B, C libraries)
+- [ ] **Size**: ~150,000 synthetic exons
+- [ ] **Cell type**: HeLa cells
+- [ ] **Experimental method**: MPRA (Massively Parallel Reporter Assay)
+
+### Performance Metrics
+
+- [ ] **Test R²**: ~0.85
+- [ ] **Test RMSE**: ~0.12
+- [ ] **Correlation**: ~0.92
+- [ ] **Binary KL Loss**: ~0.015-0.020
+
+---
+
+## 3. Feature Gaps
+
+### High Priority
+
+- [ ] **Batch file upload**
+  - [ ] Accept FASTA format
+  - [ ] Accept CSV format (one sequence per line)
+  - [ ] Validate all sequences before processing
+  - [ ] Show progress during batch processing
+  - [ ] Allow download of all results
+
+- [ ] **Improved force plot**
+  - [ ] Show sequence letters on x-axis
+  - [ ] Highlight key positions
+  - [ ] Add structure annotation
+  - [ ] Export as PNG/SVG
+
+- [ ] **Result sharing**
+  - [ ] Permalink to results (already have job IDs)
+  - [ ] Copy link button
+  - [ ] Social sharing (optional)
+
+### Medium Priority
+
+- [ ] **PDF export**
+  - [ ] Formatted report with all results
+  - [ ] Include force plot image
+  - [ ] Include input sequence
+  - [ ] Include methodology summary
+
+- [ ] **Sequence editor**
+  - [ ] Syntax highlighting for nucleotides
+  - [ ] Visual feedback for invalid characters
+  - [ ] Complement/reverse complement tools
+
+- [ ] **Multiple examples**
+  - [ ] Show all 3 examples in UI
+  - [ ] Explain what each demonstrates
+  - [ ] Allow users to modify and re-predict
+
+### Low Priority
+
+- [ ] **Email notifications**
+  - [ ] Send results when job completes
+  - [ ] Optional (don't require email)
+
+- [ ] **Job history**
+  - [ ] Show recent predictions
+  - [ ] Allow re-running previous jobs
+  - [ ] LocalStorage for client-side history
+
+- [ ] **API key management** (if needed for rate limiting)
+
+---
+
+## 4. Technical Debt
+
+### Code Quality
+
+- [ ] **Extract HTML to templates**
+  - Move all inline HTML from `main.py` to `templates/`
+  - Use Jinja2 template inheritance
+
+- [ ] **CSS refactoring**
+  - Move inline styles to `static/css/`
+  - Use CSS variables for theming
+  - Consider CSS framework
+
+- [ ] **JavaScript improvements**
+  - Move inline scripts to `static/js/`
+  - Use modern ES6+ syntax
+  - Consider Alpine.js or htmx for interactivity
+
+### API Improvements
+
+- [ ] **Rate limiting**
+  - Prevent abuse
+  - Per-IP limits
+  - Optional API keys for higher limits
+
+- [ ] **Request validation**
+  - Better error messages
+  - Sequence format validation
+  - Input sanitization
+
+- [ ] **Response caching**
+  - Cache identical predictions
+  - Reduce computation for repeated requests
+
+### Database
+
+- [ ] **Job cleanup**
+  - Scheduled task to delete old jobs
+  - Configurable retention period
+
+- [ ] **Indexes**
+  - Add indexes for common queries
+  - Optimize job lookup by ID
+
+### Logging
+
+- [ ] **Structured logging**
+  - JSON format for production
+  - Request/response logging
+  - Error tracking
+
+- [ ] **Monitoring**
+  - Request latency metrics
+  - Error rate tracking
+  - Model prediction time
+
+---
+
+## 5. Deployment
+
+### Docker Configuration
+
+- [ ] **Dockerfile**
+  ```dockerfile
+  FROM python:3.10-slim
+  # Install ViennaRNA
+  # Copy application
+  # Install dependencies
+  # Run with gunicorn
+  ```
+
+- [ ] **docker-compose.yml**
+  - Web service
+  - Volume for database
+  - Environment variables
+
+- [ ] **.dockerignore**
+  - Exclude venv, `__pycache__`, .git
+
+### Production Server
+
+- [ ] **Gunicorn configuration**
+  - Multiple workers
+  - Timeout settings
+  - Logging
+
+- [ ] **Nginx reverse proxy**
+  - SSL termination
+  - Static file serving
+  - Rate limiting
+
+- [ ] **SSL/HTTPS**
+  - Let's Encrypt certificate
+  - Auto-renewal
+
+### Environment Management
+
+- [ ] **Environment variables**
+  - Database path
+  - Debug mode
+  - Secret key
+  - SMTP settings
+
+- [ ] **.env.example**
+  - Document all variables
+  - Provide defaults
+
+### Cloud Deployment Options
+
+- [ ] **Option A: VPS (DigitalOcean, Linode)**
+  - Full control
+  - Manual setup required
+
+- [ ] **Option B: Platform as a Service**
+  - Railway, Render, Fly.io
+  - Easier deployment
+  - May have cold start issues
+
+- [ ] **Option C: Container service**
+  - Google Cloud Run
+  - AWS Fargate
+  - Auto-scaling
+
+---
+
+## 6. NAR Web Server Compliance
+
+For publication in the Nucleic Acids Research Web Server issue:
+
+### Required Pages
+
+- [ ] **Privacy policy**
+  - What data is collected
+  - How long it's stored
+  - Who has access
+
+- [ ] **Terms of service**
+  - Usage restrictions
+  - Disclaimer
+  - License
+
+- [ ] **Contact information**
+  - Email for support
+  - Issue reporting
+
+- [ ] **Funding acknowledgments**
+  - Grant numbers
+  - Institution
+
+### Accessibility (WCAG 2.1)
+
+- [ ] **Keyboard navigation**
+- [ ] **Screen reader support**
+- [ ] **Color contrast ratios**
+- [ ] **Alt text for images**
+- [ ] **Focus indicators**
+
+### Mobile Support
+
+- [ ] **Responsive layout**
+- [ ] **Touch-friendly targets**
+- [ ] **Readable font sizes**
+
+### Reliability
+
+- [ ] **99.9% uptime target**
+- [ ] **Monitoring and alerting**
+- [ ] **Backup strategy**
+- [ ] **Disaster recovery plan**
+
+---
+
+## 7. Testing
+
+### Unit Tests
+
+- [ ] **API endpoint tests**
+  - Test all routes
+  - Test error cases
+  - Test validation
+
+- [ ] **Model wrapper tests**
+  - Test prediction pipeline
+  - Test input preparation
+  - Test output format
+
+- [ ] **Database tests**
+  - Test job creation
+  - Test job retrieval
+  - Test job deletion
+
+### Integration Tests
+
+- [ ] **End-to-end prediction flow**
+- [ ] **Batch processing**
+- [ ] **Export functionality**
+
+### Load Testing
+
+- [ ] **Concurrent requests**
+- [ ] **Response time under load**
+- [ ] **Memory usage**
+
+---
+
+## Quick Start for Next Session
+
+To continue development:
+
+```bash
+# 1. Activate environment
+source venv310/bin/activate
+
+# 2. Start server
+python -m uvicorn webapp.app.main:app --reload --port 8000
+
+# 3. View app
+open http://localhost:8000
+```
+
+## Priority Order
+
+1. **UI/UX + Content** - Make it look professional and informative
+2. **Templates** - Move HTML out of Python code
+3. **Batch upload** - Key feature for usability
+4. **Docker** - For deployment
+5. **Testing** - For reliability
+6. **NAR compliance** - For publication
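The batch-upload item above calls for accepting FASTA input. A minimal sketch of the parsing step, assuming nothing about the eventual endpoint — `parse_fasta` is a hypothetical helper, not code that exists in the repo:

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record_id: sequence}.

    Record IDs are taken as the first whitespace-separated token
    after '>'; sequence lines are concatenated and upper-cased.
    """
    records, name, parts = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(parts)
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line.upper())
    if name is not None:
        records[name] = "".join(parts)
    return records
```

Each parsed sequence would then go through the same 70nt validation the single-sequence endpoint uses before being queued.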
webapp/app/api/routes.py CHANGED
@@ -4,8 +4,10 @@ import uuid
 import json
 from datetime import datetime, timedelta
 from typing import Optional
-from fastapi import APIRouter, Depends, HTTPException, Query
+from fastapi import APIRouter, Depends, HTTPException, Query, Path
+from fastapi.responses import Response
 from sqlalchemy.orm import Session
+from sqlalchemy import text
 
 from webapp.app.database import get_db
 from webapp.app.models.job import Job
@@ -40,7 +42,7 @@ async def health_check(db: Session = Depends(get_db)):
 
     db_connected = False
     try:
-        db.execute("SELECT 1")
+        db.execute(text("SELECT 1"))
         db_connected = True
     except Exception:
         pass
@@ -319,7 +321,7 @@ async def get_example_sequences():
 @router.get("/export/{job_id}/{format}", tags=["export"])
 async def export_results(
     job_id: str,
-    format: str = Query(..., regex="^(csv|json|tsv)$"),
+    format: str = Path(..., pattern="^(csv|json|tsv)$"),
     db: Session = Depends(get_db),
 ):
     """
@@ -335,7 +337,12 @@ async def export_results(
         raise HTTPException(status_code=400, detail="Job not yet complete")
 
     if format == "json":
-        return job.to_dict()
+        content = json.dumps(job.to_dict(), indent=2)
+        return Response(
+            content=content,
+            media_type="application/json",
+            headers={"Content-Disposition": f'attachment; filename="result_{job_id}.json"'}
+        )
 
     elif format in ("csv", "tsv"):
         delimiter = "," if format == "csv" else "\t"
@@ -367,9 +374,11 @@ async def export_results(
         ]
         content = delimiter.join(header) + "\n" + delimiter.join(row)
 
-        return {
-            "content": content,
-            "filename": f"result_{job_id}.{format}",
-        }
+        media_type = "text/csv" if format == "csv" else "text/tab-separated-values"
+        return Response(
+            content=content,
+            media_type=media_type,
+            headers={"Content-Disposition": f'attachment; filename="result_{job_id}.{format}"'}
+        )
 
     raise HTTPException(status_code=400, detail=f"Unsupported format: {format}")
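The export fix boils down to pairing each format with a media type and a `Content-Disposition` header so browsers save a file instead of rendering JSON. That mapping can be factored out and unit-tested without spinning up the app; `export_headers` is a hypothetical helper sketching the logic used in the diff above:

```python
def export_headers(job_id: str, fmt: str):
    """Return (media_type, headers) for a download response.

    Mirrors the format -> media-type mapping in the export route;
    raises ValueError for formats the route would reject.
    """
    media_types = {
        "json": "application/json",
        "csv": "text/csv",
        "tsv": "text/tab-separated-values",
    }
    if fmt not in media_types:
        raise ValueError(f"Unsupported format: {fmt}")
    headers = {
        "Content-Disposition": f'attachment; filename="result_{job_id}.{fmt}"'
    }
    return media_types[fmt], headers
```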
webapp/app/config.py CHANGED
@@ -13,14 +13,17 @@ class Settings(BaseSettings):
     app_version: str = "1.0.0"
     debug: bool = False
 
-    # Paths
-    project_root: Path = Path(__file__).parent.parent.parent.parent
+    # Paths - computed at class definition time
+    # __file__ = webapp/app/config.py
+    # parent.parent.parent = interpretable-splicing-model/
+    project_root: Path = Path(__file__).parent.parent.parent
     model_path: Path = project_root / "output" / "custom_adjacency_regularizer_20210731_124_step3.h5"
     data_path: Path = project_root / "data"
     database_path: Path = Path(__file__).parent.parent / "splicing.db"
 
-    # Database
-    database_url: str = f"sqlite:///{database_path}"
+    @property
+    def database_url(self) -> str:
+        return f"sqlite:///{self.database_path}"
 
     # Job settings
     job_retention_days: int = 30
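The switch from a class-level `database_url` string to a property matters because the string would be frozen at class-definition time, while the property tracks any later override of `database_path`. A minimal sketch of the same pattern, pared down to plain Python (no pydantic) with a hypothetical path:

```python
from pathlib import Path

class Settings:
    # Class-level default, as in the real Settings
    database_path: Path = Path("webapp/app") / "splicing.db"

    @property
    def database_url(self) -> str:
        # Computed on access, so it always reflects the current database_path
        return f"sqlite:///{self.database_path}"
```

Overriding `settings.database_path` (e.g. in tests) now changes `database_url` automatically, which the frozen class-level string could not do.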
webapp/app/services/predictor.py CHANGED
@@ -6,26 +6,19 @@ import tensorflow as tf
 from typing import List, Tuple, Optional, Dict, Any
 from pathlib import Path
 import logging
+import sys
 
 from webapp.app.config import settings
 
 # Set up logging
 logger = logging.getLogger(__name__)
 
-# Import custom layers from model_training
-import sys
-sys.path.insert(0, str(settings.project_root))
-from model_training.model import (
-    binary_KL,
-    Selector,
-    ResidualTuner,
-    SumDiff,
-    RegularizedBiasLayer,
-    MultiRegularizer,
-    pos_reg,
-    adj_reg_fo,
-    adj_reg_so,
-)
+# Add figures directory to path - this auto-registers custom layers
+# via @register_keras_serializable decorators when quad_model is imported
+sys.path.insert(0, str(settings.project_root / 'figures'))
+from quad_model import *  # noqa: E402, F401, F403
+
+from tensorflow.keras.models import load_model
 
 
 class SplicingPredictor:
@@ -49,23 +42,13 @@ class SplicingPredictor:
         """Load the pre-trained TensorFlow model."""
         logger.info(f"Loading model from {settings.model_path}")
 
-        custom_objects = {
-            "binary_KL": binary_KL,
-            "Selector": Selector,
-            "ResidualTuner": ResidualTuner,
-            "SumDiff": SumDiff,
-            "RegularizedBiasLayer": RegularizedBiasLayer,
-            "MultiRegularizer": MultiRegularizer,
-            "pos_reg": pos_reg,
-            "adj_reg_fo": adj_reg_fo,
-            "adj_reg_so": adj_reg_so,
-        }
-
-        self._model = tf.keras.models.load_model(
-            str(settings.model_path),
-            custom_objects=custom_objects,
-        )
-        logger.info("Model loaded successfully")
+        try:
+            # Simple load - custom layers already registered via quad_model import
+            self._model = load_model(str(settings.model_path))
+            logger.info("Model loaded successfully")
+        except Exception as e:
+            logger.error(f"Failed to load model: {e}")
+            raise
 
     @property
     def model(self) -> tf.keras.Model:
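The predictor exposes the loaded model through a property, which suggests a lazy-load-and-cache pattern: the expensive `load_model` call happens once, on first access. A minimal sketch of that pattern in isolation, with TensorFlow swapped out for an injected loader callable (`LazyModel` is a hypothetical stand-in, not the repo's class):

```python
class LazyModel:
    """Defer an expensive load until first access, then cache the result."""

    def __init__(self, loader):
        self._loader = loader  # zero-arg callable, e.g. lambda: load_model(path)
        self._model = None

    @property
    def model(self):
        if self._model is None:
            self._model = self._loader()  # runs at most once
        return self._model
```

Injecting the loader also makes the wrapper trivially testable without a real model file on disk.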
webapp/requirements.txt CHANGED
@@ -8,10 +8,12 @@ sqlalchemy>=2.0.0
 aiosqlite>=0.19.0
 
 # Model & ML
-tensorflow>=2.15.0
-numpy>=1.26.0
+tensorflow==2.15.0  # Pin to 2.15 (last Keras 2 version) for model compatibility
+numpy>=1.26.0,<2.0
 joblib>=1.3.0
 scikit-learn>=1.4.0
+tqdm  # Required by figutils
+scipy  # Required by figutils
 
 # Visualization
 plotly>=5.18.0