MutationPredictor-CNN v4
Overview
MutationPredictor-CNN v4 is a sequence-based convolutional neural network that predicts the pathogenicity of genomic variants using splice-aware features.
The model uses:
- 401 bp genomic window (GRCh38)
- One-hot encoded forward and reverse complement sequence
- Positional encoding
- Canonical splice motif features (GT/AG)
- Explicit mutation type encoding (12 SNV classes)
- Auxiliary region and splice feature embeddings
The model is trained on labeled splice-impact variants derived from ClinVar.
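The sequence and mutation-type features listed above can be sketched in plain Python. This is a minimal illustration, not the author's preprocessing code: the channel order, the handling of `N` bases, and all function names (`encode_sequence`, `encode_mutation`) are assumptions. The positional-encoding and splice-motif channels are omitted because their layout is not specified in this card.

```python
# Illustrative feature encoding; names and channel order are assumptions.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
WINDOW = 401  # window size stated in the model card

# The 12 possible single-nucleotide substitutions (ref != alt).
SNV_CLASSES = [(r, a) for r in BASES for a in BASES if r != a]


def one_hot(seq):
    """Return 4 channel rows (A, C, G, T); bases such as N stay all-zero."""
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]


def reverse_complement(seq):
    """Reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def encode_sequence(seq):
    """Stack forward and reverse-complement one-hot rows: 8 channels x len(seq)."""
    seq = seq.upper()
    return one_hot(seq) + one_hot(reverse_complement(seq))


def encode_mutation(ref, alt):
    """12-dim one-hot vector for the SNV class ref>alt."""
    vec = [0.0] * len(SNV_CLASSES)
    vec[SNV_CLASSES.index((ref.upper(), alt.upper()))] = 1.0
    return vec


window = "ACGT" * 100 + "A"  # toy 401 bp sequence, not real genomic data
channels = encode_sequence(window)
mutation = encode_mutation("A", "G")
print(len(channels), len(channels[0]))  # channels x positions
```

In practice the window would be extracted from a GRCh38 reference centered on the variant position; that step is left out here.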
Architecture
Input Features:
- 11-channel sequence encoding
- 12-dim mutation one-hot encoding
- 2-dim region embedding
- 3-dim splice embedding
Model:
Conv1d(11 → 64, kernel=7)
Conv1d(64 → 128, kernel=5)
Conv1d(128 → 256, kernel=3)
AdaptiveAvgPool1d(1)
Fully connected layers:
- 312 → 128
- 128 → 64
- 64 → 1 (logit output)
Activation: ReLU
Regularization: Dropout(0.3)
Loss: BCEWithLogitsLoss
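The layer list above can be sketched in PyTorch as follows. This is a hedged reconstruction, not the released model code: the padding values, the position of ReLU and Dropout, and the class name `ConvTrunk` are assumptions. The card does not state how the pooled 256-dim output and the auxiliary features are combined into the 312-dim input of the first fully connected layer, so a random placeholder tensor stands in for that fused vector.

```python
import torch
import torch.nn as nn


class ConvTrunk(nn.Module):
    """Convolutional trunk from the card; padding and dropout placement are assumptions."""

    def __init__(self, in_channels=11, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over all positions per channel
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, 11, 401) -> (batch, 256)
        return self.dropout(self.features(x).squeeze(-1))


# Fully connected head: 312 -> 128 -> 64 -> 1 logit.
head = nn.Sequential(
    nn.Linear(312, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1),
)

trunk = ConvTrunk()
x = torch.randn(4, 11, 401)              # random stand-in for encoded windows
pooled = trunk(x)                        # shape (4, 256)

fused = torch.randn(4, 312)              # placeholder: fusion step is not specified
logits = head(fused).squeeze(-1)         # shape (4,)

targets = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(pooled.shape, logits.shape, loss.item())
```

Because the head emits a raw logit, `torch.sigmoid(logits)` would be applied at inference time to obtain a probability; `BCEWithLogitsLoss` applies the sigmoid internally during training.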
Training Details
- Genome build: GRCh38
- Window size: 401 bp
- Optimizer: Adam
- Learning rate: 0.001
- Epochs: 30
- Batch size: 256
The best model was selected by highest training AUC.
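As a rough illustration of the hyperparameters above, the loop below wires Adam (learning rate 0.001), a batch size of 256, and 30 epochs around `BCEWithLogitsLoss`. Only those values come from this card. The linear stand-in model, the random tensors, and the variable names are assumptions made so the sketch runs on its own, and checkpoint selection is not shown.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in model and random data; only the hyperparameters come from the card.
model = nn.Sequential(nn.Flatten(), nn.Linear(11 * 401, 1))
data = TensorDataset(
    torch.randn(512, 11, 401),
    torch.randint(0, 2, (512,)).float(),
)
loader = DataLoader(data, batch_size=256, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCEWithLogitsLoss()

EPOCHS = 30
epoch_losses = []
for epoch in range(EPOCHS):
    model.train()
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x).squeeze(-1), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(x)  # weight by batch size
    epoch_losses.append(total / len(data))  # mean training loss per epoch

print(epoch_losses[0], epoch_losses[-1])
```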
Evaluation
Internal (ClinVar-derived)
AUC ≈ 0.97
External Benchmark: SpliceAI Comparison (1000 variants)
AUC ≈ 0.91
Strict Fair Benchmark (134 variants not seen in training context)
AUC ≈ 0.54
Fair Benchmark by Consequence
| Variant Type | Count | AUC |
|---|---|---|
| Splice Donor | 37 | 0.53 |
| Splice Acceptor | 34 | 0.51 |
| Other Consequences | 63 | 0.61 |
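A per-consequence breakdown like the table above can be computed by grouping scored variants and evaluating AUC within each group. The sketch below uses a simple pairwise AUC definition in plain Python; the function names, the record layout, and the toy records are assumptions for illustration and are not the benchmark data or the author's evaluation script.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score) -> {group: (count, auc)}."""
    grouped = defaultdict(list)
    for group, label, score in records:
        grouped[group].append((label, score))
    result = {}
    for group, rows in grouped.items():
        labels, scores = zip(*rows)
        result[group] = (len(rows), roc_auc(labels, scores))
    return result


# Toy records (group, label, score); not the benchmark data.
records = [
    ("splice_donor", 1, 0.9), ("splice_donor", 0, 0.2),
    ("splice_donor", 1, 0.4), ("splice_donor", 0, 0.6),
    ("splice_acceptor", 1, 0.7), ("splice_acceptor", 0, 0.7),
]
summary = auc_by_group(records)
print(summary)
```

The pairwise form is quadratic in group size, which is acceptable for groups of a few dozen variants; a rank-based implementation would be preferable for larger sets.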
Known Limitations
- Trained primarily on ClinVar-labeled variants
- Limited indel generalization
- 401 bp receptive field (short-range context)
- Not validated for clinical decision-making
- Research-use only
Intended Use
This model is intended as:
- Research tool
- Variant prioritization support
- Experimental splice-aware classifier
Not intended for direct clinical decision-making.
Future Directions
- Expand receptive field (>2kb)
- Dilated convolutions
- Transformer-based architecture
- Multi-task pathogenicity + splice scoring
- Larger independent external validation
License
Research use only.
Author
Nilesh Hanotia