MutationPredictor-CNN v4

Overview

MutationPredictor-CNN v4 is a sequence-based, splice-aware convolutional neural network designed to predict the pathogenicity of genomic variants.

The model uses:

  • 401 bp genomic window (GRCh38)
  • One-hot encoded forward and reverse complement sequence
  • Positional encoding
  • Canonical splice motif features (GT/AG)
  • Explicit mutation type encoding (12 SNV classes)
  • Auxiliary region and splice feature embeddings
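
The card lists the ingredients of the sequence encoding but not how they are laid out. One plausible layout, assumed here and not confirmed by the card, is 4 forward one-hot channels + 4 reverse-complement channels + 1 positional channel + 2 motif flags (GT, AG), which gives the 11 channels listed under Architecture. The Python sketch below builds such a matrix for a window centred on the variant. All function names are hypothetical.

```python
# Hypothetical sketch of an 11-channel encoding; the real channel order is not documented.
# Assumed layout per position:
#   0-3   one-hot forward strand (A, C, G, T)
#   4-7   one-hot reverse complement (row i of this block is position n-1-i of the window)
#   8     positional encoding: signed distance to the centre, scaled to [-1, 1]
#   9-10  canonical splice motif flags: GT (donor), AG (acceptor)

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seq: str) -> list[list[float]]:
    """One row per base; N or other ambiguous bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def motif_flags(seq: str, motif: str) -> list[float]:
    """1.0 at every position where the dinucleotide `motif` starts."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] == motif else 0.0 for i in range(len(seq))]


def encode_window(seq: str) -> list[list[float]]:
    """Return a len(seq) x 11 matrix for an odd-length window (e.g. 401 bp)."""
    n = len(seq)
    center = n // 2
    fwd = one_hot(seq)
    rc = one_hot(reverse_complement(seq))
    gt = motif_flags(seq, "GT")
    ag = motif_flags(seq, "AG")
    rows = []
    for i in range(n):
        pos = (i - center) / center if center else 0.0
        rows.append(fwd[i] + rc[i] + [pos, gt[i], ag[i]])
    return rows
```

For a 401 bp window, `encode_window(seq)` returns 401 rows of 11 values; a framework model would transpose this to channels-first before the first Conv1d.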

The model is trained on splice-impact variants with labels derived from ClinVar.


Architecture

Input Features:

  • 11-channel sequence encoding
  • 12-dim mutation one-hot encoding
  • 2-dim region embedding
  • 3-dim splice embedding
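
Twelve SNV classes is exactly the number of ordered reference>alternate pairs of distinct bases (4 × 3), so the 12-dim mutation one-hot most likely indexes those pairs. The class ordering and helper names below are assumptions for illustration; the region and splice vectors are passed through unchanged because the card does not describe their contents.

```python
from itertools import permutations

# Assumed ordering: A>C, A>G, A>T, C>A, ..., T>G (4 x 3 = 12 ordered substitutions).
SNV_CLASSES = [f"{ref}>{alt}" for ref, alt in permutations("ACGT", 2)]


def mutation_one_hot(ref: str, alt: str) -> list[float]:
    """12-dim one-hot for a single-nucleotide substitution (hypothetical helper)."""
    key = f"{ref.upper()}>{alt.upper()}"
    if key not in SNV_CLASSES:
        raise ValueError(f"not a single-nucleotide substitution: {key}")
    return [1.0 if c == key else 0.0 for c in SNV_CLASSES]


def auxiliary_features(ref: str, alt: str, region: list[float], splice: list[float]) -> list[float]:
    """Concatenate mutation (12), region (2) and splice (3) inputs into one 17-dim vector."""
    if len(region) != 2 or len(splice) != 3:
        raise ValueError("expected a 2-dim region vector and a 3-dim splice vector")
    return mutation_one_hot(ref, alt) + list(region) + list(splice)
```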

Model:

Conv1d(11 → 64, kernel=7)
Conv1d(64 → 128, kernel=5)
Conv1d(128 → 256, kernel=3)
AdaptiveAvgPool1d(1)

Fully connected layers:

  • 312 → 128
  • 128 → 64
  • 64 → 1 (logit output)
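
The layer sizes above can be checked with a little arithmetic. The widths listed under Input Features sum to 256 + 12 + 2 + 3 = 273 after global pooling, not the 312 expected by the first fully connected layer. The card does not say where the remaining 39 dimensions come from (for example, learned embeddings wider than the raw region and splice inputs). The sketch below assumes stride 1, no padding and no pooling between convolutions, and counts only convolution and linear weights and biases, so any embedding tables are excluded.

```python
# Assumptions (not stated in the card): stride 1, no padding, no pooling between convs.
CONV_LAYERS = [(11, 64, 7), (64, 128, 5), (128, 256, 3)]  # (in_channels, out_channels, kernel)
FC_LAYERS = [(312, 128), (128, 64), (64, 1)]              # (in_features, out_features)
AUX_WIDTHS = {"mutation": 12, "region": 2, "splice": 3}   # raw widths listed in the card


def conv_lengths(length: int) -> list[int]:
    """Sequence length after each valid (unpadded) stride-1 convolution."""
    lengths = [length]
    for _, _, k in CONV_LAYERS:
        length = length - k + 1
        lengths.append(length)
    return lengths


def param_count() -> int:
    """Weights + biases of the conv and linear layers only."""
    conv = sum(i * o * k + o for i, o, k in CONV_LAYERS)
    fc = sum(i * o + o for i, o in FC_LAYERS)
    return conv + fc


pooled_width = CONV_LAYERS[-1][1]  # AdaptiveAvgPool1d(1) keeps one value per channel
concat_width = pooled_width + sum(AUX_WIDTHS.values())
```

With these assumptions, `conv_lengths(401)` gives [401, 395, 391, 389], `param_count()` gives 193,025, and `concat_width` is 273. With "same" padding the lengths would stay at 401, but the pooled width (256) and the parameter count would not change.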

Activation: ReLU
Regularization: Dropout(0.3)
Loss: BCEWithLogitsLoss
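
BCEWithLogitsLoss combines a sigmoid with binary cross-entropy in one numerically stable expression. A minimal pure-Python equivalent, using the standard rearrangement max(x, 0) - x·y + log(1 + e^(-|x|)) and mean reduction (PyTorch's default), might look like this; the function names are illustrative.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Equals -(y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))) without overflowing exp()."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce_with_logits(logits: list[float], targets: list[float]) -> float:
    """Average loss over a batch, matching reduction='mean'."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)
```

The stable form matters for confident predictions: a logit of 1000 would overflow a naive `math.exp(1000)`, while `bce_with_logits(1000.0, 0.0)` simply returns 1000.0.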


Training Details

  • Genome build: GRCh38
  • Window size: 401 bp
  • Optimizer: Adam
  • Learning rate: 0.001
  • Epochs: 30
  • Batch size: 256

The best model was selected by the highest training AUC.
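
For reference, a single Adam update with the learning rate listed above can be sketched as follows. The card does not give β1, β2 or ε, so the sketch assumes PyTorch's defaults (0.9, 0.999, 1e-8). Parameters are flat lists of floats rather than tensors, and the helper names are hypothetical.

```python
import math


def new_adam_state(n: int) -> dict:
    """First/second moment estimates and step counter for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one bias-corrected Adam update in place and return params."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction for zero init
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate (0.001) regardless of gradient magnitude.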


Evaluation

Internal (ClinVar-derived)

AUC ≈ 0.97

External Benchmark – SpliceAI Comparison (1000 variants)

AUC ≈ 0.91

Strict Fair Benchmark (134 variants not seen in training context)

AUC ≈ 0.54

Fair Benchmark by Consequence

Variant Type          Count   AUC
Splice Donor             37   0.53
Splice Acceptor          34   0.51
Other Consequences       63   0.61
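
All results above are reported as ROC AUC: the probability that a randomly chosen positive (label 1) variant is scored higher than a randomly chosen negative (label 0) one, so 0.5 corresponds to random ranking. A minimal rank-based (Mann-Whitney) implementation, independent of any particular library, is sketched below; tied scores receive their average rank, which counts a tie as half a correct ordering.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic with midranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```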

Known Limitations

  • Trained primarily on ClinVar-labeled variants
  • Limited indel generalization
  • 401 bp receptive field (short-range context)
  • Not validated for clinical decision-making
  • Research-use only

Intended Use

This model is intended as:

  • Research tool
  • Variant prioritization support
  • Experimental splice-aware classifier

Not intended for direct clinical decision-making.


Future Directions

  • Expand the receptive field (>2 kb)
  • Dilated convolutions
  • Transformer-based architecture
  • Multi-task pathogenicity + splice scoring
  • Larger independent external validation

License

Research use only.


Author

Nilesh Hanotia
