nileshhanotia committed 3ed4e54 (verified) · 1 parent: 4a15301

Upload README.md with huggingface_hub

Files changed (1): README.md added (+122 -0)

---
language: en
license: apache-2.0
library_name: pytorch
pipeline_tag: text-classification
tags:
- genomics
- mutation
- pathogenicity
- splice
- explainable-ai
- biology
- clinical-ai
---

# 🧬 MutationPredictorCNN_v2: Splice-Aware Pathogenicity Predictor

## Model Summary

MutationPredictorCNN_v2 is a splice-aware convolutional neural network (CNN) that predicts the pathogenicity of single nucleotide variants (SNVs) from their genomic sequence context and splice-related features.

It supports the following built-in explainability methods:

- CNN activation heatmap
- Gradient attribution
- Counterfactual mutation analysis
- Feature ablation analysis
- Splice distance analysis

Validation accuracy: 74.8%

---

## Intended Use

Research use cases:

- Genomic variant interpretation
- Explainable AI research
- Variant prioritization
- Educational and academic research

NOT intended for clinical diagnostic use.

---

## Model Architecture

CNN-based architecture:

- Input: 1106 features
- Output: pathogenicity probability

Explainability heads:

- Mutation importance
- Region importance
- Splice importance

---

## Training Data

Source: ClinVar

Dataset size: 100,000 variants (50,000 pathogenic, 50,000 benign)

Sequence window: 99 bp
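
The card gives a 99 bp sequence window but does not say how it is built or how it maps onto the 1106 input features. A minimal sketch of the sequence part, assuming the window is centered on the variant (49 bp on each side), positions are 0-based indexes into a reference string, and bases are one-hot encoded in A, C, G, T order. `extract_window`, `apply_snv`, and `one_hot` are hypothetical helpers, not the model's actual preprocessing.

```python
from typing import List

WINDOW = 99          # window length stated in this model card
FLANK = WINDOW // 2  # assumption: 49 bp on each side of a centered variant
BASES = "ACGT"       # assumption: one-hot channel order


def extract_window(ref_seq: str, pos: int) -> str:
    """Return the WINDOW-bp slice of `ref_seq` centered on the 0-based position `pos`."""
    start, end = pos - FLANK, pos + FLANK + 1
    if start < 0 or end > len(ref_seq):
        raise ValueError("variant is too close to the sequence edge for a full window")
    return ref_seq[start:end]


def apply_snv(window: str, ref: str, alt: str) -> str:
    """Substitute `alt` at the center base after checking it matches `ref`."""
    if window[FLANK] != ref:
        raise ValueError(f"reference mismatch: expected {ref}, found {window[FLANK]}")
    return window[:FLANK] + alt + window[FLANK + 1:]


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element 0/1 vector; N and other symbols become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


ref_seq = "ACGT" * 50               # toy 200 bp reference
win = extract_window(ref_seq, 100)  # center base is ref_seq[100] == "A"
mutant = apply_snv(win, "A", "G")
encoded = one_hot(mutant)
print(len(win), mutant[FLANK], encoded[FLANK])  # 99 G [0, 0, 1, 0]
```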

---

## Performance

Validation accuracy: 74.8%

The dataset is balanced between pathogenic and benign variants.
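
On a set with equal numbers of pathogenic and benign variants, plain accuracy equals balanced accuracy, the mean of sensitivity and specificity, because (TP + TN) / (P + N) = TP / 2P + TN / 2N when P = N. The sketch below illustrates this, assuming the validation split keeps the dataset's 50/50 balance and using a 0.5 decision threshold (the card states neither). The labels and probabilities are made-up toy values, not the model's results.

```python
from typing import Dict, Sequence


def binary_metrics(
    labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and balanced accuracy.

    labels: 1 = pathogenic, 0 = benign; probs: predicted pathogenicity probabilities.
    threshold: assumed decision cutoff (the model card does not state one).
    Assumes both classes are present, otherwise a rate divides by zero.
    """
    tp = tn = fp = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:  # predicted pathogenic, actually benign
            fp += 1
        else:            # predicted benign, actually pathogenic
            fn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Made-up toy data (not model output): 4 pathogenic, then 4 benign variants.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.7, 0.2, 0.4, 0.7, 0.6]
m = binary_metrics(labels, probs)
print(m)
# {'accuracy': 0.75, 'sensitivity': 1.0, 'specificity': 0.5, 'balanced_accuracy': 0.75}
```

In this toy case every pathogenic variant is caught but half the benign ones are misclassified, yet both accuracy figures agree at 0.75 because the classes are the same size. On an imbalanced set the two would diverge, so sensitivity and specificity are worth reporting alongside accuracy.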

---

## Explainability

The model provides multi-level explainability:

- Activation heatmap
- Mutation rank percentile
- Gradient attribution map
- Counterfactual analysis
- Feature ablation analysis
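
The card does not define how counterfactual analysis or the mutation rank percentile are computed. One common approach to the former is in-silico saturation mutagenesis: substitute every alternative base at every position of the window, re-score each mutant, and record how far the prediction moves. A percentile rank can then place one variant's effect among all possible substitutions. This sketch assumes a `score` callable mapping a sequence to a pathogenicity probability; `toy_score` is a made-up stand-in rather than the model, and all function names here are hypothetical.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"
Effects = Dict[Tuple[int, str], float]


def saturation_mutagenesis(seq: str, score: Callable[[str], float]) -> Effects:
    """Score change for every single-base substitution of `seq`.

    Returns {(position, alt_base): score(mutant) - score(seq)}.
    """
    baseline = score(seq)
    effects: Effects = {}
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # not a substitution
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score(mutant) - baseline
    return effects


def rank_percentile(effects: Effects, key: Tuple[int, str]) -> float:
    """Percentage of substitutions whose absolute effect is <= that of `key` (ties included)."""
    target = abs(effects[key])
    at_or_below = sum(1 for v in effects.values() if abs(v) <= target)
    return 100.0 * at_or_below / len(effects)


WEIGHTS = (1, 2, 8, 4, 1)  # made-up per-position weights for the toy scorer


def toy_score(seq: str) -> float:
    """Made-up stand-in for the model: weighted G/C content, scaled to [0, 1]."""
    return sum(w for w, base in zip(WEIGHTS, seq) if base in "GC") / sum(WEIGHTS)


seq = "ATGCA"
effects = saturation_mutagenesis(seq, toy_score)
print(len(effects))                        # 15 (5 positions x 3 alternative bases)
print(effects[(2, "A")])                   # -0.5
print(rank_percentile(effects, (2, "A")))  # 100.0
```

With the real model, `score` would wrap the trained network's forward pass on an encoded window; that wiring is not described in this card.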

---

## Limitations

The model supports only:

- Single nucleotide variants (SNVs)
- A 99 bp sequence context window

It does not incorporate:

- Conservation scores
- Protein structure
- Expression context
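
Because only single nucleotide variants are supported, other variant types should be rejected before scoring. A minimal sketch, assuming variants arrive as VCF-style REF/ALT strings; `is_supported_snv` is a hypothetical helper, not part of the released model.

```python
VALID_BASES = frozenset("ACGT")


def is_supported_snv(ref: str, alt: str) -> bool:
    """True only for a single-base substitution between two different A/C/G/T bases.

    Insertions, deletions, multi-base substitutions, multi-allelic ALT fields
    such as "A,T", and ambiguous bases such as N are all rejected.
    """
    ref, alt = ref.upper(), alt.upper()
    return (
        len(ref) == 1
        and len(alt) == 1
        and ref in VALID_BASES
        and alt in VALID_BASES
        and ref != alt
    )


for ref, alt in [("A", "G"), ("A", "A"), ("AT", "A"), ("C", "N"), ("G", "A,T")]:
    print(ref, alt, is_supported_snv(ref, alt))
# A G True
# A A False
# AT A False
# C N False
# G A,T False
```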

---

## Disclaimer

⚠ Research use only. Not a clinical diagnostic tool.

---

## Maintainer

Nilesh Hanotia