Upload README.md via DNA Console
README.md
ADDED
---
tags:
- biology
- genomics
- classification
- sklearn
library_name: sklearn
---

# BioGuard DNA Classifier Ensemble

This repository contains a dual-model ensemble for DNA sequence analysis and virus classification, trained using the **DNA Governance Console**.

## 🧬 Models Included

This repository hosts two distinct models specialized for different aspects of genomic analysis:

### 1. **GenetiForest** (RandomForestClassifier)
* **File**: `dna_classifier.joblib`
* **Purpose**: General-purpose synthetic vs. biological sequence classification.
* **Architecture**: Random Forest (sklearn) with biological feature extraction (k-mers, GC content, etc.).
* **Performance (Test Set)**:
    * **Accuracy**: 89.4%
    * **F1 Score**: 89.4%
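
The feature extraction named above (k-mers, GC content) can be illustrated with a minimal sketch. This is not the repository's actual extractor: the function names, the choice of `k = 3`, and the feature ordering are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequency of every overlapping k-mer over the ACGT alphabet."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous bases such as N.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Fixed key order so every sequence maps to a vector of the same length.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def extract_features(seq: str, k: int = 3) -> list:
    """GC content followed by the 4**k k-mer frequencies (hypothetical layout)."""
    return [gc_content(seq), *kmer_frequencies(seq, k).values()]
```

With `k = 3` this yields a 65-element vector per sequence (one GC value plus 64 trinucleotide frequencies). The feature layout the trained models expect may differ, so use the extractor shipped with the repository for real predictions.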

### 2. **ViralBoost** (GradientBoostingClassifier)
* **File**: `sequence_model.joblib`
* **Purpose**: Specific virus type identification (Influenza A, Norovirus, etc.) based on sequence signatures.
* **Architecture**: Gradient Boosting (sklearn) trained on real-world viral sequences.
* **Performance (Test Set)**:
    * **Accuracy**: 99.4%
    * **F1 Score**: 99.4%
* **Classes**: Other, Influenza A, Chicken anemia virus, Norovirus, Influenza B
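
For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and a multi-class F1 score can be computed from true and predicted labels. The README does not state which F1 averaging was used; macro averaging is assumed here purely for illustration, and the labels in the usage note are made-up examples, not results from this model.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `["Influenza A", "Influenza A", "Norovirus", "Other"]` and predictions `["Influenza A", "Norovirus", "Norovirus", "Other"]`, accuracy is 0.75 and macro F1 is 7/9. In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="macro")` compute the same quantities.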

## 🚀 Usage

You can load these models using `joblib` in Python:

```python
import joblib

# Load GenetiForest
rf_model = joblib.load("dna_classifier.joblib")

# Load ViralBoost
gb_model = joblib.load("sequence_model.joblib")

# Prediction
# (Requires matching FeatureExtractor - see 'sequence_extractor.joblib')
```
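
The prediction step left as a comment above could look roughly like the following. This is a sketch under stated assumptions: the README does not document the extractor's interface, so the `transform` method is hypothetical, and `classify_sequences` is a helper name introduced here. Only `predict` is a known part of the scikit-learn classifier API.

```python
def classify_sequences(sequences, extractor, model):
    """Convert raw sequences to features, then predict one label per sequence."""
    # ASSUMPTION: the saved extractor exposes a scikit-learn style `transform`.
    features = extractor.transform(sequences)
    # `predict` is the standard scikit-learn classifier method.
    return list(model.predict(features))


# Hypothetical usage, assuming the files are in the working directory:
#
#   import joblib
#   extractor = joblib.load("sequence_extractor.joblib")
#   gb_model = joblib.load("sequence_model.joblib")
#   print(classify_sequences(["ATGCGTACGTTAGC"], extractor, gb_model))
```

If the extractor turns out to expose a different interface, only the single `transform` line needs to change.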

## 📊 Training Meta
* **Generated By**: DNA Governance Console (vparka)
* **Framework**: scikit-learn