---
tags:
- biology
- genomics
- classification
- sklearn
library_name: sklearn
---
# BioGuard DNA Classifier Ensemble (Portable v1.1)
This repository contains a dual-model ensemble for DNA sequence analysis and virus classification, optimized for **portability and zero-dependency loading**: the model files load with plain scikit-learn and `joblib`, with no custom classes required.
> [!NOTE]
> **Version 1.1 Update**: This version has been refactored to decouple the models from custom feature extraction classes. It uses a raw scikit-learn format for maximum compatibility.
## 🧬 Models Included
This repository hosts two distinct models specialized for different aspects of genomic analysis:
### 1. **GenetiForest** (RandomForestClassifier)
* **File**: `dna_classifier.joblib`
* **Purpose**: General-purpose synthetic vs. biological sequence classification.
* **Architecture**: Random Forest (sklearn) with biological feature extraction (k-mers, GC content, etc.).
* **Performance (Test Set)**:
    * **Accuracy**: 89.4%
    * **F1 Score**: 89.4%
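
The k-mer and GC-content features mentioned above can be illustrated with a minimal sketch. This is not the extraction code shipped in `inference.py`; the function names (`gc_content`, `kmer_frequencies`, `extract_features`), the choice of `k = 3`, and the feature ordering are all assumptions made for illustration, and the real models expect features built exactly as `inference.py` builds them.

```python
from collections import Counter
from itertools import product


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet ACGT.

    Windows containing other characters (for example N) are skipped.
    """
    seq = sequence.upper()
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocabulary)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in vocabulary}


def extract_features(sequence: str, k: int = 3) -> list[float]:
    """Fixed-length vector: GC content followed by 4**k k-mer frequencies.

    Hypothetical layout; it is not guaranteed to match inference.py.
    """
    freqs = kmer_frequencies(sequence, k)
    return [gc_content(sequence)] + [freqs[kmer] for kmer in sorted(freqs)]
```

With `k = 3` this yields a vector of 65 values (1 GC fraction plus 64 trinucleotide frequencies) regardless of sequence length, which is what lets a tree-based model consume sequences of varying size.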
### 2. **ViralBoost** (GradientBoostingClassifier)
* **File**: `sequence_model.joblib`
* **Purpose**: Specific virus type identification (Influenza A, Norovirus, etc.) based on sequence signatures.
* **Architecture**: Gradient Boosting (sklearn) trained on real-world viral sequences.
* **Performance (Test Set)**:
    * **Accuracy**: 99.4%
    * **F1 Score**: 99.4%
* **Classes**: Other, Influenza A, Chicken anemia virus, Norovirus, Influenza B
## 🚀 Usage
Since these models use biological feature extraction, we provide a standalone `inference.py` script for easy usage.
1. Download all files (`.joblib` and `inference.py`).
2. Use the `inference.py` script:
```python
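from inference import predict_dna

# Placeholder input: replace with a full nucleotide sequence
sequence = "ATGCTAGCTAGCTAG..."

# predict_dna returns a dict with the results of both models
results = predict_dna(sequence)
print(f"Genetic Type: {results['classification']}")
print(f"Virus Identity: {results['virus_identity']}")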
```
Alternatively, you can load components manually:
```python
import joblib

# Load the Random Forest classifier and its feature scaler
classifier = joblib.load("dna_classifier.joblib")
scaler = joblib.load("scaler_rf.joblib")
# Refer to inference.py for the feature extraction logic
```
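
Once the components are loaded, a prediction would typically chain the scaler and the classifier. The sketch below assumes that the scaler is applied to the feature vector before the classifier sees it (suggested by the file name `scaler_rf.joblib`, but not confirmed here) and that `features` was produced by the extraction logic in `inference.py`. The helper names `load_components` and `classify_features` are hypothetical.

```python
def load_components(model_path="dna_classifier.joblib", scaler_path="scaler_rf.joblib"):
    """Load the classifier and its scaler from disk (needs joblib and scikit-learn)."""
    import joblib  # imported lazily so classify_features has no hard dependency
    return joblib.load(model_path), joblib.load(scaler_path)


def classify_features(features, scaler, classifier):
    """Scale one feature vector and return the classifier's predicted label."""
    # scikit-learn estimators expect a 2D input: one row per sample.
    scaled = scaler.transform([features])
    return classifier.predict(scaled)[0]


# Usage (features must come from the extraction logic in inference.py):
# classifier, scaler = load_components()
# label = classify_features(features, scaler, classifier)
```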
## 📊 Training Metadata
* **Generated By**: DNA Governance Console (vparka)
* **Framework**: scikit-learn