---
tags:
- biology
- genomics
- classification
- sklearn
library_name: sklearn
---

# BioGuard DNA Classifier Ensemble (Portable v1.1)

This repository contains a dual-model ensemble for DNA sequence analysis and virus classification, optimized for **portability and zero-dependency loading**.

> [!NOTE]
> **Version 1.1 Update**: This version has been refactored to decouple the models from custom feature extraction classes. It uses a raw scikit-learn format for maximum compatibility.

## 🧬 Models Included

This repository hosts two distinct models specialized for different aspects of genomic analysis:

### 1. **GenetiForest** (RandomForestClassifier)

* **File**: `dna_classifier.joblib`
* **Purpose**: General-purpose synthetic vs. biological sequence classification.
* **Architecture**: Random Forest (sklearn) with biological feature extraction (k-mers, GC content, etc.).
* **Performance (Test Set)**:
    * **Accuracy**: 89.4%
    * **F1 Score**: 89.4%

### 2. **ViralBoost** (GradientBoostingClassifier)

* **File**: `sequence_model.joblib`
* **Purpose**: Specific virus type identification (Influenza A, Norovirus, etc.) based on sequence signatures.
* **Architecture**: Gradient Boosting (sklearn) trained on real-world viral sequences.
* **Performance (Test Set)**:
    * **Accuracy**: 99.4%
    * **F1 Score**: 99.4%
* **Classes**: Other, Influenza A, Chicken anemia virus, Norovirus, Influenza B

## 🚀 Usage

Since these models use biological feature extraction, we provide a standalone `inference.py` script for easy usage.

1. Download all files (`.joblib` and `inference.py`).
2. Use the `inference.py` script:

```python
from inference import predict_dna

sequence = "ATGCTAGCTAGCTAG..."
results = predict_dna(sequence)

print(f"Genetic Type: {results['classification']}")
print(f"Virus Identity: {results['virus_identity']}")
```

Alternatively, you can load components manually:

```python
import joblib

classifier = joblib.load("dna_classifier.joblib")
scaler = joblib.load("scaler_rf.joblib")
# (Refer to inference.py for Feature Extraction logic)
```

## 📊 Training Meta

* **Generated By**: DNA Governance Console (vparka)
* **Framework**: scikit-learn
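
## 🔬 Feature Extraction Sketch

The GenetiForest description above mentions k-mers and GC content as input features. The snippet below is a minimal sketch of how such features can be computed with the standard library alone. It is an illustration only: the function names (`gc_content`, `kmer_frequencies`, `extract_features`), the choice of `k=3`, and the feature ordering are assumptions and are not taken from `inference.py`, which remains the authoritative source for the feature layout the models expect.

```python
from collections import Counter
from itertools import product
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Relative frequency of every A/C/G/T k-mer, in lexicographic order.

    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Fixed vocabulary so the vector length is always 4**k.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def extract_features(seq: str, k: int = 3) -> List[float]:
    """Hypothetical feature vector: GC content followed by k-mer frequencies."""
    return [gc_content(seq)] + kmer_frequencies(seq, k)


if __name__ == "__main__":
    features = extract_features("ATGCTAGCTAGCTAG")
    print(len(features))  # 1 GC value + 4**3 k-mer frequencies
```

In the manual-loading path, a vector like this would presumably be transformed by the loaded scaler and then passed to the classifier, but the features must match what the models were trained on, so check `inference.py` before relying on this layout.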