metadata
tags:
  - biology
  - genomics
  - classification
  - sklearn
library_name: sklearn

BioGuard DNA Classifier Ensemble (Portable v1.1)

This repository contains a dual-model ensemble for DNA sequence analysis and virus classification. It is built for portability: the models load with joblib and scikit-learn alone, without any custom classes.

Version 1.1 Update: The models have been decoupled from the custom feature extraction classes and are now saved as plain scikit-learn estimators for maximum compatibility.

🧬 Models Included

This repository hosts two distinct models specialized for different aspects of genomic analysis:

1. GenetiForest (RandomForestClassifier)

  • File: dna_classifier.joblib
  • Purpose: General-purpose synthetic vs. biological sequence classification.
  • Architecture: Random Forest (sklearn) with biological feature extraction (k-mers, GC content, etc.).
  • Performance (Test Set):
    • Accuracy: 89.4%
    • F1 Score: 89.4%
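The feature types named above (k-mers and GC content) can be illustrated with a minimal sketch. This is not the extraction code shipped in inference.py: the function names, the choice of k = 3, and the feature ordering are all assumptions made for illustration, and the real feature vector may differ.

```python
from collections import Counter
from itertools import product


def gc_content(sequence: str) -> float:
    """Fraction of G and C among the A/C/G/T bases (0.0 if there are none)."""
    seq = sequence.upper()
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / valid


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the ACGT alphabet."""
    seq = sequence.upper()
    # Slide a window of width k across the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are excluded from the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in vocabulary}


def extract_features(sequence: str, k: int = 3) -> list[float]:
    """Hypothetical feature vector: GC content followed by sorted k-mer frequencies."""
    freqs = kmer_frequencies(sequence, k)
    return [gc_content(sequence)] + [freqs[kmer] for kmer in sorted(freqs)]
```

With k = 3 this yields a fixed-length vector of 65 values (1 GC fraction plus 4^3 k-mer frequencies), so sequences of different lengths map to the same input shape.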

2. ViralBoost (GradientBoostingClassifier)

  • File: sequence_model.joblib
  • Purpose: Specific virus type identification (Influenza A, Norovirus, etc.) based on sequence signatures.
  • Architecture: Gradient Boosting (sklearn) trained on real-world viral sequences.
  • Performance (Test Set):
    • Accuracy: 99.4%
    • F1 Score: 99.4%
  • Classes: Other, Influenza A, Chicken anemia virus, Norovirus, Influenza B

🚀 Usage

Because the models operate on extracted biological features rather than raw sequences, the repository provides a standalone inference.py script that handles feature extraction and prediction.

  1. Download all files (the .joblib files and inference.py) into the same directory.
  2. Import predict_dna from inference.py and call it on a sequence:
from inference import predict_dna

sequence = "ATGCTAGCTAGCTAG..."
results = predict_dna(sequence)

print(f"Genetic Type: {results['classification']}")
print(f"Virus Identity: {results['virus_identity']}")
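To classify several sequences at once, one option is to wrap the call in a small batch helper. The sketch below assumes the input is FASTA-formatted text, which the original does not specify. `parse_fasta` and `classify_fasta` are hypothetical helpers, not part of inference.py. The predictor is passed in as an argument so that predict_dna can be supplied at call time.

```python
from typing import Callable, Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined and upper-cased.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def classify_fasta(text: str, predict: Callable[[str], dict]) -> Dict[str, dict]:
    """Run `predict` (for example predict_dna) on every record, keyed by header."""
    return {header: predict(seq) for header, seq in parse_fasta(text)}
```

A call such as `classify_fasta(open("samples.fasta").read(), predict_dna)` would then return one result dictionary per record; the file name here is only a placeholder.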

Alternatively, you can load components manually:

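import joblib

# Load the serialized classifier and scaler from the downloaded files.
classifier = joblib.load("dna_classifier.joblib")
scaler = joblib.load("scaler_rf.joblib")
# Feature extraction is not bundled with the models; refer to inference.py for that logic.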
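Once the components are loaded, the remaining steps are feature extraction, scaling, and prediction. The sketch below shows one way to chain them, under two assumptions: that scaler_rf.joblib is the scaler fitted for dna_classifier.joblib, and that the feature extractor reproduces the exact feature order used in inference.py. `predict_with_components` and its `extract_features` parameter are hypothetical names; only `transform` and `predict` are standard scikit-learn methods.

```python
from typing import Callable, Sequence


def predict_with_components(
    sequence: str,
    classifier,
    scaler,
    extract_features: Callable[[str], Sequence[float]],
):
    """Chain feature extraction, scaling, and classification for one sequence."""
    # scikit-learn expects a 2D input: one row per sample.
    features = [list(extract_features(sequence))]
    # Apply the same scaling that was fitted at training time.
    scaled = scaler.transform(features)
    # predict returns one label per row; take the single result.
    return classifier.predict(scaled)[0]
```

Passing the extractor as an argument keeps this helper independent of any particular feature definition, so the function from inference.py can be supplied without modification.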

📊 Training Meta

  • Generated By: DNA Governance Console (vparka)
  • Framework: scikit-learn