Malware Detection Model (Enhanced Baseline)

🧠 Overview

This repository contains a machine learning-based malware detection model designed to classify files as benign or malicious based on extracted numerical features.

The current version focuses on validating the end-to-end pipeline, including:

Feature preprocessing
Model training
Serialization
Deployment via Hugging Face Hub
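
The first three steps above can be sketched as follows. This is a minimal illustration on simulated data, not the repository's actual training script: the use of StandardScaler, the sample count, the random labels, and the reduced forest size are all assumptions made to keep the example short. Only the bundle keys ("model" and "scaler") and the 50-feature input come from this card. Uploading to the Hub is left out because the repository name is not given here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulated data (assumption): 1,000 samples, 50 features, random 0/1 labels.
X = rng.random((1000, 50))
y = rng.integers(0, 2, size=1000)

# Feature preprocessing. StandardScaler is an assumption; the card only
# says that the bundle contains a "scaler".
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Model training. The forest is smaller than the 300 trees listed in the
# card so that the sketch runs quickly.
model = RandomForestClassifier(n_estimators=50, max_depth=15, random_state=0)
model.fit(X_scaled, y)

# Serialization: store the model and scaler together under the same keys
# that the usage example reads back.
bundle_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump({"model": model, "scaler": scaler}, bundle_path)

# Round trip: reload the bundle and predict on a few rows.
restored = joblib.load(bundle_path)
preds = restored["model"].predict(restored["scaler"].transform(X[:5]))
print(preds)
```

Saving the scaler next to the model keeps the preprocessing applied at inference time identical to the one used during training.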

⚙️ Model Architecture

Algorithm: Random Forest Classifier
Number of trees: 300
Max depth: 15
Input features: 50 engineered features
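
As a rough sketch, the settings above map onto scikit-learn's RandomForestClassifier as shown below. The n_jobs and random_state arguments, the sample count, and the random training data are assumptions added for illustration; the card does not state them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=300,  # number of trees, as listed in the card
    max_depth=15,      # maximum depth of each tree, as listed in the card
    n_jobs=-1,         # assumption: use all CPU cores
    random_state=42,   # assumption: fixed seed for reproducibility
)

# Simulated training data (assumption): 500 samples with 50 features.
rng = np.random.default_rng(42)
X = rng.random((500, 50))
y = rng.integers(0, 2, size=500)
clf.fit(X, y)

# Inspect the fitted forest: tree count, input width, and deepest tree.
n_trees = len(clf.estimators_)
deepest = max(tree.get_depth() for tree in clf.estimators_)
print(n_trees, clf.n_features_in_, deepest)
```

Capping the depth bounds the number of comparisons each tree makes per sample, which keeps inference cost predictable.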

This configuration is chosen with three goals in mind:

Fast inference
Robustness to noisy inputs
Scalability to larger datasets

📊 Features Used

The model operates on numerical features derived from file characteristics, such as:

File entropy (randomness of bytes)
File size
Section count
Import/export table size
Byte distribution statistics
Header metadata patterns
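
To make the first few features concrete, here is a minimal sketch that computes entropy, size, and simple byte distribution statistics from raw bytes. The function names and the exact statistics are hypothetical and are not the repository's feature extractor. Section counts, import/export table sizes, and header metadata would need an executable-format parser and are not covered.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def basic_features(data: bytes) -> dict:
    """Hypothetical helper: a few numeric features derived from raw bytes."""
    counts = Counter(data)
    total = len(data)
    # Relative frequency of each of the 256 possible byte values.
    hist = [counts.get(b, 0) / total if total else 0.0 for b in range(256)]
    return {
        "entropy": byte_entropy(data),
        "size": total,
        "byte_mean": sum(data) / total if total else 0.0,
        "distinct_bytes": len(counts),
        "max_byte_freq": max(hist),
    }


# Every byte value appears equally often, so the entropy is at its maximum.
features = basic_features(bytes(range(256)) * 4)
print(features)
```

Packed or encrypted payloads tend to have entropy close to 8 bits per byte, which is why entropy is a common signal in static malware analysis.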

Note: The current version uses simulated data to validate the architecture. Integration with real-world datasets (e.g., EMBER) is planned.

🧪 Usage

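import joblib
import numpy as np

# Load the serialized bundle, which holds the trained model and the fitted scaler.
bundle = joblib.load("model.pkl")

model = bundle["model"]
scaler = bundle["scaler"]

# Placeholder input: one sample with 50 random feature values (not a real file).
sample = np.random.rand(1, 50)
sample_scaled = scaler.transform(sample)

# A label of 1 means malicious; anything else is reported as benign.
prediction = model.predict(sample_scaled)

print("Malicious" if prediction[0] == 1 else "Benign")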
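
When a score is more useful than a hard label, a wrapper along these lines may help. It is a sketch only: the classify function, its threshold parameter, and the stand-in bundle trained on random data are hypothetical and are not part of this repository. It assumes the bundle layout shown above and that label 1 means malicious.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

N_FEATURES = 50  # input width stated in the card


def classify(bundle, features, threshold=0.5):
    """Hypothetical helper: return a (label, malicious_probability) pair per row."""
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x.reshape(1, -1)  # accept a single flat feature vector
    if x.ndim != 2 or x.shape[1] != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features per sample, got shape {x.shape}")
    model, scaler = bundle["model"], bundle["scaler"]
    scaled = scaler.transform(x)
    # Find the probability column that corresponds to class label 1.
    malicious_col = list(model.classes_).index(1)
    scores = model.predict_proba(scaled)[:, malicious_col]
    return [("Malicious" if s >= threshold else "Benign", float(s)) for s in scores]


# Stand-in bundle trained on random data (assumption), so the sketch runs
# without model.pkl. Replace it with joblib.load("model.pkl") in practice.
rng = np.random.default_rng(1)
X = rng.random((200, N_FEATURES))
y = rng.integers(0, 2, size=200)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=20, max_depth=15, random_state=1)
model.fit(scaler.transform(X), y)
stand_in = {"model": model, "scaler": scaler}

results = classify(stand_in, rng.random((3, N_FEATURES)))
for label, score in results:
    print(f"{label} (p={score:.2f})")
```

Raising the threshold trades missed detections for fewer false alarms. The shape check catches feature vectors of the wrong length before they reach the scaler.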
