Malware Detection Model (Enhanced Baseline)

🧠 Overview

This repository contains a machine learning-based malware detection model designed to classify files as benign or malicious based on extracted numerical features.

The current version focuses on validating the end-to-end pipeline, including:

  • Feature preprocessing
  • Model training
  • Serialization
  • Deployment via Hugging Face Hub
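
The preprocessing, training, and serialization stages listed above could be sketched roughly as follows. This is a minimal sketch, not the repository's actual training script: it assumes scikit-learn's `StandardScaler` as the preprocessing step (the scaler type is not stated here), and `train_and_save` is a hypothetical helper name. The bundle layout with `"model"` and `"scaler"` keys mirrors what the Usage section loads.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def train_and_save(X, y, path="model.pkl", n_estimators=300, max_depth=15):
    """Scale features, fit a random forest, and serialize both to one file.

    Hypothetical helper; StandardScaler is an assumed preprocessing step.
    """
    X = np.asarray(X, dtype=float)

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    model.fit(X_scaled, y)

    # Store the scaler with the model so inference applies the same scaling.
    bundle = {"model": model, "scaler": scaler}
    joblib.dump(bundle, path)
    return bundle
```

Calling `train_and_save(X, y)` with a feature matrix of shape `(n_samples, 50)` writes `model.pkl` to the working directory. The Hub upload step is left out because it needs a repository ID and credentials that are not given here.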

βš™οΈ Model Architecture

  • Algorithm: Random Forest Classifier
  • Number of trees: 300
  • Max depth: 15
  • Input features: 50 engineered features
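
As an illustration, the configuration above maps onto scikit-learn's `RandomForestClassifier` as shown below. This is a sketch under assumptions: the training data is random placeholder data, and `random_state=0` is added only so that repeated runs build the same forest. After fitting, the forest can be inspected to confirm the tree count and the depth limit.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters as listed above; random_state is an added assumption.
model = RandomForestClassifier(n_estimators=300, max_depth=15, random_state=0)

# Placeholder data with the documented input width of 50 features.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.integers(0, 2, size=200)
model.fit(X, y)

# Each fitted tree reports its own depth; none may exceed max_depth.
n_trees = len(model.estimators_)
deepest = max(tree.get_depth() for tree in model.estimators_)
print(f"trees: {n_trees}, deepest tree: {deepest}, features: {model.n_features_in_}")
```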

This configuration is intended to provide:

  • Fast inference
  • Robustness to noisy inputs
  • Scalability for larger datasets

📊 Features Used

The model operates on numerical features derived from file characteristics, such as:

  • File entropy (randomness of bytes)
  • File size
  • Section count
  • Import/export table size
  • Byte distribution statistics
  • Header metadata patterns
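
To make a few of these features concrete, the sketch below computes file size, byte entropy, and one simple byte statistic from raw bytes using only the standard library. `byte_entropy` and `basic_features` are hypothetical helper names, and the feature set shown is an illustrative subset, not the model's actual 50-feature layout. Section counts and import/export table sizes need an executable-format parser and are not covered.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def basic_features(data: bytes) -> dict:
    """Hypothetical helper returning three of the listed feature types."""
    total = len(data)
    mean_byte = sum(data) / total if total else 0.0
    return {
        "file_size": total,
        "entropy": byte_entropy(data),
        "mean_byte_value": mean_byte,
    }
```

A file made of one repeated byte has entropy 0, while uniformly distributed bytes approach 8 bits per byte. Packed or encrypted content tends toward the high end.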

Note: The current version uses simulated data to validate the architecture. Integration with real-world datasets (e.g., EMBER) is planned.


🧪 Usage

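import joblib
import numpy as np

# Load the serialized bundle, which holds the classifier and its fitted scaler.
bundle = joblib.load("model.pkl")

model = bundle["model"]
scaler = bundle["scaler"]

# Placeholder input: one random sample with 50 features.
# Replace it with real extracted features in the order used for training.
sample = np.random.rand(1, 50)
sample_scaled = scaler.transform(sample)

# Label 1 is treated as malicious, anything else as benign.
prediction = model.predict(sample_scaled)

print("Malicious" if prediction[0] == 1 else "Benign")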
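
When a confidence score is more useful than a hard label, the same bundle can be queried through `predict_proba`. The sketch below is an assumption-laden illustration: `classify` is a hypothetical helper, the 0.5 threshold is arbitrary, and it assumes the bundle layout loaded above with class 1 meaning malicious.

```python
import numpy as np


def classify(bundle, features, threshold=0.5):
    """Return (label, malicious_probability) for one 50-feature sample.

    Hypothetical helper; assumes class 1 means malicious.
    """
    features = np.asarray(features, dtype=float).reshape(1, -1)
    if features.shape[1] != 50:
        raise ValueError(f"expected 50 features, got {features.shape[1]}")

    scaled = bundle["scaler"].transform(features)
    model = bundle["model"]
    # Look up the probability column for class 1 instead of assuming its position.
    malicious_col = list(model.classes_).index(1)
    prob = float(model.predict_proba(scaled)[0, malicious_col])
    return ("Malicious" if prob >= threshold else "Benign"), prob
```

For example, `label, prob = classify(bundle, sample)` returns the label together with the fraction-based probability the forest assigns to the malicious class. The shape check rejects inputs that do not have exactly 50 features before they reach the scaler.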