metadata
language: en
license: mit
library_name: scikit-learn
tags:
  - malware-detection
  - cybersecurity
  - machine-learning
  - random-forest
  - classification
pipeline_tag: tabular-classification

Malware Detection Model (Enhanced Baseline)

🧠 Overview

This repository contains a machine learning-based malware detection model designed to classify files as benign or malicious based on extracted numerical features.

The current version focuses on validating the end-to-end pipeline, including:

  • Feature preprocessing
  • Model training
  • Serialization
  • Deployment via Hugging Face Hub
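
The first three stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual training script: the simulated data generator, the choice of `StandardScaler`, and the helper name `train_and_save` are hypothetical. Only the bundle keys (`"model"`, `"scaler"`), the file name `model.pkl`, and the hyperparameters are taken from this README. The upload to the Hugging Face Hub is left out because it needs credentials.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def train_and_save(path, n_samples=400, n_features=50, seed=0):
    """Train on simulated data and save a {"model", "scaler"} bundle (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Simulated data (assumption): random features, labels driven by the first two columns.
    X = rng.random((n_samples, n_features))
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

    # Feature preprocessing: StandardScaler is an assumed choice.
    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)

    # Model training with the hyperparameters listed in this README.
    model = RandomForestClassifier(n_estimators=300, max_depth=15, random_state=seed)
    model.fit(X_scaled, y)

    # Serialization: one file holding both objects, matching the keys used in Usage.
    joblib.dump({"model": model, "scaler": scaler}, path)
    return X, y


with tempfile.TemporaryDirectory() as tmp:
    bundle_path = os.path.join(tmp, "model.pkl")
    X_train, y_train = train_and_save(bundle_path)

    # Reload the bundle the same way a consumer would and score the training data.
    bundle = joblib.load(bundle_path)
    reloaded_predictions = bundle["model"].predict(bundle["scaler"].transform(X_train))
    train_accuracy = float((reloaded_predictions == y_train).mean())

print(f"training accuracy after reload: {train_accuracy:.3f}")
```

Saving the scaler next to the model keeps inference consistent with training, since the classifier only ever saw scaled inputs.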

⚙️ Model Architecture

  • Algorithm: Random Forest Classifier
  • Number of trees: 300
  • Max depth: 15
  • Input features: 50 engineered features
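
As a rough sketch of how this configuration maps onto scikit-learn, the snippet below builds a `RandomForestClassifier` with the listed settings and inspects the fitted forest. The training data is a simulated stand-in, and `random_state` is an assumption added for reproducibility; neither is confirmed by the repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_FEATURES = 50  # from this README

rng = np.random.default_rng(42)
# Simulated stand-in data (assumption): labels depend on features 0 and 1 only.
X = rng.random((500, N_FEATURES))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

model = RandomForestClassifier(
    n_estimators=300,  # number of trees
    max_depth=15,      # cap on the depth of each tree
    random_state=42,   # assumption: fixed seed for reproducibility
)
model.fit(X, y)

# Inspect the fitted forest.
n_trees = len(model.estimators_)
deepest_tree = max(tree.get_depth() for tree in model.estimators_)
# Indices of the five features with the highest impurity-based importance.
top_features = np.argsort(model.feature_importances_)[::-1][:5]

print(f"trees={n_trees}, deepest tree={deepest_tree}, top features={top_features.tolist()}")
```

`max_depth` is an upper bound, so individual trees may stop earlier once their leaves are pure.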

This configuration is intended to provide:

  • Fast inference
  • Robustness to noisy inputs
  • Scalability for larger datasets

📊 Features Used

The model operates on numerical features derived from file characteristics, such as:

  • File entropy (randomness of bytes)
  • File size
  • Section count
  • Import/export table size
  • Byte distribution statistics
  • Header metadata patterns
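
To make the first few feature types concrete, here is a small standard-library sketch that computes byte entropy, file size, and simple byte-distribution statistics from raw bytes. The function names and the returned keys are hypothetical and are not the 50 features the model expects. Section counts and import/export table sizes require an executable-format parser and are not shown.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1 / p) over the byte values that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def basic_features(data: bytes) -> dict:
    """Return a few illustrative features (hypothetical names, not the model's 50)."""
    counts = Counter(data)
    total = len(data)
    return {
        "file_size": total,
        "entropy": byte_entropy(data),
        "distinct_bytes": len(counts),
        "zero_byte_ratio": counts.get(0, 0) / total if total else 0.0,
    }


uniform = bytes(range(256)) * 4   # every byte value occurs equally often
constant = b"\x00" * 1024         # a single repeated byte value

print(basic_features(uniform))
print(basic_features(constant))
```

Packed or encrypted payloads tend toward the high end of the entropy range, which is why entropy is a common signal in malware feature sets.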

Note: The current version is trained on simulated data to validate the architecture. Integration with real-world datasets (e.g., EMBER) is planned.


🧪 Usage

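import joblib
import numpy as np

# Load the serialized bundle, which holds the classifier and its fitted scaler.
bundle = joblib.load("model.pkl")

model = bundle["model"]
scaler = bundle["scaler"]

# Placeholder input: one random sample with 50 features.
# Replace this with a real feature vector in the order used during training.
sample = np.random.rand(1, 50)
sample_scaled = scaler.transform(sample)

# predict returns one label per row; label 1 is treated as malicious.
prediction = model.predict(sample_scaled)

print("Malicious" if prediction[0] == 1 else "Benign")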
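
For batch scoring, a thin wrapper that validates the input shape and returns a probability alongside each label may be useful. The sketch below is one possible approach, not part of the published model: the helper name `classify` and the 0.5 threshold are assumptions, and the stand-in bundle is trained on simulated data only so the example runs without `model.pkl`. It assumes, as in the snippet above, that class 1 means malicious.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

N_FEATURES = 50  # from this README


def classify(bundle, features, threshold=0.5):
    """Validate a feature batch and return (labels, malicious probabilities)."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X.reshape(1, -1)  # treat a flat vector as a single sample
    if X.ndim != 2 or X.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (n, {N_FEATURES}), got {X.shape}")
    if not np.isfinite(X).all():
        raise ValueError("features contain NaN or infinite values")

    model = bundle["model"]
    scaled = bundle["scaler"].transform(X)
    # Column of predict_proba that corresponds to class 1 (malicious).
    malicious_col = list(model.classes_).index(1)
    proba = model.predict_proba(scaled)[:, malicious_col]
    labels = ["Malicious" if p >= threshold else "Benign" for p in proba]
    return labels, proba


# Stand-in bundle (assumption) trained on simulated data.
rng = np.random.default_rng(0)
X_sim = rng.random((200, N_FEATURES))
y_sim = (X_sim[:, 0] > 0.5).astype(int)
scaler = StandardScaler().fit(X_sim)
model = RandomForestClassifier(n_estimators=50, max_depth=15, random_state=0)
model.fit(scaler.transform(X_sim), y_sim)
demo_bundle = {"model": model, "scaler": scaler}

labels, proba = classify(demo_bundle, rng.random((3, N_FEATURES)))
for label, p in zip(labels, proba):
    print(f"{label} (p_malicious={p:.2f})")
```

Passing `joblib.load("model.pkl")` in place of `demo_bundle` would apply the same checks to the published bundle.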