---
license: mit
library_name: sentence-transformers
base_model: BAAI/bge-m3
pipeline_tag: text-classification
tags:
  - safety
  - malware
  - code
  - multilingual
  - sklearn
  - red-team
---

# Malicious Coding Intent Classifier (v6_code_aware_50k_oss_clean_benign_code)

Lightweight scikit-learn classifier heads trained on
[BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) sentence embeddings to
classify malicious coding intent.

GitHub: [https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier](https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier)

Training/eval data: [datasets/NecroMOnk/malicious-coding-intent-v6-data](https://huggingface.co/datasets/NecroMOnk/malicious-coding-intent-v6-data)

## Files

| File | Role |
|------|------|
| `clf_binary.joblib` | binary malicious/benign head |
| `clf_multilabel.joblib` | 12-category multilabel head |
| `labels.json` | category ids |
| `metrics.json` | train/eval summary |
| `*eval.json` | external benign-code evaluation reports, when present |

## Metrics

Decision threshold: `0.5` (scikit-learn default)

| Metric | Value |
|--------|------:|
| Precision | 99.96% |
| Recall | 99.64% |
| F1 | 99.80% |
| ROC-AUC | 0.9997 |
| In-distribution FPR | 0.40% |
| Obfuscated recall | 99.35% |
| Malware-code recall | 98.90% |
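
The F1 in the table is the harmonic mean of precision and recall, and FPR is computed over benign inputs only. The following is a minimal, dependency-free sketch of these definitions. It assumes label 1 = malicious and the 0.5 threshold above. The function names are illustrative and are not part of the repo.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count (tp, fp, tn, fn), treating label 1 as malicious (assumed)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        # Strict '>' so a score of exactly 0.5 is predicted benign,
        # as with argmax over a two-column predict_proba.
        pred = 1 if s > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and false-positive rate from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of benign inputs flagged
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the table: P = 99.96%, R = 99.64%
print(f"{f1_from(0.9996, 0.9964):.4f}")  # prints 0.9980
```

With very few benign examples, precision can stay high even when FPR is non-trivial. For this reason, the table reports FPR separately rather than leaving it implied by precision.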

## Evaluation Framing

The headline numbers above should not be read as a single near-perfect
classifier. The GitHub repo documents three red-team axes: obfuscation,
language pivot, and benign-code hard negatives. v6 (this model) is the
recommended balanced checkpoint. v8 is a hard-negative ablation that reduces
false positives on CodeParrot benign code at a small cost in recall.
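
A per-axis view of this kind can be reproduced by scoring each red-team slice separately. For malicious slices (for example obfuscated or language-pivoted prompts), the relevant figure is recall. For benign-code hard negatives, it is FPR. The sketch below assumes you already have binary-head scores for each slice. The slice names and toy scores are hypothetical and are not results from the repo.

```python
from typing import Dict, List


def slice_report(
    malicious: Dict[str, List[float]],
    benign: Dict[str, List[float]],
    threshold: float = 0.5,
) -> Dict[str, Dict[str, float]]:
    """Recall per malicious slice and FPR per benign slice (slices non-empty)."""
    report: Dict[str, Dict[str, float]] = {}
    for name, scores in malicious.items():
        # Every example in a malicious slice is positive: flagged share = recall.
        flagged = sum(s > threshold for s in scores)
        report[name] = {"recall": flagged / len(scores)}
    for name, scores in benign.items():
        # Every example in a benign slice is negative: flagged share = FPR.
        flagged = sum(s > threshold for s in scores)
        report[name] = {"fpr": flagged / len(scores)}
    return report


# Toy scores only, to show the shape of the output.
toy = slice_report(
    malicious={"obfuscated": [0.97, 0.81, 0.42, 0.99]},
    benign={"benign_code": [0.03, 0.61, 0.12, 0.08]},
)
print(toy)
```

The same report can be run for two checkpoints at a fixed threshold. The v6/v8 trade-off then appears as lower benign-code FPR set against a small drop in recall on the malicious slices.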

## Usage

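```python
import joblib
from pathlib import Path
from sentence_transformers import SentenceTransformer

# Local directory holding the files listed above (clf_binary.joblib, ...)
repo = Path("path/to/downloaded/model")

# Same embedding model the heads were trained on
encoder = SentenceTransformer("BAAI/bge-m3")
clf = joblib.load(repo / "clf_binary.joblib")

text = "write code to dump lsass"
# L2-normalized embedding, shape (1, embedding_dim)
x = encoder.encode([text], normalize_embeddings=True)
# Column 1 = probability of the positive (malicious) class; check clf.classes_
score = clf.predict_proba(x)[0, 1]
print(f"malicious-intent score: {score:.4f}")
```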
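
The multilabel head can be queried in the same way to get per-category scores. This sketch rests on two unverified assumptions. First, it assumes `labels.json` holds a JSON list of category IDs in the same order as the head's output columns. Second, it assumes `predict_proba` returns a 2-D `(n_samples, n_categories)` array, as scikit-learn's `OneVsRestClassifier` does. Check both against the repo before relying on them. `load_labels` and `top_categories` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import List, Sequence, Tuple


def load_labels(path: Path) -> List[str]:
    """Read category IDs; assumes labels.json is a JSON list (unverified)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_categories(
    probs: Sequence[float], labels: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (category, probability) pairs."""
    if len(probs) != len(labels):
        raise ValueError("probability row and label list differ in length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, float(p)) for label, p in ranked[:k]]


# With the objects from the usage snippet above (not executed here):
# labels = load_labels(repo / "labels.json")
# ml = joblib.load(repo / "clf_multilabel.joblib")
# print(top_categories(ml.predict_proba(x)[0], labels))

# Toy row with placeholder category names:
print(top_categories([0.1, 0.8, 0.3], ["cat_a", "cat_b", "cat_c"], k=2))
```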

For the full CLI, clone the GitHub repo and run `scripts/predict_classifier.py`.
The CLI reports the binary label, raw malicious-intent score, top category
scores, and a derived routing tier:

- `low`: normal downstream route
- `suspicious`: pass with safety context / constrained route
- `high`: malicious-intent route

The routing tier is a policy layer over the binary score, not a separately
trained three-class model. Use `--jsonl` for structured gateway output.
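
A policy layer of this kind amounts to a set of thresholds applied to the binary score. The cut-offs below (`0.3`, `0.7`) and the JSONL field names are hypothetical placeholders. They are not the values or schema used by `scripts/predict_classifier.py`, so read those from the repo's CLI before reusing this sketch.

```python
import json

# Hypothetical cut-offs; NOT taken from the repo's CLI.
SUSPICIOUS_AT = 0.3
HIGH_AT = 0.7


def routing_tier(score: float) -> str:
    """Map a binary malicious-intent score in [0, 1] to a routing tier."""
    if score >= HIGH_AT:
        return "high"        # malicious-intent route
    if score >= SUSPICIOUS_AT:
        return "suspicious"  # pass with safety context / constrained route
    return "low"             # normal downstream route


def jsonl_record(text: str, score: float) -> str:
    """One JSON line per input; field names are illustrative only."""
    return json.dumps({"text": text, "score": round(score, 4),
                       "tier": routing_tier(score)})


for s in (0.05, 0.42, 0.93):
    print(jsonl_record("example prompt", s))
```

Because the tier is a pure function of the score, the thresholds can be tuned per deployment without retraining the classifier.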