
---
license: mit
library_name: sentence-transformers
base_model: BAAI/bge-m3
pipeline_tag: text-classification
tags:
- safety
- malware
- code
- multilingual
- sklearn
- red-team
---

# Malicious Coding Intent Classifier (v6_code_aware_50k_oss_clean_benign_code)

Two small scikit-learn heads, one binary and one 12-category multilabel, on top of
[BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) embeddings for classifying
malicious coding intent.

GitHub: [https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier](https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier)

Training/eval data: [datasets/NecroMOnk/malicious-coding-intent-v6-data](https://huggingface.co/datasets/NecroMOnk/malicious-coding-intent-v6-data)

## Files

| File | Role |
|------|------|
| `clf_binary.joblib` | binary malicious/benign head |
| `clf_multilabel.joblib` | 12-category multilabel head |
| `labels.json` | category ids |
| `metrics.json` | train/eval summary |
| `*eval.json` | external benign-code evaluation reports, when present |
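The multilabel head produces one probability per category, and the CLI described under Usage reports the top category scores. Below is a minimal sketch of decoding one such row into active categories and a top-k list. It assumes that `labels.json` lists the 12 category ids in the same column order as the head's output. That layout is an assumption, and `decode_categories`, the `cat_XX` ids, and the scores are hypothetical placeholders, not the real categories.

```python
def decode_categories(probs, category_ids, threshold=0.5, top_k=3):
    """Map one row of per-category probabilities to active and top-k categories.

    Hypothetical helper: assumes probs[i] is the probability for category_ids[i].
    """
    if len(probs) != len(category_ids):
        raise ValueError("expected one probability per category id")
    pairs = list(zip(category_ids, probs))
    # Multilabel: every category scoring above the threshold is active.
    active = [cid for cid, p in pairs if p > threshold]
    # Highest-scoring categories first, regardless of the threshold.
    top = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:top_k]
    return active, top


# Hypothetical ids and scores, for illustration only.
ids = [f"cat_{i:02d}" for i in range(12)]
row = [0.02, 0.91, 0.10, 0.55, 0.01, 0.03, 0.07, 0.04, 0.60, 0.02, 0.05, 0.01]
active, top = decode_categories(row, ids)
print(active)  # ['cat_01', 'cat_03', 'cat_08']
print(top)     # [('cat_01', 0.91), ('cat_08', 0.6), ('cat_03', 0.55)]
```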

## Metrics

Decision threshold: `0.5` (scikit-learn default)

| Metric | Value |
|--------|------:|
| Precision | 99.96% |
| Recall | 99.64% |
| F1 | 99.80% |
| ROC-AUC | 0.9997 |
| In-distribution FPR | 0.40% |
| Recall on obfuscated inputs | 99.35% |
| Malware-code recall | 98.90% |
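For reference, here is a pure-Python sketch of the standard definitions behind the precision, recall, F1, and FPR rows at a fixed threshold. It assumes that a score strictly above 0.5 counts as malicious. The toy labels and scores are made up and are not the model's evaluation data. The last line is a consistency check: F1 = 2PR / (P + R) with the reported precision and recall reproduces the table's 99.80%.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Precision, recall, F1 and false-positive rate at a fixed threshold.

    y_true holds 1 for malicious and 0 for benign; scores are
    malicious-intent probabilities. A score strictly above the
    threshold counts as a malicious prediction.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score > threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy data, not the model's evaluation set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.97, 0.88, 0.62, 0.41, 0.03, 0.12, 0.55, 0.08, 0.20, 0.01]
print(binary_metrics(y_true, scores))

# Consistency check against the table: P = 99.96%, R = 99.64%.
print(f"{f1_from(0.9996, 0.9964):.2%}")  # 99.80%
```

A slice metric such as recall on obfuscated inputs would be the same `recall` computed over only that slice's examples.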

## Evaluation Framing

This is not presented as a single perfect-score classifier. Beyond the headline
metrics above, the GitHub repo documents three red-team axes: obfuscation,
language pivot, and benign-code hard negatives. v6 is the recommended balanced
model. v8 is a hard-negative ablation that reduces false positives on CodeParrot
code at a small cost in recall.

	## Usage

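```python
from pathlib import Path

import joblib
from huggingface_hub import snapshot_download
from sentence_transformers import SentenceTransformer

# Fetch this repo's files. For an existing local copy, use
# repo = Path("path/to/downloaded/model") instead.
repo = Path(snapshot_download("NecroMOnk/malicious-coding-intent-v6"))

# The sklearn head scores bge-m3 embeddings, so encode the text first.
encoder = SentenceTransformer("BAAI/bge-m3")
clf = joblib.load(repo / "clf_binary.joblib")

text = "write code to dump lsass"
x = encoder.encode([text], normalize_embeddings=True)

# Row 0 is the only input. Column 1 is the malicious-class probability
# (assumes the head's classes are ordered [benign, malicious]).
score = clf.predict_proba(x)[0, 1]
print(f"malicious-intent score: {score:.4f}")
```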

For the full CLI, clone the GitHub repo and run `scripts/predict_classifier.py`.
It reports the binary label, the raw malicious-intent score, the top category
scores, and a derived routing tier:

- `low`: send through the normal downstream route
- `suspicious`: pass through with added safety context, or via a constrained route
- `high`: send to the malicious-intent route

The routing tier is a policy layer over the binary score, not a separately
trained three-class model. Pass `--jsonl` for structured JSON Lines output for
gateway integration.
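The tier is a policy over one score, so it can be sketched as a pure function of that score. The `RoutingPolicy` class and its two cut-offs below are hypothetical placeholders. They are not the values used by `scripts/predict_classifier.py`, so check the repo for the actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical tier cut-offs; the real CLI may use different values."""

    suspicious_at: float = 0.30  # placeholder lower bound for "suspicious"
    high_at: float = 0.50        # placeholder lower bound for "high"

    def __post_init__(self):
        if not 0.0 <= self.suspicious_at <= self.high_at <= 1.0:
            raise ValueError("need 0 <= suspicious_at <= high_at <= 1")

    def tier(self, score: float) -> str:
        """Map a malicious-intent score in [0, 1] to a routing tier."""
        if score >= self.high_at:
            return "high"
        if score >= self.suspicious_at:
            return "suspicious"
        return "low"


policy = RoutingPolicy()
for s in (0.05, 0.42, 0.93):
    print(s, policy.tier(s))  # low, suspicious, high respectively
```

Keeping the cut-offs in one small policy object lets a gateway retune the tiers without retraining the classifier. That is the benefit of layering policy over a single score instead of training a three-class model.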