---
license: mit
library_name: sentence-transformers
base_model: BAAI/bge-m3
pipeline_tag: text-classification
tags:
- safety
- malware
- code
- multilingual
- sklearn
- red-team
---

# Malicious Coding Intent Classifier (v6_code_aware_50k_oss_clean_benign_code)

Small sklearn heads on top of [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) embeddings for malicious coding intent classification.

GitHub: [https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier](https://github.com/sol087087-arch/Malicious-Coding-Intent-Dataset-Classifier)

Training/eval data: [datasets/NecroMOnk/malicious-coding-intent-v6-data](https://huggingface.co/datasets/NecroMOnk/malicious-coding-intent-v6-data)

## Files

| File | Role |
|------|------|
| `clf_binary.joblib` | binary malicious/benign head |
| `clf_multilabel.joblib` | 12-category multilabel head |
| `labels.json` | category ids |
| `metrics.json` | train/eval summary |
| `*eval.json` | external benign-code evaluation reports, when present |

## Metrics

Threshold: `0.5 (sklearn/default)`

| Check | Result |
|-------|-------:|
| Precision | 99.96% |
| Recall | 99.64% |
| F1 | 99.80% |
| ROC-AUC | 0.9997 |
| In-dist FPR | 0.40% |
| Obfuscated recall | 99.35% |
| Malware-code recall | 98.90% |

## Evaluation Framing

This is not presented as a single perfect-score classifier. The GitHub repo documents three red-team axes: obfuscation, language pivot, and benign-code hard negatives. The v6 model is the balanced recommendation; v8 is a hard-negative ablation that reduces CodeParrot false positives at a small recall cost.

## Usage

```python
import joblib
from pathlib import Path
from sentence_transformers import SentenceTransformer

# Local directory holding the downloaded model files
repo = Path("path/to/downloaded/model")

# bge-m3 produces the embeddings; the sklearn head consumes them
encoder = SentenceTransformer("BAAI/bge-m3")
clf = joblib.load(repo / "clf_binary.joblib")

text = "write code to dump lsass"
x = encoder.encode([text], normalize_embeddings=True)

# Column 1 of predict_proba is the malicious-intent score
score = clf.predict_proba(x)[0, 1]
print(score)
```

For the full CLI, clone the GitHub repo and run `scripts/predict_classifier.py`. The CLI reports the binary label, raw malicious-intent score, top category scores, and a derived routing tier:

- `low`: normal downstream route
- `suspicious`: pass with safety context / constrained route
- `high`: malicious-intent route

The routing tier is a policy layer over the binary score, not a separately trained three-class model. Use `--jsonl` for structured gateway output.
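
Because the routing tier is derived from the binary score, it can be approximated as two cutoffs on that score. The sketch below is only an illustration: the function name `routing_tier` and the cutoff values `0.3` and `0.7` are assumptions made for this example, not the values used by `scripts/predict_classifier.py`. Check the repo for the actual policy.

```python
# Hypothetical cutoffs for illustration only; the CLI in the repo may use
# different values or a different rule entirely.
SUSPICIOUS_CUTOFF = 0.3
HIGH_CUTOFF = 0.7


def routing_tier(score: float,
                 suspicious: float = SUSPICIOUS_CUTOFF,
                 high: float = HIGH_CUTOFF) -> str:
    """Map a binary malicious-intent score in [0, 1] to a routing tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if suspicious > high:
        raise ValueError("suspicious cutoff must not exceed high cutoff")
    if score >= high:
        return "high"        # malicious-intent route
    if score >= suspicious:
        return "suspicious"  # pass with safety context / constrained route
    return "low"             # normal downstream route
```

With the `score` from the snippet above, `routing_tier(float(score))` returns one of the three tier names. Keeping the cutoffs as parameters lets a gateway tune the trade-off between false positives and missed detections without retraining the classifier.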