opena2a/nanomind-security-classifier

ecolibria committed 17 days ago

Commit addca6a (verified) · 1 parent: ca982aa

Upload README.md with huggingface_hub

Files changed (1): README.md (+8 -8)

@@ -26,7 +26,7 @@ model-index:
         metrics:
           - name: Eval Accuracy
             type: accuracy
-            value: 0.9822
+            value: 0.9844
 ---
 # NanoMind Security Classifier v0.5.0
@@ -45,7 +45,7 @@ AI agents and MCP servers can contain hidden malicious instructions that static
 | Metric | Value |
 |--------|-------|
-| **Eval accuracy** | **98.22%** |
+| **Eval accuracy** | **98.44%** |
 | Training samples | 3600 |
 | Eval samples | 450 |
 | Attack classes | 9 |
@@ -58,15 +58,15 @@ AI agents and MCP servers can contain hidden malicious instructions that static
 | Attack Class | F1 Score | Description |
 |-------------|----------|-------------|
-| injection | 0.98 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
+| injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
 | social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
-| credential_abuse | 0.98 | Credential harvesting and phishing (share API key, enter password) |
-| privilege_escalation | 0.99 | Unauthorized access elevation (admin access, bypass permissions) |
-| persistence | 0.98 | Permanent state manipulation (forever, no expiration, all future sessions) |
-| policy_violation | 0.96 | Governance bypass (bypass SOUL.md, override constraints) |
+| credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
+| privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
+| persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
+| policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
 | lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
 | benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
-| exfiltration | 0.97 | Data forwarding to external endpoints (mirror, upload, sync) |
+| exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |
 ## Architecture
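
The numbers changed by this commit can be sanity-checked with a little arithmetic. The sketch below is not part of the repository. It assumes the reported accuracy is simply correct predictions divided by the 450 eval samples, and it averages the per-class F1 scores without weights because the diff gives no per-class sample counts. The names `f1_old`, `f1_new`, and `macro_f1` are made up for illustration, and the F1 inputs are already rounded to two decimals, so the averages are approximate.

```python
# Values copied from the diff; variable and function names are hypothetical.
EVAL_SAMPLES = 450
old_acc, new_acc = 0.9822, 0.9844

# Implied count of correctly classified eval samples (assumes accuracy = correct / total).
old_correct = round(old_acc * EVAL_SAMPLES)
new_correct = round(new_acc * EVAL_SAMPLES)

f1_old = {
    "injection": 0.98, "social_engineering": 0.99, "credential_abuse": 0.98,
    "privilege_escalation": 0.99, "persistence": 0.98, "policy_violation": 0.96,
    "lateral_movement": 1.00, "benign": 0.97, "exfiltration": 0.97,
}
f1_new = {
    "injection": 0.97, "social_engineering": 0.99, "credential_abuse": 0.99,
    "privilege_escalation": 1.00, "persistence": 0.99, "policy_violation": 0.97,
    "lateral_movement": 1.00, "benign": 0.97, "exfiltration": 0.98,
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 (per-class support is not given in the diff)."""
    return sum(scores.values()) / len(scores)


# Per-class change, keeping only the classes whose F1 moved.
delta = {name: round(f1_new[name] - f1_old[name], 2) for name in f1_old}
changed = {name: d for name, d in delta.items() if d != 0}

print("implied correct:", old_correct, "->", new_correct)
print("macro F1:", round(macro_f1(f1_old), 4), "->", round(macro_f1(f1_new), 4))
print("changed classes:", changed)
```

Under these assumptions the implied counts are 442 and 443, so the accuracy change corresponds to one additional correct prediction out of 450. Five classes gain 0.01 F1 and `injection` loses 0.01.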