Upload README.md with huggingface_hub
README.md CHANGED
@@ -26,7 +26,7 @@ model-index:
 metrics:
 - name: Eval Accuracy
   type: accuracy
-  value: 0.
+  value: 0.9844
 ---
 
 # NanoMind Security Classifier v0.5.0
@@ -45,7 +45,7 @@ AI agents and MCP servers can contain hidden malicious instructions that static
 
 | Metric | Value |
 |--------|-------|
-| **Eval accuracy** | **98.
+| **Eval accuracy** | **98.44%** |
 | Training samples | 3600 |
 | Eval samples | 450 |
 | Attack classes | 9 |
@@ -58,15 +58,15 @@ AI agents and MCP servers can contain hidden malicious instructions that static
 
 | Attack Class | F1 Score | Description |
 |-------------|----------|-------------|
-| injection | 0.
+| injection | 0.97 | Instruction override, jailbreak, prompt injection (DAN, ignore previous) |
 | social_engineering | 0.99 | Urgency and pressure manipulation (urgent, emergency, act now) |
-| credential_abuse | 0.
-| privilege_escalation |
-| persistence | 0.
-| policy_violation | 0.
+| credential_abuse | 0.99 | Credential harvesting and phishing (share API key, enter password) |
+| privilege_escalation | 1.00 | Unauthorized access elevation (admin access, bypass permissions) |
+| persistence | 0.99 | Permanent state manipulation (forever, no expiration, all future sessions) |
+| policy_violation | 0.97 | Governance bypass (bypass SOUL.md, override constraints) |
 | lateral_movement | 1.00 | Remote config/instruction fetching (download from URL, fetch config) |
 | benign | 0.97 | Normal, expected agent behavior with no exploitable patterns |
-| exfiltration | 0.
+| exfiltration | 0.98 | Data forwarding to external endpoints (mirror, upload, sync) |
 
 ## Architecture
 
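The updated figures can be cross-checked against each other. The sketch below is not part of the commit; it takes the new values from the README tables and asks whether the reported accuracy is consistent with 450 eval samples, then computes the unweighted mean of the per-class F1 scores. The helper names (`implied_correct`, `macro_f1`) are hypothetical, the count of correct predictions is inferred by rounding and is not stated in the README, and the macro average is derived from two-decimal rounded scores, so treat it as approximate.

```python
# Values copied from the updated README tables in this commit.
EVAL_SAMPLES = 450
REPORTED_ACCURACY = 0.9844

F1_BY_CLASS = {
    "injection": 0.97,
    "social_engineering": 0.99,
    "credential_abuse": 0.99,
    "privilege_escalation": 1.00,
    "persistence": 0.99,
    "policy_violation": 0.97,
    "lateral_movement": 1.00,
    "benign": 0.97,
    "exfiltration": 0.98,
}


def implied_correct(accuracy: float, n_samples: int) -> int:
    """Nearest whole number of correct predictions implied by an accuracy."""
    return round(accuracy * n_samples)


def macro_f1(scores: dict[str, float]) -> float:
    """Unweighted mean of per-class F1 scores (assumes equal class weight)."""
    return sum(scores.values()) / len(scores)


correct = implied_correct(REPORTED_ACCURACY, EVAL_SAMPLES)
recomputed_accuracy = correct / EVAL_SAMPLES
macro = macro_f1(F1_BY_CLASS)

print(f"implied correct predictions: {correct} of {EVAL_SAMPLES}")
print(f"accuracy recomputed from that count: {recomputed_accuracy:.4f}")
print(f"macro-averaged F1 over {len(F1_BY_CLASS)} classes: {macro:.4f}")
```

If the recomputed accuracy matches the reported value to four decimals, the `value: 0.9844` metadata field and the `98.44%` table entry agree with the stated eval set size.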