Update model card: v11 metrics + Further Reading section with blog post link
README.md
CHANGED
@@ -29,19 +29,19 @@ model-index:
       split: held_out_eval
     metrics:
     - type: f1
-      value: 0.
+      value: 0.926
       name: Macro F1
     - type: f1
-      value: 0.
+      value: 0.9007
       name: Injection F1
     - type: f1
-      value: 0.
+      value: 0.9513
       name: Benign F1
     - type: precision
-      value: 0.
+      value: 0.8608
       name: Injection Precision
     - type: recall
-      value: 0.
+      value: 0.9444
       name: Injection Recall
 ---
 
@@ -221,10 +221,25 @@ If you use this model in research, please cite:
 
 ---
 
-## Related Resources
-
+### Related Resources
 - [SkillScan project website](https://skillscan.sh)
 - [skillscan-security (rules, scanner, CLI)](https://github.com/kurtpayne/skillscan-security)
 - [Base model: protectai/deberta-v3-base-prompt-injection-v2](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2)
 - [ProtectAI/rebuff — prompt injection detection research](https://github.com/protectai/rebuff)
 - [OWASP Top 10 for LLM Applications — LLM01: Prompt Injection](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
+
+---
+
+## Further Reading
+
+**[What Are AI Agent Skills, and Why Do They Need a Security Model?](https://skillscan.sh/blog/skills-security-model)**
+
+A technical explainer for security engineers and enterprise architects covering:
+
+- What skills are — runbooks for agentic consumption, not traditional code, but often shipping with code — and why the distinction matters for security
+- Five real attack archetypes with sanitized examples: README-driven dropper (AMOS/NemoClaw pattern), telemetry exfiltration disguised as analytics, indirect injection via trusted data channels, hallucination squatting, and goal substitution via jailbreak framing
+- How static analysis catches each archetype before runtime, with the actual rule or ML finding shown for each example
+- Where this model fits in the broader security stack: what it covers, what requires dynamic analysis (`skillscan-trace`), and what requires infrastructure controls (egress filtering, DNS-layer blocking)
+- Recommended enterprise posture: CI/CD gate setup, ML detection for high-risk skill directories, pre-production trace review, and infrastructure backstop
+
+The blog post uses the same five archetypes represented in this model's held-out eval set, making it a useful companion for understanding what the model is trained to detect and where its boundaries are.
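As a quick consistency check on the v11 numbers in this diff: F1 is by definition the harmonic mean of precision and recall, so the card's Injection F1 should be derivable from its Injection Precision and Injection Recall, and (assuming the card's Macro F1 is the unweighted mean of the two per-class F1 scores, which the numbers support) the Macro F1 should follow from the per-class values. A minimal sketch:

```python
# Metric values as reported in the v11 model card diff above.
precision = 0.8608  # Injection Precision
recall = 0.9444     # Injection Recall
benign_f1 = 0.9513  # Benign F1

# F1 = harmonic mean of precision and recall.
injection_f1 = 2 * precision * recall / (precision + recall)
print(round(injection_f1, 4))  # 0.9007 — matches the card's Injection F1

# Macro F1 as the unweighted mean of the two per-class F1 scores
# (an assumption about how the card computes it, but it reproduces 0.926).
macro_f1 = (injection_f1 + benign_f1) / 2
print(round(macro_f1, 3))  # 0.926 — matches the card's Macro F1
```

All three reported aggregates are mutually consistent, which is a useful sanity check when transcribing metrics into `model-index` metadata by hand.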