kurtpayne committed (verified)
Commit b00fee7 · 1 Parent(s): 3596652

Update model card: v11 metrics + Further Reading section with blog post link

Files changed (1): README.md (+22 −7)
README.md CHANGED
@@ -29,19 +29,19 @@ model-index:
         split: held_out_eval
       metrics:
       - type: f1
-        value: 0.8448
+        value: 0.926
         name: Macro F1
       - type: f1
-        value: 0.7857
+        value: 0.9007
         name: Injection F1
       - type: f1
-        value: 0.9040
+        value: 0.9513
         name: Benign F1
       - type: precision
-        value: 0.9362
+        value: 0.8608
         name: Injection Precision
       - type: recall
-        value: 0.6769
+        value: 0.9444
         name: Injection Recall
 ---
 
@@ -221,10 +221,25 @@ If you use this model in research, please cite:
 
 ---
 
-## Related Resources
-
+### Related Resources
 - [SkillScan project website](https://skillscan.sh)
 - [skillscan-security (rules, scanner, CLI)](https://github.com/kurtpayne/skillscan-security)
 - [Base model: protectai/deberta-v3-base-prompt-injection-v2](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2)
 - [ProtectAI/rebuff — prompt injection detection research](https://github.com/protectai/rebuff)
 - [OWASP Top 10 for LLM Applications — LLM01: Prompt Injection](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
+
+---
+
+## Further Reading
+
+**[What Are AI Agent Skills, and Why Do They Need a Security Model?](https://skillscan.sh/blog/skills-security-model)**
+
+A technical explainer for security engineers and enterprise architects covering:
+
+- What skills are — runbooks for agentic consumption, not traditional code, but often shipping with code; why the distinction matters for security
+- Five real attack archetypes with sanitized examples: README-driven dropper (AMOS/NemoClaw pattern), telemetry exfiltration disguised as analytics, indirect injection via trusted data channels, hallucination squatting, and goal substitution via jailbreak framing
+- How static analysis catches each archetype before runtime — with the actual rule or ML finding shown for each example
+- Where this model fits in the broader security stack: what it covers, what requires dynamic analysis (`skillscan-trace`), and what requires infrastructure controls (egress filtering, DNS-layer blocking)
+- Recommended enterprise posture: CI/CD gate setup, ML detection for high-risk skill directories, pre-production trace review, and infrastructure backstop
+
+The blog post uses the same five archetypes that are represented in this model's held-out eval set, making it a useful companion for understanding what the model is trained to detect and where its boundaries are.
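The updated v11 metrics above can be sanity-checked against each other: Injection F1 is the harmonic mean of Injection Precision and Injection Recall, and Macro F1 is the unweighted mean of the Injection and Benign F1 scores. A minimal sketch in plain Python, using only the values shown in the diff (the helper name `f1` is ours, not from the model card):

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Per-class values from the v11 side of the diff.
injection_precision = 0.8608
injection_recall = 0.9444
benign_f1 = 0.9513

injection_f1 = f1(injection_precision, injection_recall)
macro_f1 = (injection_f1 + benign_f1) / 2  # unweighted mean over the two classes

print(f"Injection F1: {injection_f1:.4f}")  # 0.9007, matching the card
print(f"Macro F1: {macro_f1:.3f}")          # 0.926, matching the card
```

This also shows why Macro F1 rose from 0.8448 to 0.926 in v11: the large recall gain (0.6769 → 0.9444) outweighs the precision drop (0.9362 → 0.8608) in the harmonic mean.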