Commit History

docs: drop 'stated honestly' phrasing from card heading
950afbb
verified

dannyliv committed on

docs: clearer benefit-focused model card, fix library_name metadata to transformers (merged model at repo root)
2af0ae0
verified

dannyliv committed on

Move V3.2 adapter to adapter/ subfolder so root loads the merged model directly
fe5481f
verified

dannyliv committed on

Ship V3.2: GCG-hardened weights, merged model + ONNX rebuild, honest FPR disclosure
2a64cf7
verified

dannyliv committed on

docs(card): add 2026-05-16 project-status note; drop stale ONNX latency figure
cc5be0b
verified

dannyliv committed on

eval(lg3): full comparison table vs LlamaGuard-3-8B
7ac323a
verified

dannyliv committed on

audit phase 3: drop ShieldGemma references; lock LG3-vs-DeBERTa headline framing
5c00e6b
verified

dannyliv committed on

audit-fixes: correct GCG attribution (was DeBERTa, not ModernBERT)
ae2eade
verified

dannyliv committed on

audit-fixes: GCG eval (DA #8) + LlamaGuard-3-8B comparison (DA #1)
21ccfb4
verified

dannyliv committed on

audit-fixes: GCG eval (DA #8) + LlamaGuard-3-8B comparison (DA #1)
0a7475f
verified

dannyliv committed on

audit-fixes 2: disclose benign-input FPR (Dolly-15k n=500); ModernBERT 7.4% FPR @ t=0.5 critical for users
61e6f3b
verified

dannyliv committed on

audit-fixes: canonical t=0.5 headline, drop '#1' star, disclose comparison scope, remove unbenchmarked 18ms latency
d3d21ad
verified

dannyliv committed on

Style: remove em-dashes (CLAUDE.md Part I, also keeps YAML model-index parseable)
f8c9a51
verified

dannyliv committed on

Add per-label classification table (17 heads) + red-team loop status note
461d7d3
verified

dannyliv committed on

Add Problem-statement + model-selection guide + hardware requirements
ff13fd6
verified

dannyliv committed on

Add Methodology section: sample counts, GOAT techniques, autoresearch loop
0ae1b74
verified

dannyliv committed on

Strip training-cost mentions
d736314
verified

dannyliv committed on

Add 'Attack types covered & how it was trained' section with linked sources; remove excluded-model asides
345867a
verified

dannyliv committed on

Add benchmark links + explanations; remove excluded-model asides
b00b77f
verified

dannyliv committed on

Restructure model card with full HF metadata (model-index, datasets, metrics, pipeline_tag, intended-use, limitations, citation)
2cde43e
verified

dannyliv committed on

Extended baseline: 9 ungated classifiers; agent_guard_v1.2 sits 0.004 F1 below best on JBB, but beats on jackhhao and is most balanced
ac03ba3
verified

dannyliv committed on

Comparative benchmark vs 4 ungated PI classifiers; we tie best on JBB held-out
e81c617
verified

dannyliv committed on

v1.2: trained with +6.5k open-source examples; JBB F1 0.645->0.684
f4625c0
verified

dannyliv committed on

Model save
a8d5014
verified

dannyliv committed on

Training in progress, step 6696
fafa24f
verified

dannyliv committed on

Training in progress, step 6000
ec29b8d
verified

dannyliv committed on

Training in progress, step 5000
f43480c
verified

dannyliv committed on

Training in progress, step 4000
eeb36b4
verified

dannyliv committed on

Training in progress, step 3000
5ad850b
verified

dannyliv committed on

Training in progress, step 2000
ddf0779
verified

dannyliv committed on

Training in progress, step 1000
584e5b7
verified

dannyliv committed on

Model save
2cdaff0
verified

dannyliv committed on

Training in progress, step 6696
2676220
verified

dannyliv committed on

Training in progress, step 6000
0ddc07a
verified

dannyliv committed on

Training in progress, step 5000
e459fb7
verified

dannyliv committed on

Training in progress, step 4000
c502615
verified

dannyliv committed on

Training in progress, step 3000
a6d6ef3
verified

dannyliv committed on

Training in progress, step 2000
8f6c441
verified

dannyliv committed on

Training in progress, step 1000
5417fd2
verified

dannyliv committed on

Upload onnx/config.json with huggingface_hub
201f73a
verified

dannyliv committed on

v1.1 ONNX export (599MB merged, opset 18) - 18ms/inference on CPU
e7a21c1
verified

dannyliv committed on

Add threshold sweep results (JBB 0.675, deepset 0.818, jackhhao 0.957)
4ada7ed
verified

dannyliv committed on

Model save
29c9453
verified

dannyliv committed on

Training in progress, step 5628
cffcbc8
verified

dannyliv committed on

Training in progress, step 5000
da2f0e7
verified

dannyliv committed on

Training in progress, step 4000
33a5977
verified

dannyliv committed on

Training in progress, step 3000
0922e7e
verified

dannyliv committed on

Training in progress, step 2000
19e2628
verified

dannyliv committed on

Training in progress, step 1000
a472fd7
verified

dannyliv committed on

Model save
69f8f0b
verified

dannyliv committed on