Model card explaining the new LMP and MWCS policies (implemented from the Jan 18 group meeting)
92593eb
memo-ozdincer committed on
LoRA RR adapters for Llama 3.1, trained with the Tool-flip-only Loss Masking Policy (LMP) and Low-weight Mixture Weighting & Curriculum Schedules (MWCS). Mixture weights and curriculum order are computed from perplexity/cross-entropy measured over a fixed token window at the point of prompt injection. Artifacts of run #208788. Full performance eval to be uploaded soon.
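The two policies above might be sketched roughly as follows. Everything here is an illustrative assumption, not the run's actual implementation: the function names, the `-100` ignore-index convention (common in cross-entropy losses), and weighting sources in proportion to their windowed perplexity are all guesses at what "tool-flip-only loss masking" and "low-weight mixture weighting on a fixed token window" could mean.

```python
import math

IGNORE_INDEX = -100  # tokens with this label are excluded from the loss


def mask_non_tool_flip(labels, tool_flip_mask):
    """Tool-flip-only loss masking (assumed semantics): keep the loss only
    on tokens flagged as part of a tool-flip span; ignore all others."""
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(labels, tool_flip_mask)]


def mixture_weight(ce_per_token, window):
    """Mixture weight from cross-entropy over a fixed token window starting
    at the prompt-injection point (assumed): the weight is the windowed
    perplexity, so low-perplexity (easy) sources get low weight."""
    ce = ce_per_token[:window]
    mean_ce = sum(ce) / len(ce)
    return math.exp(mean_ce)  # perplexity over the window


def normalize(weights):
    """Normalize raw per-source weights into mixture proportions."""
    total = sum(weights)
    return [w / total for w in weights]
```

A curriculum schedule could then order sources by the same windowed perplexity (easy to hard), but the exact scheduling rule used in run #208788 is not specified here.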