Secure Machine Learning Lab

university

https://schwinnl.github.io/

AI & ML interests

New Hugging Face page: https://huggingface.co/SEML-Lab

authored 5 papers 4 months ago

Efficient Adversarial Training in LLMs with Continuous Attacks

Paper • 2405.15589 • Published May 24, 2024

Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors

Paper • 2305.12380 • Published May 21, 2023

Diffusion LLMs are Natural Adversaries for any LLM

Paper • 2511.00203 • Published Oct 31, 2025 • 1

Closing the Distribution Gap in Adversarial Training for LLMs

Paper • 2602.15238 • Published Feb 16

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

Paper • 2603.06594 • Published Feb 4 • 1

updated a collection 4 months ago

CoinflipForSafety

Datasets from the paper: A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness (arxiv: https://arxiv.org/abs/2603.06594) • 4 items • Updated Mar 16 • 1

updated a dataset 4 months ago

ASSELab/CoinflipForSafety

Viewer • Updated Mar 11 • 6.56k • 25 • 1

published a dataset 4 months ago

ASSELab/CoinflipForSafety

Viewer • Updated Mar 11 • 6.56k • 25 • 1

updated a dataset 4 months ago

ASSELab/JudgeStressTest

Viewer • Updated Mar 11 • 971 • 18 • 1

published a dataset 4 months ago

ASSELab/JudgeStressTest

Viewer • Updated Mar 11 • 971 • 18 • 1

updated a dataset 4 months ago

ASSELab/ReliableBench

Viewer • Updated Mar 11 • 43 • 36 • 1

published a dataset 4 months ago

ASSELab/ReliableBench

Viewer • Updated Mar 11 • 43 • 36 • 1

updated 3 models 5 months ago

ASSELab/DAT-Llama-3-8B-Instruct

Text Generation • 8B • Updated Feb 18 • 189 • 2

ASSELab/DAT-Qwen2.5-14B-Instruct

Text Generation • 15B • Updated Feb 18 • 3

ASSELab/Diffusion-Llama-3-8B-Instruct

Text Generation • 8B • Updated Feb 18 • 3

updated a collection 5 months ago

DAT

Distributional Adversarial Training uses continuous adversarial training on diffusion-based adversarial examples to close a gap in population-robust risk estimation. • 4 items • Updated Feb 18 • 1

in ASSELab/DAT-Qwen2.5-14B-Instruct 5 months ago

Upload merged model

#1 opened 5 months ago by