---
license: mit
base_model:
- Qwen/Qwen2.5-14B-Instruct
datasets:
- HuggingFaceH4/ultrachat_200k
- walledai/HarmBench
new_version: ASSELab/DAT-Qwen2.5-14B-Instruct
tags:
- pytorch
- qwen
- DAT
- robust
- adversarial
library_name: transformers
paper:
  title: Closing the Distribution Gap in Adversarial Training for LLMs
  url: https://arxiv.org/abs/2602.15238
---

# DAT: Distributional Adversarial Training
DAT applies continuous adversarial training on diffusion-based adversarial examples to close the gap between the empirical and the population robust risk. This checkpoint is fine-tuned from Qwen/Qwen2.5-14B-Instruct.

**Note:** this model was *not* trained adversarially. It is an ablation/baseline fine-tuned on the diffusion-generated data alone.
For further details, see our paper (https://arxiv.org/abs/2602.15238) or the repository: https://github.com/ASSELab/DAT
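## Usage

A minimal inference sketch with the `transformers` library. The repository id below is the `new_version` entry from the metadata; substitute this checkpoint's own id if it differs. Prompt and generation settings are illustrative.

```python
# Minimal usage sketch (assumes `transformers`, `torch`, and `accelerate` are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

# Taken from the `new_version` metadata field; replace with this checkpoint's id if needed.
MODEL_ID = "ASSELab/DAT-Qwen2.5-14B-Instruct"


def build_messages(prompt: str) -> list[dict]:
    """Wrap a single user prompt in the chat format expected by apply_chat_template."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Explain adversarial training for LLMs in one sentence."))
```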
## Citation

```bibtex
@misc{hu2026closingdistributiongapadversarial,
      title={Closing the Distribution Gap in Adversarial Training for LLMs},
      author={Chengzhi Hu and Jonas Dornbusch and David Lüdke and Stephan Günnemann and Leo Schwinn},
      year={2026},
      eprint={2602.15238},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.15238},
}
```