AAPA-8B

This repository contains the 8B A-GRPO checkpoint released with AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models.

AAPA augments GRPO-style post-training with a sentence-level adversarial anchoring signal. This checkpoint is trained from a Qwen3 8B base model using the AAPA code release.

Resources

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jingleqian/AAPA-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

Citation

@article{aapa2026,
  title={AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models},
  author={},
  journal={arXiv preprint},
  year={2026}
}
Downloads last month
-
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jingleqian/AAPA-8B

Finetuned
Qwen/Qwen3-8B
Finetuned
(1725)
this model
Quantizations
2 models

Collection including Jingleqian/AAPA-8B