AAPA-06B

This repository contains the 0.6B A-GRPO checkpoint released with AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models.

AAPA augments GRPO-style post-training with a sentence-level adversarial anchoring signal. This checkpoint is trained from the Qwen3 0.6B instruction model using the AAPA code release.

Resources

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jingleqian/AAPA-06B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

Citation

@article{aapa2026,
  title={AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models},
  author={},
  journal={arXiv preprint},
  year={2026}
}
Downloads last month
77
Safetensors
Model size
0.8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jingleqian/AAPA-06B

Finetuned
Qwen/Qwen3-0.6B
Finetuned
(1003)
this model
Quantizations
1 model

Collection including Jingleqian/AAPA-06B