ICLR 2026 LogicReward LoRA Adapter (Qwen3 8B)
1. Introduction
This repository provides only the LoRA adapter weights for Qwen3-8B, trained with LLaMA-Factory as part of the LogicReward project.
📄 Paper: LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision (ICLR 2026)
🌐 Project Page: https://llm-symbol.github.io/LogicReward/
📦 Model Collection (all variants):
https://huggingface.co/collections/Aiden0526/logicreward
⚠️ Important: This repository does NOT contain the base model weights.
You must separately obtain the base model from Hugging Face.
This model is one trained variant in the LogicReward series.
See the collection page for the other variants.
2. Model Information
- Base model: Qwen/Qwen3-8B
- Model type: LoRA adapter (PEFT)
- Training framework: LLaMA-Factory
- Training stages: SFT → DPO
- Architecture: Decoder-only Transformer
- Language: English
- License: Apache 2.0 (inherits the base model's license)
Detailed training configuration and datasets are described in the paper.
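To make the "LoRA adapter" terminology concrete, here is a minimal NumPy sketch of what an adapter contributes at inference time. The dimensions, rank, and alpha below are toy values chosen for illustration and do not reflect this adapter's actual configuration; the real adapter applies the same update to selected weight matrices inside the 8B transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; the real adapter targets 8B-model layers).
d, r, alpha = 8, 2, 4              # hidden size, LoRA rank, scaling alpha
W = rng.standard_normal((d, d))    # frozen base weight
A = rng.standard_normal((r, d))    # trainable low-rank factor A
B = rng.standard_normal((d, r))    # trainable low-rank factor B
x = rng.standard_normal(d)

# LoRA replaces W @ x with W @ x + (alpha / r) * B @ A @ x.
base_out = W @ x
lora_out = base_out + (alpha / r) * (B @ (A @ x))

# Merging folds the low-rank update into a single dense weight,
# which is why an adapter can be "baked in" for deployment.
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, lora_out)
```

This is also why the adapter download is tiny compared to the base model: only the low-rank factors `A` and `B` are stored.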
3. How to Use
Installation
pip install -U transformers peft accelerate
Load Base Model + LoRA Adapter
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model_id = "Qwen/Qwen3-8B"
adapter_id = "Aiden0526/LogicReward-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
Inference Example
prompt = "Explain symbolic reasoning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
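The `temperature=0.7` setting above controls how sharply sampling concentrates on high-probability tokens. A small NumPy illustration of the effect (the logit values are made up for the example, not taken from the model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy next-token logits (illustrative values, not from the model).
logits = np.array([2.0, 1.0, 0.5, -1.0])

p_default = softmax(logits)        # temperature = 1.0
p_sharp = softmax(logits / 0.7)    # temperature = 0.7, as in the example above
p_flat = softmax(logits / 1.5)     # higher temperature flattens the distribution

# Lower temperature concentrates probability mass on the top token.
assert p_sharp.max() > p_default.max() > p_flat.max()
```

Values below 1.0 make outputs more deterministic; values above 1.0 make them more diverse.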
4. Citation
If you use this model, please cite:
@inproceedings{logicreward2026,
  title     = {LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision},
  author    = {Xu, Jundong and Fei, Hao and Zhou, Huichi and Quan, Xin and Huang, Qijun and Wu, Shengqiong and Wang, William Yang and Lee, Mong-Li and Hsu, Wynne},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2026},
  url       = {https://arxiv.org/abs/2512.18196}
}