ICLR 2026 LogicReward LoRA Adapter (Qwen3 8B)
1. Introduction
This repository provides only the LoRA adapter weights for Qwen3-8B, trained with LLaMA-Factory as part of the LogicReward project.
📄 Paper: LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision (ICLR 2026)
🌐 Project Page: https://llm-symbol.github.io/LogicReward/
📦 Model Collection (all variants):
https://huggingface.co/collections/Aiden0526/logicreward
⚠️ Important: This repository does NOT contain the base model weights.
You must separately obtain the base model from Hugging Face.
This model is one trained variant in the LogicReward series.
See the collection page for the other variants.
2. Model Information
- Base model: Qwen/Qwen3-8B
- Model type: LoRA adapter (PEFT)
- Training framework: LLaMA-Factory
- Training stages: SFT → DPO
- Architecture: Decoder-only Transformer
- Language: English
- License: Apache 2.0 (inherits the base model's license)
Detailed training configuration and datasets are described in the paper.
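To make the "LoRA adapter" terminology concrete, here is a minimal NumPy sketch of what an adapter contributes at inference time. The dimensions, rank, and alpha below are toy values chosen for illustration and do not reflect this adapter's actual configuration; the real adapter applies the same update to selected weight matrices inside the 8B transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; the real adapter targets 8B-model layers).
d, r, alpha = 8, 2, 4              # hidden size, LoRA rank, scaling alpha
W = rng.standard_normal((d, d))    # frozen base weight
A = rng.standard_normal((r, d))    # trainable low-rank factor A
B = rng.standard_normal((d, r))    # trainable low-rank factor B
x = rng.standard_normal(d)

# LoRA replaces W @ x with W @ x + (alpha / r) * B @ A @ x.
base_out = W @ x
lora_out = base_out + (alpha / r) * (B @ (A @ x))

# Merging folds the low-rank update into a single dense weight,
# which is why an adapter can be "baked in" for deployment.
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, lora_out)
```

This is also why the adapter download is tiny compared to the base model: only the low-rank factors `A` and `B` are stored.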
3. How to Use
Installation
pip install -U transformers peft accelerate
Load Base Model + LoRA Adapter
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model_id = "Qwen/Qwen3-8B"
adapter_id = "Aiden0526/LogicReward-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
Inference Example
prompt = "Explain symbolic reasoning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
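The `temperature=0.7` setting above controls how sharply sampling concentrates on high-probability tokens. A small NumPy illustration of the effect (the logit values are made up for the example, not taken from the model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy next-token logits (illustrative values, not from the model).
logits = np.array([2.0, 1.0, 0.5, -1.0])

p_default = softmax(logits)        # temperature = 1.0
p_sharp = softmax(logits / 0.7)    # temperature = 0.7, as in the example above
p_flat = softmax(logits / 1.5)     # higher temperature flattens the distribution

# Lower temperature concentrates probability mass on the top token.
assert p_sharp.max() > p_default.max() > p_flat.max()
```

Values below 1.0 make outputs more deterministic; values above 1.0 make them more diverse.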
4. Citation
If you use this model, please cite:
@inproceedings{logicreward2026,
  title     = {LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision},
  author    = {Xu, Jundong and Fei, Hao and Zhou, Huichi and Quan, Xin and Huang, Qijun and Wu, Shengqiong and Wang, William Yang and Lee, Mong-Li and Hsu, Wynne},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2026},
  url       = {https://arxiv.org/abs/2512.18196}
}