Text Generation
Transformers
Safetensors
How to use from
Docker Model Runner
docker model run hf.co/jonathanhe123/iapo
Quick Links

IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning

🚀 Overview

IAPO is an information-theoretic post-training framework designed to improve the token efficiency of Chain-of-Thought (CoT) reasoning. Instead of shaping rewards at the sequence level—as seen in standard RL methods like GRPO—IAPO assigns token-wise advantages based on each token's conditional mutual information (MI) with the final answer. This identifies informative reasoning steps and suppresses low-utility exploration, resulting in significantly shorter reasoning traces without sacrificing accuracy.

🎯 Key Features

  • 🧠 Information-Aware Advantage Shaping: Assigns token-level advantages based on conditional MI, amplifying informative tokens and suppressing redundant ones.
  • 🔍 Exploration Adjustment: Rewards confident tokens in correct trajectories and penalizes them in incorrect ones to prevent reasoning collapse.
  • Efficient Estimation: Introduces an early-exit–based MI estimator with KV-cache preloading to keep computational costs tractable for long-context reasoning.
  • 📉 Provable Length Reduction: Demonstrates monotonic reductions in reasoning verbosity while preserving correctness.
  • 🏆 Performance: Reduces reasoning length by up to 36-47% while improving accuracy across various mathematical reasoning benchmarks.

🔧 Loading the Checkpoints

This repository contains multiple checkpoints fine-tuned from different base models (Qwen2.5-0.5B, 1.5B, 7B). You can load a specific checkpoint using the subfolder argument corresponding to the {base_model}_{dataset} combination.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jonathanhe123/iapo"
# Example: Load the Qwen2.5-0.5B-Instruct checkpoint fine-tuned on MATH-500
subfolder = "Qwen2.5-0.5B-Instruct_MATH-500"

tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder=subfolder)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    subfolder=subfolder,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Citation

If you find this work useful, please cite the paper:

@inproceedings{he2026iapo,
  title={IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning},
  author={He, Yinhan and Zhu, Yaochen and Shi, Mingjia and Zheng, Wendy and Su, Lin and Wang, Xiaoqing and Guo, Qi and Li, Jundong},
  booktitle={International Conference on Machine Learning (ICML 2026)},
  year={2026}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jonathanhe123/iapo

Finetuned
(810)
this model

Datasets used to train jonathanhe123/iapo

Paper for jonathanhe123/iapo