Model Details

📃 Paper • 🌐 Project Page • 🤗 PABU-Data • 🤗 Model (PABU-Agent-8B)

Model Description

PABU-Agent-8B is a Large Language Model (LLM) agent built on top of LLaMA‑3.1‑8B, fine-tuned for interactive decision making using step-level supervision from the PABU Dataset. The model is trained to operate in sequential action–observation environments while maintaining a compact belief state via Progress-Aware Belief Update (PABU).

Instead of conditioning on full interaction histories, the model learns to predict relative task progress at each step and selectively retain informative past interactions. This results in improved task completion and reduced interaction length across diverse long-horizon environments.
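The selective-retention idea above can be sketched as a simple update rule. Note this is an illustrative assumption, not the exact PABU criterion: here a past step is kept only if the model's predicted relative progress advanced by at least a threshold, and the function names (`update_belief`, `min_gain`) are hypothetical.

```python
def update_belief(belief, step, progress, last_progress, min_gain=0.05):
    """Hypothetical retention rule: keep a step in the compact belief
    state only if it advanced predicted task progress by at least
    `min_gain`. The real PABU criterion may differ; this illustrates
    selective retention driven by relative progress prediction."""
    if progress - last_progress >= min_gain:
        belief = belief + [step]          # informative step: retain it
    return belief, progress               # progress estimate always moves forward
```

A step that barely changes the progress estimate (e.g. a redundant "look around" action) is dropped from the belief, keeping the conditioning context compact.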

  • Model type: Decoder-only causal language model with belief-state conditioning
  • Language(s) (NLP): English
  • License: Inherits LLaMA‑3.1 license and downstream dataset licenses
  • Finetuned from model: LLaMA‑3.1‑8B

Model Sources

Uses

Direct Use

  • Acting as an autonomous LLM agent in text-based interactive environments
  • Research on belief updating, memory selection, and long-horizon reasoning
  • Benchmarking agent efficiency under fixed training trajectories

Downstream Use

  • Further fine-tuning for specialized agent environments
  • Integration into agent frameworks requiring compact state representations

Out-of-Scope Use

  • General-purpose chat or instruction following without environment feedback
  • Real-world decision making or safety-critical deployments
  • Tasks requiring multimodal (vision, audio) perception

Bias, Risks, and Limitations

  • Optimized for synthetic, text-based environments; real-world transfer is limited
  • Progress signals are environment-dependent and may not generalize
  • Inherits biases present in the base LLaMA‑3.1 model and environment text

Recommendations

Users should evaluate the model in their target environment and avoid extrapolating performance gains beyond AgentGym-style tasks.

How to Get Started with the Model

The model is intended to be used within an agent loop that alternates between observations and actions while maintaining a belief memory buffer as described in PABU.
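A minimal sketch of such a loop is shown below. The `act` and `estimate_progress` callables are placeholders for calls into PABU-Agent-8B (action generation and progress prediction); the retention threshold and all function names are illustrative assumptions, not the released inference API.

```python
def run_agent(env_reset, env_step, act, estimate_progress,
              max_steps=20, min_gain=0.05):
    """Sketch of a PABU-style agent loop: alternate observation/action
    while retaining only progress-advancing steps in a compact belief
    buffer. `act` and `estimate_progress` are placeholders for the
    model's action and progress-prediction calls."""
    belief, last_progress = [], 0.0
    obs = env_reset()
    for _ in range(max_steps):
        action = act(belief, obs)                   # model proposes next action
        obs, done = env_step(action)                # environment returns observation
        progress = estimate_progress(belief, obs)   # predicted relative progress
        if progress - last_progress >= min_gain:    # keep only informative steps
            belief.append((action, obs))
        last_progress = progress
        if done:
            break
    return belief
```

In practice `act` would wrap a `transformers` generation call conditioned on the rendered belief buffer plus the latest observation, rather than the full interaction history.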

Citation

@misc{jiang2026pabuprogressawarebeliefupdate,
      title={PABU: Progress-Aware Belief Update for Efficient LLM Agents}, 
      author={Haitao Jiang and Lin Ge and Hengrui Cai and Rui Song},
      year={2026},
      eprint={2602.09138},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.09138}, 
}