Model Details

📃 Paper • 🌐 Project Page • 🤗 PABU-Data • 🤗 Model (PABU-Agent-8B)

Model Description

PABU-Agent-8B is a Large Language Model (LLM) agent built on top of LLaMA‑3.1‑8B, fine-tuned for interactive decision making using step-level supervision from the PABU Dataset. The model is trained to operate in sequential action–observation environments while maintaining a compact belief state via Progress-Aware Belief Update (PABU).

Instead of conditioning on full interaction histories, the model learns to predict relative task progress at each step and selectively retain informative past interactions. This results in improved task completion and reduced interaction length across diverse long-horizon environments.
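The selective-retention idea above can be sketched as a simple update rule. Note this is an illustrative assumption, not the exact PABU criterion: here a past step is kept only if the model's predicted relative progress advanced by at least a threshold, and the function names (`update_belief`, `min_gain`) are hypothetical.

```python
def update_belief(belief, step, progress, last_progress, min_gain=0.05):
    """Hypothetical retention rule: keep a step in the compact belief
    state only if it advanced predicted task progress by at least
    `min_gain`. The real PABU criterion may differ; this illustrates
    selective retention driven by relative progress prediction."""
    if progress - last_progress >= min_gain:
        belief = belief + [step]          # informative step: retain it
    return belief, progress               # progress estimate always moves forward
```

A step that barely changes the progress estimate (e.g. a redundant "look around" action) is dropped from the belief, keeping the conditioning context compact.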

  • Model type: Decoder-only causal language model with belief-state conditioning
  • Language(s) (NLP): English
  • License: Inherits LLaMA‑3.1 license and downstream dataset licenses
  • Finetuned from model: LLaMA‑3.1‑8B

Model Sources

Uses

Direct Use

  • Acting as an autonomous LLM agent in text-based interactive environments
  • Research on belief updating, memory selection, and long-horizon reasoning
  • Benchmarking agent efficiency under fixed training trajectories

Downstream Use

  • Further fine-tuning for specialized agent environments
  • Integration into agent frameworks requiring compact state representations

Out-of-Scope Use

  • General-purpose chat or instruction following without environment feedback
  • Real-world decision making or safety-critical deployments
  • Tasks requiring multimodal (vision, audio) perception

Bias, Risks, and Limitations

  • Optimized for synthetic, text-based environments; real-world transfer is limited
  • Progress signals are environment-dependent and may not generalize
  • Inherits biases present in the base LLaMA‑3.1 model and environment text

Recommendations

Users should evaluate the model in their target environment and avoid extrapolating performance gains beyond AgentGym-style tasks.

How to Get Started with the Model

The model is intended to be used within an agent loop that alternates between observations and actions while maintaining a belief memory buffer as described in PABU.
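A minimal sketch of such a loop is shown below. The `act` and `estimate_progress` callables are placeholders for calls into PABU-Agent-8B (action generation and progress prediction); the retention threshold and all function names are illustrative assumptions, not the released inference API.

```python
def run_agent(env_reset, env_step, act, estimate_progress,
              max_steps=20, min_gain=0.05):
    """Sketch of a PABU-style agent loop: alternate observation/action
    while retaining only progress-advancing steps in a compact belief
    buffer. `act` and `estimate_progress` are placeholders for the
    model's action and progress-prediction calls."""
    belief, last_progress = [], 0.0
    obs = env_reset()
    for _ in range(max_steps):
        action = act(belief, obs)                   # model proposes next action
        obs, done = env_step(action)                # environment returns observation
        progress = estimate_progress(belief, obs)   # predicted relative progress
        if progress - last_progress >= min_gain:    # keep only informative steps
            belief.append((action, obs))
        last_progress = progress
        if done:
            break
    return belief
```

In practice `act` would wrap a `transformers` generation call conditioned on the rendered belief buffer plus the latest observation, rather than the full interaction history.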

Citation

@misc{jiang2026pabuprogressawarebeliefupdate,
      title={PABU: Progress-Aware Belief Update for Efficient LLM Agents}, 
      author={Haitao Jiang and Lin Ge and Hengrui Cai and Rui Song},
      year={2026},
      eprint={2602.09138},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.09138}, 
}