arxiv:2605.24830

Macaron-A2UI: A Model for Generative UI in Personal Agents

Published on May 24 · Submitted by Andrew Chen on May 26 · #3 Paper of the day

Abstract

Generative UI models enable personal agents to synthesize dynamic interfaces with lightweight executable actions for enhanced interaction beyond text-only formats.

AI-generated summary

As personal agents evolve to handle complex, user-centric tasks, static plain-text chat is rapidly becoming a bottleneck. Generative UI emerges as the necessary new interface layer, dynamically synthesizing the right controls, options, and state from the interaction context in real time. We present Macaron-A2UI, a model for Generative UI in personal agents. Our goal is to move beyond text-only interaction by enabling agents to generate natural language together with lightweight, executable UI actions for information collection, preference refinement, confirmation, and multi-goal organization. We build a large-scale Generative UI corpus from heterogeneous dialogue sources, introduce A2UI-Bench for controlled evaluation, and train 30B, 235B and 754B models with parameter-efficient LoRA-based supervised fine-tuning followed by reward-driven reinforcement learning. The best Macaron-A2UI model reaches 75.6 overall on A2UI-Bench without explicit schema hints, surpassing the strongest full-schema frontier baseline. We release the models, benchmark, and evaluation protocol to support future work on Generative UI for personal agents.
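As a rough illustration of what "natural language together with lightweight, executable UI actions" could look like in practice, here is a minimal Python sketch. It is not the A2UI schema from the paper, which this page does not show: every name here (`ActionKind`, `UIAction`, `AgentTurn`, `validate_turn`) is hypothetical, and the only part taken from the abstract is the list of four purposes the actions serve.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionKind(Enum):
    # The four purposes come from the abstract; the identifiers are hypothetical.
    COLLECT_INFO = "collect_info"
    REFINE_PREFERENCE = "refine_preference"
    CONFIRM = "confirm"
    ORGANIZE_GOALS = "organize_goals"


@dataclass
class UIAction:
    # Hypothetical shape of one UI control emitted by the agent.
    kind: ActionKind
    label: str
    options: list[str] = field(default_factory=list)


@dataclass
class AgentTurn:
    # One agent reply: plain text plus zero or more UI actions.
    text: str
    actions: list[UIAction] = field(default_factory=list)


def validate_turn(turn: AgentTurn) -> list[str]:
    """Return a list of problems; an empty list means these toy checks pass."""
    problems = []
    if not turn.text.strip():
        problems.append("turn has no natural-language text")
    for i, action in enumerate(turn.actions):
        if not action.label.strip():
            problems.append(f"action {i} has an empty label")
        # Assumed rule: a preference control must offer a real choice.
        if action.kind is ActionKind.REFINE_PREFERENCE and len(action.options) < 2:
            problems.append(f"action {i} needs at least two options")
    return problems


turn = AgentTurn(
    text="I found two flights that fit. Which do you prefer?",
    actions=[
        UIAction(ActionKind.REFINE_PREFERENCE, "Flight", ["Morning", "Evening"]),
        UIAction(ActionKind.CONFIRM, "Book selected flight"),
    ],
)
print(validate_turn(turn))  # prints []
```

A benchmark such as A2UI-Bench presumably checks far more than this (the page does not describe its scoring), but the sketch shows why action validity can be tested mechanically while the accompanying text cannot.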

Community

Paper submitter

Macaron-A2UI: A Model for Generative UI in Personal Agents

Interesting work!

the fact you can hit 75.6 on a2ui-bench without explicit schema hints is pretty striking. that schema-light training recipe, with LoRA SFT followed by reward-driven rl, basically lets the model learn to generate executable ui alongside natural language. i’d love to see an ablation where you cut the rl reward model entirely and rely only on supervised fine-tuning; my hunch is that rl is doing most of the heavy lifting for action validity and safety. edge cases where controls differ across apps or safety policies kick in could expose brittleness in the generated widgets. btw, arxivlens had a solid breakdown that helped me parse the method details: https://arxivlens.com/PaperView/Details/macaron-a2ui-a-model-for-generative-ui-in-personal-agents-495-62505cf9. do you plan to publish an ablation on rl vs sft and test true cross-app robustness in a follow-up?


Get this paper in your agent:

hf papers read 2605.24830
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
