arxiv:2605.30102

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Published on May 28

Submitted by Corrado Rainone on May 29

Qualcomm


Authors:

Abstract

Hybrid multi-agent systems combining large and small language models offer flexible inference trade-offs, but optimal architecture depends heavily on specific tasks and performance metrics.

AI-generated summary

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantial cost, and more cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybrid multi-agent systems (MASs) combining on-device and cloud models offer a promising middle ground, but they also introduce a complex and poorly understood design space in which task accuracy, monetary cost, and edge energy consumption are tightly coupled; in the absence of general design principles, hybrid components, although not the most prevalent choice, are typically introduced through ad hoc decisions tailored to specific domains. In this work, we examine this design space more systematically. We adapt two representative MAS architectures to support hybrid inference and study how individual design choices shift the operating point along the Pareto frontier of power, cost, and performance. Our findings paint a nuanced picture of hybrid MAS design: while SLMs can effectively benefit from LLM assistance, the optimal architecture is highly task-dependent, and greater frontier-level compute does not consistently translate to better performance.
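To make the notion of a Pareto frontier over accuracy, cost, and energy concrete, here is a minimal Python sketch of a dominance filter. It is not the paper's evaluation code: the `OperatingPoint` type, the field names and units, and every number below are hypothetical placeholders invented for illustration, not results from the paper.

```python
from typing import List, NamedTuple


class OperatingPoint(NamedTuple):
    """One system configuration (hypothetical fields and units)."""
    name: str
    accuracy: float   # task accuracy, higher is better
    cost_usd: float   # monetary (API) cost, lower is better
    energy_j: float   # edge energy consumption, lower is better


def dominates(a: OperatingPoint, b: OperatingPoint) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.cost_usd <= b.cost_usd
        and a.energy_j <= b.energy_j
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.cost_usd < b.cost_usd
        or a.energy_j < b.energy_j
    )
    return no_worse and strictly_better


def pareto_frontier(points: List[OperatingPoint]) -> List[OperatingPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Placeholder values invented for illustration only, NOT measurements from the paper.
configs = [
    OperatingPoint("A", accuracy=0.40, cost_usd=0.00, energy_j=50.0),
    OperatingPoint("B", accuracy=0.80, cost_usd=1.00, energy_j=5.0),
    OperatingPoint("C", accuracy=0.70, cost_usd=0.30, energy_j=40.0),
    OperatingPoint("D", accuracy=0.65, cost_usd=0.35, energy_j=45.0),
]

frontier_names = [p.name for p in pareto_frontier(configs)]
print(frontier_names)
```

With these placeholder values, configuration D is dropped because C is at least as good on all three metrics, while A, B, and C each win on at least one axis and therefore stay on the frontier.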


Community

crainone

Paper submitter about 10 hours ago

If you power your agentic system with an edge-device-sized or self-hosted LM, you will usually observe subpar performance; cloud-based frontier models, on the other hand, can deliver satisfactory performance but come with potentially high API costs.
In this paper, we explore how this dilemma can be worked around by putting a multi-agent spin on the idea of Hybrid AI. In our system, an Executor agent living on device receives periodic assistance from a Supervisor agent living in the cloud. We explore the design space of such a system and make some non-trivial observations: edge-sized Executors can indeed benefit from assistance from the cloud, achieving better performance than an edge-only setup at lower API cost than a cloud-only setup; the best-performing multi-agent architecture depends on the nature of the task; and our Hybrid MAS is fundamentally different from a routing system.
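A rough Python sketch of the Executor/Supervisor loop described above may help fix ideas. It is not the paper's implementation: the function names, the fixed `assist_every` schedule, the `"DONE"` stop signal, and the toy agents are all assumptions made for illustration, and the comment does not say how or when the Supervisor is actually invoked.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each agent is modelled as a plain function.
ExecutorFn = Callable[[str, List[str], str], str]   # (task, history, advice) -> action
SupervisorFn = Callable[[str, List[str]], str]      # (task, history) -> advice


def run_hybrid_episode(
    executor_step: ExecutorFn,
    supervisor_advise: SupervisorFn,
    task: str,
    max_steps: int = 6,
    assist_every: int = 3,
) -> Tuple[List[str], int]:
    """Run the on-device executor, asking the cloud supervisor every `assist_every` steps."""
    history: List[str] = []
    advice = ""
    cloud_calls = 0
    for step in range(1, max_steps + 1):
        # Assumption: assistance arrives on a fixed schedule; other triggers are possible.
        if step % assist_every == 0:
            advice = supervisor_advise(task, history)
            cloud_calls += 1  # each supervisor call stands in for one paid API request
        action = executor_step(task, history, advice)
        history.append(action)
        if action == "DONE":
            break
    return history, cloud_calls


# Toy stand-ins for the two models, purely for demonstration.
def toy_executor(task: str, history: List[str], advice: str) -> str:
    # Pretend the small model only finishes once it has received advice.
    return "DONE" if advice else f"attempt-{len(history) + 1}"


def toy_supervisor(task: str, history: List[str]) -> str:
    return f"hint after {len(history)} steps"


history, cloud_calls = run_hybrid_episode(toy_executor, toy_supervisor, "demo task")
print(history, cloud_calls)
```

In this sketch the on-device Executor takes every action and the cloud model only contributes advice, so the number of cloud calls is bounded by the schedule rather than by the number of steps. Whether that matches the distinction from routing that the authors have in mind is not spelled out in the comment.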
