Papers
arxiv:2311.18578

Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

Published on Jun 27, 2025
Authors:
,
,

Abstract

Federated learning faces challenges from data heterogeneity and partial client participation, which are addressed by proposing a novel Generalized Heavy-Ball Momentum method that achieves better convergence and performance in large-scale scenarios.

AI-generated summary

Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios.However, system and statistical challenges hinder its real-world applicability, requiring efficient learning from edge devices and robustness to data heterogeneity. Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation. In particular, while momentum appears as a promising approach for overcoming statistical heterogeneity, in current approaches its update is biased towards the most recently sampled clients. As we show in this work, this is the reason why it fails to outperform FedAvg, preventing its effective use in real-world large-scale scenarios. In this work, we propose a novel Generalized Heavy-Ball Momentum (GHBM) and theoretically prove it enables convergence under unbounded data heterogeneity in cyclic partial participation, thereby advancing the understanding of momentum's effectiveness in FL. We then introduce adaptive and communication-efficient variants of GHBM that match the communication complexity of FedAvg in settings where clients can be stateful. Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance under random uniform client sampling, particularly in large-scale settings with high data heterogeneity and low client participation. Code is available at https://rickzack.github.io/GHBM.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2311.18578 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2311.18578 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2311.18578 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.