AIME: Adaptive Inference with Model Evolution for Efficient On-Device Large Language Model Serving
Published in IEEE ICDCS, 2025
In this work, we propose AIME, an adaptive model customization framework for efficiently deploying large Transformer-based models across heterogeneous devices in distributed systems. AIME addresses the performance imbalance, energy inefficiency, and privacy concerns that arise when deploying pre-trained models such as ViT and BERT at the edge.
The system uses a bidirectional single-loop architecture that progressively customizes models in two phases: (1) backbone customization through Pareto-optimal architecture generation on cloud and edge servers, and (2) header refinement through neural architecture search (NAS) and personalized aggregation based on local data distributions.
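The backbone-customization phase can be illustrated with a small sketch. The snippet below is not the paper's implementation; it is a minimal, hypothetical Python example of the Pareto-optimal selection idea: given candidate backbones scored on accuracy (higher is better) and latency (lower is better), keep only the non-dominated ones. The candidate names and scores are invented for illustration.

```python
# Hypothetical sketch of Pareto-optimal backbone selection.
# A candidate is dominated if another candidate is at least as good on
# both objectives and strictly better on at least one.

def pareto_front(candidates):
    """Return the candidates not dominated on (accuracy, latency)."""
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"]
            and o["latency"] <= c["latency"]
            and (o["accuracy"] > c["accuracy"] or o["latency"] < c["latency"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Illustrative candidate backbones (names and numbers are made up).
candidates = [
    {"name": "vit-tiny",  "accuracy": 0.72, "latency": 12.0},
    {"name": "vit-small", "accuracy": 0.80, "latency": 30.0},
    {"name": "vit-base",  "accuracy": 0.81, "latency": 95.0},
    {"name": "vit-wide",  "accuracy": 0.78, "latency": 40.0},  # dominated by vit-small
]

print([c["name"] for c in pareto_front(candidates)])
# → ['vit-tiny', 'vit-small', 'vit-base']
```

Each device (or device class) would then pick one point from this front according to its own latency budget, which is what makes the resulting deployment adaptive to heterogeneous hardware.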
This work is a collaboration among researchers at the College of Intelligence and Computing, Tianjin University.
Recommended citation: Ziming Dai, Yunfeng Zhao, Yuxuan Wang, Jinhui Xu, Jinhang Song, Chao Qiu, and Salman Avestimehr. "AIME: Adaptive Inference with Model Evolution for Efficient On-Device Large Language Model Serving." IEEE ICDCS 2025.