Cross-Layer Neural Network Optimization for Balancing Accuracy, Speed, and Energy Consumption in Real-World Applications

Authors

  • Keai Chu Kin, Independent Researcher, Hong Kong

Keywords

Cross-layer optimization, energy-aware neural networks, deep learning acceleration, edge computing, latency-aware training, dynamic inference, model compression, accuracy-latency tradeoff, adaptive pruning, quantization, neural architecture search

Abstract

As machine learning applications proliferate across mobile, embedded, and edge devices, there is a pressing need to optimize neural networks not only for accuracy but also for computational speed and energy efficiency. Traditional approaches that optimize only a single layer of the system stack often fall short of real-world constraints. This paper presents a cross-layer optimization framework that integrates algorithmic, architectural, and hardware-level adaptations to balance accuracy, latency, and energy consumption holistically. The framework enables neural networks to reconfigure their behavior dynamically under runtime constraints, leveraging techniques such as layer fusion, quantization-aware training, memory-hierarchy reorganization, and adaptive activation pruning. Empirical evaluations on diverse benchmarks demonstrate that the method achieves up to 40% energy reduction and 30% latency improvement with negligible accuracy degradation. These findings pave the way for more sustainable and deployable AI systems in edge computing and mobile inference.
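Among the techniques the abstract names, quantization-aware training rests on a simple mechanism: weights are rounded to a low-precision integer grid during the forward pass, so that training adapts to the rounding error the deployed model will actually incur. The sketch below illustrates only that fake-quantization step, assuming symmetric per-tensor quantization; the function name and settings are illustrative and are not taken from the paper's implementation.

```python
# Illustrative sketch of the fake-quantization step used in
# quantization-aware training (QAT). Assumes symmetric per-tensor
# quantization; this is a hypothetical example, not the paper's code.

def fake_quantize(weights, num_bits=8):
    """Quantize floats to num_bits signed integers, then dequantize,
    so downstream computation sees the rounding error of low precision."""
    qmax = 2 ** (num_bits - 1) - 1                       # e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0) or 1.0
    scale = max_abs / qmax                               # symmetric scale factor
    quantized = [round(w / scale) for w in weights]      # integer domain
    return [q * scale for q in quantized]                # back to float domain

weights = [0.31, -0.74, 0.05, 1.27]
print(fake_quantize(weights))
```

In a full QAT loop this round trip is applied inside the forward pass while gradients flow through it unchanged (the straight-through estimator), which is what lets accuracy degrade only negligibly at reduced precision.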

References

Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. In: Advances in Neural Information Processing Systems, 2015.

Gujjala PKR. Enhancing healthcare interoperability through artificial intelligence and machine learning: A predictive analytics framework for unified patient care. International Journal of Computer Engineering and Technology (IJCET), 13(3):181-192, 2022. https://doi.org/10.34218/IJCET_13_03_018

Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

Howard AG, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

Chen YH, et al. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid-State Circuits, 2016.

Sze V, Chen YH, Yang TJ, Emer JS. Efficient processing of deep neural networks: A tutorial and survey. Proc IEEE, 2017.

Gujjala PKR. Advancing artificial intelligence and data science: A comprehensive framework for computational efficiency and scalability. International Journal of Research in Computer Applications and Information Technology, 6(1):155-166, 2023. https://doi.org/10.34218/IJRCAIT_06_01_012

Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, 2019.

Wu B, et al. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In: CVPR, 2019.

Oleti CS. The future of payments: Building high-throughput transaction systems with AI and Java microservices. World Journal of Advanced Research and Reviews, 16(3):1401-1411, 2022. https://doi.org/10.30574/wjarr.2022.16.3.1281

Lin J, Chen W, Luo P. Dynamic runtime neural pruning. IEEE Trans Pattern Anal Mach Intell, 2020.

Wang K, et al. HAQ: Hardware-aware automated quantization with mixed precision. In: CVPR, 2019.

Choi Y, et al. Towards the limit of network quantization. In: ICLR, 2019.

Yang T, et al. NetAdapt: Platform-aware neural network adaptation for mobile applications. In: ECCV, 2018.

Zhang X, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: CVPR, 2018.

Lee D, et al. Context-aware neural network pruning for edge AI. In: NeurIPS, 2020.

Kim Y, et al. Energy-aware dynamic DNN pruning for mobile devices. In: ACM MobiSys, 2020.

Oleti CS. Serverless intelligence: Securing J2EE-based federated learning pipelines on AWS. International Journal of Computer Engineering and Technology, 13(3):163-180, 2022. https://doi.org/10.34218/IJCET_13_03_017

Cai H, et al. ProxylessNAS: Direct neural architecture search on target task and hardware. In: ICLR, 2019.

Lin S, et al. MCUNet: Tiny deep learning on IoT devices. In: NeurIPS, 2020.

Published

2023-06-19

How to Cite

Cross-Layer Neural Network Optimization for Balancing Accuracy, Speed, and Energy Consumption in Real-World Applications. (2023). ISCSITR- INTERNATIONAL JOURNAL OF DATA SCIENCE (ISCSITR-IJDS) - ISSN: 3067-7408, 4(1), 8–15. https://iscsitr.in/index.php/ISCSITR-IJDS/article/view/ISCSITR-IJDS_04_01_002