Research Article

Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs

by Krishnam Raju Narsepalle
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 14
Year of Publication: 2025
Authors: Krishnam Raju Narsepalle
DOI: 10.5120/ijca2025925323

Krishnam Raju Narsepalle. Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs. International Journal of Computer Applications. 187, 14 (Jun 2025), 1-13. DOI=10.5120/ijca2025925323

@article{10.5120/ijca2025925323,
  author    = {Krishnam Raju Narsepalle},
  title     = {Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs},
  journal   = {International Journal of Computer Applications},
  issue_date = {Jun 2025},
  volume    = {187},
  number    = {14},
  month     = {Jun},
  year      = {2025},
  issn      = {0975-8887},
  pages     = {1-13},
  numpages  = {13},
  url       = {https://ijcaonline.org/archives/volume187/number14/energy-efficient-training-and-inference-in-large-language-models-optimizing-computational-and-energy-costs/},
  doi       = {10.5120/ijca2025925323},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address   = {New York, USA}
}
Abstract

As Large Language Models (LLMs) grow in size, their computational and energy costs rise, and with them their environmental and economic impact. This paper examines several techniques for reducing the energy and computational costs of training and deploying LLMs: sparse training, adaptive inference, and hardware acceleration on GPUs and TPUs. Modelling experiments with BERT and GPT indicate that sparse training reduces the computational workload by 35%, while adaptive inference cuts inference-time energy consumption by 20%. In addition, optimizing resource loading on the hardware yields a 25% energy saving. These findings suggest that energy-efficient LLM training and inference methods can significantly reduce the environmental impact of large-scale AI models, making them more sustainable for widespread use.
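To make the two model-side ideas in the abstract concrete, the following is a minimal sketch, not the paper's code: it uses magnitude pruning from torch.nn.utils.prune as a stand-in for sparse training (the 35% sparsity level mirrors the figure quoted above), and a softmax confidence threshold as a stand-in for adaptive inference. The bert-base-uncased checkpoint, the classify helper, and the 0.9 threshold are illustrative assumptions, not details taken from the paper.

# Minimal sketch (assumptions noted in the text above): pruning as a proxy for
# sparse training, confidence-threshold early exit as a proxy for adaptive inference.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Sparse-training proxy: zero out the 35% smallest-magnitude weights in every
# linear layer, then make the pruning permanent so the masks become real zeros.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.35)
        prune.remove(module, "weight")

# Adaptive-inference proxy: accept a prediction only when the softmax
# confidence clears a threshold; low-confidence inputs are deferred, e.g. to a
# larger model (escalation not shown).
@torch.no_grad()
def classify(text: str, threshold: float = 0.9):
    inputs = tokenizer(text, return_tensors="pt")
    probs = torch.softmax(model(**inputs).logits, dim=-1)
    confidence, label = probs.max(dim=-1)
    if confidence.item() >= threshold:
        return label.item(), confidence.item()
    return None, confidence.item()  # defer to a bigger model or more compute

print(classify("A short example sentence."))

Note that pruning an already trained checkpoint only approximates sparse training, which would apply the sparsity masks during training itself, and the confidence gate is the simplest form of the cascade-style adaptive inference the abstract refers to.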

Index Terms

Computer Science
Information Sciences
Energy Efficiency
Model Optimisation
Sustainability
Artificial Intelligence (AI)
Computational Efficiency

Keywords

Energy-Efficient Training, LLM, Sparse Training, Adaptive Inference, Hardware Acceleration