Abstract
Context: The rise of Large Language Models (LLMs) has led to their widespread adoption in development pipelines.
Goal: We empirically assess the energy efficiency of Python code generated by LLMs against human-written code and code developed by a Green software expert.
Method: We test 363 solutions to 9 coding problems from the EvoEval benchmark, generated by 6 widespread LLMs with 4 prompting techniques, and compare them to human-developed solutions. Energy consumption is measured on three different hardware platforms: a server, a PC, and a Raspberry Pi, for a total of ≈881h (36.7 days).
Results: Human solutions are 16% more energy-efficient on the server and 3% more energy-efficient on the Raspberry Pi, while LLMs outperform human developers by 25% on the PC. Prompting does not consistently lead to energy savings, and the most energy-efficient prompts vary by hardware platform. The code developed by a Green software expert is consistently more energy-efficient, by 17% to 30%, than that of all LLMs on all hardware platforms.
Conclusions: Even though LLMs exhibit relatively good code generation capabilities, no LLM-generated code was more energy-efficient than that of an experienced Green software developer, suggesting that, as of today, there is still a great need for human expertise in developing energy-efficient Python code.
Software and Sustainability group paper accepted by ICSE 2026
17 September 2025
Paper title:
Generating Energy-Efficient Code via Large-Language Models – Where are we now?
Authors:
Radu Apsan, Vincenzo Stoico, Michel Albonico, Rudra Dhar, Karthik Vaidhyanathan, Ivano Malavolta