Original Article

Comparative evaluation of ensemble machine learning models for predicting band gaps of double perovskites

Abstract

Accurate prediction of the band gaps of double perovskites is essential for discovering promising new materials for energy and optoelectronic applications. Standard Density Functional Theory (DFT) methods, although reliable, are too computationally expensive for large-scale screening. To circumvent this limitation, we develop a Gradient Boosting Regression (GBR)-based machine learning (ML) framework enhanced with polynomial feature expansion, standardized preprocessing, and rigorous hyperparameter optimization. The framework uses 39 composition-derived descriptors, such as electronegativity, ionic radii, oxidation states, and orbital energies, computed for a dataset of 4,121 double perovskite compounds. The data were randomly split into training and test sets (80:20), and model performance was evaluated with multiple metrics. The GBR model achieved an adjusted R² of 0.9556, a mean absolute error (MAE) of 0.0988 eV, a root mean square error (RMSE) of 0.2180 eV, an explained variance (EV) of 0.9574, and a ratio of performance to deviation (RPD) of 4.8393, indicating high predictive accuracy and reliability. Compared with earlier studies that used Support Vector Regression (SVR), Random Forest, and XGBoost, the proposed framework achieves markedly better performance while retaining physical interpretability. It offers a reproducible, data-driven alternative to trial-and-error approaches, delivering near-DFT-level predictions that support both high-throughput screening and the rational design of double perovskites.
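The evaluation metrics reported in the abstract can be computed as in the following minimal sketch. It uses only the Python standard library and is not the authors' code. The function name `regression_metrics`, the use of the sample standard deviation in RPD, and treating `n_features` as the predictor count p in adjusted R² are assumptions made for illustration. When polynomial feature expansion is applied, the effective number of predictors may exceed the 39 raw descriptors, so the value passed as `n_features` is itself a modelling choice.

```python
import math
import statistics


def regression_metrics(y_true, y_pred, n_features):
    """Compute MAE, RMSE, R², adjusted R², explained variance (EV) and RPD.

    Illustrative sketch, not the authors' implementation. Assumes y_true is
    not constant and the predictions are not perfect (otherwise R² or RPD
    would divide by zero).
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("adjusted R² needs n > n_features + 1")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R² penalises the number of predictors p = n_features.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    # EV uses the variance of the residuals, so a constant bias is not penalised.
    ev = 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)
    # RPD: sample standard deviation of the reference values divided by RMSE
    # (assumed convention; some authors use the population standard deviation).
    rpd = statistics.stdev(y_true) / rmse

    return {"MAE": mae, "RMSE": rmse, "R2": r2,
            "adj_R2": adj_r2, "EV": ev, "RPD": rpd}


# Hypothetical toy band gaps in eV (not data from the article).
example = regression_metrics(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.2, 3.8, 5.0],
    n_features=1,
)
print(example)
```

As a rough consistency check, R² ≈ 1 − 1/RPD² when RPD and R² use the same variance convention. For the reported RPD of 4.8393 this gives about 0.957, in line with the reported EV of 0.9574.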

Keywords

Ensemble machine learning; Support vector regression; Electronic bandgap; XGBoost; LightGBM

Corresponding Author

Mr. Farhan Sarwar

Department of Basic Sciences, Superior University, Lahore, Pakistan

su92-mpmmw-f24-003@superior.edu.pk

Article History

Received Date : 29 April 2025

Revised Date : 20 May 2025

Accepted Date : 27 May 2025

