TY - GEN
T1 - Evaluation of Missing Data Imputation Methods for an Enhanced Distributed PV Generation Prediction
AU - Sundararajan, Aditya
AU - Sarwat, Arif I.
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - To predict the generation of distributed photovoltaic (PV) systems effectively, three parameters are critical: irradiance, ambient temperature, and module temperature. However, their completeness cannot be guaranteed because of issues in data acquisition. Many methods in the literature address missingness, but their applicability varies with the missingness mechanism. Methods for imputing missing data in PV systems remain underexplored. This paper conducts statistical analyses to understand the missingness mechanism in data from a real grid-tied 1.4 MW PV system in Miami, and compares the imputation performance of different methods: random imputation, multiple imputation using expectation-maximization, kNN, and random forests, using error metrics and effect size measures. The imputed values are fed to a multilayer perceptron to predict PV generation, and the predictions are compared with observed values. Results show that values imputed using kNN and random forests have the least differences in proportions and help utilities make more accurate predictions of generation for distribution planning.
KW - Data processing
KW - Distributed PV
KW - Imputation methods
KW - Missing data
KW - PV Generation Prediction
UR - http://www.scopus.com/inward/record.url?scp=85075656932&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32520-6_43
DO - 10.1007/978-3-030-32520-6_43
M3 - Conference contribution
AN - SCOPUS:85075656932
SN - 9783030325190
T3 - Advances in Intelligent Systems and Computing
SP - 590
EP - 609
BT - Proceedings of the Future Technologies Conference, FTC 2019 Volume 1
A2 - Arai, Kohei
A2 - Bhatia, Rahul
A2 - Kapoor, Supriya
PB - Springer
T2 - 4th Future Technologies Conference, FTC 2019
Y2 - 24 October 2019 through 25 October 2019
ER -