Development of missing data prediction model for carbon monoxide

Bibliographic Details
Main Authors: Abd Rani, Nurul Latiffah; Azid, Azman; Abdullah Sani, Muhamad Shirwan; Samsudin, Mohd Saiful; Ku Yusof, Ku Mohd Kalkausar; Muhammad Amin, Siti Noor Syuhada; Khalit, Saiful Iskandar
Format: Article
Language: English
Published: Penerbit UTM Press 2019
Online Access: http://irep.iium.edu.my/70673/
http://irep.iium.edu.my/70673/1/70673_Development%20of%20missing%20data%20prediction.pdf
http://irep.iium.edu.my/70673/2/70673_Development%20of%20missing%20data%20prediction_WOS.pdf
Description
Summary: Carbon monoxide (CO) is one of the most important air pollutants because it is among the parameters used to calculate the Air Pollutant Index (API). It is therefore essential that the CO record contains no missing data during analysis. Missing data can arise in several ways, such as the instrument failing to record certain parameters. In view of this, a CO prediction model needs to be developed to address the problem. A dataset of meteorological and air pollutant values was obtained from the Air Quality Division, Department of Environment Malaysia (DOE). A total of 113,112 data points were used to develop the model using sensitivity analysis (SA) through an artificial neural network (ANN). SA showed that particulate matter (PM10) and ozone (O3) were the most significant input variables for the CO missing data prediction model. Three hidden nodes were the optimum number for the ANN model, giving an R² of 0.5311. Both models, artificial neural network-carbon monoxide-all parameters (ANN-CO-AP) and artificial neural network-carbon monoxide-leave out (ANN-CO-LO), showed high R² values (0.7639 and 0.5311) and low RMSE values (0.2482 and 0.3506), respectively. These values indicate that the model may employ only the most significant input variables to represent CO rather than all input variables.
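
The summary compares the two models by R² and RMSE. As a minimal sketch of how such metrics are commonly computed, the Python below uses the standard definitions (R² as 1 minus the ratio of residual to total sum of squares). The function names and the sample readings are hypothetical and made up for illustration; the record does not state which formula or software the authors used, so this should not be read as their procedure.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical CO readings and model predictions (illustrative numbers only,
# not taken from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"R2   = {r_squared(observed, predicted):.4f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
```

Under these definitions, a higher R² and a lower RMSE both indicate predictions closer to the observed values, which is the sense in which the summary contrasts ANN-CO-AP with ANN-CO-LO.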