On Hierarchical Multiple Imputation Method for Handling Missing Data

Document Type: Research Paper

Authors

1 Department of Statistics, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran

2 Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran, and Kerman Chamber of Commerce, Industries, Mines and Agriculture, Kerman, Iran

3 Kerman Chamber of Commerce, Industries, Mines and Agriculture, Kerman, Iran

Abstract

In this work, we develop a multiple imputation technique for handling missing observations. We propose an algorithm that performs hierarchical multiple imputation, using edit rules to impute missing values. We assess the algorithm in a simulation study and, for further illustration, present a numerical application to a dataset from the Kerman Chamber of Commerce, Industries, Mines and Agriculture.
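The abstract does not spell out the hierarchical algorithm itself, so the following is only a minimal sketch of the general idea it builds on: generate several completed datasets whose imputed values must satisfy edit rules, then pool the per-dataset estimates with Rubin's combining rules. Everything specific here is a hypothetical assumption for illustration, not the authors' method: the variables `sales` and `exports`, the edit rule `0 <= exports <= sales`, the random hot-deck donor draw with rejection of rule-violating draws, the clipping fallback, and the helper names `passes_edit_rules`, `impute_once`, `rubin_pool` and `multiply_impute_mean`. The hierarchical structure of the proposed algorithm is not reproduced.

```python
import random
import statistics

# HYPOTHETICAL edit rule (illustrative assumption, not taken from the paper):
# a firm's reported exports must be non-negative and may not exceed its sales.
def passes_edit_rules(record):
    return 0 <= record["exports"] <= record["sales"]


def impute_once(records, rng, max_tries=100):
    """Return a completed copy of `records`.

    Each missing "exports" value is filled by a random hot-deck draw from the
    observed values; a draw that violates the edit rule is rejected and redrawn.
    """
    donors = [r["exports"] for r in records if r["exports"] is not None]
    completed = []
    for r in records:
        new = dict(r)  # copy so the input data stay untouched
        if new["exports"] is None:
            for _ in range(max_tries):
                new["exports"] = rng.choice(donors)
                if passes_edit_rules(new):
                    break
            else:
                # Assumed fallback: clip the last draw into the range the rule allows.
                new["exports"] = min(max(new["exports"], 0), new["sales"])
        completed.append(new)
    return completed


def rubin_pool(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)   # pooled point estimate
    u_bar = statistics.mean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (needs m >= 2)
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, total_var


def multiply_impute_mean(records, m=5, seed=0):
    """Estimate the mean of "exports" from m edit-consistent completed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        values = [r["exports"] for r in impute_once(records, rng)]
        estimates.append(statistics.mean(values))
        # Variance of a sample mean, treating the data as a simple random sample.
        variances.append(statistics.variance(values) / len(values))
    return rubin_pool(estimates, variances)


# Toy records (made up for illustration); None marks a missing value.
records = [
    {"sales": 100, "exports": 40},
    {"sales": 80, "exports": 10},
    {"sales": 50, "exports": None},
    {"sales": 120, "exports": 90},
    {"sales": 30, "exports": None},
]
estimate, total_variance = multiply_impute_mean(records, m=10, seed=1)
print(f"pooled mean = {estimate:.2f}, total variance = {total_variance:.2f}")
```

In this toy data the record with sales of 30 can only accept the donor value 10, so the edit rule visibly narrows the set of admissible imputations. The between-imputation term in `rubin_pool` is what carries the extra uncertainty due to the missing values into the final variance.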

Volume 10, Issue 2
Special Issue Dedicated to Professor M. Radjabalipour on the occasion of his 75th birthday.
October 2021
Pages 103-114
  • Received: 16 June 2021
  • Revised: 01 August 2021
  • Accepted: 13 August 2021