Data envelopment analysis with missing data

Article type: Research paper

Authors

  • Bahman Fasihi 1
  • Hossein Azizi 2
  • Zeynab Gholizadeh Gazvar 3
1 PhD Student, Department of Mathematics, Parsabad Moghan Branch, Islamic Azad University, Parsabad Moghan, Iran
2 Assistant Professor, Department of Mathematics, Parsabad Moghan Branch, Islamic Azad University, Parsabad Moghan, Iran
3 M.Sc., Department of Mathematics, Parsabad Moghan Branch, Islamic Azad University, Parsabad Moghan, Iran

Abstract

Missing data are a chronic problem in applications of data envelopment analysis (DEA). Very often, important input or output variables have incomplete coverage, or the decision-making units do not report all the required statistics. Consequently, data sets with missing input or output values cannot be analyzed with the standard DEA models. This paper presents methods for handling missing values when the data are crisp (deterministic). After explaining the essential concepts of missing data, we describe several imputation methods that reduce the complexity of the data analysis, including various forms of simple imputation and multiple imputation. This paper is the first systematic attempt to make use of data containing missing values through statistical approaches in DEA. In particular, we examine what happens if we keep the blank entries in the data set and assign a specific numeric value to them. To illustrate how the proposed methods work, we apply them to the evaluation of a set of public secondary schools in Greece, some of which have missing input or output values.
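
As a rough illustration of the idea of keeping blank entries and assigning them a numeric value, the following minimal sketch fills each missing entry with the mean of the observed entries (one form of simple imputation) and then computes efficiency scores. It is not the procedure of the paper: the function names, the school data, and the choice of the mean as the fill value are all assumptions made for illustration, and the efficiency score is the special case of one input and one output under constant returns to scale, where it reduces to productivity relative to the best unit.

```python
from statistics import mean


def mean_impute(values):
    """Simple imputation: replace each None by the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def crs_efficiency(inputs, outputs):
    """Efficiency with one input and one output under constant returns to scale.

    In this special case the score of a unit is its output/input ratio
    divided by the largest ratio among all units.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data for four schools; None marks a missing entry.
teachers = [20.0, 30.0, None, 50.0]      # input
graduates = [100.0, 120.0, 90.0, None]   # output

x = mean_impute(teachers)
y = mean_impute(graduates)
scores = crs_efficiency(x, y)

for unit, score in enumerate(scores, start=1):
    print(f"School {unit}: efficiency = {score:.3f}")
```

The fill value strongly affects the scores of the units with blank entries and, through the frontier, can affect the other units as well, which is why the choice of imputation method matters.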

Keywords

  • Data envelopment analysis
  • Missing data
  • Efficiency measurement