COMPARATIVE STUDY: FEATURE SELECTION METHODS IN THE BLENDED LEARNING ENVIRONMENT

Gabrijela Dimić, Dejan Rančić, Ivan Milentijević, Petar Spalević, Katarina Plećić

First page: 95
Last page: 116

Abstract


The research presented in this paper deals with discovering unknown patterns of student behavior in a blended learning environment. In order to improve prediction accuracy, it was necessary to define a methodology for assessing students' activities. The training set was created by combining distributed sources: the Moodle database and data from the traditional learning process. The methodology emphasizes the data mining preprocessing phase, namely transformation and feature selection. The Information Gain, Symmetrical Uncertainty, ReliefF, Correlation-based Feature Selection, Wrapper Subset Evaluation and Classifier Subset Evaluator feature selection methods were applied to find the most relevant subset of attributes. Statistical dependence was determined by calculating the mutual information measure. Naïve Bayes, Aggregating One-Dependence Estimators (AODE), decision tree and Support Vector Machine classifiers were trained on subsets of different cardinality. The models were evaluated through a comparative analysis of statistical parameters and of the time required to build them. We conclude that ReliefF, Wrapper Subset Evaluation and mutual information are the most suitable feature selection methods for the blended learning environment. The major contribution of the presented research is the selection of an optimal low-cardinality subset of students' activities and a significant improvement of prediction accuracy in the blended learning environment.
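The ranking step outlined above can be reproduced with standard attribute selection tooling. The sketch below is a minimal illustration only, assuming the WEKA 3 Java API (the evaluator names listed in the abstract follow WEKA's terminology) and a hypothetical students.arff export of the combined Moodle/classroom training set; it ranks the activity attributes with ReliefF and keeps a low-cardinality subset, in the spirit of the approach described here rather than as the authors' exact procedure.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.Ranker;
import weka.attributeSelection.ReliefFAttributeEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankStudentActivities {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export of the blended-learning training set;
        // the last attribute is assumed to be the class (e.g. final grade category).
        Instances data = DataSource.read("students.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // ReliefF scores each attribute by how well it separates an instance
        // from its nearest neighbours belonging to a different class.
        ReliefFAttributeEval relief = new ReliefFAttributeEval();

        // Ranker orders all attributes by merit; keep a low-cardinality subset (here 5).
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(5);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(relief);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        // Print the names of the selected attributes
        // (the class attribute may also appear in the returned index list).
        for (int idx : selector.selectedAttributes()) {
            System.out.println(data.attribute(idx).name());
        }
    }
}

Swapping ReliefFAttributeEval for InfoGainAttributeEval in the same pipeline yields an Information Gain ranking; for discrete attributes this score equals the mutual information I(X;Y) = H(Y) - H(Y|X) between an activity attribute X and the class Y, the same dependence measure referred to in the abstract.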

Keywords

Blended Learning, Educational Data Mining, Feature Selection, Mutual Information



DOI: https://doi.org/10.22190/FUACR1702095D



Print ISSN: 1820-6417
Online ISSN: 1820-6425