### COMPARATIVE STUDY OF MUTATION OPERATORS IN THE GENETIC ALGORITHMS FOR THE K-MEANS PROBLEM

**DOI Number**

**First page**

**Last page**

#### Abstract

The k-means problem and the algorithm of the same name are the most commonly used clustering model and algorithm. Being a local search optimization method, the k-means algorithm falls to a local minimum of the objective function (sum of squared errors) and depends on the initial solution which is given or selected randomly. This disadvantage of the algorithm can be avoided by combining this algorithm with more sophisticated methods such as the Variable Neighborhood Search, agglomerative or dissociative heuristic approaches, the genetic algorithms, etc. Aiming at the shortcomings of the k-means algorithm and combining the advantages of the k-means algorithm and rvolutionary approack, a genetic clustering algorithm with the cross-mutation operator was designed. The eﬃciency of the genetic algorithms with the tournament selection, one-point crossover and various mutation operators (without any mutation operator, with the uniform mutation, DBM mutation and new cross-mutation) are compared on the data sets up to 2 millions of data vectors. We used data from the UCI repository and special data set collected during the testing of the highly reliable semiconductor components. In this paper, we do not discuss the comparative eﬃciency of the genetic algorithms for the k-means problem in comparison with the other (non-genetic) algorithms as well as the comparative adequacy of the k-means clustering model. Here, we focus on the inﬂuence of various mutation operators on the eﬃciency of the genetic algorithms only.

#### Keywords

#### Full Text:

PDF#### References

Z.-W. Xu: Cloud-Sea Computing Systems: Towards Thousand-Fold Improvement in Performance per Watt for the Coming Zettabyte Era. Journal of Computer Science and Technology, 29 (2) (2014), 177-181, DOI:10.1007/s11390-014-14202.

S. Vempala and G. Wang: A spectral algorithm for learning mixtures of distributions. FOCS. (2002), 841-860.

L. Kazakovtsev and A. Antamoshkin: Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica, 38 (3) (2014), 229240.

A. S. Shirkhorshidi, S. Aghabozorgi and T. Y. Wah: A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data. PLoS ONE 10 (12) (2015), e0144059, DOI:10.1371/journal.pone.0144059.

P. Olukanmi, F. Nelwamondo and T. Marwala: Rethinking k-means clustering in the age of massive datasets: a constant-time approach. Neural Comput & Applic. (2019), DOI:10.1007/s00521-019-04673-0

H. Steinhaus: Sur la division des corps materiels en parties. Bull. Acad. Polon. Sci. Cl. III. IV (1956), 801-804.

J. B. MacQueen: Some Methods of Classiﬁcation and Analysis of Multivariate Observations. Proceedings of the 5th Berkley Symposium on Mathematical Statistics and Probability, 1 (1967), 281–297.

H. Norman: SPSS Statistical Package for the Social Sciences. Encyclopedia of Information Systems 13(1) (2003), 187-196.

D. Arthur and S. Vassilvitskii: K-Means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA 2007), New Orleans, Louisiana, USA (2007), DOI: 10.1145/1283383.1283494.

B. B. Bhusare and S. M. Bansode: Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET). 3 Issue 4 (2014).

S. Mahmud, M. Rahman and N. Akhtar: Improvement of K-means clustering algorithm with better initial centroids based on weighted average. 7th International Conference on Electrical and Computer Engineering. IEEE. (2012), ISBN 9781467314367. DOI:10.1109/icece.2012.6471633.

M. Mitchell, J. H. Holland and S. Forrest: When Will a Genetic Algorithm Outperform Hill Climbing. Advances in neural information processing systems (1994), 51-58.

J. D. Bagley: The behavior of adaptive systems which employ genetic and correlation algorithms: technical report. University of Michigan, 1967.

J. Holland: Genetic Algorithms, computer programs that evolve in ways that even their creators do not fully understand. Scientiﬁc American 267 (1) (1992), 66-72.

D. E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York, 1989.

A. Konak, D. W. Coit and A. E. Smith: Multi-objective optimization using genetic algorithms: a tutorial. Reliability Engineering & System Safety 91(9) (2006), 992-1007, DOI:10.1016/j.ress.2005.11.018.

J. J. Grefenstette: Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems Man and Cybernetics 16 (1) (1986), 122-128.

O. Alp, E. Erkut and Z. Drezner: An Eﬃcient Genetic Algorithm for the p-Median Problem. Annals of Operations Research 122 (2003), 21-42, DOI:10.1023/A:1026130003508.

M. N. Neema, K. M. Maniruzzaman and A. Ohgai: New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ. 11, (2011), 83-99, DOI:10.1007/s11067-008-9084-5.

Comparative Study of Mutation Operators in the k-Means GA 613

M. Srinivas, L. M. Patnaik: Genetic algorithms: a survey. Computer 27 (6) (1994), p.17-26, DOI: 10.1109/2.294849.

M. Garey, D. Johnson and H. Witsenhausen: The complexity of the generalized Lloyd - Max problem. IEEE Transactions on Information Theory 28 (2) (1982), 255-256. DOI:10.1109/TIT.1982.1056488.

D. Aloise, A. Deshpande, P. Hansen and P. Popat: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75 (2) (2009) 245-249, DOI:10.1007/s10994-009-5103-0.

S. Dasgupta and Y. Freund: Random Projection Trees for Vector Quantization. IEEE Transactions on Information Theory 55 (7) (2009), 32293242, DOI:10.1109/TIT.2009.2021326.

D. Goldberg: Genetic Algorithms in Search, Optimization, And Machine Learning. Addison-Wesley, New York, (1989).

P. Chi: Genetic Search with Proportion Estimation Proceedings of the Third Int. Con. on Genetic Algorithms (ICGA), San Mateo, California (1989), 92-97.

Y. Hu, J. Bi: K-means clustering algorithm based on genetic optimization. Journal of Computer System Applications 19(6) (2010), 52-55.

J. A. Hartigan and M. A. Wong: Algorithm AS 136:A K-means clustering algorithm. Appl. Stat. 28(1) (2013), 100-108.

K. Krishna and M. Murty: Genetic K-means algorithm. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 29(3) (1999), 433-439.

P. Rousseeuw: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53-65.

B. Auffarth: Clustering by a genetic algorithm with biased mutation operator. IEEE Congress on Evolutionary Computation, Barcelona (2010), 1-8, DOI: 10.1109/CEC.2010.5586090.

G. Schwarz: Estimating the Dimension of a Model. Annals of Statistics 6 (2) (1978), 461-464, DOI:10.1214/aos/1176344136.

R. Tibshirani, G. Walther and T. Hastie: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society 36 (2001), 411-423.

W. M. Rand: Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association 66(336) (1971), 846-850, DOI:10.1080/01621459.1971.10482356.

J. H. Holland: Adaptation in natural and artiﬁcial systems. MIT Press, Cambridge (1992).

D. B. Fogel and J. Atmar: Comparing genetic operators with gaussian mutations in simulated evolutionary processes using linear systems. Biol. Cybern. 63 (1990), 111-114.

C. Liu and A. Kroll: On designing genetic algorithms for solving small- and medium-scale traveling salesman problems. LNCS 7269 (2012), 283-291.

E. Osaba, R. Carballedo, F. Diaz, E. Onieva, I. de la Iglesia and A. Perallos: Crossover versus mutation: a comparative analysis of the evolutionary strategy of genetic algorithms applied to combinatorial optimization problems. Sci. World. J. (2014), DOI:10.1155/2014/154676

J. Walkenhorst, T. Bertram: (2011) Multikriterielleoptimierungsverfahren fr pickup-and-delivery-probleme. Proceedings of 21. Workshop computational intelligence, Dortmund, Germany (2011), 6176.

S. S. Cheng, Y. H. Chao, H. M. Wang and H. C. Fu: A Prototypes-Embedded Genetic K-means Algorithm. 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong (2006), 724-727, DOI:10.1109/ICPR.2006.155.

E. S. Correa, M. T. A. Steiner, A. A. Freitas and C. Carnieri: A Genetic Algorithm for the P-median Problem. Proc. 2001 Genetic and Evolutionary Computation Conference (GECCO-2001), San Francisco (2001), 1268-1275.

Y. Alkhalifah and R. L. Wainwright: A genetic algorithm applied to graph problems involving subsets of vertices. Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No.04TH8753), Portland, OR, USA 1 (2004), 303-308, DOI:10.1109/CEC.2004.133087.

R. Dash and R. Dash: Comparative Analysis of K-means and Genetic Algorithm based Data Clustering. International Journal of Advanced Computer and Mathematical Sciences 3 (2) (2012), 257-265.

M. Mahmoudi and K. Shahanaghi: A Genetic Algorithm For P-Median Location Problem. IJERA 3 (1) (2013), 386-389.

L. A. Kazakovtsev, V. I. Orlov, A. A. Stupina and V. L. Kazakovtsev: Modiﬁed genetic algorithm with greedy heuristic for continuous and discrete pmedian problems. Facta universitatis - series: Mathematics and Informatics 30 (1) (2015), 89-106.

N. Alibabaie, M. Ghasemzadeh and C. Meinel: A variant of genetic algorithm for non-homogeneous population. ITM Web of Conferences 9 (2017), 02001, DOI:10.1051/itmconf/20170902001.

O. Hall and I. Barak, J. C. Bezdek: Clustering with a genetically optimized approach. IEEE Trans. Evo. Computation 3(3) (1999), 103-112.

I. P. Rozhnov, V. I. Orlov and L. A. Kazakovtsev: VNS-Based Algorithms for the Centroid-Based Clustering Problem. Facta Universitatis - Series: Mathematics and Informatics 34 (5) (2019), 957-972, DOI:10.22190/FUMI1905957R.

Individual household electric power consumption Data Set. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption], access date 28.05.2020.

V. I. Orlov and V. V. Fedosov: ERC clustering dataset (2016) [http://levk.info/Data1526 7parts.csv].

V. I. Orlov, D. V. Stashkov, L. A. Kazakovtsev, I. P. Rozhnov, O. B. Kazakovtseva and I. R. Nasyrov: Improved method of forming production batchs of electronic components with special quality requirements. Modern high technology. 1 (2018), 37-42.

X. Tao, X. Hu and Y. Liu: Review of big data research. Journal of System Simulation (2013), 142-146.

Y.Wang, X. Jin and X. Cheng: Network Big Data: Status and Prospect. Chinese Journal of Computers 06 (2013), 3-16.

DOI: https://doi.org/10.22190/FUMI2004091L

### Refbacks

- There are currently no refbacks.

ISSN 0352-9665 (Print)