Comparison of the Influence of Different Normalization Methods on Tweet Sentiment Analysis in the Serbian Language

Adela Ljajić, Ulfeta Marovac, Milena Stanković

Given the growing need to process texts quickly and to extract information from them for various purposes, correct normalization that contributes to better and faster processing is of great importance. This paper compares different methods of normalizing short texts (tweets). The comparison is illustrated using text sentiment analysis as an example. The results of applying the different normalizations are presented, taking into account both time complexity and the classification accuracy of the sentiment algorithm. Normalization by cutting to n-grams is shown to give better or similar results compared with language-dependent normalizations. When time complexity is also taken into account, it is concluded that this language-independent normalization gives optimal results in the classification of short informal texts.
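
As an illustration only, cutting to n-grams can be sketched as keeping the first n characters of each token. The sketch below assumes that reading of the method. The function names, the regular-expression tokenizer, and the default n = 4 are hypothetical choices made for this example and are not taken from the paper.

```python
import re


def cut_to_ngram(token: str, n: int = 4) -> str:
    """Keep the first n characters of a token; shorter tokens stay unchanged."""
    return token[:n]


def normalize_tweet(text: str, n: int = 4) -> list[str]:
    """Lowercase a tweet, split it into word tokens, and cut each token to n characters.

    The tokenizer and the default n are assumptions made for illustration.
    """
    # \w is Unicode-aware in Python 3, so Serbian letters such as č, ć, š, ž, đ are kept.
    tokens = re.findall(r"\w+", text.lower())
    return [cut_to_ngram(token, n) for token in tokens]


if __name__ == "__main__":
    print(normalize_tweet("Volim programiranje i programere"))
```

Under these assumptions, different inflected forms of a word that share their first n characters map to the same feature without any language-specific stemmer or lemmatizer, which is what makes the approach language-independent and cheap to compute.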


sentiment analysis, normalization, stemming, n-gram, lemmatization, data mining

