THE INFLUENCE OF TEXT PREPROCESSING METHODS AND TOOLS ON CALCULATING TEXT SIMILARITY
DOI: https://doi.org/10.22190/FUMI1905973D
ISSN 0352-9665 (Print)