Aldina R. Avdić, Ulfeta A. Marovac, Dragan S. Janković



As information technology develops, it is used in more and more spheres of human activity, including healthcare. Large volumes of data and reports, such as symptoms, medical histories, and doctors' observations of patients' health, are generated and stored in textual form. Electronic recording of patient data not only facilitates day-to-day work in hospitals, enables more efficient data management, and reduces material costs; the recorded data can also be processed further to gain knowledge that improves public health. Publicly available health data would contribute to the development of telemedicine, e-health, epidemic control, and smart healthcare within smart cities. This paper describes the importance of textual data normalization for smart healthcare services. An algorithm for normalizing medical data in Serbian is proposed to prepare the data for further processing (F1-score = 0.816), in this case within the smart health framework. In addition to the normalized medical records, applying the algorithm yields corpora of keywords and stop words specific to the medical domain, which can be used to improve the normalization of medical textual data.


telemedicine; e-health; epidemic control; smart healthcare; medical data mining.








© University of Niš | Created on November, 2013
ISSN 0352-9665 (Print)
ISSN 2406-047X (Online)