A STRUCTURE BASED ON TROCR TRANSFORMER AND LARGE LANGUAGE MODEL FOR CLASSIFICATION OF HANDWRITTEN TEXTS

Hossein KardanMoghaddam, Adel Akbarimajd, Mohammad Ranjbarpour, Mahdi Nooshyar, Shahram Jamali

DOI: https://doi.org/10.2298/FUEE2504697K
First page: 697
Last page: 714

Abstract


Processing handwritten text, classifying it, and analyzing its content are among the most important problems in text analysis. Microsoft has released pre-trained TrOCR models for printed and handwritten text; thanks to this pre-training, the models are a strong starting point for recognition tasks. To apply TrOCR to printed or handwritten text recognition, the pre-trained model can be fine-tuned on different datasets, which helps it learn the specific features of handwritten or semi-handwritten text. TrOCR uses transformer models for OCR, and fine-tuning it on specialized datasets, especially handwritten ones, is common practice. In this research, a structure based on TrOCR and a large language model (LLM) is proposed: Microsoft's TrOCR model extracts handwritten text from the images of the English handwritten line dataset and converts it to textual data, which is then given to an LLM (the BART model) as input so that the extracted texts can be classified by subject and content.
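The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, assuming the Hugging Face `transformers` API; the checkpoint names (`microsoft/trocr-base-handwritten`, `facebook/bart-large-mnli`) and the image file name are illustrative stand-ins, not necessarily the exact configuration used in the paper.

```python
# Sketch of the proposed structure: TrOCR turns a handwritten line image
# into a string, then a BART checkpoint classifies the string by topic.


def recognize_line(image_path: str) -> str:
    """Stage 1: recognize one handwritten line image with pre-trained TrOCR."""
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
    model = VisionEncoderDecoderModel.from_pretrained(
        "microsoft/trocr-base-handwritten"
    )
    pixel_values = processor(
        Image.open(image_path).convert("RGB"), return_tensors="pt"
    ).pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


def classify_text(text: str, labels: list) -> dict:
    """Stage 2: zero-shot topic classification with a BART-MNLI checkpoint."""
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification", model="facebook/bart-large-mnli"
    )
    return classifier(text, candidate_labels=labels)


def top_label(result: dict) -> str:
    """The zero-shot pipeline returns labels sorted by score, best first."""
    return result["labels"][0]


if __name__ == "__main__":
    # "line_01.png" is a hypothetical line image from the dataset.
    text = recognize_line("line_01.png")
    result = classify_text(text, ["sports", "politics", "science", "daily life"])
    print(f"{text!r} -> {top_label(result)}")
```

In this sketch the classifier is used zero-shot; the paper's fine-tuning step would instead adapt the TrOCR weights to the handwritten-line dataset before recognition.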

Keywords

Large Language Model, Fine-tuning, TrOCR, Neural Network, Deep Learning


References


D. W. Otter, J. R. Medina and J. K. Kalita, "A Survey of the Usages of Deep Learning for Natural Language Processing," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 2, pp. 604-624, Feb. 2021.

A. Khosravi and H. Abdolhosseini, "Personality in Social Networks Using Thematic Modelling of User Feedback", Soft Comput. J., vol. 11, no. 2, pp. 51-60, 2023.

F. Pourgholamali, M. Kahani and E. Asgarian, "Exploiting Big Data Technology for Opinion Mining", Soft Comput. J., vol. 9, no. 1, pp. 26-39.

F. A. Acheampong, C. Wenyu and H. Nunoo-Mensah, "Text-based Emotion Detection: Advances, Challenges, and Opportunities", Eng. Rep., vol. 2, no. 7, p. e12189, 2020.

B. Kratzwald, S. Ilić, M. Kraus, S. Feuerriegel and H. Prendinger, "Deep Learning for Affective Computing: Text-Based Emotion Recognition in Decision Support," Decis. Support Syst., vol. 115, pp. 24-35, Nov. 2018.

F. Zare Mehrjardi, M. Yazdian-Dehkordi and A. Latif, "Evaluating Classical Machine Learning and Deep-Learning Methods in Sentiment Analysis of Persian Telegram Message", Soft Comput. J., vol. 11, no. 1, pp. 88-105, 2022.

M. Feizi-Derakhshi, Z. Mottaghinia and M. Asgari-Chenaghlu, "Persian Text Classification Based on Deep Neural Networks", Soft Comput. J., vol. 11, no. 1, pp. 120-139.

S. Freyberg and H. Hauser, "The Morphological Paradigm in Robotics", Stud. Hist. Philos. Sci., vol. 100, pp. 1-11, 2023.

A. Ganesh, A. Jaya and C. Sunitha, "An Overview of Semantic Based Document Summarization in Different Languages", ECS Trans., vol. 107, no. 1, pp. 6007-6017, 2022.

L. Geiszler, "Imitation in Automata and Robots: A Philosophical Case Study on Kempelen", Stud. Hist. Philos. Sci., vol. 100, pp. 22-31, Aug. 2023.

V. K. Finn, "Exact Epistemology and Artificial Intelligence", Autom. Document. Math. Linguist., vol. 54, pp. 140-173, 2020.

F. Ansari, "Knowledge Management 4.0: Theoretical and Practical Considerations in Cyber Physical Production Systems", IFAC-PapersOnLine, vol. 52, no. 13, pp. 1597-1602, 2019.

E. Baralis, L. Cagliero, S. Jabeen, A. Fiori and S. Shah, "Combining Semantics and Social Knowledge for News Article Summarization", in Data Mining and Analysis in the Engineering Field, IGI Global, 2014, pp. 209-230.

L. Waardenburg and M. Huysman, "From Coexistence to Co-creation: Blurring Boundaries in the Age of AI", Inf. Organ., vol. 32, no. 4, p. 100432, Dec. 2022.

S. Gupta, S. Modgil, A. Kumar, U. Sivarajah and Z. Irani, "Artificial Intelligence and Cloud-Based Collaborative Platforms for Managing Disaster, Extreme Weather and Emergency Operations", Int. J. Prod. Econ., vol. 254, p. 108642, Dec. 2022.

C. R. Dhivyaa, K. Nithya, R. Dharshini, R. Sudhakar, K. Sathis Kumar and T. Janani, "Fine-Tuned Convolutional Neural Networks for Tamil Handwritten Text Recognition", In Proceedings of the 8th International Conference on Communication and Electronics Systems (ICCES), 2023, pp. 887-893.

D. Parres and R. Paredes, "Fine-Tuning Vision Encoder–Decoder Transformers for Handwriting Text Recognition on Historical Documents", In Proceedings of the International Conference on Document Analysis and Recognition, 2023, pp. 253-268.

M. P. Kalra, A. Kushwaha and P. P. Vuppuluri, "LLM Powered HTR: Integrating Handwritten Text Recognition System with Large Language Model," In Proceedings of IEEE Students Conference on Engineering and Systems (SCES), Prayagraj, India, 2024, pp. 1-6.

J. Kohút and M. Hradiš, "Fine-tuning is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition", In Proceedings of the International Conference on Document Analysis and Recognition, arXiv:2302.06308, 2023.

P. Kumar and B. Raman, "A BERT Based Dual-Channel Explainable Text Emotion Recognition System," Neural Netw., vol. 150, pp. 392-407, June 2022.

M. Badpeima, H. Shirazi and S. S. Sadidpur, "Determining the Polarity of Persian Texts Using LSTM Recurrent Networks", In Proceedings of the 3rd International Conference on Electrical, Electronic, and Computer Engineering, Norway, 2016 [In Persian].

A. Onan, "Sentiment Analysis on Massive Open Online Course Evaluations: A Text Mining and Deep Learning Approach", Comput. Appl. Eng. Educ., vol. 29, no. 3, pp. 572-589, 2021.

H. Han, Z. Ke, X. Nie, L. Dai and W. Slamu, "Multimodal Fusion with Dual-Attention Based on Textual Double-Embedding Networks for Rumor Detection," Appl. Sci., vol. 13, no. 8, p. 4886, 2023.

N. Majma and S. Bashtin, "Detection of Plagiarism in Scientific Texts Based on Text Blocking and Cosine Similarity Criteria", Soft Comput. J., vol. 11, no. 1, pp. 60-71, 2022.

R. Behzadidoost, F. Mahan and H. Izadkhah, "Granular Computing-Based Deep Learning for Text Classification," Inf. Sci., vol. 652, p. 119746, 2024.

B. Shi, X. Bai and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 11, pp. 2298-2304, 2017.

M. Li, T. Lv, J. Chen, L. Cui, Y. Lu, D. Florencio, C. Zhang, Z. Li and F. Wei, "TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models", In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, pp. 13094-13102, 2023.

X. Zhou, C. Yao, H. Wen, et al., "EAST: An Efficient and Accurate Scene Text Detector", In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5551-5560.

H. Sak, A. Senior and F. Beaufays, "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition", arXiv:1402.1128, 2014.

EasyOCR. [Online]. Available: https://github.com/JaidedAI/EasyOCR

PaddleOCR. [Online]. Available: https://github.com/PaddlePaddle/PaddleOCR

keras-ocr. [Online]. Available: https://github.com/faustomorales/keras-ocr

G. Kim, T. Hong, M. Yim, J. Y. Nam, J. Park, J. Yim, W. Hwang, S. Yun, D. Han and S. Park, "OCR-Free Document Understanding Transformer", In Proceedings of the European Conference on Computer Vision, 2022, pp. 498-517.

Y. Xu, M. Li, L. Cui, S. Huang, F. Wei and M. Zhou, "LayoutLM: Pre-Training of Text and Layout for Document Image Understanding", In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1192-1200.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez and I. Polosukhin, "Attention is All You Need", Adv. Neural Inf. Process. Syst., vol. 30, pp. 1-15, 2017.

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell and S. Agarwal, "Language Models Are Few-Shot Learners", Adv. Neural Inf. Process. Syst., vol. 33, pp. 1877-1901, 2020.

J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat and R. Avila, "GPT-4 Technical Report", arXiv preprint arXiv:2303.08774, 2023.

J. Devlin, M. W. Chang, K. Lee and K. Toutanova, "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding", In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171-4186.

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena and P. J. Liu, "Exploring the Limits of Transfer Learning with a Unified Text-To-Text Transformer", J. Mach. Learn. Res., vol. 21, no. 140, pp. 1-67, 2020.

K. Lv, Y. Yang, T. Liu, Q. Gao, Q. Guo and X. Qiu, "Full Parameter Fine-Tuning for Large Language Models with Limited Resources", arXiv preprint arXiv:2306.09782, 2023.

D. Narayanan, M. Shoeybi, J. Casper, P. LeGresley, M. Patwary, V. Korthikanti, D. Vainbrand, P. Kashinkunti, J. Bernauer, B. Catanzaro and A. Phanishayee, "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM", In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021, pp. 1-15.

O. Sharir, B. Peleg and Y. Shoham, "The Cost of Training NLP Models: A Concise Overview," arXiv preprint arXiv:2004.08900, 2020.

J. Dodge, G. Ilharco, R. Schwartz, A. Farhadi, H. Hajishirzi and N. Smith, "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping", arXiv preprint arXiv:2002.06305, 2020.

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models", arXiv preprint arXiv:2307.09288, 2023.

Research Group on Computer Vision and Artificial Intelligence databases. [Online]. Available: https://fki.tic.heia-fr.ch/databases

Handwritten Line Text Recognition using Deep Learning with TensorFlow. [Online]. Available: https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow

English handwritten line dataset. [Online]. Available: https://www.kaggle.com/datasets/sushant097/english-handwritten-line-dataset

M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov and L. Zettlemoyer, "BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension", arXiv preprint arXiv:1910.13461, 2019.

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach", arXiv preprint arXiv:1907.11692, 2019.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei and I. Sutskever, "Language Models are Unsupervised Multitask Learners", OpenAI Blog, vol. 1, no. 8, p. 9, 2019.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", arXiv preprint arXiv:2010.11929, 2020.

Y.-M. Chae and T. Davidson, "Large Language Models for Text Classification: From Zero-Shot Learning to Fine-Tuning", SocArXiv: 10.31235/osf.io/sthwk, Aug. 2023.

Z. Wang, Y. Pang and Y. Lin, "Large Language Models Are Zero-Shot Text Classifiers", arXiv preprint arXiv:2312.01044, 2023.

S. Zhong, J. Zeng, Y. Yu and B. Lin, "Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs", Int. J. Data Sci. Anal., vol. 2025, pp. 1-22, 2025.

F. Dennstaedt, P. Windisch, I. Filchenko, J. Zink, P. M. Putora, A. Shaheen, R. Gaio, N. Cihoric, M. Wosny, S. Aeppli, et al., "Application of a General Large Language Model-Based Classification System to Retrieve Information about Oncological Trials", medRxiv: 10.1159/000546946, 2024.

Y. Guo, A. Ovadje, M. A. Al-Garadi and A. Sarker, "Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data", J. Am. Med. Inform. Assoc., vol. 31, no. 10, pp. 2181-2189, 2024.

M. Liu and G. Shi, "Enhancing LLM-Based Text Classification in Political Science: Automatic Prompt Optimization and Dynamic Exemplar Selection for Few-Shot Learning", arXiv preprint arXiv:2409.01466, 2024.

M. M. Mohajeri, M. J. Dousti and M. N. Ahmadabadi, "CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt", arXiv preprint arXiv:2411.08979, 2024.

Y. Zhang, M. Wang, Q. Li, P. Tiwari and J. Qin, "Pushing the Limit of LLM Capacity for Text Classification", In Proceedings of the ACM on Web Conference, 2025, pp. 1524-1528.

K. Yin, C. Liu, A. Mostafavi and X. Hu, "CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-Label Social Media Text Classification in Disaster Informatics", arXiv preprint arXiv:2406.15477, 2024.

F. Di Palo, P. Singhi and B. Fadlallah, "Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale", arXiv preprint arXiv:2411.05045, 2024.




ISSN: 0353-3670 (Print)

ISSN: 2217-5997 (Online)

COBISS.SR-ID 12826626