SUCCESS OF AI MATH SOLVER TOOL IN SOLVING NON-STANDARD MATHEMATICS COMPETITION PROBLEMS

Marko Stanković; Aleksandar Milenković; Marina Svičević; Nemanja Vučićević

doi:10.22190/FUTLTE250429005S

Abstract

Artificial intelligence is increasingly transforming how students learn, including how they approach mathematics and problem solving, by offering additional support and assistance, and this trend continues to attract research interest. One line of research focuses on helping students prepare for mathematics competitions, which require solving more complex mathematical problems. In addition to regular national mathematics competitions, through which students can progress to international mathematical Olympiads, there are also competitions aimed at popularizing mathematics and developing students’ logical thinking. One such competition is the international Kangaroo competition. In this paper, we examine the performance of the AI Math Solver, available on the Interactive Mathematics platform, in solving tasks from the 2024 Kangaroo competition. The selected tasks targeted three student categories: 3rd and 4th grade elementary, 7th and 8th grade elementary, and 3rd and 4th grade high school students. The problems were uploaded as images (screenshots) in both Serbian and English, since visual elements frequently appear in the problem formulations and answer choices in the Kangaroo competition. The results are presented in two sections: a qualitative analysis of selected problems that illustrate common patterns and errors, and a quantitative analysis that summarizes the tool’s overall performance. Out of a total of 84 tasks, in both Serbian and English, the solver correctly answered 24, corresponding to a success rate of just under 30% in both languages. Furthermore, some tasks solved in Serbian were not solved in English, and vice versa. Additionally, the distribution of correct answers differed across tasks of varying difficulty levels.
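
As a quick check of the reported figure, the short sketch below recomputes the success rate from the two numbers given in the abstract. It assumes that the 24 correct answers are counted out of 84 tasks in each language separately; the variable names are illustrative and do not come from the paper.

```python
# Figures taken from the abstract; variable names are illustrative only.
# Assumption: 24 correct answers out of 84 tasks, counted per language.
total_tasks = 84
correct_answers = 24

success_rate = correct_answers / total_tasks
print(f"Success rate: {success_rate:.1%}")  # prints "Success rate: 28.6%"
```

The result, about 28.6%, is consistent with the stated success rate of just under 30%.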

Keywords

AI tools, Kangaroo competition, math education, non-standard tasks

ISSN 2560-4600 (Print)
ISSN 2560-4619 (Online)