Comparative Evaluation of Chinese Generative AI Models in Solving Curriculum-Aligned Middle School Mathematics Problems

Shasha Song; Chenyu Meng; Zezhong Yang

doi:10.56557/jogress/2026/v20i110342

Published: 2026-03-11

Page: 169-181

Issue: 2026 - Volume 20 [Issue 1]

Shasha Song

School of Mathematics and Statistics, Shandong Normal University, Jinan, Shandong, China.

Chenyu Meng

School of Mathematics and Statistics, Shandong Normal University, Jinan, Shandong, China.

Zezhong Yang *

School of Mathematics and Statistics, Shandong Normal University, Jinan, Shandong, China.

*Author to whom correspondence should be addressed.

Abstract

Generative artificial intelligence has become an important auxiliary tool for teaching and learning middle school mathematics. However, the ability of Chinese large language models to solve middle school mathematics problems has not yet been systematically evaluated. This study examines five mainstream generative AI models (Tencent Yuanbao, DeepSeek, Doubao, Kimi, and Wenxin Yiyan), using 18 middle school mathematics problems covering three modules (algebra, geometry, and probability) as test samples. The models were compared along four dimensions: problem-solving efficiency, result accuracy, solution completeness, and logical rigor. Solution completeness was judged by whether a response included the full problem-solving steps, the reasoning process, and any necessary verification; logical rigor was judged by the coherence of the steps and the soundness of the grounds given for each inference. Across the five models, overall accuracy ranged from 61.11% to 77.78%, solution completeness from 77.78% to 88.89%, and logical rigor from 83.33% to 94.44%. The Chinese generative AI models performed well on algebra and probability problems but poorly on geometry problems, owing to issues such as inaccurate image recognition and incomplete comprehension of the questions. The five models also differed markedly in problem-solving capability: Doubao and Tencent Yuanbao delivered well-balanced overall performance with detailed problem-solving processes, whereas each of the other models had its own shortcomings.
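
As a reading aid, the percentages reported in the abstract can be related back to the 18-problem test set. The sketch below assumes that each problem was scored as a simple pass or fail on each dimension, which the abstract does not state; the function name `implied_count` and the dictionary layout are illustrative only and are not taken from the paper.

```python
# Sketch: recover the problem counts implied by the reported percentages,
# ASSUMING binary (pass/fail) scoring per problem on each dimension.
# The abstract gives only the percentages and the total of 18 problems.

TOTAL_PROBLEMS = 18  # stated in the abstract


def implied_count(percent, total=TOTAL_PROBLEMS, tol=0.005):
    """Return the count k such that 100*k/total rounds to `percent`
    at two decimal places, or None if no whole number of problems fits."""
    for k in range(total + 1):
        if abs(100 * k / total - percent) < tol:
            return k
    return None


# (lowest, highest) percentage reported across the five models
reported_ranges = {
    "accuracy": (61.11, 77.78),
    "solution completeness": (77.78, 88.89),
    "logical rigor": (83.33, 94.44),
}

for dimension, (low, high) in reported_ranges.items():
    print(f"{dimension}: {implied_count(low)} to {implied_count(high)} "
          f"of {TOTAL_PROBLEMS} problems")
```

Under that assumption, every reported figure corresponds to a whole number of problems out of 18, so a one-problem difference between two models shifts a score by roughly 5.56 percentage points.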

Keywords: Generative artificial intelligence, middle school mathematics, problem-solving ability, model comparison

How to Cite

Song, Shasha, Chenyu Meng, and Zezhong Yang. 2026. “Comparative Evaluation of Chinese Generative AI Models in Solving Curriculum-Aligned Middle School Mathematics Problems”. Journal of Global Research in Education and Social Science 20 (1):169-81. https://doi.org/10.56557/jogress/2026/v20i110342.
