With the rapid evolution of Artificial Intelligence (AI), Large Language
Models (LLMs) have reshaped the frontiers of various fields, spanning
healthcare, public health, engineering, science, agriculture, education, arts,
humanities, and mathematical reasoning. Among these advancements, DeepSeek
models have emerged as noteworthy contenders, demonstrating promising
capabilities that set them apart from their peers. While previous studies have
conducted comparative analyses of LLMs, few have delivered a comprehensive
evaluation of mathematical reasoning across a broad spectrum of LLMs. In this
work, we aim to bridge this gap by conducting an in-depth comparative study,
focusing on the strengths and limitations of DeepSeek models in relation to
their leading counterparts. In particular, our study systematically evaluates
the mathematical reasoning performance of two DeepSeek models alongside five
prominent LLMs across three independent benchmark datasets. The findings reveal
several key insights: (1) DeepSeek-R1 achieved the highest accuracy on two of
the three datasets, demonstrating strong mathematical reasoning capabilities;
(2) the distilled variant significantly underperformed its peers, highlighting
potential drawbacks of distillation techniques; and (3) Gemini 2.0 Flash had
the fastest response time, a crucial factor for real-time applications. Beyond
these
quantitative assessments, we examine how architecture, training, and
optimization affect LLMs' mathematical reasoning. Moreover, our study goes
beyond mere performance comparison by identifying key areas for improvement in
LLM-driven mathematical reasoning. This research enhances our
understanding of LLMs' mathematical reasoning and lays the groundwork for
future advancements.