MATH-Vision

Measuring Multimodal Mathematical Reasoning

NeurIPS DB Track, 2024

Paper Code

🏆

Main Leaderboard

🏆

Open Source Leaderboard

🌿

Wild Leaderboard

🤗

Dataset

🌿

Wild Dataset

🔮

Visualization

[2024-05-20] (a) Zero-shot accuracies of four prominent Large Multimodal Models (LMMs), random chance, and human performance are evaluated on our proposed MATH-V across 16 subjects. Teal means newly introduced subjects. (b) Examples of easy problems in MATH-V failed by top-performing LMMs on MathVista. The three questions come from tests designed for elementary school students.

[2024-02-21] The accuracies of four prominent Large Multimodal Models (LMMs), random chance, and human performance are evaluated on our proposed MATH-Vision (MATH-V) across 16 subjects and 5 levels of difficulty, with Level 1 being the easiest and Level 5 the most challenging. Human performance is assessed using the testmini subset.

Introduction

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks.

To address this issue, we present the Logo MATH-Vision dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.

Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on MATH-V, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development.

Main Leaderboard

All models (proprietary and open-weight) on the full 3,040-example MATH-Vision test set.

🏆 Open Source Leaderboard 🌿 MATH-Vision-Wild Leaderboard

🚨 To submit your results to the leaderboard, please send to this email.

#	Model	Source	Date	ALL	Alg	AnaG	Ari	CombG	Comb	Cnt	DescG	GrphT	Log	Angle	Area	Len	SolG	Stat	Topo	TransG
0	Human	Link	2024-04-05	68.82	55.1	78.6	99.6	98.4	43.5	98.5	91.3	62.2	61.3	33.5	47.2	73.5	87.3	93.1	99.8	69.0
1	GPT-5.4 (xhigh reasoning, w/ Python) (3rd-party eval) 🥇	Link	2026-04-01	96.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2	Gemini 3.1 Pro (thinking high, w/ Python) (3rd-party eval) 🥈	Link	2026-04-01	95.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
3	Kimi K2.6 (w/ Python) 🥉	Link	2026-04-01	93.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4	GPT-5.4 (xhigh reasoning) (3rd-party eval)	Link	2026-04-01	92.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
5	Gemini 3.0 Flash (3rd-party eval)	Link	2026-04-28	90.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
6	Gemini 3.1 Pro (thinking high) (3rd-party eval)	Link	2026-04-01	89.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
7	Doubao-Seed-2.0-lite-0428	Link	2026-04-28	89.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
8	Doubao-Seed-2.0-pro-0215	Link	2026-02-15	88.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
9	Qwen3.5-397B-A17B	Link	2026-02-17	88.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
10	Kimi K2.6 (no tools)	Link	2026-04-01	87.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
11	Gemini 3 Pro (3rd-party eval)	Link	2026-02-17	86.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
12	Doubao-Seed-2.0-lite-0215	Link	2026-02-15	86.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
13	Qwen3.5-122B-A10B	Link	2026-02-16	86.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
14	Qwen3.5-27B	Link	2026-02-16	86.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
15	Gemma 4 31B	Link	2026-04-02	85.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
16	Kimi K2.5 (thinking, w/ Python)	Link	2026-04-01	85.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
17	Claude Opus 4.6 (max effort, w/ Python) (3rd-party eval)	Link	2026-04-01	84.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
18	Kimi K2.5	Link	2026-01-27	84.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
19	Qwen3.5-35B-A3B	Link	2026-02-16	83.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
20	GPT-5.2 (3rd-party eval)	Link	2026-01-27	83.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
21	Gemma 4 26B A4B	Link	2026-04-02	82.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
22	Seed-1.8	Link	2026-03-01	81.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
23	Seed 1.6-Thinking	Link	2025-06-25	77.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
24	Claude Opus 4.5 (3rd-party eval)	Link	2026-01-27	77.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
25	Step3-VL-10B (PaCoRe)	Link	2026-01-15	75.95	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
26	Qwen3-VL-235B-A22B-Thinking	Link	2025-09-20	74.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
27	Gemini 2.5 Pro	Link	2025-03-23	73.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
28	GPT-5	Link	2025-08-26	72.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
29	GPT-5-mini (2025-08-07) (3rd-party eval)	Link	2026-02-16	71.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
30	Claude Opus 4.6 (max effort) (3rd-party eval)	Link	2026-04-01	71.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
31	Claude Sonnet 4.5 (3rd-party eval)	Link	2026-02-16	71.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
32	Step3-VL-10B (SeRe)	Link	2026-01-15	70.81	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
33	dots.vlm1	Link	2025-08-05	69.64	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
34	Seed1.5-VL	Link	2025-05-12	68.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
35	Qwen3-VL-235B-A22B-Instruct	Link	2025-09-20	66.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
36	Claude Opus 4.1 (thinking) (3rd-party eval)	Link	2025-08-05	66.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
37	GLM-4.5V (106B-A12B)	Link	2025-08-26	65.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
38	Step-3 (321B-A38B)	Link	2025-08-26	64.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
39	InternVL3.5 (241B-A28B)	Link	2025-08-26	63.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
40	GLM-4.6V (106B-A12B)	Link	2025-11-01	63.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
41	MiMo-VL-RL	Link	2025-06-04	60.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
42	OpenAI o1	Link	2025-04-10	60.30	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
43	MiMo-VL-RL-2508	Link	2025-08-01	59.65	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
44	Qwen3-VL-8B-Thinking	Link	2025-10-01	59.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
45	Gemma 4 E4B	Link	2026-04-02	59.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
46	Claude 3.7 Sonnet (3rd-party eval, Skywork)	Link	2025-02-24	58.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
47	OpenAI o4-mini (3rd-party eval)	Link	2025-04-16	58.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
48	MiMo-VL-7B-SFT	Link	2025-06-04	57.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
49	Kimi-VL-A3B-Thinking-2506	Link	2025-06-23	56.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
50	Step R1-V-Mini	Link	2025-04-05	56.6	58.0	64.3	62.9	43.2	53.6	28.4	33.7	34.4	56.3	66.5	65.8	69.3	53.3	58.6	30.4	46.4
51	InternVL3.5 (30B-A3B)	Link	2025-08-26	55.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
52	SenseNova V6 Reasoner	Link	2025-04-10	55.39	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
53	GLM-4.1V (9B)	Link	2025-08-26	54.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
54	GLM-4.6V-Flash (9B)	Link	2025-11-01	54.05	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
55	InternVL3.5-38B	Link	2025-08-26	54.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
56	Ovis2.5-9B	Link	2025-08-19	53.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
57	AStar-7B (training-free, Qwen2.5-VL-7B)	Link	2025-02-04	53.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
58	Kimi k1.6 Preview	Link	2025-03-08	53.29	63.19	54.76	66.43	37.34	51.79	35.82	22.12	34.44	59.66	57.23	57.80	67.04	47.95	55.17	17.39	41.67
59	Skywork-R1V3-38B	Link	2025-07-05	52.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
60	Decoupled LLM-LMM (Qwen2.5-VL-72B + Qwen3-32B)	Link	2025-09-27	52.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
61	Gemma 4 E2B	Link	2026-04-02	52.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
62	InternVL3.5-8B	Link	2025-08-26	52.05	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
63	Open-Vision-Reasoner-7B	Link	2025-07-15	51.8	57.7	41.7	59.3	33.1	50.0	26.9	26.0	40.0	44.5	63.0	60.4	68.4	50.0	51.7	21.7	36.9
64	Qianfan-VL-70B	Link	2025-09-16	50.29	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
65	Skywork-R1V2-38B	Link	2025-04-28	49.7	52.6	47.4	73.7	42.1	52.6	36.8	15.8	57.9	73.7	63.2	73.7	57.9	47.4	47.4	21.1	31.6
66	Doubao-1.5-pro	Link	2025-02-28	48.62	55.07	52.38	63.57	34.74	36.90	43.28	25.00	27.78	37.82	62.43	55.40	59.69	43.85	55.17	26.09	37.50
67	Gemini 2.0 Pro (3rd-party eval)	Link	2025-02-05	48.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
68	GPT-4.5	Link	2025-04-10	47.30	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
69	Gemma 3 27B (no think)	Link	2026-04-02	46.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
70	Keye-VL-8B	Link	2025-08-19	46.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
71	GPT-5 (minimal) (3rd-party eval)	Link	2025-08-07	45.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
72	VL-Rethinker-72B	Link	2025-03-25	44.93	49.0	48.8	59.3	35.4	33.9	22.4	24.0	32.2	42.9	56.1	50.0	52.8	41.0	65.5	30.4	34.5
73	LLaVA-Critic-R1 (Qwen2.5-VL-7B base)	Link	2025-08-30	44.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
74	Vision-R1-7B (contamination-flagged)	Link	2025-03-10	43.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
75	InternVL3-78B	Link	2025-04-11	43.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
76	Skywork-R1V2-38B-AWQ	Link	2025-04-28	42.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
77	INFRL-Qwen2.5-VL-72B-Preview	Link	2025-03-25	42.73	49.3	42.9	59.3	31.8	32.7	32.8	22.1	27.8	41.2	54.3	47.6	48.1	38.9	56.9	26.1	33.3
78	Gemini-2 Flash	Link	2025-02-05	41.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
79	ProxyThinker (Qwen2.5-VL-32B + OpenVLThinker-7B expert)	Link	2025-05-30	40.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
80	VL-Rethinker-32B	Link	2025-04-12	40.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
81	ViCrit-RL-72B	Link	2025-06-11	40.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
82	InternVL3.5-4B	Link	2025-08-26	40.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
83	Kimi k1.5	Link	2025-01-22	38.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
84	Virgo-72B	Link	2025-01-03	38.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
85	Qwen2.5-VL-72B	Link	2025-01-26	38.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
86	Claude3.5-Sonnet	Link	2024-06-21	37.99	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
87	Ovis2.5-2B	Link	2025-08-19	37.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
88	AVAR-Thinker-7B	Link	2026-03-05	37.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
89	InternVL3-14B	Link	2025-04-11	37.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
90	Kimi-VL-A3B-Thinking	Link	2025-04-11	36.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
91	Phi-4-reasoning-vision-15B (testmini)	Link	2026-03-06	36.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
92	QvQ-72B-Preview	Link	2024-12-25	35.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
93	URSA-8B + URSA-RM (BoN=32)	Link	2025-01-08	35.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
94	InternVL3-38B	Link	2025-04-11	34.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
95	ThinkLite-VL-7B	Link	2025-04-10	32.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
96	Qianfan-VL-8B	Link	2025-09-16	32.82	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
97	InternVL2.5-78B	Link	2024-12-05	32.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
98	Ovis2-34B	Link	2025-03-25	31.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
99	InternVL2.5-38B	Link	2024-12-05	31.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
100	URSA-8B-PS-GRPO	Link	2025-05-01	31.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
101	TBAC-VLR1-7B	Link	2025-09-03	31.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
102	LLaVA-Critic-R1 (LLaMA-3.2-11B-V base)	Link	2025-08-30	30.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
103	GPT-4o	Link	2024-05-19	30.39	42.0	39.3	49.3	28.9	25.6	22.4	24.0	23.3	29.4	17.3	29.8	30.1	29.1	44.8	34.8	17.9
104	GPT-4 Turbo	Link	2024-05-19	30.26	37.7	33.3	46.4	25.0	28.6	25.3	15.4	27.8	31.9	30.6	29.0	31.9	28.7	37.9	17.4	23.2
105	MMR1-Math-v0-7B	Link	2025-03-13	30.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
106	DualMindVLM-7B	Link	2025-11-20	30.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
107	Ovis2-16B	Link	2025-03-25	30.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
108	R1-Onevision-7B	Link	2025-03-13	29.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
109	InternVL3-8B	Link	2025-04-11	29.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
110	VOLD (Qwen2.5-VL-3B, text-only RL)	Link	2025-10-27	28.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
111	InternVL3-9B	Link	2025-04-11	27.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
112	Claude3-Opus	Link	2024-05-04	27.13	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
113	Qwen2.5-VL-DP-7B (MathV-DP)	Link	2025-07-03	26.9	23.3	30.8	32.2	20.6	27.3	17.4	23.9	22.9	28.6	28.9	30.9	28.8	28.7	37.9	18.4	23.2
114	MM-Eureka-Qwen-7B	Link	2025-03-09	26.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
115	Vision-SR1-7B	Link	2025-08-27	26.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
116	MathGLM-Vision-32B	Link	2024-09-20	26.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
117	VLAA-Thinker-Qwen2.5VL-7B	Link	2025-04-15	26.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
118	MathCoder-VL-8B	Link	2025-02-16	26.1	18.6	32.1	26.4	25.0	10.7	13.4	20.2	14.4	21.0	48.6	32.2	32.1	23.0	29.3	8.7	23.2
119	Qwen2-VL-72B	Link	2024-08-29	25.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
120	Ovis2-8B (testmini)	Link	2025-03-25	25.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
121	OpenVLThinker-7B	Link	2025-03-21	25.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
122	LLaVA-OneVision-72B	Link	2024-08-06	25.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
123	Qwen2.5-VL-7B	Link	2025-01-26	25.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
124	TBAC-VLR1-3B-preview	Link	2025-04-21	25.0	22.0	29.8	32.1	19.5	18.5	16.4	22.1	11.1	25.2	39.3	27.6	28.5	22.9	34.5	17.4	22.0
125	R1-VL-7B (contamination-flagged)	Link	2025-03-17	24.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
126	X-Reasoner-3B (text-only)	Link	2025-10-27	24.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
127	CoT GPT4V	Link	2024-02-21	23.98	26.7	26.2	38.6	22.1	24.4	19.4	27.9	23.3	25.2	17.3	21.4	23.4	23.8	25.9	4.4	25.6
128	R1-Onevision-3B	Link	2025-03-13	23.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
129	MiniCPM-V 2.6	Link	2024-08-06	23.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
130	InternVL2.5-26B	Link	2024-12-05	23.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
131	GPT4V	Link	2024-02-21	22.76	27.3	32.1	35.7	21.1	16.7	13.4	22.1	14.4	16.8	22.0	22.2	20.9	23.8	24.1	21.7	25.6
132	MathCoder-VL-2B	Link	2025-02-16	21.7	15.7	17.9	17.1	19.2	11.3	14.9	26.9	14.4	16.8	38.2	25.4	26.9	15.6	36.2	8.7	25.0
133	MiniCPM-o 2.6	Link	2025-01-13	21.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
134	InternVL3-2B	Link	2025-04-11	21.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
135	Kimi-VL (base, non-thinking)	Link	2025-06-23	21.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
136	Qwen2.5-VL-3B	Link	2025-01-26	21.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
137	Multimath-7B	Link	2024-08-30	20.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
138	InternVL2.5-8B	Link	2024-12-05	19.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
139	Mulberry-Qwen2VL-7B (contamination-flagged)	Link	2024-12-24	19.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
140	Gemini-1.5 Pro	Link	2024-05-17	19.24	20.3	35.7	34.3	19.8	15.5	20.9	26.0	26.7	22.7	14.5	14.4	16.5	18.9	10.3	26.1	17.3
141	InternVL3-1B	Link	2025-04-11	18.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
142	Ovis1.6-Gemma2-9B	Link	2024-09-19	18.78	13.3	15.5	22.1	17.9	11.3	22.4	23.1	20.0	20.2	20.8	18.0	24.7	15.6	20.7	17.4	20.8
143	MAVIS-7B	Link	2024-07-11	18.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
144	Aquila-VL-2B	Link	2024-10-25	17.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
145	Ovis2-2B (testmini)	Link	2025-03-25	17.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
146	Qwen2-VL-DP-7B (MathV-DP)	Link	2025-07-03	17.7	15.2	20.8	20.8	20.2	12.0	7.9	20.3	21.2	16.9	19.2	19.1	23.2	14.4	13.9	17.5	20.9
147	Gemini Pro	Link	2024-02-21	17.66	15.1	10.7	20.7	20.1	11.9	7.5	20.2	21.1	16.8	19.1	19.0	20.0	14.3	13.8	17.4	20.8
148	InternVL-Chat-V1-2-Plus	Link	2024-02-22	16.97	11.3	25.0	15.7	16.9	10.1	11.9	16.4	15.6	19.3	22.5	16.4	22.5	14.3	17.2	4.4	20.8
149	Qwen2-VL-7B	Link	2024-08-29	16.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
150	Math-LLaVA-13B	Link	2024-06-26	15.69	9.0	20.2	15.7	18.2	10.1	10.5	16.4	14.4	16.0	20.2	18.4	17.6	9.4	24.1	21.7	17.9
151	Qwen-VL-Max	Link	2024-02-21	15.59	10.7	19.1	20.0	16.9	12.5	17.9	16.4	12.2	21.0	13.3	14.2	19.8	11.5	20.7	13.0	17.3
152	InternVL2.5-2B	Link	2024-12-05	14.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
153	InternLM-XComposer2-VL	Link	2024-02-21	14.54	9.3	15.5	12.1	15.3	11.3	10.5	14.4	22.2	19.3	19.7	15.6	15.0	11.9	15.5	26.1	15.5
154	InternVL2.5-1B	Link	2024-12-05	14.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
155	GPT 4-CoT (caption)	Link	2024-02-21	13.10	16.5	20.2	34.3	10.4	17.9	19.4	7.7	11.1	10.1	9.8	9.6	9.1	13.5	13.8	8.7	12.5
156	Qwen2-VL-2B	Link	2024-08-29	12.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
157	ShareGPT4V-13B	Link	2024-02-21	11.88	7.5	15.5	16.4	10.7	8.9	9.0	11.5	8.9	7.6	11.6	13.0	17.4	10.3	8.6	8.7	12.5
158	LLaVA-v1.5-13B	Link	2024-02-21	11.12	7.0	14.3	14.3	9.1	6.6	6.0	13.5	5.6	13.5	10.4	12.6	14.7	11.5	13.8	13.0	10.7
159	Qwen-VL-Plus	Link	2024-02-21	10.72	11.3	17.9	14.3	12.7	4.8	10.5	15.4	8.9	14.3	11.6	6.4	10.0	14.3	6.9	8.7	11.31
160	ShareGPT4V-7B	Link	2024-02-21	10.53	5.5	3.6	12.9	10.1	4.8	7.5	11.5	14.4	10.9	16.2	11.8	12.3	9.8	15.5	17.4	11.3
161	SPHINX (V2)	Link	2024-02-21	9.70	6.7	7.1	12.9	7.5	7.7	6.0	9.6	16.7	10.1	11.0	11.8	12.5	8.2	8.6	8.7	6.0
162	LLaVA-v1.5-7B	Link	2024-02-21	8.52	7.0	7.1	10.7	7.1	4.8	10.5	7.7	10.0	9.2	15.6	10.2	9.8	5.3	8.6	4.4	4.8
163	Random Chance	Link	2024-02-21	7.17	1.5	11.9	7.1	9.7	4.8	6.0	22.1	1.1	7.6	0.6	9.4	6.7	8.2	8.6	13.0	7.1

Human*: Average human performance from annotators who have high school diplomas or above.
Subjects: Alg: algebra, AnaG: analytic geometry, Ari: arithmetic, CombG: combinatorial geometry,
Comb: combinatorics, Cnt: counting, DescG: descriptive geometry, GrphT: graph theory, Log: logic,
Angle: metric geometry - angle, Area: metric geometry - area, Len: metric geometry-length,
SolG: solid geometry, Stat: statistics, Topo: topology, TransG: transformation geometry.

Open Source Leaderboard

Open-weight models only on the full 3,040-example MATH-Vision test set.

🏆 Main Leaderboard 🌿 MATH-Vision-Wild Leaderboard

🚨 To submit your results to the leaderboard, please send to this email.

#	Model	Source	Date	ALL	Alg	AnaG	Ari	CombG	Comb	Cnt	DescG	GrphT	Log	Angle	Area	Len	SolG	Stat	Topo	TransG
0	Human	Link	2024-04-05	68.82	55.1	78.6	99.6	98.4	43.5	98.5	91.3	62.2	61.3	33.5	47.2	73.5	87.3	93.1	99.8	69.0
1	Kimi K2.6 (w/ Python) 🥇	Link	2026-04-01	93.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2	Qwen3.5-397B-A17B 🥈	Link	2026-02-17	88.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
3	Kimi K2.6 (no tools) 🥉	Link	2026-04-01	87.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4	Qwen3.5-122B-A10B	Link	2026-02-16	86.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
5	Qwen3.5-27B	Link	2026-02-16	86.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
6	Gemma 4 31B	Link	2026-04-02	85.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
7	Kimi K2.5 (thinking, w/ Python)	Link	2026-04-01	85.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
8	Kimi K2.5	Link	2026-01-27	84.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
9	Qwen3.5-35B-A3B	Link	2026-02-16	83.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
10	Gemma 4 26B A4B	Link	2026-04-02	82.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
11	Step3-VL-10B (PaCoRe)	Link	2026-01-15	75.95	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
12	Qwen3-VL-235B-A22B-Thinking	Link	2025-09-20	74.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
13	Step3-VL-10B (SeRe)	Link	2026-01-15	70.81	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
14	dots.vlm1	Link	2025-08-05	69.64	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
15	Qwen3-VL-235B-A22B-Instruct	Link	2025-09-20	66.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
16	GLM-4.6V (106B-A12B)	Link	2025-11-01	63.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
17	MiMo-VL-RL-2508	Link	2025-08-01	59.65	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
18	Qwen3-VL-8B-Thinking	Link	2025-10-01	59.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
19	Gemma 4 E4B	Link	2026-04-02	59.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
20	MiMo-VL-7B-SFT	Link	2025-06-04	57.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
21	Kimi-VL-A3B-Thinking-2506	Link	2025-06-23	56.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
22	GLM-4.6V-Flash (9B)	Link	2025-11-01	54.05	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
23	InternVL3.5-38B	Link	2025-08-26	54.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
24	Ovis2.5-9B	Link	2025-08-19	53.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
25	Skywork-R1V3-38B	Link	2025-07-05	52.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
26	Gemma 4 E2B	Link	2026-04-02	52.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
27	InternVL3.5-8B	Link	2025-08-26	52.05	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
28	Open-Vision-Reasoner-7B	Link	2025-07-15	51.8	57.7	41.7	59.3	33.1	50.0	26.9	26.0	40.0	44.5	63.0	60.4	68.4	50.0	51.7	21.7	36.9
29	Qianfan-VL-70B	Link	2025-09-16	50.29	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
30	Skywork-R1V2-38B	Link	2025-04-28	49.7	52.6	47.4	73.7	42.1	52.6	36.8	15.8	57.9	73.7	63.2	73.7	57.9	47.4	47.4	21.1	31.6
31	Gemma 3 27B (no think)	Link	2026-04-02	46.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
32	Keye-VL-8B	Link	2025-08-19	46.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
33	VL-Rethinker-72B	Link	2025-03-25	44.93	49.0	48.8	59.3	35.4	33.9	22.4	24.0	32.2	42.9	56.1	50.0	52.8	41.0	65.5	30.4	34.5
34	LLaVA-Critic-R1 (Qwen2.5-VL-7B base)	Link	2025-08-30	44.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
35	InternVL3-78B	Link	2025-04-11	43.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
36	Skywork-R1V2-38B-AWQ	Link	2025-04-28	42.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
37	INFRL-Qwen2.5-VL-72B-Preview	Link	2025-03-25	42.73	49.3	42.9	59.3	31.8	32.7	32.8	22.1	27.8	41.2	54.3	47.6	48.1	38.9	56.9	26.1	33.3
38	InternVL3.5-4B	Link	2025-08-26	40.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
39	Qwen2.5-VL-72B	Link	2025-01-26	38.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
40	Ovis2.5-2B	Link	2025-08-19	37.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
41	InternVL3-14B	Link	2025-04-11	37.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
42	Kimi-VL-A3B-Thinking	Link	2025-04-11	36.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
43	Phi-4-reasoning-vision-15B (testmini)	Link	2026-03-06	36.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
44	QvQ-72B-Preview	Link	2024-12-25	35.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
45	InternVL3-38B	Link	2025-04-11	34.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
46	Qianfan-VL-8B	Link	2025-09-16	32.82	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
47	InternVL2.5-78B	Link	2024-12-05	32.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
48	Ovis2-34B	Link	2025-03-25	31.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
49	InternVL2.5-38B	Link	2024-12-05	31.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
50	LLaVA-Critic-R1 (LLaMA-3.2-11B-V base)	Link	2025-08-30	30.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
51	Ovis2-16B	Link	2025-03-25	30.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
52	InternVL3-8B	Link	2025-04-11	29.0	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
53	InternVL3-9B	Link	2025-04-11	27.6	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
54	MathCoder-VL-8B	Link	2025-02-16	26.1	18.6	32.1	26.4	25.0	10.7	13.4	20.2	14.4	21.0	48.6	32.2	32.1	23.0	29.3	8.7	23.2
55	Qwen2-VL-72B	Link	2024-08-29	25.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
56	Ovis2-8B (testmini)	Link	2025-03-25	25.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
57	LLaVA-OneVision-72B	Link	2024-08-06	25.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
58	Qwen2.5-VL-7B	Link	2025-01-26	25.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
59	TBAC-VLR1-3B-preview	Link	2025-04-21	25.0	22.0	29.8	32.1	19.5	18.5	16.4	22.1	11.1	25.2	39.3	27.6	28.5	22.9	34.5	17.4	22.0
60	MiniCPM-V 2.6	Link	2024-08-06	23.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
61	InternVL2.5-26B	Link	2024-12-05	23.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
62	MathCoder-VL-2B	Link	2025-02-16	21.7	15.7	17.9	17.1	19.2	11.3	14.9	26.9	14.4	16.8	38.2	25.4	26.9	15.6	36.2	8.7	25.0
63	MiniCPM-o 2.6	Link	2025-01-13	21.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
64	InternVL3-2B	Link	2025-04-11	21.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
65	Kimi-VL (base, non-thinking)	Link	2025-06-23	21.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
66	Qwen2.5-VL-3B	Link	2025-01-26	21.2	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
67	InternVL2.5-8B	Link	2024-12-05	19.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
68	InternVL3-1B	Link	2025-04-11	18.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
69	Ovis1.6-Gemma2-9B	Link	2024-09-19	18.78	13.3	15.5	22.1	17.9	11.3	22.4	23.1	20.0	20.2	20.8	18.0	24.7	15.6	20.7	17.4	20.8
70	Aquila-VL-2B	Link	2024-10-25	17.9	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
71	Ovis2-2B (testmini)	Link	2025-03-25	17.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
72	InternVL-Chat-V1-2-Plus	Link	2024-02-22	16.97	11.3	25.0	15.7	16.9	10.1	11.9	16.4	15.6	19.3	22.5	16.4	22.5	14.3	17.2	4.4	20.8
73	Qwen2-VL-7B	Link	2024-08-29	16.3	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
74	Math-LLaVA-13B	Link	2024-06-26	15.69	9.0	20.2	15.7	18.2	10.1	10.5	16.4	14.4	16.0	20.2	18.4	17.6	9.4	24.1	21.7	17.9
75	InternVL2.5-2B	Link	2024-12-05	14.7	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
76	InternLM-XComposer2-VL	Link	2024-02-21	14.54	9.3	15.5	12.1	15.3	11.3	10.5	14.4	22.2	19.3	19.7	15.6	15.0	11.9	15.5	26.1	15.5
77	InternVL2.5-1B	Link	2024-12-05	14.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
78	Qwen2-VL-2B	Link	2024-08-29	12.4	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
79	ShareGPT4V-13B	Link	2024-02-21	11.88	7.5	15.5	16.4	10.7	8.9	9.0	11.5	8.9	7.6	11.6	13.0	17.4	10.3	8.6	8.7	12.5
80	LLaVA-v1.5-13B	Link	2024-02-21	11.12	7.0	14.3	14.3	9.1	6.6	6.0	13.5	5.6	13.5	10.4	12.6	14.7	11.5	13.8	13.0	10.7
81	ShareGPT4V-7B	Link	2024-02-21	10.53	5.5	3.6	12.9	10.1	4.8	7.5	11.5	14.4	10.9	16.2	11.8	12.3	9.8	15.5	17.4	11.3
82	SPHINX (V2)	Link	2024-02-21	9.70	6.7	7.1	12.9	7.5	7.7	6.0	9.6	16.7	10.1	11.0	11.8	12.5	8.2	8.6	8.7	6.0
83	LLaVA-v1.5-7B	Link	2024-02-21	8.52	7.0	7.1	10.7	7.1	4.8	10.5	7.7	10.0	9.2	15.6	10.2	9.8	5.3	8.6	4.4	4.8
84	Random Chance	Link	2024-02-21	7.17	1.5	11.9	7.1	9.7	4.8	6.0	22.1	1.1	7.6	0.6	9.4	6.7	8.2	8.6	13.0	7.1

MATH-Vision-Wild Leaderboard

Real-world photographic testmini (304 examples) captured on paper/iPads/laptops/projectors — tests VLM generalization to varied physical capture conditions.

🏆 Main Leaderboard 🏆 Open Source Leaderboard

Accuracy scores on the testmini split (304 examples) of Logo MATH-Vision-Wild. MATH-Vision-Wild is a photographic variant of MATH-Vision-testmini: the same problems captured in different physical environments (printed paper, iPads, laptops, projectors) under varying lighting, to test VLM generalization to real-world conditions.

🚨 To submit your results to the leaderboard, please send to this email.

#	Model	Source	Date	MathVision-Wild (testmini)	Wild Δ%	MathVision-Screenshot (testmini)	Screenshot Δ%	MathVision (testmini)
1	o4-mini 🥇	Link	2025-04-16	57.2	+2.33%	58.9	+5.37%	55.9
2	Gemini 2.5 Pro Preview 05-06 (thinking) 🥈	Link	2025-05-06	49.0	-23.20%	52.6	-17.55%	63.8
3	Gemini 2.5 Flash Preview 05-20 🥉	Link	2025-05-20	48.0	-17.10%	51.6	-10.88%	57.9
4	Gemini 2.5 Flash Preview 05-20 (thinking)	Link	2025-05-20	47.4	-14.29%	56.2	+1.63%	55.3
5	Doubao-1.5-thinking-vision-pro	Link	2025-05-12	45.7	-21.07%	50.7	-12.44%	57.9
6	Gemini 2.5 Pro Preview 05-06	Link	2025-05-06	42.8	-30.74%	53.3	-13.75%	61.8
7	GPT-4.1-mini	Link	2025-04-14	39.1	-11.34%	41.4	-6.12%	44.1
8	GPT-4.1	Link	2025-04-14	35.5	-12.35%	36.2	-10.62%	40.5
9	Doubao-1.5-vision-pro	Link	2025-01-22	34.9	-23.63%	36.8	-19.47%	45.7
10	Qwen-VL-Max (2025-04-08)	Link	2025-04-08	28.3	-24.53%	27.6	-26.40%	37.5
11	Doubao-1.5-vision-pro-32k	Link	2025-01-22	28.3	-34.79%	37.5	-13.59%	43.4
12	Qwen2.5-VL-32B-Instruct	Link	2025-03-24	28.0	-19.77%	27.6	-20.92%	34.9
13	QVQ-Max	Link	2025-03-28	27.6	-37.84%	30.9	-30.41%	44.4
14	Gemini 2.0 Flash-Lite	Link	2025-02-05	26.6	-31.97%	25.3	-35.29%	39.1
15	Doubao-1.5-vision-lite	Link	2025-01-22	25.7	-22.59%	25.3	-23.80%	33.2
16	Gemini 1.5 Flash	Link	2024-05-23	25.3	-10.60%	26.3	-7.07%	28.3
17	Qwen2.5-VL-72B-Instruct	Link	2025-01-26	24.0	-33.70%	26.6	-26.52%	36.2
18	Gemini 2.0 Flash	Link	2025-02-05	23.0	-52.08%	23.7	-50.62%	48.0
19	GPT-4.1-nano	Link	2025-04-14	22.0	-29.49%	29.3	-6.09%	31.2
20	Gemini 1.5 Pro	Link	2024-05-23	18.4	-52.58%	17.8	-54.12%	38.8
21	Qwen2.5-VL-3B-Instruct	Link	2025-01-26	14.8	-13.45%	16.8	-1.75%	17.1
22	Qwen-VL-Plus (2025-01-25)	Link	2025-01-25	13.8	-25.00%	14.1	-23.37%	18.4
23	Qwen2.5-VL-7B-Instruct	Link	2025-01-26	13.5	+0.00%	15.5	+14.81%	13.5
24	o3	Link	2025-04-16	-	-	59.2	+9.02%	54.3
25	Gemini 2.0 Flash 001	Link	2025-02-05	-	-	40.8	+8.80%	37.5

Wild: photographic capture of testmini problems under real-world conditions.
Screenshot: screenshot capture of the same testmini problems.
Testmini: original digital testmini (baseline).

MATH-V Dataset

Key statistics of MATH-V

Comparison of the level distribution between our
MATH-V and the MATH dataset.

Distribution

levels, subjects and sources distribution of MATH-V.

Some Images from 16 Subjects

BibTeX

@inproceedings{
wang2025mathcodervl,
title={MathCoder-{VL}: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning},
author={Ke Wang and Junting Pan and Linda Wei and Aojun Zhou and Weikang Shi and Zimu Lu and Han Xiao and Yunqiao Yang and Houxing Ren and Mingjie Zhan and Hongsheng Li},
booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025},
url={https://openreview.net/forum?id=nuvtX1imAb}
}
@inproceedings{
wang2024measuring,
title={Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset},
author={Ke Wang and Junting Pan and Weikang Shi and Zimu Lu and Houxing Ren and Aojun Zhou and Mingjie Zhan and Hongsheng Li},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=QWTCcxMpPA}
}

Acknowledgement

We would like to thank MathVista for this website, which is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.