
MIT scientists are building the world's largest collection of Olympiad-level math problems and making them accessible to everyone

Introducing MathNet: A Comprehensive Resource for Olympiad-Level Mathematics

Every year, countries participating in the International Mathematical Olympiad (IMO) produce booklets of their best and most original problems. These brochures are distributed among delegations and then tend to disappear, never systematically collected or shared widely. This presents a challenge for both AI researchers testing the limits of mathematical thinking and students training for these competitions, often in isolation.

In a groundbreaking effort, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), King Abdullah University of Science and Technology (KAUST), and HUMAIN have addressed this gap by creating MathNet, the largest high-quality dataset of proof-based math problems to date. This initiative will be showcased at the upcoming International Conference on Learning Representations (ICLR) in Brazil.

The Ambitious Scope of MathNet

MathNet's strength lies not only in its size but also in its diversity. It contains over 30,000 expert-authored problems and solutions from 47 countries, spanning 17 languages and 143 competitions. This makes it five times larger than any previous dataset of its kind. Unlike earlier datasets, which drew predominantly on problems from the United States and China, MathNet offers a global perspective, covering four decades of competitive mathematics.

“Each country brings a booklet with its most novel and creative problems,” explains Shaden Alshammari, an MIT graduate student and lead author of the paper. “They share the brochures with each other, but no one has bothered to collect them, clean them up, and upload them online.”

Building MathNet: A Herculean Task

The creation of MathNet involved processing 1,595 PDF volumes, totaling more than 25,000 pages, including digital documents and decades-old scans in over a dozen languages. Much of this archive came from Navid Safaei, a long-time IMO community member and co-author, who had been collecting and scanning these booklets by hand since 2006. His personal archive became the backbone of this dataset.

MathNet sets itself apart by sourcing problems exclusively from official national competition brochures. These booklets contain solutions that are authored and peer-reviewed, often presenting multiple approaches to the same problem. This depth provides AI models with a richer basis for learning mathematical reasoning compared to the shorter, informal solutions typical of community-sourced datasets. It also serves as a valuable resource for students preparing for the IMO or national competitions, offering a centralized collection of high-quality problems and developed solutions from diverse traditions.

Implications for Students and AI Research

“I remember so many students for whom it was an individual effort. Nobody in their country trained them for this type of competition,” shares Alshammari, reflecting on her own experiences as a former IMO participant. “We hope this will provide them with a central location with high-quality problems and solutions from which they can learn.”

Supported by a deep connection to the IMO community, the research team, including co-author Sultan Albarakati, a member of the IMO board, is working to share MathNet directly with the IMO Foundation. To ensure the dataset's reliability, they assembled a group of over 30 reviewers from various countries who coordinated to check thousands of solutions.

According to Tanish Patil, deputy leader of the Swiss IMO team, "The MathNet database has the potential to be an excellent resource for both students and team leaders looking for new problems to work on or looking for the solution to a difficult question."

MathNet: A Benchmark for AI Performance

MathNet also serves as a rigorous benchmark for AI performance, revealing a more nuanced picture of AI's mathematical capabilities. While some AI models have reportedly achieved gold medal performances at the IMO, MathNet highlights uneven progress. Even the best-performing models, like GPT-5, averaged 69.3 percent on MathNet's main benchmark, struggling with nearly one in three Olympiad-level problems. Notably, performance drops significantly when problems involve images, exposing a persistent weakness in visual reasoning.

The dataset further highlights challenges with less common languages, as several open-source models achieved 0 percent accuracy on Mongolian-language problems. "GPT models are equally good in English and other languages," says Alshammari. "But many of the open-source models completely fail with less common languages like Mongolian."

Expanding Mathematical Horizons

By offering a diverse range of problems, MathNet aims to address a deeper limitation in how AI models learn math. Exposure to a variety of mathematical cultures, such as those found in Romanian combinatorics or Brazilian number theory problems, enriches both human and AI mathematical thinking.

MathNet also introduces a retrieval benchmark, challenging models to detect when two problems share the same underlying mathematical structure. Testing eight state-of-the-art embedding models, researchers found that even the strongest models only identified the correct match on the first try about 5 percent of the time. This highlights the difficulty in finding mathematical equivalences across different notations, languages, and formats.
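The metric described above, the share of queries whose nearest candidate is the annotated match, is commonly called top-1 retrieval accuracy. As a rough illustration (not MathNet's actual evaluation code), it can be computed over embedding vectors with cosine similarity; the vectors below are toy stand-ins for real embedding-model outputs.

```python
# Hypothetical sketch of a top-1 retrieval evaluation: given embedding
# vectors for query problems and a candidate pool, count how often the
# nearest candidate (by cosine similarity) is the annotated match.
import numpy as np

def top1_retrieval_accuracy(queries: np.ndarray,
                            pool: np.ndarray,
                            gold: np.ndarray) -> float:
    """queries: (n, d) embeddings; pool: (m, d) embeddings;
    gold: (n,) index into pool of each query's true match."""
    # Normalize rows so dot products equal cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    sims = q @ p.T                    # (n, m) similarity matrix
    predicted = sims.argmax(axis=1)   # nearest candidate per query
    return float((predicted == gold).mean())

# Toy 2-D example: each query points roughly at its true match.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]])
queries = np.array([[0.9, 0.1], [1.1, 0.9], [-0.8, 0.45]])
gold = np.array([0, 2, 3])
print(top1_retrieval_accuracy(queries, pool, gold))  # → 1.0
```

A real evaluation would use high-dimensional embeddings from the models under test, where, per the study, even the strongest models score only around 5 percent.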

In addition, the dataset includes a retrieval-augmented generation benchmark, testing whether model performance improves when a structurally similar problem is supplied before solving a new one. This approach was effective, but only when the retrieved problem was truly relevant, as demonstrated by the DeepSeek-V3.2-Speciale model's performance.
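The setup described above can be sketched as simple prompt assembly: a solved problem with similar structure is prepended to the query before it is sent to the model. The function name and prompt wording below are illustrative assumptions, not MathNet's actual protocol.

```python
# Hypothetical sketch of retrieval-augmented prompting: prepend a
# structurally similar solved problem (if one was retrieved) to the
# prompt before asking a model to solve a new problem.
from typing import Optional

def build_rag_prompt(new_problem: str,
                     retrieved: Optional[dict] = None) -> str:
    """retrieved: {'problem': ..., 'solution': ...} or None."""
    parts = []
    if retrieved is not None:
        parts.append("Here is a solved problem with a similar structure:\n"
                     f"Problem: {retrieved['problem']}\n"
                     f"Solution: {retrieved['solution']}\n")
    parts.append(f"Now solve this problem:\n{new_problem}")
    return "\n".join(parts)

prompt = build_rag_prompt(
    "Prove that the sum of two odd integers is even.",
    {"problem": "Prove that the sum of two even integers is even.",
     "solution": "Write them as 2a and 2b; their sum is 2(a + b)."})
```

The study's finding suggests that the gain from this setup hinges on retrieval quality: an irrelevant `retrieved` example adds noise rather than guidance.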

The study was authored by Alshammari, Safaei, HUMAIN AI engineer Abrar Zainal, KAUST Academy director Sultan Albarakati, and MIT-CSAIL colleagues: master’s student Kevin Wen SB ’25, Microsoft Principal Engineering Manager Mark Hamilton SM ’22, PhD ’25, and Professors William Freeman and Antonio Torralba. Their work received support from the Schwarzman College of Computing Fellowship and the National Science Foundation.

MathNet is publicly available at mathnet.csail.mit.edu.

