A growing wave of collaborations between mathematicians and artificial intelligence researchers is transforming how mathematical proofs are discovered, verified, and shared. In late 2024 and into 2025, projects built around interactive theorem provers like Lean — combined with large language models from labs such as Google DeepMind and OpenAI — have begun cracking problems that once seemed firmly out of reach for machines, from International Mathematical Olympiad questions to dense conjectures in algebraic geometry. The shift is prompting a serious reassessment of what it means to “do mathematics,” and who, or what, gets credit for a discovery.
From Curiosity to Working Tool
For decades, formal proof assistants were a niche concern, used mostly by logicians and a handful of computer scientists verifying critical software. That changed when Fields Medalist Terence Tao began publicly documenting his use of Lean and GitHub Copilot to formalize portions of his own research. Tao has written on his blog that the combination of automated tactics and AI-assisted suggestions has allowed him to verify arguments faster than working alone with pen and paper, particularly for tedious case analyses where human attention tends to drift.
The momentum accelerated when Google DeepMind announced that its AlphaProof and AlphaGeometry 2 systems had reached silver-medal-level performance at the 2024 International Mathematical Olympiad, solving four of six problems. AlphaProof paired reinforcement learning with Lean-based proof verification, so the solutions it produced were machine-checkable rather than merely plausible-sounding prose; AlphaGeometry 2 handled the geometry problem with its own symbolic deduction engine. That distinction matters: large language models are notorious for “hallucinating” arguments that read fluently but collapse under scrutiny, and a proof that a formal checker accepts cannot fail in that way.
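To give a sense of what “machine-checkable” means in practice, here is a minimal sketch in Lean 4. It is an illustration only, not taken from any of the systems mentioned above; it assumes nothing beyond Lean's core library, and the theorem name add_comm_example is made up for this example.

```lean
-- Lean 4, core library only (no Mathlib assumed).
-- Hypothetical name; the proof reuses the core lemma Nat.add_comm.
-- Lean accepts the file only if every step type-checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, checked by direct computation.
example : 2 + 2 = 4 := rfl
```

If either statement were false, or the justification did not match it, Lean would reject the file instead of producing a convincing-looking argument.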
Why Formal Verification Changes the Game
The deeper significance lies less in the headlines about Olympiad medals and more in how research mathematics is being restructured. Projects like the Liquid Tensor Experiment, a formalization challenge posed by Peter Scholze and completed by a community of Lean contributors, demonstrated that machine verification can keep pace with cutting-edge research. Scholze himself has said the experience increased his confidence in results whose original proofs ran to hundreds of pages and depended on intricate chains of reasoning few humans could fully audit.
Statistics and computer science are feeling the ripple effects too. Probabilistic proofs, randomized algorithms, and complexity-theoretic bounds are being formalized at a pace that would have been unthinkable five years ago. The journal Nature has covered the trend in detail, noting that AI-assisted proof discovery may eventually rebalance which subfields advance fastest, since formalization-friendly areas attract more tooling investment.
Concerns About Trust and Authorship
Not everyone is celebrating. Critics point out that as proofs become longer and more reliant on opaque AI suggestions, the human understanding embedded in mathematics could erode. A proof that no person can read in full — but that a computer certifies as correct — is a different kind of knowledge than what mathematics has traditionally produced. There are also unresolved questions about authorship credit when an AI system generates the key lemma, and about reproducibility when the underlying model is proprietary or has been retrained.
Educators are similarly divided. Some see Lean and its peers as a powerful pedagogical tool, forcing students to make every assumption explicit. Others worry that undergraduates leaning on AI suggestions will skip the productive struggle that builds mathematical intuition.
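A small Lean 4 sketch, offered as an illustration and not drawn from any particular course, shows what “making every assumption explicit” looks like. The names p, q, hp, and hpq are arbitrary placeholders.

```lean
-- To conclude q, the student must state both assumptions:
-- hp  : p holds
-- hpq : p implies q
example (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp
```

Deleting either hypothesis from the statement makes the proof fail to check, so an unstated assumption cannot slip through unnoticed.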
What to Watch Next
Several milestones loom on the near horizon. Researchers are watching to see whether AI systems can produce a novel result publishable in a top-tier mathematics journal without a human first sketching the argument. Polymath-style collaborative projects, which historically pooled dozens of human mathematicians, are experimenting with hybrid teams that include AI agents as full participants. And funding agencies, including the U.S. National Science Foundation, are beginning to earmark grants specifically for formalization infrastructure.
Whether this represents a genuine paradigm shift or simply a powerful new tool in an old craft will depend on the next two or three years. What seems beyond dispute is that the boundaries among mathematics, logic, statistics, and computer science are thinning, and the people who learn to work fluently across all four will define the next era of discovery.