AI in math: a convincing proof that is completely incorrect

A mathematical proof generated by AI can look very convincing and flawless, yet still be completely incorrect. Moreover, it is often impossible to understand how a language model arrives at an answer. This is a cause for concern among an international group of leading mathematicians—especially since mathematics also forms the basis for societal applications. TU/e mathematician Jim Portegies: “How much trust are we willing to place in systems whose internal workings are largely opaque?”

Unreliable results, lack of source citations, and dependence on closed commercial systems: these are just a few of the consequences of relying too heavily on AI in the field of mathematics. No wonder the international group of mathematicians is concerned about their field. They are calling on their colleagues to take action and are making recommendations in the Leiden Declaration on Artificial Intelligence and Mathematics. Signatories include former Minister of Education Robbert Dijkgraaf and Terence Tao, winner of the Fields Medal.

Credit: Lieke Vermeulen

A smart colleague who sometimes gets it wrong

AI in and of itself isn’t the problem, Portegies emphasizes. He uses the technology himself. “In day-to-day research practice, AI is increasingly being used as a thinking partner—not to provide definitive answers, but to explore ideas.”

For example, researchers use language models to test hypotheses, discuss strategies, or find new perspectives. In that role, he believes AI can be valuable. “The interaction sometimes feels like collaborating with a colleague who thinks on their feet, but isn’t always right.”

When the assistant takes over

The risk arises when researchers rely too heavily on AI’s outputs. This can become particularly problematic in mathematics, where every step must be verifiable. “There’s a strong temptation to trust a well-formulated text,” says Portegies. “Yet verifying the content itself remains essential.”

The cause lies in the way language models work. They do not reason like a mathematician. Instead, they predict which word, symbol, or formula is likely to follow what came before. As a result, they can produce texts that appear convincing but contain fundamental errors. “A manuscript can look completely credible,” says Portegies. “But in terms of content, it isn’t always correct.”
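To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not how real language models are built: they use large neural networks and take long stretches of context into account. The toy corpus and the function names `train_bigrams` and `predict_next` are invented for this illustration. The sketch only shows the principle that picking the statistically most likely continuation can produce a fluent statement that is false.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: four true statements about addition.
CORPUS = ["1 + 1 = 2", "1 + 2 = 3", "2 + 1 = 3", "2 + 2 = 4"]


def train_bigrams(sentences):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, prompt):
    """Return the token seen most often after the prompt's last token.

    Only the last token is considered; everything before it is ignored.
    Assumes the last token occurs in the corpus with a successor.
    """
    last = prompt.split()[-1]
    return follows[last].most_common(1)[0][0]


model = train_bigrams(CORPUS)
completion = predict_next(model, "2 + 2 =")
print("2 + 2 =", completion)  # prints "2 + 2 = 3"
```

Every sentence in the corpus is true, yet the completion is wrong: "3" is simply the token that most often follows "=" in the data. The output looks like arithmetic, but no calculation was carried out.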

Implications for the academic world

This development has implications for scientific practice. According to Portegies, the number of AI-assisted manuscripts is increasing. Previously, reviewers could rely more on authors’ responsibility for the accuracy of their articles. This allowed them to focus primarily on the quality, relevance, and scientific value of the research. If articles were now required to be completely error-free, a much more thorough review would be necessary. In practice, however, it remains to be seen whether reviewers will actually carry out that extra scrutiny.

Dependence on U.S. tech companies

The rise of AI is sparking a broader discussion about technological dependence. Many popular language models have been developed by large U.S. technology companies, which provide little insight into their training data, algorithms, and decision-making processes.

This clashes with a key principle of science: transparency. “Science is all about verifiability,” says Portegies. European universities are therefore increasingly seeking alternatives. Projects such as OpenEuroLLM are working on open models whose training data, methods, and model weights are transparent. Yet a large proportion of scientists still rely on the widely used closed models.

From research to application

According to Portegies, reliance on closed language models becomes an even more pressing concern once mathematics is used not only in research but also in societal applications. Mathematics is at the core of a growing number of such applications, from algorithms used by government agencies to systems in defense and security. Portegies: “How much trust are we willing to place in systems whose internal workings are largely opaque?”

As an example, Portegies cites the American technology company Palantir Technologies. The company develops software that combines and analyzes large amounts of data from diverse sources and is used worldwide by governments, defense organizations, and security services, among others.

In the Netherlands, too, the use of such systems has become a topic of discussion. For instance, it was recently announced that the Ministry of Defense intends to stop using Palantir software within a few years and is seeking a European alternative. The reason: concerns about dependence on a foreign company and the extent to which sensitive information within such systems can still be adequately controlled.

No ban, but greater awareness

The international group of mathematicians is not calling for a ban on AI in mathematics. On the contrary, they view the technology as an inevitable part of scientific practice. However, the group does make a series of recommendations.

Transparency is key. Researchers should be clear about when and how AI is used, cite their sources carefully, and ensure that a human always bears ultimate responsibility for the final result. The group also advocates stricter guidelines from universities, scientific journals, and funding agencies to ensure the quality and verifiability of mathematical research.

According to the group, policymakers and politicians also have a broader responsibility. They should invest in public alternatives to commercial AI systems and impose stricter requirements on the sector, so that knowledge and power do not end up exclusively in the hands of a small number of tech companies.

Portegies adds: “Think carefully about the type of model you use. There are various providers, with different revenue models and different ways of handling data. Look into alternatives, such as local models or European systems. For some applications, you really don’t need the newest or most powerful model.”

Where humans still outperform AI

Despite rapid progress, Portegies still sees clear limits to what AI can do. Language models can deliver impressive performance, especially when dealing with well-defined problems. But according to him, mathematics is about more than just finding answers. “An important part of mathematics is developing new concepts and new ways of thinking.” He believes that breakthroughs often arise when researchers develop a completely new language or a new framework that suddenly makes a problem simpler. “It’s precisely that kind of fundamental innovation that I don’t see AI delivering just yet.”