Apple paper: Why reasoning models may not really "think"

Apple's machine learning research group concludes in a research paper on so-called Large Reasoning Models (LRMs) that the "thinking" of LRMs may be at least partially an illusion. One problem is that reasoning models demand significantly more energy and compute, which is already apparent from their longer response times.

LRMs are AI models intended to give regular language models the ability to reason logically. The systems try to break tasks down into individual thought steps, which are also shown to the user. However, it has so far been unclear whether the system really "thinks" internally, or whether the reasoning is merely additional output that has little influence on the final result.

Apple's AI researchers examined two LRMs for their paper: Claude 3.7 Sonnet Thinking and DeepSeek-R1. The tasks used were primarily puzzles, including the river crossing problem and the Tower of Hanoi, at different levels of complexity. On simple tasks, the variants without reasoning worked more precisely and more efficiently than the two LRMs, while using less compute. Medium-complexity tasks seemed to suit the reasoning models. With a further increase in complexity, it no longer mattered how much compute the LRMs had available: their accuracy collapsed.
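To illustrate how complexity can be dialed up in such puzzles (a minimal sketch in Python, not the paper's actual test harness): in the Tower of Hanoi, the shortest solution for n disks requires 2^n − 1 moves, so adding a few disks makes the required solution dramatically longer.

```python
# Sketch (not from the Apple paper): Tower of Hanoi difficulty scales with the
# number of disks n -- the optimal solution needs 2**n - 1 moves.

def hanoi(n, source="A", target="C", spare="B"):
    """Return the optimal move sequence for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # move the n-1 smaller disks aside
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # move the n-1 smaller disks on top
    )

if __name__ == "__main__":
    for n in (3, 5, 10):
        moves = hanoi(n)
        assert len(moves) == 2**n - 1
        print(f"{n} disks -> {len(moves)} moves")
```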

"We found that LRMs have limitations in exact computation: they do not use explicit algorithms and reason inconsistently across puzzles," the Apple researchers write. However, LRMs are by no means used only for puzzles – in other subject areas they can at least be helpful.
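What "explicit algorithm" means here can be made concrete with a small sketch (an assumption for illustration: the classic wolf/goat/cabbage instance, not the exact river crossing variant used in the paper). A breadth-first search over valid states solves the puzzle exactly and systematically, which is the kind of procedure the researchers say the models fail to apply.

```python
# Sketch: solving the classic river crossing puzzle with an explicit algorithm
# (breadth-first search over safe states) rather than free-form "reasoning".
from collections import deque

ITEMS = ("farmer", "wolf", "goat", "cabbage")

def unsafe(state):
    # A bank is unsafe if the farmer is absent while wolf+goat or goat+cabbage share it.
    f, w, g, c = state
    return (w == g or g == c) and f != g

def solve():
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)  # 0 = left bank, 1 = right bank
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        f = state[0]
        # The farmer crosses alone (cargo=None) or takes one item from his bank.
        for cargo in (None, 1, 2, 3):
            if cargo is not None and state[cargo] != f:
                continue
            nxt = list(state)
            nxt[0] = 1 - f
            if cargo is not None:
                nxt[cargo] = 1 - f  # the item crosses together with the farmer
            nxt = tuple(nxt)
            if nxt not in seen and not unsafe(nxt):
                seen.add(nxt)
                label = ITEMS[cargo] if cargo is not None else "alone"
                queue.append((nxt, path + [label]))
    return None

print(solve())  # shortest plan, e.g. goat, alone, wolf, goat, cabbage, alone, goat
```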

The Apple study has been received differently in the AI scene. Some experts consider it too narrow, others praised the approach. In fact, the researchers have not found a real explanation for the LRMs' loss of performance on harder tasks – which is also difficult, because one can "look inside" LRMs just as little as regular large language models (LLMs). In addition, the question arises how far the results can be generalized: the chosen tasks were very specific.

The Apple researchers concede this themselves: "We are aware that our work has limitations. Our puzzle environments enable controlled experiments with fine-grained control of problem complexity, but they represent only a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive tasks." How illusory the "thinking" of LRMs really is therefore remains open.

