Apple study reveals major AI flaw in OpenAI, Google, and Meta LLMs

Large Language Models (LLMs) may not be as smart as they seem, according to a study from Apple researchers.

LLMs from OpenAI, Google, Meta, and others have been touted for their impressive reasoning skills. But research suggests their purported intelligence may be closer to “sophisticated pattern matching” than “true logical reasoning.” Yep, even OpenAI’s o1 advanced reasoning model.

The most common benchmark for reasoning skills is a test called GSM8K, but since it’s so popular, there’s a risk of data contamination. That means LLMs might know the answers to the test because they were trained on those answers, not because of their inherent intelligence.

To test this, the study developed a new benchmark called GSM-Symbolic, which keeps the essence of the reasoning problems but changes variables such as names and numbers, adjusts complexity, and adds irrelevant information. What the researchers discovered was surprising "fragility" in LLM performance. The study tested more than 20 models, including OpenAI's o1 and GPT-4o, Google's Gemma 2, and Meta's Llama 3, and every single one performed worse when the variables were changed.
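The core idea of a templated benchmark like this can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' actual code; the template, names, and number ranges are invented for demonstration:

```python
import random

# A hypothetical GSM8K-style problem expressed as a template.
# Varying the names and numbers means a model can't rely on having
# memorized one specific phrasing from its training data.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have?"
)

NAMES = ["Oliver", "Sophie", "Mateo", "Priya"]

def generate_variant(seed=None):
    """Return a (question, ground_truth_answer) pair with randomized variables."""
    rng = random.Random(seed)
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = TEMPLATE.format(name=rng.choice(NAMES), a=a, b=b)
    return question, a + b  # the correct answer changes with the numbers

question, answer = generate_variant(seed=42)
print(question)
print("Answer:", answer)
```

A model that truly reasons should score the same on every variant; a model matching memorized patterns may not.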

Accuracy decreased by a few percentage points when names and numerical values were changed. As the researchers noted, OpenAI's models performed better than the open-source models tested, but even their variance was deemed "non-negligible." Since only superficial details changed, ideally there should have been no variance at all. Things got really interesting, though, when researchers added "seemingly relevant but ultimately inconsequential statements" to the mix.

To test the hypothesis that LLMs relied more on pattern matching than actual reasoning, the study added superfluous phrases to math problems to see how the models would react. For example, “Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?”

What resulted was a significant drop in performance across the board. OpenAI's o1-preview fared the best, with an accuracy drop of 17.5 percent. That's still pretty bad, but not as bad as Microsoft's Phi 3 model, which performed 65 percent worse.

In the kiwi example, the study found that LLMs tended to subtract the five smaller kiwis from the total, without recognizing that kiwi size was irrelevant to the problem. This suggests that "models tend to convert statements to operations without truly understanding their meaning," validating the researchers' hypothesis that LLMs look for patterns in reasoning problems rather than innately understanding the concepts.

The study didn't mince words about its findings. Testing models on a benchmark that includes irrelevant information "exposes a critical flaw in LLMs' ability to genuinely understand mathematical concepts and discern relevant information for problem-solving." However, it bears mentioning that the authors of this study work for Apple, which is obviously a major competitor of Google, Meta, and even OpenAI (although Apple and OpenAI have a partnership, Apple is also working on its own AI models).

That said, the LLMs’ apparent lack of formal reasoning skills can’t be ignored. Ultimately, it’s a good reminder to temper AI hype with healthy skepticism.
