LLMs don't do formal reasoning - and that is a HUGE problem
Andreas Batsis / Wednesday, November 27, 2024 / Categories: The AI-Scape

TL;DR Version

Where no health or money is at stake, LLMs can make all the difference!

A Little Longer Version

Apple researchers have critically assessed the reasoning capabilities of LLMs, arguing that their intelligence is overstated and largely based on memorization rather than genuine reasoning.

THE HARSH TRUTH: LLMs struggle to reason when faced with ostensibly relevant but deliberately distracting information.

The key points of LLM "excellence" are summarised below:

  • Performance Decline: LLMs perform adequately on small problems but exhibit a marked decline in performance as problem complexity increases, a trend observed in both older and newer models.
  • Arithmetic Limitations: LLMs consistently fail at basic arithmetic tasks, particularly with larger numbers, unlike traditional calculators, which remain accurate regardless of size.
  • Chess Rule Violations: LLMs' inability to follow the rules of chess exemplifies the broader problem of inadequate formal reasoning.
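The arithmetic point is easiest to see against what "following a formal procedure" actually means. The Python sketch below is purely illustrative, not code from the Apple paper or from Gary Marcus's article, and the function name `long_multiply` is hypothetical: it implements grade-school long multiplication on digit strings. Because it applies the same digit-and-carry rule at every position, its answer stays exact however many digits the operands have, which is the property the calculator comparison points to.

```python
import random


def long_multiply(a: str, b: str) -> str:
    """Hypothetical helper: multiply two non-negative integers given as
    decimal digit strings using grade-school long multiplication."""
    if not all(s.isascii() and s.isdigit() for s in (a, b)):
        raise ValueError("inputs must be non-negative decimal digit strings")

    # result[k] holds the digit for 10**k (least-significant digit first)
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            # Same rule at every position: add the digit product and carry
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        # Push any leftover carry into the higher positions
        k = i + len(b)
        while carry:
            total = result[k] + carry
            result[k] = total % 10
            carry = total // 10
            k += 1

    # Reverse to most-significant first and drop leading zeros
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    # Growing operand sizes, checked against Python's exact built-in ints
    for n_digits in (2, 10, 50, 200):
        x = str(random.randrange(10 ** (n_digits - 1), 10 ** n_digits))
        y = str(random.randrange(10 ** (n_digits - 1), 10 ** n_digits))
        assert long_multiply(x, y) == str(int(x) * int(y))
    print("exact at every operand size tested")
```

Running the file checks the function against Python's built-in integers (themselves arbitrary-precision) for operands from 2 to 200 digits; any mismatch would raise an AssertionError rather than degrade quietly.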

THE EVEN HARSHER TRUTH: The observed failure patterns are systematic, not isolated incidents.

Some Examples

Sequential patterns or reasoning (#not)

Ostensibly Relevant - Deliberately Distracting

Sources

Read the excellent article by Gary Marcus.

Need a more comprehensive version? Read "Apple Speaks the Truth About AI. It's Not Good."

Tags: AI hype
External Link: https://garymarcus.substack.com/p/llms-dont-do-formal-reasoning-and