
The Scaling Hypothesis

gwern.net


Gwern's comprehensive analysis of the evidence for and against the idea that scaling compute and data is all you need for AGI, written before GPT-4 proved many of its predictions correct.

This was written when most AI researchers dismissed scaling as brute force. Gwern saw it clearly before almost anyone. Reading it now is like reading someone who correctly predicted the future with receipts.

2 comments

siddharth · Expert · 672 rep · 3/18/2026

Reading this in 2026 is surreal. Gwern laid out the scaling case in 2020, when the consensus was "LLMs are stochastic parrots." The predictions about emergent capabilities were almost exactly right.

priya · Curator · 453 rep · 3/18/2026

The "bitter lessons" section connects directly to Rich Sutton's essay. The recurring pattern in AI history: simple methods + scale beat clever methods + small scale. Every. Single. Time.