
Terence Tao – Kepler, Newton, and the true nature of mathematical discovery
Dwarkesh Podcast / Dwarkesh Patel
Overview
In this episode, Terence Tao joins Dwarkesh to explore the nature of mathematical discovery through the lens of Kepler's decades-long struggle to derive the laws of planetary motion, drawing parallels to the current state of AI in mathematics. The central thesis is that AI has dramatically lowered the cost of generating hypotheses (much like Kepler's decades of trial-and-error with Platonic solids, musical harmonies, and random correlations) but has not yet solved the harder problems of verification, cumulative understanding, and recognizing which ideas constitute genuine progress. The conversation moves from historical astronomy to the present moment in AI-assisted mathematics, where models have solved roughly 50 of 1,100 Erdős problems but hit a plateau, and where the real bottleneck has shifted from idea generation to evaluation, persuasion, and the social fabric of science itself.
---
Kepler as a High-Temperature LLM
Tao recounts the story of Johannes Kepler, who began with a beautiful but wrong theory: that the spacing of the six known planets' orbits was fixed by the five Platonic solids nested between their spheres. This theory fit the available data approximately, but when Kepler finally obtained Tycho Brahe's extraordinarily precise naked-eye observations (ten times more accurate than any previous data), it failed by about 10%. What followed was two decades of what Tao calls "trying random relationships": Kepler tested musical harmonies, geometric ratios, and astrological connections, eventually stumbling upon his laws of planetary motion, the third of which lies buried within a book largely devoted to the "harmonics of the world."
Dwarkesh proposes the provocative analogy that Kepler was a "high-temperature LLM"—generating vast numbers of hypotheses, most of them wrong, with only a tiny fraction surviving verification against Brahe's dataset. Tao largely accepts this framing but emphasizes that Kepler's success depended on three factors working together: Brahe's meticulous data collection, Kepler's relentless hypothesis generation, and the mathematical tools (Euclidean geometry) available to match models to data. The verification loop, however, was extraordinarily long—decades for Kepler's laws themselves, and nearly a century before Newton provided a unifying explanation.
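The generate-and-verify loop described above can be illustrated with a small sketch. It assumes a deliberately narrow hypothesis family (period equals distance raised to some power) and uses approximate modern orbital values, which are standard reference figures rather than anything quoted in the episode. The function names `fit_error` and `best_exponent` are hypothetical, and this is not a reconstruction of Kepler's actual procedure.

```python
# Approximate modern values: (semi-major axis in AU, orbital period in years)
# for the six planets known in Kepler's time. Illustrative reference figures,
# not data from the episode.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_error(exponent: float) -> float:
    """Worst-case relative error of the hypothesis T = a ** exponent."""
    return max(abs(a ** exponent - t) / t for a, t in PLANETS.values())


def best_exponent(candidates):
    """Return the candidate exponent whose worst-case error is smallest."""
    return min(candidates, key=fit_error)


# "Generate" many hypotheses: exponents from 0.50 to 3.00 in steps of 0.01.
candidates = [round(0.5 + 0.01 * i, 2) for i in range(251)]

# "Verify" each one against the data and keep those within 1%.
winner = best_exponent(candidates)
survivors = [p for p in candidates if fit_error(p) < 0.01]

print(f"best exponent: {winner}")
print(f"hypotheses within 1%: {survivors}")
```

Of the 251 candidates on this grid, only the exponent 1.5 survives the 1% check, which is Kepler's third law (period squared proportional to distance cubed). Almost every hypothesis is discarded, and the precision of the data is what makes the discarding possible.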
Tao notes that the prestige in science has historically gone to the "eureka" moments of idea generation, but the actual process involves a dozen components: problem identification, data collection, strategy formulation, hypothesis testing, validation, and communication. Kepler cycled through many ideas he never published because they simply didn't fit the data, and this hidden trial-and-error is an essential but underappreciated part of scientific progress.
---
How Would We Recognize a Unifying Concept in AI Slop?
The conversation turns to a critical question: if AI systems generate millions of hypotheses, how do we identify which ones represent genuine conceptual breakthroughs rather than mere empirical regularities? Tao points out that many great ideas were poorly received at first—deep learning itself was a niche field for decades before bearing fruit. The concept of the "bit" (binary digit) seems obvious in retrospect, but alternative systems like ternary logic (trits) could have become standard in an alternate history.
Tao argues that assessing an idea's fruitfulness depends on the future—on which ideas get adopted, standardized, and built upon. The base-10 number system is extremely useful, but there's nothing special about 10; we're "stuck with it" due to inertia. Similarly, the transformer architecture became foundational for large language models, but it didn't have to be that way. This means you cannot objectively grade a scientific achievement in isolation without understanding its past context and future trajectory—making it fundamentally different from the kind of localized problems that reinforcement learning can solve.
The deeper challenge is that correct theories often initially appear *worse* than their incorrect predecessors. Copernicus's heliocentric model was less accurate than Ptolemy's geocentric system, which had been refined over a millennium with increasingly complex ad-hoc fixes. It took Kepler to make heliocentrism more accurate. Similarly, Newton's theory of gravity had mysterious features—action at a distance, the equivalence of inertial and gravitational mass—that were only resolved centuries later by Einstein. Progress often comes not from adding more theories but from *deleting* assumptions, like the Aristotelian notion that objects naturally want to stay at rest.
---
The Deductive Overhang
Dwarkesh raises the concept of "deductive overhang"—the idea that with the right insight, one can extract far more knowledge from existing data than currently recognized. Tao agrees, noting that astronomy was one of the first sciences to embrace extreme data analysis because data was the bottleneck. Astronomers became "almost world-class" at extracting conclusions from minimal traces of information, a skill that quant hedge funds now actively recruit for.
Tao offers a clever example of extracting hidden information: researchers studying citation practices realized they could measure how often scientists actually read the papers they cite by tracking typographical errors in citations. If a typo (a wrong number or punctuation mark) gets copied from one reference to the next, it suggests the author was cutting and pasting without verification. This kind of "Sherlock Holmes" approach to data—finding signals in noise—is something astronomers excel at and something that could be applied to evaluating scientific progress itself.
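A minimal sketch of the copied-typo idea, assuming a toy list of citation strings in publication order. The citation text, the function name `copied_typo_share`, and the counting rule are all hypothetical illustrations, not the method used by the researchers Tao mentions.

```python
from collections import Counter


def copied_typo_share(citations, correct):
    """Share of erroneous citations that repeat an earlier identical error.

    `citations` lists citation strings in publication order; `correct` is the
    true citation. An error that has appeared before is counted as a likely
    copy, on the assumption that two authors rarely invent the same typo.
    """
    seen = Counter()
    errors = 0
    repeats = 0
    for text in citations:
        if text == correct:
            continue
        errors += 1
        if seen[text] > 0:
            repeats += 1
        seen[text] += 1
    return repeats / errors if errors else 0.0


# Hypothetical toy data: one reference cited six times.
correct = "J. Phys. 12, 345 (1973)"
citations = [
    correct,
    "J. Phys. 12, 354 (1973)",  # first appearance of a transposed page number
    correct,
    "J. Phys. 12, 354 (1973)",  # same typo again: likely copied
    "J. Phys. 12, 354 (1973)",  # and again
    "J. Phys. 21, 345 (1973)",  # a different, new typo
]

share = copied_typo_share(citations, correct)
print(f"share of errors that look copied: {share:.0%}")
```

In this toy list, two of the four erroneous citations repeat an earlier misprint, so half of the errors look copied rather than independently made.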
The implication is that there may be useful metrics hidden in citation networks, conference mentions, and other sociological data that could help identify which AI-generated ideas represent real progress. But Tao cautions that we don't yet have the frameworks to do this at scale.
---
Selection Bias in Reported AI Discoveries
Tao addresses the recent flurry of AI-assisted solutions to Erdős problems. As of the conversation, AI programs had solved roughly 50 out of 1,100 problems, but this initial burst has plateaued. Tao explains that there was a brief period where frontier models could "one-shot" solutions, but that has stopped—not for lack of trying. Multiple attempts to have AIs attack all remaining problems simultaneously have yielded only minor observations or rediscoveries of already-solved problems.
The key insight is selection bias: the 50 solved problems were almost all ones with virtually no existing literature—problems Erdős posed once or twice, that people tried casually and failed, but that turned out to have solutions accessible by combining one obscure technique with another result. When you look at the success rate systematically, any given problem has only a 1-2% chance of being solved by current AI. The successes get broadcast on social media, creating an impression of rapid progress, while the failures remain invisible.
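The arithmetic behind the visibility gap can be sketched as follows. This treats the quoted 1-2% as a uniform, independent per-problem rate for a single sweep over all problems, which is a simplifying assumption of this sketch and not a claim made in the episode. The name `expected_solves` is hypothetical.

```python
TOTAL_PROBLEMS = 1100  # approximate problem count cited in the episode


def expected_solves(n_problems: int, rate: float) -> float:
    """Expected successes if each problem is solved independently at `rate`."""
    return n_problems * rate


low = expected_solves(TOTAL_PROBLEMS, 0.01)
high = expected_solves(TOTAL_PROBLEMS, 0.02)

print(f"expected solves per sweep: {low:.0f} to {high:.0f}")
print(f"failures per sweep: {TOTAL_PROBLEMS - high:.0f} to {TOTAL_PROBLEMS - low:.0f}")
```

Under these assumptions a sweep produces roughly 11 to 22 announcements and about 1,080 silent failures, which is the imbalance that makes progress look faster than it is.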
Tao uses a vivid analogy: imagine a mountain range in the dark, with cliffs of varying heights. Current AI tools are like "jumping machines" that can leap two meters into the air—higher than any human. They cleared all the three-foot walls quickly, but now they're stuck. They cannot do what human mathematicians do: make partial progress, identify intermediate handholds, build up cumulative understanding, and then pull others up behind them. Each new AI session starts from scratch, with no memory of what it learned in previous attempts.
---
AI Makes Papers Richer and Broader, But Not Deeper
Tao reflects on his own productivity changes since adopting AI tools. He made a prediction in 2023 that by 2026, AI would be a "trustworthy co-author if used correctly," and he feels this is holding up. However, measuring productivity gains is tricky because the *type* of work has changed.
Tao estimates that papers he writes today would take "five times longer" without AI assistance—but that's because he now includes far more code, plots, numerical experiments, and deeper literature searches. The core intellectual work of solving the hardest parts of a math problem still happens with pen and paper. AI handles "silly things" like reformatting parentheses, generating visualizations that would have taken hours, and conducting literature surveys. The result is that papers have become "richer and broader, but not necessarily deeper."
This distinction between breadth and depth becomes a major theme. Humans excel at depth—focusing on one or two really important problems and making deep conceptual progress. AI excels at breadth—trying thousands of approaches simultaneously, mapping out entire fields, clearing low-hanging fruit. Tao envisions a future where AI first maps out a field, makes all the easy observations, and identifies "islands of difficulty" that human experts then tackle. But we don't yet have the paradigms to take full advantage of this complementarity.
---
Can Humans Extract Understanding from AI-Generated Proofs?
A central question: if an AI proves the Riemann Hypothesis in Lean (a formal proof assistant), will we actually understand anything new? Tao is cautiously optimistic. He notes that some problems, like the Four Color Theorem, have been solved by brute force with no elegant conceptual proof—and that might be the fate of some problems. But the Riemann Hypothesis feels different; most mathematicians believe that solving it will require genuinely new mathematics, new connections between previously unconnected fields.
The beauty of formal proofs in Lean is that they can be studied atomically. Each lemma can be examined in isolation. Future mathematicians might become specialists in "proof ablation"—taking a giant AI-generated Lean proof and systematically removing parts to find more elegant versions. Other AIs could be trained to grade proofs for elegance or to refactor them into more understandable forms.
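One way to picture "proof ablation" is a greedy loop that deletes a step and re-runs the checker. The sketch below is a toy under stated assumptions: the proof is a list of named steps with premises, and `derivable` is a hypothetical stand-in for a real proof checker such as Lean. It does not describe an existing tool or workflow.

```python
def derivable(steps, goal):
    """Toy checker: a step counts once all of its premises have been derived."""
    known = set()
    for name, premises in steps:
        if all(p in known for p in premises):
            known.add(name)
    return goal in known


def ablate(steps, goal):
    """Greedily drop each step whose removal still leaves the goal derivable."""
    kept = list(steps)
    for step in list(steps):
        trial = [s for s in kept if s != step]
        if derivable(trial, goal):
            kept = trial
    return kept


# Hypothetical proof: (step name, premises it depends on).
proof = [
    ("lemma_a", []),
    ("lemma_b", []),
    ("lemma_c", ["lemma_a"]),
    ("detour", ["lemma_b"]),
    ("goal", ["lemma_c"]),
]

minimal = ablate(proof, "goal")
print([name for name, _ in minimal])
```

Here the loop removes `lemma_b` and `detour`, leaving only the chain that the goal actually depends on. A real ablation study would also have to try rewording or replacing steps, which this sketch does not attempt.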
Tao points to a practical example: when an AI generates a 3,000-line Lean proof for an Erdős problem, people then use other AIs to summarize it, and humans write their own simplified versions. There's already a "post-processing" pipeline that deconstructs and interprets AI proofs. So even if the initial output is incomprehensible, the artifact of the proof enables analysis that can yield understanding.
---
The Need for a Semi-Formal Language for Scientific Strategies
Tao identifies a crucial gap: we have formal languages for proofs (like Lean) but no equivalent for the *strategies* and *heuristics* that scientists actually use. When Gauss computed the first 100,000 primes and conjectured the Prime Number Theorem, he was making a statistical claim—that primes become sparser according to a specific mathematical law—without any proof. This kind of "conjectural framework" is how mathematicians actually think: they develop probabilistic models, like the "random model of the primes," that are non-rigorous but extremely accurate.
The twin prime conjecture is a perfect example. Mathematicians are absolutely convinced it's true because if primes behaved like random numbers with a certain density, infinitely many twin primes would appear almost surely. This heuristic has been validated every time mathematicians have been able to prove something about primes: the rigorous results match the random model's predictions. The Riemann Hypothesis is believed for similar reasons; if it were false, it would mean there's a secret pattern in the primes we didn't know about, which would undermine confidence in cryptography based on prime numbers.
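The random-model heuristic can be checked numerically. The sketch below assumes the standard form of the model, in which a number n is "prime" with probability about 1/ln(n), together with the Hardy-Littlewood correction constant. Both are standard number-theory facts rather than details from the episode, and the function names are hypothetical.

```python
import math

# Hardy-Littlewood twin prime constant C2 (standard value, not from the episode).
TWIN_PRIME_CONSTANT = 0.6601618158


def sieve(limit: int) -> bytearray:
    """Sieve of Eratosthenes: flags[n] == 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p * p.
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags


def count_twin_primes(limit: int) -> int:
    """Count pairs (p, p + 2) with both prime and p + 2 <= limit."""
    flags = sieve(limit)
    return sum(1 for p in range(2, limit - 1) if flags[p] and flags[p + 2])


def random_model_prediction(limit: int) -> float:
    """Heuristic count: 2 * C2 * sum of 1 / ln(n)**2 for n = 2..limit."""
    total = sum(1 / math.log(n) ** 2 for n in range(2, limit + 1))
    return 2 * TWIN_PRIME_CONSTANT * total


limit = 100_000
actual = count_twin_primes(limit)
predicted = random_model_prediction(limit)
print(f"actual twin prime pairs up to {limit}: {actual}")
print(f"random-model prediction: {predicted:.0f}")
print(f"ratio: {actual / predicted:.3f}")
```

The actual and predicted counts agree to within a few percent at this range. Agreement of this kind is evidence for the heuristic, not a proof: the model says nothing rigorous about whether the pairs continue forever.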
Tao wishes for a "semi-formal language" that could capture this kind of reasoning—assessing plausibility, building narratives, communicating uncertainty—in a way that AI could participate in. Currently, this is done through human judgment and the "test of time," which doesn't scale to millions of AI-generated hypotheses. Developing such a framework would be a major breakthrough, but it's "more of a wish than a plan."
---
How Terry Uses His Time
Tao describes himself as a "fox" (knowing many things) rather than a "hedgehog" (knowing one thing deeply), borrowing Isaiah Berlin's famous distinction. He has an obsessive completionist streak—if someone else can do something with mathematics that he cannot, it "bugs him" until he figures out their trick. This drives him to learn new fields, often through collaboration with specialists who teach him their techniques.
He emphasizes the importance of writing things down. Early in his career, he would learn a clever trick and think he'd remember it, only to find six months later that he'd forgotten not just the trick but even the fact that he'd understood it. His blog began as a way to record "anything cool that I've learned" so he wouldn't lose it. Writing blog posts is something he does "when I don't want to do other work"—it's creative and fun, taking anywhere from half an hour to several hours.
Tao also defends serendipity against optimization. He deliberately leaves unscheduled time in his day for unexpected interactions, arguing that the COVID-era shift to fully scheduled remote meetings eliminated the "casual knocking on the hallway" that often leads to important discoveries. He recalls that as a grad student, physically going to the library to find a journal article would lead him to browse adjacent articles and accidentally find interesting things—a process that's been lost now that you can instantly search for exactly what you want. Even a year at the Institute for Advanced Study, with no distractions, eventually led to boredom and reduced inspiration. "You actually do need a certain level of distraction in your life," he says. "It somehow adds enough randomness and temperature."
---
Human-AI Hybrids Will Dominate Math for a Lot Longer
When asked when AI will replace human mathematicians entirely, Tao resists the framing. He points out that calculators already do "frontier math" that humans cannot—but that didn't end mathematics. A 19th-century mathematician's job of laboriously solving differential equations has been automated by computer algebra systems, but mathematicians moved on to different problems. Similarly, sequencing a single organism's genome used to be an entire PhD; now it costs $1,000, but genetics has moved to studying whole ecosystems.
Tao predicts that "hybrid human plus AIs will dominate mathematics for a lot longer." Current AI is "very good at certain things, but really terrible at others," and while frameworks can be added to reduce error rates, "we don't have all the ingredients to really have a truly satisfactory replacement for all intellectual tasks." He warns that AI could also destroy serendipity by making everything too efficient, potentially inhibiting certain types of progress.
His advice to early-career mathematicians: embrace change. The things you study may become obsolete or revolutionized, but some things will be retained. Non-traditional opportunities are emerging—high school students can now contribute to frontier math research using AI tools and Lean. But traditional education and credentials will still matter for a while. The key is adaptability: "pursuing things just for curiosity, for playing around," while being open to ways of doing science that don't exist yet.
---
Conclusion
This episode matters because it reframes the AI-in-science debate from "when will AI replace humans" to "what is the actual bottleneck in scientific progress?" Tao's historical perspective—from Kepler's decades of trial-and-error to Newton's unification to Gauss's statistical conjectures—reveals that hypothesis generation has never been the scarce resource. The hard parts are verification, cumulative understanding, persuasion, and the social structures that separate signal from noise. AI has made idea generation nearly free, but we haven't built the equivalent of peer review, citation analysis, or "proof ablation" at scale. The episode leaves the listener with a sense that the future of mathematics will be deeply collaborative between humans and AI, but that the most important breakthroughs may come not from autonomous AI systems but from new ways of thinking about what progress even means.
---
Key Takeaways
- Kepler's discovery process—decades of trying random relationships against Brahe's data—is a historical analog to how current LLMs generate and test hypotheses, but the verification loop for correct ideas can take centuries.
- AI has driven the cost of hypothesis generation to near zero, but this shifts the bottleneck to verification, evaluation, and recognizing which ideas constitute genuine progress—tasks we don't know how to do at scale.
- Correct theories often initially appear worse than incorrect but well-developed alternatives (Copernicus vs. Ptolemy, Newton's mysterious action-at-a-distance), making it hard to evaluate AI-generated ideas in real time.
- Current AI tools have solved ~50 of 1,100 Erdős problems, but this was a one-time clearing of low-hanging fruit; systematic success rates are only 1-2% per problem, and models cannot build cumulative understanding across sessions.
- AI makes papers "richer and broader, but not necessarily deeper"—handling auxiliary tasks like visualization and literature searches while the core intellectual work still requires human insight.
- Formal proof assistants like Lean enable atomic analysis of AI-generated proofs, potentially allowing future mathematicians to extract understanding even from incomprehensible outputs through "proof ablation."
- A semi-formal language for scientific strategies and heuristics (not just proofs) is needed to allow AI to participate in the conjectural, probabilistic reasoning that actually drives mathematical progress.
- Serendipity and inefficiency have real value in science; over-optimization risks destroying the accidental discoveries that come from browsing, hallway conversations, and unstructured time.