Three Tools Before You Think: Occam, First Principles, and the Zeroth Principle

Before the book gets complicated — and it does get complicated, I want to be honest about that upfront — I arm you with three thinking tools. I call them tools rather than theories because theories live in textbooks, and tools are what you actually pick up when something needs doing. These three have shaped every insight in this book, and once you see them, you'll spot them everywhere: in how Einstein dismantled Newton, in how a 7th-century Indian mathematician changed arithmetic forever, in how every major AI breakthrough of the last decade happened. They are not the same tool. People confuse them constantly. The confusion is expensive.

I didn't learn any of this in a classroom. I learned it the way entrepreneurs tend to learn things: by making decisions with incomplete information, being wrong in instructive ways, and eventually noticing the pattern that separates the decisions that held up from the ones that didn't. By the time I recognized I was using these three principles, I had already been using them for years — which is either a compliment to intuition or an indictment of formal education. Possibly both.

I. Occam's Razor — The Simplest Blade

Occam's Razor

Given two explanations that answer the same question, the simpler one (the one that makes the fewest assumptions to arrive at an answer) is more likely to be the accurate one. If an explanation requires more scaffolding than the problem it solves, discard the scaffolding.

William of Ockham was a 14th-century English friar who did not own a razor and would likely have been baffled by the metaphor. What he argued — with considerably more Latin than is reproduced here — is that entities should not be multiplied beyond necessity. In plain English: if two explanations account for the same facts, the one with fewer moving parts is probably correct. The universe, as a general rule, does not add complexity without a reason.

The common misreading is that Occam's Razor means "the simplest answer is always right." That is not quite it. The simplest answer that accounts for the available evidence is the one most likely to be right. Occam's Razor is not a license to be lazy; it's a prohibition on being needlessly baroque. There's a difference, and collapsing it is responsible for a surprising amount of bad thinking.
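One way to make that difference concrete is a small Python sketch, under loud assumptions: the `Explanation` record, its `assumptions` count, and the wet-street scenario are all hypothetical illustrations, not anything Ockham wrote. The function first discards every explanation that fails to cover the evidence and only then prefers the one with the fewest assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    """A hypothetical stand-in for a candidate explanation."""
    name: str
    accounts_for: frozenset  # facts this explanation covers
    assumptions: int         # how many things it has to assume


def occam_pick(candidates, evidence):
    """Return the candidate with the fewest assumptions among those
    that account for all of the evidence, or None if none does."""
    adequate = [c for c in candidates if evidence <= c.accounts_for]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c.assumptions)


evidence = frozenset({"wet street", "wet roofs"})
candidates = [
    # Simplest overall, but it cannot explain the wet roofs.
    Explanation("street cleaner", frozenset({"wet street"}), 1),
    Explanation("rain", frozenset({"wet street", "wet roofs"}), 2),
    Explanation("rain plus burst pipe", frozenset({"wet street", "wet roofs"}), 3),
]

best = occam_pick(candidates, evidence)
print(best.name)  # rain
```

The lazy reading would pick "street cleaner" because it assumes the least. The razor, read properly, passes over it for failing the evidence and then cuts the burst pipe as needless.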

Here is where I start applying it — to the most extravagant question I know. Why does the universe exist at all? The question sounds like a philosophical party trick, but it has a structure worth examining. Every explanation I've encountered loops back into itself. You're told the universe began because it began. You're told God created it, and then you're required to explain God, at which point you're trying to understand the origins of two things instead of one — which is definitionally a less economical explanation, not more. God, as a solution to the problem of cosmic origins, fails Occam's Razor: it requires a more extensive explanation than the one it is supposed to provide.

What Occam's Razor does instead is comforting in its severity: it tells me the universe didn't exist at one point, which implies it didn't have to exist at any point. If the most stable state for a universe is, intuitively, nothingness — then there is no requirement for trees, squirrels, pizzas, or any of us to exist. We are here not because we had to be, but because the conditions that would have prevented us simply didn't obtain. That is either terrifying or liberating, depending on your Friday evening.

II. First Principles — The Builder's Method

First Principles

Break any complex problem into its most fundamental elements — the core truths that cannot be decomposed further. Rebuild the solution from those elements. Most new ideas are born from the recombination of existing fundamentals from different fields.

First Principles thinking is the method of reconstruction after Occam's Razor has finished cutting. Where Occam strips away the superfluous, First Principles asks: what remains when everything extraneous is gone? And then: what can I build from that?

The most entertaining example in the literature is also the most fictional one. Sherlock Holmes was, at his core, a first-principles thinker. "When you have eliminated the impossible," he told Watson with the maddening calm of a man who never had to do his own laundry, "whatever remains, however improbable, must be the truth." That is reductionism in a deerstalker hat: systematically eliminating what cannot be, until you are left with what must be, and then building your conclusion from that residue.

The modern version, less entertaining but more actionable, is what Elon Musk described when asked why he built his own rocket. The conventional wisdom was that rockets cost $65 million and always would, because that's what rockets cost. First Principles thinking asks a different question: what are rockets made of? Aerospace-grade aluminium, steel, copper, carbon fibre, electronics. What do those materials cost on the commodity market? A fraction of the finished rocket price. Therefore the gap between materials cost and rocket price is an assumption, not a law of physics — and assumptions can be interrogated.^[1]

This is the distinguishing feature of First Principles: it is productive. It doesn't just clear away the clutter, as Occam does. It gives you the raw materials to assemble something new. The cross-pollination between fields is particularly potent — when you reduce two apparently unrelated domains to their fundamentals, you sometimes discover they share the same bones. Biology and information theory, for instance, have been having the same conversation for decades under different names. So have economics and thermodynamics. So, increasingly, has everything with AI.

The large language models at the centre of this book's argument were built on first-principles thinking about language. Instead of hand-engineering rules for grammar and meaning — the approach that defined decades of previous AI research — the researchers asked: what is language, fundamentally? Patterns of co-occurrence at enormous scale. What is understanding, fundamentally? The ability to predict what comes next given what came before. Strip out all the interpretive scaffolding. Train on the raw material. See what emerges. What emerged was GPT-4, Claude, and a complete restructuring of what we thought machines could do.
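The idea of "predicting what comes next from patterns of co-occurrence" can be sketched in a few lines of Python. This is a toy word-pair counter, offered only as an illustration: it is not how GPT-4 or Claude are built (those are neural networks trained on vastly more text and context), and the function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "on"))   # the
print(predict_next(model, "dog"))  # None
```

No grammar rule appears anywhere in the sketch. Whatever it "knows" about which word follows which comes entirely from counting the raw material.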

"Most new ideas are born from the amalgamation of already-existing ideas. The cross-interplay of ideas from different fields opens you up to discovering new ideas that neither field could have produced alone."

— from History's Future, Ch. 1

III. The Zeroth Principle — The Dangerous One

The Zeroth Principle

Challenge all existing assumptions about a subject by starting to understand it again from the ground up. Unlike First Principles, which breaks problems into fundamentals within a field, the Zeroth Principle questions whether the field's fundamentals themselves are valid. Any Zeroth insight supported by proof reshapes how reality is understood.

Here is where the thinking gets uncomfortable, and where most people quietly excuse themselves. The Zeroth Principle is not about building from fundamentals. It is about questioning whether the fundamentals are fundamental. It is the cognitive move that says: I don't just want to rebuild the house from better materials — I want to ask whether the ground it's sitting on is actually ground.

The example I find cleanest, because it is so improbable it still feels slightly miraculous, is the number zero. For most of recorded history, zero was not a number. It was an absence — a placeholder, a gap, a nothing-where-something-might-be. Numbers were tools for counting things. You cannot count zero things. Therefore zero was not a number. This was not considered controversial. It was considered obvious.

In 628 AD, the Indian mathematician and astronomer Brahmagupta decided it was not obvious at all. He defined zero as a number in its own right: the result of subtracting a number from itself. He established arithmetic rules for it. He treated sunya, the Sanskrit word for emptiness, as a quantity: a number minus itself doesn't yield nothing; it yields zero, a precise computational concept with its own properties. This was a Zeroth Principle insight: he didn't refine the existing understanding of numbers. He questioned whether the existing understanding was valid, decided it was not, and replaced it with something categorically different.^[2]

Zero, once established, unlocked algebra, calculus, binary code, and the entire computational architecture of the modern world. A concept that literally meant "nothing" turned out to be the load-bearing foundation for most of mathematics. Brahmagupta had no way of knowing any of that in 628 AD. He was just following the logic of the Zeroth Principle to where it led.

Albert Einstein did the same thing, twice. His special theory of relativity in 1905 did not refine Newtonian mechanics. It rejected the assumption that time and space are fixed, absolute, and the same for all observers (a belief so embedded in physics that it had rarely been seriously questioned) and showed that they are not. His general theory in 1915 then questioned whether gravity was even a force at all, and proposed instead that it is the geometry of spacetime responding to mass and energy. Two hundred years of Newtonian physics: not wrong, exactly, but a special case of a much stranger truth that nobody had known to look for because nobody had thought to question the floor.^[3]

The distinction between First Principles and Zeroth Principles matters more than it might seem. First Principles says: take the accepted elements and recombine them more cleverly. Zeroth Principles says: are these actually the elements? First Principles is what a very good engineer does. Zeroth Principles is what happens when physics rewrites itself. Both are necessary. But they are not the same operation, and confusing them is how you end up with incremental improvements in a domain that needs a revolution.


The Three Principles Meet Artificial Intelligence

I want to say something that the book argues at much greater length, and which I think is underappreciated even in the communities actively building AI: the current AI revolution is a Zeroth Principle event.

For decades, the dominant paradigm in artificial intelligence was symbolic AI — hand-crafting rules, if-then logic trees, expert systems. The assumption underlying all of it was that intelligence could be encoded explicitly: that a human expert could articulate what they knew, and a programmer could translate that articulation into code, and the resulting system would approximate intelligent behaviour. This assumption was not considered controversial. It was considered the only viable approach.

The deep learning revolution — the one that has now produced systems that write, reason, generate images, and pass bar exams — did not refine symbolic AI. It questioned whether explicit rule-encoding was even the right model for intelligence. It proposed instead that intelligence might emerge from pattern recognition at sufficient scale. That understanding might not require explicit rules at all. That you could get a system to exhibit coherent, generative reasoning not by telling it things, but by exposing it to enough of humanity's accumulated output and letting it find the structure itself.

That is a Zeroth Principle move. It questioned the ground. And when the ground turned out to be shiftable, everything built on the old assumptions had to be rebuilt.

We are now, I believe, at the edge of the next Zeroth Principle moment in AI: the question of whether the architecture of the brain, and by extension the architecture of human cognition, is the right model for machine intelligence. Most current AI systems are, in some sense, still loosely modelling human language processing. What happens when someone asks whether that's even necessary? What happens when the first Zeroth Principle thinker looks at AI and decides that intelligence doesn't have to look anything like the version we've been trying to replicate?

I don't know the answer. Neither does anyone else, which means we're standing exactly where Brahmagupta stood in 627 AD, the year before he made zero a number: at the edge of something that will seem obvious in retrospect and is currently invisible.

That's where the book is trying to take you. Not to answers I've already worked out, but to the edge of the questions that don't yet have them. Arm yourself with these three tools. The universe is about to get considerably more complicated, and you'll want something sharp.

Sources & Further Reading

1. Elon Musk on First Principles thinking, interview with Kevin Rose (2012); widely cited. SpaceX's materials-cost analysis of launch vehicles.
2. Brahmagupta, Brahmasphutasiddhanta (628 AD). Secondary source: Georges Ifrah, The Universal History of Numbers, Wiley (2000).
3. Albert Einstein, Zur Elektrodynamik bewegter Körper (On the Electrodynamics of Moving Bodies), Annalen der Physik, 1905. Walter Isaacson, Einstein: His Life and Universe, Simon & Schuster (2007).
4. Arthur Conan Doyle, The Sign of the Four (1890). Holmes quotation on elimination as method.
5. William of Ockham, Summa Logicae (c. 1323). The formulation commonly attributed to him: entia non sunt multiplicanda praeter necessitatem.
