The Laws of Thought by Tom Griffiths: The Quest for a Mathematical Theory of the Mind

What's it about?

The Laws of Thought (2026) is a deep dive into the world of cognitive science – the quest to understand the laws that govern our minds. It gives a broad and detailed account of the history of the discipline, starting with the foundations of formal logic before moving through behaviorism, early computational theories, semantic research, artificial neural networks, and finally probability theory. By exploring three main approaches to understanding the mind, it offers an intricate picture of the Laws of Thought.

Some 300 years ago, a small group of prominent thinkers set out to uncover the laws that governed the world around them. Newton, Descartes, Hobbes, Leibniz – they were all on a quest for the Laws of Nature: a way of applying mathematical theories to explain how things work. Fast forward to today, and those laws – from gravity to acceleration – are taught in every physics classroom around the world. But what about the world inside of us?
Those Enlightenment figures were also looking for the Laws of Thought. That quest would ultimately lead to formal logic, cognitive science, Bayesian models of the mind, and even the neural networks that power today’s AI. And this lesson will take you on that whirlwind of a journey. So, ready to think hard about thinking? Then let’s get started!
Gottfried Wilhelm Leibniz might be the smartest man who ever lived. Born in 1646 in Leipzig, Germany, he developed calculus independently of Isaac Newton. But however hard he tried, he couldn’t figure out how to use mathematics to describe our minds. Leibniz was building on Aristotle, whose logic was based on the syllogism – an argument with two premises and a conclusion that follows from the premises if they are true.
Here’s an example: Every healthy person is happy. Some healthy people are rich. Therefore some rich people are happy. If the premises are true, so is the conclusion. Aristotle identified 14 types of syllogisms, using a mix of intuition and proofs. But Leibniz wanted to go further: he wanted a mathematical theory that captured the validity of syllogisms.
He wanted what we today call a formal system. Several great minds were with him on this quest – Descartes included. But it would be a 17-year-old English schoolteacher who made the next breakthrough, over a century later. Walking through a field one day, George Boole had a sudden realization: human thought could be expressed in algebraic form. The trick was to restrict your variables so that they could only take the values 0 or 1. In this way Boole developed algebraic statements to capture logical statements like “Every A is B.”
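To see the trick in action, here’s a minimal sketch in Python – our own illustration, not Boole’s notation. With variables restricted to 0 and 1, “Every A is B” holds exactly when a × (1 − b) equals 0, because the only way it can fail is when A is true (1) and B is false (0):

```python
# Boole's trick, sketched: truth values restricted to 0 or 1.
# "Every A is B" fails only when a = 1 and b = 0,
# which is exactly when a * (1 - b) is nonzero.

def every_a_is_b(a: int, b: int) -> bool:
    return a * (1 - b) == 0

for a in (0, 1):
    for b in (0, 1):
        print(f"a={a}, b={b} -> 'Every A is B' holds: {every_a_is_b(a, b)}")
```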
Major philosophers would build on Boole’s insights over the next century to develop an increasingly complete system of formal logic, which came to be known as propositional logic. But hang on, you say – why does this even matter to us? Well, as living beings, one of our chief problems is knowing what to believe. If we know some things are true, what else can we trust to be true?
Formal logic offers a way to answer this. It reduces semantics to syntax, meaning it reduces the problem of finding truth to the problem of following a set of rules. In this way, it takes a step toward demystifying the nature of thought.
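Here’s a tiny sketch of what “reducing semantics to syntax” means in practice – a hypothetical rule-follower, written for this lesson, that derives new truths purely by matching symbols, with no idea what the symbols mean:

```python
# A toy inference engine: given known facts and if-then rules,
# it mechanically applies modus ponens until nothing new follows.

facts = {"it_rained"}
rules = [("it_rained", "ground_is_wet"), ("ground_is_wet", "shoes_get_muddy")]

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)  # a purely syntactic step
            changed = True

print(sorted(facts))  # ['ground_is_wet', 'it_rained', 'shoes_get_muddy']
```

But this is far from the end of our journey.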
The first half of the twentieth century saw the emergence of psychology, a new and unusual science whose subject matter could be neither seen nor touched. Early psychologists worked around this problem by observing people’s behavior and then inferring backward to the thoughts and feelings behind it. One early experiment by Wilhelm Wundt involved measuring the time between a participant hearing a sound and pressing a button, and on this basis he made inferences about the nature of the mind. Many psychologists thought this was stretching it – by a lot. So they proposed a counter-theory: psychology should only study those things about humans that are actually observable.
The workings of the mind were off limits. This was the birth of behaviorism, and it would dominate how we think about the mind for half a century. But then Jerome Bruner came along to expose the cracks in the theory. Shortly after WWII, he carried out a number of groundbreaking studies with his colleagues at Harvard. In one famous study with Cecile Goodman, children were asked to estimate the size of a coin by adjusting an apparatus that created a circle of light. Their goal was to get the light circle to match the size of the coin.
The researchers found that children consistently overestimated the size of the coins, especially the more valuable ones. Even more interestingly, children from poorer backgrounds tended to overestimate the coin sizes more than those from affluent backgrounds. These results were a direct challenge to behaviorism: they suggested that people respond differently to the outside world depending on something immeasurable – their inner worlds. But while this challenge was devastating, it didn’t provide an alternative to behaviorism; it didn’t show us a scientifically sound way of studying the mind. For that, Bruner would have to dive into the budding world of computation. And that’s where we’ll head next.
In 1833, English mathematician and inventor Charles Babbage started working on his Analytical Engine, a machine that could carry out any arithmetic calculation using instructions provided on punch cards. It would have been the earliest form of what we now call a computer, but Babbage died before he could complete it. Some hundred years later, Alan Turing, a fellow at King’s College, Cambridge, picked up the baton with a conception that came to be called the Turing machine. The machine’s design involved a moving head and an infinitely long piece of tape.
The head could read or write a 0 or a 1 on the tape, and could move through a finite number of states. The machine would be provided with a set of rules instructing it what to write and when to switch to another state, depending on its current state and the symbol it read on the tape. By translating the rules of a branch of mathematics into rules for the machine, the Turing machine could, in principle, solve any mathematical problem a human could solve by following a procedure.
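To make that description concrete, here’s a minimal Turing machine in Python – a made-up rule table for illustration, not one from Turing’s paper. This one-state machine walks right along the tape, flipping every bit until it reads a blank:

```python
# A tiny Turing machine: (state, symbol) -> (write, move, next_state).
# This rule table flips 0s and 1s until the head reads a blank.

rules = {
    ("flip", "0"): ("1", +1, "flip"),
    ("flip", "1"): ("0", +1, "flip"),
    ("flip", " "): (" ", 0, "halt"),
}

tape = list("1011 ")
head, state = 0, "flip"

while state != "halt":
    write, move, state = rules[(state, tape[head])]
    tape[head] = write
    head += move

print("".join(tape))  # -> "0100 "
```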
Meanwhile, across the pond at MIT, Claude Shannon was breaking ground on how to connect the worlds of logic and electricity. By setting up switches along circuits, in sequence or in parallel, he found configurations that represented the basic components of propositional logic, such as “and” and “or”. Turing and Shannon would both end up spending time at Princeton University. And while they were there, they would come across the mathematical whirlwind that was John von Neumann. By 1933 the Hungarian-born Jewish mathematician and physicist had fled Germany, determined to do his utmost to stop the spread of totalitarianism. In 1944, he joined the US Army’s efforts to build electronic computers. Key to von Neumann’s research was understanding how a computer could store and recover information from memory.
And this involved figuring out how to represent facts in a way that a computer could understand. Jerome Bruner visited von Neumann while he was researching this, and he came away inspired. Bruner set to work on showing how psychologists could legitimately study human thought, based on the same principles von Neumann was working on.
The cognitive revolution had just begun. The 1950s saw the explosion of cognitive science. A string of key papers propelled the discipline forward, as researchers grappled with the task of explaining the mind scientifically. One approach that emerged was that of rules and symbols – formal systems based on logic.
American scholars Allen Newell and Herbert Simon introduced the physical symbol system hypothesis, the idea that a collection of symbol structures can represent objects and their behavior in the physical world. But wait, you might ask: how can a bunch of symbols ever represent the complexity of human behavior? Simon’s answer was an invitation to think about ants walking on a sand dune. The trails the ants leave in the sand are incredibly intricate, yet the ants are only following simple rules. It’s the complexity of their environment that produces the intricate shapes – and perhaps the same could be said of humans.
Around the same time, Noam Chomsky was trying to figure out a formal system to describe language. The system he developed, based on what he called phrase structure grammar, was as rich as it was complex.
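To give a flavor of phrase structure grammar – with a toy grammar of our own invention, far simpler than Chomsky’s – here’s how rewriting rules can generate sentences:

```python
import random

# A toy phrase structure grammar: each symbol rewrites to one of its
# right-hand sides until only words remain.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["child"], ["language"]],
    "V":  [["learns"], ["hears"]],
}

def generate(symbol: str) -> str:
    if symbol not in grammar:          # a word: nothing left to rewrite
        return symbol
    expansion = random.choice(grammar[symbol])
    return " ".join(generate(s) for s in expansion)

print(generate("S"))  # e.g. "the child learns the language"
```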
But the grammar also raised an important question: children have only limited exposure to language, and yet they acquire it with apparent ease. How? Chomsky’s answer was that we all possess innate knowledge, located somewhere in our genes. The question of language acquisition actually touches on a much broader problem for formal logic systems: induction. Formal logic works by deduction – we know some things to be true, and from those known things we draw conclusions about the truth of other things. Induction requires us to accept a conclusion that appears to be supported by the facts, but which isn’t absolutely certain. For example, we may notice that the sun rises every morning. But this doesn’t mean we can claim with absolute certainty that it will rise tomorrow.
Human thought is full of induction. Without it, we wouldn’t be able to get by in the world. Even vision involves induction: the patterns of light falling on our retinas simply don’t contain enough information to determine the three-dimensional world with absolute certainty. Visual illusions play on this.
So if formal logic, through rules and symbols, can’t fully account for induction, do we need another way to think about how our minds work? One of the hardest blows to the rules and symbols approach to describing the mind came from the highlands of Western New Guinea. In 1968 California native Eleanor Rosch traveled there to study the local Dani people.
The Dani had only two words for colors: “mola” and “mili”. Rosch enrolled some of the community’s members in a color memory experiment that had previously been run by Roger Brown on Harvard undergraduates. That earlier study had found that the undergraduates more easily remembered colors that were good matches for English color names. Rosch discovered that the Dani more easily remembered those very same colors – even though they didn’t have words for them!
It seemed to Rosch that colors had a definite center – a certain shade that people agreed on – and then a less definite boundary, where disagreements were possible. What’s more, it looked like this had more to do with human perception than with any language or culture. But what if this wasn’t just the case for colors? What if all categories, from blue and red to things like “toy”, “sport”, and “furniture”, had a definite center but a fuzzy boundary? This is tricky for a formal logic system, where propositions are either true or false. So how do we account for this?
What eventually emerged was a new approach to understanding categories, based on spaces and features. The idea is to think of a category like “blue” or “furniture” as organized around an ideal prototype – the perfect blue, or a simple chair. Each category is located in psychological space in relation to other categories, depending on how many features they do or don’t share. So when you encounter an object in the world, like a shade of blue, you call it “blue” because it falls closer to the “blue” prototype than to any other prototype in the psychological space.
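Here’s a minimal sketch of the idea, using invented feature values rather than real psychological data: an object is assigned to whichever prototype it sits closest to, so category centers are sharp while the boundaries between them stay fuzzy.

```python
import math

# Hypothetical prototypes in a 2-D "psychological space"
# (the dimensions are arbitrary stand-ins for perceptual features).
prototypes = {
    "blue":  (0.1, 0.9),
    "green": (0.3, 0.4),
    "red":   (0.9, 0.1),
}

def categorize(point):
    """Assign a point to the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda name: math.dist(point, prototypes[name]))

print(categorize((0.2, 0.8)))    # clearly "blue": close to the prototype
print(categorize((0.21, 0.65)))  # near the blue/green boundary: "green", only just
```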
This way of thinking solved the problem of fuzzy boundaries, providing a new way to describe our internal states. But it didn’t explain how we move through different internal states – in other words, how we actually think. That would lead us back to the world of computing, where a revolutionary new model would be born.
The perceptron. Sounds like science fiction, right? Well, at the time, it was. This was a machine designed by researcher Frank Rosenblatt – an artificial visual system that could classify a range of images based on the input received by an artificial retina, much like what our brains do with the light patterns received on our own retinas.
The perceptron’s design involved three layers. First there were sensory units that would activate in response to light patterns. These would be connected to association units, which would sum up the signals received by the sensory units and activate if they passed some predetermined threshold. Finally, the association units would be connected to response units, which would do the same thing, summing and thresholding, and ultimately give a classification of the image presented to the artificial retina. The perceptron could recognize a shape as a triangle, for example, or state if a shape appeared on the left part of an image. Crucially, the connections between the association units and the response units had different strengths or weights.
A different set of weights would be used to solve a different classification problem. And the perceptron could learn those weights on its own, adjusting them whenever it made a mistake. By many accounts, the perceptron was the first artificial neural network. But it could only handle simple tasks. To push the field further, researchers needed to figure out how to represent the complex world around us in ways a perceptron could understand.
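Here’s a minimal sketch of that error-driven learning rule in Python – a modern simplification with toy data, not Rosenblatt’s original hardware. The perceptron below learns the logical “or” function by nudging its weights after every mistake:

```python
# A minimal perceptron learning the logical "or" function.
# The output unit sums its weighted inputs, thresholds at zero,
# and the weights are nudged toward the target after each error.

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

for _ in range(10):  # a few passes over the data
    for x, target in data:
        error = target - predict(x)  # -1, 0, or +1
        w[0] += error * x[0]
        w[1] += error * x[1]
        bias += error

print([predict(x) for x, _ in data])  # -> [0, 1, 1, 1]
```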
In the field of language acquisition, the results were fascinating. Neural network models were developed that could predict the next word in a sequence, or capture the contextual meaning of words. Words themselves could be represented as points in a semantic space. In modern neural networks, these points are called word embeddings, and they form a crucial part of how large language models operate.
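To illustrate – with made-up vectors, not embeddings from a real trained model – words become points, and geometric closeness stands in for closeness of meaning:

```python
import math

# Hypothetical 3-D word embeddings; real models use hundreds of dimensions.
embeddings = {
    "cat":   (0.9, 0.8, 0.1),
    "dog":   (0.8, 0.9, 0.2),
    "table": (0.1, 0.2, 0.9),
}

def cosine(u, v):
    """Cosine similarity: values near 1.0 mean similar meanings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: related words
print(cosine(embeddings["cat"], embeddings["table"]))  # low: unrelated words
```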
As more and more data became available for training these networks, they became more and more powerful, solving increasingly complex inductive problems. They arrived at correct conclusions from data in a systematic way, but without using formal logic. These feats raised an important question, though: why do neural networks need so much more data than humans do? To answer that, we’ll need to visit one more approach to thinking about the mind: probability theory.
We all use probability in our daily lives. We gauge how likely it is for it to rain tomorrow, or what the chances are of getting a job promotion. As more information reaches us, like a menacing cloud or an encouraging email, we adjust our probabilities accordingly. How do we capture this mathematically?
This problem was solved in the eighteenth century by the Reverend Thomas Bayes and the Marquis Pierre-Simon Laplace. Both men solved it independently, but we now refer to this kind of reasoning as Bayesian inference. Bayesian inference involves first assigning probability values between 0 and 1 to possible outcomes. These numbers tell us the chance of a possible world being the true one; a probability of 1 means that possible world is certain to us. Crucially, we can recalculate the chances we assign to each possible world as new events occur, using a specific formula: Bayes’ rule.
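Here’s what that update looks like with invented numbers, using the rain example from before. Bayes’ rule weighs each possible world’s prior probability by how well it predicts the new evidence:

```python
# Bayes' rule: P(h | e) = P(e | h) * P(h) / P(e), with invented numbers.
# Hypotheses: will it rain today? Evidence: a menacing cloud appears.

prior = {"rain": 0.2, "no_rain": 0.8}
likelihood_cloud = {"rain": 0.9, "no_rain": 0.3}  # P(cloud | hypothesis)

evidence = sum(likelihood_cloud[h] * prior[h] for h in prior)  # P(cloud)
posterior = {h: likelihood_cloud[h] * prior[h] / evidence for h in prior}

print(posterior)  # rain jumps from 0.2 to 0.18 / 0.42, roughly 0.43
```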
And this gives us a way to better understand inductive problems. Take learning a language – or, more specifically, its syntax. We can treat different possible grammars as our hypotheses, and utterances from the language as our data. If a learner has a pre-existing bias toward a certain grammar, we give that grammar a higher prior probability – the probability of it being true before any evidence arrives. In accordance with Bayes’ rule, a grammar with a very high prior probability will require very little evidence before the learner settles on it. In other words, the language will be easy to learn.
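A toy simulation (again with invented numbers) makes the point: the stronger the learner’s prior on the correct grammar, the fewer utterances it takes before that grammar wins out.

```python
# Two learners weighing the same two grammar hypotheses.
# Grammar A is correct: each observed utterance is twice as likely
# under A as under B. We count utterances until P(A) exceeds 0.99.

def utterances_needed(prior_a: float) -> int:
    p_a, p_b = prior_a, 1 - prior_a
    count = 0
    while p_a < 0.99:
        p_a, p_b = p_a * 0.8, p_b * 0.4   # likelihoods under A and B
        total = p_a + p_b                 # normalize (Bayes' rule)
        p_a, p_b = p_a / total, p_b / total
        count += 1
    return count

print(utterances_needed(0.9))    # strong innate bias: a handful of utterances
print(utterances_needed(0.001))  # near-flat prior: many more needed
```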
This fits perfectly with Chomsky’s suggestion that children are born with innate, genetic knowledge, which allows them to learn languages from so few utterances. Innate knowledge would give children inductive biases, favoring particular grammars over others, and those biases give the right grammar a much higher prior probability. Large language models, on the other hand, have to start from scratch. For them, the prior probabilities for different grammars are much lower. And this is one way to explain why they need so much more data to learn a language.
Bayesian reasoning also gives us a clue as to how to improve LLMs in the future. To build more human-like LLMs, we will need to tailor their inductive biases to resemble those of humans – catching up on millennia of our evolution. This is an insight that only probability theory can give us. In this way, it fills a gap left by the rules and symbols and neural network approaches to the mind. Taken together, the three approaches give us an intricate view of what it means to think. It will be up to the cognitive scientists of the future to complete that picture.
In this lesson on The Laws of Thought by Tom Griffiths, you’ve traveled through the centuries on a quest to understand the Laws of Thought. You started with the foundations of formal logic, from Aristotle through Leibniz and Boole, followed by the rise of behaviorism. You saw how Bruner challenged behaviorism, drawing on the early computational theories of Babbage, Turing, Shannon, and von Neumann. You then followed the cognitive revolution, diving into the rules and symbols approaches of Newell, Simon, and Chomsky, before learning of Rosch’s counter-theory of fuzzy boundaries.
This led you to the development of artificial neural networks, starting with Rosenblatt’s perceptron. Finally, you turned to the third approach to understanding the mind: probability theory, underpinned by Bayesian inference. By combining the approaches of rules and symbols, artificial neural networks, and probability theory, a picture of the Laws of Thought starts to emerge. But the quest is far from over.
