A Recursive Definition of Language
I wrote this in March 2007 on clawbox.com, trying to pin down what separates language from communication. The core claim — that language is any protocol capable of self-description — has aged in ways I didn’t expect.
Large language models now manipulate language with striking fluency. They can discuss grammar, invent definitions, explain their own outputs. By the criterion I proposed here, they’re operating squarely inside language rather than merely following protocol. Whether that constitutes understanding is a different question, but the self-descriptive property is clearly present. Nineteen years later I still think the recursive definition is the right one. I just didn’t anticipate that something non-biological would get there first.
What separates language from communication? There’s no shortage of communication in the tree of life — pheromone markers, alarm calls, complex hunter-squad coordination. But even the most sophisticated animal signalling can’t transmit a novel concept, and it can’t preserve concepts across generations. Why not?
The answer is probably multi-faceted. Humans are adept at considering the future, and we have hands that can finely manipulate the world. Any ape with both qualities would be formidable, but within just a few generations the human still wins out, and apparently did. The difference is our ability to prepare complex ideas and pass them down. Preserving the ideas of our mothers and fathers is the essence of technology, and technology is what made us top ape. Everything else is cleverness, and cleverness dies with the clever.
So language is important. What actually is it?
Everyone learns at least one, so we should all know. I learned in university that language is a grammar — a system that builds complex ideas from simple ones. Simple ideas fall into categories like “verb” and “noun,” and the grammar restricts where they can appear. These restrictions form the fundamental protocol that makes language possible.
But is the protocol really the definition? Return to pheromone markers. There’s a protocol: the right chemical, in the right place, at the right concentration. We’d all agree that’s messaging, not language. Maybe there’s a threshold of complexity that must be crossed before “protocol” morphs into “language”?
the colony
Imagine a massive colony of pheromone creatures. The individual creatures are simple, but the colony is vast — tendrils reaching into the environment, harvesting every good food source without over-consuming any of them. Some resources are harvested only to enhance production. Some creatures operate as teamsters on service-sector trails. Others harvest leaves to synthesize the pheromones themselves — and there are pheromone trails encoding how to make those pheromones. The whole thing is greater than the sum of its parts.
Your brain is something like this colony. We’re colonies of cells, each following marker chemicals and well-defined pathways. Animal life is profoundly complex and nuanced, all built from protocol — protocol far more complex than any human language.
And yet this isn’t language. Language is not strictly protocol.
If a bull smashes the colony, or if a brain suffers a stroke, it’s broken. All that baked-in wisdom is lost with the structural damage. Every individual creature or cell might survive, but the colony or mind still dies. The colony was just clever, and cleverness alone doesn’t persist.
the definition
A language is any protocol that can self-describe.
Language can be used to recursively define language. Once that’s possible — once the protocol can teach itself — it becomes possible to teach children how to understand arbitrarily complex ideas. The protocols themselves become communicable. Language itself becomes a technology.
Mere protocol, no matter how complex, can’t do this. The pheromone colony can’t encode “here is how pheromone signalling works” in pheromone signals. Your immune system can’t explain immunology to another immune system. But a human language can explain its own grammar, coin new terms for new concepts, and transmit the instructions for doing so.
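A toy sketch in Python can make the contrast concrete. Everything in it is hypothetical and invented for illustration: the signal names, the `Learner` class, and the `define X as A B` message format are assumptions, not a model of any real signalling system. It shows only the narrow step of a protocol carrying a message that extends the protocol, which is far short of a full self-describing language.

```python
# A fixed protocol: the vocabulary is baked in, and no signal can change it.
FIXED_SIGNALS = {"food": "go to food", "danger": "retreat"}


def fixed_receive(signal):
    """Look up a signal; anything outside the table is meaningless (None)."""
    return FIXED_SIGNALS.get(signal)


class Learner:
    """Hypothetical receiver whose protocol includes a message about the protocol."""

    def __init__(self):
        # Starts with the same vocabulary as the fixed protocol.
        self.lexicon = dict(FIXED_SIGNALS)

    def receive(self, message):
        words = message.split()
        # "define X as A B ..." is itself a message, and it coins a new term
        # out of terms the receiver already understands.
        if len(words) >= 4 and words[0] == "define" and words[2] == "as":
            new_word, parts = words[1], words[3:]
            if all(part in self.lexicon for part in parts):
                meaning = " then ".join(self.lexicon[part] for part in parts)
                self.lexicon[new_word] = meaning
                return f"learned {new_word}"
            return None  # cannot build on terms it does not know
        return self.lexicon.get(message)
```

Sent `define forage as food danger`, the fixed receiver returns `None`, because the message is just an unknown signal. A `Learner` returns `learned forage`, and from then on `forage` means `go to food then retreat`. The new concept arrived through the channel itself, with no change to the receiver's code.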
Life has this property too — life can self-propagate. The parallel matters because evolution only acts on self-replicators. Language, in this sense, is a replicator. It propagates, mutates, competes. The most adept forms survive. As we define more robust and more semantic languages, we become finer semioticians by necessity, capable of comprehending signs and meanings that would have confounded our distant ancestors.
There’s a less comfortable lesson from evolution here. There’s no particular direction to this process. It simply proceeds, and the fittest survive. That doesn’t mean any individual language necessarily improves by our standards, and the process is difficult to steer. We slowed it down by writing things down, preserving ideas in their original form for much longer. But the process still runs.
The colony was just clever. A mind can make others like itself.