Cognition
Brain vs. AI. Side by side.
AGI Philosophy
We are not arguing that AI should become a brain. We are observing how each one processes information, and letting the structural differences speak for themselves.
Information flow
Brain
Information flows in every direction simultaneously. Sensory input travels up from the thalamus to the cortex, but the cortex sends ten times more connections back down than the thalamus sends up. Higher regions constantly modulate how lower regions process raw input. Seeing is not passive — your prefrontal cortex is actively shaping what your visual cortex perceives.
Transformer
Information flows forward. Layer 1 to Layer 2 to Layer 96 to output. Residual connections let later layers access earlier representations, but there is no true feedback — no mechanism where layer 96 changes how layer 1 processes the input within the same pass. The model processes once and commits.
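The forward-only flow can be sketched in a few lines. This is a toy stand-in, not a real transformer: each "layer" is a single weight matrix, and the sizes are illustrative. What it shows is the structural point — residual connections let later layers read earlier state, but nothing ever flows backward within a pass.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden width (toy size; real models use thousands)
n_layers = 4   # stands in for the "Layer 1 ... Layer 96" stack

# One weight matrix per layer, standing in for attention + FFN.
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_layers)]

def forward(x):
    # Strictly one direction: each layer reads the running state and
    # adds to it. The residual update (x + ...) is how layer k sees
    # earlier representations, but no layer can reach back and change
    # how an earlier layer already processed the input.
    for W in weights:
        x = x + np.tanh(x @ W)
    return x

out = forward(rng.normal(size=d))
```

The loop runs once, top to bottom, and commits — there is no second pass in which later layers reshape earlier ones.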
Memory
Brain
Multiple memory systems running in parallel. Working memory in the prefrontal cortex holds what you are thinking about right now. The hippocampus consolidates experiences into long-term episodic memory while you sleep. Procedural memory in the cerebellum and basal ganglia stores skills you have practiced until they are automatic. Semantic memory across the cortex holds facts and concepts. Each system has a different time scale, a different storage mechanism, and a different retrieval pathway.
Transformer
One memory system: the context window. Everything the model knows about the current conversation is in the token buffer. When the buffer fills, the oldest tokens are dropped. When the session ends, everything is gone. There is no consolidation, no long-term storage, no procedural learning, no episodic recall. The model starts every session with its training weights and nothing else.
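The token buffer behaves like a fixed-size queue, which a short sketch makes concrete. The class and its size are hypothetical, chosen only to show the two failure modes described above: the oldest tokens are dropped when the buffer fills, and nothing survives the end of a session.

```python
from collections import deque

class ContextWindow:
    """Toy model of a transformer's only memory: a fixed-size token buffer."""
    def __init__(self, max_tokens):
        self.buffer = deque(maxlen=max_tokens)  # oldest tokens fall off

    def add(self, tokens):
        self.buffer.extend(tokens)

    def visible(self):
        return list(self.buffer)

ctx = ContextWindow(max_tokens=5)
ctx.add(["the", "quick", "brown", "fox", "jumps"])
ctx.add(["over"])     # buffer full: "the" is silently dropped
print(ctx.visible())  # ['quick', 'brown', 'fox', 'jumps', 'over']

# "Session end": nothing is consolidated; a new instance starts empty.
ctx2 = ContextWindow(max_tokens=5)
assert ctx2.visible() == []
```

There is no consolidation step anywhere in this picture — no method that moves anything from the buffer into longer-term storage.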
Gating
Brain
The thalamus acts as a gate between sensory input and the cortex. It does not pass everything through — it filters based on what the cortex currently needs. Attention is not just "focus on this" — it is "actively suppress everything else." The basal ganglia gate action selection: dozens of possible motor plans compete, and only one gets through. Gating is how the brain avoids being overwhelmed by its own capacity.
Transformer
Attention heads compute relevance scores across all tokens and weight them. This is a form of soft gating — more relevant tokens get higher weight. But nothing is truly suppressed. Every token in the context window participates in every attention computation. There is no hard gate that says "this information is irrelevant, do not process it at all." The model pays some attention to everything, always.
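The soft-versus-hard distinction comes down to the softmax. A minimal sketch, with made-up relevance scores: even a token scored as strongly irrelevant receives a small positive weight, never zero.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Hypothetical relevance scores for five tokens; one is clearly irrelevant.
scores = np.array([4.0, 2.5, 0.1, -3.0, 1.0])
weights = softmax(scores)

# Soft gating: the irrelevant token is heavily down-weighted...
assert weights[3] == weights.min()
# ...but never actually zero. Every token still participates.
assert np.all(weights > 0)
assert np.isclose(weights.sum(), 1.0)
```

A hard gate would zero that token out and skip the computation entirely; softmax, by construction, cannot.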
Prediction
Brain
The cerebellum maintains a forward model of the body and the world. It constantly predicts what will happen next — where your hand will be in 200ms, what sound will follow this syllable, what the next word in a sentence will be. When prediction matches reality, nothing happens. When prediction fails, the error signal updates the model. You do not experience most of your sensory input — you experience the prediction, corrected by error. Consciousness is largely a prediction engine that only alerts you when it is wrong.
Transformer
The model predicts the next token. That is its entire training objective. In this narrow sense, it is a prediction engine. But it has no forward model of the world. It does not predict what will happen when you run the code it wrote. It does not predict that its advice will be contradicted by reality. It predicts tokens, not consequences. The prediction is linguistic, not physical or causal.
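The entire objective fits in a few lines. The vocabulary and logits below are invented for illustration; the point is what the loss function sees — a probability over tokens, compared against the actual next token, and nothing about what those tokens would cause in the world.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["the", "cat", "sat", "on", "mat"]

# Hypothetical logits the model emits after seeing "the cat sat on the".
logits = np.array([0.2, 0.1, 0.0, 0.3, 3.5])
probs = softmax(logits)
predicted = vocab[int(np.argmax(probs))]

# The entire training signal: cross-entropy against the actual next token.
target = vocab.index("mat")
loss = -np.log(probs[target])
```

Nothing in `loss` encodes whether "mat" is true, safe, or executable — only whether it was the token that came next.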
Sleep
Brain
During sleep, the hippocampus replays the day's experiences to the cortex at high speed. The cortex integrates these replays into long-term memory, strengthening useful connections and pruning useless ones. This is not rest — it is offline processing. The brain is consolidating, reorganizing, and compressing. Dreams may be a side effect of this replay process. Without sleep, memory degrades, emotional regulation breaks down, and pattern recognition suffers. The brain cannot function without periodic offline consolidation.
Transformer
There is no offline phase. The model does not consolidate, replay, prune, or reorganize between sessions. Training is the closest analogue to sleep, but training happens once (or periodically during fine-tuning), not after every session. Between sessions, the model is inert. It does not process what happened. It does not strengthen useful patterns from the last conversation. It does not forget irrelevant details to make room for important ones.
Modularity
Brain
The brain is radically modular. Broca's area handles speech production. The fusiform face area recognizes faces. The amygdala processes threat. The hippocampus handles spatial navigation and memory consolidation. These modules developed at different times in evolutionary history, have different architectures, and communicate through white matter tracts. Damage to one module impairs its specific function without destroying the whole system. The modularity is not designed — it evolved because specialization is efficient.
Transformer
A transformer is one homogeneous architecture repeated N times. Every layer has the same structure: self-attention followed by a feedforward network. Specialization emerges during training — certain attention heads learn to track syntax, others track semantics — but the specialization is soft and distributed. There are no hard module boundaries. You cannot remove "the part that does math" without degrading everything. The architecture is uniform; the specialization is a learned accident, not a structural commitment.
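The homogeneity is visible in how such a model is built: one block shape, copied N times. A toy sketch (single matrices standing in for the attention and feedforward sub-layers) — there is no "math block" or "syntax block" anywhere in the construction, only identical copies whose differences emerge later from training.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # toy hidden width

def make_block():
    # Every block has the identical two-part structure:
    # self-attention followed by a feedforward network.
    return {
        "attn": rng.normal(scale=0.1, size=(d, d)),
        "ffn":  rng.normal(scale=0.1, size=(d, d)),
    }

def run_block(x, block):
    x = x + x @ block["attn"]          # self-attention stand-in
    x = x + np.tanh(x @ block["ffn"])  # feedforward sub-layer
    return x

# The whole model is just N copies of the same shape, nothing else.
model = [make_block() for _ in range(6)]
x = rng.normal(size=d)
for block in model:
    x = run_block(x, block)

assert all(b["attn"].shape == b["ffn"].shape == (d, d) for b in model)
```

Removing a capability would mean removing some of these interchangeable copies — which degrades everything, because no single copy owns any one function.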
What this comparison is not
This is not an argument that AI should replicate the brain. The brain evolved under constraints (skull volume, caloric budget, developmental timing) that do not apply to software. Many of the brain's architectural choices are solutions to problems AI does not have.
But the differences are worth noticing. Feedback loops, multiple memory systems, hard gating, forward models, offline consolidation, structural modularity — these are not incidental features of biological cognition. They are the mechanisms that produce the capabilities we associate with intelligence: learning from experience, adapting to surprise, focusing under pressure, improving over time.
When an AI system lacks one of those capabilities, it is worth asking: does it also lack the corresponding mechanism? And if so, can the mechanism be built around the model rather than trained into it?
That question is what the rest of our philosophy is about.