Toward a Theory of Software Resistance
Three posts in (one, two, three): we have a directional resistance field, retrodictions that check out, and empirical tools to measure the grain. But I’ve been waving at the math without writing it down. Time to be precise.
base units
What’s the atomic unit of working with code? Not lines, not files, not commits. Those are containers. The primitive is the token — the smallest chunk of meaning an agent can process.
Call it τ. A token. For a human reading code, it’s roughly a symbol, a name, a keyword — whatever your eyes land on as a unit. For an LLM, it’s literally a token. For a compiler, it’s a lexeme. The size varies, but the concept is the same: the indivisible unit of comprehension.
Two more primitives fall out naturally.
ι — an inference step. One reasoning move. “This function calls that one.” “This variable is unused.” “This pattern matches the one I saw in the other module.” Each ι connects tokens into understanding.
κ — context window. How many tokens and inferences you can hold simultaneously. For a human, it’s working memory — maybe 5-9 chunks, depending on expertise, caffeine, and whatever learned compression scheme lets us retokenize “observer pattern with lifecycle hooks” into a single slot. For an LLM, it’s the literal context window. For a team, it’s the shared understanding in the room.
These three — τ, ι, κ — work for any agent. Human, AI, pair programming, a whole team. The theory doesn’t care who’s doing the work. It cares about the work itself.
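For the compiler case at least, the token primitive is directly observable. Here is a minimal sketch using Python’s own tokenizer as one concrete (and admittedly narrow) instantiation of τ — it says nothing about how a human or an LLM would chunk the same source:

```python
# One concrete instantiation of τ: for the "compiler" agent, Python's
# own tokenizer enumerates the lexemes. A narrow proxy, not a claim
# about how humans or LLMs chunk the same code.

import io
import tokenize

src = "def add(a, b):\n    return a + b\n"
lexemes = [tok.string
           for tok in tokenize.generate_tokens(io.StringIO(src).readline)
           if tok.string.strip()]  # drop newlines, indents, end markers

print(len(lexemes), lexemes)
```

Twelve lexemes for a two-line function — already a crude lower bound on the tokens any agent must process to comprehend it.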
comprehension cost
Before you can change code, you have to understand it. Not all of it — just enough to know what your change will do.
Comprehension cost C is the number of tokens needed to describe the relevant state. A function with three local variables and no side effects: low C. A method that reads from a global cache, mutates shared state, and triggers callbacks: high C. You need more tokens to hold it in your head.
C isn’t a property of the code alone. It’s a property of the code relative to a change. Adding a field to a simple data class: you need to understand the class and its callers. Changing the auth flow: you need to understand the auth flow, everything that depends on it, and the security invariants that can’t break. Same codebase, different C.
Experienced developers know this intuitively. “Let me read around a bit before I touch this.” They’re loading context — paying C before they start.
the work equation
Now we can be precise about what post 1 was gesturing at.
Work equals comprehension cost times directional resistance:

W = C · R(d)

C is how much you need to understand. R(d) is how hard the system pushes back against direction d. Their product is the work.
This explains a pattern every developer has felt. Sometimes you stare at a codebase for days, finally understand it, and the change is trivial — high C, low R. Other times you know exactly what to change but the change cascades through everything — low C, high R. The worst case is both: you can’t understand it and it fights you.
It also explains why seniority matters in a way that’s hard to pin down. Senior developers don’t just write better code. They have lower C for a given system because expertise compresses tokens — patterns that take a junior 50 tokens to process, a senior recognizes in 5. Same R(d), lower C, less work.
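The equation is small enough to run. A toy instance, using the 50-vs-5 token compression from the paragraph above (the numbers are invented; only the structure comes from the equation):

```python
# A toy reading of W = C * R(d). Specific values are made up for
# illustration; only the multiplicative structure is from the theory.

def work(c: float, r: float) -> float:
    """Work = comprehension cost C times directional resistance R(d)."""
    return c * r

R = 2.0        # same system, same direction of change: R(d) is fixed
junior_C = 50  # the junior holds the pattern token by token
senior_C = 5   # expertise compresses the same pattern into fewer tokens

print(work(junior_C, R))  # the junior does 10x the work...
print(work(senior_C, R))  # ...of the senior, at identical R(d)
```

Same change, same resistance, an order of magnitude less work — which is the whole claim about seniority in one multiplication.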
what your architecture is optimizing for
You don’t make one change to a codebase. You make many, over time, in different directions. Your architecture is a bet on which directions will be cheap.
Make the bet explicit. Let P(d) be your probability distribution over future intended directions. Some directions are likely — new features along the product roadmap, bug fixes in areas that tend to break. Others are unlikely — rewriting in another language, swapping the database.
The expected work over all future changes:

E[W] = ∫ C(d) · R(d) dP(d)
A good architecture minimizes E[W]. It makes R(d) low where P(d) is high — low resistance along the directions you’ll actually travel.
This is what architectural decisions are, whether anyone frames it this way or not. “We’ll use a plugin system” says: we expect P(d) to be high for adding new plugins and low for changing the plugin interface. “We’ll use a monolith” says: we expect changes to cross domain boundaries often enough that the cost of boundaries exceeds the cost of a large C.
Every architectural debate is a disagreement about P(d).

Microservices vs. monolith? “Will cross-cutting changes be common?” That’s a question about P.

DRY vs. duplication? “Will these two things evolve in the same direction?” Also P.

Build vs. buy? “Will we need to push this component in directions the vendor didn’t anticipate?” Still P.

The architecture that’s “obviously right” just means everyone agrees on P. The architecture that sparks endless debate means people have different beliefs about where the system is heading. The debate isn’t about taste. It’s about prediction.
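The whole optimization fits in a few lines once you assume a discrete P(d), which turns the integral into a weighted sum. Every direction, probability, cost, and resistance value below is invented purely for illustration:

```python
# A discrete sketch of E[W]: when P(d) has finite support, the integral
# E[W] = integral of C(d) * R(d) dP(d) becomes a weighted sum.
# All directions and numbers below are hypothetical.

P = {"add_feature": 0.60, "fix_bug": 0.35, "swap_database": 0.05}
C = {"add_feature": 20, "fix_bug": 10, "swap_database": 100}

# Two architectural bets, expressed as resistance R(d) per direction.
plugin_system = {"add_feature": 1.0, "fix_bug": 2.0, "swap_database": 5.0}
monolith      = {"add_feature": 3.0, "fix_bug": 1.5, "swap_database": 5.0}

def expected_work(R: dict) -> float:
    """E[W] = sum over directions d of P(d) * C(d) * R(d)."""
    return sum(P[d] * C[d] * R[d] for d in P)

for name, R in (("plugin system", plugin_system), ("monolith", monolith)):
    print(f"{name}: E[W] = {expected_work(R):.2f}")
```

Under this made-up P the plugin system wins; tilt P toward cross-cutting bug fixes and the monolith closes the gap. The comparison, not the numbers, is the point.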
the conservation law
Post 1 mentioned a tradeoff between comprehensibility and fluidity.
I want to push on this more because I think it’s deeper than a rule of thumb.
Comprehensibility comes from structure — conventions, patterns, shared abstractions. When everything follows the same pattern, you can predict what you’ll find without looking. Low C.
But that structure is grooves. And grooves are exactly what raises R(d) in off-groove directions, which is what lowers fluidity.
So there’s a genuine tension. More structure means less comprehension cost but less fluidity. Less structure means more fluidity but higher comprehension cost. You can’t have both arbitrarily high.
Think of it as a budget. Every codebase has a complexity budget, and you’re choosing how to spend it. Spend it on structure (conventions, frameworks, strong typing) and you buy comprehensibility at the cost of fluidity. Spend it on flexibility (loose coupling, indirection, dependency injection) and you buy fluidity at the cost of comprehensibility.
Is the constant actually constant? I suspect not — it varies with κ, the context window. A larger context window (more experienced developer, better tooling, an LLM with 200k tokens of context) shifts the curve. You can hold more in your head, so the same structure costs less comprehension overhead, and the tradeoff eases.
This might be the most important thing AI coding tools do. Not write code faster. Expand κ. A larger context window relaxes the conservation constraint, letting you have more of both comprehensibility and fluidity simultaneously. If that’s right, the real value of coding assistants isn’t speed — it’s enabling architectures that were previously too complex to navigate.
what this buys you
With W = C · R(d) and P(d), you can do things the informal version couldn’t:
Evaluate an architecture quantitatively. Given a model of P(d) (from the product roadmap, from historical change directions, from stakeholder interviews), compute E[W] for your current architecture. Compare it to alternatives. The one with the lowest expected work wins — not because it’s “cleaner” or “more elegant” but because it’s cheaper to steer.
Diagnose mismatches. If your grooves point south and your P(d) has shifted east, you have a measurable problem. The integral is large because R(d) is high exactly where P(d) is high. That’s not “technical debt” in the vague sense. It’s a specific, quantifiable mismatch between your architecture’s bets and your actual direction.
Know when to refactor. Refactoring reshapes R(d). Worth it when the reduction in E[W] exceeds the cost of the refactoring itself. Not worth it when your grooves still point where you’re going, even if they’re “not clean” by someone’s aesthetic standard.
Compare agents meaningfully. Two developers, two LLMs, two approaches. Which one leaves the codebase with lower E[W] for the next quarter’s likely changes? That’s a better question than “which one writes fewer lines” or “which one passes more tests.”
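Two of these diagnostics fit in one short sketch, again assuming a discrete P(d): find the direction where the integrand P(d) · C(d) · R(d) concentrates, then check whether a candidate refactoring pays for itself over a planning horizon. Every number is hypothetical:

```python
# A sketch of "diagnose, then decide", assuming a discrete P(d).
# All values are made up; E[W] = sum of P(d) * C(d) * R(d).

P = {"north": 0.1, "east": 0.7, "south": 0.2}   # where changes are heading
C = {"north": 10, "east": 30, "south": 10}      # comprehension cost by direction

R_current    = {"north": 1.0, "east": 6.0, "south": 1.0}  # grooves point south
R_refactored = {"north": 2.0, "east": 1.5, "south": 3.0}  # regrooved toward east

def expected_work(R: dict) -> float:
    return sum(P[d] * C[d] * R[d] for d in P)

# Diagnose: the largest term of the sum is where the mismatch lives --
# a direction where P(d) and R(d) are high at the same time.
worst = max(P, key=lambda d: P[d] * C[d] * R_current[d])

# Decide: refactoring reshapes R. It pays off when the expected-work
# saving over the planning horizon exceeds its one-off cost.
horizon = 40          # changes expected before the bet is revisited
refactor_cost = 1500  # one-off cost, in the same (made-up) work units
saving = (expected_work(R_current) - expected_work(R_refactored)) * horizon

print(f"mismatch concentrated at: {worst}")
print(f"refactor pays off: {saving > refactor_cost}")
```

Note what the break-even check buys you: the same refactoring that clearly pays off at this P(d) would be a net loss if east-bound changes were rare, which is exactly the “grooves still point where you’re going” case above.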
The pieces are on the table now: base units that work for any agent, a work equation, an optimization target, and a conservation constraint. Whether these hold up as more than analogies depends on measurement — and post 3 has the starting tools.
What I don’t have yet is a clean answer to where P(d) comes from. You can estimate it from history, from roadmaps, from the team’s stated intentions. But stated intentions and revealed intentions diverge all the time. The architecture your team actually builds often reveals a different P(d) than the one they’d describe in a planning meeting.
Maybe the most honest use of this framework is as a mirror. Compute the P(d) your architecture is implicitly optimized for. Then ask whether that matches where you actually want to go.