Instead of:
"Great question! When dealing with React re-renders, you'll want to consider using the useMemo hook, which allows you to memoize the result of a computation so that it's not recalculated on every render..."
You get:
"New object ref each render. Wrap in useMemo."
No articles. No pleasantries. 688 people thought this was worth upvoting.
The real cost behind the humor
Caveman exists because tokens cost money. Not abstractly — concretely, operationally, in a way that changes developer behavior.
The benchmarks in the repo are striking: React explanation goes from 1,180 to 159 tokens. PostgreSQL setup: 2,347 to 380. Average savings: 65%. For a developer making hundreds of API calls per day, this isn't optimization — it's survival math.
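To see why this is survival math rather than micro-optimization, it helps to put dollar figures on the repo's own benchmark numbers. A back-of-the-envelope sketch, assuming an illustrative $15 per million output tokens and 300 calls a day (both assumptions, not real rates):

```python
# Back-of-the-envelope: verbose vs. compressed responses at scale.
# The price and call volume are illustrative assumptions, not real rates.
PRICE_PER_OUTPUT_TOKEN = 15 / 1_000_000  # assume $15 per 1M output tokens
CALLS_PER_DAY = 300                      # "hundreds of API calls per day"

benchmarks = {
    "react_explanation": (1_180, 159),  # (verbose tokens, compressed tokens)
    "postgres_setup": (2_347, 380),
}

for name, (verbose, compressed) in benchmarks.items():
    daily_verbose = verbose * CALLS_PER_DAY * PRICE_PER_OUTPUT_TOKEN
    daily_compressed = compressed * CALLS_PER_DAY * PRICE_PER_OUTPUT_TOKEN
    saving = 1 - compressed / verbose
    print(f"{name}: ${daily_verbose:.2f}/day -> ${daily_compressed:.2f}/day "
          f"({saving:.0%} fewer tokens)")
```

At these assumed rates, the verbose React answer alone costs a few dollars a day; across a team and a year, the compressed style is the difference between a rounding error and a real line item.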
But here's what the Caveman readme doesn't say: why is anyone building token compression tools at all?
Because the pricing model that governs AI access was designed for a world that no longer exists.
How we got here
The mental model behind $20/month AI subscriptions comes from streaming. Netflix charges one price regardless of how many hours you watch. This works because human attention is bounded — nobody watches Netflix for 22 hours a day. The math holds.
AI subscriptions inherited this logic. Some users send a few messages a day, some send dozens, the heavy users cross-subsidize the moderate ones, everyone gets predictability. The abstraction holds — until agents arrive.
An agent doesn't have attention. It doesn't pause to think. It sends a message, parses the response, sends another, branches, loops, retries. A background agent working on a codebase overnight makes 500–2,000 API calls. A customer support agent runs continuously, with zero idle time.
A human power user might send 50 messages on a busy day. An agent sends 50 messages before you finish your morning coffee.
The flat subscription model doesn't fail for agents because providers are being restrictive. It fails because the math never worked. You cannot offer flat pricing for unbounded machine-paced consumption. The moment you try, adverse selection kicks in: high-volume users (agents) consume until the flat rate is deeply unprofitable, and there aren't enough moderate users to cover the difference.
The cascade
When Anthropic blocked third-party agentic tools from Claude subscriptions last month, the developer community erupted. OpenClaw users lost access overnight. Threads hit the front page.
Most of the anger targeted Anthropic's timing. But their technical explanation was honest: "Our subscriptions weren't built for the usage patterns of these third-party tools."
That's not spin. Anthropic's Boris Cherny spelled out the actual problem: their own Claude Code tool is built to maximize "prompt cache hit rates" — reusing previously processed context to save compute. Third-party tools aren't optimized this way. The math only works if caching works. Third-party tools break the math.
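The cache math is easy to see with a toy model. Assume a base input price of $3 per million tokens, with cache reads billed at 10% of that rate (the 10% ratio is an assumption for illustration, in the ballpark of Anthropic's published cache-read discount), and an agent that re-sends the same large context on every turn:

```python
# Toy model: why cache hit rate makes or breaks agent economics.
# Prices are illustrative assumptions: $3/M input tokens, cache reads
# billed at 10% of the base input rate.
BASE_INPUT = 3 / 1_000_000
CACHE_READ = 0.1 * BASE_INPUT

def run_cost(context_tokens: int, turns: int, cache_hit_rate: float) -> float:
    """Cost of re-sending the same context on every turn of an agent loop."""
    cached = context_tokens * cache_hit_rate
    fresh = context_tokens - cached
    per_turn = fresh * BASE_INPUT + cached * CACHE_READ
    return per_turn * turns

# A 50k-token codebase context over a 500-turn overnight run:
optimized = run_cost(50_000, 500, cache_hit_rate=0.95)   # cache-aware tool
unoptimized = run_cost(50_000, 500, cache_hit_rate=0.0)  # naive tool
print(f"optimized: ${optimized:.2f}, unoptimized: ${unoptimized:.2f}")
```

Under these assumptions the naive tool costs roughly seven times more for the same run. Two tools, same model, same work, wildly different compute bills: that is the asymmetry Cherny was describing.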
So now you have two communities responding to the same underlying problem through different lenses:
Caveman: make the AI say less so it costs less.
Claude Code: cache aggressively so compute gets reused.
Both are correct. Both are workarounds for a pricing model that wasn't built for agents.
What the right model looks like
The honest answer is that agentic workloads need usage-based billing. Pay per token, with caching as a first-class optimization lever.
This sounds worse for developers. It's actually better, for three reasons:
1. Transparent costs
You know exactly what an agent run costs. You can set spending limits, alert thresholds, kill switches. With flat subscriptions, cost is opaque until you get cut off.
2. Aligned incentives
When you pay per token, you're motivated to minimize waste. Caveman and aggressive caching are the same move at different layers: both exist because the cost signal is real. Real cost signals create better software.
3. Predictable unit economics
If you're building a product on top of an AI API, your costs should scale with your usage, not with your provider's estimate of average human behavior. Subscription pricing makes agent cost modeling impossible. API pricing makes it straightforward.
The developers who will win in the machine-paced era are those who internalize this now. Not as a constraint — as a design principle. Every agent you build should have a token budget. Every workflow should have a cache strategy. Every API call should have a cost attribution.
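What a token budget with cost attribution can look like in practice, as a minimal sketch (every class, method, and price constant here is hypothetical, invented for illustration rather than taken from any real SDK):

```python
from collections import defaultdict

class BudgetExceeded(RuntimeError):
    """Raised when an agent run would blow past its token cap."""

class TokenBudget:
    """Hypothetical guard: a hard token cap per agent run, plus
    per-workflow cost attribution. The price is an assumption."""

    PRICE_PER_TOKEN = 5 / 1_000_000  # assume a blended $5 per 1M tokens

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0
        self.by_workflow = defaultdict(int)

    def charge(self, workflow: str, tokens: int) -> None:
        # Kill switch: refuse the call before overspending, not after.
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"{workflow} would use {self.used + tokens} of "
                f"{self.max_tokens} budgeted tokens")
        self.used += tokens
        self.by_workflow[workflow] += tokens

    def report(self) -> dict:
        """Dollar cost attributed to each workflow."""
        return {wf: t * self.PRICE_PER_TOKEN
                for wf, t in self.by_workflow.items()}

budget = TokenBudget(max_tokens=100_000)
budget.charge("triage", 12_000)       # tokens reported by each API response
budget.charge("code_review", 40_000)
print(budget.report())
```

The point is the shape, not the numbers: every API call routes through `charge()`, so cost is bounded by construction and attributable by workflow, which is exactly what flat subscriptions can never give you.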
The structural truth
Caveman is funny. 688 people upvoted it. But the reason it exists is that the developer community is paying real money for tokens and knows it.
The streaming subscription model was built for human-paced consumption. We're in the machine-paced era now. The pricing models, the billing abstractions, the infrastructure assumptions — they all need to be rebuilt around a simple truth:
Agents don't pause. Pricing models that assume they do are priced for the world we left behind.
The flat subscription is ending for agents. Not because providers are hostile. Because math is.
Building on AI APIs?
We work with companies navigating the agent infrastructure layer — pricing, identity, and cost attribution for agentic workloads.
Get in touch →