Economics

The NVIDIA tax is a routing problem.

Enterprise AI today runs every query through a GPU, whether the query needs one or not. The result is a structural overpayment that shows up on every compute bill in the industry — a tax levied not by any government but by the absence of an alternative.

2Brains is that alternative. Most enterprise queries are fact retrieval, not language generation. Fact retrieval runs on a CPU. We route accordingly.

Physics

Power per query.

GPU inference
300W+
Per query, sustained, on a single accelerator.
2Brains retrieval
~6W
Per query, on commodity CPU, for the verified-corpus path.

A fifty-fold reduction in power per query is not an optimization. It is the difference between architectures. The 300W figure assumes a query that genuinely requires generative inference. The 6W figure assumes a query that the system has correctly identified as fact retrieval — the majority of enterprise traffic.

Economics

What that means at scale.

Modeled annual compute cost for an enterprise running one billion queries per year, assuming 68% of queries are routable to deterministic retrieval.

GPU-only stack 2Brains routed stack
Cost comparison: GPU-only ranges 10M to 50M USD; 2Brains routed ranges 4M to 18M USD across 200M to 1B annual queries.

Illustrative model. Actual savings depend on query mix, corpus design, and existing infrastructure.

Consequences
CPU is no longer a fallback. It is the path.

For the majority of enterprise queries, generative inference is overkill. Deterministic retrieval gives the correct answer at a fraction of the wattage, on hardware buyers already own.

GPU allocation stops being the bottleneck.

If 68% of traffic does not need a GPU, neither does 68% of an enterprise's procurement runway. AMD, ARM, and sovereign silicon become viable hosts for the work that remains.

Power becomes a design choice again.

Data center planning today assumes generative inference will eat the grid. When retrieval handles two-thirds of traffic at 6W, that assumption stops holding.

Modeling at your scale

We can run the cost model against your actual query mix under NDA.