Essay

Proof Pressure Is Not Just Matrix Multiply

Dense arithmetic is the shared base. The interesting pressure appears where attention, lookup tables, range policy, rounding, and carried state have to become one checked statement.

The useful transformer-proving question is not whether the model contains arithmetic. It is where proof plumbing gets reused or repeated.

  • Omar U. Espejel: Former machine learning engineer at Hugging Face; now working on crypto and AI at Starknet Foundation.
  • transformers
  • starks
  • zkml
  • proof systems

TL;DR

  • Dense arithmetic is real work, but it is not the whole proof story.
  • Transformer proof pressure also lives in lookup tables, range checks, rounding policy, carried state, and statement boundaries.
  • The current checked sequence-axis result is the important one: lookup claims grew 3.73x and trace rows grew 4.00x, while fused proof bytes grew only 1.06x to 1.08x.
  • On the d64 four-head seq64 row, the fused proof was 276,503 bytes versus a 315,785 byte matched split frontier, saving 39,282 bytes.
  • This is not a full transformer block proof and not a comparison win against external systems. It is evidence that boundary choice matters.

Series part 2 of 4

Part 1 made decode look like a trace. Part 2 asks where the proof work actually gets heavy.

Part 1 used a simple picture:

state in, local work, state out.

That picture is useful, but it is only the start.

A trace still has to pay for the work inside each row.

The tempting shortcut is to say:

transformers are mostly matrix multiplication.

That is true in the same way that a kitchen is mostly heat and knives.

It misses the interesting part.

A transformer proof also has to account for the places where values are clipped, rounded, looked up in tables, normalized, carried forward, and attached to a statement the verifier is allowed to believe.

That is what I mean by proof pressure.

Where the pressure appears

Every serious zkML system has to deal with dense arithmetic.

There is no magic boundary that makes QKV projections, score products, value aggregation, output projection, or MLP layers disappear.

But dense arithmetic is the shared base. It is not where the architectural question ends.

The sharper question is:

what happens around the lookup-heavy and boundary-heavy work that repeats inside decode?

For attention, those surfaces include:

  • score clipping;
  • valid input ranges;
  • Softmax-like table membership;
  • denominator, quotient, and remainder checks;
  • rounding policy;
  • quantization policy;
  • carried attention state;
  • the identity of the table or policy the verifier is supposed to accept.

That list is not only a cost list.

It is also a meaning list.

If a proof uses a table, the accepted statement has to bind the table.

If a proof uses a rounding rule, the accepted statement has to bind the rounding rule.

If a proof carries state from one step to the next, the accepted statement has to bind the state before and the state after.

Otherwise the proof can be valid while the AI claim around it is too loose.

Transformer proof pressure surfaces A decode row split into shared dense arithmetic and proof-pressure surfaces that must become statement fields. One decode row has two kinds of work Dense arithmetic is the shared base. The pressure appears where lookup policy, ranges, and carried state must be bound. decode row state in -> work -> state out shared dense base QKV, MLP, dot products proof-pressure surfaces tables, ranges, rounding, state Pressure surface Statement data the verifier must bind Softmax-like lookup table exp-like values, clipped score gaps table identity + valid input domain the table is part of the claim Quantization and division outputs scale, clamp, quotient, remainder range + rounding policy same values, same interpretation Attention/KV carried state old cache, selected positions, new cache state in + state out commitments the row must connect to the next row
Proof pressure The hard parts are not just expensive operations. They are places where the verifier needs named statement fields.

A human picture: one audit packet, not thousands

Imagine a warehouse receiving a truck.

The truck has many boxes.

For every box, the warehouse has to check a few things:

  • the SKU exists in the approved table;
  • the weight is inside the allowed range;
  • the count matches the manifest;
  • the box belongs to this delivery.

Those checks are real work.

But the annoying part is often the audit packet around the checks: the manifest, the signature, the timestamp, the table version, the delivery id, and the record that says all of these line items belong to the same truck.

If every box gets its own audit packet, the paperwork grows almost one-for-one with the number of boxes.

If the truck gets one audit packet, the line-item checks still happen, but the expensive packet is shared.

That is the intuition behind the current proof-pressure result.

The lookup claims are the line-item checks.

The proof bytes include the audit packet.

The question is whether related attention work can share the packet instead of paying for a fresh one every time the arithmetic proof and lookup proof are split apart.

The checked result

The result I care about is not:

we proved a full LLM.

We did not.

It is not:

STARKs beat every zkML system.

That would be too broad.

The useful result is narrower:

on the checked attention surfaces, fused STARK-native proof boundaries kept proof bytes growing much more slowly than lookup-heavy work and trace rows.

The clean sequence-axis test moves from seq32 to seq64.

Across the four checked rows:

Growth from seq32 to seq64Result
lookup claims3.729730x
trace rows4.000000x
fused proof bytes1.064910x to 1.080697x
fused ratio versus matched split frontier0.875605x to 0.922792x

The hardest row in the public story is the d64 four-head seq64 row:

MetricValue
lookup claims8,832
trace rows16,384
source arithmetic proof bytes272,638
LogUp sidecar proof bytes43,147
matched split frontier315,785
fused proof bytes276,503
fused saving39,282
fused ratio0.875605x

That is the signal.

The checked work got much larger.

The fused proof object got only modestly larger.

And the fused proof still beat the matched split frontier.

Sequence-axis proof pressure result Lookup claims and trace rows grow about four times from seq32 to seq64 while fused proof bytes grow only about eight percent. The current result is a scaling shape From seq32 to seq64, checked work grows fast. Fused proof bytes grow slowly. lookup claims 3.73x more table work trace rows 4.00x more row work fused proof bytes 1.06-1.08x larger proof object hardest public row in the current story: d64, four heads, seq64 split frontier 315,785 fused proof 276,503 saved bytes 39,282 fused ratio 0.8756x trace rows 16,384 The proof did not get proportionally larger as the checked lookup-heavy surface grew.
Checked evidence The current signal is not a full-model benchmark. It is a measured proof-boundary effect on matched attention surfaces.

The earlier target-selection signal points in the same direction.

Moving from a small d8 attention surface to a two-head seq32 attention surface, lookup claims grew from 52 to 1,184.

That is 22.77x.

Typed attention proof bytes grew from 18,124 to 22,916.

That is 1.26x.

That older signal was not the final result. It was the smell that told us where to dig.

Lookup work grows faster than proof bytes A comparison showing lookup claims growing 22.77x while typed proof bytes grow 1.26x between the d8 and two-head seq32 attention fixtures. The pressure signal is amortization More lookup work does not have to mean proportionally more proof bytes. small attention surface d8 fused attention 52 lookup claims 18,124 typed bytes larger attention surface two-head seq32 fused attention 1,184 lookup claims 22,916 typed bytes 22.77x more lookup work 1.26x more typed proof bytes This is not a final benchmark win. It is the target-selection signal.
Amortization signal The interesting result is structural: lookup work grew much faster than typed proof bytes on this checked attention surface.

What this means

The result does not say lookup work is free.

It says the fixed proof plumbing is being amortized across related work on this surface.

In proof-system terms, the shared packet is the commitment, opening, decommitment, FRI-facing, and trace-path structure around the work.

If attention arithmetic and table membership are split into separate proof objects, some of that structure is paid twice.

If they live in one native proof boundary, the related checks can share the surrounding machinery.

That is why the boundary matters.

Not because fusion is automatically better.

Not because one giant proof is always the answer.

Because proof boundaries should be chosen around actual proof pressure.

Sometimes related work should be fused.

Sometimes it should be split and connected through a typed statement boundary.

The point is to make that decision with evidence instead of vibes.

What this does not claim

This is where the result has to stay honest.

It is not a full transformer block proof.

It is not exact real-valued Softmax.

It is not a public benchmark win against NANOZK, Jolt Atlas, DeepProve, zkLLM, EZKL, RISC Zero, SP1, or any other system.

It is not a proving-time win. The timing rows we have are engineering data, not a public performance claim.

It is not evidence that every operation should be fused into one proof.

The claim is smaller and stronger:

for the checked lookup-heavy attention surfaces, STARK-native fusion shares proof plumbing across arithmetic and table membership, so proof bytes grow much more slowly than lookup claims and trace rows on the sequence axis.

That is already interesting.

It gives us a real architectural lever.

Next

Proof pressure is only half the story.

The other half is meaning.

If a proof uses a lookup table, a range rule, a rounding policy, or carried state, the application has to bind those facts to the proof.

Otherwise a proof can verify while the product claim around it drifts.

That is Part 3.

Next

Once proof boundaries become explicit, the next question is what the proof is allowed to mean.

  1. 03

    Proof Validity Is Not Statement Validity

    The claim has to be bound to the proof.