TL;DR
- Dense arithmetic is real work, but it is not the whole proof story.
- Transformer proof pressure also lives in lookup tables, range checks, rounding policy, carried state, and statement boundaries.
- The current checked sequence-axis result is the important one: lookup claims grew 3.73x and trace rows grew 4.00x, while fused proof bytes grew only 1.06x to 1.08x.
- On the d64 four-head seq64 row, the fused proof was 276,503 bytes versus a 315,785 byte matched split frontier, saving 39,282 bytes.
- This is not a full transformer block proof and not a comparison win against external systems. It is evidence that boundary choice matters.
Series part 2 of 4
Part 1 used a simple picture:
state in, local work, state out.
That picture is useful, but it is only the start.
A trace still has to pay for the work inside each row.
The tempting shortcut is to say:
transformers are mostly matrix multiplication.
That is true in the same way that a kitchen is mostly heat and knives.
It misses the interesting part.
A transformer proof also has to account for the places where values are clipped, rounded, looked up in tables, normalized, carried forward, and attached to a statement the verifier is allowed to believe.
That is what I mean by proof pressure.
Where the pressure appears
Every serious zkML system has to deal with dense arithmetic.
There is no magic boundary that makes QKV projections, score products, value aggregation, output projection, or MLP layers disappear.
But dense arithmetic is the shared base. It is not where the architectural question ends.
The sharper question is:
what happens around the lookup-heavy and boundary-heavy work that repeats inside decode?
For attention, those surfaces include:
- score clipping;
- valid input ranges;
- Softmax-like table membership;
- denominator, quotient, and remainder checks;
- rounding policy;
- quantization policy;
- carried attention state;
- the identity of the table or policy the verifier is supposed to accept.
That list is not only a cost list.
It is also a meaning list.
If a proof uses a table, the accepted statement has to bind the table.
If a proof uses a rounding rule, the accepted statement has to bind the rounding rule.
If a proof carries state from one step to the next, the accepted statement has to bind the state before and the state after.
Otherwise the proof can be valid while the AI claim around it is too loose.
A human picture: one audit packet, not thousands
Imagine a warehouse receiving a truck.
The truck has many boxes.
For every box, the warehouse has to check a few things:
- the SKU exists in the approved table;
- the weight is inside the allowed range;
- the count matches the manifest;
- the box belongs to this delivery.
Those checks are real work.
But the annoying part is often the audit packet around the checks: the manifest, the signature, the timestamp, the table version, the delivery id, and the record that says all of these line items belong to the same truck.
If every box gets its own audit packet, the paperwork grows almost one-for-one with the number of boxes.
If the truck gets one audit packet, the line-item checks still happen, but the expensive packet is shared.
That is the intuition behind the current proof-pressure result.
The lookup claims are the line-item checks.
The proof bytes include the audit packet.
The question is whether related attention work can share the packet instead of paying for a fresh one every time the arithmetic proof and lookup proof are split apart.
The checked result
The result I care about is not:
we proved a full LLM.
We did not.
It is not:
STARKs beat every zkML system.
That would be too broad.
The useful result is narrower:
on the checked attention surfaces, fused STARK-native proof boundaries kept proof bytes growing much more slowly than lookup-heavy work and trace rows.
The clean sequence-axis test moves from seq32 to seq64.
Across the four checked rows:
Growth from seq32 to seq64 | Result |
|---|---|
| lookup claims | 3.729730x |
| trace rows | 4.000000x |
| fused proof bytes | 1.064910x to 1.080697x |
| fused ratio versus matched split frontier | 0.875605x to 0.922792x |
The hardest row in the public story is the d64 four-head seq64 row:
| Metric | Value |
|---|---|
| lookup claims | 8,832 |
| trace rows | 16,384 |
| source arithmetic proof bytes | 272,638 |
| LogUp sidecar proof bytes | 43,147 |
| matched split frontier | 315,785 |
| fused proof bytes | 276,503 |
| fused saving | 39,282 |
| fused ratio | 0.875605x |
That is the signal.
The checked work got much larger.
The fused proof object got only modestly larger.
And the fused proof still beat the matched split frontier.
The earlier target-selection signal points in the same direction.
Moving from a small d8 attention surface to a two-head seq32 attention surface, lookup claims grew from 52 to 1,184.
That is 22.77x.
Typed attention proof bytes grew from 18,124 to 22,916.
That is 1.26x.
That older signal was not the final result. It was the smell that told us where to dig.
What this means
The result does not say lookup work is free.
It says the fixed proof plumbing is being amortized across related work on this surface.
In proof-system terms, the shared packet is the commitment, opening, decommitment, FRI-facing, and trace-path structure around the work.
If attention arithmetic and table membership are split into separate proof objects, some of that structure is paid twice.
If they live in one native proof boundary, the related checks can share the surrounding machinery.
That is why the boundary matters.
Not because fusion is automatically better.
Not because one giant proof is always the answer.
Because proof boundaries should be chosen around actual proof pressure.
Sometimes related work should be fused.
Sometimes it should be split and connected through a typed statement boundary.
The point is to make that decision with evidence instead of vibes.
What this does not claim
This is where the result has to stay honest.
It is not a full transformer block proof.
It is not exact real-valued Softmax.
It is not a public benchmark win against NANOZK, Jolt Atlas, DeepProve, zkLLM, EZKL, RISC Zero, SP1, or any other system.
It is not a proving-time win. The timing rows we have are engineering data, not a public performance claim.
It is not evidence that every operation should be fused into one proof.
The claim is smaller and stronger:
for the checked lookup-heavy attention surfaces, STARK-native fusion shares proof plumbing across arithmetic and table membership, so proof bytes grow much more slowly than lookup claims and trace rows on the sequence axis.
That is already interesting.
It gives us a real architectural lever.
Next
Proof pressure is only half the story.
The other half is meaning.
If a proof uses a lookup table, a range rule, a rounding policy, or carried state, the application has to bind those facts to the proof.
Otherwise a proof can verify while the product claim around it drifts.
That is Part 3.
Next
Once proof boundaries become explicit, the next question is what the proof is allowed to mean.
- 03
Proof Validity Is Not Statement Validity
The claim has to be bound to the proof.