With version 0.49.0, we can now prove programs with a padded height of 2^24 on our machine, provided the right environment variables are set (see the command below). This will allow bigger transactions to be mined without going through the merge path.
$ RAYON_NUM_THREADS=90 TVM_LDE_TRACE="no_cache" triton-cli --profile prove --program spin.tasm --input 24
### Triton VM – Prove                             1417.02s  #Reps    Share  Category          395.8 GiB
├─trace execution                                    7.50s      1    0.53%  (gen – 4.34%)      +3.1 GiB
├─Fiat-Shamir: claim                                7.25µs      1    0.00%  (hash – 0.00%)         ±0 B
├─derive additional parameters                      1.01ms      1    0.00%                         ±0 B
├─main tables                                      455.55s      1   32.15%                    +94.9 GiB
│ ├─create                                          95.74s      1    6.76%  (gen – 55.36%)    +49.9 GiB
│ ├─pad                                              3.92s      1    0.28%  (gen – 2.27%)    -800.0 KiB
│ │ ├─pad original tables                            2.19s      1    0.15%                   -340.0 KiB
│ │ └─fill degree-lowering table                     1.73s      1    0.12%                   -460.0 KiB
│ ├─Merkle tree                                    290.11s      1   20.47%                    +11.1 GiB
│ │ ├─leafs                                        288.14s      1   20.33%                     +7.1 GiB
│ │ │ ├─LDE                                        205.14s      5   14.48%  (LDE – 26.88%)   +420.5 MiB
│ │ │ └─hash rows                                   42.05s      5    2.97%  (hash – 49.46%)        ±0 B
│ │ └─Merkle tree                                    1.72s      1    0.12%  (hash – 2.03%)     +9.1 GiB
│ ├─Fiat-Shamir                                    26.40µs      1    0.00%  (hash – 0.00%)         ±0 B
│ └─extend                                          65.78s      1    4.64%  (gen – 38.04%)    +33.3 GiB
│   ├─initialize master table                        1.08s      1    0.08%                    +32.1 GiB
│   ├─slice master table                            4.71µs      1    0.00%                         ±0 B
│   ├─all tables                                    63.75s      1    4.50%                    +52.1 MiB
│   └─fill degree lowering table                  950.31ms      1    0.07%                     -1.2 MiB
├─aux tables                                       258.34s      1   18.23%                     +9.1 GiB
│ ├─Merkle tree                                    258.34s      1   18.23%                     +9.1 GiB
│ │ ├─leafs                                        256.35s      1   18.09%                     +4.1 GiB
│ │ │ ├─LDE                                        121.23s      1    8.56%  (LDE – 15.88%)    +44.4 MiB
│ │ │ └─hash rows                                   35.09s      1    2.48%  (hash – 41.28%)        ±0 B
│ │ └─Merkle tree                                    1.74s      1    0.12%  (hash – 2.05%)     +9.1 GiB
│ └─Fiat-Shamir                                   104.15µs      1    0.00%  (hash – 0.00%)         ±0 B
├─quotient calculation (just-in-time)              544.45s      1   38.42%                    +15.4 GiB
│ ├─zero-initialization                              2.07s      1    0.15%                    +83.4 GiB
│ ├─fetch trace randomizers                       389.83µs      1    0.00%                         ±0 B
│ ├─poly interpolate                                22.80s      1    1.61%  (LDE – 2.99%)          ±0 B
│ ├─calculate quotients                            466.82s      1   32.94%                    +14.2 MiB
│ │ ├─poly evaluate                                183.02s      8   12.92%  (LDE – 23.98%)   +102.9 MiB
│ │ ├─trace randomizers                            209.07s      8   14.75%  (LDE – 27.39%)     -2.1 MiB
│ │ └─AIR evaluation                                74.74s      8    5.27%  (AIR –100.00%)    -98.1 MiB
│ │   ├─zerofier inverse                            13.80s      8    0.97%                   +432.7 MiB
│ │   └─evaluate AIR, compute quotient codeword     60.35s      8    4.26%                   +363.6 MiB
│ ├─segmentify                                      26.98s      1    1.90%                    +12.4 GiB
│ └─restore original trace                          21.95s      1    1.55%  (LDE – 2.88%)          ±0 B
├─hash rows of quotient segments                     2.67s      1    0.19%  (hash – 3.14%)     +4.1 GiB
├─Merkle tree                                        1.74s      1    0.12%  (hash – 2.04%)     +9.1 GiB
├─out-of-domain rows                                43.33s      1    3.06%                   +111.1 MiB
├─Fiat-Shamir                                      68.37µs      1    0.00%  (hash – 0.00%)         ±0 B
├─linear combination                                38.32s      1    2.70%                     +4.2 GiB
│ ├─main                                             3.34s      1    0.24%  (CC – 12.38%)    +303.1 MiB
│ ├─aux                                              3.22s      1    0.23%  (CC – 11.95%)    +363.9 MiB
│ └─quotient                                        17.62s      1    1.24%  (CC – 65.34%)    +768.3 MiB
├─DEEP                                              11.08s      1    0.78%                     +8.1 GiB
│ ├─main&aux curr row                                3.61s      1    0.26%                     +2.1 GiB
│ ├─main&aux next row                                3.66s      1    0.26%                     +2.1 GiB
│ └─segmented quotient                               3.81s      1    0.27%                     +3.1 GiB
├─combined DEEP polynomial                           2.78s      1    0.20%                     -5.1 GiB
│ └─sum                                              2.78s      1    0.20%  (CC – 10.32%)      -5.1 GiB
├─FRI                                               11.38s      1    0.80%                     -4.8 MiB
└─open trace leafs                                  32.88s      1    2.32%                     +5.9 GiB
  └─recompute rows                                  32.86s      2    2.32%                     +3.8 GiB
### Categories
LDE 763.21s 53.86%
gen 172.95s 12.20%
hash 85.01s 6.00%
AIR 74.74s 5.27%
CC 26.96s 1.90%
Clock frequency is 3974 Hz (5632027 clock cycles / (1417016 ms / 1 iterations))
Optimal clock frequency is 11839 Hz (16777216 padded height / (1417016 ms / 1 iterations))
FRI domain length is 2^27
As with all other benchmarks, this was run on a Threadripper 7995WX with 768 GB of RAM.
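The two clock-frequency lines in the profile follow from the cycle count, the padded height, and the total proving time. The short Python sketch below reproduces them from the printed numbers. The helper name `frequency_hz` is ours and is not part of triton-cli, and truncating to whole hertz is an assumption that happens to match the printed values.

```python
# Figures copied from the profile summary above.
CLOCK_CYCLES = 5_632_027      # cycles actually executed by the program
PADDED_HEIGHT = 2**24         # 16_777_216 rows after padding
PROVE_TIME_MS = 1_417_016     # total proving time in milliseconds
ITERATIONS = 1


def frequency_hz(count: int, total_ms: int, iterations: int = 1) -> int:
    """count / (total_ms / iterations), converted to Hz and truncated.

    Truncation is an assumption; it is consistent with the profile output.
    """
    return count * 1000 * iterations // total_ms


clock_hz = frequency_hz(CLOCK_CYCLES, PROVE_TIME_MS, ITERATIONS)
optimal_hz = frequency_hz(PADDED_HEIGHT, PROVE_TIME_MS, ITERATIONS)

# Fraction of the padded height that corresponds to executed clock cycles.
fill_ratio = CLOCK_CYCLES / PADDED_HEIGHT

print(f"clock frequency:         {clock_hz} Hz")
print(f"optimal clock frequency: {optimal_hz} Hz")
print(f"cycles / padded height:  {fill_ratio:.1%}")
```

The "optimal" figure is what the clock frequency would be if every row of the padded height were an executed cycle. In this run the cycle count is roughly a third of the padded height, which is why the two numbers differ by about a factor of three.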