Transaction throughput on the Neptune Cash blockchain is limited by computational power: the more proving power that comes online, the more transactions each block can include.
For this reason, several power users run `neptune-core` on powerful machines, which earns them income in the form of composer fees and transaction fees. We own a couple of machines with powerful processors and have just run multiple benchmarks on Triton VM version 0.48.0, the version used by `neptune-core`.
For the benefit of our power users, we present these numbers below.
Triton VM performance
Threadripper 7995WX, RAM-hungry path
| log₂(padded height) | Proving time (s) | Max RAM consumption, approx. (GiB) |
|---|---|---|
| 16 | 2.65 | 4.9 |
| 17 | 5.36 | 8.6 |
| 18 | 11.06 | 16.1 |
| 19 | 22.80 | 29.7 |
| 20 | 46.52 | 58.4 |
| 21 | 99.17 | 107.8 |
| 22 | 203.43 | 199.9 |
| 23 | 417.75 | 399.5 |
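Each step up in padded height roughly doubles both proving time and peak RAM. The short Python sketch below uses only the numbers from the table to compute the step-to-step ratios, which makes it easy to compare the trend against your own measurements. The constant and function names are ours, for illustration only.

```python
# Measurements from the table above (Triton VM 0.48.0, Threadripper 7995WX).
LOG2_HEIGHTS = [16, 17, 18, 19, 20, 21, 22, 23]
PROVING_TIME_S = [2.65, 5.36, 11.06, 22.80, 46.52, 99.17, 203.43, 417.75]
MAX_RAM_GIB = [4.9, 8.6, 16.1, 29.7, 58.4, 107.8, 199.9, 399.5]


def step_ratios(values):
    """Ratio between each measurement and the one at the next-smaller padded height."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    time_ratios = step_ratios(PROVING_TIME_S)
    ram_ratios = step_ratios(MAX_RAM_GIB)
    for log2_h, t, r in zip(LOG2_HEIGHTS[1:], time_ratios, ram_ratios):
        print(f"2^{log2_h} vs 2^{log2_h - 1}: time x{t:.2f}, RAM x{r:.2f}")
```

With these numbers, the time ratios fall between about 2.02 and 2.13, and the RAM ratios fall between about 1.76 and 2.00.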
If you want to participate in competitive composition and transaction upgrading, you should probably be able to hit these numbers, or be close to them.
From these numbers, we can estimate the time it takes to create a block proposal as:
- one proof of padded height 2^{21} (raise from `ProofCollection` to `SingleProof`)
- one 2^{20} proof (a merge of two `SingleProof`s)
- one 2^{19} proof (`BlockProgram`).
Summing the times for these three proofs gives a total of about 168.5 seconds, a little less than three minutes. This assumes your node's mempool is non-empty, i.e., that it holds a synced `SingleProof` transaction ready to be merged into your coinbase transaction.
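The estimate is a plain sum over the three proofs, since each one builds on the previous. The Python sketch below spells out that sum using the table values, which add up to 168.49 s. The data layout and function name are ours, for illustration only. Swap in your own measurements to estimate your machine's block proposal time.

```python
# Proving times (seconds) from the table above, keyed by log2(padded height).
PROVING_TIME_S = {19: 22.80, 20: 46.52, 21: 99.17}

# The proofs a block proposal needs, per the list above (hypothetical structure).
BLOCK_PROPOSAL_PROOFS = [
    (21, "raise ProofCollection to SingleProof"),
    (20, "merge two SingleProofs"),
    (19, "BlockProgram"),
]


def block_proposal_seconds(times, proofs):
    """Add up the proving time of each proof, treating them as sequential steps."""
    return sum(times[log2_height] for log2_height, _ in proofs)


total = block_proposal_seconds(PROVING_TIME_S, BLOCK_PROPOSAL_PROOFS)
print(f"estimated block proposal time: {total:.2f} s ({total / 60:.1f} min)")
```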
To benchmark proving speeds on your setup, you can follow these instructions that @jfs just added.
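@jfs's instructions remain the reference. As a complement, here is a minimal Python sketch for scripting several runs in a row. It assumes `triton-cli` is on your `PATH` and `spin.tasm` is in the working directory. It only reuses the flags and environment variables that appear in the `triton-cli` commands in this post, and the helper names are hypothetical.

```python
import os
import subprocess


def prove_command(log2_padded_height, *, lde_no_cache=False, num_threads=None):
    """Build argv and environment for one `triton-cli --profile prove` run.

    Hypothetical helper: it only reuses the flags and environment variables
    that appear in the commands in this post.
    """
    argv = [
        "triton-cli", "--profile", "prove",
        "--program", "spin.tasm",  # assumed to be in the working directory
        "--input", str(log2_padded_height),
    ]
    env = dict(os.environ)  # inherit the current environment
    if lde_no_cache:
        env["TVM_LDE_TRACE"] = "no_cache"  # as in the 2^24 run
    if num_threads is not None:
        env["RAYON_NUM_THREADS"] = str(num_threads)  # size of Rayon's global thread pool
    return argv, env


def run_profile(log2_padded_height, **options):
    """Run triton-cli; its profile goes straight to the terminal. Raises on failure."""
    argv, env = prove_command(log2_padded_height, **options)
    return subprocess.run(argv, env=env, check=True)
```

For example, `run_profile(24, lde_no_cache=True, num_threads=90)` issues the same command as the 2^{24} benchmark in this post. Calling `run_profile(h)` for each `h` in `range(16, 24)` covers the same padded heights as the table.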
Happy to report that the newly released Triton VM v0.49.0 reduces proving time by 5-10%. We can now build a proof with a padded height of 2^{21} in 93 seconds, down from 99 seconds with v0.48.0.
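For a concrete check at padded height 2^{21}, the table above gives 99.17 s with v0.48.0, and the v0.49.0 profile gives 93.30 s. A small Python sketch of the arithmetic follows. The constant names are just for illustration.

```python
# Proving time at padded height 2^21, in seconds.
V0_48_SECONDS = 99.17  # Triton VM 0.48.0, table above
V0_49_SECONDS = 93.30  # Triton VM 0.49.0, profile below


def reduction_percent(before, after):
    """Relative reduction of proving time, in percent."""
    return (before - after) / before * 100


print(f"{reduction_percent(V0_48_SECONDS, V0_49_SECONDS):.1f}% less proving time")
```

That is about 5.9%, at the lower end of the quoted range. Other padded heights may land elsewhere in it.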
Profile:
$ triton-cli --profile prove --program spin.tasm --input 21
### Triton VM – Prove 93.30s #Reps Share Category 108.2 GiB
├─trace execution 900.76ms 1 0.97% (gen – 5.49%) +511.8 MiB
├─Fiat-Shamir: claim 5.66µs 1 0.00% (hash – 0.00%) ±0 B
├─derive additional parameters 132.85µs 1 0.00% ±0 B
├─main tables 45.09s 1 48.33% +68.4 GiB
│ ├─create 7.21s 1 7.73% (gen – 43.91%) +6.3 GiB
│ ├─pad 402.28ms 1 0.43% (gen – 2.45%) +5.9 MiB
│ │ ├─pad original tables 264.67ms 1 0.28% +5.9 MiB
│ │ └─fill degree-lowering table 137.46ms 1 0.15% ±0 B
│ ├─LDE 23.18s 1 24.84% (LDE – 55.00%) +62.2 GiB
│ │ ├─polynomial zero-initialization 2.23µs 1 0.00% ±0 B
│ │ ├─interpolation 1.90s 1 2.04% +14.5 GiB
│ │ ├─resize 1.25s 1 1.34% +47.3 GiB
│ │ ├─evaluation 20.03s 1 21.47% +402.5 MiB
│ │ └─memoize 931.00ns 1 0.00% ±0 B
│ ├─Merkle tree 6.06s 1 6.50% +1.7 GiB
│ │ ├─leafs 5.82s 1 6.24% +496.5 MiB
│ │ │ └─hash rows 5.82s 1 6.24% (hash – 51.26%) +496.5 MiB
│ │ └─Merkle tree 208.04ms 1 0.22% (hash – 1.83%) +1.2 GiB
│ ├─Fiat-Shamir 24.67µs 1 0.00% (hash – 0.00%) ±0 B
│ └─extend 7.91s 1 8.48% (gen – 48.16%) +4.1 GiB
│ ├─initialize master table 148.73ms 1 0.16% +4.1 GiB
│ ├─slice master table 4.91µs 1 0.00% ±0 B
│ ├─all tables 7.68s 1 8.24% +1.5 MiB
│ └─fill degree lowering table 75.26ms 1 0.08% ±0 B
├─aux tables 19.26s 1 20.64% +34.3 GiB
│ ├─LDE 14.20s 1 15.22% (LDE – 33.69%) +37.2 GiB
│ │ ├─polynomial zero-initialization 1.66µs 1 0.00% ±0 B
│ │ ├─interpolation 1.83s 1 1.96% +4.2 GiB
│ │ ├─resize 866.18ms 1 0.93% +32.1 GiB
│ │ ├─evaluation 11.50s 1 12.33% +125.8 MiB
│ │ └─memoize 991.00ns 1 0.00% ±0 B
│ ├─Merkle tree 4.85s 1 5.20% +1.2 GiB
│ │ ├─leafs 4.62s 1 4.96% +547.5 MiB
│ │ │ └─hash rows 4.62s 1 4.96% (hash – 40.71%) +547.5 MiB
│ │ └─Merkle tree 197.17ms 1 0.21% (hash – 1.74%) +1.2 GiB
│ └─Fiat-Shamir 124.27µs 1 0.00% (hash – 0.00%) ±0 B
├─quotient calculation (cached) 6.00s 1 6.43% (CC – 68.85%) +379.7 MiB
│ ├─zerofier inverse 1.66s 1 1.78% +510.1 MiB
│ └─evaluate AIR, compute quotient codeword 4.30s 1 4.61% +378.0 MiB
├─quotient LDE 4.77s 1 5.11% (LDE – 11.32%) +1.4 GiB
├─hash rows of quotient segments 292.75ms 1 0.31% (hash – 2.58%) +633.0 MiB
├─Merkle tree 212.98ms 1 0.23% (hash – 1.88%) +1.3 GiB
├─out-of-domain rows 4.82s 1 5.17% +151.1 MiB
├─Fiat-Shamir 66.43µs 1 0.00% (hash – 0.00%) ±0 B
├─linear combination 3.76s 1 4.03% +603.4 MiB
│ ├─main 379.16ms 1 0.41% (CC – 4.35%) +18.6 MiB
│ ├─aux 332.63ms 1 0.36% (CC – 3.82%) +8.5 MiB
│ └─quotient 1.62s 1 1.74% (CC – 18.58%) +239.1 MiB
├─DEEP 1.13s 1 1.21% +1.4 GiB
│ ├─main&aux curr row 365.74ms 1 0.39% +294.8 MiB
│ ├─main&aux next row 379.78ms 1 0.41% +442.2 MiB
│ └─segmented quotient 386.25ms 1 0.41% +330.4 MiB
├─combined DEEP polynomial 382.87ms 1 0.41% -721.6 MiB
│ └─sum 382.82ms 1 0.41% (CC – 4.39%) -721.6 MiB
├─FRI 1.65s 1 1.77% +32.1 MiB
└─open trace leafs 846.33µs 1 0.00% ±0 B
### Categories
LDE 42.15s 45.18%
gen 16.42s 17.60%
hash 11.36s 12.17%
CC 8.71s 9.34%
Clock frequency is 7545 Hz (704027 clock cycles / (93301 ms / 1 iterations))
Optimal clock frequency is 22477 Hz (2097152 padded height / (93301 ms / 1 iterations))
FRI domain length is 2^24
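As the parenthesized formulas show, the two frequency lines at the end of the profile are plain ratios. The first is clock cycles per second of proving, and the second is padded height per second of proving. The "optimal" figure is what the clock frequency would be if the program's cycle count equaled the padded height. The Python sketch below reproduces both numbers with integer arithmetic. Rounding down is our assumption, chosen because it matches all four frequencies printed in this post, including those of the 2^{24} run.

```python
def frequency_hz(count, total_ms, iterations=1):
    """count / (total_ms / iterations) per second, rounded down (integer arithmetic)."""
    return count * 1000 * iterations // total_ms


# 2^21 profile above: 704027 clock cycles, padded height 2^21, 93301 ms, 1 iteration.
print(frequency_hz(704_027, 93_301))  # clock frequency: prints 7545
print(frequency_hz(2**21, 93_301))    # optimal clock frequency: prints 22477
```

The ratio of the two frequencies is simply clock cycles divided by padded height, here 704027 / 2^{21} ≈ 0.34.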
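With version 0.49.0, we can now also prove padded heights of 2^{24} on our machine, provided the right environment variables are set. With `TVM_LDE_TRACE="no_cache"`, the prover recomputes the low-degree extension of the trace when it needs it instead of caching it. This keeps peak memory below 400 GiB at the cost of extra LDE work. This will allow bigger transactions to be mined without going through the merge path.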
$ RAYON_NUM_THREADS=90 TVM_LDE_TRACE="no_cache" triton-cli --profile prove --program spin.tasm --input 24
### Triton VM – Prove 1417.02s #Reps Share Category 395.8 GiB
├─trace execution 7.50s 1 0.53% (gen – 4.34%) +3.1 GiB
├─Fiat-Shamir: claim 7.25µs 1 0.00% (hash – 0.00%) ±0 B
├─derive additional parameters 1.01ms 1 0.00% ±0 B
├─main tables 455.55s 1 32.15% +94.9 GiB
│ ├─create 95.74s 1 6.76% (gen – 55.36%) +49.9 GiB
│ ├─pad 3.92s 1 0.28% (gen – 2.27%) -800.0 KiB
│ │ ├─pad original tables 2.19s 1 0.15% -340.0 KiB
│ │ └─fill degree-lowering table 1.73s 1 0.12% -460.0 KiB
│ ├─Merkle tree 290.11s 1 20.47% +11.1 GiB
│ │ ├─leafs 288.14s 1 20.33% +7.1 GiB
│ │ │ ├─LDE 205.14s 5 14.48% (LDE – 26.88%) +420.5 MiB
│ │ │ └─hash rows 42.05s 5 2.97% (hash – 49.46%) ±0 B
│ │ └─Merkle tree 1.72s 1 0.12% (hash – 2.03%) +9.1 GiB
│ ├─Fiat-Shamir 26.40µs 1 0.00% (hash – 0.00%) ±0 B
│ └─extend 65.78s 1 4.64% (gen – 38.04%) +33.3 GiB
│ ├─initialize master table 1.08s 1 0.08% +32.1 GiB
│ ├─slice master table 4.71µs 1 0.00% ±0 B
│ ├─all tables 63.75s 1 4.50% +52.1 MiB
│ └─fill degree lowering table 950.31ms 1 0.07% -1.2 MiB
├─aux tables 258.34s 1 18.23% +9.1 GiB
│ ├─Merkle tree 258.34s 1 18.23% +9.1 GiB
│ │ ├─leafs 256.35s 1 18.09% +4.1 GiB
│ │ │ ├─LDE 121.23s 1 8.56% (LDE – 15.88%) +44.4 MiB
│ │ │ └─hash rows 35.09s 1 2.48% (hash – 41.28%) ±0 B
│ │ └─Merkle tree 1.74s 1 0.12% (hash – 2.05%) +9.1 GiB
│ └─Fiat-Shamir 104.15µs 1 0.00% (hash – 0.00%) ±0 B
├─quotient calculation (just-in-time) 544.45s 1 38.42% +15.4 GiB
│ ├─zero-initialization 2.07s 1 0.15% +83.4 GiB
│ ├─fetch trace randomizers 389.83µs 1 0.00% ±0 B
│ ├─poly interpolate 22.80s 1 1.61% (LDE – 2.99%) ±0 B
│ ├─calculate quotients 466.82s 1 32.94% +14.2 MiB
│ │ ├─poly evaluate 183.02s 8 12.92% (LDE – 23.98%) +102.9 MiB
│ │ ├─trace randomizers 209.07s 8 14.75% (LDE – 27.39%) -2.1 MiB
│ │ └─AIR evaluation 74.74s 8 5.27% (AIR –100.00%) -98.1 MiB
│ │ ├─zerofier inverse 13.80s 8 0.97% +432.7 MiB
│ │ └─evaluate AIR, compute quotient codeword 60.35s 8 4.26% +363.6 MiB
│ ├─segmentify 26.98s 1 1.90% +12.4 GiB
│ └─restore original trace 21.95s 1 1.55% (LDE – 2.88%) ±0 B
├─hash rows of quotient segments 2.67s 1 0.19% (hash – 3.14%) +4.1 GiB
├─Merkle tree 1.74s 1 0.12% (hash – 2.04%) +9.1 GiB
├─out-of-domain rows 43.33s 1 3.06% +111.1 MiB
├─Fiat-Shamir 68.37µs 1 0.00% (hash – 0.00%) ±0 B
├─linear combination 38.32s 1 2.70% +4.2 GiB
│ ├─main 3.34s 1 0.24% (CC – 12.38%) +303.1 MiB
│ ├─aux 3.22s 1 0.23% (CC – 11.95%) +363.9 MiB
│ └─quotient 17.62s 1 1.24% (CC – 65.34%) +768.3 MiB
├─DEEP 11.08s 1 0.78% +8.1 GiB
│ ├─main&aux curr row 3.61s 1 0.26% +2.1 GiB
│ ├─main&aux next row 3.66s 1 0.26% +2.1 GiB
│ └─segmented quotient 3.81s 1 0.27% +3.1 GiB
├─combined DEEP polynomial 2.78s 1 0.20% -5.1 GiB
│ └─sum 2.78s 1 0.20% (CC – 10.32%) -5.1 GiB
├─FRI 11.38s 1 0.80% -4.8 MiB
└─open trace leafs 32.88s 1 2.32% +5.9 GiB
└─recompute rows 32.86s 2 2.32% +3.8 GiB
### Categories
LDE 763.21s 53.86%
gen 172.95s 12.20%
hash 85.01s 6.00%
AIR 74.74s 5.27%
CC 26.96s 1.90%
Clock frequency is 3974 Hz (5632027 clock cycles / (1417016 ms / 1 iterations))
Optimal clock frequency is 11839 Hz (16777216 padded height / (1417016 ms / 1 iterations))
FRI domain length is 2^27
As with all other benchmarks, this was performed on a Threadripper 7995WX with 768 GB of RAM.
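One way to compare the two v0.49.0 profiles on a common footing is to normalize proving time and peak RAM by padded height. Treat this as a rough comparison only. The runs differ in padded height and thread count (the 2^{24} run limited Rayon to 90 threads), and per-row cost is not expected to be exactly constant across heights. The Python sketch below uses only the numbers printed in the profiles, and the names are ours.

```python
GIB = 2**30  # bytes per GiB

# (padded height, proving time in s, peak RAM in GiB) from the two v0.49.0 profiles above.
CACHED_2_21 = (2**21, 93.30, 108.2)      # default (cached) path
NO_CACHE_2_24 = (2**24, 1417.02, 395.8)  # TVM_LDE_TRACE="no_cache", RAYON_NUM_THREADS=90


def per_row(padded_height, seconds, ram_gib):
    """Return (microseconds, KiB of peak RAM) per row of padded height."""
    return seconds / padded_height * 1e6, ram_gib * GIB / padded_height / 1024


for name, run in [("cached 2^21", CACHED_2_21), ("no_cache 2^24", NO_CACHE_2_24)]:
    us, kib = per_row(*run)
    print(f"{name}: {us:.1f} us/row, {kib:.1f} KiB/row")
```

With these inputs, the no_cache run uses roughly 24.7 KiB of peak RAM per row, compared with 54.1 KiB for the cached 2^{21} run. In exchange, it spends roughly 84.5 µs per row instead of 44.5 µs.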