How a (future) Light Node can import an old wallet (privately)

This is a recent Telegram chat that I am archiving here.

E:
So even after succinctness is implemented (~ability for light nodes to verify the chain from the latest block only, thanks to recursive proofs), we still need archival nodes for querying old transactions in order to ensure correct balances for old wallets? Is there any way around this?

danda:
yes, that is my understanding. Some full (archival) nodes will always need to exist. A brand new wallet can start from the tip and does not need to download any old blocks. However, if you restore an old wallet from seed, it needs to query a full node to learn which blocks affect it and then download those blocks. The complication is doing this in a privacy-preserving manner, e.g. with a bloom filter, and there are probably some tradeoffs there.

@alanszepieniec please correct if I got something wrong.

@aszepieniec:
Perfectly correct :100: If you load an old seed, you need historical blocks to update your UTXOs’ membership proofs. You either download all of them, in which case you might as well be running an archival node, or you download specifically the blocks you’re interested in and then you risk losing some privacy. I guess there are also fancy cryptographic techniques like private information retrieval that could in theory allow you to query a block from a server that does not learn which block you are interested in, but I’m not an expert there.

danda:
yeah, I mean a basic technique is just to download a bunch of decoy blocks as well. So then:

a) your wallet’s utxos are hidden within the actual blocks you need
b) the actual blocks you need are hidden amongst the total set of blocks you request.

But that requires you to already know which blocks you need. So that’s the part I’m not sure how to do privately, as it seems necessary to share your set of receiver_ids with the full node. Maybe hide them in a much larger set of receiver_ids, but then that requires already having blocks, so it’s a little bit chicken-and-egg.

danda:
I guess it could be something like:

  1. ask every peer for a (different) set of random blocks.
  2. extract a bunch of receiver_ids from these (via public_announcements).
  3. inject your own receiver_ids into the set.
  4. ask a random peer for the set of blocks matching all receiver_ids.
  5. add some more random blocks to the set.
  6. query the blocks from random peers.
  7. now wallet has all the information it needs.
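Steps 2–3 of the list above (mixing your own receiver_ids into a set of harvested decoys) might be sketched like this. All names here are hypothetical, and a real implementation would likely also randomize the decoy count so the query size itself leaks nothing:

```python
import random

def build_receiver_id_query(own_ids, harvested_ids, decoy_factor=10):
    """Hide our own receiver_ids among decoys harvested from the
    public_announcements of randomly sampled blocks (steps 1-2 above)."""
    n_decoys = min(len(harvested_ids), decoy_factor * len(own_ids))
    decoys = random.sample(sorted(harvested_ids), n_decoys)
    # Shuffle so our own ids occupy no distinguishable position.
    query = list(set(decoys) | set(own_ids))
    random.shuffle(query)
    return query
```

The full node answering the query then sees one flat set of receiver_ids, only a small fraction of which belong to the requesting wallet.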

thinking about it some more…

Don’t the membership proofs have to be updated block by block? E.g., if my wallet was sent a utxo at height 50 and the tip is now at height 500, don’t I need blocks 50..500 in order to perform all the updates?

paging… @aszepieniec @sword-smith

A mutator set membership proof consists of

  • the sender randomness
  • the receiver preimage
  • the AOCL MMR membership proof
  • one Chunk and SWBFI MMR membership proof for each one of 45 indices.

The first two bullet points never change. Only the last two need to be updated as transactions are made and confirmed and blocks found.
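As a reading aid, the four components might be modeled as below. The field names are illustrative only, not taken from the actual codebase:

```python
from dataclasses import dataclass

@dataclass
class MsMembershipProof:
    # Static: fixed when the UTXO is received, never updated.
    sender_randomness: bytes
    receiver_preimage: bytes
    # Dynamic: must track the chain tip as blocks are found.
    aocl_auth_path: list          # AOCL MMR membership proof (list of digests)
    swbfi_proofs: list            # one (Chunk, MMR auth path) pair per index, 45 entries
```

The wallet-restoration problem is exactly the problem of bringing the two dynamic fields up to date.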

The AOCL MMR membership proof is a list of digests that is only ever extended. Furthermore, its length is logarithmic in the total number of transactions. Therefore, there is a relatively small number of extensions, which are induced by a small number of blocks. The user can locate the block that induces a relevant extension through bisection search based on the number of leaves in the AOCL MMR, which is readily apparent from a downloaded block.
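That bisection can be sketched as follows. Here `aocl_leaf_count` is a hypothetical accessor that reads the AOCL leaf count out of a downloaded block, so each call costs one block download; since leaf counts only grow with height, the predicate is monotone and plain bisection works:

```python
def find_extension_block(aocl_leaf_count, lo, hi, target):
    """Return the first height in [lo, hi] whose AOCL MMR has at least
    `target` leaves, i.e. the block that induced that extension.
    Costs O(log(hi - lo)) block downloads."""
    while lo < hi:
        mid = (lo + hi) // 2
        if aocl_leaf_count(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Repeating this once per digest in the (logarithmically short) proof locates every relevant block.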

The SWBFI Chunk and MMR membership proof are a little more complicated because, when UTXOs are spent, Chunks are modified and this modification percolates into the MMR and its membership proofs. In principle, the number of modifications can scale linearly with the number of transactions in the intervening time. However, most of these modifications overwrite the same data again and again. There is a logarithmic number of blocks from which the digests of the most recent SWBFI membership proof can be read, and there is at most one block from which the most recent Chunk value can be read. The question is, how to find these blocks?

The starting point is the root of the current MMR peak, which is explicitly present in the most recent block. Then, through bisection search based on MMR membership proof validity, the user can find the block that contains the most recent value of the last digest of the MMR membership proof. (This digest might be embedded in one of the transaction’s removal records instead of the block’s mutator set, but that hardly matters.) Another bisection search gives the penultimate digest, and so on until the first digest is found, at which point the value of the Chunk is exposed as well.

The above sketch is a theoretical solution that achieves communication and time complexity O((log N)^2), where N is the number of blocks in the intervening time. This complexity answers the question negatively. In practice though, it might take a long while before this theoretical succinct algorithm outperforms the naïve one where you download all the blocks to build an archival mutator set and then read out the data you want from there.

The following is a succinct LLM-generated response. It states it better than I could.


Thanks for the detailed breakdown.

I understand the conclusion that for the current network state, the naive approach (downloading all blocks) outperforms the complex theoretical approach. However, I want to challenge the idea that this answers the question “negatively” in the long run.

If we project this forward to a mature blockchain, the “theoretical” solution transitions from being an optimization to being a hard requirement.

Consider a scenario where the chain is at height 1,000,000 and a light node needs to import a wallet with UTXOs from block 1,000 (where N \approx 10^6). Assuming a fixed block size of 1MB:

  1. The Naive O(N) Solution:
    The node must download and process 1,000,000 blocks.
    Total Data: ~1 TB.
    Result: This is impossible for a mobile device or standard light node.

  2. The Succinct O((\log N)^2) Solution:
    \log_2(1,000,000) \approx 20.
    (\log N)^2 \approx 400.
    Even if the node fetches a full block for every step of the search (worst-case), it only downloads ~400 blocks.
    Total Data: ~400 MB.
    Result: This is heavy, but entirely feasible for a mobile phone.
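The back-of-envelope numbers above can be checked directly, under the stated assumptions (N = 10^6 intervening blocks, fixed 1 MB block size):

```python
import math

N = 1_000_000        # intervening blocks (scenario assumption)
BLOCK_MB = 1         # fixed block size in MB (scenario assumption)

naive_mb = N * BLOCK_MB                  # naive: download every block
steps = math.ceil(math.log2(N)) ** 2     # O((log N)^2) search steps
succinct_mb = steps * BLOCK_MB           # worst case: one block per step

print(naive_mb, steps, succinct_mb)      # 1000000 400 400
```

So the naive approach costs about 1 TB while the succinct one stays around 400 MB, matching the figures above.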

While the “naive” approach works for the bootstrap phase, it seems we cannot rely on it indefinitely. Doesn’t this make the succinct algorithm mandatory for the protocol’s long-term viability on consumer hardware?


So it seems to me (not LLM) that initially it would make sense to implement the light node with the naive approach for expediency, but eventually we would want to improve on it as block height increases. I’m glad there seems to be a path for that.
