LIght nodes and wallet history

⚓ Neptune 📅 2023-10-07 👤 danda 👁️ 1895

Warning

This post was published 541 days ago. The infomation described in this article may have changed.

I’ve been reading through some of the excellent blog posts on neptune.cash today, and have read several descriptions of light node functionality without going too deep into the details. Here’s a mile high summary of my present understanding, please correct anything wrong:

The light node Z would ask another node (full, archive, etc) for the most recent (tip) block and receive it.
Z would validate the block against known info, ie the genesis block, using magic (to me) of recursive snarks.
Z would also ask for and receive the current utxo set. (correct? I’m guessing/assuming)
Z would validate the utxo set against the merkle-mountain-range-accumulator in the tip block.
Z iterates through entire utxo set to find any that match user’s keys (addresses).
Z calculates user’s balance as sum(matching_utxo.amount). (by coin type).

This seems to get us the present balance and allow us to spend. However, I don’t see how a transaction history can be displayed without requesting additional blocks from full/archival nodes. Obviously requesting all blocks would allow us to scan everything, but by then we are effectively a full node. So how to get a transaction history without giving up privacy? Or do we just sacrifice tx history?

🏷️ wallet

👍 󠁮󠁮󠁮󠁮 👎 󠁮󠁮󠁮󠁮

aszepieniec 2023-10-08 👍 👎

Z would also ask for and receive the current utxo set. (correct? I’m guessing/assuming)

Z would validate the utxo set against the merkle-mountain-range-accumulator in the tip block.

This is incorrect. No node in the network possesses the UTXO set. The block contains the mutator set accumulator, which you can think of as the Merkle root of the UTXO set. Nodes on the network might additionally possess

supporting data for producing membership proofs in this data structure (mostly Merkle authentication paths); and/or
historical transactions.

Towards updating the workflow:

Z requests the set of public scripts attached to transactions since last sync.
Z filters for public scripts containing his public key hash.
For each such public script:
- Z decrypts the associated ciphertext.
- Z recomputes the UTXO-commitment and determines the index under which it is listed in the transaction.
- Z requests and validates the Merkle authentication data for this commitment (with or without a sufficiently large random enveloping range for privacy) in the AOCL MMR. The AOCL MMR is just a Merkle mountain range collecting UTXO-commitments. It is part of the mutator set.
- Z computes the SWBF indices that the UTXO will set if it is spent. The SWBF is a very large Bloom filter and keeps track of which UTXOs are spent. It is also part of the mutator set.
- Z requests and validates the Merkle authentication data for these indices in the SWBF MMR (also with or without a random envelope).
- Z now possesses a UTXO and its membership proof in the mutator set. He adds it to his list of monitored UTXOs.
Z calculates balance as sum(utxo.coins[native] for utxo in monitored_utxos)

Transaction history is determined by the block heights of the transactions where the user receives a UTXO or spends it.

danda 2023-10-08 👍 👎 [op]

I’m thinking about the case where we say load in a paper wallet to a brand new light node and need to check for historic tx going back possibly to genesis.

Z requests the set of public scripts attached to transactions since last sync.

So if this is the initial sync for Z, is that not the set of all historical transactions?

sorry if I’m being dense.

aszepieniec 2023-10-08 👍 👎

So if this is the initial sync for Z, is that not the set of all historical transactions?

Almost – assuming that the paper wallet mnemonic phrase encodes a timestamp in addition to the seed, it would be the set of all historical transactions since the seed was generated. But that distinction hardly matters if we are crossing a time span of say 30 years.

There are ways to reduce the complexity of this task, for instance by outsourcing the filtering to a support node. Alternatively to sending the public key hash and asking for exact matches, you could send a Bloom filter and ask for potential matches – this is a little more mindful of privacy but comes at the cost of more work on the part of the light client.

Here’s another solution that I have not seen described before. Every block comes with a data structure that resembles an invertible Bloom filter. It is basically an array of integers. Every time a public script with ciphertext is confirmed, the indices determined by the matching public key hash are incremented by one. Take the difference (as vectors) of the invertible Bloom filters of any two blocks, and you can tell with a certain probability whether a UTXO bound for you was confirmed in between those two blocks. For time spans that are not too large, the light client can do binary search to locate his UTXOs. I did not run the maths, but intuitively as time increases, binary search breaks down and you’re still stuck with linear running time – but concretely less than the naïve approach.

I previously came up with this construction as a way to heuristically test whether one block was a descendant of another, but it works for locating inbound UTXOs as well.

danda 2023-10-09 👍 👎 [op]

For time spans that are not too large, the light client can do binary search to locate his UTXOs

Ok, that seems a neat solution. So then I’m guessing there would be a way for the light node to query just the bloom filters (without any tx data) of all blocks from another node? And then request specific blocks that match. that sounds mostly privacy-preserving, unless the requested blocks were formed from small # of transactions.

Also, I wonder if it makes sense for there to be a light wallet mode that does not bother generating any tx history. So upon importing a paper wallet, you would be able to see your present balance and spend but would not see any past history. Maybe that’s an acceptable limitation in return for a fast sync.

aszepieniec 2023-10-09 👍 👎

I’m guessing there would be a way for the light node to query just the bloom filters (without any tx data) of all blocks from another node?

That makes sense. But note that none of this has been written yet. You’re participating in the design process right now ;-)

Also, I wonder if it makes sense for there to be a light wallet mode that does not bother generating any tx history. So upon importing a paper wallet, you would be able to see your present balance and spend but would not see any past history. Maybe that’s an acceptable limitation in return for a fast sync.

I certainly don’t think showing history is necessary for 99% of use cases, and nor would I mind the exchange of timestamps-info for faster sync. However, I’m not sure this is technically feasible. If you are going to use the blockchain as a backup mechanism (as in generation addresses or traceable change) then you need the ciphertext that encodes the UTXO data. You get the ciphertext from the transaction, and the transaction will have been confirmed in a block of a given height. In short, I don’t see a pathway towards getting the UTXO info without simultaneously getting the block height.

danda 2023-10-10 👍 👎 [op]

You’re participating in the design process right now ;-)

Heh, just trying to grok this thing. I guess that’s where asking questions leads eventually.

then you need the ciphertext that encodes the UTXO data.

ok, it makes sense that is necessary for the present unspent UTXOs. But I think we don’t need to lookup the blocks for already spent UTXO. Is that right? This wallet might have 3 unspent UTXO and hundreds or thousands of spent UTXO, so that could be a large savings. If correct, that’s the mode I’m suggesting that would avoid looking up old history.

aszepieniec 2023-10-11 👍 👎

But I think we don’t need to lookup the blocks for already spent UTXO. Is that right? This wallet might have 3 unspent UTXO and hundreds or thousands of spent UTXO, so that could be a large savings. If correct, that’s the mode I’m suggesting that would avoid looking up old history.

I think you’re mostly right. For spent UTXOs, you just care that it is spent and you don’t care when. The fact that a given UTXO is spent might be apparent from blocks mined long after the expenditure.

Whether a UTXO is spent is determined whether its removal record indices (a list of integers uniquely determined by the UTXO when it is confirmed) point to set bits in the sliding window Bloom filter. Blocks only track the active window of this data structure, the inactive chunks are stored in an MMR. So if you look at blocks mined long after the UTXO, the active window might have slid too often to have any bearing on whether the UTXO was spent. In this case you need to a) read and process historical blocks or b) invoke the help of supporting nodes; in order to synchronize the UTXO’s membership proof. In the process of this task you will find the relevant chunks of the sliding window Bloom filter. Phrased differently, the outcome of this task is either the information that UTXO was spent or else a synchronized membership proof. This process does not automatically give you a timestamp of expenditure.

danda 2023-10-13 👍 👎 [op]

Okay, so this seems like potentially a lot to process. How long do you foresee this taking in practice? Let’s use an example of an early adopter, Bob, in 2024 who generates a wallet, mines a few coins, forgets about it, and then loads the wallet again in say 2030.

I guess I’m trying to reconcile with statements about fast sync time in the whitepaper and elsewhere, eg:

Synchronizing. The long synchronization time for newcomers to the network represents a major obstacle for adoption. In Neptune, every block contains enough information for newcomers to the network to synchronize fast -- in time independent of the age or popularity of the network.

Here the word newcomer may be key because Bob is not a newcomer in the same way as someone who generates a brand new wallet. But still we want his experience to be fast.

or b) invoke the help of supporting nodes

Is there a privacy tradeoff here?

danda 2023-10-13 👍 👎 [op]

grr, formatting mistake in my last comment. I don’t see any way to edit comments. Is that a forum software option that can be turned on?

related: I’ve stopped using Preview because it seems the preview doesn’t display markdown properly, eg quoted lines appear as regular lines. Though they appear in the final post. freedit bug?

aszepieniec 2023-10-14 👍 👎

Bob is not a newcomer in the same way as someone who generates a brand new wallet. But still we want his experience to be fast.

Sure, but at what cost? Different features are in tension with each other, and it might not be possible to achieve them all simultaneously. If this feature is difficult to marry with, say, the light footprint of nodes who are continually online, then I think I prefer to prioritize the second.

What does Bob’s workflow look like? He needs to find:

a) the chunks of the sliding window Bloom filter (SWBF) where his UTXOs set bits;
b) the Merkle mountain range (MMR) membership proofs (which are Merkle authentication paths) authenticating these chunks relative to the SWBF peaks list that’s present in the most recent block;
c) the Merkle mountain range membership proofs authenticating membership of the UTXO’s commitment in the append-only commitment list (AOCL) relative to the AOCL peaks list that’s present in the most recent block.

These data structures and their mechanics are all components of mutator sets, which is how Neptune maintains a private UTXO set. There are several ways in which Bob can obtain this information.

Remain Online

This obviously violates the premise of the question but is worth mentioning for the sake of completeness. The simplest case is that Bob remains online and in particular processes every block as it is mined. Some blocks induce updates on his membership proofs and so he applies them on the go.

Query Select Blocks

This model assumes Bob can query blocks by height from peers.

Let’s start with (c), which is the easiest. Bob already knows the index at which the commitment was inserted into the AOCL as well as a membership proof at that time. As a function of $L$ , the number of new UTXOs confirmed in the time spent offline, the AOCL authentication path can undergo at most $\log L$ updates. Bob can find the blocks applying these updates by running as many binary searches.

Tasks (a) and (b) are best done simultaneously. To analyze the cost, let’s divide the expenditures into two categories: A) UTXOs are removed by setting bits in monitored chunks of the SWBF or earlier ones; and A) UTXOs are removed by setting bits in later chunks.

(A). Let $U$ be the size of the UTXO set at the time Bob mined his coins. Then there can be at most $U$ updates to monitored or older SWBF chunks or their membership proofs, which might necessitate Bob to update his chunks or their membership proofs. He can locate the blocks applying these updates by running as many binary searches. Note that the space covered by this binary search is essentially all blocks mined in the time spent offline, which scales linearly with $L$ .

(B). As in the case of task (c), there can be at most $\log L$ updates and Bob can find the relevant blocks by running as many binary searches.

The cost is dominated in asymptotic terms by (A) with the expression $O (U \cdot \log L)$ . It is worth noting that the factor $U$ in particular captures a worst-case scenario.

Query Archival Peers

An archival mutator set is one that contains all current data, including all nodes of all Merkle trees and all SWBF chunks (not to mention the active window). This model assumes that peers on the network maintain this archival data structure. Currently, this is the only mode of operations that the supports.

In this case, Bob just queries one archival peer (or several and without making redundant queries, for better privacy) for:

the relevant SWBF chunks along with membership proofs (and possibly also some decoy chunks for better privacy);
the membership proof of the UTXO commitment in the AOCL at the given index (possibly plus or minus a random range for better privacy).

The code for making these queries and responding to them has not been written yet.

Collect All Blocks

Currently, the only supported mode of operations is for the synchronizing client to download all blocks and synchronize his UTXOs block-by-block.

Build Archival Mutator Set

Currently, the synchronizing client also builds the entire archival mutator set from the downloaded blocks. A more direct way to synchronize UTXOs is to get the membership proofs directly from the archival mutator set. This functionality has not been implemented yet.

Not

This non-option violates the premise of success but is worth mentioning for completeness’ sake as well. A design criterion is that the network can continue functioning even if it collectively forgets old information (or refuses to serve it). In particular, historical data availability is not guaranteed. It is Bob’s responsibility to ensure that he has access to the information needed to update his membership proofs.

aszepieniec 2023-10-14 👍 👎

I don’t see any way to edit comments. Is that a forum software option that can be turned on?

I’ll have a look, but I don’t think so. It has to do with the flat database that freedit uses: it is append-only.

I’ve stopped using Preview because it seems the preview doesn’t display markdown properly, eg quoted lines appear as regular lines. Though they appear in the final post. freedit bug?

Yes, that sounds like a freedit bug.

Freedit has a bunch of drawbacks but I like it because it’s self-hosted and lightweight. Moreover, it’s written in rust and open-source. I already had one bug fix merged.

aszepieniec 2023-10-14 👍 👎

I’ll have a look, but I don’t think so. It has to do with the flat database that freedit uses: it is append-only.

This can’t be the reason because it looks like the OP can edit his/her post. I don’t know why follow-up posts can’t be edited though …

danda 2023-10-14 👍 👎 [op]

Sure, but at what cost? Different features are in tension with each other

yes, there are always tradeoffs. I’m just trying to understand better what those tradeoffs are, and your detailed posts are super helpful, thanks! When I was reading the whitepaper, I believe I conflated two concepts. The whitepaper is talking about a new node syncing the blockchain and being able to validate the tip without downloading all blocks. I was somehow assuming that meant that a wallet balance (and history?) could be instantly synced as well, but clearly that is more complicated.

I think what I’m really trying to grok with this thread is the eventual level of simplicity/ease that will be available to end-users, ie what are we trying to build in terms of seamless wallet functionality? And I guess there are even multiple components of that, eg: what is a minimum viable product for mainnet vs what can we ultimately provide given protocol design/constraints (and without falling back to a centralized server). And then also so far I think we have just been talking about layer1, but also I understand Neptune is being designed with lightning wallets in mind, and possibly even with the expectation that lightning wallets are the default, or only wallet, from the very beginning. But unless I’m mistaken, a lightning wallet doesn’t really help out with Bob’s scenario.

In the bitcoin world, I am partial to the electrum system, which offers a client that can connect to one or more federated nodes and obtain balance + history in seconds. I realize there are some privacy and trust concerns with this, but in terms of pure speed and convenience I find it quite nice and without having to trust a fully centralized server as many other wallets do. So as I read descriptions of how neptune works, I’m subconsciously asking how it compares with electrum’s rapid wallet sync experience. Now clearly something like electrum could be built on top of Neptune as distinct nodes, but from the sounds of it that may not be necessary once “Query Archival Peers” functionality is in place. And that functionality may not be necessary for a mainnet launch, so long as there is a clear path to implementing it later. Does that sound about right, or anyway what are your thoughts?

aszepieniec 2023-10-15 👍 👎

what is a minimum viable product for mainnet vs what can we ultimately provide given protocol design/constraints (and without falling back to a centralized server)

I think the question “what is absolutely necessary for a minimum viable product?” is a good way to categorize features and implementations more or less in alignment with priority. The architecture does anticipate plenty other features, but these can be placed on the roadmap for implementation at a later date.

Since I haven’t put words to this yet, let me articulate a couple of target modes of use (for lack of a better term). These modes of use are not necessarily mutually exclusive.

MVP. The node is archival and mining is optional. Every payment uses the blockchain as a backup of relevant information, meaning that users only need to keep their secret seed phrase safe. If the node is offline for some time, it synchronizes its UTXOs from blocks mined in the meantime.

Off-Chain Backups. The node uses some alternative to using the blockchain as a backup service. This enables different address types and ultimately generates lighter transactions.

Light Node. The node is not archival but does keep track of its own UTXOs and associated membership proofs. If it spends time offline, it synchronizes its membership proofs by querying peers for blocks or mutator set data.

Lightning. Lightning is every bit as big a deal as Bitcoin. Lightning nodes jointly own UTXOs whose mutual balance can be updated with cryptoeconomic security. By chaining together an atomic sequence of balance-updates, any member of the Lightning network can pay any other member provided that there is a connecting line and all hops on it have enough liquidity. As these balance-updates take place offline, they leave no trace on the blockchain. Nodes are encouraged with routing fees to remain online.

Plug-In. In order to participate in an overlay protocol or a smart contract (or both), nodes need to install and activate a plugin. In particular, the core node provides the interface that the plugin interacts with.

In regards to Electrum’s rapid wallet sync experience, I tend to think this is described by light nodes outsourcing work (and losing some privacy) to archival peers.

danda 2023-10-17 👍 👎 [op]

Thanks for you patience in answering my q’s. I think this last post might be a good basis for a blog entry. Maybe some of your other answers in this thread as well.