Decrypting Bitcoin Blockchain Data by Cohort Analysis

Bitcoin is a peer-to-peer electronic payment system that has grown rapidly in popularity in recent years1,2,3,4. As a distributed ledger technology (DLT), Bitcoin records newly created transactions in a decentralized manner, eliminating the need for intermediaries such as banks and reducing transaction costs5,6,7.

Bitcoin relies on Unspent Transaction Output (UTXO) logging to efficiently verify newly created transactions8,9,10,11. An illustrative example of UTXOs is shown in Figure 1. UTXOs can be generated either as block rewards or as transaction outputs. Block rewards are newly minted bitcoins (BTC) that are distributed to miners for their work maintaining the network, such as validating transactions and producing blocks. Ultimately, every UTXO can be traced back to block rewards. A timestamp is logged when a UTXO is created. A UTXO is spent, and becomes a spent transaction output (STXO), when it is used as an input to a transaction; a timestamp is logged again at that point, and each UTXO can be spent only once. This feature allows us to calculate the age of each UTXO and of each STXO, much as ages are calculated in population data. Take Figure 1 as an example. As of July 1, 2020, the ages of UTXOs 1–3 are 8.5 years, 1 year, and 1 day, respectively. Immediately after Alice pays Bob on January 1, 2021, UTXOs 1–3 are converted to STXOs with ages of approximately 9 years, 1.5 years, and 0.5 years, respectively.
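
The age arithmetic in the Figure 1 example can be checked with a short Python sketch. It is illustrative only: the exact birth dates below are hypothetical, back-solved from the quoted ages, and the helper names `age_in_days` and `days_to_years` are our own. A UTXO's age is measured against the analysis date, while an STXO's age is fixed at the date it was spent.

```python
from datetime import date

def age_in_days(born: date, as_of: date) -> int:
    """Days from a UTXO's birth to `as_of`.

    Pass the analysis date ('now') for a UTXO, or the spending date for an STXO.
    """
    if as_of < born:
        raise ValueError("as_of must not precede the birth date")
    return (as_of - born).days

def days_to_years(days: int) -> float:
    # 365.25 days approximates an average year including leap years.
    return round(days / 365.25, 1)

# Hypothetical birth dates, back-solved from the ages quoted for Figure 1.
births = {
    "UTXO 1": date(2012, 1, 1),
    "UTXO 2": date(2019, 7, 1),
    "UTXO 3": date(2020, 6, 30),
}
snapshot = date(2020, 7, 1)  # 'now' for the UTXO ages
spent_on = date(2021, 1, 1)  # Alice pays Bob; UTXOs 1-3 die and become STXOs

for name, born in births.items():
    utxo_age = age_in_days(born, snapshot)
    stxo_age = age_in_days(born, spent_on)
    print(f"{name}: UTXO age {utxo_age} days ({days_to_years(utxo_age)} y), "
          f"STXO age {stxo_age} days ({days_to_years(stxo_age)} y)")
```

The half-year gap between the two dates is added to every age, which is why UTXO 3 (1 day old in July) becomes an STXO of roughly 0.5 years.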

Figure 1

An example of the birth and death of a UTXO. UTXOs 1, 2, and 3 were spent in a transaction taking place between Alice and Bob and converted into UTXOs 4 and 5. UTXOs 1, 2, and 3 became STXOs after the transaction.

Observing the unique structure of Bitcoin blockchain data, we apply cohort analysis12,13,14,15,16, a method originally developed for population data. Continuing the analogy with population data, we say that a UTXO is born when it is created as a block reward or as the output of a transaction, and that it dies when it is spent as an input to another transaction. In this way, all UTXOs born on the same day constitute a daily birth cohort, and all UTXOs spent on the same day constitute a daily death cohort. We define the age of a UTXO as the difference between ‘now’ (the date under analysis) and the time it was born. We define the age of an STXO as the difference between the time it died and the time it was born. Thus, all UTXOs within an age range constitute a UTXO age cohort, and all STXOs within an age range constitute an STXO age cohort. With this framework, the birth, death, and age triad of population cohort analysis carries over naturally to Bitcoin blockchain data.
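
The cohort definitions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the output ids, and the names `age_group`, `birth_cohorts`, and `death_cohorts` are hypothetical, and treating a month as 30 days and a year as 365 days is a simplification. The age-group boundaries mirror the age ranges used elsewhere in this article.

```python
from collections import Counter, defaultdict
from datetime import date

# Upper bounds (exclusive, in days) of each age group. Treating a month as
# 30 days and a year as 365 days is a simplifying assumption.
AGE_GROUPS = [(1, "<1d"), (30, "1d-1m"), (365, "1m-1y"), (2 * 365, "1y-2y"),
              (5 * 365, "2y-5y"), (10 * 365, "5y-10y")]

def age_group(age_days: int) -> str:
    """Return the label of the age group that contains `age_days`."""
    for upper, label in AGE_GROUPS:
        if age_days < upper:
            return label
    return ">10y"

# Hypothetical records: (output id, birth date, death date or None if unspent).
outputs = [
    ("a", date(2021, 1, 1), date(2021, 1, 1)),   # born and spent the same day
    ("b", date(2021, 1, 1), None),               # still unspent
    ("c", date(2020, 12, 1), date(2021, 1, 2)),
]

birth_cohorts = defaultdict(list)  # day -> ids of outputs born that day
death_cohorts = defaultdict(list)  # day -> ids of outputs spent (died) that day
for oid, born, died in outputs:
    birth_cohorts[born].append(oid)
    if died is not None:
        death_cohorts[died].append(oid)

now = date(2021, 1, 2)  # the date under analysis
# UTXO age = now - birth; STXO age = death - birth.
utxo_groups = Counter(age_group((now - born).days)
                      for _, born, died in outputs
                      if born <= now and (died is None or died > now))
stxo_groups = Counter(age_group((died - born).days)
                      for _, born, died in outputs
                      if died is not None and died <= now)
```

Here output `b` is a 1-day-old UTXO, while `a` and `c` are STXOs with lifespans of 0 and 32 days.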

Usually, we need to query the full history of Bitcoin blockchain data to obtain economically meaningful variables. With over 1.6 billion historical transactions on the Bitcoin blockchain, downloading complete Bitcoin blockchain records is increasingly difficult and computationally intensive. It is therefore important to query Bitcoin transaction data more efficiently while still providing economic insights17. Cohort analysis offers a new perspective: we can analyze the data within each cohort separately before merging the results into a time series.

Our workflow is shown in Figure 2. We query and process the input and output data of Bitcoin transactions within each daily batch. In this way, we have generated datasets and visualizations for several key Bitcoin transaction indicators, including the daily lifespan distributions of STXOs in percentage terms (Fig. 3) and the cumulative daily age distributions of UTXOs (Fig. 4). These visualizations can be used to study how Bitcoin (BTC) functions as a currency. A currency serves three functions: a store of value, a unit of account, and a medium of exchange. For example, Figure 4 shows the number of BTC held in UTXOs (that is, BTC that have not been spent) by age. By the end of 2020, nearly 2 million bitcoins had not been transacted for more than 10 years. Approximately 2 million, 4.5 million, and 3 million bitcoins had remained inactive for 5–10 years, 2–5 years, and 1–2 years, respectively. In total, about 11.5 million bitcoins had not been transacted for more than one year; these BTC resemble term deposits and act as a store of value. Moreover, approximately 5 million BTC had been unspent for between 1 month and 1 year; these BTC are similar to demand deposits. The most frequently traded BTC are those between 1 day and 1 month old (2 million) and those less than 1 day old (0.2 million); these BTC act as a medium of exchange.
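
The daily-batch idea behind the STXO lifespan indicator can be sketched as below. This is a hypothetical illustration, not the authors' pipeline: it assumes each input spent on a given day arrives as a `(birth_date, btc)` pair, and it weights the percentages by BTC value, although the article does not state whether its shares are weighted by value or by output count. The names `lifespan_shares` and `build_series`, the month and year lengths, and the sample batch are all assumptions.

```python
from collections import defaultdict
from datetime import date

# Lifespan groups in days (assumption: 1 month ~ 30 days, 1 year ~ 365 days).
EDGES = [(1, "<1d"), (30, "1d-1m"), (365, "1m-1y"), (730, "1y-2y"),
         (1825, "2y-5y"), (3650, "5y-10y")]

def age_group(age_days: int) -> str:
    """Map an age in whole days to its group label."""
    for upper, label in EDGES:
        if age_days < upper:
            return label
    return ">10y"

def lifespan_shares(day: date, spent_inputs: list[tuple[date, float]]) -> dict[str, float]:
    """Percentage of BTC spent on `day`, grouped by the lifespan of each STXO.

    `spent_inputs` holds one (birth_date, btc) pair per input spent on `day`.
    """
    totals: dict[str, float] = defaultdict(float)
    for born, btc in spent_inputs:
        totals[age_group((day - born).days)] += btc
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no spending recorded for this day
    return {label: 100.0 * btc / grand_total for label, btc in totals.items()}

def build_series(batches: dict[date, list[tuple[date, float]]]) -> dict[date, dict[str, float]]:
    """Process each daily batch on its own, then merge the results in date order."""
    return {day: lifespan_shares(day, inputs) for day, inputs in sorted(batches.items())}

# Hypothetical batch for one day (not real blockchain data).
feb1 = date(2021, 2, 1)
batch = [(date(2021, 2, 1), 6.0), (date(2021, 1, 20), 3.0), (date(2020, 6, 1), 1.0)]
print(build_series({feb1: batch}))
```

Because each day depends only on its own batch, days can be processed independently (or in parallel) and concatenated into the time series afterwards. The code uses built-in generic annotations and therefore needs Python 3.9 or later.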

Figure 2

Cohort analysis workflow for BTC UTXO data.

Figure 3

Lifespan distribution of BTC STXOs. For each day through February 2021, the figure shows the percentage of spent transaction outputs in each lifespan group. For example, by February 2021, STXOs with lifespans of less than one day accounted for 80% of all STXOs, while those with lifespans between one day and one month accounted for another 15%.

Figure 4

Amount of BTC in UTXOs by age. The figure shows the total BTC held in unspent transaction outputs, by age. For example, by February 2021, approximately 200,000 BTC in UTXOs less than a day old were being used as a medium of exchange, and about 2 million BTC in UTXOs over 10 years old had been lost or were being held as a store of value.

Our final datasets consist of one dataset that characterizes STXOs and another that characterizes UTXOs, each smaller than 1 MB. Furthermore, cohort analysis minimizes the data that must be queried and processed for future updates, which makes automatic updates possible. We thus present a computationally feasible approach to characterizing BTC transactions, paving the way for future economic studies of Bitcoin. Our methods can generally be applied to other cryptocurrencies that adopt the UTXO model, including Litecoin, Dash, Zcash, Dogecoin, and Bitcoin Cash.
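
Why updates stay cheap can be illustrated with a small incremental ledger. The sketch below is hypothetical (the class `UtxoLedger` and its methods are our own names, not the authors' implementation): it keeps unspent satoshis keyed by birth date, so each new day requires only that day's newly created outputs and spent inputs. It assumes every spent input is already tagged with the birth date of the output it consumes; performing that lookup is outside the sketch. Amounts are stored as integer satoshis (1 BTC = 100,000,000 satoshis) to avoid floating-point drift.

```python
from collections import defaultdict
from datetime import date, timedelta

SATS_PER_BTC = 100_000_000

# Age groups in days (assumption: 1 month ~ 30 days, 1 year ~ 365 days).
EDGES = [(1, "<1d"), (30, "1d-1m"), (365, "1m-1y"), (730, "1y-2y"),
         (1825, "2y-5y"), (3650, "5y-10y")]

def age_group(age_days: int) -> str:
    for upper, label in EDGES:
        if age_days < upper:
            return label
    return ">10y"

class UtxoLedger:
    """Unspent satoshis keyed by birth date, updated one daily batch at a time."""

    def __init__(self) -> None:
        self.unspent_by_birth: dict[date, int] = defaultdict(int)

    def apply_day(self, day: date, new_outputs: list[int],
                  spent_inputs: list[tuple[date, int]]) -> None:
        # Births: every output created on `day` joins that day's birth cohort.
        self.unspent_by_birth[day] += sum(new_outputs)
        # Deaths: each spent input leaves the birth cohort it belongs to.
        for born, sats in spent_inputs:
            if sats > self.unspent_by_birth[born]:
                raise ValueError(f"spending more than is unspent from {born}")
            self.unspent_by_birth[born] -= sats

    def age_distribution(self, day: date) -> dict[str, int]:
        """Unspent satoshis by age group at the end of `day`."""
        dist: dict[str, int] = defaultdict(int)
        for born, sats in self.unspent_by_birth.items():
            if sats > 0 and born <= day:
                dist[age_group((day - born).days)] += sats
        return dict(dist)

# Hypothetical two-day example (not real blockchain data).
ledger = UtxoLedger()
d0 = date(2021, 1, 1)
d1 = d0 + timedelta(days=1)
ledger.apply_day(d0, [3 * SATS_PER_BTC, 2 * SATS_PER_BTC], [])
ledger.apply_day(d1, [2 * SATS_PER_BTC, 1 * SATS_PER_BTC], [(d0, 3 * SATS_PER_BTC)])
print(ledger.age_distribution(d1))
```

Each daily result is only a handful of numbers per age group, which is consistent with final datasets that stay small even as the blockchain grows.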
