NARRATIVE/SYSTEMATIC REVIEWS/META-ANALYSIS
Wendy M. Charles, PhD1*, Brooke M. Delgado, MS1
1BurstIQ, Inc., Denver, Colorado, USA
Keywords: blockchain, data sales, data valuation, intangible assets
There is increasing recognition that health-oriented datasets could be regarded as intangible assets: distinct assets with future economic benefits but without physical properties. While health-oriented datasets—particularly health records—are ascribed monetary value on the black market, there are few established methods for assessing value for legitimate research and business purposes. The emergence of blockchain has created new commerce opportunities for transferring assets without intermediaries. Therefore, blockchain is proposed as a medium by which research datasets could be transacted to provide future value. Blockchain methodologies also offer security, auditability, and transparency to authorized individuals for verifying transactions. The authors will share data valuation methodologies consistent with accounting principles and include discussions of black market valuation of health data. Further, this article describes blockchain-based methods of managing real-time payment/micropayment strategies.
Citation: Blockchain in Healthcare Today 2023, 6: 185 - https://doi.org/10.30953/bhty.v6.185
Copyright: © 2023 The Authors. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, adapt, enhance this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0.
Received: 10 November 2021; Revised: 19 December 2021; Accepted: 21 December 2021; Published: 21 January 2022
Competing interests and funding: The authors work for a company that designs blockchain platforms used for healthcare information. However, this paper was intended to provide broad educational information about data marketplaces and data valuation. Several blockchain platforms are described in a neutral and objective manner.
The authors did not receive any funding or financial source of support to write this manuscript.
*Corresponding Author: Wendy M. Charles. Email: wendy.charles@cuanschutz.edu
Individually identifiable information is collected about patients in nearly every health and wellness-oriented app, wearable device, and healthcare setting.1 This information is used to identify treatment opportunities within the care facilities where the patients are treated. However, data are also regularly shared and sold to other technology or life sciences organizations to design innovations in healthcare, identify new healthcare markets, create business opportunities, and uncover revenue collection opportunities.2
Life sciences research organizations have a tremendous need to acquire health information from real-world sources, referred to as “real-world data,” as part of a United States (U.S.) Food and Drug Administration (FDA) Real-World Evidence Framework initiative.3 With careful planning, the FDA notes that “a non-interventional study has the potential to meet FDA’s regulatory standards for an adequate and well-controlled clinical study”.4 Among sources of real-world data, life sciences organizations seek information about the effectiveness of pharmaceutical compounds when used in typical care conditions—rather than the stringent environment of a clinical research setting—to learn how physicians are utilizing these drugs and which patient groups may experience unexpected benefits or adverse events.5 As a recent example, the drug Blincyto (blinatumomab) received accelerated FDA approval to treat acute lymphoblastic leukemia using a single-arm trial. The experimental group was compared to a historical control group using electronic health records from 694 patients in the European Union and U.S.6 Electronic health records also created the control group for a new FDA-approved indication for Prograf (tacrolimus) to help prevent organ rejection.7 Overall, acquiring and using existing health information allows research organizations to achieve faster, lower-cost research that demonstrates treatment effectiveness within real-world care conditions. Therefore, there is a tremendous need to acquire health information.
Many organizations are uncertain which technologies best support secure health information exchange and monetization. While some companies use traditional database technologies, blockchain technologies have emerged that offer additional capabilities. X. Wang et al.8 point out that blockchain is already used to exchange contracts, capital, and digital assets, so this technology would inevitably be used to exchange data. In addition, buying and selling assets previously required an intermediary, such as a financial institution or marketplace. However, blockchain technologies can simplify the data transaction process by allowing organizations to transfer assets without intermediaries.9 The transfer is completed, validated, and recorded on the blockchain in near-real-time.10
While health information can be collected from many places, such as patient wellness apps, patient repositories, and research studies, most health information used for sharing and sale originates from organizations that deliver or support healthcare.1 In the U.S., these organizations are referred to as covered entities, involving “(a) a health plan, (b) a healthcare clearinghouse or (c) a healthcare provider, who transmits any health information in electronic form in connection with a transaction” (45 CFR 160.103).
The nature of health information that can be shared and sold depends on the degree to which information is considered to constitute “protected health information” and whether the issuing organization is a covered entity. While each country imposes privacy regulations to protect health information, a review of each country's privacy requirements is outside the scope of this article. Therefore, this section addresses only the health information privacy requirements of the U.S. The Health Insurance Portability and Accountability Act (HIPAA) defines protected health information as individually identifiable health information that is transmitted or maintained in electronic media or in any other form or medium (45 CFR 160.103) and is held or transmitted by a covered entity.
Authorized methods of distributing protected health information include:
The HIPAA regulations do not apply to information generated or provided by a patient or healthcare consumer that is not maintained by a covered entity.
There are several methods by which organizations can obtain health information.
Several startups have been formed that compensate patients for sharing their health information. For example, EncrypGen enables individuals to upload their DNA profiles to a marketplace and set a price to sell their profiles.12
Pharmaceutical company Roche AG purchased Flatiron Health, acquiring 260 community cancer clinics to obtain cancer treatment information to support regulatory decisions.13 Roche’s purchase price of $1.9B works out to roughly $1,000 (about $950) per medical record for 2 million oncology patients.14
A covered entity may share health information with a member of its workforce or a business associate for providing professional services, provided that the covered entity represents that the health information includes the minimum necessary to achieve the stated purpose (45 CFR 164.514(d)(3)(iii)). For example, Ascension Health and the Mayo Clinic distribute health information to Google under Business Associate Agreements to design artificial intelligence algorithms to identify opportunities for treatment and revenue.15,16
More than 10,000 deidentified health-related data sets are publicly available on Data.gov (https://www.data.gov/). PubMed (https://pubmed.ncbi.nlm.nih.gov/) allows authors to upload their health data sets with their publications.2 In addition, some universities offer publicly accessible data warehouses for researchers to query and download deidentified data. As of October 2021, the University of Michigan’s Inter-university Consortium for Political and Social Research program offers data sets from over 16,000 studies represented in nearly 100,000 publications (https://www.icpsr.umich.edu/web/pages/ICPSR/).
Data sellers have created a $100B market with companies buying, selling, and trading deidentified information.14 In fact, 14 U.S. health systems started a new company, Truveta, to aggregate and sell their deidentified patient data.17
This article focuses on data marketplaces and blockchain-based technologies’ role in managing pricing, access, and monetization.
Data allow decision-makers to make calculated, insightful, and profitable decisions, adding value beyond data mining alone.18 Health information derived from electronic health record systems creates value for life sciences research due to the complex demographics, health history, and other health-oriented behaviors. Because life sciences organizations seek data to develop new revenue-generating opportunities, they are willing to pay for this health information, thereby ascribing value to the data. This section explores factors for determining data value.
Because a dataset provides value and offers potential financial benefits, a dataset could be considered an asset.19,20 According to the Financial Accounting Standards Board,21 recognized by the U.S. Securities and Exchange Commission, an asset has three characteristics.
Data are classified as intangible assets when they have no physical properties. As shown in Figure 1, examples of intangible assets include patents, trademarks, copyright, and intellectual property.23 Intangible assets are generally not accounted for on an organization’s balance sheet24 but add to the organization’s value in the marketplace. Similar to most intangible assets, the potential data value is challenging to measure.8
Fig. 1. Representative types of intangible assets.
Data value is determined by various factors such as data complexity, number of records in the data set, number of variables, and quality of the data.25 Further, deidentified health records are less valuable because researchers need dates and geocodes to contextualize disease progression and co-morbidities.26 In addition, data valuation is influenced by data perishability, which involves devaluation over time, and time-dependency, a measure of the time since data collection.27 The use of blockchain also offers features that may increase data value.28
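The perishability factor noted above can be illustrated numerically. The following is a minimal sketch only: the sources cited here do not prescribe a decay formula, so the exponential form, the function name `perished_value`, and the `half_life_years` parameter are all assumptions chosen for illustration.

```python
def perished_value(initial_value: float, age_years: float, half_life_years: float) -> float:
    """Return the value remaining after age_years.

    Assumption (not from the cited literature): value halves every
    half_life_years, i.e. simple exponential decay.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    return initial_value * 0.5 ** (age_years / half_life_years)


# Hypothetical data set: worth 1,000 at collection, three-year half-life.
value_after_three_years = perished_value(1000.0, age_years=3.0, half_life_years=3.0)
```

Under these assumptions, a data set valued at $1,000 at collection would be valued at $500 three years later. A different decay shape (for example, a step drop when a newer data set is released) may fit some data types better.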
Organizations use both subjective and objective methods to determine dataset value. The following strategies are not comprehensive but describe the most common methods used to value intangible assets: the cost, income, and market approaches.
With a cost approach, data sets are valued based on the estimated historical costs to create the data set29 or the anticipated costs incurred to replace the data set.30 This approach is aligned with the HIPAA requirement to limit data sales for research to reasonable cost-based fees that cover the cost of preparing and transmitting health information (45 CFR 164.502(a)(5)(ii)(B)(2)(ii)). When using historical cost as the basis, organizations should consider likely inflation and other infrastructure costs necessary for replacing the data set.31 Organizations may benefit from blockchain-based transparency in price histories and formulas.32
The use of blockchain technology for preparing or replacing the data could potentially increase or decrease the replacement costs. Data valuation could increase when the technologies are novel and there is no competition with the methodology.26 However, blockchain technology also introduces data redundancies and audit trails to significantly reduce replacement costs.33 Future valuations using the cost approach should consider emerging economic and technological developments.
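As a rough illustration of the historical-cost basis described above, the sketch below adjusts a historical creation cost for inflation and adds infrastructure costs. The function name, parameters, annual compounding formula, and sample figures are hypothetical; an actual valuation would follow the organization's own accounting policies.

```python
def cost_approach_value(
    historical_cost: float,
    annual_inflation: float,
    years_since_creation: int,
    infrastructure_cost: float = 0.0,
) -> float:
    """Estimate replacement value from historical cost.

    Assumptions for illustration: inflation compounds annually at a
    constant rate, and infrastructure costs are a single lump sum.
    """
    if years_since_creation < 0:
        raise ValueError("years_since_creation must be non-negative")
    inflated_cost = historical_cost * (1 + annual_inflation) ** years_since_creation
    return inflated_cost + infrastructure_cost


# Hypothetical figures: a data set that cost 100,000 to build two years ago,
# 3% annual inflation, and 20,000 in infrastructure needed to replace it.
value = cost_approach_value(100_000, 0.03, 2, infrastructure_cost=20_000)
print(round(value, 2))  # roughly 126090
```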
The income approach considers the estimated increased revenue generated by the data set.31 For example, suppose a pharmaceutical company acquires health data sets as real-world evidence that could support a regulatory decision for a new indication. In this case, the data sets could be valued with the projected new drug revenue stream. Blockchain-based data exchanges facilitate data sharing about data uses and related metadata that can be used to determine trends and future needs.32 However, future revenue forecasts are also influenced by market share and adoption rates, requiring several assumptions that could change.29
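One common way to turn a projected revenue stream into a single present-day figure is to discount each year's projected revenue. The sketch below is illustrative only: discounting is a general finance convention rather than a method described in the sources cited here, and the function name, discount rate, and revenue figures are hypothetical.

```python
from typing import Sequence


def income_approach_value(projected_revenue: Sequence[float], discount_rate: float) -> float:
    """Sum of yearly incremental revenue, each discounted back to today.

    projected_revenue[0] is the revenue expected one year from now.
    Assumption: a single constant discount rate applies to every year.
    """
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -1")
    return sum(
        revenue / (1 + discount_rate) ** year
        for year, revenue in enumerate(projected_revenue, start=1)
    )


# Hypothetical three-year forecast of revenue attributable to the data set.
forecast = [2_000_000.0, 3_000_000.0, 3_000_000.0]
estimated_value = income_approach_value(forecast, discount_rate=0.10)
```

Because the forecast depends on market share and adoption assumptions, rerunning the calculation across a range of forecasts and discount rates gives a sense of how sensitive the valuation is.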
The market approach appraises data based on the price of comparable data traded or sold. Pricing may be set by guidelines posted within the marketplaces or by analyzing similar sales data.34 Blockchain-based audit trails are currently used to track the history of data values,35 allowing for more convenient access to historical sales information.
Similarly, the market may bear higher values for data sets involving more records, identifiable data, and quality.29 Zozus and Bonner35 note that blockchain-based metadata have been used to facilitate evaluations of data quality and other attributes that may influence valuation. These authors describe how data value-level metadata are used to calculate data age and potential discrepancies. Accordingly, blockchain-based data provenance and integrity may increase perceptions of data value,28 resulting in higher market-driven pricing.
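The comparable-sales logic of the market approach can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the use of a median price per record, and the sample sales are hypothetical, and the sketch ignores quality and identifiability adjustments that the market may also apply.

```python
from statistics import median
from typing import List, Tuple


def market_approach_value(
    comparable_sales: List[Tuple[float, int]], record_count: int
) -> float:
    """Value a data set from comparable sales.

    comparable_sales holds (total_price, records_sold) pairs for similar
    data sets. Assumption: price scales linearly with record count.
    """
    if not comparable_sales:
        raise ValueError("at least one comparable sale is required")
    price_per_record = [price / records for price, records in comparable_sales]
    return median(price_per_record) * record_count


# Hypothetical sales history, e.g. as read from a marketplace audit trail.
comparables = [(1000.0, 100), (3000.0, 100), (2000.0, 100)]
estimate = market_approach_value(comparables, record_count=50)
```

The median is used here so that one unusually high or low sale does not dominate the estimate; a mean or a regression on data set attributes would be reasonable alternatives.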
The market approach can also be used to consider supply and demand. Specifically, data buyers consider the availability of other data sets within the market to determine the relative value of any particular data set.9 Concepts of supply and demand also drive pricing of illegal data sales on the black market or dark web.36 Stack36 notes that social security numbers sell for around $1 on the dark web, and credit cards sell for $5-110 (with the median ranging from $25-40).37 Medical records sell for $1-1,000, depending on information completeness.38,39 Specifically, electronic health records are more valuable for illicit sales when they include e-commerce transactions and credit card information.40 Data are typically sold using blockchain technology on the dark web in exchange for cryptocurrency, and data are also held for ransom payable in cryptocurrency, because of the pseudonymous nature of transactions.40
These data valuation methods can only provide estimates, and organizations are encouraged to use multiple data valuation approaches.29 Birch et al.41 encourage organizations to use advanced technological solutions for these approaches because data access can be tracked and measured more efficiently. As described in the next section, blockchain-based systems are often used to track and record user engagement with datasets, allowing for better value measurements. However, because data values are in constant flux, data valuations should be reviewed and recorded at least annually to ensure accuracy.29
Marketplaces provide digital platforms for buyers and sellers to exchange data. As shown in Figure 2, the volume of data bought and sold on data marketplaces—of all types—is expected to increase 25% from 2020 to 2022.42
Fig. 2. Percent of large organizations as sellers or buyers of data via online data marketplaces.
There are three primary types of blockchain-based marketplaces: private, consortium, and independent data marketplaces. Platform architecture is either centralized or decentralized.43 In this section, the benefits and drawbacks of each type and architecture are described.
Private marketplaces are controlled by a single vendor or organization that controls access and governance.44 A private marketplace owner can use the marketplace to its advantage—such as having access and insight to all data exchanges, and the owner can both sell data and charge subscription fees.34 However, private marketplaces can introduce bias on the platform,44 and the platform operator is responsible for protecting the data and ensuring that all data privacy laws are met.23
Consortium marketplaces have a group of owners that cooperate to support platform operations and decision-making.44 Consortia benefit from sharing costs and resources to maintain the reliability of the network.45 These organizations often utilize pooled data to create larger health data sets and collaborate on research. A drawback pertains to the trust required of the other collaborators to protect and manage data appropriately,46 as well as questions about ownership.23
Independent health marketplaces allow individual patients/consumers to provide and sell their own data on a platform, where the buyers, sellers, and marketplaces are independent entities.44 Each entity has independent control—and often independent monetization—over their data and how data are used. As a downside, it is difficult to gain enough market size to individually source and sell data without the buying power of a consortium or large private organization.44
The infrastructure and architecture of a platform refer to locations of data storage, access, and technical governance. A centralized blockchain marketplace collects and stores data where a single (or few) organization(s) provide access controls, regular maintenance, and oversight.47 As an example for health data, centralized marketplaces can negotiate health data purchases directly from healthcare organizations and payers, as well as negotiate large-scale sales to life sciences organizations. Centralized marketplaces can offer additional layers of encryption and smart contracts.47 However, there are potential vulnerabilities for hacking and less operational transparency.48
A benefit of a decentralized marketplace is data quality.47 Since data are secured by smart contracts and accessed directly from the data provider, higher quality can be assumed. A drawback of decentralized marketplaces is that they make transactions more difficult.49 Each transaction must be facilitated through a distributed ledger. An example of a decentralized marketplace is where deidentified data are aggregated using software, but data never leave the source location, and each organization cannot access data from others.
A decentralized location allows the data to stay with the data provider.47 Blockchain-based data exchange platforms exist today as decentralized ecosystems that enable individuals or organizations to source and share data.47 Rather than centralized management where there are potential points of vulnerability,48 blockchain-based data exchanges allow for distributed data stewardship and communication.50
Blockchain technologies offer data management methods for sales and transactions in ways that traditional databases typically cannot provide. Among examples, blockchain can support transaction visibility to ensure the data exchange and payment process is fair and consistent with payment terms.51 Further, because malicious data buyers or sellers may refuse to pay for data,48 smart contracts automate payments and revenue distributions.52 This capability also ensures efficiencies for data exchanges and resource allocations.53 Further, smart contracts ensure that only authorized individuals can contribute or access specific data in a very granular manner.54
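The payment and access logic that a smart contract automates can be sketched in ordinary code. The example below is a plain-Python simulation, not code for any particular blockchain platform: the class name `DataSaleEscrow`, its fields, and the revenue split are hypothetical and only illustrate the idea that access is granted if and only if the agreed payment is made and distributed.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataSaleEscrow:
    """Simulated contract: pay the agreed price, get access, split revenue."""

    price: float
    revenue_shares: Dict[str, float]  # recipient -> fraction of each payment
    payouts: Dict[str, float] = field(default_factory=dict)
    authorized_buyers: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if abs(sum(self.revenue_shares.values()) - 1.0) > 1e-9:
            raise ValueError("revenue shares must sum to 1")

    def purchase(self, buyer: str, payment: float) -> None:
        """Reject any payment other than the agreed price; otherwise settle."""
        if payment != self.price:
            raise ValueError("payment does not match the agreed price")
        for recipient, share in self.revenue_shares.items():
            self.payouts[recipient] = self.payouts.get(recipient, 0.0) + payment * share
        self.authorized_buyers.add(buyer)

    def can_access(self, user: str) -> bool:
        """Only buyers who have paid are authorized to access the data."""
        return user in self.authorized_buyers


# Hypothetical sale: 75% to the data provider, 25% to the patient.
escrow = DataSaleEscrow(price=100.0, revenue_shares={"provider": 0.75, "patient": 0.25})
escrow.purchase("research_lab", 100.0)
```

On an actual blockchain, the same rules would be enforced by the network rather than by a single program, which is what removes the need for the buyer and seller to trust each other.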
Blockchain not only offers a new technology to manage data sharing and tracking but facilitates new economic models. Considering that blockchain first received wide recognition for the transparent exchange of cryptocurrency, the same reasoning is applied to the exchange of other assets. As Lee55 notes, blockchain-based marketplaces offer trusted data while transparently tracking both transactions and payments.
While many blockchain-based data sales and exchange platforms are still early in development, several have achieved stable platforms and market awareness. Several companies also attempt to utilize blockchain-based decentralization with incentive schemes to reward both providers and patients for participation.
Three primary players are involved in blockchain-based data marketplaces: providers, buyers, and digital platform owners.47 Providers list their data in exchange for monetary value, buyers purchase data to add value to their organizations, and marketplace owners/controllers provide a place where data can be stored and sold.47 While open data marketplaces are available to download/exchange data at no charge,47 the authors focus on profit-generating marketplaces. Participation models for profit-generating data marketplaces include business-to-business (B2B), consumer-to-consumer (C2C), business-to-consumer (B2C), business-to-business-to-consumer (B2B2C), and business-to-consumer-to-business (B2C2B).56 The following companies offer blockchain-based data marketplaces for data exchange or monetization of health information. This list is intended to be representative but not necessarily comprehensive.
BurstIQ, Inc. (https://www.burstiq.com/): Colorado-based B2B2C technology company, BurstIQ, promotes itself as the first blockchain-based data management platform to process large volumes of health information on chain while meeting health information and regulatory privacy requirements.57 The on-chain capabilities allow organizations to connect patients’ longitudinal, multidimensional health profiles, called LifeGraphs®, using artificial intelligence and machine learning.58 BurstIQ also developed health marketplaces where patients could loan, sell, and license their health information based on automated matchmaking. BurstIQ expanded the collaboration space for research and development, now called “The Foundry,” to share data and leverage crowd intelligence.59 The use of blockchain provides data governance, granular consent capabilities, data provenance, and data security.
Ciitizen, located in Palo Alto, California (https://www.ciitizen.com) and recently acquired by Invitae,60 is designed as a B2C2B personal health record platform where patients can aggregate health information from all healthcare providers and share information for research. While the website does not specify the use of blockchain, Ciitizen’s blockchain technologies have been listed among blockchain-based health platforms (e.g.,61). The platform is free for patients, but researchers pay a fee to access health records when patients agree to share their health information for research. Of particular note, Ciitizen shares these fees with the individuals who agree to provide their health information for research.62 Among the Research FAQs, the website specifies, “Should a patient’s information be included in a study, Ciitizen is committed to returning a portion of the value gained from this study with users to the extent permitted by law (for example, in the form of direct payment, services, discounts, donations, or other value) or to donate this value to an advocacy or research non-profit as directed by the patient”.62
Operating in France and Russia, Datapace.io offers a B2C2B blockchain-based data marketplace for IoT sensor data (https://datapace.io/) but can be used to buy or sell any data.63 The marketplace uses Hyperledger Fabric to build the platform and modules, using a high-performance practical Byzantine Fault Tolerance consensus mechanism. Individuals receive tokens native to the platform for contributing data and managing the Proof of Stake consensus mechanism.63
Datum platform (https://datum.org) was founded in 2017 and is headquartered in Zug, Switzerland. The B2C2B Datum network allows anyone (outside the U.S., China, or South Korea) to own and manage their data using the DAT smart token for buying and selling data.64 While data agnostic, the Datum network founders specify that the blockchain platform is intended for buying and selling individuals’ health information for research. The platform is compared to a decentralized version of the Apple HealthKit that respects data owners’ terms and conditions for data usage.64 The Datum network enhances data research capabilities by capturing and linking data from IoT devices, specifying that the network could capture information from digital health devices and provide research data to scientific or medical institutions.64
Founded in 2015, Dawex (https://www.dawex.com) markets itself as an advanced blockchain-based B2B data-exchange environment where organizations can share and commercialize data. The company notes that the blockchain platform provides data transaction security and traceability.65 As of 2020, Fernandez et al.66 commented that Dawex created a successful sharing platform but had yet to determine how to address data integration and pricing. Specifically, buyers were required to offer a price without being permitted to evaluate the value of the dataset.66
Embleema (https://www.embleema.com/), based in Metuchen, NJ, is designed to collect electronic health records and share them as real-world data for research. Embleema uses a private HIPAA- and GDPR-compliant blockchain to manage granular patient consent and securely store patient information.67 The B2C2B blockchain is also designed for transparency of recruitment opportunities, study progress, and results. Patient participants log in with blockchain-based public and private key pairs instead of user names and passwords.68 When individuals share their health information or participate in virtual studies such as surveys, participants receive points that can be exchanged for unspecified “rewards”.69 However, the Patient Advocacy page specifies that users receive compensation for the uses of their data.68
EncrypGen (https://encrypgen.com/), located in Miami, FL, is a B2C2B blockchain-based DNA marketplace that allows individuals to provide their genome in return for “$DNA,” a utility token that can be used to buy and sell DNA profiles. The EncrypGen platform facilitates storing, sharing, searching, buying, and selling user-provided profiles.70 The blockchain “gene-chain” backbone is used to manage the privacy and security of genetic data as well as facilitate the data-exchange process. EncrypGen makes data available to third parties, such as research scientists, willing to pay for genetic information.50
Enigma (https://www.enigma.co), a San Francisco and Tel Aviv-based B2B and B2C2B company, offers an open-source blockchain protocol for data sharing. The main-net blockchain, the Secret Network, allows decentralized applications to perform computations on encrypted data. Because of the persistent encryption, the data remain private to the nodes—even on a public blockchain.71 Enigma is designed to be blockchain agnostic and data agnostic, but the company promotes its ability to facilitate research on health information. Further, the platform offers a data marketplace that allows data monetization.71
Hu-manity.co (https://hu-manity.co/) was founded in 2018 and is based in Sparta, NJ. In this B2C2B platform, individuals can upload all, part, or none of their healthcare records to the data marketplace and specify how their healthcare data can be accessed and used.72 Hu-manity.co allows patients to specify how their health information could be used with the prospect that pharmaceutical companies would pay each user for access to their data.73 Of interest, the company allows individuals to “claim title” to their data and recognize their health information as “personal property”.72 The IBM blockchain is used to allow granular consent of data and securely track the uses of data. The website notes that the company has not yet received a critical mass of people using the app, but once enough data are available, data participants will receive utility tokens—the “Hu” token—that could be exchanged for internal offers and incentives with the plan for offering fiat currency in the future.72
LunaDNA (https://www.lunadna.com/), a San Diego-based B2C2B company, was formed in 2017 as a member-owned platform to help individuals manage the scientific and monetary value of their DNA. Individuals are encouraged to upload genetic files with the option of completing additional surveys and adding electronic health records to receive shares in the company.74 A portion of LunaDNA’s proceeds from research is shared with members as (fiat) dividends per the company’s filing with the U.S. Securities and Exchange Commission.74
In May 2021, the Finnish company Nokia launched a B2B blockchain-based data marketplace for sharing and monetizing data (https://www.nokia.com/networks/services/nokia-data-marketplace/). While this data marketplace is not designed exclusively for health data, Nokia specified that health data are a use case for federated learning and monetization within its marketplace.75 Nokia specifies that a private, permissioned blockchain creates trusted and secure data transactions with transaction automation and federated intelligence.75 The nature of monetization to users (i.e., fiat currency or cryptocurrency) is not specified.
PhrOS (https://phros.io/services/health_data_market) was founded in 2016 in Taipei, Taiwan as a B2C2B data-exchange network with a dedicated health data market. Patients can sign consent to share healthcare data with the network to be used for research. Patients and researchers exchange “health points” for health-related data.76 A blockchain is used to manage the fine-grained consent options to use patients’ health information, create a token-based exchange network, and offer wallet services to control users’ keys and tokens.76 The website specifies that it can automatically gather and update participants’ health record data and push alerts to healthcare providers or hospitals if patients need immediate care.
Blockchain technologies have spawned innovation for sophisticated methods for managing data. Because cryptocurrency is a digital asset represented on blockchains, the same approach has been applied to representing physical and digital assets for proof of ownership.77 Referred to as “tokenization,” classes of blockchain-based tokens are divided into two categories of fungible tokens and non-fungible tokens (NFTs). Fungible tokens are designed to be divided into fractions where each fraction is equal to others in value, allowing them to be interchangeable.78 In contrast, NFTs are unique assets that cannot be divided and are not interchangeable, such as a photo or physical object.78 The tokenization of digital assets has created new investment opportunities and new methods of establishing asset ownership.77
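The distinction between the two token classes can be made concrete with a small data-structure sketch. This is a plain-Python illustration rather than an implementation of any token standard; the class names, method names, and identifiers such as `dataset-001` are hypothetical.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict


@dataclass
class FungibleLedger:
    """Interchangeable, divisible units: only the amount held matters."""

    balances: Dict[str, Fraction] = field(default_factory=dict)

    def mint(self, owner: str, amount: Fraction) -> None:
        self.balances[owner] = self.balances.get(owner, Fraction(0)) + amount

    def transfer(self, sender: str, recipient: str, amount: Fraction) -> None:
        if amount <= 0 or self.balances.get(sender, Fraction(0)) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.mint(recipient, amount)


@dataclass
class NonFungibleRegistry:
    """Unique, indivisible tokens: each token ID has exactly one owner."""

    owners: Dict[str, str] = field(default_factory=dict)

    def mint(self, token_id: str, owner: str) -> None:
        if token_id in self.owners:
            raise ValueError("token ID already registered")
        self.owners[token_id] = owner

    def transfer(self, token_id: str, sender: str, recipient: str) -> None:
        if self.owners.get(token_id) != sender:
            raise ValueError("sender does not own this token")
        self.owners[token_id] = recipient


coins = FungibleLedger()
coins.mint("alice", Fraction(1))
coins.transfer("alice", "bob", Fraction(1, 4))  # a fraction of a unit can move

records = NonFungibleRegistry()
records.mint("dataset-001", "alice")
records.transfer("dataset-001", "alice", "bob")  # the whole token moves, never a part
```

The fungible ledger tracks only how much each party holds, whereas the registry tracks which specific token each party owns, which is why the latter suits proof of ownership for a unique asset.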
Health records are also now being classified as NFTs as unique assets with original value.79 Hapiffah et al.80 created a proof of concept for a medical record system where patients’ medical record data are registered as NFTs to establish proof of evidence and ownership. Sandner et al.79 recognize that datasets can be classified as NFTs for blockchain-based token exchanges where a data set’s value is identified with the value of the NFT. The blockchain also then provides transparency and auditability to ensure honest data transactions.79
Complications impacting data valuation and sale can be economic, social, or ethical. If blockchain organizations wish to engage in data valuation and sale, these values drive considerations of monetization and privacy.1
While healthcare or life sciences organizations may benefit from the sale and use of patients’ health information, will any of that money be given to the patients represented in the data sets? Klugman81 argues that “it is only ‘just’ that [patients] benefit from the sale” (approx. p. 2). He adds that if the healthcare organizations are acting in the best interests of patients, then they should share data profit with the patients. Tlacuilo Fuentes82 notes that patients regularly receive benefits from retail organizations in exchange for uses of their data, such as discounts or free use of apps/services. Klugman81 adds that it is well within a healthcare organization’s authority to offer additional services or reductions to copays and deductibles when healthcare organizations profit from patient health information.
While it is an admirable goal to provide compensation to patients who knowingly or unknowingly provide their information in a data marketplace, the marketplaces must determine methods of allocating compensation to these individuals. There are various rewards programs granted for the use of data; however, this section will focus exclusively on sharing payments for individuals represented in health data sets purchased and used for research.
Data are often accessed during queries where researchers may simply be attempting to determine study feasibility or compare and contrast data sets.51 In this case, the researcher needs to access an individual’s data to determine whether the data are sufficient without committing to a data set. Should individuals be compensated when data sets are merely sampled?
Biocuration may only be a tiny part of research and development where a profitable product may not result for many years, if at all.83 Many research studies do not have initial funding, much less profit.26 How should the original data subjects be recognized if a patient provides an early and relatively insignificant contribution to a later project?
This challenge reflects the many parties that contribute to the healthcare data chain.51 These parties include the healthcare provider who enters the data, the healthcare organization that stores and maintains the electronic health record system or data warehouse, the data broker, and even the data marketplace. Because each party provides data or infrastructure to support data, how should values be divided among these parties?
This challenge concerns dividing the value among the individual patients represented in a data set where the data of different people contribute to the data set.51 Should patients with more healthcare visits, therefore contributing more data, be compensated more than patients with fewer healthcare visits? Or should patients with higher-quality health data receive more compensation than those with lower-quality data?
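One way to make the allocation question concrete is a pro-rata split, sketched below under stated assumptions. The function name `allocate_pro_rata`, the use of visit counts as weights, and the largest-remainder rounding rule are all illustrative choices, not a method proposed in the cited literature. Passing equal weights yields a flat split instead, so the same sketch can compare both policies.

```python
from fractions import Fraction


def allocate_pro_rata(pool_cents: int, weights: dict) -> dict:
    """Split a payment pool in proportion to integer weights (hypothetical policy).

    Uses the largest-remainder method so the shares always sum to the pool.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Exact fractional share for each patient, in cents.
    exact = {pid: Fraction(pool_cents * w, total) for pid, w in weights.items()}
    # Round every share down to whole cents first.
    shares = {pid: int(value) for pid, value in exact.items()}
    leftover = pool_cents - sum(shares.values())
    # Hand the remaining cents to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda pid: exact[pid] - shares[pid], reverse=True)
    for pid in by_remainder[:leftover]:
        shares[pid] += 1
    return shares


# Hypothetical example: weights are numbers of healthcare visits per patient.
visits = {"patient_a": 6, "patient_b": 3, "patient_c": 1}
by_visits = allocate_pro_rata(1000, visits)               # proportional to visits
flat = allocate_pro_rata(1000, {pid: 1 for pid in visits})  # equal weights
```

A weighting by data quality would only change the `weights` argument, which shows that the open question is the policy behind the weights rather than the arithmetic.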
Because there are many costs for storing and curating data, should the profits be distributed in the same proportion as the costs?26 Should the costs for resources and capital investments first be declared and quantified before determining how best to distribute the profits? A form of cost-sharing split is to grant free services in exchange for selling the data. For example, PicnicHealth provides a free personal health record app to patients/consumers if these individuals allow their health information to be sold for future research.84 Otherwise, patients/consumers must pay $299 for processing the past medical records and $39 per month.
When individuals are compensated for participating in a data marketplace, it is also necessary to consider the financial ramifications of withdrawal. Should a company allow an individual to remove their data before costs are recovered if the transaction was in exchange for free genetic sequencing?50 LunaDNA, a blockchain-based DNA marketplace, allows individuals to withdraw consent for subsequent use of their data, but the individual will lose all ownership shares previously granted.85 This arrangement could coerce individuals to allow their data to be used instead of missing out on potential financial benefits.
While some blockchain designers have proposed cryptocurrency-based payments based on decentralized anonymous research networks,86 it may be impractical for research payments in the U.S. to remain anonymous. Payments for participating in research are considered taxable income, and Internal Revenue Service (IRS) Form 1099 must be issued to the participant if payments equal or exceed $600 in a calendar year.87 It is unlikely that monetization payments could reach that amount, but organizations would have to track the identities of the individuals to comply with IRS regulations.88
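The identity-tracking requirement can be sketched as a running total per participant and calendar year. The `PaymentTracker` class and its method names are hypothetical, and the only rule encoded is the $600 calendar-year threshold stated above; this is an illustration of why identities must be retained, not tax guidance.

```python
from collections import defaultdict
from datetime import date

# Threshold taken from the text: payments that equal or exceed $600 in a calendar year.
REPORTING_THRESHOLD_CENTS = 600_00


class PaymentTracker:
    """Hypothetical tracker of research payments per identified participant."""

    def __init__(self) -> None:
        # (participant_id, calendar_year) -> total cents paid
        self._totals = defaultdict(int)

    def record(self, participant_id: str, paid_on: date, cents: int) -> None:
        self._totals[(participant_id, paid_on.year)] += cents

    def total(self, participant_id: str, year: int) -> int:
        return self._totals.get((participant_id, year), 0)

    def needs_form_1099(self, participant_id: str, year: int) -> bool:
        # Totals are kept per identified person, which is why fully
        # anonymous payment networks are hard to reconcile with reporting.
        return self.total(participant_id, year) >= REPORTING_THRESHOLD_CENTS
```

Because the totals are keyed by participant, the sketch cannot work without a stable identifier for each payee.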
Even if patient compensation were feasible, Hank Greely, Director of Stanford University’s Center for Law and the Biosciences,14 wrote, “as to compensation, figuring out a royalty kind of system seems very hard to me because of the difficulty of assigning cause/contribution to any particular person’s data … and any flat compensation would likely not be very much” (approx. p. 2). When the monetization to individuals is very small, the administrative costs would likely exceed the financial benefit to consumers. Even when using blockchain-based automation, there are costs for data transfers, creating a business model that would be difficult to sustain.
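The sustainability concern reduces to simple arithmetic, sketched here with entirely hypothetical figures. The sale price, the number of individuals, and the per-payment cost below are invented for illustration and do not come from the cited sources.

```python
def net_payout_per_person(sale_cents: int, n_people: int, cost_per_payment_cents: int) -> float:
    """Flat per-person share of a data sale minus the cost of issuing one payment."""
    if n_people <= 0:
        raise ValueError("n_people must be positive")
    return sale_cents / n_people - cost_per_payment_cents


def max_payable_people(sale_cents: int, cost_per_payment_cents: int) -> int:
    """Largest group size for which a flat share still covers the payment cost."""
    return sale_cents // cost_per_payment_cents


# Hypothetical scenario: a $50,000 data sale covering 100,000 individuals,
# with an assumed cost of $0.75 to issue each payment.
net = net_payout_per_person(5_000_000, 100_000, 75)
limit = max_payable_people(5_000_000, 75)
```

Under these assumed figures each person's gross share is 50 cents, which is less than the 75 cents it costs to pay them, so the net value is negative for any group larger than `limit`.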
Blockchain-based technologies can offer new methods to protect the privacy of patient-level information.
Zero-knowledge proofs are cryptographic strategies, often used with blockchains, that allow one party to prove that some statement is true to another party without revealing anything but the truth of the statement.52,89 This technology is particularly effective for performing quality assurance without needing to access patient-level information.
Homomorphic encryption comprises encryption methods that allow calculations to be performed on encrypted data without exposing the raw data. When decrypted, the calculated output is the same as if the operations had been performed on the unencrypted data.90 While promising, homomorphic encryption has not yet achieved widespread adoption.
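As a minimal illustration of the idea, the sketch below implements the Schnorr identification protocol, an interactive proof that the prover knows a secret exponent without disclosing it (zero-knowledge against an honest verifier). The protocol choice, the function names, and the toy parameters `P`, `Q`, and `G` are assumptions for illustration; the parameters are far too small for real use, and the cited sources do not prescribe this protocol.

```python
import secrets

# Toy public parameters (insecure, illustration only):
# G = 2 has prime order Q = 11 modulo the prime P = 23.
P, Q, G = 23, 11, 2


def keygen():
    """Prover's secret x and public value y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1  # secret in [1, Q-1]
    return x, pow(G, x, P)


def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x: int, r: int, c: int) -> int:
    """Prover answers the verifier's challenge c with s = r + c*x mod Q."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier checks G^s == t * y^c mod P; the secret x is never sent."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keygen()                    # prover side
r, t = commit()                    # prover -> verifier: t
c = secrets.randbelow(Q - 1) + 1   # verifier -> prover: challenge
s = respond(x, r, c)               # prover -> verifier: s
accepted = verify(y, t, c, s)
```

The verifier sees only `y`, `t`, `c`, and `s`, yet the check succeeds only when the response was computed with the secret that matches `y`.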
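The property described here can be demonstrated with a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. Paillier is chosen here only as a well-known example; the function names and the small primes are assumptions for illustration and provide no real security. The code relies on `pow(x, -1, n)` for modular inverses, which requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, where L(u) = (u - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiply ciphertexts to add the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Hypothetical use: two sites each encrypt a patient count, and an
# aggregator sums the counts without decrypting either one.
pub, priv = keygen()
site_a = encrypt(pub, 120)
site_b = encrypt(pub, 135)
total = decrypt(pub, priv, add_encrypted(pub, site_a, site_b))
```

Only the holder of the private key can read the total, and the aggregator never sees either site's individual count.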
Federated learning systems share machine learning algorithms or edge training plans without sharing the raw data.91 Organizations can bring analytic tools to the data while protecting individuals’ privacy.92
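A minimal sketch of federated averaging shows how model parameters, rather than raw records, move between organizations. The two-site setup, the one-variable linear model, and all function names and hyperparameters are assumptions for illustration; production systems add secure aggregation, authentication, and far richer models.

```python
def local_update(weights, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private data.

    Only the updated (slope, intercept) pair and the sample count leave the site.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b), len(data)


def federated_average(updates):
    """Combine site models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)


def train(sites, rounds=2000):
    """Coordinator loop: broadcast the global model, collect and average updates."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates)
    return global_weights


# Hypothetical private data held by two organizations; both follow y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
site_b = [(4.0, 9.0), (5.0, 11.0)]
slope, intercept = train([site_a, site_b])
```

The coordinator in `train` never reads `site_a` or `site_b` directly; it handles only the parameter pairs returned by `local_update`.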
Some blockchain technologies allow the creation of synthetic data that mask individually identifiable data within a data set.93
Even when using blockchains to manage health data, no technology is impervious to vulnerabilities. Deidentification strategies using encryption may be vulnerable to future computing advances.92 Further, blockchains and smart contracts have been hacked or breached50—even the sizeable public blockchain networks should not be described as entirely immutable.94 Therefore, organizations should remain cautious about data protection for data sharing and sales because there could be considerable unintended consequences for the patients represented in the data sets.
While health data acquisition is often thought to be relatively straightforward, especially when using a data service or marketplace, there are often misconceptions about health data quality. The reality is that health information is not designed for research purposes and can be notoriously inaccurate and incomplete.95 Vezyridis and Timmons26 reported that within the National Health Service electronic health record systems, the billing codes used to record the same disease could vary widely between healthcare practices. Additionally, the massive volume of electronic health data sets affects quality because it is challenging to implement data standards and ranges.25
Health data inaccuracies may require researchers to spend valuable time identifying and eliminating health information that may be of poor quality.83 Worse, researchers may inadvertently use inaccurate health information in research that is not reproducible or may lead to spending limited research funds on projects that are ultimately dead ends.92 Last, the use of deidentified data sets further complicates data quality because a researcher cannot examine the original sources to confirm or correct data values.92 While some may encourage the use of blockchain to address health data accuracy,96 the use of blockchain for inaccurate health information exemplifies the “zero state problem” described by LaPointe and Fishbane.97 The authors note that organizations have not achieved “trusted data” by adding inaccurate information to a blockchain.
Mandl and Perakslis92 point out that it is sadly ironic that patients and healthcare organizations are often unable to obtain health information necessary for treatment, but these same patients may be included in massive data sets that are shared and sold without appropriate monitoring or oversight. For many health data marketplaces, neither patients nor healthcare organizations are given visibility or control of health information sold in data marketplaces.
This article describes how blockchain technologies are increasingly used to manage the transparency and control of health data in data marketplaces. This technology can advance individual patients’ control over their information, monitor access to their data, and control permissions,32 including the ability to revoke permissions.82
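The combination of permission control, revocation, and access monitoring can be sketched as an append-only, hash-linked log. The `ConsentLedger` class, its field names, and the single-node design are assumptions for illustration; they do not describe any specific blockchain platform, which would additionally distribute and validate the log across participants.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    """Deterministic SHA-256 hash of one log entry."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ConsentLedger:
    """Hypothetical append-only log of consent grants, revocations, and accesses."""

    def __init__(self) -> None:
        self.entries = []

    def _append(self, action, patient, grantee, scope, **extra) -> None:
        # Each entry stores the hash of the previous one, so edits are detectable.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"action": action, "patient": patient, "grantee": grantee,
                "scope": scope, "prev": prev, **extra}
        self.entries.append({**body, "hash": _digest(body)})

    def grant(self, patient, grantee, scope) -> None:
        self._append("grant", patient, grantee, scope)

    def revoke(self, patient, grantee, scope) -> None:
        self._append("revoke", patient, grantee, scope)

    def is_permitted(self, patient, grantee, scope) -> bool:
        # The most recent grant or revoke for this triple decides the outcome.
        permitted = False
        for e in self.entries:
            if (e["patient"], e["grantee"], e["scope"]) != (patient, grantee, scope):
                continue
            if e["action"] == "grant":
                permitted = True
            elif e["action"] == "revoke":
                permitted = False
        return permitted

    def request_access(self, patient, grantee, scope) -> bool:
        # Every attempt is logged, allowed or not, so patients can monitor access.
        allowed = self.is_permitted(patient, grantee, scope)
        self._append("access", patient, grantee, scope, allowed=allowed)
        return allowed

    def verify_chain(self) -> bool:
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In this sketch a patient could call `grant` for a research scope, later call `revoke`, and inspect `entries` to see every access attempt made in between.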
As there is a growing interest in managing health data sets as assets, blockchain technologies can improve data valuation and asset management.8 However, there are no uniform approaches to valuation26 or assetization of health information.25 Thus far, most research conducted about data value has focused on the factors that can influence perceived data value.25 However, there is a need to draft effective algorithms that consider data valuation methods, industry sectors, and data. Further, it is unclear how Fair Market Value limitations required for some data sets may influence or inform these algorithms.
Additionally, the scope of research on data marketplaces is not well developed,23 and the research on blockchain-based data valuation and sales is scant. These areas would benefit from additional study to advance concrete methods of data valuation and responsible development of blockchain-based data marketplaces.
Blockchain-based data marketplaces are emerging to provide data management and automate monetization practices. However, minimal research has been conducted involving heavily regulated data, such as health information or data intended for submission to the FDA. There is a great need for determining appropriate blockchain platforms and best practices for data management to create sustainable marketplaces for this emerging area.
While the legal, ethical, and regulatory considerations of health data ownership are outside the scope of this paper, additional research about data ownership may advance understanding about authority to control and value health information.81
Similarly, there is an additional need for understanding whether blockchain-based NFTs can establish datasets as assets and whether NFTs can enhance concepts of data ownership, data control, and value. Specifically, could NFTs demonstrate the value of intangible data assets and/or exclusivity of these data assets?
The sale and exchange of health information are growing significantly more extensive and diverse as data are collected from electronic health record systems and as patient-generated data arrive from wearables and wellness apps. The need for sales and exchange is bolstered by the need for real-world data within life sciences organizations. The combination of health data volumes and needs creates a new data economy82 and opportunities to better assess data value for these economic opportunities.
Health data assets have been managed by complex and outdated methods where patients do not have awareness or control of the uses of their health information. However, as patients are increasingly empowered with greater electronic access to their health information—and privacy regulations have enabled individuals to have more information and control over data uses—blockchain-based data management systems will serve as an enabling technology. This technology allows patients opportunities for granular consent and greater visibility into the uses of their health information. While mechanisms of data monetization and assetization are still being developed, blockchain technology is a critical tool for maximizing the potential for the emerging health data economy.
Both authors contributed to the conception, design, writing, and editing of this article.
The authors gratefully acknowledge the thoughtful review, editing, and graphic support provided by Leanne Johnson and Hayley Miller.