The health care data economy is broken. Where are the banks?

Many of us have more firsthand experience navigating the health care system than we might want; few of us have had the opportunity to see it from the inside and understand the labyrinthine way it operates. To those in tech, most of the systems and processes in health care would qualify as a "kludge": an ill-assorted collection of parts cobbled together to function, more or less, towards a purpose. If that word is unfamiliar, picture a Rube Goldberg machine, trying to achieve one thing (improved health outcomes) through an overly complicated chain of indirect actions. Nothing in this system was created from a patient-centric, design-oriented perspective.

For my part, not only have I been inside the system as a cancer patient, but I've also worked as a health policy wonk in the halls of Washington and Westminster, as a health economist for pharmaceutical companies in Basel, Buenos Aires, and Burlingame, and as a consultant helping commercialize cancer drugs and diagnostics across the US. I study Evidence-Based Health Care with one of the top health data and statistics research units in the world at the University of Oxford. Along the way, I've seen a LOT of data. I've also seen how this data is generated, gathered, aggregated, analyzed, shared, and resold. My reactions have gone from curiosity, to surprise, to anger, and now to advocacy. The health system at large has failed patients in many ways, and here is another: patients are being systematically exploited when it comes to their data.

Data has often been referred to as the "new oil," a valuable resource driving today's economy. Just as money is a medium of exchange that allows us to buy and sell goods and services, data is a medium that allows us to make informed decisions, optimize processes, and gain insights. Just like money, data has value and can be traded, sold, or shared. Unlike money, however, there is no global infrastructure to govern, manage, and oversee data transactions, and no collective institutions to protect data security or integrity.

When it comes to health care data, we have a Wild West scenario governed by permissive regulations that do more to enable data sharing and transfer among entrenched parties than to protect the privacy and security interests of patients. If we were dealing with money, a structure that let "data monopolies" operate and profit unfettered while "data laborers" received no compensation for their contributions would be unacceptable, antiquated, and horrifyingly exploitative. Let's take one example from today's health care data economy and ask whether we would accept it under the standards we apply to modern finance, labor rights, and banking security.

The health care landscape is dotted with companies that have made a significant business out of data aggregation and resale. Flatiron, for example, sold to Roche for $1.9B in 2018, just as this model was beginning to take off. To some extent, many big names in health care have data sales as a component of their business models: McKesson, Cerner, Epic, and UnitedHealth/Optum, as well as more niche companies like Komodo, Symphony Health, and IQVIA. These data companies often keep their methods and sources of aggregation somewhat hidden. Each touts that it captures data on a large portion of the American population through EHRs or insurance claims datasets, without acknowledging that many of these datasets cover overlapping slices of the same populations, are riddled with gaps, and offer no way to verify the provenance, accuracy, or quality of the data they contain.

Some companies in this space, such as HealthVerity and Datavant, go beyond offering large datasets: they offer to "tokenize" datasets so that records can be linked together without revealing patient identities, giving data purchasers more complete profiles of patients who sought care across numerous hospitals or doctors. These techniques may meet the letter of the law under regulations such as HIPAA, but they fall far short of ethical standards for data privacy and of what patients expect when they visit a doctor or have lab work done. Would consumers feel comfortable if third parties could string together all of their financial and spending information across different credit cards, bank accounts, and brokerage accounts without their knowledge, and then profit from it? Health data is even more private, and its misuse is far riskier.
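To make the mechanics concrete, here is a minimal sketch, in Python, of the general technique behind tokenization: identifying fields are normalized and run through a salted, deterministic hash, so the same person produces the same opaque token in every participating dataset. The vendors' actual algorithms are proprietary and more sophisticated; the salt and field names below are invented purely for illustration.

```python
import hashlib

SITE_SALT = "shared-secret-salt"  # hypothetical salt shared across participating datasets

def tokenize(first_name: str, last_name: str, dob: str) -> str:
    """Derive an opaque, deterministic token from identifying fields.

    The same person yields the same token in any dataset tokenized with
    the same salt, so records can later be joined without exchanging names.
    """
    normalized = f"{first_name.strip().lower()}|{last_name.strip().lower()}|{dob}"
    return hashlib.sha256((SITE_SALT + normalized).encode("utf-8")).hexdigest()

# Two different data holders tokenize their own records independently...
token_at_lab = tokenize("Jane", "Doe", "1970-01-01")
token_at_insurer = tokenize("Jane", "Doe", "1970-01-01")
assert token_at_lab == token_at_insurer  # ...and the matching tokens enable linkage
```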

As an example, consider a company, call it Company A, that has a dataset containing genomic and molecular data, including germline sequencing data (a person's unique DNA sequence). This company, which could be a genomic sequencing lab or diagnostic company, or even a non-medical company that a person paid to sequence their genome, knows exactly who each individual patient is. Company A likely has positive intentions, such as developing new therapies for personalized medicine. Using a tokenization service like the one sold by Datavant, Company A can purchase an "anonymized" dataset from a company such as Komodo or Symphony Health containing claims data, EHR data, or any number of items its own data lacks, and link the two datasets together. Company A now sees not only its sequencing results; it can string that data together with which drugs the patient received, whether they had surgeries, the patient's age, perhaps which hospital system treated them, and other similar details. The patient's name and exact birthdate may reside in a separate database, but the linked dataset shares enough information with it to be crosswalked and potentially used to re-identify records in other datasets.
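Continuing the sketch above, with all tables, fields, and values hypothetical, here is what that linkage step looks like in practice: a single join on the shared token combines Company A's sequencing results with the purchased claims data, and the quasi-identifiers (age, hospital, treatment dates) travel along with the linked record.

```python
import pandas as pd

# Company A's own data: it knows exactly who these patients are.
sequencing = pd.DataFrame({
    "token": ["a1b2", "c3d4"],
    "germline_variant": ["BRCA1 c.68_69delAG", "TP53 R175H"],
})

# Purchased "anonymized" claims data: no names, but the same tokens.
claims = pd.DataFrame({
    "token": ["a1b2", "c3d4"],
    "age": [54, 61],
    "hospital_system": ["Hospital X", "Hospital Y"],
    "drug": ["olaparib", "pembrolizumab"],
    "surgery_date": ["2021-03-14", "2020-11-02"],
})

# A single join yields a far richer profile than either party held alone.
linked = sequencing.merge(claims, on="token")

# Age, hospital, and exact dates are quasi-identifiers: combined with outside
# information, they can narrow a "de-identified" record back to one person.
print(linked)
```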

To be clear, amazingly good things can come out of this combined data, and I'm not advocating that this research stop or even slow down. Comprehensive datasets are driving major, positive developments in cancer treatment, AI predictive models, and more.

The issues I have are twofold.

First, for the most part, patients have absolutely no idea that this kind of combining and resale is happening with their data. They have little to no recourse, except perhaps lawsuits, if there is a data breach, if they are somehow de-anonymized or re-identified, or if their data is used for a targeted or nefarious purpose. There are some proactive remedies to demand transparency via HIPAA that I have been working on with patient advocacy groups, as well as the potential for class action lawsuits when patients believe data has been used improperly, but these are long-term, high-effort approaches that don't scale and are unlikely to drive immediate change.

Second, and a much larger issue, is the economics and value of that data. If Flatiron could sell its business for $1.9B, how much of that value was actually generated by the cancer patients whose data made up Flatiron's product? And how many of those same patients had their data sold by Flatiron to one of the world's largest pharmaceutical companies while they were struggling to pay for their cancer treatments? Consider genetic and molecular data, whether germline data (what you are born with) or somatic mutation data (mutations from a tumor your body created, which may be valuable for research even as it devastates your health). If that data is used to make a biotech discovery worth billions of dollars, shouldn't the patient share in the upside? Right now, patients receive zero. In fact, they may even receive negative value: a company profits from their data on one hand while billing them for a service on the other. I would propose that we are not far from encountering one, or thousands, of "Data Henrietta Lacks" cases, in which digital representations of a patient's cells or tissue become a cornerstone of research while the human who originated that data receives no compensation and only posthumous recognition.

The present-day data economics of health data are staggeringly unjust.

These two issues, transparency and economics, go hand in hand. Patients are only beginning to grasp the volume of data they generate, thanks to the 21st Century Cures Act's rules on digital access, apps like Epic's MyChart, and similar tools. The data volume and access discussion is moving into the mainstream of patient advocacy discourse. As we realize how much health data we all generate, the conversation can evolve from volume to value: not only "who can access this?" and "how is this being used?", which are transparency questions, but also "how much is this worth?", which is the economic question. Once people figure out that their data is being used for profit, they will want a share. Research shows that people with various health concerns, especially those in advocacy-rich communities such as cancer patients, are happy to share data if they think it can benefit their own health or that of others like them. Patients are deeply altruistic in this regard. But that altruism stops when they feel they are being exploited by companies that profit from their data without sharing the return.

Fortunately, as a society we have mostly solved this problem for another very valuable, very private, very sensitive asset: money. The best analogy here is a bank. Most people are familiar with the utility they get from banking: the ability to keep money safe while still owning and controlling it, with someone else taking on the responsibility of safeguarding it, finding opportunities to put it to use, perhaps paying interest on it, and making sure it isn't transferred or moved without the owner's knowledge and consent. If you want, you can take your money out of a bank and move it to a different one, or even just hide it under your mattress. We need the same institutions and functionality for data, especially data as important, valuable, private, and personal as health data.

When we apply this banking analogy to personal data, it becomes nearly absurd: if data really is money, we are letting others scoop up our data, the digital representation of our real-world work and personal energy, and profit from it without giving us a fair share. Patients are quite literally being exploited, and health care workers likely are as well. It is horrifying that the most vulnerable patients, those with rare, difficult-to-treat, burdensome diseases, are the most valuable to exploit.

Banks and other financial institutions serve as trusted intermediaries that hold and manage our money, and they are regulated by government agencies to ensure they follow ethical and legal standards. New technologies give us the ability to do the same with data, helping ensure ethical and responsible use as well as a distribution of profits that includes not only data brokers and curators, but also data generators.
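As one illustration of what such an intermediary might keep track of, here is a deliberately simplified, hypothetical sketch of a "data bank" ledger: every use of a patient's record requires an active consent entry, every use is logged so access can be audited, and a share of any fee is credited back to the data's originator. All names and numbers are invented; a real system would need cryptographic guarantees, regulatory compliance, and governance far beyond this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataBankAccount:
    """Hypothetical ledger for one patient's health data holdings."""
    patient_id: str
    consents: set = field(default_factory=set)        # purposes the patient has approved
    access_log: list = field(default_factory=list)    # auditable record of every use
    earnings: float = 0.0                              # patient's share of fees collected

    def grant_consent(self, purpose: str) -> None:
        self.consents.add(purpose)

    def record_use(self, requester: str, purpose: str, fee: float) -> bool:
        """Permit a use only if consented; log it and credit the patient's share."""
        if purpose not in self.consents:
            return False
        self.access_log.append({
            "requester": requester,
            "purpose": purpose,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "fee": fee,
        })
        self.earnings += fee
        return True

# Illustrative usage: a consented research use is allowed and credited,
# while an unconsented marketing use is refused outright.
account = DataBankAccount(patient_id="token-a1b2")
account.grant_consent("oncology research")
assert account.record_use("Pharma Co", "oncology research", fee=12.50)
assert not account.record_use("Ad Network", "marketing", fee=99.00)
```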

The health care data economy of the future will be driven by what patients and physicians demand in recognition of their contributions to data, discovery, and business. Institutions that can provide the vaults, accounts, transfer protocols, and business opportunities that preserve ownership, provenance, and integrity, so that data generators share in the upside, will come to the forefront. Data aggregators that fail to recognize patients' economic interests or provide transparency on how data is used will face a reckoning, whether in the courts or the court of public opinion. We have a lot to learn from financial institutions about how to implement this model, but the framework is there for the taking. The future of health data is banking.