Generating data from patients is a long-standing practice in healthcare. In the early days of medicine, doctors took detailed notes about their patients to track their health and progress over time. Now, technology enables a different kind of data capture, with untold volumes of data flowing into electronic health record systems, digital management systems for images or gene sequences, and billing systems that integrate clinical and financial data. All of these tools make it easier for healthcare providers to collect, analyze, and synthesize patient data, improve their clinical decision making, and better understand each patient’s health. Data enables tracking of treatment outcomes and provision of personalized care. By maximizing the use of data, clinical decision-makers are able to provide the highest quality of care based on the most complete information.

I have spent my career focused on understanding how we can use the capabilities of technology to extend the reach of clinical experts. There are many places where this is happening now, and our growing ability to gather, analyze, and use a variety of patient data has drastically expanded the possibilities in recent years. For example, my current work at Catalia Health leverages psychological modeling of patients to better convey information back to those patients. While the core programs are physician-prescribed treatment plans, the psychological approach is a unique, effective way of scaling care for patients. In this way, we can use technology and data to multiply the impact of clinicians, personalize treatments for patients, and, as much as possible, optimize for improving health outcomes and the efficiency of care delivery.

Despite these good intentions, patient-generated data is used for more than just helping healthcare providers deliver better care. Some of the largest companies in the healthcare industry, including McKesson, UnitedHealth Group’s Optum division, Verily (Google/Alphabet’s health company), and Flatiron (acquired by Roche for $1.9B), are profiting handsomely by collecting, aggregating, and selling patient data to pharmaceutical companies and others. By doing so, these companies can provide valuable insights into patient behavior, health outcomes, and patterns of care, allowing life sciences firms to develop products and services that better meet the needs of patients. But companies can also use this data to create detailed profiles of individuals and to target them with marketing messages or personalized offers, a controversial (but effective) way to reach and engage customers. Patient data, on the whole, is quite valuable.

Given this value, are we looking at patient-generated data from the optimal economic perspective? How do we know how accurate it is, how representative it is of a “typical” patient population, or how much it can truly tell us about patient behavior or outcomes? Do we have a full understanding of how companies are using patient-generated data to target potential customers with marketing messages or offers? Despite the opportunities, what risks do we encounter when we handle patient-generated data, and what are our responsibilities in how that data is used? How do we know whether all of this data aggregation and sharing has actually resulted in any discoveries, benefits, or efficiencies?

We know that patient data that is longitudinal, complete, and accurate has enormous value in high-return areas such as the development of new drugs and therapeutics, the identification of sub-populations most likely to benefit from targeted or molecularly specific therapies, or, as we have found at Catalia Health, in designing tailored programs that help patients with chronic diseases stay on beneficial therapies and maintain communication with their healthcare team. Curiously, some of the aspects of these patient-specific datasets that are most insightful in terms of optimizing health outcomes for the “long tail” of patients at both ends of a bell curve are also the aspects that are lost when a dataset is aggregated, de-identified, “cleaned”, and homogenized for external sale and consumption. Likewise, when data is aggregated and sold on, it loses its connection to the patient, and therefore loses the opportunity to be further connected to future outcomes. Current healthcare data markets may be missing this point, sacrificing the high value of identifiable, patient-linked, opt-in data for the high volume and easy grab of an aggregated data pool that might offer no discernible insight beyond hypothesis.

Patients are becoming more aware of their rights and more eager to access and own their data. At the same time, they are becoming more skeptical of how the entities they trust as health data stewards (hospitals, electronic health record companies, and health technology companies) might use, safeguard, or fail to protect it. As a company that helps patients generate data and then uses that data to improve their healthcare experience and outcomes, we know the power of using patient data responsibly. For those who may be playing fast and loose with patient data, or claiming to practice privacy while engaging in commercial data sharing without patient knowledge or consent, patients may start standing up to your business model and demanding their share or, at the very least, your transparency.

By sharing their data, patients know that they can help advance medical research and improve healthcare outcomes for themselves and for other patients like them, but they also want to know that their data is protected and that, if there are commercial gains to be made, they might receive some consideration. The use of patient-generated data is an important tool for healthcare providers and companies alike, allowing them to better understand their patients and develop better treatments and services tailored to their needs. While the use of patient-generated data is often controversial, it can be a powerful tool for improving healthcare outcomes when used responsibly.