Whose Data Is It Anyway?


Again, I ask: whose data is it, anyway?

A look back to Project Nightingale, the privacy of health care data, and why this remains a big issue.

In November of 2019, The Wall Street Journal broke the news that Google was working with Ascension Health, one of the largest hospital systems in America, to aggregate and analyze patient data across their vast health care delivery system. The name of this initiative was “Project Nightingale,” and at the time, I published this piece almost certain that the public outcry over this type of data sharing would tip the scales towards a patient-led data economy in health care.

For those unfamiliar with the players, Ascension is a health care system owning hospitals and physician practices in over 20 states. It is the largest Catholic health system in the world and the largest not-for-profit health system in the country, although you wouldn’t know that from its executive salaries: its CEO is paid more than $18M a year. Google is, well, Google. Both of these companies are behemoths.

In the Project Nightingale deal, neither Ascension’s patients nor its clinical professionals were informed that their data would be shared with, or potentially utilized by, Google. Current data privacy laws, including the frequently misunderstood and mis-cited HIPAA, do not require such notification.

Reports at the time of the WSJ’s exposé stated that the shared data included patient names and other identifiable information. But even when health data is “blinded,” “anonymized,” or “de-identified,” it is now easier than ever to unmask the person behind it, particularly now that so many electronic health records contain information, such as genomic or molecular data, that is specific to only one individual in the world (unless you have an identical twin). The most valuable data for drug discovery and development, clinical trial recruitment, and molecular discovery is also the data that carries the highest risk of privacy breaches: life sciences companies are hungry to know about patients who have rare diseases, difficult-to-treat tumors with “undruggable” mutations, and patient populations who have been left out of the past few decades’ largest advances.
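How “de-identified” records get unmasked can be sketched in a few lines of Python. This is a hypothetical illustration, not any company’s actual method; the records, names, and field choices below are invented, but the technique (joining on a handful of “quasi-identifiers” such as ZIP code, birth date, and sex against a public dataset like a voter roll) is a well-known re-identification approach:

```python
# Invented example data: a "de-identified" health record set (names removed)
# and a public roster that still carries names. All values are fictional.
deidentified_records = [
    {"zip": "94301", "birth_date": "1967-03-12", "sex": "F", "diagnosis": "rare tumor"},
    {"zip": "94301", "birth_date": "1980-07-04", "sex": "M", "diagnosis": "flu"},
]

public_roster = [
    {"name": "Jane Doe", "zip": "94301", "birth_date": "1967-03-12", "sex": "F"},
    {"name": "John Roe", "zip": "94301", "birth_date": "1980-07-04", "sex": "M"},
]

def reidentify(records, roster):
    """Join records to the roster on quasi-identifiers.

    If exactly one roster entry matches a record's quasi-identifiers,
    that record is effectively re-identified despite the missing name.
    """
    quasi = ("zip", "birth_date", "sex")
    unmasked = []
    for rec in records:
        matches = [p for p in roster if all(p[k] == rec[k] for k in quasi)]
        if len(matches) == 1:  # a unique match singles out one person
            unmasked.append({"name": matches[0]["name"], **rec})
    return unmasked

print(reidentify(deidentified_records, public_roster))
```

Here every “anonymized” record maps back to exactly one named individual. Genomic data makes this worse: a DNA sequence is itself a unique identifier, so no join against outside data is even needed.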

Basically, data coming from the very vulnerable is also very valuable.

Ascension is so large that many of its facilities are not explicitly branded as Ascension. It has acquired so many physician practices and community hospitals that chances are very good that a random reader of this article has been seen by a clinician at an Ascension-owned facility without realizing they were interacting with a global conglomerate.

Some of your health care data may have been swept up by this project, slightly more than three years ago, and this may be the first time you are hearing about it. Is that okay with you?

Wait, was this legal? What about HIPAA? Or even if it was legal in 2019, certainly it isn’t now, right? What about the 21st Century Cures Act, or CCPA, or GDPR?

Before you shout, “But doesn’t HIPAA, or some other patient data privacy law, prevent this?”, the answer is: not really.

HIPAA (the Health Insurance Portability and Accountability Act, enacted in 1996) is perhaps one of the most poorly understood American health care laws; even many employees of health systems and medical offices don’t really understand its purpose and what it prohibits or allows.

The “P” in HIPAA stands for Portability, not Privacy. HIPAA does govern several aspects of patient data privacy, but one of its provisions allows “covered entities” (such as a hospital or medical office subject to the law) to share data with Business Associates, companies that may help the covered entity carry out its health care functions.

Ascension and Google assert that Google, as a Business Associate of Ascension, can access the data to help Ascension carry out its functions of taking care of patients, as well as its business objectives (which could be as broad as lowering health care costs or managing insurance and billing). This is likely a correct interpretation of the law, although the US Department of Health and Human Services (HHS) opened an inquiry to determine whether the law was interpreted correctly in this case and whether Google and Ascension did anything untoward.

Consumers, patients, and clinical professionals may not realize that health systems frequently share data with all manner of Business Associate entities, from electronic health record (EHR) companies and analytics companies to consulting firms and collections agencies. Hospitals and medical offices aren’t in the business of employing data analysts or statisticians, so there are strong reasons why this exception to data privacy should exist.

Despite all of this, patients are rarely, if ever, informed that their data may be traveling into the hands of these different entities. Furthermore, even though these Business Associate entities are also bound to protect the data, it’s frequently difficult to determine whether their systems might have been breached or compromised. Data that Business Associates touch does not have to be anonymized; they might have access to everything.

Seriously? HIPAA doesn’t cover this?

Keep in mind that HIPAA was drafted in the mid-90s. That was long before many of today’s issues in technology, surveillance, and data privacy—from wearables to consumer-available genetic testing to cloud-stored data—even existed. At the time, a hospital transferring data to Business Associates likely meant operating within a closed network, or possibly handing off floppy disks or other physical media.

Before the days of Big Tech and consumer data aggregation, when HIPAA was put into action, it would have been very hard to imagine what the current data world would look like. After all, most hospitals and medical offices ran on paper records until the 2010s. It took until October of 2022 for the information-access provisions of a newer law, the 21st Century Cures Act, to take full effect, dictating that patients should have free, unfettered, digital access to their own records. As cancer survivor and medtech founder Samira Daswani reports elsewhere in this issue, within very recent memory she had to drive miles to a hospital to collect part of her record on a CD-ROM.

Does HIPAA restrict what Business Associates can do with data? What are the implications of this arrangement and its legality?

HIPAA doesn’t give Business Associates free rein. This is where Google could, hypothetically, wade into rough waters if it doesn’t get a clear legal opinion, and health care lawyers may disagree on what Google can do with the data. The issues at hand cover not only what HIPAA intends (or intended, in an age prior to AI and machine learning), but also how patient data itself might be viewed. Is this data, in and of itself, a valuable and proprietary asset that patients and their clinicians have generated and should own rights to? Or are patient records and health outcomes data merely the “exhaust” generated by the day-to-day work that hospitals and doctors do to take care of patients?

The usual interpretation of HIPAA is that Business Associates are allowed to use the shared data only for business purposes for the covered entity. Translated for this case, that would mean that Google can use the data to do things for Ascension’s usual business and health care purposes.

The law is also typically interpreted to say that the Business Associate can’t use the data for its own independent business purposes. Under this interpretation, Google would need to firewall this data within its organization so that it can be used only for Ascension.

What types of other activities might Google want to use the data for, but might not be able to under HIPAA restrictions? (The WSJ produced this handy video on why Big Tech may want your health care records.)

Google may not be able to legally use data from Ascension’s patients to train Artificial Intelligence (AI) or Machine Learning (ML) algorithms that might then be ported beyond Ascension’s systems. If AI algorithms were developed based on Ascension patient data, those would potentially have to remain specific to Ascension. Given new revelations and controversy about Google’s work at the intersection of medicine and AI, it’s impossible to know whether this is the case.

Likewise debatable is whether Google could widely apply learnings gathered through Nightingale to other initiatives within Google or via its other health-related subsidiaries, such as Calico or Verily. Google also works with a number of other health care systems, including Mayo Clinic; in an ideal world, it might want to scale learnings from any one of these systems across all of its health care customers, but if those insights and learnings are based on HIPAA-controlled data, it may be prohibited from doing so.

Lastly, Google may have a desire to link health care data from Ascension or other parties to its rich trove of individual data gathered through its many other products. Other Big Tech players such as Facebook have already been questioned about how health care related data that users assume is private might be connected to other data and leveraged for advertising or monetization. While the insights and learning potential of linking health data to consumer data could be transformative, the issues around privacy, consent, and the clinical decision-making implications of relying on such data are thorny indeed.

So what ended up happening with this project?

It’s still going on.

In October of 2022, Google made new announcements around its software for medical imaging and AI applications. Were those machines trained on Ascension patient data? We don’t know. If so, and if those software programs end up generating millions in profit for Google, will Ascension patients or providers see any upside, considering that their work and bodies generated the data that fed the machines? Fat chance.

Google isn’t stopping there. ProPublica revealed less than a month ago that Google has been working for years to gain access to biobank materials from the US Department of Defense, including more than 31 million tissue blocks. What the ProPublica article doesn’t mention is that with new next-generation sequencing technology, it may very well be possible to extract significant amounts of exome data from tissue even if it’s relatively old. In other words, Google could get its hands on DNA sequences that are by definition unique to individuals and therefore impossible to anonymize in any true sense of the word.

But isn’t all of this data necessary for scientific advancement? We all want more cures.

In concept, using high-quality, real-world data to power robust insights around health care delivery and medical discovery is undeniably positive. Although most of the interventions prescribed in health care are informed by clinical trials, these are typically performed in relatively controlled environments among pre-specified populations, and as a result, the outcomes may not reflect “real world” circumstances. Working with Real World Evidence (RWE) is a key initiative in life sciences research; clinicians, researchers, and patients would like to understand how things work “in the wild,” not just in the lab or the hospital room.

At the same time, data held in electronic health records (EHRs) is usually not complete, accurate, or high-quality enough to generate scientifically valid conclusions. At best, it generates new hypotheses that still need to be tested in organized, controlled studies.

Although we picked on Google here, they are far from the only company frantically trying to acquire and aggregate health care data, whether from health system records or from biobanks. They understand that this data can be used to feed the insatiable data appetites of AI, to develop new medical devices that can yield profits, and, yes, to improve people’s health. Right now, they may be tiptoeing just inside the line of data privacy laws, at least in the US.

Patients and medical professionals, however, are unlikely to stand for the status quo here. For instance, if patient data is going to be monetized, patient advocates would assert that some returns should accrue back to the patients whose data made such revenues possible. Increasingly, patient advocate groups are looking at legal remedies. Scripps Health was forced to pay patients $3.5M to settle a suit after a data privacy breach. Other attempts to litigate health data privacy have been less successful. Reading the tea leaves, genetics companies such as Invitae are taking a proactive approach, publishing reports on how they use identifiable data.

In an era where health systems can potentially profit off of patient data while simultaneously pursuing legal action against patients who can’t pay their medical bills, advocates are clamoring for equilibrium. These sorts of revenue-generating tactics translate into especially poor optics for non-profit or faith-based systems such as Ascension.

What patients want is not unachievable. We simply want to know how our data is being used and to maintain some degree of control over that data, including the right to completely remove our data from a health care system if we lack confidence in their care or in their data security practices. This issue has been exacerbated at a time when so many companies, particularly in the tech sector, have built business models on aggregating consumer data, and are now moving into the health care space, where the data points describe our most private and personal matters.

Studies evaluating patients’ willingness to share data indicate positive attitudes towards sharing; by and large, patients are willing to share if they think the data will contribute to new scientific discoveries, improvements in the health care system, or lower health care costs—even if those advancements will not benefit the specific patient, but rather society at large or patient groups with the same disease. Patients are more hesitant to share if their privacy could be compromised, or if the data will be used for commercial purposes.

We’re in a weird time when it comes to data ownership, privacy, aggregation, and control. Some would even say there’s a secret war going on for your data. On one side stand technology and information giants whose entire business models are predicated on the ability to aggregate personal data and monetize it. On the other side stand advocates for data privacy and ownership, regulations such as the EU’s GDPR (General Data Protection Regulation) and California’s CCPA (California Consumer Privacy Act), and companies or projects building new tools to put data tracking, control, and monetization in the hands of individuals and data generators instead of data aggregators, embracing new technologies such as distributed ledgers, blockchains, and Web3.

Only one side of the fray represents patients.