The Icelandic biotech firm deCODE Genetics has pioneered a means of determining an individual’s susceptibility to various medical conditions with 99 percent accuracy by gathering information about that person’s relatives, including their medical and genealogical records. Of course, inferences have long been made about a person’s health by observing and gathering information about her relatives. What is unique about deCODE’s approach in Iceland is that the company uses the detailed genealogical records available in that country to estimate genotypes of close relatives of individuals who volunteered to participate in research, and extrapolates this information to make inferences about hundreds of thousands of living and deceased Icelanders who have not consented to participate in deCODE’s studies. DeCODE’s technique is particularly effective in Iceland, a small island nation whose largely consanguineous population and detailed genealogical records lend themselves well to genetic research.
Although Iceland’s detailed genealogical records enable the widespread use of estimated data in that country, a large enough U.S. database could be used to make similar inferences about individuals here. While the U.S. lacks a national database similar to Iceland’s, private companies such as 23andMe and Ancestry.com have created rough gene maps of several million people, and the National Institutes of Health plans to spend millions of dollars in the coming years sequencing full genome data on tens of thousands of people. These databases could allow the development of estimated data on countless U.S. citizens.
DeCODE plans to use its estimated data for an even bolder new study in Iceland. Having imputed the genotypes of close relatives of volunteers whose DNA had been fully catalogued, deCODE intends to collaborate with Iceland’s National Hospital to link these relatives, without their informed consent, to some of their hospital records, such as surgery codes and prescriptions. When the Icelandic Data Protection Authority (DPA) nixed deCODE’s initial plan, deCODE agreed that it would generate a genetic imputation for those who have not consented for only a brief period, and then delete that imputation from the database. The only accessible data would be statistical results, which would not be traceable to individuals.
Are the individuals from whom estimated data is gathered entitled to informed consent, given that their data will be used for research, even if the data is putatively unidentifiable? In the U.S., consideration of this question must take into account not only the need for privacy enshrined in the federal law of informed consent, but also the right of autonomy, which empowers individuals to decline to participate in research. Although estimated DNA sequences, unlike directly measured sequences, are accurate at the group level rather than at the individual level, individuals may nevertheless object to research participation for moral, ethical, and other reasons. A competing principle, however, is beneficence, and any impediment to deCODE’s use of its estimated data could represent a lost opportunity for the complex disease genetics community.
The solution deCODE proposed to the DPA, to delete the individual-level imputation, is premised upon the idea that individuals whose data have been de-identified do not face a serious risk of re-identification, a notion that has been called into question. Indeed, it is the risk of re-identification that animates the September 8, 2015 Notice of Proposed Rulemaking (NPRM) published by the U.S. Department of Health and Human Services in the Federal Register, which proposes a change to the Common Rule, the federal policy for protecting human research subjects. While the NPRM does not address the need for informed consent for estimated data, dealing instead with biospecimens and personal information, it does cast significant doubt on the ability to achieve de-identification of biospecimens and personal information gathered for research purposes.
The NPRM would treat biospecimens as intrinsically identifiable because of the information imbedded in them, and would therefore require informed consent for research using all existing biospecimens, whether clinical or from prior research, even those that have been stripped of identifiers. Currently, U.S. law does not require informed consent for de-identified biospecimens and personal information. This rule change would apply prospectively to biospecimens collected after the effective date of the new rules.
Clearly, the NPRM contemplates that the use of even de-identified biospecimens raises privacy concerns, given researchers’ ability to extract identifying information from them. A broad reading of the NPRM would suggest that the use of estimated data to extrapolate information about individuals gives rise to the need for informed consent, so as to protect the privacy of those who have not expressly agreed to participate in research. The 2014 NIH Genomic Data Sharing Policy (GDS) goes even further than the NPRM by protecting genomic data itself, stating the expectation that investigators will obtain consent for genomic and phenotypic data to be used in future research, even if the source of the data itself is not identified.
The NPRM and NIH GDS comport with public opinion, according to survey data indicating that the public does not recognize the regulatory distinction between identifiable and nonidentified samples and information. In one survey, 72 percent of respondents stated that it was moderately to very important for them to be informed that research would be performed on their samples when the data was anonymous, versus 81 percent when the data was identifiable. Of those who wanted to be informed about either or both scenarios, as many as 57 percent would require their permission to be sought before their samples were used.
Even absent concern about re-identification, individuals sometimes decline on ethical, religious, or other personal grounds to participate in certain controversial forms of research, such as somatic cell nuclear transfer, stem cell research, and germ-line gene therapy. In addition, they may object to commercial exploitation of discoveries developed through the use of their nonidentified data. Finally, public trust and participation in research are enhanced when potential research subjects feel secure in knowing they will be consulted before their biospecimens and information are used.
Should the NPRM be implemented, it is hard to argue that there is any meaningful distinction between nonidentified data and estimated data in terms of the need for informed consent. Neither kind of data requires any direct interaction with the individual about whom data is gathered. Both are subject to re-identification. The primary difference is that estimated data are not accurate at the individual level, but only at the group level. While this fact may adequately address the privacy issue, it does not resolve the matter of autonomy. For this reason, individuals are entitled to informed consent for the use of their nonidentified and estimated data alike. The Common Rule clearly supports informed consent for individuals involved in biomedical research, particularly when they have a reasonable expectation of privacy regarding the tools used to estimate their data, namely their medical and hospital records.
Donna M. Gitter is a professor of law at Baruch College, City University of New York, in New York, NY. This commentary is based on a presentation she gave at The Petrie-Flom Center’s 2016 Annual Conference: Big Data, Health Law, and Bioethics.