Consumer Data Privacy Codes of Conduct
Recommendations for the Department of Commerce Multi-Stakeholder Process
This is an excerpt from the full comments (pdf).
The National Telecommunications and Information Administration, or NTIA, has asked for comments on what issues should be addressed through a privacy multistakeholder process. Based on my experience in privacy law and policy, I believe an early and prominent candidate should be the definition of what counts as “de-identified” information. As discussed below, this topic has multiple advantages, including heightened protection for consumers, positive effects on innovation and the broader economy, and a likelihood of concrete, enforceable success for the process itself.
These comments provide background for the discussion and then explain the importance of the topic of de-identified data. They explain how the recent Federal Trade Commission privacy report provides a new and useful set of proposals for how to handle de-identified data, and conclude with an analysis of why the topic of de-identified data is a good candidate for early consideration in a multistakeholder process.
As background for these comments, I am the C. William O’Neill Professor of Law at the Moritz College of Law of the Ohio State University, and Senior Fellow at the Center for American Progress Action Fund and the Future of Privacy Forum. Under President Bill Clinton I served as chief counselor for privacy in the U.S. Office of Management and Budget. Under President Barack Obama I was special assistant to the president for economic policy in 2009 and 2010. Further information is available at www.peterswire.net.
This February the administration issued its white paper, “Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy.” This privacy framework defined a Consumer Privacy Bill of Rights. To implement this bill of rights, the framework called on the Department of Commerce to foster the development of enforceable codes of conduct for consumer privacy. These codes of conduct will be developed through multistakeholder processes, so that the range of relevant stakeholders can convene and develop codes of conduct even in the absence of binding legislation or regulation. Consumer privacy legislation has been difficult to enact in the United States, so consumer protection will advance more quickly through initiatives, such as the multistakeholder process, that do not depend on passage of such legislation.
The importance of de-identified data
The title of the administration’s white paper reflects two principal goals for policy concerning the data of individual consumers: “A Framework for Protecting Privacy and Promoting Innovation.” This title reflects the risks to individuals if privacy is not protected effectively. It also reflects the importance of creating good information rules in order to foster innovation and growth in our information economy.
The issue of de-identified data creates a vital opportunity to meet both goals—use data for innovation and growth while also protecting privacy. At least in theory, de-identified data allows us to have our cake and eat it, too. With de-identified data, we strip out the name and other information that reveals identity, but we nonetheless can process the data, do research, discover patterns, and innovate in how we respond to the information.
In any statute or other legal obligation, such as a company’s enforceable promise to protect privacy, the most important definition is what counts as covered by the law or obligation. Defining what counts as “de-identified” is crucial because it draws the line between what data is covered by privacy protections (still “identified”) and what data is not (“de-identified”).
In U.S. law de-identified data was first defined as part of the Health Insurance Portability and Accountability Act, or HIPAA, medical privacy rule drafted in the late 1990s. I was very involved in drafting the proposed and final HIPAA rule and paid particular attention to defining what counted as “de-identified.” In HIPAA “identified” data is considered personal health information, subject to the full range of privacy protections. If the data is scrubbed hard enough, however, then it becomes de-identified data and no longer subject to the regulatory requirements.
The final HIPAA medical privacy rule provided two ways to show that data was de-identified. First, the holder of the data could remove a list of at least 17 data fields that could identify a person, such as name, address, or Social Security number. Second, a statistical expert could certify that the risk is very small that the information could be used, alone or in combination with other reasonably available information, to re-identify the individual. Since HIPAA went into effect nearly a decade ago, health care entities have been able to publicly release health data if it has been scrubbed well enough to meet the regulatory requirements for de-identification.
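The first, field-removal approach can be pictured with a minimal sketch. The field names and the sample record below are hypothetical and cover only the three examples mentioned above; they are not the regulatory list, and the sketch does not attempt the second, expert-certification approach.

```python
# Hypothetical identifier fields: an illustrative subset only,
# not the full list of fields named in the HIPAA rule.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}


def strip_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the listed identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Made-up sample record for illustration.
record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "ssn": "000-00-0000",
    "state": "OH",
    "procedure": "hand surgery",
    "date": "04-03",
}

scrubbed = strip_identifiers(record)
print(scrubbed)
```

The scrubbed record keeps the state, procedure, and date, which is what makes it useful for research and also what leaves some residual risk of re-identification.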
Finding a Goldilocks solution for de-identified data
Since the HIPAA de-identification provisions were proposed in 1999, we have learned a lot about when and how it is possible to “re-identify” data—to link a person’s name with the supposedly de-identified data. Two big trends have made it harder to keep information de-identified. First, search on the Web has gotten much better. Google was not incorporated until 1998, and today’s search engines let anyone link together tidbits from previously hard-to-link data sources. Second, the amount of information on the Web about a typical person has grown astronomically, including all of the personal details on a person’s blog or Facebook page.
The combination of efficient search tools and lots of data means that there is a higher likelihood today that a person’s medical or other records can be re-identified even if the name and other traditional identifiers are deleted. For instance, the de-identified medical record might state that a person in Ohio had minor hand surgery on April 3. In the past, it would have been difficult or impossible for an outsider to figure out the name. Today, online search might turn up a social network thread about the hand surgery—there are multiple such surgeries in Ohio each day, but not that many. A bit of follow-up research, using the rest of the supposedly de-identified information, might easily pinpoint the person who had the surgery.
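The hand surgery example can be sketched as a simple linkage exercise. Everything below is made up for illustration: the names, the posts, and the choice of state, procedure, and date as the matching fields are assumptions, not data from any real source.

```python
def candidates(record, public_posts, keys=("state", "procedure", "date")):
    """Return names from public posts that agree with the record on every key."""
    return [
        post["name"]
        for post in public_posts
        if all(post.get(key) == record[key] for key in keys)
    ]


# A supposedly de-identified record: no name, but some detail remains.
record = {"state": "OH", "procedure": "hand surgery", "date": "04-03"}

# Hypothetical public information, such as social network posts.
public_posts = [
    {"name": "A. Example", "state": "OH", "procedure": "hand surgery", "date": "04-03"},
    {"name": "B. Example", "state": "OH", "procedure": "knee surgery", "date": "04-03"},
    {"name": "C. Example", "state": "MI", "procedure": "hand surgery", "date": "04-03"},
]

by_date_only = candidates(record, public_posts, keys=("date",))
matches = candidates(record, public_posts)
print(len(by_date_only), matches)
```

Matching on the date alone leaves three candidates in this toy data, while matching on all three fields leaves one. Each additional detail that survives scrubbing narrows the pool in the same way.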
As academics have analyzed these facts about re-identification, some have concluded that the entire effort to de-identify data has failed, because of the risk of linking information back to the individual. Others have emphasized the limited actual success of re-identification efforts in practice, and found that the benefits to research and innovation are so great that they outweigh the privacy risks.
The preliminary FTC report, issued in 2010, received strong criticisms from both of these perspectives. The earlier report would have applied privacy protections to “consumer data that can be reasonably linked to a specific consumer, computer, or other device.” The debate centered on what the FTC meant by “reasonably linked.” Consumer groups correctly emphasized that it is easier now to search on the Web and re-identify data, at risk to privacy. Researchers and other users of data focused on the problems that come with an over-broad definition of “reasonably linked,” which could extend privacy rules to an almost unlimited range of data processing, if enough effort is put into tracking down and re-identifying data.
Responding to these critiques, the FTC looked at the technical de-identification issues, and found what I believe is a Goldilocks solution for the problem of de-identified data. The FTC provides what amounts to a safe harbor where: “(1) a given data set is not reasonably identifiable; (2) the company publicly commits not to re-identify it, and (3) the company requires any downstream users of the data to keep it in de-identified form.”
The FTC approach responds to the technical experts who correctly say that it is easier today to find data on the Web that helps us re-identify data. To address the privacy concerns the FTC approach first requires a company to make a data set reasonably de-identified. We can think of this as “good but not foolproof de-identification.” Then, in addition, the FTC requires administrative protections. The company has to commit publicly that it won’t re-identify the data. The company also has to get similar promises from anybody downstream who receives the data. These promises are enforceable because Section 5 of the FTC Act prohibits deceptive practices, such as broken privacy promises. Privacy is protected through the combination of technical measures, having reasonably de-identified data, and backup administrative measures, so that the only people who receive the data have made binding promises not to re-identify.
The FTC approach also responds to those who want to study data for research, innovation, and related purposes. Data must be scrubbed pretty hard but not incredibly hard—the dataset need merely not be “reasonably identifiable.” That data should still often be detailed enough to be useful for a variety of purposes, protected by the enforceable promises not to re-identify.
I have long believed that technical controls alone are not enough to protect consumers against possible re-identification, as shown in a 2009 report by the Center for Democracy and Technology and my December talk on de-identified data. The best path is to have reasonably strong technical protections, supplemented by the sorts of enforceable promises that the FTC report supports.
You can read more about why defining de-identified data is a good fit for the multistakeholder process in the full comments at the Center for American Progress Action Fund. Peter Swire is the C. William O’Neill Professor of Law at Moritz College of Law at the Ohio State University, and Senior Fellow at the Center for American Progress Action Fund.
The White House, “Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy” (2012), available at http://www.whitehouse.gov/sites/default/files/privacy-final.pdf.
Federal Trade Commission, “Protecting Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers” (2012), available at http://www.ftc.gov/opa/2012/03/privacyframework.shtm.
Peter Swire, “FTC Deserves Praise for Its De-Identification Safe Harbor,” Future of Privacy, March 26, 2012, available at http://www.futureofprivacy.org/2012/03/26/fpf-senior-fellow-peter-swire-ftc-deserves-praise-for-its-de-identification-safe-harbor/.
Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” UCLA Law Review 57 (2010): 1701, available at http://ssrn.com/abstract=1450006.
Jane Yakowitz, “Tragedy of the Data Commons,” Harvard Journal of Law and Technology 25 (2011), available at http://ssrn.com/abstract=1789749.
Ed Felten, chief technologist of the FTC, listed de-identification as the top issue of “special interest to techies” in the FTC report. Ed Felten, “Tech Highlights of the FTC Privacy Report” (Washington: Federal Trade Commission, 2012), available at http://techatftc.wordpress.com/2012/03/26/tech-highlights-of-the-ftc-privacy-report/.
Center for Democracy and Technology, “Encouraging the Use of, and Rethinking Protections for De-Identified (and “Anonymized”) Health Data” (2009), available at https://www.cdt.org/healthprivacy/20090625_deidentify.pdf.
Peter Swire, “Keynote – Setting the Stage: How De-Identification Came into U.S. Law and Why the Debate Matters Today,” Future of Privacy Forum, Conference on De-Identification, 2011, available at http://www.peterswire.net/psspeeches2011.htm.
Peter Swire, “Peeping,” Berkeley Technology Law Journal (2009), available at http://ssrn.com/abstract=1418091.
Peter Swire, “Markets, Self-Regulation, and Government Enforcement in the Protection of Personal Information,” in U.S. Department of Commerce, “Privacy and Self-Regulation in the Information Age” (1997), available at http://ssrn.com/abstract=11472.