Is DNA Really Personally Identifiable Information (PII)? No. Maybe? Yes!

February 5, 2013

Biometric data is at the limits of what current personal data privacy laws consider worthy of protection. This type of identifier covers fingerprints, voiceprints, and facial images. While the risk factors are not nearly as threatening to consumers as more traditional PII, they do exist. Until recently, the dangers of biometric identification using DNA were more theoretical than real. That has suddenly changed. An article in The New York Times last month put a spotlight on research that proved the feasibility of identifying a person—getting a specific name and address—all from a DNA sequence posted online.

It’s not that regulators have overlooked biometric identifiers. Under HIPAA’s safe harbor rules, for example, the Department of Health and Human Services lists 18 types of identifiers that must be removed from medical data before it can be considered de-identified. Along with IP addresses, URLs, and email addresses, HHS mentions biometric identifiers, with voiceprints and fingerprints given as the only examples.

I’ve already written about how the Federal Trade Commission, another key US agency involved in data privacy regulation, has issued new guidelines to companies collecting facial images. Driving the FTC’s suggestions, which are mostly directed at retailers, are the recent improvements in image recognition technology and the availability of massive amounts of tagged photos on social media sites. Image matching software is now good enough that a face captured by a store’s mall kiosk can eventually reveal ethnicity, mood, and, with good likelihood, the actual name behind the face.

The risk of linking a name to a set of fingerprints is less serious for the general public, unless you have a criminal record. However, after the Graduate Management Admission Council (GMAC) began using fingerprints to establish the identity of students taking the GMAT for admission to US business schools, the testing organization realized there could be privacy issues.

GMAC ultimately decided to use palm scans, which are based on digitizing vein patterns. Since public databases of hand veins don’t exist, the possibility of identification is eliminated.

I would have put DNA into the same category as palm scans: there’s advanced matching technology, available even at the consumer level, but without a public database there isn’t much of a privacy issue, and therefore DNA is not really PII.

However, this is not true anymore, and that was the starting point for the researchers mentioned in the Times article. There are actually two public genealogy databases for tracking down one’s ancestry, Ysearch and SMGF, with a combined 135,000 records of DNA data and covering about 39,000 unique last names.

These genealogy databases simply accept a key—actually a pattern on the Y-chromosome—and then return a surname (along with a confidence level). The idea behind these services is to help subscribers find their ancestors and learn more about family backgrounds.
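To make the lookup concrete, here is a minimal sketch of the kind of query described above: a set of Y-chromosome marker values goes in, and a surname with a rough confidence comes out. Everything in it is an assumption for illustration. The marker names are real Y-STR loci, but the repeat counts, the surnames, the `guess_surname` helper, and the mismatch threshold are invented, and the "confidence" is just a vote share, not whatever measure Ysearch or SMGF actually report.

```python
from collections import Counter

# Hypothetical toy database: Y-STR marker repeat counts paired with a surname.
# All values and surnames below are invented for illustration.
TOY_DB = [
    ({"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}, "Archer"),
    ({"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}, "Archer"),
    ({"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 12}, "Baker"),
    ({"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}, "Cole"),
]


def guess_surname(query, db=TOY_DB, max_mismatches=1):
    """Return (surname, share_of_matching_records), or None if nothing matches."""
    hits = Counter()
    for markers, surname in db:
        # Count the queried markers whose values differ from this record.
        mismatches = sum(1 for k, v in query.items() if markers.get(k) != v)
        if mismatches <= max_mismatches:
            hits[surname] += 1
    if not hits:
        return None
    surname, count = hits.most_common(1)[0]
    # Assumed stand-in for a confidence level: the winner's share of all hits.
    return surname, count / sum(hits.values())


query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
result = guess_surname(query)
```

With this toy data, two of the three near-matching records carry the same surname, so that surname wins with a share of about two thirds. A real service would compare many more markers and use a proper statistical model.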

The researchers then examined whether they could narrow down their search. They assumed that they had the subject’s state of residence and year of birth; neither, by the way, is treated as an identifier under HIPAA’s current safe harbor rules. With these three data points (surname, state, and birth year) and public US Census data, they showed that a successful surname match would narrow the field to just 12 people on average. That’s a stunning end result from starting with just a DNA pattern.
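The narrowing step is essentially a chain of filters over a population table. The sketch below shows the idea with a handful of invented records; the `PEOPLE` list, the field names, and the `narrow` helper are all hypothetical, and the real study worked from census-scale data rather than five rows.

```python
# Invented records standing in for public census-style data; no real people.
PEOPLE = [
    {"id": 1, "surname": "Archer", "state": "UT", "birth_year": 1960, "sex": "M"},
    {"id": 2, "surname": "Archer", "state": "UT", "birth_year": 1975, "sex": "M"},
    {"id": 3, "surname": "Archer", "state": "CA", "birth_year": 1960, "sex": "M"},
    {"id": 4, "surname": "Archer", "state": "UT", "birth_year": 1960, "sex": "F"},
    {"id": 5, "surname": "Baker", "state": "UT", "birth_year": 1960, "sex": "M"},
]


def narrow(people, surname, state, birth_year):
    """Apply each filter in turn and record how many candidates remain."""
    stages = [
        ("male", lambda p: p["sex"] == "M"),  # the Y-chromosome method covers males only
        ("surname", lambda p: p["surname"] == surname),
        ("state", lambda p: p["state"] == state),
        ("birth year", lambda p: p["birth_year"] == birth_year),
    ]
    remaining, counts = list(people), []
    for label, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        counts.append((label, len(remaining)))
    return remaining, counts


candidates, counts = narrow(PEOPLE, "Archer", "UT", 1960)
for label, n in counts:
    print(f"after {label} filter: {n} candidate(s)")
```

Each attribute on its own looks harmless, but every filter shrinks the pool, which is why a surname combined with two seemingly innocuous facts can leave only a handful of candidates.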

How good is the DNA “keyword” match at finding a last name? The researchers projected a success rate of 12% for males (the method relies on the Y chromosome), with a 5% false-positive rate. This is not nearly as accurate as the facial scans, but it is still a cause for concern. They concluded that the risk of this DNA-based last name search will grow in the future, and other scientists and experts are calling for more public discussion.

I decided to check the privacy policy of one of the DNA testing services. Here’s the good news. They’ll only release your DNA data to third parties with your consent; they treat genetic data as personal data (like name and address), and they say that the genetic data is stored on “secure servers”.

However, thinking purely in terms of bytes, folders, and access rights, I’m wondering how truly secure those DNA files are, and whether there are already hackers looking to get that data using the same techniques and exploits they use to snatch credit card numbers and other personally identifiable information.


Humanizing Big Data

January 9, 2013

HUMAN FACE OF BIG DATA
Some App Results

In less than two months, more than 3 million share and compare questions have been answered, in more than 100 countries, through “The Human Face of Big Data” smartphone survey app.

By collating and analyzing these 3 million+ responses, we drew some insightful conclusions about the attitudes and approaches to life of men and women, young and old, all over the world. Here are just a few of the most interesting findings…

In asking the question “What is most important for good health – diet, exercise, environment or genes?” we discovered that Americans are more likely to believe that good health is in their hands, choosing diet and exercise, while Europeans seem to believe their health is predetermined or out of their control, predominantly selecting either genes or environment.

In response to the question “What do you do to help cope with stress most?” we learned that as we get older, work and prayer tend to replace friends or the arts as our primary means of stress relief, indicating that older generations prefer to bury themselves in work or deal with stress on their own, rather than by seeking entertainment or distraction.

When asked “If I could alter the DNA of my unborn child I would improve their: lifespan, intelligence, immunity or appearance,” the findings showed that Americans are most concerned about their children’s education and job prospects, while Europeans worry most about their children’s health, perhaps reflecting the current unemployment rates and standards of available healthcare in these two regions.

While these findings give only a brief snapshot of the world around us, the goal of this app was to encourage people to embrace the subject of big data and to consider its potential to help us shape and change our daily lives. Hundreds of striking examples of ways this is already happening are illustrated in the photographs, infographics and essays within the Human Face of Big Data book.

The anonymous data compiled from the app will be made available to educators, data scientists, researchers and the general public as a research tool for further in-depth sifting and sorting of the results, which may one day be considered an invaluable snapshot of human history.

