Big Data and Privacy: Resist the Urge to Re-identify

On Tuesday, Julie Brill, one of the four FTC commissioners, delivered a keynote address, titled “Reclaim Your Name”, at a data privacy conference. Brill points out that the NSA revelations have helped open a larger, parallel discussion about data privacy in e-commerce. Brill feels that just as we debate how much privacy to sacrifice for national security, as consumers we should also be deciding how much of our personal information should be collected, processed, and shared for the sake of online convenience. You can read her full remarks here.

Brill’s presentation takes up the topic of anonymizing big datasets. It’s an important issue for the FTC as regulators. As companies start sharing datasets with their partners, there may be enough details in the combined data to re-identify consumers.

The best known example of this re-identification process is matching voting records to census data—see our post on Professor Sweeney’s work. In the social media world, there are equivalent examples.

In line with recent FTC guidelines, Brill makes it very clear that it takes more than just removing name, address, and other classic personally identifiable information (PII) to bring big datasets closer to becoming anonymous. She specifically calls out facial images as effectively acting as PII, and there are other quasi-identifiers that have to be accounted for as well.
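To make the quasi-identifier point concrete, here is a minimal sketch of how one might check whether a "de-identified" table still singles people out. The records, the field names (`zip`, `birth_year`, `gender`), and both helper functions are hypothetical and chosen only for illustration; they are not taken from Brill's remarks or from any FTC guidance.

```python
from collections import Counter

# Hypothetical toy records with names and addresses already removed.
records = [
    {"zip": "02138", "birth_year": 1965, "gender": "F"},
    {"zip": "02138", "birth_year": 1965, "gender": "F"},
    {"zip": "02139", "birth_year": 1971, "gender": "M"},
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
]

# Assumed quasi-identifiers: harmless alone, identifying in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def group_counts(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Size of the smallest group; a value of 1 means someone is unique."""
    return min(group_counts(rows, keys).values())


def unique_rows(rows, keys):
    """Rows whose quasi-identifier combination appears exactly once."""
    counts = group_counts(rows, keys)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


print(smallest_group_size(records, QUASI_IDENTIFIERS))
print(unique_rows(records, QUASI_IDENTIFIERS))
```

In this toy table two of the four rows are unique on the three fields, so removing names alone would not have made them anonymous.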

Brill then makes one additional point about regulations, big data, and consumer anonymity that I think deserves more attention.

Essentially admitting that in practice it may be impossible to completely anonymize datasets, she asks companies sharing consumer-based big data to not spend any processing power on re-identification. This voluntary aspect of her request also reflects recent FTC views on data privacy.

The example she gives is of a cellular company sharing its data with a city traffic planning department after stripping out classic identifying information. In this scenario, it would be technically possible to match the geo-location points and timestamps against, say, external social media check-in data to re-identify individuals in the dataset. This hypothetical city department, however, should not take that step.
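To show why this kind of matching takes so little effort, here is a rough sketch of the linkage on made-up data. Everything in it is an assumption for illustration: the device IDs, user names, coordinates, dates, thresholds, and the `link` function are all hypothetical, and a real assessment would use proper geodesic distances rather than raw degree differences. The point is to understand the risk, not to provide a recipe.

```python
from datetime import datetime, timedelta
from math import hypot

# Hypothetical carrier pings: (pseudonymous device ID, time, lat, lon).
pings = [
    ("device-A", datetime(2013, 6, 25, 8, 2), 40.7128, -74.0060),
    ("device-A", datetime(2013, 6, 25, 18, 30), 40.7306, -73.9866),
    ("device-B", datetime(2013, 6, 25, 8, 5), 40.7580, -73.9855),
]

# Hypothetical public check-ins: (user name, time, lat, lon).
checkins = [
    ("alice_example", datetime(2013, 6, 25, 8, 4), 40.7129, -74.0061),
    ("alice_example", datetime(2013, 6, 25, 18, 28), 40.7305, -73.9867),
    ("bob_example", datetime(2013, 6, 25, 12, 0), 40.6892, -74.0445),
]


def link(pings, checkins, max_minutes=5, max_degrees=0.001):
    """Count, per device, how many pings fall near a user's check-in
    in both time and space. Thresholds are arbitrary illustrative values."""
    window = timedelta(minutes=max_minutes)
    matches = {}
    for device, p_time, p_lat, p_lon in pings:
        for user, c_time, c_lat, c_lon in checkins:
            close_in_time = abs(p_time - c_time) <= window
            # Crude distance in degrees; good enough for a toy example.
            close_in_space = hypot(p_lat - c_lat, p_lon - c_lon) <= max_degrees
            if close_in_time and close_in_space:
                per_device = matches.setdefault(device, {})
                per_device[user] = per_device.get(user, 0) + 1
    return matches


print(link(pings, checkins))
```

With these toy inputs, both pings from `device-A` line up with check-ins from a single user, while `device-B` matches no one. Two coincidences in time and place are already suggestive, which is why a dataset with names removed can still point back to a person.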

One of the key audiences for Brill’s remarks is the new breed of data brokers, whose business model is based on re-identifying consumer data. They’ve fallen outside of traditional privacy laws covering the collection of consumer profiles (i.e., the Fair Credit Reporting Act) and have been the target of FTC investigations.

However, her “don’t re-identify” pledge does make great practical and financial sense for everyone else. The goal, of course, is to reduce unnecessary risk from a data breach: by leaving a geo-coded dataset as is, companies would not be making it any easier for potential cyber-thieves to monetize the data.

For my money, that’s the key takeaway from all this: don’t re-identify data, and use data for its intended purpose.

If organizations don’t cooperate, will the FTC step in to enforce new re-identification rules? The types of breaches occurring over the next few years will hold the answer.

