Personally Identifiable Information Hides in Dark Data

May 3, 2013

To my mind, HIPAA has the most sophisticated view of PII of all the US laws on the books. Its working definition encompasses vanilla identifiers: social security and credit card numbers, and all the other usual suspects. With the additional words “reasonable basis to believe that the information can be used to identify the individual”, HIPAA’s definition takes in digital handles such as emails, IP addresses and even facial imagery. But there’s a little more to HIPAA’s PII definition, and it applies specifically to free-form text (commonly found in word processing documents, spreadsheets, presentations, etc.).

The complete list of HIPAA’s PIIs is enumerated in the law’s Safe Harbor guidelines. In plain-speak, these guidelines tell health IT administrators what information is considered private, requiring special authorization to view or process. It includes the aforementioned identifiers, as well as medical record numbers, health insurance IDs, and some others. By the way, we’ve conveniently put this PII list in our omnibus data protection compliance whitepaper.

An unstated assumption made by many is that PII only lives in structured formats, in other words, fields in a database. Readers of this blog of course know that PIIs are also likely to be harvested from the massive amounts of human-generated dark data found on corporate file servers.

The HIPAA regulators have understood this as well. In clarifying the rules for “de-identifying” data, that is, removing PII before publication and general usage, they explicitly cover the possibility that PII can also reside in free-form text. I’ve excerpted the key paragraph from their de-identification best practices below:

PHI [protected health information] may exist in different types of data in a multitude of forms and formats in a covered entity.  This data may reside in highly structured database tables, such as billing records. Yet, it may also be stored in a wide range of documents with less structure and written in natural language, such as discharge summaries, progress notes, and laboratory test interpretations … The de-identification standard makes no distinction between data entered into standardized fields and information entered as free text (i.e., structured and unstructured text)— an identifier listed in the Safe Harbor standard must be removed regardless of its location.

Got that? PHI, which is essentially PII along with other sensitive medical information, embedded in spreadsheets, docs, and presentations is just as worthy of HIPAA privacy protections as fields in databases.
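To make the point concrete, here is a minimal sketch of scanning free-form text for a few of the vanilla identifiers mentioned above. The three patterns, the function name find_identifiers, and the sample note are all assumptions made up for illustration; they cover only a sliver of the Safe Harbor list and are not how any particular product does it.

```python
import re

# Illustrative patterns only (assumed for this sketch); real identifiers
# come in many more formats than these three.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text):
    """Return a sorted list of (kind, matched_text) pairs found in free text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return sorted(hits)


# A made-up sentence of the sort that could sit in a progress note.
note = (
    "Patient reachable at jane.doe@example.com; "
    "SSN on file is 123-45-6789. Portal login from 192.0.2.10."
)
hits = find_identifiers(note)
for kind, value in hits:
    print(kind, value)
```

Pattern matching of this kind only finds identifiers with a recognizable shape. It says nothing about identifying details expressed in ordinary words.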

So if we follow these ideas—PIIs can be anything that reasonably links to an individual, and this data can exist in text—to their logical conclusion, then we need to consider a new possibility. Suppose this sentence from a doctor’s notes were uploaded to a file server:

The patient, a technical content specialist at Varonis, a software company, has been complaining about tennis elbow.

The natural question to ask is whether “technical content specialist at Varonis” is a PII.

It’s not a PII in the sense of a uniquely coded key such as social security number or health insurance ID that links back to a person. But in another sense, it acts very much like PII. Don’t believe me? Try typing that phrase into Google and see what comes up.

We’re really talking more about the meaning of the text—or as experts would say, the semantic value—rather than actual letters, numbers, and other syntax. But HIPAA’s Safe Harbor rule even takes this into account: it specifically notes that the “knowledge” in free text can also be used to point back to a person.

As a practical matter, the HIPAA rules mean that a reference to a patient’s job title and company that points back to the individual has to be removed before the text can be treated as de-identified.

This leads to a broader discussion on what’s called the “semantic web”. In brief, Google and a few others are already doing leading-edge work on extracting meaning and knowledge from web content. You can see for yourself how well Google does this by entering the keywords “height of the empire state building” in a search. You’ll get back an actual answer, 1,454 feet, in addition to all the docs with that exact phrase.

The larger point is that along with stealing PIIs, hackers and cyber thieves are also getting better at mining and interpreting human-generated text for personal details, and then building more convincing fake identities to be used in social attacks, such as phishing and pretexting.

Bottom line: these bits and pieces of personal information that are scattered across file servers in clear-text documents can be used to identify an individual with very high likelihood.
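One rough way to see why harmless-looking details add up: count how many people share each combination of attributes. The sketch below uses made-up records and hypothetical helper names (group_sizes, unique_individuals). It only illustrates the idea and is not a test prescribed by HIPAA.

```python
from collections import Counter


def group_sizes(records, attributes):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def unique_individuals(records, attributes):
    """Return the records whose attribute combination appears exactly once."""
    sizes = group_sizes(records, attributes)
    return [r for r in records if sizes[tuple(r[a] for a in attributes)] == 1]


# Made-up records for illustration only.
people = [
    {"name": "A", "title": "nurse", "employer": "General Hospital"},
    {"name": "B", "title": "nurse", "employer": "General Hospital"},
    {"name": "C", "title": "nurse", "employer": "City Clinic"},
    {"name": "D", "title": "accountant", "employer": "City Clinic"},
    {"name": "E", "title": "accountant", "employer": "General Hospital"},
]

# Job title alone singles out nobody in this toy set.
by_title = unique_individuals(people, ["title"])
# Job title plus employer singles out three of the five people.
by_both = unique_individuals(people, ["title", "employer"])
print(len(by_title), [r["name"] for r in by_both])
```

Neither field is a coded key, yet in this toy set the pair behaves like one for most of the records.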

That’s important to keep in mind when someone in your company asks, “do we know what’s in our files and the risks involved if our servers are breached?”


Defensible Disposal with Automation

September 13, 2012

It’s no secret that the data on corporate servers is growing exponentially. Documents, presentations, media, spreadsheets, and other files are constantly being created and moved onto servers, and after a while, most of it is rarely used, if at all. However, much of this stale data also must be retained in order to comply with regulatory requirements, or to maintain business continuity.

Many IT departments are faced with the reality of having to either continually expand their storage infrastructure or try to accurately determine which data can be safely disposed of. The first option is costly and results in basically paying for information you’ll never use, while the second can be costly in terms of man-hours and brainpower, especially without an automated process in place.

Let’s examine the options a bit closer.

Do Nothing

While it seems like a simpler solution to keep expanding your hardware and try to hold onto every bit just in case it is needed some time in the future, this sort of inaction with regard to defensible disposal is simply not a viable option. Allowing vast amounts of data to accumulate will make it increasingly difficult for users to find relevant data, slow down e-discovery, cause servers to perform poorly, and possibly even crash them, costing your business precious time and money.

Do Anything

Taking the wrong action can be just as damaging. Deleting your CEO’s old email archive might result in a very uncomfortable conversation; disposing of files that you are legally obligated to retain (for HIPAA, HITECH, SOX, etc.) can cost people their jobs, and possibly result in legal action. That’s something no IT professional ever wants to have to deal with.

Do the Right Thing

It should be clear by now exactly why proper defensible disposal techniques are integral to the survival of any business, especially those with sensitive data. Proper disposal techniques can save money and time by streamlining the process of deleting useless data and allowing admins to focus on other, more pressing needs.

If you’re finding the process itself takes quite a bit of planning and/or some sophisticated technology to do most of the heavy lifting, consider automating with technology like the Varonis Data Transport Engine. Varonis DTE simplifies the process of defensible disposal by leveraging our Metadata Framework, allowing admins to automatically and continually delete or migrate data based on a wide array of criteria, such as the content of the file or the date it was last accessed by a human user. This ensures that information that needs to be retained isn’t disposed of by accident and the data that can be safely deleted proceeds safely to bit-heaven, or bit bucket, or /dev/null.
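To give a feel for what rule-based disposal involves, here is a minimal sketch that sorts stale files into "retain" and "dispose" lists using the two kinds of criteria mentioned above: last access time and file content. This is not the Varonis DTE API. The function name plan_disposal, the idle threshold, and the retention keywords are all assumptions for illustration, and the sketch only produces a plan rather than deleting anything.

```python
import os
import time


def plan_disposal(root, max_idle_days, retain_keywords, now=None):
    """Split stale files under root into (to_retain, to_dispose) path lists.

    A file is treated as stale when its last-access time is older than
    max_idle_days. Stale files that mention any retention keyword are kept.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    to_retain, to_dispose = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Check the access time before opening the file, since reading
            # it may itself update the timestamp on some systems.
            if os.stat(path).st_atime >= cutoff:
                continue  # accessed recently: not stale, leave it alone
            with open(path, "r", errors="ignore") as handle:
                text = handle.read().lower()
            if any(word.lower() in text for word in retain_keywords):
                to_retain.append(path)
            else:
                to_dispose.append(path)
    return sorted(to_retain), sorted(to_dispose)
```

A call such as plan_disposal("/srv/share", 365, ["patient", "invoice"]) (the path and keywords are placeholders) returns the two lists for review. Last-access timestamps are not always reliable, since some filesystems are mounted with access-time updates disabled and the timestamp does not distinguish a human user from a backup job or scanner, so a real process would need a more trustworthy record of who touched what.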


The Healthcare Market Opportunity

August 31, 2012

Over the past 6 months there have been a number of data breaches within the healthcare market. Data security breaches cost the U.S. healthcare industry about $6.5 billion a year [1], and even with these breaches widely recognized, 50% of those responding to RedSpin (an IT security audit firm) say nothing is being done to protect data [2]. The healthcare market therefore represents a huge opportunity for managed service providers to provide cloud backup and recovery services to address this growing issue.

Market Opportunities Abound

With the size and frequency of data breaches alarming the healthcare industry, now is the time to capitalize on these security concerns by stressing the benefits that cloud backup services offer, both in keeping records secure and in ensuring Health Insurance Portability and Accountability Act (HIPAA) compliance. More than 19 million individuals have been affected by major healthcare information breaches since September 2009, and data breaches from unencrypted devices increased 525% in 2011. This represents a huge market opportunity for managed service providers already selling services into the healthcare market, or those looking to do so. Not every managed service provider can ensure adequate data protection for a healthcare clinic or hospital, so make sure you can speak their vernacular and understand all the compliance requirements and regulations involved. If you offer, or plan to offer, cloud backup services to the healthcare market, you need a HIPAA-compliant cloud backup platform in place, with FIPS 140-2 certification being a huge bonus.

Why Should Healthcare Clinics/Hospitals Invest in Cloud Backup Services from Managed Service Providers to Protect Patient Privacy?

Investing in cloud backup services ensures a secure backup system for healthcare clinics/hospitals, whether or not BYOD is prevalent, since not all backup products can protect endpoint devices such as laptops, tablets and smartphones. Investing in newer technologies improves the reliability and speed of recovery for patient data should there be a disaster. It also minimizes the risk of data theft or loss by using strong encryption, so that data is encrypted in flight and at rest and only the healthcare clinic/hospital has the ability to decrypt it. Finally, it eliminates the shortcomings of tape backup: expense, vulnerability to obsolescence, and the potential inability to recover data because a tape fails or is lost or stolen when transported off-site.

If you’re interested in learning more about how to invest in cloud backup services, please visit www.c24.co.uk

