One of the goals of the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 was to spur adoption of electronic record keeping for what has been a paper-intensive sector of the economy. Realizing that the transition to digital data could lead to larger and more serious breach incidents, regulators at Health and Human Services came up with the Breach Notification Rule: healthcare organizations and their business associates are required to contact HHS when there’s an exposure of unencrypted health data involving more than 500 individuals.
This breach data is also publicly available, and so I decided to take a peek. It’s clear from the stats that the healthcare industry, although relatively new to computerized record-keeping, is also experiencing significant breaches involving its human-generated unstructured content, or dark data.
Since about 2010, HHS has received over 600 breach notifications covering almost 22.1 million health records. I mined this data to create a simple chart of the top five sources of breached data, which together account for about 85% of all records taken. The breach categories, by the way, come from the self-reported descriptions and other incident notes. These are not always clearly stated, so I made some judgment calls.
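For readers who want to run a similar tally on the public breach list, here is a minimal Python sketch of the aggregation step. It assumes each incident has already been assigned a category by hand and reduced to a (category, records affected) pair. The function name `top_sources`, the category labels, and the numbers are all made up for illustration; they are not the HHS figures or the actual method behind the chart.

```python
from collections import Counter


def top_sources(incidents, n=5):
    """Sum affected records per category and return the n largest
    categories along with their combined share of all records."""
    totals = Counter()
    for category, records in incidents:
        totals[category] += records

    top = totals.most_common(n)
    grand_total = sum(totals.values())
    # Guard against an empty input list.
    share = sum(count for _, count in top) / grand_total if grand_total else 0.0
    return top, share


# Hypothetical toy data, NOT taken from the HHS breach list.
sample = [
    ("Laptop", 300),
    ("Backup", 250),
    ("Laptop", 200),
    ("Network server", 100),
    ("Email", 50),
    ("Paper", 40),
    ("Other", 60),
]

top, share = top_sources(sample, n=3)
for category, count in top:
    print(f"{category}: {count}")
print(f"Top categories cover {share:.0%} of all records")
```

The hard part in practice is not the arithmetic but the categorization: the self-reported descriptions have to be mapped to categories before a tally like this means anything.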
Keep in mind that for a medical breach to be reported, the data has to be unencrypted protected health information, or PHI: essentially, personally identifiable information such as names, Social Security numbers, and medical insurance numbers. If we exclude the Backup and Other categories, we can be fairly sure that the remaining nine million exposed records contained dark data. Downloaded in clear-text form from centralized medical information databases, this dark medical data typically finds a home in loosely permissioned folders. From there, it is either directly hacked or accidentally exposed, or it is transferred to laptops and other portable devices, such as USB drives, that are ultimately lost or stolen.
Another source of breached data has been misplaced backup tapes or CDs, which seem to be a significant problem for healthcare data processors. There’s even one incident, accounting for most of the Other category, in which a physical server drive containing 1.9 million patient records was stolen. In all these cases, the data taken was structured—i.e., formatted records. But since the PHI wasn’t encrypted, it wouldn’t take much work for a hacker to zero in and parse out account numbers, names, addresses, and other identifiers.
Bottom line: as far as determined medical data thieves are concerned, it’s better to think of even this structured PHI as simply badly formatted but target-rich spreadsheets.