After posting our IT predictions for next year, we decided to assign ourselves an even more challenging task. Using recent headlines from the tech press as a baseline, we tried to extrapolate ahead to the year 2025. Where might today’s stories about technology and privacy lead in ten years if we don’t change how we manage IT security today?

In 2014, we saw many ideas more at home in sci-fi movies and novels become an everyday reality—Star Trek-like replicators in the form of 3D printers, James Bond-ish smart cars, and advanced machine intelligence courtesy of IBM’s Watson. Hold these thoughts as we now present privacy- and security-related news items from the future, along with the questions raised by these emerging threats from our own time in 2014.

Any parallels to Orwell’s 1984 are (we hope) purely coincidental.

Hackers Use 3D-Printed Eyeball to Fool Retinal Scanner

2014: Many data points were created when President Obama sat for the Smithsonian’s 3-D printed portrait bust. Whether it’s the president’s or just an ordinary citizen’s biometrics, who should have access to the data points of heads, arms, fingers, retinas, etc.?

2025: Interpol’s Cyber Security Division yesterday arrested a gang of biometric cyber thieves. They were caught using an eerily life-like plastic eyeball encased in a super-clear glass block. The thieves had previously hacked into idVault, one of the world’s largest data brokers, and 3D rendered the physical eye structure from stored retinal digital signatures …

Cyber Carjacking Ring Foiled

2014: Automakers know how you roll, but how will they use, store and protect the data collected from our increasingly smart vehicles?

2025: Working from a high-rise office building in Los Angeles, a ring of hackers had been stealing cars remotely by exploiting a new vulnerability found in automakers’ Microsoft-based telemetric controls. After owners parked their self-driving vehicles, the thieves used bots to crawl the IOE (Internet of Everything), insert special code into the navigation module, and then drive the cars to a special garage owned by the hackers. Police say they had never seen …

Data Broker idVault Sued

2014: Personalization has simplified how we locate products and services. With highly targeted advertising and content selection, are we as consumers being secretly penalized and denied access to an alternative world of ideas and options?

2025: idVault, one of the world’s largest personal information brokers, was sued in federal court yesterday. This is the largest class action ever brought against a data broker. The suit came about when consumers in several states noticed sudden rises in their auto insurance and credit card rates soon after they had installed a free children’s game app in their car’s operating system. The app was secretly sending GPS and other navigation data to idVault, which was then selling the data to financial companies …

Cell Phone Hackers Caught Impersonating Bank

2014: In today’s cellular networks, how can we ensure that we are not being monitored by third parties (private and governmental)?

2025: With the cost of cell phone transmission electronics having plummeted over the last few years, 5G equipment is now within reach of ordinary citizens. Besides the new wave of private pop-up cell phone carriers offering free streaming video, hackers have also gotten into the cell phone business. Recently a hacker collective was caught using its own pirate cell phone tower to intercept calls. Their software filtered out connections to banks and brokerage houses, handing off the rest to Verizon. The FBI said the hackers appeared to callers as personal bankers …

Clothes and 3D Masks Make the Hacker

2014: With the help of 3D printers and the ability to render various images when shopping, how can we realistically authenticate ourselves for even the most basic services?

2025: The smart mirror technology has improved greatly since department stores began using it in their dressing rooms a few years ago. These special mirrors now allow store customers to view inventory, select clothes, and then render images of the shopper in different virtual outfits. However, hackers were found to have penetrated one high-end department store’s firewall, stealing images and data about its customers from the embedded file servers in the smart mirrors. Using 3D printers, they generated realistic masks, and then dressed in outfits similar to their victims’. Police say they almost got away with opening an enormous credit line …




I had the chance to talk with cyber security expert Justin Cappos last month about the recent breaches in the retail sector. Cappos is an Assistant Professor of Computer Science at NYU Polytechnic School of Engineering. He’s well known for his work on Stork, a software installation utility for cloud environments.

In our discussion, Professor Cappos had a lot to say about weaknesses in our current approach to password-based security as well as new technologies that can be applied to credit card transactions. He’s worked on his own password hash protection algorithm, known as PolyPasswordHasher, which would make it very difficult for hackers to perform dictionary-style attacks. Cappos offers some very practical advice on securing systems.

Metadata Era: It looks like Backoff malware was implicated in the Staples attack. Though we don’t know too much about the exploit, if it’s like other recent attacks, the hackers found it relatively easy to enter the system through phishing mail, guessed passwords, or perhaps injection attacks.

Justin Cappos: I did look around for this information, and I see a lot of people reporting, but I don’t see anybody specifically saying or speculating that perhaps it’s similar to Target or some of these breaches. Nothing concrete yet.

That’s not to say there isn’t anything a company can do to protect infrastructure—for example, to harden things, to train users not to open phishing mails, and to have people choose reasonable passwords, especially on sensitive systems. The problem with any of these defenses is that the attacker only has to succeed once.

Once they get in, typically they can move around and get access to other things. So businesses need to do a few different things to try to protect themselves effectively, some of which they may already be doing, but there needs to be a strong emphasis on compartmentalization.

You mean …

So the person who does PR for the organization doesn’t, say, have direct access to financial records.

Also, it’s extremely important to have good network monitoring. You need a way to detect whether data is moving off your servers and whether it’s going to places you wouldn’t expect. Look for things like, for example, an HVAC subcontractor who occasionally accesses the corporate network but has now suddenly been found to be hoovering up data. That should be a red flag!
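The kind of monitoring Cappos describes can be sketched in a few lines. This is a hypothetical illustration, not a real monitoring product: it flags any account whose outbound traffic suddenly dwarfs its historical baseline, like the HVAC subcontractor example. The account names and numbers are invented.

```python
def find_exfiltration_suspects(history, today, threshold=10.0):
    """history: {account: [daily outbound bytes]}; today: {account: bytes}.
    Returns accounts whose traffic today exceeds `threshold` times their
    historical daily average."""
    suspects = []
    for account, bytes_today in today.items():
        past = history.get(account, [])
        baseline = sum(past) / len(past) if past else 0
        if baseline and bytes_today > threshold * baseline:
            suspects.append(account)
    return suspects

history = {
    "hvac-vendor": [20_000, 15_000, 25_000],            # ~20 KB/day baseline
    "pr-team":     [5_000_000, 4_500_000, 5_500_000],   # ~5 MB/day baseline
}
today = {"hvac-vendor": 2_000_000_000, "pr-team": 5_200_000}  # 2 GB spike

print(find_exfiltration_suspects(history, today))  # ['hvac-vendor']
```

A real deployment would of course use rolling windows and smarter statistics, but the red-flag logic is the same: compare current behavior against an established baseline.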

So once they’re in through phishing or injection, they have the credentials of an existing user, and as you pointed out, you have to start monitoring for unusual behaviors. This internal monitoring function becomes very important, although it’s not necessarily something that companies focus their resources on.

Exactly. So imagine a quarantine. If you were to quarantine something like a thousand people, you wouldn’t put them all in the same big area, where they’d all interact. Ideally you’d want to isolate them.

At a minimum, you want to cut down on interactions. So when you do a data analysis in your organization, you want to keep track of how these isolated pockets are able to communicate and look for suspicious patterns and behaviors.

How can this be done—is this part of your research?

Not specifically for me. But it is a good best practice for lots of different organizations. So the military and government use this compartmentalization approach. As do banks. They will segment information off and, in some cases, have isolated networks that are not even connected to the Internet. It really depends on the sensitivity of the data and how it will impact the working style of the people.

So you’re really talking about a data governance function, in terms of what is more valuable and what requires more restrictive permissions.

I consult with lots of startups. And one of the first things I do is I say, “Tell me your worst nightmare about somebody breaking in and stealing something. What is that thing?”

For some companies, it’s data about their customers, for some it’s information about an algorithm. It varies a lot depending on the monetization strategy and what the secret sauce of the organization is.

You want to find that thing, and for larger companies, it’s probably many things, and isolate them as much as possible so it’s as hard as possible for an attacker to get that information.

Sometimes it means separating functionality out across multiple servers. So for instance, if your password data is one of the most sensitive things your organization has, you can very easily have a separate server whose only function is to handle password requests, and it does this through a custom protocol that your company wrote.

You would monitor the network and if it got anything other than a password request and returned anything other than a “yes or no”, then you would know immediately that something has happened.

That takes time and energy, and you have to implement something a little different to make that happen. But if you’re going to protect a really valuable asset, you should do this!
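A minimal sketch of the dedicated password server Cappos describes, assuming a tiny custom protocol of our own invention (“CHECK <user> <password>”, answered only with “yes” or “no”). The account name, password, and protocol are all illustrative; anything other than a well-formed check is exactly the kind of traffic you would alert on.

```python
import hashlib
import hmac
import os

def _hash(password, salt):
    # Salted, slow hash (PBKDF2) so stolen hashes resist dictionary attacks.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Toy credential store: user -> (salt, PBKDF2 hash).
_salt = os.urandom(16)
CREDENTIALS = {"alice": (_salt, _hash("correct horse battery staple", _salt))}

def handle_request(message, alert=print):
    """The server's entire job: answer "yes" or "no" to CHECK requests.
    Anything else is a red flag worth alerting on."""
    parts = message.split(" ", 2)
    if len(parts) != 3 or parts[0] != "CHECK":
        alert(f"ALERT: unexpected traffic on password server: {message!r}")
        return "no"
    _, user, password = parts
    salt, stored = CREDENTIALS.get(user, (b"", b""))
    ok = stored and hmac.compare_digest(_hash(password, salt), stored)
    return "yes" if ok else "no"

print(handle_request("CHECK alice correct horse battery staple"))  # yes
print(handle_request("GET /etc/passwd"))  # triggers an alert, answers no
```

Because the server only ever says “yes” or “no”, any other response or request pattern on that network segment signals a compromise, which is the monitoring payoff Cappos points to.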

And if you don’t spend the time and effort, for say your legacy systems, what would you recommend?

For legacy systems, there’s certainly never an excuse not to follow best practices. They absolutely should be using salting and hashing of passwords, if not something stronger, such as hardware-based authentication or PolyPasswordHasher. They need to be using strong protections for user passwords and data.

They need to be encrypting credit card information. If they’re not really in the security business, they really shouldn’t be storing credit card information; they should consider working with a third-party payment processor that will make it so they effectively only have tokens on their server instead of raw credit card data. In many cases they can outsource the risk and security concerns of storing credit card information.
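The tokenization idea can be sketched as follows. This is a hypothetical illustration of the concept, not a real payment processor API; the class and method names are invented. The point is that the merchant’s database holds only an opaque token, while the mapping back to the real card number lives solely with the processor.

```python
import secrets

class PaymentProcessor:
    """Hypothetical processor: holds the token -> card-number vault off-site."""

    def __init__(self):
        self._vault = {}  # token -> real card number (PAN)

    def tokenize(self, pan):
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = pan
        return token

    def charge(self, token, amount_cents):
        # The processor resolves the token and charges the real card.
        pan = self._vault.get(token)
        return pan is not None

processor = PaymentProcessor()
token = processor.tokenize("4111111111111111")

# The merchant stores only the token. A thief who dumps this database
# gets nothing usable outside this one processor relationship.
merchant_db = {"order-1001": token}
print(processor.charge(merchant_db["order-1001"], 4999))  # True
```

Stealing `merchant_db` yields no card numbers, which is exactly the “outsource the risk” property Cappos describes.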

Sure, for some companies it would make sense to outsource to payment processors. But clearly the big box retailers are doing their processing in house.

You mentioned multi-factor authentication. In theory that would have made some of the attacks we’ve seen over the last year much more difficult. Is that a fair statement?

It is. It’s not a panacea—it doesn’t solve all problems. It raises the bar for simple password attacks. It doesn’t necessarily stop people from getting in other ways—SQL injection and other vulnerabilities. Two-factor authentication will not help in that context.

Another way it often does help is to prevent the spread. So if you have a sensitive server that users have to log into with two-factor authentication, even if the attacker figures out the password for users on that server, they will be unable to get in without the second factor. That can sometimes contain the attack.

Security is almost never about perfect solutions. It’s pretty much about making it harder for the hackers, buying yourself some time, and making it difficult enough that you’re no longer a good target.

Right, so it becomes too much of an investment for them and the attackers will move on to an easier victim.

In our blog, we’ve been focused lately on the flaws in authentication systems, mostly as a result of SSO, or Single Sign-On, which distributes the hash of the password throughout a system. We’ve written about Pass the Hash, wherein once attackers get the password hash they can essentially become that user. Any recommendations for this authentication problem, and are there longer-term solutions?

Sure. There are three things to know about in this area.

The first is that if your organization has a good password policy and makes users choose passwords that have a reasonable degree of randomness, then breaking those passwords—through, say, dictionary attacks—is still implausible. What really happens is that if you get those hashes and the passwords behind them are not amazingly well chosen, then one can break them. If they are very strong passwords—like 8 characters, randomly chosen and not from a dictionary—those are pretty strong.

If you’re trying to generate passwords as a human, there are tricks you can do where you pick four dictionary words at random and then create a story where the words interrelate. It’s called the “correct horse battery staple” method! [Yeah, we know about it!]
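A minimal sketch of that method using Python’s `secrets` module. The word list here is just illustrative; a real deployment would use a full Diceware-style list of about 7,776 words, where four random words give roughly 51.7 bits of entropy.

```python
import math
import secrets

def passphrase(wordlist, n_words=4):
    # secrets.choice gives cryptographically sound randomness, unlike random.choice.
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

# Tiny illustrative list; use a real ~7,776-word Diceware list in practice.
WORDS = ["correct", "horse", "battery", "staple", "orbit", "velvet",
         "planet", "marble", "copper", "lantern", "meadow", "anchor"]

print(passphrase(WORDS))
print(f"entropy with 7776 words: {4 * math.log2(7776):.1f} bits")  # 51.7
```

The story-building step Cappos mentions is for the human; the entropy comes entirely from the uniform random word choices, so the words must be picked by the machine, not by you.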

Strong passwords do help a lot. Organizations should be encouraging their users to choose strong passwords. I think that—many experts believe—requiring users to frequently change passwords, say, every three or six months, does much more harm than good. Because users get frustrated by this and are more likely to forget their passwords, they choose passwords that somewhat fit the criteria but are easy enough to remember. I wish organizations would do away with this policy, and instead have users choose a good, strong initial password. That would dramatically increase the time it takes for hackers to crack the passwords.

By the way, should we be relying on those password strength meters?

Unfortunately, password strength meters can be fooled—you can give one a poor password that it thinks is a good password. Take it with a grain of salt!
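To see how a meter gets fooled, consider a naive strength meter of the kind Cappos warns about (hypothetical, but typical of the breed): it scores only length and character-class variety. It rates “P@ssw0rd!” highly even though that string appears on every cracking wordlist, while a far higher-entropy passphrase scores lower.

```python
import string

def naive_strength(password):
    """Toy meter: points for length (capped) plus character-class variety.
    Scores range 0..24; a real meter like this might show >= 18 as "strong"."""
    classes = [string.ascii_lowercase, string.ascii_uppercase,
               string.digits, string.punctuation]
    variety = sum(any(c in cls for c in password) for cls in classes)
    return min(len(password), 12) + 3 * variety

print(naive_strength("P@ssw0rd!"))                    # 21: "strong", cracked instantly
print(naive_strength("orbit velvet marble lantern"))  # 15: scores lower, far more entropy
```

The fix Cappos suggests follows directly: check candidate passwords against published lists of commonly used passwords rather than trusting a variety score alone.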

There are lists out there of commonly used passwords—even ones that use upper and lower case with symbols—and organizations should make really sure that users are not choosing anything on the popular password lists. They should actively block those passwords.

That’s the first thing—focus on passwords.

The second is that organizations like Microsoft should really be spending more time designing and improving the security of their systems with respect to password storage. The threat model and landscape have really changed in the last few years: hackers are much more aggressively going after password databases.

So I would like to see much better support from operating system vendors for things like hardware protection of passwords. I’d like to see some of the new techniques for password protection—like PolyPasswordHasher and other things like this—integrated more broadly. Anything that will slow or stop attackers.

Microsoft, by and large, has very good security—they have an excellent security team. I would just love to see them have a focus in this area, and do this in a realistic way and even provide patches for older versions, which companies like banks are still using.

Can you tell us more about PolyPasswordHasher?

It’s a password storage and protection scheme. It’s actually something developed by me and one of my students. It makes it so you have to crack multiple passwords in a database simultaneously to know if any of them is correct. That makes it much harder for hackers to crack passwords from the hashes. It’s simple to deploy—it’s a software change on the server—and it makes things exponentially harder. It’s open-source and free, available for different frameworks.
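To illustrate the “crack multiple passwords simultaneously” property, here is a heavily simplified two-account sketch of our own. The real PolyPasswordHasher uses Shamir secret sharing across many accounts; this toy version only shows the entanglement idea: each stored hash is XORed with a share of a server secret, so a thief with the database cannot test guesses one account at a time.

```python
import hashlib
import os

def H(salt, pw):
    return hashlib.sha256(salt + pw.encode()).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Setup: split a random secret into two shares, one per account.
share1 = os.urandom(32)
share2 = os.urandom(32)
secret = xor(share1, share2)
secret_check = hashlib.sha256(secret).digest()  # stored on disk

salt1, salt2 = os.urandom(16), os.urandom(16)
db = {  # what a thief would steal: salts plus entangled hashes
    "alice": (salt1, xor(H(salt1, "hunter2"), share1)),
    "bob":   (salt2, xor(H(salt2, "swordfish"), share2)),
}

def guesses_verify(guess_alice, guess_bob):
    # Each correct guess recovers that account's share; only the combined
    # shares can be checked against the secret. No per-account feedback.
    s1 = xor(db["alice"][1], H(db["alice"][0], guess_alice))
    s2 = xor(db["bob"][1], H(db["bob"][0], guess_bob))
    return hashlib.sha256(xor(s1, s2)).digest() == secret_check

print(guesses_verify("hunter2", "swordfish"))  # True: both right
print(guesses_verify("hunter2", "wrong"))      # False: no partial feedback
```

A dictionary attacker must guess correctly across accounts at once, which is what makes the search space grow exponentially rather than linearly in the number of accounts.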

And the third part of your recommendations?

There’s something called EMV, which is a standard way to handle credit card transactions that’s commonly used everywhere but the United States.

So there’s a chip on an EMV-based card that protects information—a tiny security computer if you will. If you use an EMV card at a terminal, then all you’re doing is authorizing a transaction—you’re not giving away any card information. But if you swipe a magnetic-stripe card—like what we use in the US—they really have all the information. The nice thing about EMV cards is that you have to steal the physical card to take advantage of it. The bar is much higher.

What information does the EMV chip give?

A way of thinking about it is that the magnetic stripe technology is almost like giving someone your wallet. Basically, every time you hand someone a credit card or credit card number, you give them the ability to make transactions on your behalf. With EMV, you’re not giving the ability to make transactions in the future, you’re giving an authorization for the current transaction—almost like a ticket for a movie. You can’t reuse it.
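The movie-ticket property can be sketched with a keyed one-time cryptogram. To be clear, this is a loose illustration of the concept, not the actual EMV protocol, and all class and field names are invented: the chip signs the transaction details plus a counter with a key that never leaves the card, and the issuer rejects any replayed counter.

```python
import hashlib
import hmac
import os

class ChipCard:
    def __init__(self):
        self._key = os.urandom(32)   # never leaves the "chip"
        self._counter = 0
        self.issuer_key = self._key  # the issuer holds a copy at enrollment

    def authorize(self, amount_cents, merchant):
        # One-time cryptogram over this transaction's details plus a counter.
        self._counter += 1
        msg = f"{amount_cents}:{merchant}:{self._counter}".encode()
        return self._counter, hmac.new(self._key, msg, hashlib.sha256).hexdigest()

class Issuer:
    def __init__(self, key):
        self._key = key
        self._seen = set()

    def verify(self, amount_cents, merchant, counter, cryptogram):
        if counter in self._seen:
            return False  # replayed "ticket"
        msg = f"{amount_cents}:{merchant}:{counter}".encode()
        ok = hmac.compare_digest(
            hmac.new(self._key, msg, hashlib.sha256).hexdigest(), cryptogram)
        if ok:
            self._seen.add(counter)
        return ok

card = ChipCard()
issuer = Issuer(card.issuer_key)
counter, cryptogram = card.authorize(4999, "store-42")
print(issuer.verify(4999, "store-42", counter, cryptogram))  # True
print(issuer.verify(4999, "store-42", counter, cryptogram))  # False: replay rejected
```

A merchant breach that captures the cryptogram captures only a used ticket: it can neither be replayed nor turned back into card data.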

Ah, so you use it once and it can’t be replayed in an attack.

Exactly.

If the EMV solution becomes widespread, would that prevent the retailer attacks from succeeding—there wouldn’t be anything the attackers could use again?

No security is perfect, but EMV makes it much harder. It’s not impossible, though the amount of work you’d have to do is substantial. I wouldn’t anticipate we’d see millions of credit cards stolen. It’s not a panacea, but it works well.

EMV raises the barriers and eliminates the easy hacks, which is essentially what we’ve been seeing over the last year—retail hacks that required very basic techniques.

Yes, it would no longer be a problem of hackers stealing files and then moving them at their leisure. Instead they would have to make real-time, live changes to the transactions. EMV is not perfect, but it makes it harder. And oftentimes in security, harder is enough.

That’s a good way to end this. Thanks Professor Cappos.

Thank you!



A legal hold is a written directive issued by attorneys instructing clients to preserve relevant evidence – such as paper documents and electronically stored information – in an anticipated litigation, audit, or government investigation. However, as businesses increasingly store data in electronic formats, it’s becoming ever more important to be able to manage, preserve, classify, and search electronically stored information (ESI).

A legal hold includes the following steps:

  • Issuing a written hold notice
  • Identifying the right stakeholders
  • Coordinating data identification and preservation
  • Monitoring the implementation of the hold

Who Needs to Comply

Any organization that can potentially come under litigation should educate employees on the company’s legal hold policy as well as how to respond to any legal hold notice they may receive. When a legal hold is issued, attorneys should ascertain that the recipients listed in the legal hold understand their responsibilities. Also, working within the organization’s legal framework, attorneys and the IT Department will take all appropriate steps to retain and preserve ESI.

Risks in Non-compliance

When evidence is destroyed, lost, or altered, the ramifications can be severe, as it becomes virtually impossible to prove or defend a case. An organization’s failure to prevent spoliation of evidence can result in court-ordered sanctions as well as fines, especially if ESI is found to have been destroyed because a legal hold was not effectively carried out.

Below are consequences and regulations set forth by each association and regulating party.

Title 18 of the United States Code

Under Title 18 of the United States Code, the individual responsible may be fined and/or face jail time.

“Whoever knowingly alters, destroys, mutilates, conceals, covers up, falsifies, or makes a false entry in any record, document or tangible object with the intent to impede, obstruct, or influence the investigation or proper administration of any matter within the jurisdiction of any department or agency of the United States or any case filed under title 11, or in relation to or contemplation of any such matter or case, shall be fined under this title, imprisoned not more than 20 years, or both.” 18 U.S.C. Sec. 1519.

Federal Rules of Civil Procedure

Under Federal Rules of Civil Procedure Rule 37 possible sanctions are as follows:

  • dismissal of the wrongdoer’s claim
  • entering judgment against the wrongdoer
  • imposing fines on the wrongdoer

How Varonis can help with Legal Hold

1. Finding Evidence

DatAnswers maintains an index so that files containing specific terms can be found at any time.

The Varonis IDU Classification Framework is a data classification engine that can incrementally scan file servers and intranets for documents based on a multitude of criteria: keywords, patterns, date created, date last accessed, date modified, user access, owner, and many more, making it possible for IT to find and preserve relevant evidence.

The IDU Classification Framework is efficient and performs true incremental scans, knowing exactly which files have been modified and require rescanning without checking every single location.

The IDU Classification Framework is an automated classification engine. It does not rely on users to manually flag or tag data (though that is possible). It classifies data across multiple platforms (Windows, NAS, SharePoint, etc.).

Also critical to preserving evidence, DatAdvantage can identify and locate all ESI, show which users and groups have access, and provide an audit trail on all ESI, such as when a file, directory services object, or email was opened, edited, deleted, etc.

2. Holding Evidence

Once relevant evidence has been found by the IDU Classification Framework, the Varonis Data Transport Engine can automatically migrate or copy documents into a secure location designated for legal hold where the files cannot be modified or deleted.


During a recent visit to Brazil, I encountered many customers and partners who faced a similar challenge – providing their clients with a safe, secure and genuinely easy way to share files and collaborate with data.  All faced a number of barriers and none were happy with the current offerings of cloud based file sharing solutions.  Generally speaking:

  • All required a secure way to share files with internal and external people – partners, vendors and employees
  • All tried to block access to file sharing sites, and no one thought they were successful in doing so
  • All were concerned about the additional resource requirements to manage and control cloud file shares
  • Many wanted the same user experience and processes for internal and external collaboration
  • Not one had a plan to fulfill these requirements
  • All were required by the business areas to provide a solution in the near term

The following 5 criteria summarize their requirements, which are not currently fulfilled by cloud based file sharing solutions:

1. Ongoing guarantee of rightful access

Customers clearly state that the security of cloud based file sharing solutions is a primary concern.  They require a comprehensive audit trail of all usage activity, the ability to ensure permissions are granted and revoked at the appropriate times by the appropriate people, and the ability to develop different profiles for different data and people based on data sensitivity, customer location, and role.

2. Ability to leverage existing infrastructure and processes

Customers want to leverage their existing infrastructure and processes instead of purchasing a new solution, and have no wish to reinvent their processes for managing data on a third-party cloud solution. Customers have processes and applications to perform backup, archival, provisioning and management of existing infrastructure, and they are confused about how to perform these functions within a cloud-based file sharing solution.

3. Ensuring Reliability with Accountability

IT organizations have defined service levels for their internal clients,  and are accountable for the delivery of each service. If they don’t deliver, there is no question about whose responsibility it is.  Service levels associated with cloud based file sharing must be negotiated like other third party services – there are typically few guarantees of performance and remedies for non-performance are limited.

4. Providing an intuitively simple user experience

Regardless of the solution, IT Managers are very concerned about a new user experience for their clients. Most indicate that a different user experience will require training, impact the number of calls for support, and reduce productivity at least temporarily. Ultimately, IT Managers would like to leverage the user experience that their user population has already mastered.

5. Predictable expense

Typical cloud based file sharing solutions are priced based on the amount of storage, and storage requirements often grow at a surprising rate. Customers may need to negotiate storage costs with cloud providers on an ongoing basis.


To get a sense of where the PCI Data Security Standard (DSS) is heading, it helps to take a look beyond the actual language in the requirements. In August, PCI published a DSS 3.0 best practices document that provided additional context for the 12 DSS requirements and their almost 300 sub-controls. It’s well worth looking at. The key point is that PCI compliance is not a project you do once a year just for the official assessments.

The best practice is for DSS compliance to be a continual process: the controls should be well-integrated into daily IT operations and they should be monitored.

Hold that thought.

Clear and Present Dangers

One criticism of DSS is that it doesn’t take into account real-world threats. There’s some truth to this, though the standard has addressed the most common threats at least since version 2.0—these are the injection-style attacks we’ve written about.

In Requirement 6, “develop and maintain secure systems and applications,” there are sub-controls devoted to SQL and OS injection (6.5.1), buffer overflows (6.5.2), cross-site scripting (6.5.7), and cryptographic storage vulnerabilities (6.5.3)—think Pass the Hash. By my count, they’ve covered all the major bases—with one exception, which I’ll get to below.

The deeper problems are that these checks aren’t done on a more regular basis—as part of “business as usual”—and the official standard is not clear about what constitutes an adequate sample size when testing.

While it’s a PCI best practice to perform automated scanning for vulnerabilities and try to cover every port, file, URL, etc., it may not be practical in many scenarios, especially for large enterprises. Companies will then have to conduct a more selective testing regimen.

If you can’t test it all, then what constitutes an adequate sample?

This question is taken up in some detail in the PCI best practices. The answer they give is that the “samples must be sufficiently large to provide assurance that controls are implemented as expected.” Fair enough.

The other criterion that’s supposed to inform the sampling decision is an organization’s own risk profile.

Content at Risk

In other words, companies are supposed to know where cardholder data is located at all times, minimize what’s stored if possible, and make sure it’s protected. This information then should guide IT in deciding those apps and software on which to focus the testing efforts.

Not only should testing be performed more frequently, it’s also critical to have a current inventory, according to PCI, of the data that’s potentially hackable—let’s call it data at risk—and users who have access.

For Metadata Era readers, this is basically the Varonis “know your data” mantra. It becomes even more important because of a new attack vector that has not (yet) been directly addressed by PCI DSS. I’m referring to phishing and social engineering, which has been implicated in at least one of the major retail incidents in the last year.

Unlike the older style of injection attacks that targeted web and other back-end servers, phishing now opens the potential entry points to include every user’s desktop or laptop.

Effectively, any employee receiving a mail—an intern or the CEO—is at risk. Phishing obviously increases the chances of hackers getting inside and therefore raises the stakes for knowing and monitoring your data at all times, not just once a year.


During this past year, we’ve been reminded (too) many times that data breaches are costly and damaging to a company’s reputation. According to the Ponemon Institute’s 2014 Cost of Data Breach Study, the average total cost of a data breach—which can include credit monitoring, legal fees, remediation, and customer loss—for the companies who participated in the research report increased 15%, to $3.5 million USD. Also, the average cost paid for each lost or stolen record containing sensitive and confidential information increased more than 9%, from $136 in 2013 to $145. In short: failure to protect sensitive data has a quantifiable cost, and the theft of that data has bottom line implications. However, are C-level execs viewing files and emails containing customer records and other sensitive information as bits and bytes on a disk, or do they view them as piles of unprotected cash?

Unfortunately, it has been much more of the former, based on the huge data heists of the last year. The tide, though, may finally be changing. Here’s what HP CEO Meg Whitman had to say about the cloud, security, and Big Data:

“When I am with my fellow CEOs…these are three areas that me and my colleagues are worried about…Every CEO lives in fear of a Big Data breach, loss of data, a hack into the system that compromises our company’s reputation. And reputations take years and years to build and can be destroyed overnight.”

Our guess is that executives will have no choice but to join Ms. Whitman and start weighing the potential impact of data loss and how it can evaporate years of trust and brand equity in a heartbeat.

Unsure if your environment is well-protected? Get a free 30 day risk assessment! Varonis will show you where your sensitive content is, who has access to it, and more.



Steve Fingerhut is a VP of Marketing at SanDisk.  In his inaugural guest blog post for the Metadata Era, Steve discusses how enhancing existing server investments with solid-state memory can speed up Big Data analytics while keeping costs in check.

Metadata readers know better than others that we’re living in an era of data: massive data generated by web transactions, our mobile devices, social media, and even our refrigerators and cars. The numbers are stunning. Data is growing at dizzying exponential rates: 90% of the world’s data was created over the last 2 years alone, and by 2020 data will increase by 4,300%.

The majority of data produced today is termed ‘Unstructured Data’, which is data that does not fit well into traditional relational database systems.

This category usually includes emails, word documents, PDFs, images, and now social media. To give you a glimpse of how much unstructured data we’re generating: every minute, 100 hours of video are uploaded to YouTube, and more than 100 Billion Google searches are done every month. But what do we do with all of this data?

Analytic Apps Crave Big Data

The giants of the web have long used data as a tool to help them understand customer behavior. For example, eBay produces 50TB of machine-generated data every day(!), collecting and recording user actions to understand how customers interact with its website.

By analyzing data in their focus area, businesses can respond to patterns and make needed changes to improve sales, achieve higher engagement rates, enhance safety or help guide their overall business strategy.

Big Data is no longer just a tool for these web giants. Collecting, analyzing and utilizing data is critical for businesses of any size to remain competitive, and as such, businesses are collecting more data.

When it comes to Big Data, bigger is better. Analytics that are meant to forecast future probabilities (predictive analytics) become far more accurate when increasing data-set size to a massive scale. So companies are expanding projects to help their big data grow even bigger.

Research shows that companies with massive investments in Big Data projects to mine data for insights are not only generating excess returns but are also gaining competitive advantages. So it’s no wonder data is becoming the most precious commodity of organizations today.

Extending Memory

For data center managers contending with the growth of business data, finding infrastructure to store, archive, access, and process these huge data sets has become one of the biggest challenges organizations face.

One approach is to divide and conquer: distribute the data across separate servers with idle CPU capacity and storage resources. Having many computing units operating in parallel, say as part of a MapReduce platform, is one way (though a complex one) to handle the problem.
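To make the divide-and-conquer idea concrete, here’s a minimal sketch of the MapReduce pattern in plain Python. The data and function names are illustrative, not from any real framework: a map step turns each chunk of input into key-value pairs, and a reduce step merges the pairs from all chunks.

```python
from collections import defaultdict

# Toy MapReduce-style word count: each "server" maps its own chunk of
# the data, then a reduce step merges the partial results.

def map_chunk(lines):
    """Map step: emit (word, 1) pairs for one chunk of the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_pairs(all_pairs):
    """Reduce step: sum the counts for each word across all chunks."""
    counts = defaultdict(int)
    for word, n in all_pairs:
        counts[word] += n
    return dict(counts)

# Two chunks, as if the data were split across two servers.
chunk_a = ["big data grows fast", "data is everywhere"]
chunk_b = ["big servers crunch data"]

pairs = list(map_chunk(chunk_a)) + list(map_chunk(chunk_b))
word_counts = reduce_pairs(pairs)
print(word_counts["data"])  # "data" appears three times across both chunks
```

In a real MapReduce deployment, each chunk would live on a different server and the framework would handle shuffling the pairs to the reducers; the shape of the computation, though, is exactly this.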

Another idea is to squeeze more performance out of existing computing elements. In many kinds of Big Data applications, gigabytes of data points have to be manipulated at once (in complex statistical operations, for example).

It’s far faster (by orders of magnitude) to have the data in memory when it’s needed than to access it from disk storage. But holding everything in memory is usually feasible only on the most powerful (and expensive) high-end servers with very large memory spaces.

An effective way to contend with these challenges is to use SSDs, or flash-based disk drives, for this task. SSDs use the same type of flash memory found in mobile devices and cameras, scaled up and customized for far larger capacities and data center reliability. Would you be surprised to learn that the big web giants (like Amazon, Facebook and Dropbox) long ago moved to include flash-based Solid State Drives in their storage infrastructure?

SSDs deliver far superior performance to legacy storage: roughly 100x that of old-fashioned hard drives. Fitted with SSDs, even standard servers can sort and crunch huge amounts of data without the much-feared “disk penalty”: the valuable time lost seeking and accessing data blocks on a drive’s magnetic media.
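A quick back-of-the-envelope calculation shows where a gap of that magnitude comes from. The latency figures below are typical ballpark numbers assumed for illustration, not measurements from any particular drive:

```python
# Rough random-read latencies, in microseconds (illustrative figures).
HDD_LATENCY_US = 10_000   # ~10 ms per random read: seek + rotational delay
SSD_LATENCY_US = 100      # ~0.1 ms per random read: no moving parts

reads = 1_000_000         # one million random reads of small data blocks

hdd_seconds = reads * HDD_LATENCY_US / 1_000_000
ssd_seconds = reads * SSD_LATENCY_US / 1_000_000

print(f"HDD: {hdd_seconds:,.0f} s (~{hdd_seconds / 3600:.1f} hours)")
print(f"SSD: {ssd_seconds:,.0f} s (~{ssd_seconds / 60:.1f} minutes)")
print(f"Speedup: {hdd_seconds / ssd_seconds:.0f}x")
```

A million random reads that would tie up a hard drive for hours finish on an SSD in minutes, which is exactly the 100x figure quoted above.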

Other benefits: without any mechanical parts, companies can eliminate sudden, unpredictable disk failure from their list of risks!

A Real Added Value

But there’s more to it than that. You might be wondering about the cost impact of flash, and whether your organization can afford to implement SSDs. I’d argue that you can’t afford not to, and let me explain why.

As we look at Big Data and analytics, applications are not only coping with huge data sets but also with data from multiple sources, often requiring tens of thousands (if not hundreds of thousands) of I/O operations per second (IOPS) for each workload.

To reach that level of performance with traditional drives, IT managers have had to ‘stitch together’ a huge pile of hard drives to jointly supply the needed IOPS. But bringing together so many drives not only creates complexity and additional points of failure, it also means managing and paying for more racks, more networking, more electricity, more cooling, and more floor space. Because SSDs deliver 100x the performance, you need far less hardware to run complex analytics, which translates to savings on both infrastructure and operations.
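To put rough numbers on the ‘stitching’ problem (the per-drive IOPS figures below are typical assumptions, not vendor specifications):

```python
import math

# Back-of-the-envelope: how many drives does it take to hit a target
# IOPS figure for a demanding analytics workload?
TARGET_IOPS = 100_000
HDD_IOPS = 200       # typical random IOPS for a fast spinning disk
SSD_IOPS = 50_000    # a conservative figure for an enterprise SSD

hdd_drives = math.ceil(TARGET_IOPS / HDD_IOPS)
ssd_drives = math.ceil(TARGET_IOPS / SSD_IOPS)

print(f"HDDs needed: {hdd_drives}")  # hundreds of spindles to rack, power and cool
print(f"SSDs needed: {ssd_drives}")  # a couple of drives
```

Even if the assumed figures are off by a factor of two either way, the conclusion holds: hundreds of spindles versus a handful of SSDs, with all the racks, power and cooling that implies.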

Let me add some numbers to support my claims. Recently, we conducted a test using Hadoop, a Big Data framework for large-scale data processing. We compared hard drives against solid-state drives to see not only how much of a performance gain SSDs deliver, but also to calculate their impact on costs. As you may have already guessed, SSDs won big on both ends: a 32% performance improvement on a 1-terabyte dataset and, better yet, a 22%–53% cost reduction, depending on the workload’s pattern of access to the storage.

Getting the Job Done Right

When it comes to optimizing Big Data analytics, there’s a larger point to be made. It takes two elements to get the job done: hardware advances (SSDs, for example) combined with smart software. Companies will need both to contend with the oncoming data tsunami and ensure they can make the most of their analytics to remain competitive.


I’m always a little surprised by the reaction from customers regarding off-site storage services. It goes something like, “Well, the price is so good that I don’t really need to know anything else.” From a pure accounting standpoint, I do see their point.

As a company goes down the road of evaluating low-cost backup and disaster recovery service providers, they should stop and “read the fine manual” as we say in IT: in this case, it’s the small print contained in the Terms of Service. I’ve looked at more than a few of these agreements and here are three key points that you should keep in mind:

1. Security Is Ultimately Your Responsibility

You’ll often see language in the ToS saying they “take security seriously” and that “it’s very important”, but there’s additional legalese stating that the provider can’t be held liable for any damages as a result of data loss.

In fact, some of the ToS have a clause that explicitly says you are responsible for the security of your account. Yes, they will encrypt the data, and you may be given the option to hold the security keys. In a very strong sense, the security hot potato remains with you even though they have the data. When calculating the true costs and risks of these services, keep that in mind.

2.  Two-Factor Authentication?

As Metadata Era readers, you’re no doubt wondering about two-factor authentication. As a kind of virtual commercial landlord, these services hold data for lots of businesses, so you might expect building security to be tight—“show me your badge”. After all, these backup services are a magnet for hackers.

I didn’t see two-factor authentication listed as a standard part of the packages of the cloud providers I looked at. There are third-party services that can provide out-of-band authentication through a separate logon solution, but at an extra cost, and you’ll have to contract separately with them.
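For the curious, the core of most two-factor schemes is a one-time code derived from a secret shared between you and the authentication service. Here’s a minimal sketch of the HOTP algorithm from RFC 4226, the basis for the time-based codes (RFC 6238) used by common authenticator apps. It illustrates the mechanism only; it’s not the code any particular provider actually runs:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HMAC-based one-time password."""
    msg = struct.pack(">Q", counter)                      # 8-byte counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, period: int = 30) -> str:
    """Time-based variant (RFC 6238): the counter is the current 30 s window."""
    return hotp(secret, int(time.time()) // period)

# RFC 4226's published test secret; real deployments use per-user random keys.
print(hotp(b"12345678901234567890", 0))  # "755224", the RFC's first test vector
```

The printed value matches the first test vector in RFC 4226, which makes a handy sanity check for any implementation.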

3.  Data Availability

You store your data in the cloud with these companies, so you’d expect some promise that the data will be there when you need it.  Of course, on the public Intertoobz, there are limits to what they can be responsible for. Typically there are clauses in the ToS that exclude the digital equivalent of acts of nature—e.g., DoS attacks.

Outside unusual events, these back-up services generally don’t even provide a likelihood of availability—99%, 99.9%, or pick your sigma.  And the most they’re liable for when there’s loss of data dialtone is the subscription fee.

This is not to say that you can’t get a better deal—Service Level Agreements (SLAs) that compensate when certain metrics aren’t met—but for low-price, one-size-fits-all bit lockers, there is usually little or no opportunity to negotiate.


If you already have an outsourced data backup or disaster recovery solution in place with a sensible SLA and you can truly estimate the cost savings, and you’re getting a blue-light deal, then more power to you.

However, for everyone else, a good in-house IT department using purchased archiving or transfer solutions can offer custom security solutions and high-availability, along with guaranteed accountability.

Authentication Lessons from the Magic Kingdom: A Closer Look at Kerberos, Part I

The flaws in NTLM I’ve been writing about might lead you to believe that highly secure authentication in a distributed environment is beyond the reach of mankind. Thankfully, resistance against hackers is not futile. An advanced civilization (MIT researchers in the 1980s, to be exact) developed the open-source Kerberos authentication software, which has stood the test of time and remains a highly secure solution.

How good is Kerberos? Even Microsoft recognizes it as superior and openly recommends and supports it, albeit through its own version. The Kerberos protocol is more complicated than NTLM and harder to implement, but it’s worth the effort.

It’s also quite difficult to explain in a blog post. Kerberos involves complex interactions in which “tickets” or tokens are handed over by clients to various guardian servers. Faced with having to discuss Kerberos using all the usual protocol diagrams (see Wikipedia if you must), I decided to look for a better approach.

While I’ve no proof of this, it’s possible that the Kerberos authors were inspired by real-world authentication systems used in theme parks, perhaps even Disney World.

The General Admission Ticket (Kerberos’s Ticket Granting Ticket)

I haven’t been to the Magic Kingdom in a good long time, but I do remember an overall admission “passport” that allowed one to enter the park but also included tickets for individual rides—by the way, you can read more about Disney ticketing here.

Kerberos has a similar general admission concept. Unlike with NTLM, users and client apps have to interact with what’s called a key distribution center (KDC), starting with its logon component, before they can even authenticate with and use individual services.

You can think of the front gate at Disney World, where you purchase the passport and the first round of authentication checks are made, as the Kerberos KDC logon service, and the Disney passport as what Kerberos refers to as the Ticket Granting Ticket or TGT.

As I recall, the passport lets you gain access to some of the rides, but then for the really good ones—say Pirates of the Caribbean—you’d have to pull out the individual tickets. So once in the park, the passport booklet always authenticates you as someone who has paid the fee to get in, thereby allowing Disney customers to use the individual ride tickets or purchase additional ones as well.

It Takes Three to Authenticate

NTLM authentication provides a binary relationship between a user and a server with no central authentication authority.  Perhaps more like a carnival where you show generic tickets—typically easy to forge!—directly to the ride attendant.

Kerberos, though, introduces a third component, the KDC. By the way, Kerberos is the mythological three-headed hound that guards the entrance to the underworld—we’re talking one tough guard doggy.

In the physical world, a complex administrative process is required to validate a document—a theme park ticket, passport, driver’s license, etc.—as belonging to its holder, and to make the paperwork difficult to duplicate. Perhaps not surprisingly, there’s similar complexity in issuing Kerberos’s TGT.

Here’s how it works.

Like NTLM, Kerberos uses passwords and other IDs indirectly, as keys to encrypt certain information. To start the authentication process, the Kerberos client sends basic identity data—user name and IP address—to the KDC. The KDC logon component then validates this against its internal database of users and prepares the TGT—the digital version of the Disney passport.

The KDC first generates a random session ID and uses it as a key to encrypt the identifying information sent by the client, along with the session ID itself and some time stamp data. This forms the TGT. To say it another way, the TGT contains some unique data about the user together with the very key that was used to encrypt the whole shebang.

For this whole thing to work, the client needs the session ID. But of course you can’t pass it in plain text, so the session ID is itself encrypted with (what else?) the user’s password, or more precisely, a hash of the password. Then the two encrypted chunks of data are sent back to the user: the encrypted session ID and the encrypted TGT.

It’s a Small Authenticated World 

For all this to work, Kerberos makes the traditional assumption that only the user and the KDC have the password—a secret shared between the two of them. The client software asks the user for a password, hashes it, and uses the hash to decrypt the session ID sent back by the KDC. The unwrapped session ID then decrypts the TGT.
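Putting the last few paragraphs together, here’s a deliberately simplified Python sketch of the exchange. The “encryption” is a toy XOR keystream standing in for a real cipher (Kerberos uses proper symmetric encryption such as AES), and all the names and values are illustrative:

```python
import hashlib
import json
import os
import time

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream.
    Stands in for real encryption; applying it twice decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# --- KDC side: issue the "Disney passport" (TGT) ---
password_hash = hashlib.sha256(b"user-secret-password").digest()  # shared secret
session_id = os.urandom(16)                                       # random session key

tgt_body = json.dumps({"user": "alice", "ip": "10.0.0.5",
                       "issued": int(time.time())}).encode()
encrypted_tgt = toy_crypt(session_id, tgt_body)              # TGT sealed with the session ID
encrypted_session_id = toy_crypt(password_hash, session_id)  # session ID sealed with the password hash

# --- Client side: unwrap both chunks sent back by the KDC ---
recovered_session_id = toy_crypt(password_hash, encrypted_session_id)
recovered_tgt = toy_crypt(recovered_session_id, encrypted_tgt)

print(json.loads(recovered_tgt)["user"])  # prints "alice"
```

Notice that nothing secret crosses the wire in the clear: the client can recover the TGT only because it can hash the user’s password, which is exactly the shared-secret assumption described above.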

At this point, the Kerberos client has access to the overall IT theme park, but just as in Disney, needs to go through another process to, so to speak, ride the server.

Just stepping back a bit, we can see the beauty of this first interaction to gain the TGT. The client is authenticated since it holds the hash of the user’s secret password, which decrypts the session ID. The KDC is authenticated since only it could have encrypted the session ID under that same shared secret.

Of course, the server side of mutual authentication generally is not an issue in theme parks—unless somehow fake Disney Worlds started popping up that were taking advantage of paying customers!

But as I pointed out last time, rogue servers are a problem with NTLM’s challenge-response protocol. Kerberos completely solves this both at initial authentication and, as we’ll see next time, in gaining access to individual IT rides.

There’s a lot to digest here. I’ll continue with the rest of the Kerberos process in the next post.




Have you been in this movie?

You’ve been working for two months on a big project to analyze widgets — sales, marketing effectiveness, whatever. The first real deliverable is a presentation. A few versions are in your team’s shared folder, a few copies have been sent via email, one is in your home folder, your designer saved an update or two in Dropbox, and the final version will go in SharePoint.

You’re getting close, so your boss is now asking you to email her the latest version every other day (she doesn’t have access to the file server from her iPad). You’ve started to receive “Your mailbox is full” messages because the presentation is 15MB.

You want to pull your hair out. In the age of self-driving cars, shouldn’t file sharing be easier than this?

Check out our new whitepaper, 4 Things You Need to Know About the Future of File Sharing, to see if this story has a happy ending.