Big Data, for Better or Worse: 90% of World’s Data Generated Over Last Two Years

A full 90% of all the data in the world has been generated over the last two years. The internet companies are awash with data that can be grouped and utilised. Is this a good thing?

An increasing amount of data is becoming available on the internet. Each and every one of us is constantly producing and releasing data about ourselves. We do this either by moving around passively — our behaviour being registered by cameras or card usage — or by logging onto our PCs and surfing the net.

The volumes of data make up what has been designated ‘Big Data’, where data about individuals, groups and periods of time are combined into bigger groups or longer periods of time.

Research advantages

Petter Bae Brandtzæg of SINTEF ICT points to the huge research centres now developed at internet companies such as Facebook and Google.

‘The advantage they have is the enormous volume of data that other social researchers can only dream of,’ he says. However, it has also changed the way SINTEF researchers work. Even those not working in the major internet companies can still access Big Data.

Brandtzæg has investigated a tool called Wisdom developed by the American-based company MicroStrategy, and has started applying it in the delTA-project which addresses young people’s social activity on the internet.

‘This gives me access to data about over 20 million people — without making a single inquiry. I can analyse different preferences on Facebook and look at age and gender differences between various groups and nations across the world. So far I have compared gender differences in social activity on Facebook between people in Norway, Spain, England, USA, Russia, Egypt, India and China.’

Data protection is a problem we often associate with Big Data, but according to Brandtzæg, data from Wisdom is restricted to large groups and does not go down to ‘individual level’. This makes it possible for him to compare large groups without any data protection problems.

Short, transitory information

Big Data makes it possible to achieve research results that cover a wide range of issues, and can tell us a great deal about developments in the world in many different areas. It is possible to carry out thorough analyses and comparisons between countries and different genders.

For example, researchers in Facebook’s own research department have looked into how people across the world update their messages, and what kind of information they post about themselves and their lives.

‘The surveys show that the messages people have been posting have been getting shorter each year,’ says Brandtzæg. ‘This reflects the increase in other types of fast social communication, such as Twitter, which has achieved huge popularity because it is about expressing oneself briefly and concisely in a maximum of 140 characters. Another trend in that direction is that young people are telling their stories using images rather than text. The current Instagram craze could be due to the fact that you don’t have to write anything.’

Comparing data

These volumes of data can therefore provide us with useful information. However, Big Data can become a problem when different sources of data are compared for commercial use in targeted advertising campaigns.

It is becoming increasingly common for data about our location to be linked to our purchasing preferences — about what we like and don’t like. Facebook has made big strides in this area.

Vulnerability and data protection are the dark sides of our new entry into huge data sets and registers.

‘Who knows — in two years, perhaps the tax register will be linked to the health and insurance register?’ says Petter Bae Brandtzæg. ‘And tax data can go astray; it has happened before.’

What opinions are being communicated?

The overwhelming volume of data being produced raises the issue of the content of all this information. What is being communicated?

The Networked Systems and Services department at SINTEF, to which Petter Bae Brandtzæg belongs, has recently had a bid accepted for the EU REVEAL project. In this project, researchers will look at combinations of different data sources and learn about people’s ability to express themselves, and about the quality and truthfulness of data registered on social media. What is the content of these media? Who are the senders? Who else has said the same thing?

‘We will look at various sources in relation to each other, and for example find out how trustworthy Twitter messages are,’ says Brandtzæg. He also points to the new trend in fragmenting information across many channels — such as Facebook, SMS, e-mail, blogs, Twitter and Instagram.

How trustworthy are the media?

The ability to disseminate information to large groups in real time has made Twitter and Facebook important communications tools when major events take place.

When hunting for the Boston terrorists, the police, authorities and traditional media also used social media like Twitter, Instagram, Reddit and Facebook to actively collect and disseminate information about the incident. Several voluntary groups were also set up via social media, in order to try and help the police. However, social media proved to be not only a useful channel of communication but also a source of confusion and misinformation.

Can Big Data be used as a resource for journalists, and how trustworthy is the information available on social media? This is one of the subjects that the SINTEF researchers will be looking into as part of the EU REVEAL project.

How Sports Fans Engage With Social Media

Below there are some great statistics on how sports fans engage with social media:

Some key highlights:

  • The number 1 social media platform is Facebook, followed by YouTube and then Twitter
  • On game day the platform used most is Twitter, followed by Facebook
  • After the game, Instagram is the outright winner
  • Google+ and YouTube are on the rise among fans. When fans responded to a question about which platforms they use to “disseminate and acquire sports information,” those two platforms showed the most year-over-year growth, at 94% and 35%, respectively.


Big Data and The Nonprofit Organization

I’m a little bit behind on the hot topic of Big Data, but I’ve been meaning to write about it for a while, and just got… busy. It’s maybe less of a hot topic now than it was a few months ago, but I don’t think it’s being talked about enough when it comes to its role in the nonprofit industry.

We mostly hear about Big Data coming out of Silicon Valley and Facebook. The information Facebook has on its users is astounding, frightening even. The ads on my Facebook sidebar are a fairly accurate depiction of the things I’d actually be interested in: free trades if I join E-Trade, the Human Rights Campaign, MBA scholarship opportunities, etc. My posts about starting grad school, social liberalism, and my involvement in the stock markets have made their mark, whether I intended them to or not. Every search we make online is recorded, and companies are using Big Data to profit Big Time.

That’s all well and good, I’m putting myself out there on the internet, let them use whatever information they want to extract. But how can nonprofits get in on this action? Nonprofits won’t just start marketing hiking boots to people who love the outdoors. It’s not so simple.

I work for a social services nonprofit. We provide direct services to our target populations and we collect a fair amount of demographic data on them. For example, in terms of fighting hunger, we register the income of families and their level of food insecurity. We can compare these figures with our own past data over time to see if there has been any improvement; now, with more information available, we can also compare them to wider statistics across San Diego County or across the nation. Sure, that’s a basic example and frankly, those kinds of reports have been coming out of nonprofits for a long time, since hunger is such a significant focus in the industry.

Awareness and advocacy in particular is a realm within the nonprofit industry that can benefit from the use of Big Data. Harvard Business Review discusses how international information gathered about human rights abuses can bring to light truths that would otherwise remain under the radar. The same applies to gauging literacy, not only locally but nationally and globally as well.

Big Data shows where social services are falling short and where they are succeeding. It shows what funds are being used most effectively so that donors and prospective donors can make intelligent choices so that $1 today creates a bigger impact than $1 yesterday.

The unfortunate part of this story, however, is that many nonprofits won’t have the opportunity to utilize Big Data. Many will likely admit the benefit, but simply don’t have the resources to spend time, money, and energy on it. Donors and volunteers want their contributions to go to immediate needs, not long-term innovation. The fact that most nonprofits are entirely volunteer-run, with no paid staff, doesn’t help the case either. For this reason, nonprofits are always going to be a step behind corporations. Facebook will always have a leg up on any nonprofit organization when it comes to Big Data, no matter how big the nonprofit.

Big Data has to be publicly available to everyone. Companies, organizations, and individuals can all decide to use it in order to boost profits, raise awareness, or provide services; the options are endless. But if Big Data remains an asset only to those that can afford it, its benefit to the nonprofit industry may never be realized.

Email Security: It’s Every Employee’s Business

Email security has become part of the job description for every employee. All it takes is one employee to cause a breach that opens up the entire company. For example, consider The New York Times: the recent breach by Chinese hackers was carried out via a phishing or spear phishing email. All it took was for one email to be opened, and The New York Times network was accessible to the hackers. And once an attacker is behind the firewall, the attacker can do almost anything.

Recently, hackers have been getting even more creative. One of the students in the information security class I teach showed me an email that she received. It contained a message about email phishing schemes and what to look for. The subject line was incorrect when compared with previous emails from the same organization. The body of the email had an incorrect logo and a slightly incorrect signature line. Also, there was a link with a call to action that requested my student to sign in to her account and learn more. She reported this email to the company who allegedly sent it. Had my student not been aware of phishing schemes, she might have clicked on the link and opened up her system to hackers.

Without proper training, it is easy for an employee to accidentally open and launch a window for a hacker. It is the duty of every personnel department to train new employees as to what to look for when receiving email messages. This information should be included in employee manuals and should also be posted on lunch room walls as reminders. With the volume of emails we all receive on a daily basis, it is very easy to forget that one of the emails could be a “Bomb” that could cause a breach. And a network breach can lead to data loss, loss of reputation, and denial of services for your employees and clients.

There are two types of phishing email messages: phishing and spear phishing. Phishing is a generic type of email that is sent to everyone in a company with the hope that someone will open the email and click on a link or open an attachment. There are no names attached to it, the subject line is generic, and the TO: line usually says recipients_not_disclosed. That’s a dead giveaway! Finally, the FROM line does not conform to corporate email standards.

The second form of phishing is called spear phishing. This type of email is more insidious. Someone or some organization has taken the time to find information about a specific employee and personalize an email message to make it look like it has been sent to that person from someone he or she knows. As a result, the email looks legitimate. Attackers gather the material for such an email in several ways. The attacker scours Facebook, LinkedIn, Twitter, and possibly financial information sites such as Hoovers. The hacker may make calls to a company’s receptionist to find other pertinent information regarding the email recipient, possibly an email address and/or phone number. In bigger companies, they may even call the IT department, claim to be the person of interest who has forgotten their email password, and ask for it to be reset. Hopefully, there are policies in place with the IT department that make it impossible for someone to change a password without multifactor authentication (multiple types of ID must be given before the password can be changed; this is an issue for another post). Spear phishing emails are usually sent to management-level employees since they tend to have more network privileges.

Once again, even with spear phishing, the questions one must ask include: Are you expecting an email from this person and do you even know him or her? Is there a link in the body of the email? If yes, do not click on it. If you really must know what the link is, send it to the IT department or your security team and let them confirm if it is legitimate. Due to the speed of business these days, it may be difficult to remember what to look for, but it’s also difficult to recover from a breach. It can happen to anyone. For your company’s sake, don’t let it be you.

Host computers should all have a good virus scanner to scan inbound emails and attachments. After that, here are some things to look for when determining if you’re looking at a phishing email. Does the email address in the FROM: line correspond to the corporate email layout? This may mean: last name first, or first name last. When a message is sent to you, are you expecting an email from that person or is the email coming from someone you don’t know? Look at the subject line of the email: Are there any misspellings in the subject line, and does it make sense?
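As a rough illustration of the FROM: line check described above, here is a minimal sketch. The domain example.com and the first.last address convention are assumptions made up for this example; a real company would substitute its own standard, and a matching address is no proof of legitimacy since sender addresses can be forged.

```python
import re
from email.utils import parseaddr

# Hypothetical corporate convention: first.last@example.com
CORPORATE_PATTERN = re.compile(r"^[a-z]+\.[a-z]+@example\.com$")

def sender_looks_corporate(from_header: str) -> bool:
    """Return True if the address in a From: header matches the assumed pattern."""
    _display_name, address = parseaddr(from_header)
    return bool(CORPORATE_PATTERN.match(address.lower()))

print(sender_looks_corporate("Jane Doe <jane.doe@example.com>"))
print(sender_looks_corporate("IT Support <support@examp1e-mail.com>"))
```

A check like this only flags obvious mismatches; it is a first filter, not a replacement for the questions listed above.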

Make it a policy to never click on live links within an email message. A live link (one that is colored and underlined) could look like a legitimate link but the actual link may send you somewhere else. If you really must know what the link is, copy and paste it into the notepad program. This will show where the link is actually pointing you to. Hovering the mouse over the link will reveal the actual URL. However, if the URL is embedded in an image within the email, you will have to retype the entire URL. There are two other options for shortened links (for example, or
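The gap between what a link displays and where it actually points can also be checked mechanically. The sketch below is one possible approach, not a complete defense: it uses only Python's standard library, the class and function names are invented for this example, and the sample domains are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a href=...> in an HTML body."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def _host(url):
    # urlparse only finds a host when a scheme is present, so add one if needed
    parsed = urlparse(url if "://" in url else "http://" + url)
    return (parsed.hostname or "").lower()

def mismatched_links(html_body):
    """Return links whose visible text looks like a URL but names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html_body)
    suspicious = []
    for text, href in parser.links:
        looks_like_url = "." in text and " " not in text
        if looks_like_url and _host(text) != _host(href):
            suspicious.append((text, href))
    return suspicious

body = ('<p>Sign in at <a href="http://login.attacker.example/x">www.mybank.example</a> '
        'or read <a href="https://www.mybank.example/help">our help page</a>.</p>')
print(mismatched_links(body))
```

Links whose visible text is ordinary wording (such as "our help page") are not flagged by this sketch, so the advice above about inspecting the real URL before clicking still applies to them.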

Sometimes emails arrive in your inbox under the guise of legitimacy. They appear to come from somewhere within your organization, but they don’t. An email arrives and asks you to change your security credentials, but don’t be fooled. First of all, there should be a general announcement regarding this topic distributed company-wide to all users. It will be sent out by one person, not from “The Security Team.” Be aware of that. Emails regarding this sensitive issue must be sent by individuals, not groups, and an email sent by an internal employee will adhere to the corporate email structure. Fakes do not.

Many breaches come from an email that appears to come from an internal employee. So, look at the signature line at the bottom of the email. If it isn’t the standard signature line that your company uses for all emails, it’s probably suspect. I realize that checking an email to be sure that it’s real can be time-consuming, but the more you look for errors, the better you become at spotting them.

The larger a company is, the harder it is to remind employees about staying vigilant. But in the long run, what’s worse: reminders or hackers? You do the math.



This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

Changes Big Data And Technology Have Brought Into The Retail Industry

There was once a time when retailers relied on large spreadsheets to keep track of things. Critical employees had to fly around in order to find the best products and maintain the best inventory. Today, the scenario is totally different. Big Data has played a large part in the changes witnessed in the retail industry.

Some have adapted well to the changes (some have even taken advantage of the changes) and some have struggled with them. Marianne Bickle (Contributor) looks at some of the changes that Big Data and technology have brought about…

1. Retailers are finding it more difficult to make predictions. This might actually come as a surprise to many considering that Big Data actually empowers retailers to pinpoint what a particular customer wants. What many people do NOT realize is that it is “a two-edged sword.”

A consumer group might be lost simply because they have moved to a different mobile device which isn’t supported by their current retailer. In this case, it might be difficult to predict how many customers you’ll lose (or have lost).

2. Poor customer experience reports now spread like wildfire. It’s no longer the case of one unhappy customer telling ten other people. With social media, it could be a few million before your company prepares an official position. There are instances of viral videos that have “hurt” businesses.

3. Trends are changing faster and businesses have many more tools that can help them gather vital information about the trends (Twitter, Facebook, email, etc.).

4. It is now critical that the analysis of data provides insight into why and how consumers buy a particular product. Such analysis should also provide their demographics and psychographics. Otherwise, money spent on advertising will be wasted.

Revealed: Secret PIIs in your Unstructured Data!

Personally identifiable information or PII is pretty intuitive. If you know someone’s phone, social security, or credit card number, you have a direct link to their identity. Hackers use these identifiers, along with a few more personal details, as keys to unlock data, steal identities, and ultimately take your money. In some of my recent blogging, I’ve referred to the blurring of lines between PII and non-PII data. Case in point: it’s been known for at least 10 years that there are specific pieces of data, which in isolation may appear anonymous, but when taken together they’re just as effective at identifying a person as traditional PII.

The easiest to understand of these so-called quasi-PIIs is the trio of full birth date, zip code, and gender. If a company published a dataset that had been “de-identified” by removing all the standard PIIs, but left those three data items alone, a smart hacker could with very high likelihood find the name and address of the person behind that data.

Why would this work? At a very basic level, the identity thief is effectively doing the work of a detective, essentially going through lists looking for matches. The lists in this case are voting records, which are available from most US towns and counties for a nominal fee, typically around $40. Voting records contain name, address, and most importantly full birth date; zip codes can be easily determined from the address.

By looking for matching birth dates and zip codes, savvy hackers narrow down the search to a few names. Add gender information and for most zip codes in the US, hackers can arrive at a unique name. Of course, the more additional information or clues gathered, especially taken from social media and other web sites, the easier it is to filter out names when there’s more than one candidate.
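To make the matching step concrete, here is a toy sketch of the lookup. Every record, name, and field below is invented for illustration; the point is only that the (birth date, zip code, gender) triple acts as a join key between a "de-identified" dataset and a public list.

```python
from collections import defaultdict

# Entirely made-up records for illustration
voter_roll = [
    {"name": "A. Example", "birth_date": "1970-03-14", "zip": "92101", "gender": "F"},
    {"name": "B. Sample", "birth_date": "1970-03-14", "zip": "92101", "gender": "M"},
    {"name": "C. Placeholder", "birth_date": "1985-11-02", "zip": "92101", "gender": "M"},
    {"name": "D. Specimen", "birth_date": "1985-11-02", "zip": "92101", "gender": "M"},
]
deidentified = [
    {"birth_date": "1970-03-14", "zip": "92101", "gender": "F", "purchase": "hiking boots"},
    {"birth_date": "1985-11-02", "zip": "92101", "gender": "M", "purchase": "yoga mat"},
]

def quasi_key(record):
    """The three quasi-identifiers discussed in the text."""
    return (record["birth_date"], record["zip"], record["gender"])

def link(deidentified_rows, roll):
    """Pair each de-identified row with the names sharing its quasi-identifiers."""
    index = defaultdict(list)
    for voter in roll:
        index[quasi_key(voter)].append(voter["name"])
    return [(row, index.get(quasi_key(row), [])) for row in deidentified_rows]

for row, names in link(deidentified, voter_roll):
    status = "unique match" if len(names) == 1 else f"{len(names)} candidates"
    print(row["purchase"], "->", status)
```

In this toy data the first row resolves to a single name while the second leaves two candidates, which is where the additional clues mentioned above would come in.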

A quick back of the envelope calculation tells you why one might do very well with this approach. Taking 365 days—ignoring leap years—and multiplying by an average age of 80, it works out that a complete birth date gives 29,200 “bins” to place a zip code’s worth of people. If you have gender information, you double the number of slots, to 58,400.

I can hear nitpickers out there saying that voting rolls contain names of those over the age of 18, so you would have to remove 6570 slots. True enough, but researchers have shown it’s possible to exploit Facebook’s leaky handling of data on school age minors to partially address this gap.

In any case, based on the last US census, there are over 40,000 zip codes, with an average of only 7000 people per zip code. On a gut level, it seems there’s a good chance most of those 7000 people will find themselves alone in one of those 58,400 slots. In other words, the odds are very good that most of them won’t share the same date of birth, zip code, and gender.
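The gut feeling can be turned into a rough number. The sketch below reuses the figures from the text and adds one assumption of its own: that people are spread independently and uniformly across the slots, which real populations are not (ages and genders are not evenly distributed, and zip codes vary widely in size).

```python
DAYS_PER_YEAR = 365      # ignoring leap years, as in the text
LIFESPAN_YEARS = 80      # the text's round figure
GENDERS = 2
PEOPLE_PER_ZIP = 7000    # the text's average

slots = DAYS_PER_YEAR * LIFESPAN_YEARS * GENDERS

def probability_alone(people, slots):
    """Chance that a given person shares a slot with nobody else,
    assuming independent, uniform assignment to slots."""
    return (1 - 1 / slots) ** (people - 1)

print(slots)
print(round(probability_alone(PEOPLE_PER_ZIP, slots), 3))
```

Under that simplifying assumption, somewhat under nine in ten residents of an average-sized zip code would sit alone in their slot. Treat it as an order-of-magnitude check rather than a measurement.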

The real validation of this type of hacking attack came from Carnegie Mellon University computer science professor and data privacy expert Latanya Sweeney, who ran the numbers back in 2000. Using then-current census data (broken down by zip codes and age groups), she estimated that 87% of the people in the US could be uniquely identified with just those three non-PIIs.

Fortunately, Sweeney’s research and results from other experts have made their way to policy makers. For example, when medical research on patients is published, HIPAA’s Safe Harbor de-identification rules say that no geographic unit smaller than a state can be included in the public data (with a limited exception for the first three digits of a zip code). Dates tied to an individual (e.g., admission, birth) must also be stripped of everything except the year.

With US regulations on PII varying by the particular legislation, this is by no means a universal rule. However, the Federal Trade Commission, an influential regulatory agency on privacy matters, has recently issued new best practices on data de-identification. They’ve called for all companies to achieve a “reasonable level of confidence” that their public data can’t be linked back to an individual. Clearly, the combination of birth date, zip code, and gender would fail that test.

Are there other quasi-PIIs out there? Of course! The larger problem is that consumers are sharing all kinds of information about themselves on web sites and social forums. In a possible scenario, think of an online retailer collecting preference data about its customers (sports interests, hobbies, etc.) along with geographic data and perhaps income information.

These data items would not be considered traditional PII.  If hackers pulled this “anonymous” data from a poorly permissioned file on a server, you could imagine them mining various special interest sites, looking for names that match up based on those interests and geo data.  Once they have a match, the next step might be a phishing attack, with the hackers pretending to be the retailer.

For companies that want to stay ahead of the coming stricter de-identification rules, which are being considered here in the US and will likely become law in the EU, it would be worth their while to start carefully reviewing their non-PII data, wherever that data might be on their file system.
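One simple way to start such a review is to count how many records share each combination of quasi-identifiers and flag the combinations held by very few people. The sketch below is an assumption-laden illustration: the field names, the sample rows, and the threshold k are all made up, and a real review would need far more than this.

```python
from collections import Counter

def small_groups(rows, quasi_identifiers, k=2):
    """Return each quasi-identifier combination shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up customer rows with no traditional PII
customers = [
    {"zip": "92101", "hobby": "surfing", "income_band": "50-75k"},
    {"zip": "92101", "hobby": "surfing", "income_band": "50-75k"},
    {"zip": "92101", "hobby": "falconry", "income_band": "100k+"},
]

risky = small_groups(customers, ["zip", "hobby", "income_band"], k=2)
print(risky)
```

Any combination that appears in the result describes a group small enough that an outside list could plausibly single out its members, so those fields are candidates for coarsening or removal before data is shared.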


Looking for social commerce inspiration? Here’s an interesting initiative from Magazine Luiza, Brazil’s 4th largest retailer, that builds on the curated commerce trend.

Last year, bucking the trend of bringing social to the store, Magazine Luiza brought the store to social by inviting people to curate their own mini-store on Orkut and Facebook.

The ‘Your Store’ (Magazine Você) initiative invited consumers to stock their own mini-store with up to 60 items from Magazine Luiza’s inventory. Users could personalise the store, offer personal reviews and comments, and get 2.5% – 4.5% commission for any sales made. Fulfilment and logistics were handled by Magazine Luiza.

Contagious Magazine reports that although the idea was popular (53,000 stores were opened) and conversion rates were 40% higher than in traditional stores, only 10,000 products were sold in total.

Whilst this could be seen as another nail in the coffin of the ‘bring the store to social’ variant of social commerce, we think it points to an opportunity.

How about if the idea was tweaked – a la OpenSky – to offer member organisations and certification bodies for independent professionals a simple solution for their members? (Think personal trainers, caterers, yoga/Zumba instructors, photographers, hairdressers, educators.) Self-employed professionals depend on and use their social networks and followers to build their businesses, so there would be a natural fit for a curated store on a blog, linked from YouTube, or even Facebook.

If you’ve ever been to a Zumba instructor event, you’ll see why this would work: instructors buy sackloads of Zumba gear to sell to their members. It’d be a useful benefit from the Zumba Instructor Network if they could do this without having to manhandle the gear themselves – and it’d keep member dues coming in.

As the science of promotions shows, the key to success will, of course, be to run any such store with two-sided promotions: both the curator and the customer should get a better price than can be found elsewhere. Otherwise the idea is dead in the water. But done right, here’s a real opportunity in the social commerce space.


Thanks to social commerce today.

Connecting the world: a Microsoft documentary

This video documentary by Microsoft explores how digital technology, and specifically Interaction Design, is changing and will continue to change our lives in an ever-connected world. It’s 18 minutes long but well worth a watch. I thought I’d paraphrase a few of the most thought-provoking comments from the documentary below:

“Without humans there’s nothing interesting to talk about.”

“We are in the phase where we are a little confused about what’s important in life.”

“It’s about understanding that ecosystem where the human is at the centre.”

“It’s about getting more of the physical world connected with the digital world.”

“What we design as a man-made object is only complete when there are people using it.”


Mobile Apps Rankings

Wondering why Apple (AAPL) is sinking so much effort into building its own Maps application? Because it doesn’t want Google (GOOG) to gobble up all the revenue from big-name mobile applications. ComScore has published its most recent monthly review of the top iOS and Android apps in the United States ranked by unique visitors and has found that Google captured 5 of the top 6 spots with Google Maps, Google Play, Google Search, Gmail and YouTube. In fact, Facebook (FB) was the only non-Google app to crack the top 6, although it also had the benefit of being the most-visited app in the entire country by a margin of more than 10 million unique visitors. iTunes was the only Apple app to crack the top 10, meanwhile, as it ranked eighth with roughly 46 million unique visitors last month.