It's hard to read a blog, pick up a magazine or have a conversation about business these days without the term "Big Data" coming up in some form or another. Like many buzzwords and industry terms before it, the term "Big Data" has many true and possibly untrue things attributed to it. Regardless of whether or not it can cure the common cold or leap tall buildings in a single bound, Big Data is here to stay.
According to Forrester, Big Data is the frontier of a firm's ability to store, process and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks and serve customers. It is the product of companies and organizations trying to improve profits, products and customer satisfaction with the huge amount of information being generated every day.
According to IBM, we create about 2.5 quintillion bytes of data every day. The pace is such that nine-tenths of the data that exists today was created in the last two years. This data comes from a variety of sources, including finance, meteorology, GPS location signals, social media posts and more.
What this means is that we are now faced with more data than we can reasonably manage with conventional database management tools or traditional data processing applications. While the practical limit varies, as of 2012 the largest data sets that could reasonably be processed were on the order of exabytes.
Size is certainly a factor, but we are not just talking about the size of the data sets. Three criteria come into play and help us define something as Big Data:
- Volume, which we've already discussed, or the amount of information produced (in petabytes, exabytes, etc.).
- Variety, or the range of structured and unstructured data types that need to be processed and correlated.
- Velocity, or how quickly data that requires processing and analysis is produced.
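To put volume and velocity in concrete terms, the IBM figure quoted above can be converted into more familiar units. The short Python sketch below is only an illustration: the `humanize` helper is a hypothetical name, and it assumes decimal (SI) units where each step is a factor of 1,000.

```python
# Illustrative sketch: humanize() is a hypothetical helper, not part of any library.
# Assumes decimal (SI) units; binary units (factors of 1,024) would give
# slightly different figures.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes",
         "terabytes", "petabytes", "exabytes"]


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value under 1,000."""
    value = float(num_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop once the value is small enough, or when we run out of units.
        if value < 1000 or unit == UNITS[-1]:
            break
        value /= 1000
    return f"{value:g} {unit}"


DAILY_BYTES = 2.5e18             # 2.5 quintillion bytes per day (IBM figure quoted above)
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

print("Volume per day:     ", humanize(DAILY_BYTES))                    # 2.5 exabytes
print("Velocity per second:", humanize(DAILY_BYTES / SECONDS_PER_DAY))  # about 28.9 terabytes
```

In other words, 2.5 quintillion bytes a day works out to 2.5 exabytes of volume, arriving at an average of roughly 29 terabytes every second.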
What Can You Do With It?
The thing to remember about Big Data is that, whether or not we want it, it's there. People aren't going to stop tweeting, checking in on Foursquare, purchasing, commenting, and doing all the other things that contribute to the sheer amount of information being generated every minute.