Friday, August 1, 2014

Big Data

"Clifford, what exactly do you do/study?"

"Oh, well, I work with big data."

"Really? What does that mean?"

"It means that I use very large sets of data to build statistical/mathematical models to check for trends, look at probabilities..."

This question has been posed to me many times since I first started to dedicate my life to the exciting world of big data. I love to share with people exactly what it is that I do, but I often get the feeling that most people don't fully grasp it. When I start to explain that I use math to make predictive models, I see people's eyes glaze over with a look that says, "Wow, this guy must be some kind of genius." While I would love to be able to claim that I am some kind of Stephen Hawking, I am just your average math lover.

Ever since I was a little kid, math has been one of the very few things in life that has made any sense. To this day I still struggle deeply with grammar and spelling; if it were not for spell check, I would be in deep trouble. Math, however, has always been my guiding light in the world of academia, which is why I got my degree in Actuarial Science (Business Mathematics and Statistics).

Now that you understand why I do what I do, let me further explain what it is that I do. I work for The Kroger Company, one of the nation's largest "traditional" grocery store chains. The department that I work for focuses on improving the shopping process, and my team works specifically on making your check-out experience quick, smooth, and painless. I am the analyst on my team, which means that I am the one who gets to have the real fun, no matter what anyone else thinks or says.

How does this all relate to big data? Big data is simply a name that has come about to identify all the data that companies these days collect on their customers. This data includes things like buying trends, shopping trends, favorite items, etc. People like me, who have spent many hours learning how to handle this amount of data, help companies understand what their data is telling them. I use various computer software to help "mine" through the data. Once I have identified the type of data that I want, I build a statistical/mathematical model that helps me to understand what the data is telling me. The data sets that I work with often number in the millions of observations. My largest data set to date has been in excess of 13 million unique observations.

One thing you should know about big data is that it takes lots of time and patience. When I work with a set of data that is larger than about a million observations, my computer really slows down and has a very hard time working with that much data. In addition to that, you also have to be willing to go down one path, realize that all the work that you have done up to that point is wrong, and start over. I have done this more times than I care to admit. But in the end, with lots of hard work and time, I am able to understand the story that the data is telling me.
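
For the curious, here is a minimal sketch of one common way to keep a computer from choking on a million-plus observations: read them one at a time and update the summary statistics as you go, instead of loading everything into memory at once. This is only an illustration, not the software or data described above. The function names and the made-up "checkout times" are assumptions invented for the example, and the one-pass update is Welford's method for a running mean and variance.

```python
import random


def observations(n, seed=0):
    """Yield n made-up checkout times in seconds, one at a time (hypothetical data)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.uniform(30.0, 300.0)


def running_stats(values):
    """Return (count, mean, sample variance) in a single pass (Welford's method)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # One million observations are summarized without ever being stored in a list.
    count, mean, variance = running_stats(observations(1_000_000))
    print(count, round(mean, 1), round(variance, 1))
```

Because only three numbers are kept in memory at any moment, the same approach works whether the data has a thousand observations or 13 million; it just takes longer to run.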

So there you have it: what it is that I actually do. Yes, it is really nerdy, but I love it. And no, I really am not that smart; I just really like what I do.