When many people think of the term “Big Data,” they think of companies collecting and using information in ways they weren’t doing before. In reality, that’s only half true. Corporations of all sizes have always collected data, but the digital era is the first in which they’ve been able to make meaningful use of it.
After last week’s Big Data Toronto conference at the Metro Toronto Convention Centre, I (foolishly) thought I understood what “big data” meant.
“I think of big data as being able to provide smarter solutions and provide innovation through large data sets,” said David Stewart, chief business officer at Turo, which is the definition I would be fed in one way or another all day long. Stewart has worked for several technology companies in the past, including Google, and eventually came to understand that data is central to how technology operates.
Turo is a San Francisco-based car-sharing platform that operates in 2,500 cities and 300 airports thus far. The company expanded into Canada this past April. It requires users to provide their license plate, insurance information and credit card information, since the money from renting out a vehicle is deposited into users’ accounts at the end of the month. The company has achieved approximately 1 million sign-ups since launching in 2009. Stewart explained all this on a panel about fifteen minutes after we spoke.
Based on the information I’d gathered over the course of June 14th, I decided ‘Big Data’ must mean the ability to crunch large datasets to derive insights from that data – and in the simplest context, I was right.
It wasn’t until I arrived back at the building I work in and announced that I’d returned from my enlightening journey into the world of data, analytics and insights that someone put into context how relative the term “big data” actually was.
After I briefed my editor on what I’d gathered from the day, the interviews I’d done and the panels I’d sat in on, Jon (whose last name I won’t use), a member of another company housed in our building who’d overheard our conversation, looked over and asked me the following question.
“So, what’s big data?” “Excuse me?” I responded. “Big data? What is it?” he asked again.
Thinking about it in that moment, I realized I didn’t know. Everyone I’d spoken to had given me the same general definition, but no one had given me any numbers, facts or figures to actually quantify how much data qualifies as “big data.”
“A lot of people try to define ‘big data’ in terms of volume, but I try to define it in terms of the value you can extract from that data,” remarked Chul Lee, head data scientist at Under Armour.
While companies have always gathered data, being able to store it, quantify it and feed it back to the consumer in the form of a better customer experience is relatively new for companies that are not Google and didn’t build their entire business around the process of storing and returning data.
Kobo, the e-book reading platform, is a prime example of the give-and-take mentality that drives most data-backed industries. The company collects readers’ data and returns it to them in the form of a personalized reading list for each of its 12 million users (estimate as of 2013).
Kobo collects different kinds of information about its users including purchase details, banking information, search information and reviews left on different books. The company also collects data about reading speeds and genre for the user’s own personal use, though there is an opt-out option for some of this collection. To the Head of Big Data at Kobo, Inmar Givoni, the term refers to the intersection of three things:
“The engineering of data structure to handle massive amounts of infrastructure; the algorithmic aspects of it, or machine learning and finding patterns in the data in a meaningful way; and domain expertise or understanding how these things can be used for your own particular line of services.”
However, Jon was working on a program comprising approximately 15 million rows of data as we discussed what “massive amounts of data” really meant. He didn’t like to use the phrase “Big Data” because he didn’t think it was applicable to the volumes he had to work with, and he conveyed that the rest of his team felt the same.
Not only did I have to Google what a “row” was, but I then had to ask him whether or not 15 million was a lot. I subsequently discovered that a row was defined as a single, structured data item in a table that’s part of a larger database, and that no, it wasn’t a lot. He went on to say that even if he’d been working on a program that comprised 100 million rows of data, he wouldn’t consider that a lot.
While I had heard industry experts from IBM, Wattpad, InterSystems, Tangerine and others explain the process of storing, organizing and evaluating “massive amounts of data,” he pointed to the trillions of data rows that Google had to pull from in order to produce that definition in just 0.58 seconds.
After doing a little digging, I was finally able to uncover that Google stores approximately 10 to 15 exabytes of data and facilitates an average of approximately 40,000 search queries per second.
One exabyte equals about 1 million terabytes, or roughly the storage of approximately 30 million personal computers. To give you an even better sense of how much that is, Canada houses approximately 35 million people in total. The entire world produces about 2.5 exabytes of data per day and five exabytes is equal to every word ever spoken by mankind…ever.
Now that is a lot of data.
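For readers who, like me, need to see the arithmetic written out, here is a minimal Python sketch that simply re-runs the figures quoted above. The constants are the article's rough estimates, not measured values, the function names are my own illustrative choices, and the units are assumed to be decimal (1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes).

```python
# Decimal (SI) storage units, expressed in bytes. This is an assumption;
# the figures quoted in the article do not say which convention they use.
TERABYTE = 10**12
EXABYTE = 10**18

# Rough estimates quoted in the article, not measured values.
GOOGLE_STORAGE_EB = (10, 15)      # low and high estimate, in exabytes
SEARCHES_PER_SECOND = 40_000      # average Google queries per second
WORLD_DATA_EB_PER_DAY = 2.5       # data produced worldwide per day


def exabytes_to_terabytes(exabytes):
    """Convert exabytes to terabytes using decimal units."""
    return exabytes * (EXABYTE // TERABYTE)


def searches_per_day(per_second):
    """Scale a per-second query rate up to a full 24-hour day."""
    return per_second * 60 * 60 * 24


def days_to_match(storage_eb, daily_eb):
    """Days of worldwide data production needed to equal a given store."""
    return storage_eb / daily_eb


if __name__ == "__main__":
    low, high = GOOGLE_STORAGE_EB
    print("1 exabyte in terabytes:", exabytes_to_terabytes(1))
    print("Searches per day:", searches_per_day(SEARCHES_PER_SECOND))
    print("Days of world output to match Google (low):",
          days_to_match(low, WORLD_DATA_EB_PER_DAY))
    print("Days of world output to match Google (high):",
          days_to_match(high, WORLD_DATA_EB_PER_DAY))
```

Taken at face value, those estimates work out to roughly 3.5 billion searches a day, and they suggest the world generates the equivalent of Google's entire estimated store every four to six days.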
One by one, each company presenting at Big Data Toronto was more data-literate than the last. Up first was the opening keynote speaker, Jonathan Carrigan, senior director of business intelligence and platforms at Maple Leaf Sports & Entertainment (MLSE). Though not a data scientist himself, Carrigan calls himself a “coach” of sorts, who provides a strategic framework for his team and proceeds to “get the hell out of the way and let them do their job,” technically speaking.
He describes MLSE as using all the data they collect from sales, events, sports games, restaurants, tickets, etc., to improve the overall function of the business and gain some insight into what makes the company money, where it comes from and how they can capitalize on that data to make more.
Carrigan goes on to say that the company “adopted a vision for leveraging data and analytics” over a year and a half ago. If it’s only been a year and a half since one of Canada’s largest companies began trying to understand its data, then imagine for a moment the data literacy of the average Canadian consumer.
“Data science is one of the most important fields moving forward because industries that never needed to leverage data before are starting to,” said Turo’s Stewart about the importance of data science to companies in the future. He went on to describe how consumers feel about their personal data being used by the private sector.
“I think this becomes most uncomfortable when the data becomes the business model. Frankly, I think there should be regulation and privacy requirements to make sure companies are acting responsibly with that data.”
Under Armour’s newly recruited Head of Data Science, Lee, who joined the company when the startup he previously worked for, MyFitnessPal, was acquired, also spoke on the topic of data. Under Armour, according to Lee, had an extremely underdeveloped data team before he arrived. Now, in order to feed its newly implemented technology and wearables division, the company collects all kinds of fitness and health data from its consumers.
“It was a really fascinating experience because though Under Armour didn’t have a data team per se, it did have tons of data assets. The ultimate goal of storing data is to create something valuable for the customer,” said Lee.
The conversation about “Big Data” represents the first time in history companies have been able to not only store, but make practical use of the information they collect from us. In the near future technology will likely know so much about us solely from the data we share that it will proceed to create for us a personal world tailored exactly to our needs and wants – before we even have the chance to need or want it.
While the option to opt out still exists, such as the one provided by Google’s latest AI messaging project Allo, people who refuse to relinquish enough personal data will limit their interaction with wider and wider circles of technology as the dependency on data deepens.
“The less data you’re willing to share, the less we can personalize the experience for you,” remarked Kobo’s Givoni.
To many consumers, the most important questions about data collection are those about privacy, while corporations value the insights they can derive from the data they collect. Not only can companies learn more and more about the people they’re selling to, but they can use that information to sell better and more directly.
It’s becoming more and more important for companies to be able to verify that the information they have in their care is secured against the ever-growing threat of cyber attacks. Under Armour cares, Lee stated, because it knows that consumers relinquish their information on the understanding that it will be protected.
“With technology innovation, we should be mindful of some risk, but we really care about our privacy issues and protecting our data house,” said Under Armour’s Lee.
Givoni went on to say, however, that she believed industry and academic standards will emerge that require companies to publicize that their websites and apps adhere to certain principles of data protection. “As a scientist, I view it as less of a dilemma and more of a problem to solve. People who do this for a living will have to think about it and find a way for data to be used in a way that makes people feel comfortable,” she said.
So in essence, “big data” can be defined as the biggest data set a company can fathom dealing with at any given time. Whatever “big” means to you, that’s what “big data” is, since there is no quantifiable measurement. More meaningfully, though, “big data” refers to detecting patterns within truly massive datasets to deliver an experience for the customer that’s tailored specifically to their needs. A company gets to know you better, so it can serve you better. However, it goes far deeper than that – into the homes, smartphones and bank accounts of consumers.
Data mining can make many people uncomfortable, but it’s a process that’s consistently being refined and perfected. It will grow increasingly necessary and more inconspicuous because, as we’ve seen, statistical insights or “big data” will determine the way companies operate in the future. The day will surely come when encryption and innovation can coexist.
Until that day comes, say hello to the hundreds of corporations peering up at you through your smartphone display.