The term big data started to appear occasionally in the early 1990s, and its prevalence and importance have grown rapidly ever since. Today, big data is often seen as integral to a company's data strategy.
Big data has specific characteristics and properties that can help you understand both the challenges and advantages of big data initiatives. The general consensus is that a set of attributes defines big data. In most big data circles, these are called the four V’s: volume, variety, velocity, and veracity. But big data goes beyond the four V’s.
Let us take a look at the four V’s and the attributes beyond them.
Volume: Big data is, by definition, large in volume. The overall amount of information produced each day is rising rapidly. Some experts have estimated that more data was generated in the last two years than in all of prior human history. It is also estimated that 2.3 trillion gigabytes of data are generated each day.
Variety: The variety of data is arguably more striking than its sheer volume. The diversity lies not only in the many devices and sources that generate data but also in the types of data, both structured and unstructured. At present, data scientists are particularly interested in unstructured data, which can take the form of social media comments, voice recordings, or media files. Using machine learning techniques and natural language processing, data scientists can analyze this data to understand customer behaviour.
Velocity: The rate at which data arrives is also increasing each day. For example, reports on what happens in an "internet second" show overwhelming numbers. In one internet second, more than 50,000 Google searches are completed, more than 125,000 YouTube videos are viewed, 7,000 tweets are sent, and more than 2 million emails are sent. The flow of data is huge and constant, and analyzing it can help researchers and companies make valuable decisions.
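To get a feel for these rates, it can help to scale them up to a full day. The short sketch below does only that arithmetic, using the per-second figures quoted above as given (treating the "more than" figures as lower bounds); the dictionary keys and the `per_day` helper are names chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Per-second figures as quoted in the text; "more than" values are lower bounds.
per_second = {
    "google_searches": 50_000,
    "youtube_views": 125_000,
    "tweets": 7_000,
    "emails": 2_000_000,
}


def per_day(rate_per_second: int) -> int:
    """Scale a per-second rate to a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


daily = {name: per_day(rate) for name, rate in per_second.items()}

for name, total in daily.items():
    print(f"{name}: {total:,} per day")
```

At these rates, 50,000 searches per second works out to about 4.3 billion searches per day, which is why systems handling this kind of traffic have to process data as a continuous stream.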
Veracity: Big data veracity refers to the biases, noise, and abnormalities in data. Is the data being stored and mined meaningful to the problem being analyzed? In scoping out your big data strategy, you need your team and partners to help keep your data clean, and you need processes that keep ‘dirty data’ from accumulating in your systems.
Validity: Similar to veracity, validity refers to how accurate and correct the data is for its intended use. According to Forbes, an estimated 60 percent of a data scientist's time is spent cleansing their data before being able to do any analysis. The benefit from big data analytics is only as good as its underlying data, so you need to adopt good data governance practices to ensure consistent data quality, common definitions, and metadata.
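As a rough illustration of what a validity check might look like in practice, the sketch below applies a few rules to a handful of records before they reach analysis. The records, field names (`customer_id`, `age`, `signup`), and rule thresholds are all hypothetical; real governance rules would come from your own common definitions and metadata.

```python
from datetime import date

# Hypothetical sample records; only the first one satisfies every rule.
records = [
    {"customer_id": "C001", "age": 34, "signup": "2021-03-14"},
    {"customer_id": "", "age": 29, "signup": "2020-11-02"},
    {"customer_id": "C003", "age": -5, "signup": "2019-07-30"},
    {"customer_id": "C004", "age": 41, "signup": "not a date"},
]


def validation_errors(record):
    """Return a list of rule violations for one record (empty if valid)."""
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age out of range")
    try:
        date.fromisoformat(str(record.get("signup", "")))
    except ValueError:
        errors.append("signup is not an ISO date")
    return errors


valid = [r for r in records if not validation_errors(r)]
rejected = [(r, validation_errors(r)) for r in records if validation_errors(r)]

print(f"{len(valid)} valid, {len(rejected)} rejected")
for record, errors in rejected:
    print(record, "->", errors)
```

Keeping the rejected records together with the reasons for rejection, rather than silently dropping them, makes it easier to trace quality problems back to their source.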
Volatility: Big data volatility refers to how long data is valid and how long it should be stored. In this world of real-time data, you need to determine at what point data is no longer relevant to the current analysis.
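One simple way to make volatility concrete is to attach explicit time windows to the data. The sketch below assumes two hypothetical windows, one for how long an event stays relevant to the current analysis and one for how long it is stored at all; the window lengths, event names, and the `classify` helper are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: both windows are assumptions for illustration.
RELEVANCE_WINDOW = timedelta(hours=1)   # still useful for real-time analysis
RETENTION_WINDOW = timedelta(days=30)   # still worth keeping in storage


def classify(event_time, now):
    """Decide what to do with an event based on its age."""
    age = now - event_time
    if age <= RELEVANCE_WINDOW:
        return "analyze"
    if age <= RETENTION_WINDOW:
        return "archive"
    return "expire"


# A fixed reference time keeps the example reproducible.
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
events = {
    "click": now - timedelta(minutes=5),
    "order": now - timedelta(days=3),
    "old_log": now - timedelta(days=90),
}

decisions = {name: classify(ts, now) for name, ts in events.items()}
print(decisions)
```

The appropriate windows depend entirely on the analysis: a fraud detection feed and a yearly sales report will draw the line between relevant and stale data in very different places.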
Visualization: Another characteristic of big data is how challenging it is to visualize. Current big data visualization tools face technical challenges due to limitations of in-memory technology and poor scalability, functionality, and response time. You can't rely on traditional graphs when trying to plot a billion data points, so you need different ways of representing data such as data clustering or using tree maps, sunbursts, parallel coordinates, circular network diagrams, or cone trees. Combine this with the multitude of variables resulting from big data's variety and velocity and the complex relationships between them, and you can see that developing a meaningful visualization is not easy.
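To illustrate why aggregation helps, the sketch below uses grid binning, a simple alternative to plotting every point: points are grouped into square cells and only the per-cell counts are kept, so a plotting tool would draw on the order of a hundred cells instead of every raw point. This is a swapped-in, minimal stand-in for the clustering approaches mentioned above, not a full visualization; the randomly generated points, the `bin_points` helper, and the cell size are assumptions for illustration.

```python
import random
from collections import Counter


def bin_points(points, cell_size):
    """Count how many (x, y) points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Synthetic data standing in for a much larger dataset.
random.seed(0)
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(100_000)]

grid = bin_points(points, cell_size=10)

print(f"{len(points):,} points reduced to {len(grid)} cells")
print("densest cells:", grid.most_common(3))
```

The per-cell counts can then be drawn as a heat map, and the cell size becomes the knob that trades detail against the number of elements the chart has to render.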
Value: Last, but arguably the most important of all, is value. The other characteristics of big data are meaningless if you don't derive business value from the data. Substantial value can be found in big data, including understanding your customers better, targeting them accordingly, optimizing processes, and improving machine or business performance. You need to understand the potential, along with the more challenging characteristics, before embarking on a big data strategy.