Big data is an expression that has been overused in recent years,
but it really does signify an enormous breakthrough in our business data analysis capabilities.
These advancements in the BI world have become possible thanks to technological developments
and the ability to accumulate huge amounts of data.
Big data systems are commonly characterized by the four Vs:
- Volume: Systems that create massive amounts of data (hundreds of terabytes) require data handling techniques that differ from those used in traditional systems.
- Variety: Data systems based on tabular, structured data have been in use for many years. Nowadays other, more flexible formats are available. Source systems now include portable, cellular-based devices or chips that transmit signals, busy eCommerce websites, and so on.
- Veracity: Data integrity and reliability come into question because the data is not always clearly structured, and the source systems were not always designed to create logs that are intelligible and easy to analyze. Thus, there is a need for a new kind of data analysis.
- Velocity: The rate at which data grows within these systems is immense. Such systems aggregate SMS messages, track Facebook activity, collect online data transmission information, record stock trading, and so on. This exponential increase in data requires architectural and technological solutions different from those that were previously available.
We at DataCube believe that at the base of a Big Data system there must be a clearly defined business need that the organization aims to answer.
Without one, resources and manpower will be needlessly wasted.
To demonstrate our point, we prepared a short clip describing a Big Data system we constructed for a customer of ours:
This case study shows how to manage a system that produces data in different locations around the world, some of it in the cloud and some on premises.
Over the last two years, DataCube has accumulated vast and varied experience in the design and implementation of Big Data systems built upon the most advanced technologies for cloud and on-premises environments. We have also implemented advanced statistical models that allow for efficient prediction of machine and user behavior patterns.