Personally speaking, the IT industry has finally got exciting again! Why? Because we are now living in the digital data age, where almost everything we do generates some form of data trail. Not only do we generate data ourselves, but our devices and our interactions on the internet are generating vast amounts of it too, in many different forms: video, pictures, blogs, voice and tweets, to name just a few. The real value for businesses lies in using that data to gain a competitive edge or to deliver a better level of service that fosters loyalty to their brand.
To make good decisions, you need good data and good analytical capability. This is the challenge that most IT software companies are facing today. The business of information management, helping companies to make sense of their proliferating data, is growing by leaps and bounds. In recent years Oracle, EMC, IBM, Microsoft, HP and SAP have, between them, spent more than $15 billion on buying software firms specialising in data management and analytics (see image, courtesy of Ramon Chen: Cloud 'N Clear). This industry is estimated to be worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole.
The term "Big Data" itself is also misleading. It is closer to a marketing term than a measurement, as it doesn't quantify what "Big" actually means. For example, large enterprises such as telecommunications or finance organisations have lived with huge datasets for a long time, and expanding storage is nothing new for them. Moreover, as storage requirements continue to expand and prices continue to fall, today's "Big Data" is tomorrow's "medium" and next week's "small". I suppose the best definition of "Big Data" is the point at which the size of the data itself becomes part of the problem. That said, in terms of actual size we are talking about anything from gigabytes to petabytes and beyond.
However, many companies will continue to use relational databases for years to come; they are still a good fit if you have predictable queries running over structured datasets. But companies today are increasingly likely to be working with unstructured data. For example, if you wanted to understand even a simple attribute such as sentiment from social media feeds, then SQL alone will not provide this for you. The drive to get more value out of data is pushing companies to go beyond structured relational models and to look at integrating unstructured data platforms. Traditional structured data models simply will not be effective at this scale.
Having mentioned the unstructured data platform concept, let me introduce "Hadoop". Hadoop is an open source implementation of "MapReduce", a technology that Google invented. Without going into technical detail at this point, suffice it to say that Hadoop is the key component of the unstructured data platform. To complement the data platform, several Hadoop companion products are also being developed. Given that Hadoop is an open source project, no single company owns the rights. However, companies such as Cloudera and many others are contributing to the open source core and providing complementary management products, support and training for potential and existing Hadoop customers.
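For readers who would like a rough feel for the MapReduce idea, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample tweets are my own illustrative assumptions. It simply counts words by emitting key-value pairs, grouping them by key and then summing each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input standing in for a social media feed.
    tweets = ["big data is big", "data is everywhere"]
    print(word_count(tweets))
```

Here everything runs in one process on one machine. The appeal of the model is that the map and reduce steps are independent per document and per key, so a framework such as Hadoop can spread them across many machines.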
So, while adoption continues to grow, the industry is changing fast. I personally feel that some companies are simply jumping on the bandwagon of the marketing term "Big Data". To be honest, Big Data should really be viewed as the next generation of the Enterprise Data Warehouse (EDW Version 2.0).
Top takeaways and challenges in supporting Big Data:
- Consolidation or partnership alliances will form to support Big Data open cloud infrastructures.
- Hadoop management capability will integrate with standard enterprise IT management tools.
- Big Data Mining/Analytics will fundamentally change the way business is done not only online, but also offline.
- In-memory analytical tools will provide better insight for Big Data analytics by delivering results on the fly.
- Vendors such as Oracle, EMC, IBM, Microsoft, HP and SAP will develop Data Warehousing integration tools that will link Hadoop and other Big Data platforms to their structured database environments.
- Development of NoSQL databases, such as distributed key-value databases, will continue to grow. IT companies will start to add functionality and integration capability for up-sell/cross-sell opportunities.
- There will be a range of appliance hardware servers from IT vendors that will support Big Data platforms.
- Most organisations will be collecting vast amounts of data but, for now, will not be doing anything with it.
Let's see how the industry shapes up to respond to this challenge.