For many years there was stability in the database segment of the IT industry. Relational Database Management Systems (RDBMS) such as Oracle, SQL Server and MySQL (to name a few) had proven themselves capable of handling vast amounts of data, and that seemed to be all that modern businesses required. One of the leading solution providers in the ERP space said that its ERP database contained 30,000 tables and that it could handle the complexity of practically any business.
But the advent of Big Data changed the database landscape like nothing before it. Many IT professionals and business executives still do not fully grasp the challenges and opportunities presented by Big Data, or how the cloud is uniquely positioned to handle it. Understanding a few simple concepts and capabilities will help you get the best out of this emerging area.
What is Big Data?
In computing parlance, Big Data refers to datasets so large that they cannot be handled by conventional RDBMS. Handling this data, which can run to zettabytes in volume (1 ZB = 10^21 bytes), is a different task altogether. You can encounter data at these volumes in scientific fields such as weather forecasting and pharmaceutical research. Any large city, with its thousands of security cameras all recording video footage, could hit this kind of archival volume in a fairly short period of time. Large international retailers, which rely extensively on RFID chips to manage the movement of inventory across an international supply chain, also generate enormous quantities of data.
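To get a feel for these volumes, here is a back-of-the-envelope estimate for the security-camera scenario. The camera count and bitrate are illustrative round numbers, not figures from any real deployment:

```python
# Illustrative estimate: storage generated by a city's security cameras.
# The camera count and bitrate below are hypothetical assumptions.
cameras = 10_000              # assumed number of cameras
mbps_per_camera = 4           # assumed average video bitrate (megabits/s)
seconds_per_day = 86_400

bytes_per_day = cameras * mbps_per_camera * 1_000_000 / 8 * seconds_per_day
terabytes_per_day = bytes_per_day / 10**12
print(f"{terabytes_per_day:.0f} TB per day")  # prints "432 TB per day"
```

At that rate the archive passes a petabyte in under three days, which is exactly the scale at which conventional RDBMS storage stops being practical.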
While business analysts agree that this data holds gold mines of business information, the challenge is to understand the trends and put them to business use. That demands the capability to handle Big Data, and only cloud-based applications are up to such a task.
Turning to the Cloud to handle Big Data
One initiative that has proved extremely successful in handling enormous quantities of data is the open source Big Data management system known as Hadoop, which is widely deployed in the cloud. It was built from day one with the requirements of Big Data in mind: it combines data processing capabilities with the cluster management needed to run very large clusters of database servers. For example, in 2008 Yahoo! reported that it was running the world's largest Hadoop application of the time, on a 10,000-core Linux cluster that handled its web queries.
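Hadoop's processing side is built on the MapReduce model: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. A minimal pure-Python sketch of that model (not Hadoop's actual API) for the classic word-count job:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle: group values by key, as Hadoop does between the phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big clusters", "clusters of data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))
# prints {'big': 2, 'data': 2, 'needs': 1, 'clusters': 2, 'of': 1}
```

The point of the model is that the map and reduce steps have no shared state, so Hadoop can spread them across thousands of machines and simply rerun any piece that fails.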
Two years later, Facebook announced that it was running the largest cluster, handling 21 PB of storage (1 PB = 10^15 bytes); soon thereafter this grew to 30 PB. Storage and processing capability on this scale simply cannot be built in a conventional in-house IT center. You need the kind of capacity that only the cloud can provide. In yet another example of Big Data management, the New York Times used 100 Amazon Elastic Compute Cloud instances to process 4 TB of raw images into 11 million PDF files. The task took about 24 hours and cost, hold your breath, all of $240. This is the kind of capability that cloud computing makes possible.
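The arithmetic behind the New York Times figures is worth spelling out, because it shows why renting compute by the hour changes the economics:

```python
# Back-of-the-envelope check of the New York Times figures above.
instances = 100          # EC2 instances (from the article)
hours = 24               # approximate run time (from the article)
total_cost = 240.0       # total dollars (from the article)

instance_hours = instances * hours
cost_per_instance_hour = total_cost / instance_hours
print(f"{instance_hours} instance-hours at ${cost_per_instance_hour:.2f} each")
# prints "2400 instance-hours at $0.10 each"
```

In other words, the job consumed 2,400 machine-hours of capacity that existed only for a day, something no in-house purchase could match at that price.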
In yet another case of using the cloud to handle Big Data efficiently, a cancer research team assembled a 50,000-core virtual supercomputer, spread across data centers around the globe, to perform some core cancer research computations. It ran for three hours and cost just $4,828 per hour. Such a supercomputing cluster can be set up by any qualified person in a matter of hours, without any external help.
Maybe you are not yet into Big Data, but if you are passionate about growing your company, it is only a matter of time before you encounter Big Data, with its many challenges and opportunities. It is important to understand that handling Big Data need not be a constraint: you can take advantage of cloud computing techniques to get the best out of large data sets that your competitors may be afraid to make use of.
About the Guest Author:
Sanjay Srivastava has been active in computing infrastructure and has participated in major projects on cloud computing, networking, VoIP and the creation of applications running over distributed databases. Owing to his military background, his focus has always been on the stability and availability of infrastructure. Sanjay was the Director of Information Technology in a major enterprise, where he managed the transition from legacy software to fully networked operations using private cloud infrastructure. He now writes extensively on cloud computing and networking, and is about to move to his farm in Central India, where he plans to use cloud computing and modern technology to improve the lives of rural folk in India.