When Data Grows – the Cloud is Built for Big Data
For many years there was stability in the database segment of the IT industry. Relational Database Management Systems (RDBMS) such as Oracle, SQL Server and MySQL (to name a few) had proven themselves capable of handling vast amounts of data. One of the leading solution providers in the ERP space boasted that the 30,000 tables in its ERP database could handle the complexity of practically any business.
But the advent of Big Data has changed the database landscape dramatically. Many IT professionals and business executives still do not grasp the challenges and opportunities Big Data presents, or why the cloud is uniquely positioned to handle it. Understanding a few simple concepts and capabilities will help you extract the best from this emerging area.
What is Big Data?
In computing parlance, Big Data is defined as “datasets so large as to be unmanageable for conventional RDBMS.” Handling zettabytes (10²¹ bytes) of data is a different task altogether, and you can encounter volumes of this order in scientific fields such as weather forecasting and pharmaceutical research. Similarly, any large city, with its thousands of security cameras all recording video footage, could easily reach this kind of archival volume. Large international retailers, relying extensively on RFID chips to track inventory across an international supply chain, also generate enormous quantities of data.
While business analysts agree that there are gold mines of business information buried in this data, the challenge is to spot the trends and put them to business use. Doing so requires the capability to handle Big Data, and currently only cloud-based platforms can deliver that capability at scale.
Turning to the Cloud to handle Big Data
One initiative that has proven extremely successful in handling enormous quantities of data is the open-source Big Data management system known as Hadoop. It was built from day one with the requirements of Big Data in mind: Hadoop combines a distributed storage layer (HDFS) with a data-processing framework (MapReduce), so that very large datasets can be stored and processed across clusters of commodity servers. For example, in 2008 Yahoo! reported that it was running the world’s largest Hadoop application on a 10,000-core Linux cluster, producing data used in every Yahoo! Web search query.
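To give you a flavour of how the MapReduce side of Hadoop works, here is the canonical word-count example from the Hadoop documentation, lightly commented. The mapper emits a count of 1 for every word it sees, and the reducer sums those counts per word; input and output paths are supplied on the command line (this sketch assumes a Hadoop 2.x installation).

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each line of input, emit (word, 1) for every word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The elegance of the model is that this same small program runs unchanged whether the input is a few megabytes on a laptop or petabytes spread across thousands of cluster nodes; Hadoop handles the splitting, scheduling and fault tolerance.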
Two years later, Facebook announced that it was running the largest Hadoop cluster in the world, holding 21 PB of storage (1 PB = 10¹⁵ bytes). This grew to 30 PB soon after. Storage and processing capability on this scale simply cannot be built in a conventional in-house IT center. You need the kind of capacity that only the cloud can provide. In yet another example of Big Data management, the New York Times used 100 Amazon Elastic Compute Cloud (EC2) instances to process 4 TB of raw images into 11 million PDF files. The task took about 24 hours and cost – hold your breath – all of $240. This is the kind of capability rendered possible by cloud computing.
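The arithmetic behind that headline figure is worth spelling out. Assuming the then-standard EC2 rate of roughly $0.10 per small-instance hour:

100 instances × 24 hours × $0.10 per instance-hour = $240

In other words, the Times rented a hundred machines for a full day for the price of a modest dinner, and handed them back the moment the job was done.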
In yet another case of using the cloud to handle Big Data efficiently, a cancer research team built a 50,000-core virtual supercomputer to perform core cancer research computations. This supercomputer, assembled across data centers spanning the globe, ran for three hours at a cost of $4,828 per hour, or roughly $14,500 in total. A qualified team can stand up such a supercomputing cluster in a matter of hours, without buying a single server.
Maybe you are not yet into Big Data, but if you are passionate about growing your company, it is only a matter of time before you encounter Big Data, with its many challenges and opportunities. Big Data need not be a constraint! You can take advantage of cloud computing techniques to get the best out of the large data sets your competitors may be afraid to make use of.
To gain a better understanding of how Big Data can be managed optimally in the cloud, visit GMO Cloud’s High Availability Configuration page.
Be Part of Our Cloud Conversation
Our articles are written to provide you with tools and information to meet your IT and cloud solution needs. Join us on Facebook and Twitter.
About the Guest Author:
Sanjay Srivastava has been active in computing infrastructure and has participated in major projects on cloud computing, networking, VoIP and the creation of applications running over distributed databases. Owing to his military background, his focus has always been on the stability and availability of infrastructure. Sanjay was the Director of Information Technology in a major enterprise, where he managed the transition from legacy software to fully networked operations using private cloud infrastructure. He now writes extensively on cloud computing and networking and is about to move to his farm in Central India, where he plans to use cloud computing and modern technology to improve the lives of rural folk.