Friday, March 18, 2011

HP CEO Leo Apotheker says "relational databases are becoming less and less relevant to the future stack"

Interesting. My take:

The call to NoSQL is a wake-up call because, unlike IBM, Oracle and Microsoft, HP doesn’t have a relational database franchise to protect. Sure, it sells a boatload of servers to run relational databases, but it’s not locked in from a customer information perspective. HP and VMware are in a similar situation here.

Oracle is great for transactional workloads (we all know that), but it should not be the default choice for all data storage. Oracle is overly heavyweight, and it demands design-time data model decisions, which make very little sense in an age of linked data that is used and reused in new contexts. It’s also just too expensive to be used as a straightforward bucket of bits; MySQL is more appropriate for that role, but when it comes to graph and document databases (e.g. Neo4J), developers are moving on. Meanwhile, the web is churning out a host of interesting new stores: Cassandra is a speedy key-value store built and open sourced by web companies. It seems highly likely that it will make a play in Hadoop too.

Tuesday, March 15, 2011

Event Processing: Big Data @ Cloud Connect 2011

Observations from the session presented by Colin Clark, CTO, Cloud Event Processing:

In many ways, Big Data is what clouds were made for. Computing problems that are beyond the grasp of a single computer, no matter how huge, are easy for elastic platforms to handle. Big data processing pioneer Colin Clark discussed how to discover hidden signals and new knowledge within huge streams of real-time data by applying event processing design patterns to events as they arrive.

Colin talked about high-velocity big data, then gave his criteria for Complex Event Processing (CEP):
Domain Specific Language
Continuous Query
Time/Length Windows
Pattern Matching

Example of what Colin was talking about: “Select * from everything where itsInteresting = toMe in last 10 minutes”

How much data does that return? How much processing will it take? 
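To make the criteria above concrete, here is a minimal sketch of a continuous query over a time window with a simple pattern match. It is a hypothetical illustration, not Darkstar's implementation: the `Event` shape, the `price > 100` stand-in for "itsInteresting", and the "three rising prices" pattern are all assumptions, and events are assumed to arrive in timestamp order. Note that every event inside the window is held in memory, which is exactly where the memory-bound limitation comes from.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds; hypothetical tick format
    symbol: str
    price: float


class SlidingWindowQuery:
    """A continuous query over a time window (hypothetical sketch).

    Each incoming event is appended, events older than `window_seconds`
    (relative to the newest event) are evicted, and the events that
    currently satisfy `predicate` are returned.
    """

    def __init__(self, window_seconds, predicate):
        self.window_seconds = window_seconds
        self.predicate = predicate
        self.window = deque()  # every event in the window stays in memory

    def on_event(self, event):
        self.window.append(event)
        cutoff = event.timestamp - self.window_seconds
        while self.window and self.window[0].timestamp < cutoff:
            self.window.popleft()
        return [e for e in self.window if self.predicate(e)]


def rising_run(events, length=3):
    """Pattern match: True if the last `length` events have strictly rising prices."""
    tail = list(events)[-length:]
    if len(tail) < length:
        return False
    return all(a.price < b.price for a, b in zip(tail, tail[1:]))


if __name__ == "__main__":
    # "in last 10 minutes" -> a 600-second window; "itsInteresting" -> price > 100
    query = SlidingWindowQuery(600, lambda e: e.price > 100)
    for ev in [Event(0, "IBM", 99.0), Event(300, "IBM", 101.0), Event(700, "IBM", 102.0)]:
        hits = query.on_event(ev)
    print(hits, rising_run(query.window, 2))
```

A deque keeps eviction cheap at the old end of the window, but it does nothing about the window's size: at high velocity, ten minutes of events can be a very large deque.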

Limitations of current CEP solutions: they are memory bound, compute bound and black boxes. CEP can analyze data in flight, but only within those limits. The other challenge is time series analysis.

A technique available for time series analysis is symbolic aggregate approximation (SAX). 

He described constructing a “SAX word” from a day’s worth of IBM trading, then searching history for that same word to find a pattern.
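The SAX idea can be sketched with the textbook recipe: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equally likely regions. This is a minimal sketch, not the talk's code; the number of segments, the alphabet size (at most 26 here), the requirement that the window length be a multiple of the segment count, and the brute-force history search are simplifying assumptions.

```python
import bisect
from statistics import NormalDist, mean, pstdev

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def znormalize(series):
    """Rescale to mean 0 and standard deviation 1 so shape, not level, is compared."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if len(series) % segments:
        raise ValueError("series length must be a multiple of segments in this sketch")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into `alphabet_size` equally likely regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Letter i is emitted when a segment mean falls between cut i-1 and cut i."""
    cuts = breakpoints(alphabet_size)
    return "".join(LETTERS[bisect.bisect_right(cuts, v)]
                   for v in paa(znormalize(series), segments))


def find_matches(history, word, window, segments, alphabet_size):
    """Slide a window over history; return start offsets whose SAX word equals `word`."""
    return [start for start in range(len(history) - window + 1)
            if sax_word(history[start:start + window], segments, alphabet_size) == word]
```

For example, a steadily rising series such as `list(range(1, 9))` with 4 segments and a 4-letter alphabet becomes the word `"abcd"`; searching history for `"abcd"` then finds every window with that same rising shape, regardless of price level.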

That gets us closer to solving the high-velocity big data problem, but there is still too much data to process. So the next element in cloud event processing is Map/Reduce.
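The Map/Reduce step can be illustrated with a toy aggregation: each partition of events is mapped to partial counts independently (in a real deployment, on separate machines), and the partial results are reduced by addition. The event format and the per-symbol, per-minute count are hypothetical; the point is that because the reduce step is associative, the same functions work on a fixed set of partitions or on micro-batches folded into a running total as they stream in.

```python
from collections import Counter
from functools import reduce


def map_partition(events):
    """Map: turn one partition of (timestamp_seconds, symbol) events into
    partial counts keyed by (symbol, minute)."""
    return Counter((symbol, int(ts // 60)) for ts, symbol in events)


def merge_counts(left, right):
    """Reduce: partial counts combine by addition, so merge order does not matter."""
    return left + right


def map_reduce(partitions, initial=None):
    """Map every partition, then fold the partial results into one total."""
    start = Counter() if initial is None else initial
    return reduce(merge_counts, map(map_partition, partitions), start)


if __name__ == "__main__":
    batches = [[(0, "IBM"), (30, "IBM"), (70, "HPQ")], [(10, "IBM"), (65, "HPQ")]]
    running = Counter()
    for batch in batches:  # streaming form: fold each arriving micro-batch in
        running = map_reduce([batch], running)
    print(running)
```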

Still, we need to address the real-time (event-driven) aspect, which brings us to virtualized resources (the cloud).

So: high-velocity big data = CEP + SAX + streaming Map/Reduce + virtualized resources, which together make up Cloud Event Processing’s Darkstar.

Today, Darkstar is working on Wall Street, doing market surveillance at the exchange. 

Friday, February 25, 2011

AWS CloudFormation

Amazon Web Services announced AWS CloudFormation, which lets developers and system administrators use recipes to create and provision resources in the Amazon cloud. This is conceptually similar to Opscode’s Chef recipes, which let ops folks configure some aspect of the systems in their ecosystem. Clearly, AWS must have seen how Chef (and, of course, cfengine and Puppet) changed the configuration management landscape and wanted to make baking clouds much easier. CloudFormation is a step in the right direction.

With CloudFormation, developers can either use templates from the library or create their own to describe the AWS resources and associated runtime needs of their application. They don’t have to worry about the order in which those resources should be provisioned, and provisioning works seamlessly with live applications, without disruption.
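CloudFormation templates are JSON documents, and a resource that points at another via `{"Ref": ...}` depends on it, which is how the service can work out provisioning order for you. The sketch below builds a small hypothetical template as a Python dict and uses a topological sort to illustrate that kind of dependency ordering. It is an illustration under stated assumptions, not how CloudFormation is implemented; the resource names and the AMI ID `ami-00000000` are placeholders.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template: resource names and the AMI ID are placeholders.
TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Single web server with a security group and an Elastic IP",
    "Resources": {
        "WebServerIP": {
            "Type": "AWS::EC2::EIP",
            "Properties": {"InstanceId": {"Ref": "WebServer"}},
        },
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-00000000",
                "InstanceType": "m1.small",
                "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
            },
        },
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"GroupDescription": "Allow HTTP to the web server"},
        },
    },
}


def find_refs(node):
    """Collect the logical names referenced via {"Ref": name} anywhere in node."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            return {node["Ref"]}
        return set().union(*(find_refs(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(find_refs(item) for item in node))
    return set()


def provisioning_order(template):
    """Order resources so that everything a resource refers to comes first.
    Refs to names that are not resources (e.g. parameters) are ignored."""
    resources = template["Resources"]
    graph = {name: find_refs(body.get("Properties", {})) & set(resources)
             for name, body in resources.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(json.dumps(TEMPLATE, indent=2))
    print(provisioning_order(TEMPLATE))
```

Even though the Elastic IP is listed first in the template, the security group has to exist before the instance, and the instance before the Elastic IP can be attached; the template author never writes that order down.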

AWS CloudFormation supports many AWS resources, including EC2 instances, EBS volumes, Elastic Load Balancers, Elastic IPs, Security Groups, Auto Scaling Groups, Elastic Beanstalk, CloudWatch alarms, RDS, SimpleDB and SNS. Amazon has also released recipes to install open source applications such as WordPress, Drupal, Tracks, Redmine and Joomla. With just a few configuration details, like the EC2 instance type and autoscaling limits, one can get these applications running in minutes.
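The "few configuration details" typically surface as template Parameters with defaults that are referenced from the resources. A minimal sketch, assuming a hypothetical auto-scaled web tier: the parameter names, defaults and AMI ID are placeholders and are not taken from the templates AWS published. `effective_parameters` is a hypothetical local helper that merges user overrides onto the defaults and sanity-checks the autoscaling limits before anything is launched.

```python
import json

# Hypothetical parameterized template; names, defaults and the AMI ID are placeholders.
TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "InstanceType": {"Type": "String", "Default": "m1.small"},
        "MinInstances": {"Type": "Number", "Default": "1"},
        "MaxInstances": {"Type": "Number", "Default": "4"},
    },
    "Resources": {
        "LaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": "ami-00000000",
                "InstanceType": {"Ref": "InstanceType"},
            },
        },
        "WebServerGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "AvailabilityZones": {"Fn::GetAZs": ""},
                "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                "MinSize": {"Ref": "MinInstances"},
                "MaxSize": {"Ref": "MaxInstances"},
            },
        },
    },
}


def effective_parameters(template, overrides=None):
    """Hypothetical helper: overrides win over each parameter's Default;
    values are kept as strings, as in the template itself."""
    overrides = overrides or {}
    unknown = set(overrides) - set(template["Parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for name, spec in template["Parameters"].items():
        if name in overrides:
            values[name] = str(overrides[name])
        elif "Default" in spec:
            values[name] = spec["Default"]
        else:
            raise ValueError(f"missing value for parameter {name}")
    if int(values["MinInstances"]) > int(values["MaxInstances"]):
        raise ValueError("MinInstances must not exceed MaxInstances")
    return values


if __name__ == "__main__":
    print(json.dumps(TEMPLATE, indent=2))
    print(effective_parameters(TEMPLATE, {"InstanceType": "m1.large", "MaxInstances": 8}))
```

Keeping the instance type and scaling limits as parameters is what lets a single library template serve many users: everything else in the recipe stays fixed.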

AWS CloudFormation is definitely the next logical evolution for Amazon. It also gives them an opportunity to lock customers into their ecosystem. Anyone who expected them not to take this step is being naive about how business is done in this competitive world. Amazon is doing everything right to remain the largest cloud player in the market. However, I do think that it doesn’t bode well for the players in the AWS ecosystem.

Some pundits see this as a direct threat to Chef and Puppet, but that is definitely not the case. Chef and Puppet are more focused on configuration management and are not reliant on AWS in any way. However, it does affect large players like RightScale and smaller ones like Bitnami. Even though their businesses are not reliant on AWS alone, it highlights the risk for any provider that relies on a single cloud provider’s ecosystem, especially an ambitious one like Amazon. Do you think it is yet another wake-up call for anyone wanting to build a business around the AWS cloud?

Wednesday, February 23, 2011

Let’s take an example. Amazon recently announced a video streaming service, available free to Amazon Prime subscribers. From what we have heard in the media, Amazon will end up offering a service that competes with Netflix. We all know that Netflix has recently moved most of its infrastructure to Amazon Web Services.

When Amazon announced the new video streaming service, cloud pundits raised a valid question: will Amazon tweak the QoS of its cloud infrastructure to favor its own service over Netflix? In a highly competitive environment like the one between Netflix and Amazon, this is clearly a realistic possibility. Remember, in spite of the openness we tout in the cloud world, the cost of moving its infrastructure away from Amazon would be prohibitive for Netflix. Will Amazon play dirty to kick a competitor out of the market, or will it play straight and protect its booming cloud business?

As soon as this discussion came up among the Clouderati (a loose group of cloud practitioners, vendors and pundits on Twitter), Amazon CTO Werner Vogels jumped in and clarified that Amazon has no such plans. He stressed that any such attempt would be a shortsighted move that wouldn’t bode well for the longevity of either Amazon Web Services or Amazon’s brand itself. He pointed out how Amazon already lives in peace with competing third-party merchants on its e-commerce platform, and noted that Amazon has a history of cannibalizing its own business in support of a customer-oriented view.

I agree with Werner’s arguments completely. Some of my thoughts on this development are:
Amazon is very smart not to cannibalize its larger brand for short-term gains
Netflix is not so naive as to have ignored such possibilities before deciding to put its entire business on Amazon infrastructure
More importantly, regulators are not blind. Any anti-competitive measure will not sit well with regulators, and Amazon knows it pretty well

However, this does raise a very important question that every business planning to move its infrastructure to a third-party cloud provider should consider. Are they willing to trust the cloud provider not to poach their business? Do they have enough protection through their SLAs? Depending on the nature of the business, it is important to do due diligence when evaluating the risks.

Saturday, February 19, 2011

NoSQL is a fad

I attended a Silicon Valley Cloud Meetup this week where Siddharth Anand (Sid) gave a great talk on how Netflix moved to the cloud from applications hosted in its own datacenter. In particular, he focused on moving applications from a traditional RDBMS (Oracle in this case) first to SimpleDB (an Amazon offering) and then to Cassandra (an open source key/value store). SriSatish Ambati (DataStax <http://www.datastax.com/>) gave a great overview of Cassandra’s history and capabilities (slides <http://slidesha.re/svcc_nflx>) as a warm-up talk.

Sid focused on the actual issues their engineering team ran into when moving to the cloud, including:
  • Data Model changes required
  • Living without SQL support
  • No joins
  • No transactions
  • No triggers
  • etc.
Netflix handles this with a common layer in its architecture that deals with the NoSQL store and hides some of these idiosyncrasies from the application layers above. This is a lot of work (as evidenced by their recent hiring spree; they do these talks to recruit), but it is necessary if they are to outsource their resource management and planning to the cloud so that they can scale efficiently.
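As a rough illustration of what such a layer does, here is a minimal sketch of a data access class that hides the missing joins from its callers by denormalizing at write time. Everything here is hypothetical: `KeyValueStore` is an in-memory stand-in for a NoSQL store, and the movie/queue domain and key scheme are made up for the example. It is not Netflix's actual layer.

```python
class KeyValueStore:
    """Hypothetical stand-in for a NoSQL store: get/put by key, no joins, no transactions."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = dict(value)


class QueueRepository:
    """Hypothetical access layer: callers never see keys or denormalization."""

    def __init__(self, store):
        self.store = store

    def add_movie(self, movie_id, title):
        self.store.put(f"movie:{movie_id}", {"title": title})

    def add_to_queue(self, user_id, movie_id):
        # No joins: copy the title into the user's queue entry at write time,
        # so reading a queue is a single get instead of one lookup per movie.
        movie = self.store.get(f"movie:{movie_id}")
        if movie is None:
            raise KeyError(movie_id)
        # No transactions: this read-modify-write is not atomic.
        queue = self.store.get(f"queue:{user_id}") or {"items": []}
        items = queue["items"] + [{"movie_id": movie_id, "title": movie["title"]}]
        self.store.put(f"queue:{user_id}", {"items": items})

    def queue_titles(self, user_id):
        queue = self.store.get(f"queue:{user_id}") or {"items": []}
        return [item["title"] for item in queue["items"]]
```

The trade-off is typical: reads get cheap, but a changed title now lives in many places, and two concurrent `add_to_queue` calls can overwrite each other. Handling that (for example with versioned or conditional writes, if the store offers them) is exactly the kind of work that ends up in the common layer.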

Sid will give a follow-up (part 2) talk that goes into more depth on the Cassandra issues they are still working to solve. His slides <http://slidesha.re/hOkpT1> and a whitepaper <http://practicalcloudcomputing.com/post/1267489138/netflixcloudstorage> are available if you want more details.