Friday, March 18, 2011

HP CEO Leo Apotheker says "relational databases are becoming less and less relevant to the future stack"

Interesting. My take:

The nod to NoSQL is a wakeup call because, unlike IBM, Oracle, and Microsoft, HP doesn't have a relational database franchise to protect. Sure, it sells a boatload of servers to run relational databases, but it's not locked in from a customer information perspective. HP and VMware are in a similar situation here.

Oracle is great for transactional workloads, we all know that, but it should not be the default choice for all data storage. Oracle is overly heavyweight, and it demands design-time data-model decisions that make very little sense in an age of linked data, used and reused in new contexts. It's also just too expensive to be used as a straightforward bucket of bits; MySQL is more appropriate for that role, but developers are moving on when it comes to graph and document databases (Neo4j, for example). Meanwhile the web is churning out a host of interesting new stores: Cassandra is a speedy key-value store built and open-sourced by web companies, and it seems highly likely it will make a play in Hadoop too.

Tuesday, March 15, 2011

Event Processing - Big Data @ Cloud Connect 2011

Observations from the session presented by Colin Clark, CTO, Cloud Event Processing:

In many ways, Big Data is what clouds were made for. Computing problems that are beyond the grasp of a single computer, no matter how huge, are easy for elastic platforms to handle. Big data processing pioneer Colin Clark discussed how to discover hidden signals and new knowledge within huge streams of real-time data, applying event processing design patterns to events as they happen.

Colin was talking about high velocity, big data. He then gave his Complex Event Processing criteria:
Domain Specific Language
Continuous Query
Time/Length Windows
Pattern Matching

Example of what Colin was talking about: “Select * from everything where itsInteresting = toMe in last 10 minutes”

How much data does that return? How much processing will it take? 
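Colin's pseudo-query isn't tied to any particular product; as a rough illustration only, a continuous query over a sliding 10-minute window could be sketched in Python like this (the event shape and the `is_interesting` predicate are invented stand-ins for "itsInteresting = toMe"):

```python
import time
from collections import deque

WINDOW_SECONDS = 600  # "in last 10 minutes"

window = deque()  # events currently inside the time window

def is_interesting(event):
    # Hypothetical predicate standing in for "itsInteresting = toMe".
    return event.get("interesting_to") == "me"

def on_event(event):
    """Continuous query: evaluate every arriving event against the window."""
    now = time.time()
    window.append((now, event))
    # Expire events older than the window (the time/length window criterion).
    while window and window[0][0] < now - WINDOW_SECONDS:
        window.popleft()
    # "Select * from everything where itsInteresting = toMe"
    return [e for ts, e in window if is_interesting(e)]
```

Even this toy version makes the cost question concrete: the window, and the work per event, grow with the arrival rate.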

Limitations of current CEP solutions: they are memory bound, compute bound, and black boxes. CEP can analyze data in flight, but within those limits. The other challenge is time series analysis.

A technique available for time series analysis is symbolic aggregate approximation (SAX). 

He described constructing a “SAX word” from a day's worth of IBM trading, then searching history for that same word to find a pattern.
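SAX itself is well documented (Lin and Keogh's work); a minimal sketch of building a SAX word, with the segment count and alphabet size chosen arbitrarily here:

```python
import numpy as np

# Breakpoints that cut the standard normal curve into 4 equiprobable
# regions; each region maps to one letter of the alphabet a-d.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax_word(series, n_segments=8):
    """Turn a numeric series (e.g. a day of IBM trades) into a SAX word."""
    x = np.asarray(series, dtype=float)
    # 1. Z-normalize so the normal-curve breakpoints apply.
    x = (x - x.mean()) / x.std()
    # 2. Piecewise Aggregate Approximation: mean of each segment.
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # 3. Discretize each segment mean into a letter.
    return "".join(ALPHABET[np.searchsorted(BREAKPOINTS, v)] for v in paa)
```

A day of prices collapses into a short string, so "search history for that same word" becomes a cheap string comparison rather than a scan of raw ticks.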

Getting closer to solving the high velocity, big data problem. But there is still too much data to process, so the next element in cloud event processing is Map/Reduce.
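Darkstar's internals weren't shown in the session, but the streaming (as opposed to batch) flavor of Map/Reduce can be illustrated roughly: the reduce state stays live and each event is folded in on arrival instead of waiting for a batch job. The `symbol` field and counting logic here are assumptions for illustration:

```python
from collections import defaultdict

# Running reduce state: unlike batch Map/Reduce, results update per event.
counts = defaultdict(int)

def map_event(event):
    # Hypothetical mapper: key each event by its ticker symbol.
    yield (event["symbol"], 1)

def reduce_stream(event):
    """Fold each incoming event into the running aggregate immediately."""
    for key, value in map_event(event):
        counts[key] += value
    return counts
```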

Still, the real-time (event-driven) aspect needs addressing, which brings us to virtualized resources (the cloud).

So, the working equation: high velocity, big data = CEP + SAX + streaming Map/Reduce + virtualized resources. That combination is Cloud Event Processing's Darkstar.

Today, Darkstar is working on Wall Street, doing market surveillance at the exchange.