Observations from the session presented by Colin Clark, CTO, Cloud Event Processing:
In many ways, Big Data is what clouds were made for. Computing problems that are beyond the grasp of a single computer, no matter how huge, are easy for elastic platforms to handle. Big data processing pioneer Colin Clark discussed how to discover hidden signals and new knowledge within huge streams of real-time data by applying event processing design patterns to events as they arrive.
Colin was talking about high-velocity big data. He then gave his Complex Event Processing (CEP) criteria:
Domain Specific Language
Continuous Query
Time/Length Windows
Pattern Matching
Example of what Colin was talking about: “Select * from everything where itsInteresting = toMe in last 10 minutes”
How much data does that return? How much processing will it take?
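To make the "continuous query over a time window" idea concrete, here is a minimal Python sketch. It is not Colin's DSL or any CEP product's API; the class name `TimeWindowQuery`, the predicate, and the event fields are all hypothetical. It also shows why the two questions above matter: every matching event in the window is held in memory.

```python
from collections import deque


class TimeWindowQuery:
    """Hypothetical continuous query: matching events from the last N seconds."""

    def __init__(self, predicate, window_seconds):
        self.predicate = predicate
        self.window_seconds = window_seconds
        self._matches = deque()  # (timestamp, event) pairs in arrival order

    def on_event(self, timestamp, event):
        """Feed one event; return the events currently matching the query."""
        # Evict matches that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self._matches and self._matches[0][0] <= cutoff:
            self._matches.popleft()
        # Keep the new event only if it satisfies the predicate.
        if self.predicate(event):
            self._matches.append((timestamp, event))
        return [e for _, e in self._matches]


# "where itsInteresting = toMe in last 10 minutes" (600 seconds)
query = TimeWindowQuery(lambda e: e["interesting"], window_seconds=600)
first = query.on_event(0, {"id": 1, "interesting": True})
second = query.on_event(300, {"id": 2, "interesting": False})
third = query.on_event(700, {"id": 3, "interesting": True})
```

The result set is re-evaluated on every arriving event, which is what distinguishes a continuous query from a one-off database query.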
Limitations of current CEP solutions: they are memory bound, compute bound, and black boxes. CEP can analyze data in flight, but only within those limits. The other challenge is time series analysis.
A technique available for time series analysis is symbolic aggregate approximation (SAX).
He described constructing a “SAX word” from a day's worth of IBM trading, then searching history for that same word to find a pattern.
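As a rough illustration of that idea, the sketch below builds a SAX word in the textbook way: z-normalize the series, average it into equal segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function names, the four-letter alphabet, and the toy numbers are assumptions for illustration, not details from the talk or real IBM prices.

```python
from bisect import bisect
from statistics import NormalDist, mean, pstdev


def sax_word(series, word_length, alphabet="abcd"):
    """Return the SAX word for a series (length must divide evenly)."""
    if len(series) % word_length:
        raise ValueError("series length must be a multiple of word_length")
    # 1. Z-normalize so that shape, not absolute price level, is compared.
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]
    # 2. Piecewise aggregate approximation: mean of each equal-size segment.
    seg = len(z) // word_length
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(word_length)]
    # 3. Breakpoints cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect(breakpoints, v)] for v in paa)


def find_word(history, word, window, word_length):
    """Return start offsets of non-overlapping windows whose SAX word matches."""
    return [
        start
        for start in range(0, len(history) - window + 1, window)
        if sax_word(history[start:start + window], word_length) == word
    ]


# Toy data: three "days" of eight ticks each.
day = [1, 1, 2, 2, 3, 3, 4, 4]
history = day + day[::-1] + [10 * x for x in day]
word = sax_word(day, word_length=4)
matches = find_word(history, word, window=8, word_length=4)
```

Because of the normalization step, the third day matches the first even though its prices are ten times higher: the word captures the shape of the day, which is what makes searching history by word practical.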
This gets closer to solving the high-velocity big data problem, but there is still too much data to process. So the next element in cloud event processing is Map/Reduce.
The real-time (event-driven) aspect still needs to be addressed, though, which brings us to virtualized resources (cloud).
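For readers unfamiliar with the pattern, here is a minimal single-process Python sketch of Map/Reduce: a mapper turns each event into key/value pairs, the pairs are grouped by key, and a reducer folds each group into one value. It is a plain batch illustration with hypothetical names and made-up trade data, not the streaming implementation discussed in the session; in a real deployment the map and reduce phases would be spread across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(events, mapper, reducer):
    """Run mapper over events, group pairs by key, fold each group with reducer."""
    groups = defaultdict(list)
    # Map phase: each event may emit any number of (key, value) pairs.
    for event in events:
        for key, value in mapper(event):
            groups[key].append(value)
    # Reduce phase: combine all values that share a key.
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical trade events: (symbol, shares traded).
trades = [("IBM", 100), ("AAPL", 50), ("IBM", 25)]
volume_by_symbol = map_reduce(
    trades,
    mapper=lambda trade: [(trade[0], trade[1])],
    reducer=lambda a, b: a + b,
)
```

Since each key's reduction is independent of the others, the work can be partitioned by key, which is why the pattern suits data volumes too large for one machine.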
So, assuming: High velocity, big data = CEP + SAX + Streaming Map/Reduce + virtualized resources, which equals Cloud Event Processing’s Darkstar.
Today, Darkstar is working on Wall Street, doing market surveillance at the exchange.
IBM's InfoSphere Streams product addresses real-time, event-driven processing needs. It is not big on social networking platforms or applications, but it is used in many industrial settings and is part of the Smarter Planet initiatives.