When Big Data is Too Big: The Value of Real-Time Filtering and Formatting
James Heath, a former colleague now working at Sprint, recently gave me an intriguing definition of “Big Data.” He said, “You know your data is ‘big’ when the cheapest and best way to transport it is to carry it on an airplane.”
It’s a good one, James. Maybe the only thing I would add is the dimension of time. If you need your data analyzed five minutes after it’s left a router, an airplane trip doesn’t really help. And that’s quite common in telecom: the volume of network traffic is often so huge it outstrips the ability of the analytics engine to process it fast enough.
Rick Aguirre, CEO of Dallas-based Cirries Technologies, is a guy who knows something about managing these extra-extra large data sets. His company has made a business out of helping carriers filter their big data down to a size that can be analyzed in seconds, minutes, or hours.
Dan Baker: Rick, tell us a little about your firm. When did you get rolling and what’s your mission in the “big data” space?
Rick Aguirre: Sure, Dan. We are a small but rapidly expanding company, and although “big data” is our business, we actually don’t offer an analytics application. Instead we built a real-time mediation and filtering engine that’s very fast and can quickly get data into any format the client wants. In short, we feed a lot of analytic applications with the right data at the right time.
Thankfully we survived the deepest trough of the recession in 2008/2009, and from 2010 onward we’ve seen fast and profitable growth. Companies like Nokia-Siemens and RedHat have come to us for help, and we also serve -- through resellers -- clients in places like the Philippines, Mexico and Canada. Some of the large U.S. carriers, of course, are also important clients, and we sell to them directly.
How does your system work and how do you add value?
It starts when an operator’s system or analytics platform has limited throughput or needs the data presented in a different manner. Say you’ve captured data off the routers at a rate of 600,000 events a second, but the analytics engine can’t handle more than 15,000 events a second. How do you get the data down to a volume you can manage?
The answer is our Maestro Data Controller. In this case the job is filtering the data down -- in the example above, a reduction of roughly 40 to 1 -- which is one of many data manipulation functions we provide to our carrier clients.
So at the entry point of our real-time data parsing engine, we have “resource adapters” that can collect and transform any data structure -- a protocol, a syslog, a flat file. We then process that data in real time, where we do filtering, mediation, enrichment, and correlation, or even apply policies to it. In the end, we transform the network data into enhanced information, or what we call smart data.
I like to say, “Give us all your data and we’ll filter and transform it so you’re only getting the relevant data.”
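To make the idea concrete, here is a minimal sketch of that collect-filter-enrich-transform flow. The event fields, the subscriber lookup, and the output format are all hypothetical -- this is not Maestro’s actual API -- but it illustrates how a raw event stream gets cut down and reshaped before it ever reaches an analytics engine.

```python
# Hypothetical sketch of a filter-and-transform stage in front of an
# analytics engine. Field names, event types, and the output format are
# illustrative only, not any vendor's actual interface.
import json
from typing import Iterable, Iterator

def filter_events(raw_events: Iterable[dict], wanted_types: set[str]) -> Iterator[dict]:
    """Drop events the downstream analytics engine does not care about."""
    for event in raw_events:
        if event.get("type") in wanted_types:
            yield event

def enrich(event: dict, subscriber_lookup: dict[str, str]) -> dict:
    """Attach reference data (here, a subscriber id keyed by source IP)."""
    event["subscriber_id"] = subscriber_lookup.get(event.get("src_ip", ""), "unknown")
    return event

def to_analytics_format(event: dict) -> str:
    """Serialize into the (hypothetical) format the analytics engine expects."""
    return json.dumps({
        "ts": event["timestamp"],
        "sub": event["subscriber_id"],
        "bytes": event.get("bytes", 0),
    })

if __name__ == "__main__":
    # A raw feed might carry many event types; only "flow_end" records are
    # relevant here, so the rest are filtered out before enrichment.
    raw = [
        {"type": "flow_start", "timestamp": 1, "src_ip": "10.0.0.1"},
        {"type": "flow_end", "timestamp": 2, "src_ip": "10.0.0.1", "bytes": 4096},
    ]
    lookup = {"10.0.0.1": "SUB-001"}
    for e in filter_events(raw, {"flow_end"}):
        print(to_analytics_format(enrich(e, lookup)))
```

In a real deployment the filtering rules, enrichment sources, and output formats would be configured per client rather than hard-coded, and the stream would be consumed continuously from the network rather than from a list.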
And the secret to doing that is the engine we’ve developed, which can process a million events a second on a single quad-core CPU. Combined with Maestro’s distributed data collection capabilities, the result is very high throughput at low cost. For one Carrier Grade Network Address Translation (CGNAT) application, Cirries demonstrated 1 million events a second, while the competitive alternatives ranged from 15,000 to 250,000 events per second.