The fast data concepts outlined in this article certainly make sense for certain situations, but there seem to be plenty of situations where customers would be content just to get their hands on big data at a less-than-glacial pace. The quantity, lack of structure, and lack of user-friendly analysis tools associated with big data make it difficult and time-consuming for marketers to get at the data they want to analyze in days or weeks, let alone in real time. This will become less of an issue over time as more tools are developed that let users get at big data in a self-serve fashion. For instance, Microsoft is working with Hortonworks, a Yahoo spinoff, to develop connectors that will allow user-friendly Microsoft tools to connect directly to Hadoop data sources for reporting and analysis. That should shorten the time between data collection and the moment the information is available for analysis, and that will certainly make marketers happy.
Fast Data hits the Big Data fast lane
By Andrew Brust | April 16, 2012, 6:00am PDT
Summary: Fast Data, used in large enterprises for highly specialized needs, has become more affordable and available to the mainstream. Just when corporations absolutely need it.
By Tony Baer
- Wall Street firms routinely analyze live market feeds and, in many cases, run sophisticated complex event processing (CEP) programs on event streams, often in real time, to make operational decisions (a toy sketch of the windowed-detection idea follows this list).
- Telcos have handled such data in optimizing network operations, while leading logistics firms have used CEP to optimize their transport networks.
- In-memory databases, used as a faster alternative to disk, have similarly been around for well over a decade, employed for program stock trading, telecommunications equipment, airline scheduling, and large online retail destinations (e.g., Amazon).
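To make the event-stream idea above a little more concrete, here is a minimal, hypothetical sketch of windowed detection over a market feed. It is illustrative only, not how any trading firm or CEP engine actually works; the Tick record, SpikeDetector class, and the 2% threshold are invented for this example.

```python
# Illustrative only: a toy, CEP-style sliding-window check in plain Python.
# Real CEP engines (and real trading systems) are far more sophisticated;
# the Tick record, SpikeDetector class, and 2% threshold are invented here.
from collections import deque
from dataclasses import dataclass


@dataclass
class Tick:
    symbol: str
    price: float
    ts: float  # event time, in seconds


class SpikeDetector:
    """Flags a symbol whose price spread exceeds `pct` within a `window`-second span."""

    def __init__(self, window: float = 5.0, pct: float = 0.02):
        self.window = window
        self.pct = pct
        self.history = {}  # symbol -> deque of recent Ticks

    def on_tick(self, tick: Tick) -> bool:
        ticks = self.history.setdefault(tick.symbol, deque())
        ticks.append(tick)
        # Evict ticks that have aged out of the sliding window.
        while ticks and tick.ts - ticks[0].ts > self.window:
            ticks.popleft()
        lo = min(t.price for t in ticks)
        hi = max(t.price for t in ticks)
        return lo > 0 and (hi - lo) / lo >= self.pct


if __name__ == "__main__":
    detector = SpikeDetector()
    feed = [Tick("ACME", 100.0, 0.0), Tick("ACME", 100.4, 1.2), Tick("ACME", 102.5, 2.8)]
    for t in feed:
        if detector.on_tick(t):
            print(f"spike detected on {t.symbol} at t={t.ts}s")
```

Each incoming tick lands in a per-symbol window, stale ticks are evicted, and the detector fires as soon as the spread inside the window crosses the threshold; a production CEP engine applies the same windowed-pattern idea at far higher throughput and with a much richer pattern language.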
So what’s changed?
The usual factors: the same data explosion that created the urgency for Big Data is also generating demand for making the data instantly actionable. Bandwidth, commodity hardware and, of course, declining memory prices are further forcing the issue: Fast Data is no longer limited to specialized, premium use cases for enterprises with infinite budgets. A sampling of the use cases:
- A homeland security agency monitoring the borders, requiring the ability to parse, decipher, and act on complex occurrences in real time to prevent suspicious people from entering the country
- Capital markets trading firms requiring real-time analytics and sophisticated event processing to conduct algorithmic or high-frequency trades
- Entities managing smart infrastructure, which must digest torrents of sensor data to make real-time decisions that optimize the use of transportation or public utility infrastructure
- B2B and consumer products firms monitoring social networks, which may require real-time responses to understand sudden swings in customer sentiment
But don’t forget, memory’s not the new disk
The movement, or tiering, of data to faster or slower media is also nothing new. What is new is that data in memory may no longer be such a transient thing, and if memory is relied upon for in situ processing of data in motion or rapid processing of data at rest, memory cannot simply be treated as the new disk. Setting aside specialized forms of memory such as ROM, main memory (DRAM) is by nature volatile: there goes your power, and there goes your data. Not surprisingly, in-memory systems such as HANA still replicate to disk to mitigate that volatility. For conventional disk data stores that increasingly leverage memory, Storage Switzerland's George Crump makes the case that caching practices must become smarter to avoid misses (where data gets mistakenly swapped out).

There are also balance-of-system considerations: memory may be fast, but is it well matched with the processor? Solid state may overcome the I/O issues associated with disk, but it can still be vulnerable to coupling issues if processors get bottlenecked or MapReduce jobs are not optimized.
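To ground the point about volatility, here is a minimal, hypothetical sketch of the "memory for speed, disk for durability" pattern: an in-memory key-value store that appends every write to a log on disk so its state can be rebuilt after a power loss. It is not how HANA or any particular product persists data; the DurableKV class, log format, and file path are invented for the example.

```python
# Illustrative only: the general "memory for speed, disk for durability" pattern.
# This is not how HANA (or any specific product) persists data; the class name,
# log format, and file path are invented for the sketch.
import json
import os


class DurableKV:
    """In-memory key-value store that appends every write to a disk log,
    so the RAM-resident copy can be rebuilt after a power loss."""

    def __init__(self, log_path: str = "kv.log"):
        self.log_path = log_path
        self.data = {}   # the fast, volatile copy lives in memory
        self._replay()   # rebuild state from the durable log, if one exists

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                record = json.loads(line)
                self.data[record["k"]] = record["v"]

    def put(self, key, value):
        # Persist before acknowledging the write: memory alone is volatile.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads are served entirely from memory.
        return self.data.get(key)


if __name__ == "__main__":
    store = DurableKV()
    store.put("sensor:42", {"temp": 21.5})
    print(store.get("sensor:42"))  # state survives a restart via _replay()
```

The write path pays the disk penalty once per update so that the read path can stay entirely in memory, which is the basic trade-off the paragraph above describes.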