Big Data is one of the most frequently discussed topics in the technology world today, among enterprises and startups alike, yet it’s also one of the most confusing. It’s all about data and sense. So far, throughout the industry, there appears to be more data than sense generated from it.
– Why do we need to worry about Big Data and more data?
– What kind of Big Data are we talking about?
– What sense are we trying to make out of processing large amounts of targeted data in business?
– Are we dealing with a human issue or a machine issue?
These are the initial questions we should ask ourselves before talking about the problems of, and solutions to, Big Data. Lately, each time TriStrategist listened to a talk about Big Data, it was about a different problem space within Big Data and, of course, a different technical approach from a different company. To understand the core issues, here is our simple process for making sense of the subject for ourselves:
What would this “Big Data” contain?
– Structured vs. unstructured data (examples of unstructured data include data from social media, etc.)
– Real time vs. offline (or time-lagged) data
– Dynamic vs. static data
What general kinds of sense can we expect to make from studying Big Data?
– Operational intelligence: Fast real-time analysis and real-time responses (from milliseconds to seconds). For example, high-speed trading, eCommerce, financial transactions, online auctions, online gaming, etc.
– Business intelligence: Data mining, trend analysis with more data and less time (minutes to days or longer)
– Machine learning for human intelligence: Testing the many big, crazy ideas that could not previously be carried out or tested through data within reasonable effort and time. This is about the basics of prediction using Big Data.
– Advanced Artificial Intelligence: New capabilities to simulate various cognitive powers of the human brain with Big Data processing will enable unprecedented development in AI, which in turn will shed new light on robotic advancements.
– New discoveries: Using imagination and originality to look into data and make new sense of past unknowns.
What are the top technical challenges with Big Data that people are trying to solve today?
1. Data Plumbing – Better system architecture and faster algorithms to handle and process the ever-increasing amount of data from all sources, especially unstructured data for which relational DB methods are deemed unsuitable, and turn it into a machine-understandable format that is ready for fast analysis;
2. Processing time – Significantly reduced processing time or faster response time for business operations and intelligence, real-time or offline;
3. Real-time synchronization – Incorporate constant data updates in real-time processing, analysis and response;
4. Analytics – Better analysis design to yield more relevant and accurate business insights and to enable new business possibilities;
5. Data Communication – The accurate, fast, smooth, back and forth interactions and transfers of the data among end-users, devices and systems.
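To make item 3 a little more concrete, here is a minimal sketch in Python of what incorporating constant data updates can look like at the smallest scale: a statistic that is refreshed as each new value arrives, instead of being recomputed over the whole dataset every time. The class name `RunningMean` and its `update` method are our own hypothetical names for illustration only, not part of any product or library mentioned here.

```python
class RunningMean:
    """Keeps an average up to date as values stream in (illustrative only)."""

    def __init__(self):
        self.count = 0    # number of values seen so far
        self.mean = 0.0   # current average of those values

    def update(self, value):
        """Fold one new value into the average without storing past values."""
        self.count += 1
        # Incremental form of the mean: shift the old mean toward the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean
```

Each call to `update` does a constant amount of work, which is the property a real-time system cares about. Real streaming platforms must also deal with ordering, failures and distribution across machines, none of which this sketch attempts.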
Clearly, the field of Big Data is of business, operational, social and academic importance. From a technical point of view, today’s solutions to Big Data vary widely, and the best ones are yet to come. With hardware getting cheaper, many companies are using in-memory processing to compete on time, but that can be a costly approach for gigantic datasets. On the software side, parallel computation, of which Hadoop MapReduce is one example, has resurfaced as a very useful idea (although it differs in concept and approach from the past due to the availability of cloud and cluster computing). Still, currently usable algorithms are sparse, limited in capability and often difficult to use. New approaches need to be seriously investigated. Who knows, they may quite possibly result in some Google-style successful start-ups or business ventures if some brilliance and teamwork can land a few true breakthroughs at the algorithm level, not at the hardware level.
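For readers who have not seen the MapReduce idea mentioned above, here is a minimal single-machine sketch in Python of its three phases (map, shuffle, reduce), using the classic word-count example. This is not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are our own hypothetical names, and a small thread pool merely stands in for the cluster of machines that a real deployment would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(document):
    """Map: turn one input split into a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents, workers=2):
    """Run the three phases; the map step runs on a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
    groups = shuffle_phase(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Calling `word_count(["big data big", "data sense"])` returns `{"big": 2, "data": 2, "sense": 1}`. The point of the design is that each map call and each reduce call is independent of the others, so the work can be spread over as many machines as are available; the hard parts that frameworks handle (moving data between machines, recovering from failures) are left out here.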