The Current State of Distributed Computing

Distributed computing has long been a goal in both industry and academia. With today's cloud power, one would assume that distributed computing has become a lot easier. Yet, apart from the fact that a distributed program can run faster once it runs correctly, nothing has become easier about writing a program that actually works in a distributed way as intended. Simultaneously leveraging compute power and datasets across different locations, hardware, platforms, languages, data sources and formats is definitely a non-trivial task today, almost as much as it was yesterday.

The “why” of distributed computing is easy to understand, but even with the help of popular cloud platforms and industry money, academic researchers and industry developers still have no consensus on “how” it should be done. Huge room for creativity remains in this area, and numerous tools have mushroomed, especially within the open source community, but each is a highly customized solution, so there is very little consistency or reusability across them. Sustainability and manageability are nightmares for everyone too.

For example, using the spare compute power of many idle machines for large computational tasks has been a very attractive idea in academia for decades. The University of Wisconsin-Madison has been developing an open-source distributed computing software solution called HTCondor. Yet allocating compute tasks to heterogeneous hardware, retrieving data and files in various paths and formats, handling multiple platforms, and managing shared compute state are still huge challenges that involve custom coding. On the other hand, some startups have the right idea in focusing on new abstraction architectures that separate data from the mechanics of handling it, and on designing higher-level coding definitions, but all are in their infancy right now.
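To give a feel for the kind of custom coding mentioned above, here is a minimal sketch in Python of hand-written task placement across heterogeneous machines. It is only an illustration under stated assumptions: the Machine and Task classes, the assign function, and the machine and task names are all hypothetical, and this is not how HTCondor or any other real scheduler is implemented.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical description of an idle machine.
    name: str
    platform: str      # e.g. "linux" or "windows"
    free_cores: int


@dataclass
class Task:
    # Hypothetical description of a unit of work.
    name: str
    platform: str      # platform the task's binary was built for
    cores: int


def assign(tasks, machines):
    """Greedily place each task on the first machine that fits it.

    Returns (placements, unplaced): a dict mapping task name to machine
    name, and a list of task names that no machine could take.
    """
    placements = {}
    unplaced = []
    for task in tasks:
        for machine in machines:
            fits = (machine.platform == task.platform
                    and machine.free_cores >= task.cores)
            if fits:
                # Shared state that has to be tracked by hand.
                machine.free_cores -= task.cores
                placements[task.name] = machine.name
                break
        else:
            # No machine matched this task's platform and core count.
            unplaced.append(task.name)
    return placements, unplaced


machines = [
    Machine("lab-pc-1", "linux", 4),
    Machine("office-pc-7", "windows", 2),
]
tasks = [
    Task("simulate", "linux", 3),
    Task("render", "windows", 2),
    Task("analyze", "linux", 2),
]
placements, unplaced = assign(tasks, machines)
print(placements)
print(unplaced)
```

Even this toy version has to deal with platform mismatches, capacity bookkeeping and tasks that cannot be placed at all, and it says nothing yet about moving data, handling failures or sharing state across a network.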

It appears that this is definitely a great time and space for some deep-pocketed industry players to step up and come up with more user-friendly and productive software solutions, as TriStrategist called out before [See our May 2014 blog on “the Internet of Things”]. What is needed is a straightforward hybrid-cloud-based software architecture and an easy-to-use high-level programming language, with sound abstraction of the tiers of data passing, protocols, pipelines, control mechanisms, etc., and a set of platform-neutral, configurable (preferably UI-based) data plumbing tools that are more approachable than the purely developer-driven open-source packages on the market.

Eventually, truly realizing distributed computing scenarios will require machine and code intelligence.