UPDATE: The deadline has been extended to 30 August.
The International Workshop on Fault Tolerant Architectures for Reliable Distributed Infrastructures and Services (TARDIS2011) will be held in conjunction with the 4th IEEE International Conference on Utility and Cloud Computing (UCC 2011).
Twitter hashtag: #TARDIS2011
HPC in the Cloud: "Toward a Fault-Tolerant Cloud"
DSA-Research.org Blog: "Will you let the Sky fall down?"
“Not letting the Sky fall down” [1,2,3]
Cloud computing has shifted the center of gravity of distributed application execution by exploiting virtualization at several layers, which adds a level of complexity to the scheduling problem. While Cloud computing brings more flexibility to application design, it also raises new research challenges. Compared with the traditional approach of dedicating one server to a single application, consolidation through virtualization can boost resource utilization by aggregating workloads from separate machines onto a small number of servers. Workloads can now be executed in a dense environment using far fewer machines, and in such an environment the impact of faults can be vastly magnified. For example, a single hardware failure affects every virtual server hosted on that physical machine, and under dynamic workloads it may be difficult to distinguish real faults from normal system behavior.
Revisiting fault-tolerance concepts is essential when provisioning is left to public Cloud infrastructures, where a given budget must be respected. Different strategies can be tailored to this setting, ranging from hybrid architectures to the distribution of services across several cloud providers. In addition, cloud providers typically establish Service Level Agreements (SLAs) with their customers, and must enforce the agreed Quality of Service (QoS) within their infrastructures despite operating in an unreliable and highly dynamic environment.
Cloud computing plays an increasingly important role in distributed computing and involves a wide community. The Cloud provides a scalable computational model in which users access services according to their requirements, without regard to where the services are hosted or how they are delivered: processing power, storage, network bandwidth and software can all be provided as services over the Internet. As a consequence, applications developed on such on-demand infrastructures can be built on more flexible principles, making them more fault tolerant, more resilient and more dynamic. Although fault tolerance in distributed systems has long been a research topic, producing a wide collection of algorithms for fault detection, identification and correction, these concepts must be revisited in the context of Cloud computing.
Papers on all aspects of fault tolerance and reliability in private, public and hybrid Clouds are welcome.