August 5, 2019, Kyoto, Japan

International Workshop on Failure Resilience
in Emerging Parallel Applications and Systems

Call for Papers for the International Workshop on Failure Resilience in Emerging Parallel Applications and Systems

In conjunction with ICPP 2019
(https://www.hpcs.cs.tsukuba.ac.jp/icpp2019/)

Important dates

Deadlines have been extended; the new dates are below.

  • Paper Submission Deadline: 24 May 2019 (originally 10 May 2019)
  • Paper Acceptance Notification: 31 May 2019 (originally 24 May 2019)
  • Camera-ready papers: 7 June 2019

General Description

Parallel platforms have scaled dramatically in recent years, and many systems now have on the order of 10^6 or more processing elements. With this increase in scale, however, fault occurrences have also increased, not only in hardware but also in interconnects and in large communicating SPMD/MPMD programs. Traditionally, ABFT (Algorithm Based Fault Tolerance) and CPFT (Checkpoint/Restart Fault Tolerance) have been the main mechanisms for addressing faults, along with other approaches such as replicated execution or naturally tolerant stochastic algorithms. However, current systems, and more importantly emerging applications, are less suited to these schemes for ensuring that program execution completes in the presence of faults. This workshop will explore the nature of current and new parallel platforms (GPU, multicore, grid, and cloud-based) and of emerging applications (AI, machine learning, neural networks, graph, and big data), and will discuss approaches to enabling failure resilience in these contemporary parallel applications and systems. Topics will range from failure-resilient, highly distributed algorithms that are data intensive to fault mitigation in clouds and GPU/multicore systems, with a particular focus on environments that support both.

The workshop on “Failure Resilience in Emerging Parallel Applications and Systems (FREPAS19)” will bring together (a) participants who study parallel architectures, programming models, and runtime systems in modern parallel platforms and (b) those who develop and support parallel applications in big data, graphs, AI, machine learning, and neural networks. The workshop is intended to be synergistic: each of these constituencies will contribute ideas to the other, with the eventual goal of evolving robust parallel systems in emerging application and platform domains.

Topics of Interest:

  • Failure resilience in Cloud and Fog Environments

  • Fault tolerant GPU and multicore algorithms

  • Robust Big Data and Big Compute

  • Machine learning with fault tolerance

  • Neural networks and learning in unreliable environments

  • Reliability, efficiency, and performance in emerging parallel platforms

  • Modeling and simulation of robustness in parallel systems

  • Self-healing AI Systems

Submission Instructions:

Authors are invited to submit manuscripts reporting original, unpublished research and recent developments in Failure Resilience in Emerging Parallel Systems and Applications. All accepted papers are planned to be published by ACM and included in the ACM Digital Library if presented at the conference.

Manuscripts must be written in English, must not exceed 8 pages (including references), and must follow the ACM sigconf format (https://www.acm.org/publications/proceedings-template). Manuscripts should be submitted electronically via EasyChair.

Papers must be based on unpublished original work and must be submitted to FREPAS only. Papers submitted to the parent ICPP 2019 conference that are not accepted to the main conference may be forwarded to the FREPAS workshop by the main conference program committee. Submission implies that at least one of the authors is willing to register and present the paper.

Deadlines for draft paper submission, notification of acceptance, camera-ready paper submission and registration may be found in the Important Dates section.