Synthesis of Fault-Tolerant Distributed Systems

Rayna Dimitrova and Bernd Finkbeiner

A distributed system is fault-tolerant if it continues to perform correctly even when a subset of the processes becomes faulty. Fault-tolerance is highly desirable but often difficult to implement. In this paper, we investigate fault-tolerant synthesis, i.e., the problem of determining whether a given temporal specification can be implemented as a fault-tolerant distributed system. As in standard distributed synthesis, we assume that the specification of the correct behaviors is given as a temporal formula over the externally visible variables. Additionally, we introduce the fault-tolerance specification, a CTL* formula describing the effects and the duration of faults. If, at some point in time, a process becomes faulty, it becomes part of the external environment and its further behavior is only restricted by the fault-tolerance specification. This allows us to model a large variety of fault types. Our method accounts for the effect of faults on the values communicated by the processes, and, hence, on the information available to the non-faulty processes. We prove that for fully connected system architectures, i.e., for systems where each pair of processes is connected by a communication link, the fault-tolerant synthesis problem from CTL* specifications is 2EXPTIME-complete.

Automated Technology for Verification and Analysis (ATVA 2009).

(pdf) (bib)