ICKP: A consistent checkpointer for multicomputers. There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations exist for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
References in zbMATH (referenced in 5 articles)
- Agnetis, Alessandro; Detti, Paolo; Martineau, Patrick: Scheduling nonpreemptive jobs on parallel machines subject to exponential unrecoverable interruptions (2017)
- Agullo, E.; Giraud, L.; Salas, P.; Zounon, M.: Interpolation-restart strategies for resilient eigensolvers (2016)
- Agullo, Emmanuel; Giraud, Luc; Guermouche, Abdou; Roman, Jean; Zounon, Mawussi: Numerical recovery strategies for parallel resilient Krylov linear solvers (2016)
- Cores, Iván; Rodríguez, Gabriel; Martín, María J.; González, Patricia; Osorio, Roberto R.: Improving scalability of application-level checkpoint-recovery by reducing checkpoint sizes (2013)
- Plank, James S.; Li, Kai: ickp: A Consistent Checkpointer for Multicomputers (1994)