A Framework for Adding Real-Time Distributed Software Fault Detection



Dinesh Gambhir
BDD
New York, New York

Michael Post
Bellcore
331 Newman Springs Rd.
Red Bank, New Jersey 07701

Ivan Frisch
Polytechnic University
Brooklyn, New York


Abstract
We consider the problem of fault detection and isolation in systems that consist of real-time distributed cooperating processes. We present a framework for adding fault detection and isolation capabilities to SNMP-based distributed management systems. The framework is built around a formal specification model of the cooperating processes, which we refer to as "local directed graphs". We describe the local directed graph model and a fault monitoring and isolation architecture that implements the framework. In doing so, we address the problem of the size of our formal description and show that this architecture is well suited to the management of internetworks. Lastly, we present an example illustrating the operation of the architecture.

Keywords: Distributed systems management; Software fault isolation; SNMP; Multi-domain network management.

JNSM: Vol. 2, No. 3, 1994



NOTE: only abstract of paper available on-line
