A Framework for Adding Real-Time Distributed Software Fault Detection
Dinesh Gambhir
BDD
New York, New York
Michael Post
Bellcore
331 Newman Springs Rd.
Red Bank, New Jersey 07701
Ivan Frisch
Polytechnic University
Brooklyn, New York
Abstract
We consider the problem of fault detection and isolation in systems
that consist of real-time distributed cooperating processes. A framework for
adding fault detection and isolation capabilities to SNMP-based distributed
management systems is presented. The framework is built on a formal
specification model of the cooperating processes, which we refer to as "local directed
graphs". We describe the local directed graph model and a fault monitoring and
isolation architecture that implements the framework. In doing so, we address the
problem of the size of the formal description and show that the
architecture is suited to the management of internetworks. Lastly, we present an
example illustrating the operation of the architecture.
Keywords: Distributed systems management; Software fault isolation; SNMP; Multi-domain network management.
JNSM: Vol. 2, No. 3, 1994
NOTE: only abstract of paper available on-line