Understanding A State-Sensitive Recovery; Understanding A Fault-Resilient Boot - NEC Express5800/320Ma User Manual

Virtual technician module

Understanding a State-Sensitive Recovery

If the Express5800/320Ma system stops responding, the system controller
automatically starts a state-sensitive recovery (SSR). An SSR tries to isolate a CPU to
preserve the state of the system and to create dump information that can be used to
diagnose the problem.
In an SSR, the system controller performs three procedures that move through
progressively more-severe recovery levels. After each procedure, the system controller
waits to receive a heartbeat, or message, from the host system before trying a more
invasive procedure.
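The system controller logs the system failure in the system event log, and then tries
each of the following, in order, until the system is restored to normal operation:
1. Initiates a non-maskable interrupt (NMI) to try to avoid rebooting the system, to
save the contents of system memory to a dump file, and to restart the operating
system.
2. Issues a hard reset, which changes the system state to "crashed," and isolates the
CPU whose state it wants to preserve. The system controller then tries to reboot
the system, while keeping the specified CPU isolated and in a broken state.
3. Performs a full fault-resilient boot (FRB), which takes both CPUs offline and tries to
boot the system. Dump information is lost during this process.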
For information about how to retrieve dump files, see the Express5800/320Ma: System
Administrator's Guide or the online Help for ftServer Management Console (ftSMC).
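The escalation described above (log the failure, run a procedure, wait for a heartbeat, and escalate only if none arrives) can be sketched as a simple loop. This is an illustrative model only, not the system controller's firmware: the function names, the level labels, and the simulated host are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical labels for the three SSR levels, least to most invasive.
SSR_LEVELS = ["nmi", "hard_reset", "full_frb"]


def run_ssr(
    procedures: List[Tuple[str, Callable[[], None]]],
    heartbeat_received: Callable[[], bool],
    event_log: List[str],
) -> Optional[str]:
    """Try each recovery procedure in order; stop at the first heartbeat."""
    event_log.append("system failure")  # logged before any recovery attempt
    for name, action in procedures:
        action()
        event_log.append(f"tried {name}")
        if heartbeat_received():
            return name  # host is responding again; do not escalate
    return None  # no procedure restored the system


def simulate_ssr(recovers_at: Optional[str]) -> Tuple[Optional[str], List[str]]:
    """Simulate a host that starts responding after the named level (or never)."""
    state = {"alive": False}
    log: List[str] = []

    def make_action(name: str) -> Callable[[], None]:
        def action() -> None:
            if name == recovers_at:
                state["alive"] = True
        return action

    procedures = [(name, make_action(name)) for name in SSR_LEVELS]
    result = run_ssr(procedures, lambda: state["alive"], log)
    return result, log
```

For example, `simulate_ssr("hard_reset")` models a host that ignores the NMI but sends a heartbeat after the hard reset, so the full FRB is never attempted.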
Related Topics

"Understanding a Fault-Resilient Boot"

"When the System Is No Longer Responding"
Understanding a Fault-Resilient Boot
In a fault-resilient boot (FRB), the system controller makes up to six consecutive
attempts to start the Express5800/320Ma system. Each attempt involves restarting
CPU and I/O elements in a fixed order. This boot sequence is used when you start a
system that has been powered off; it is not used when you simply restart the operating
system.
The system controller tries each boot configuration in the list only once, and does not
repeat the boot process if all of the configurations fail. If one of the boot configurations
is successful, the FRB process stops and the system controller stops trying to restart
CPU or I/O elements. The operating system is responsible for bringing any additional
hardware into operation, including, if possible, the other CPU and I/O elements.
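The FRB behavior described above (each boot configuration tried once, in a fixed order, stopping at the first success) can be sketched as follows. This is a minimal illustration under stated assumptions: the configuration names and the `try_boot` callback are hypothetical, and the limit of six attempts is taken from the manual's description of an FRB.

```python
from typing import Callable, Optional, Sequence

MAX_FRB_ATTEMPTS = 6  # the manual describes up to six consecutive attempts


def fault_resilient_boot(
    configs: Sequence[str],
    try_boot: Callable[[str], bool],
) -> Optional[str]:
    """Try each boot configuration once, in order; return the first that boots."""
    for config in configs[:MAX_FRB_ATTEMPTS]:
        if try_boot(config):
            # Success: stop here. Bringing up the remaining CPU and I/O
            # elements is left to the operating system.
            return config
    return None  # every configuration failed; the list is not retried


# Hypothetical configuration names, in their fixed order.
BOOT_CONFIGS = [f"config-{i}" for i in range(1, 7)]
```

Calling `fault_resilient_boot(BOOT_CONFIGS, try_boot)` with a callback that succeeds only for `"config-3"` returns `"config-3"` after three attempts and never touches the remaining configurations.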
