Software Component Health Monitoring; System Health Monitoring; Failure And Event Logging - Dell Force10 C150 Configuration Manual

FTOS Configuration Guide, FTOS 8.4.2.7: E-Series TeraScale, C-Series, S-Series (S50/S25)

Software Component Health Monitoring

On each of the line cards and the RPM, there are a number of software components. FTOS performs a
periodic health check on each of these components by querying the status of a flag, which the
corresponding component resets within a specified time.
If any health check on the RPM fails, FTOS fails over to the standby RPM. If any health check on a
line card fails, FTOS resets the card to bring it back to the correct state.
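
The flag-based check described above can be sketched in Python as follows. This is only an illustration, not FTOS code: the class, function, and action names are hypothetical, and the sketch assumes the monitor sets each flag once per period and that a healthy component clears it before the next check.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    location: str       # "rpm" or "linecard" (hypothetical labels)
    flag: bool = False  # set by the monitor, cleared by a healthy component

    def service(self) -> None:
        """A healthy component clears its flag within the check period."""
        self.flag = False


def health_check(components):
    """Run one periodic check and return the recovery actions it triggers.

    A flag that is still set was not cleared since the previous check,
    so the component is treated as failed. Every flag is then set again
    for the next period.
    """
    actions = []
    for c in components:
        if c.flag:
            if c.location == "rpm":
                actions.append(("failover_to_standby_rpm", c.name))
            else:
                actions.append(("reset_line_card", c.name))
        c.flag = True  # re-arm for the next period
    return actions
```

The first call only arms the flags. From the second call onward, any component that has not called `service()` in the meantime produces a recovery action matching its location.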

System Health Monitoring

FTOS also monitors the overall health of the system. Key parameters such as CPU utilization, free memory,
and error counters (CRC failures, packet loss, etc.) are measured; when a parameter exceeds its threshold,
it can be used to initiate a recovery mechanism.
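
As a rough illustration of threshold-based monitoring, the Python sketch below compares one sample of system parameters against configured limits. The parameter names and threshold values are assumptions chosen for the example, not FTOS defaults. Free memory is treated as a lower bound; the other parameters are upper bounds.

```python
# Hypothetical thresholds: (limit, kind).
# "max" trips when the sample is above the limit, "min" when it is below.
THRESHOLDS = {
    "cpu_percent": (90.0, "max"),
    "free_memory_mb": (64.0, "min"),
    "crc_errors": (100, "max"),
}


def breached(sample, thresholds=THRESHOLDS):
    """Return the names of the parameters in `sample` that cross their threshold."""
    out = []
    for name, (limit, kind) in thresholds.items():
        value = sample.get(name)
        if value is None:
            continue  # parameter not measured in this sample
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            out.append(name)
    return out
```

A caller could pass the returned list to whatever recovery mechanism applies; which mechanism is chosen is outside the scope of this sketch.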

Failure and Event Logging

Dell Force10 systems provide multiple options for logging failures and events.
Trace Log
Developers interlace messages with software code to track the execution of a program. These messages
are called trace messages; they are primarily used for debugging and provide lower level information than
event messages, which are primarily used by system administrators. FTOS retains executed trace messages
for hardware and software and stores them in files (logs) on the internal flash.
NV Trace Log—contains line card bootup trace messages that FTOS never overwrites, and is stored in
internal flash under the directory NVTRACE_LOG_DIR.
Trace Log—contains trace messages related to software and hardware events, state, and errors. Trace
Logs are stored in internal flash under the directory TRACE_LOG_DIR.
Crash Log—contains trace messages related to IPC and IRC timeouts and task crashes on line cards,
and is stored under the directory CRASH_LOG_DIR.
For more information on trace logs and configuration options, see:
Chapter 60, C-Series Debugging and Diagnostics
Chapter 61, E-Series TeraScale Debugging and Diagnostics
Core Dumps
A core dump is the contents of the RAM in use by a program at the time of a software exception, and it is
used to identify the cause of the exception. There are two types of core dumps: application and kernel.
