Cisco ASR 5500 Troubleshooting Guide

subsystem for processing subscriber sessions and the VPN subsystem that is responsible for IP address
assignment, routing, and so on. Each subsystem has a controller task that oversees the health of the subsystem
it controls. The controller tasks run on the SMC/MIO card. The session manager and AAA manager tasks are
paired together in order to handle a subscriber's session for control, data traffic, and billing purposes. When
session recovery is enabled in the system, each session manager task backs up the state of its subscriber
sessions with a peer AAA manager task so that they can be recovered if the session manager crashes.
What is a crash?
A task in the ASR5x00 can crash if it encounters a fault condition during normal operation. A
crash or software fault in the ASR5x00 is defined as an unexpected exit or termination of a task in the
system. A crash can happen if the software attempts to access prohibited memory areas (for example, through
corrupted data structures), encounters an unexpected condition in the code (such as an invalid state
transition), and so on. A crash can also be triggered if the task becomes unresponsive to the system monitor
task and the monitor attempts to kill and restart the task. A crash event can also be triggered explicitly (as
opposed to unexpectedly) when a CLI command or the system monitor forces a task to dump its current state
so that the task state can be analyzed. An expected crash event can also happen when the
system controller tasks restart themselves in an attempt to correct a situation in which a manager task
fails repeatedly.
Effects of a Session Manager Crash
Under normal operation, a session manager task handles a set of subscriber sessions and associated data traffic
for the sessions along with a peering AAA manager task that handles billing for those subscriber sessions.
When a session manager crashes, it ceases to exist in the system. If session recovery is enabled in the
system, a standby session manager task becomes active on the same PSC/DPC card. This new
session manager task reinstates the subscriber sessions as it communicates with the peer AAA manager task.
The recovery operation takes from 50 msec to a few seconds, depending on factors such as the number of
sessions that were active in the session manager at the time of the crash and the overall CPU load on the card.
Subscriber sessions that were already established in the original session manager are not lost in this operation.
Any subscriber session that was being established at the time of the crash will likely also be
restored as a result of protocol retransmissions. The communicating entities of a network connection can be
assumed to treat any data packets that were in transit through the system at the time of the crash as a
network loss; those packets are retransmitted and the connection is carried on by the new
session manager. Billing information for the sessions carried by the session manager is preserved in the
peer AAA manager.
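The backup-and-restore relationship described above can be illustrated with a minimal sketch. This is a conceptual model only, not StarOS code: the class names `AAAManagerPeer` and `SessionManager`, their methods, and the dictionary-based session state are all hypothetical and chosen for illustration.

```python
class AAAManagerPeer:
    """Hypothetical stand-in for the peer AAA manager that holds backups."""

    def __init__(self):
        self._checkpoint = {}

    def store(self, session_id, state):
        # Keep a copy so the backup survives the loss of the original task.
        self._checkpoint[session_id] = dict(state)

    def snapshot(self):
        # Hand out copies of every backed-up session state.
        return {sid: dict(st) for sid, st in self._checkpoint.items()}


class SessionManager:
    """Hypothetical session manager that backs up each established session."""

    def __init__(self, peer):
        self.peer = peer
        self.sessions = {}

    def establish(self, session_id, state):
        # Record the session locally and back it up with the peer.
        self.sessions[session_id] = state
        self.peer.store(session_id, state)

    @classmethod
    def recover_from(cls, peer):
        # A standby task takes over and reinstates sessions from the peer.
        replacement = cls(peer)
        replacement.sessions = peer.snapshot()
        return replacement
```

In this model, a replacement created with `SessionManager.recover_from(peer)` holds every session that was established before the crash, which mirrors the statement that established sessions are not lost. Sessions still being set up were never backed up, which is why the text relies on protocol retransmissions for those.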
When should the operator get concerned?
When a session manager crash occurs, the recovery procedure happens as described previously and the rest of
the system remains unaffected by this event. A crash in one session manager does not impact the other session
managers. As guidance to the operator, if multiple session manager tasks on the same PSC/DPC card crash
simultaneously or within 10 minutes of each other, sessions might be lost because the system might not be
able to start new session managers fast enough to take the place of the crashed tasks. This corresponds to a
double fault scenario where loss of sessions can occur. When recovery is not feasible, the session manager is
simply restarted and is ready to accept new sessions.
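The 10-minute guidance above can be checked against a list of crash events with a short script. The following is a minimal sketch, assuming crash records have already been collected as `(card_id, timestamp)` pairs. That record shape and the function name `cards_at_risk` are hypothetical and do not correspond to any StarOS log or CLI output format.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Threshold taken from the operator guidance in the text.
WINDOW = timedelta(minutes=10)


def cards_at_risk(crashes, window=WINDOW):
    """Return the cards with two or more session manager crashes
    that occurred within `window` of each other.

    `crashes` is an iterable of (card_id, timestamp) pairs
    (a hypothetical record shape used only for this sketch).
    """
    by_card = defaultdict(list)
    for card, ts in crashes:
        by_card[card].append(ts)

    flagged = set()
    for card, stamps in by_card.items():
        stamps.sort()
        # After sorting, the closest pair in time is always adjacent.
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier <= window:
                flagged.add(card)
                break
    return flagged
```

A card returned by this function matches the double fault scenario described above and is a candidate for closer investigation. A card with crashes spaced further apart is not flagged.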
When a given session manager crashes repeatedly (for example, because it encounters the same fault condition
over and over), the session controller task takes note and restarts itself in an attempt to restore the subsystem. If the
session controller task is unable to stabilize the session subsystem and keeps restarting itself in this
effort, the next step in the escalation is for the system to switch over to a standby SMC/MIO card. In the
unlikely event that there is no standby SMC/MIO card or if a failure is encountered in the switchover