Cisco ASR 5700 Troubleshooting Guide

Contents
Introduction
Problem
Solution
Related Cisco Support Community Discussions
Introduction
This article describes the apparent false trigger of the ThreshDNSLookupFailure trap when a
Service Redundancy Protocol (SRP) connection bounce occurs on an SRP standby node.
Infrastructure Domain Name Service (DNS) is used on various nodes in the Long Term Evolution
(LTE) network indirectly as part of the call setup process. On a Packet Data Network Gateway
(PGW) it can be used to resolve any Fully Qualified Domain Names (FQDNs) returned in S6b
authentication, as well as to resolve FQDNs specified as peers in the various Diameter endpoint
configurations. If DNS timeouts (failures) occur on an active node that is processing calls, call
setup can be negatively affected, depending on which components rely on DNS functioning properly.
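As a rough illustration of which configured peers depend on infrastructure DNS, the following Python sketch separates peers given as IP literals from peers given as FQDNs. The peer names and addresses are hypothetical and are not taken from any StarOS configuration; only hostnames would need an A or AAAA lookup.

```python
import ipaddress


def needs_dns(peer_address: str) -> bool:
    """Return True if the peer is a hostname (FQDN) rather than an IP literal."""
    try:
        ipaddress.ip_address(peer_address)  # accepts IPv4 and IPv6 literals
        return False
    except ValueError:
        return True


# Hypothetical Diameter peers, for illustration only.
peers = {
    "rf-peer-1": "10.0.0.5",
    "rf-peer-2": "2001:db8::10",
    "rf-peer-3": "aaa.example.com",
}

fqdn_peers = sorted(name for name, addr in peers.items() if needs_dns(addr))
print(fqdn_peers)
```

Only the peers in `fqdn_peers` would contribute to DNS lookup statistics in this simplified model.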
Problem
Starting in StarOS v15, there is a configurable threshold that measures the infrastructure DNS
failure rate. When the PGW is implemented with Inter-Chassis Session Recovery (ICSR), the
associated DNS alarm/trap is likely to trigger if the SRP connection between the two nodes goes
down for any reason and the Standby node goes into pending Active state (not fully active,
because the other node remains fully SRP active, assuming no other issues). This is because, in
pending Active state, the node attempts to establish the Diameter connections for the various
Diameter interfaces in the ingress context in preparation for potentially becoming fully SRP
active. If the configuration for ANY of the Diameter connections specifies peers in the endpoint
configuration as FQDNs instead of IP addresses, then those peers need to be resolved via DNS
with A (IPv4) or AAAA (IPv6) queries. Since the node is in pending Active state, ALL such queries
fail, because the responses to the requests are routed to the active node (which drops them).
The result is a 100% failure rate, which in turn triggers the alarm/trap. While this is expected
behavior in this scenario, the potential result is a customer ticket opened to ask about the
significance of the alarm.
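The arithmetic behind the alarm can be sketched in Python as follows. This is a simplified model, not the StarOS implementation: the function names, the "greater than or equal" comparison, the query count, and the 5% threshold value are all assumptions made for illustration.

```python
def dns_failure_rate(attempts: int, failures: int) -> float:
    """Percentage of DNS lookups that failed; 0.0 when nothing was attempted."""
    if attempts == 0:
        return 0.0
    return 100.0 * failures / attempts


def threshold_crossed(attempts: int, failures: int, threshold_pct: float) -> bool:
    """Assumed semantics: alarm when the failure rate reaches the threshold."""
    return dns_failure_rate(attempts, failures) >= threshold_pct


# Pending Active node: every A/AAAA query fails because the responses
# are routed to the active node. The counts below are hypothetical.
attempts, failures = 12, 12
rate = dns_failure_rate(attempts, failures)
alarm = threshold_crossed(attempts, failures, threshold_pct=5.0)
print(rate, alarm)
```

Because every query fails in this state, the computed rate is 100% whatever the number of attempts, so any configured threshold below 100% would be crossed in this model.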
Here is an example of such an alarm, where Diameter Rf is configured with FQDNs and therefore
requires DNS resolution. Shown is an FQDN that needs to be resolved by DNS.
The SRP connection goes down for 7+ minutes (for a reason external to the pair of PGW nodes
and not important for the purposes of this example), and the SNMP trap
ThreshDNSLookupFailure triggers.
Tue Nov 25 08:43:42 2014 Internal trap notification 1037 (SRPConnDown) vpn SRP ipaddr 10.211.220.100 rtmod 3
Tue Nov 25 08:43:42 2014 Internal trap notification 120 (SRPActive) vpn SRP ipaddr 10.211.208.165 rtmod 3
Tue Nov 25 08:51:14 2014 Internal trap notification 1038 (SRPConnUp) vpn SRP ipaddr 10.211.220.100 rtmod 3
Tue Nov 25 08:51:14 2014 Internal trap notification 121 (SRPStandby) vpn SRP ipaddr 10.211.208.165 rtmod 9
Tue Nov 25 09:00:08 2014 Internal trap notification 480