Cisco Process Orchestrator 3.0 User Guide
OL-30196-01
Chapter 5 Managing High Availability and Resiliency
Handling Restarts and Failures
Handling Events During Restarts and Failures
Process Orchestrator adapters resume sending events after the Process Orchestrator server restarts.
While the adapter is still responsible for the implementation in this area, existing Process Orchestrator
adapter implementations attempt to be consistent with this guaranteed delivery design. For example,
when the SAP adapter reads CCMS alerts or the Remedy adapter polls for Incident state, the adapter
stores the last-read record so that it can resume reading records starting with the next entry when the
Process Orchestrator server resumes. For network-initiated event technologies that do not guarantee
delivery, such as SNMP traps, the Process Orchestrator server cannot know about events that occurred
while it was down. If required, many of these technologies can use highly available intermediaries to
persist the transient events. For example, there are tools that listen for SNMP traps and write them to
persistent stores such as log files or Windows events.
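The last-read-record approach described above can be sketched roughly as follows. This is a minimal illustration only, not the SAP or Remedy adapter implementation: the CheckpointedReader class, the JSON checkpoint file, and the integer record IDs are all assumptions made for the example.

```python
import json
import os
import tempfile


class CheckpointedReader:
    """Hypothetical poller that remembers the last record it delivered."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path

    def load_last_read(self):
        # Return the ID of the last delivered record, or 0 if none was saved.
        try:
            with open(self.checkpoint_path) as f:
                return json.load(f)["last_read"]
        except FileNotFoundError:
            return 0

    def save_last_read(self, record_id):
        # Write to a temporary file, then rename it over the checkpoint, so a
        # crash cannot leave a half-written checkpoint behind.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"last_read": record_id}, f)
        os.replace(tmp, self.checkpoint_path)

    def poll(self, source, deliver):
        # Deliver every record newer than the checkpoint, oldest first.
        last = self.load_last_read()
        for record_id, payload in sorted(source.items()):
            if record_id > last:
                deliver(record_id, payload)
                self.save_last_read(record_id)


def demo():
    # 'source' stands in for an external system such as an alert queue.
    source = {1: "alert A", 2: "alert B", 3: "alert C"}
    delivered = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        CheckpointedReader(path).poll(source, lambda i, p: delivered.append(i))
        # Records that arrive while nothing is polling (server down).
        source[4] = "alert D"
        source[5] = "alert E"
        # A new reader instance stands in for the adapter after the restart.
        CheckpointedReader(path).poll(source, lambda i, p: delivered.append(i))
    return delivered
```

Because the sketch saves the checkpoint only after a record is delivered, a crash between the two steps would deliver that record again on resume. In other words, it favors duplicates over gaps.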
There are two types of event systems in the Process Orchestrator server:
• Event-based triggers. Because of the adapter implementations (see ), event-based triggers are not
lost across server restarts or failures. Like state management in other areas of the product, trigger
submissions are transactional.
When Process Orchestrator adapters send a trigger to the Process Orchestrator server, processes
depending on that trigger are initiated as a part of the submission. As with any other processes, these
process instances are persisted to the database so that after a restart, the triggered processes are
running. With the exception of transient, non-persisted events, such as SNMP traps or performance
thresholds, there should be no time gap in which events are lost and triggers fail to launch.
• Correlation. The Correlation feature allows a server to tie together a related series of events. This is
achieved through a caching mechanism to ensure that the event data is available to the Correlate
activities in processes when needed. This cache of events is not retained across server restarts, which
can cause issues with some Correlate activities. Correlate activities can be used to make a process
wait for a certain condition, or branch based on how many events with particular properties have
been received in a particular time frame. Because the cache is not persisted across server restarts,
events received before the restart will not be matched by Correlate activities. This can reduce the
accuracy of a process “decision” that is based on the data from a Correlate activity.
In practice, this has not been found to be an issue, for the following reasons:
– First, correlation is a fairly advanced feature in the product and is rarely used in most
scenarios.
– Second, best practices also dictate that a correlation time frame is configured to be fairly small,
which will minimize both the likelihood and impact of this compound condition. The time
required to restart a server will take up some or all of this time frame if the correlation time
frame coincides with a server restart.
– Third, these correlations are typically used in diagnostic situations, and often the events repeat
if the problem recurs. Where a problem is still occurring, Process Orchestrator will typically
pick up the condition the next time the diagnostic process launches.
– Finally, Process Orchestrator automation packs tend to set these processes as non-persistent
anyway, so they are not restarted. If the Process Orchestrator server has been down, it is usually
best to get a fresh view of the health of the component being monitored rather than depend on the
status from before the Process Orchestrator server restart. In many cases, the failure has been
resolved, and recording the old diagnosis just creates noise.
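The effect of a non-persisted correlation cache on a Correlate-style count can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the product's implementation: the CorrelationCache class, the integer timestamps, and the event property names are hypothetical.

```python
class CorrelationCache:
    """Hypothetical in-memory event cache; nothing here is written to disk."""

    def __init__(self):
        self.events = []  # list of (timestamp, properties) tuples

    def add(self, timestamp, properties):
        self.events.append((timestamp, properties))

    def count_matching(self, now, window, **wanted):
        # Count cached events inside [now - window, now] whose properties
        # include every wanted key/value pair.
        return sum(
            1
            for ts, props in self.events
            if now - window <= ts <= now
            and all(props.get(k) == v for k, v in wanted.items())
        )


def demo():
    cache = CorrelationCache()
    cache.add(100, {"host": "db01", "type": "disk_full"})
    cache.add(130, {"host": "db01", "type": "disk_full"})
    before = cache.count_matching(now=150, window=60, host="db01", type="disk_full")

    # Simulated server restart: the cache is rebuilt empty, so the events
    # received at 100 and 130 are no longer available for matching.
    cache = CorrelationCache()
    cache.add(170, {"host": "db01", "type": "disk_full"})
    after = cache.count_matching(now=180, window=60, host="db01", type="disk_full")
    return before, after
```

In this model, a process that branches on "two or more matching events in the last 60 time units" would take the branch before the restart but not after it, even though the event at 130 still falls inside the later window. A short window limits how long that discrepancy can last.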