Essay on Continuity Recovery Principles

Submitted By Weili-Dai

Principles of Continuity

System Continuity Model
• Consider your operation as a “system” of interacting components that produce services:

– An urban system: city operations, agencies, services, etc.
– An enterprise system: businesses, business units, functions, etc.
– A societal system: people, social groups, activities, etc.

• Consider the kind of response models required to address various kinds of unplanned events:
– Proactive or preventative responses
– Reactive and recovery responses
– Combinations of these

• Challenge: embed these responses into your system model

• Within services, systems, personnel, information, processes, etc.
• Understanding of threats & risks endangering your system
• Collaborative preparedness & response
• Self-healing to disruption where possible
• Loss of life and property damage is prevented or minimized

Event Mechanics
• Unexpected events WILL ALWAYS occur
– Fires, floods, storms
– Security breaches
– Insolvency
– Mishaps
• Disruptions are mainly due to human error
– Mistakes & oversights
– Unpreparedness
– Improper response
• Human error is the ultimate root cause of any disruption!

Event Model
[Figure: event timeline running from t-1 to t+N. Labels: Causal Events, Event, Impact, Response Options. Stages along the timeline: event is monitored, captured & analyzed; event is reported; response; event is evaluated; further response if necessary.]
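The stages on the Event Model timeline can be read as a small state machine. The following is a minimal sketch, not the author's model: the class names `Stage` and `Event`, the strictly linear ordering of stages, and the rule that a further response loops back from evaluation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    """Stages named on the Event Model timeline."""
    OCCURRED = auto()
    MONITORED = auto()   # monitored, captured and analyzed
    REPORTED = auto()
    RESPONDED = auto()
    EVALUATED = auto()


# Assumed linear ordering (hypothetical; the slide only lists the stages).
ORDER = [Stage.OCCURRED, Stage.MONITORED, Stage.REPORTED,
         Stage.RESPONDED, Stage.EVALUATED]


@dataclass
class Event:
    name: str
    stage: Stage = Stage.OCCURRED
    history: list = field(default_factory=lambda: [Stage.OCCURRED])

    def advance(self) -> Stage:
        """Move to the next stage; stay at EVALUATED once it is reached."""
        i = ORDER.index(self.stage)
        if i < len(ORDER) - 1:
            self.stage = ORDER[i + 1]
            self.history.append(self.stage)
        return self.stage

    def respond_again(self) -> Stage:
        """Further response, allowed only after the event has been evaluated."""
        if self.stage is not Stage.EVALUATED:
            raise ValueError("further response is only decided after evaluation")
        self.stage = Stage.RESPONDED
        self.history.append(self.stage)
        return self.stage
```

Calling `advance()` four times on a new `Event` walks it from occurrence to evaluation; `respond_again()` followed by `advance()` models the "further response if necessary" loop.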

Event Mechanics
[Figure: event mechanics diagram. Labels: Loss, Detection, Containment, Event(s)/Hidden Effects, Failover, Slowdown/Outage, Recovery/Repair, Contingency, Resumption.]

System Continuity Paradigm
[Figure: system continuity paradigm diagram. Labels: Detection, Event, Classification, Deterrence, Containment, Notification, Response, Remediation, Recovery, Preparedness.]

Continuum of Recovery Activities
[Figure: recovery activities along a time scale following an event. Labels: Failover; Function A Recovery, Function B Recovery, Function C Recovery, Function D Recovery; Operation Recovery; Resumption.]

Faults or Failures
• Faults lead to disruption, individually or collectively
– Can be a single event or condition, or a set of several
• Usually a single cause is not enough to create a disaster:
– It almost always takes multiple failures & mistakes for a disaster to develop
• Usually a combination of unanticipated factors:
– Chains of multiple failures seem more improbable the more complex they get
– As a system gets more complex and tightly coupled, more combinations of faults can lead to failure
– Key: little leaks can foretell bigger problems
• Examples:
– Failure begins when one weak point begins linking with others
– An accident occurring after a precursor incident
• They will ALWAYS happen
– The key is to prevent disruption
– Taking action in early stages breaks the chain of causation
– Barriers can trap a disturbance and keep it from leading to a disaster
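Two of the points above lend themselves to a quick calculation: the number of possible multi-fault combinations grows rapidly with the number of components, and a chain of causation completes only when every barrier fails. The sketch below is illustrative only; the function names are hypothetical, and it assumes any set of two or more components counts as one possible combination.

```python
from math import comb
from typing import Sequence


def multi_fault_combinations(n_components: int) -> int:
    """Count the distinct sets of two or more components that could fail together."""
    return sum(comb(n_components, k) for k in range(2, n_components + 1))


def chain_completes(barriers_hold: Sequence[bool]) -> bool:
    """A chain of causation reaches disaster only if no barrier holds."""
    return not any(barriers_hold)


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, "components:", multi_fault_combinations(n), "multi-fault combinations")
    # One barrier that holds is enough to break the chain.
    print(chain_completes([False, True, False]))
```

With 3 components there are 4 such combinations; with 10 there are 1013, which is why complex, tightly coupled systems are hard to reason about exhaustively.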

Types of Faults
• Single Points of Failure (SPOFs)
• Blind spots
• Trojan effects
– Flawed system/process experiencing unusual circumstances
• Errors or mistakes
– Oversights or neglect
– Tendency to jump to conclusions in crisis situations:
• Developing a theory and sticking to it (right or wrong)
• Misjudgment
• Keeping a narrow focus
– Types:
• Simplex, self-healing, intermittent, rolling
[Figure: error patterns labeled Simplex error, Self-healing error, Intermittent error, Rolling error.]

Single Points of Failure: Serial Path
[Figure: serial path. Labels: Outage, Single point of failure.]


• An isolated element whose failure disrupts service
• “Weakest link”
• Serial paths can appear at any level
– logical
– physical
– processes
– tasks
• If you cannot recover a process while in productive service, add redundancy

Single Points of Failure: Redundant Path
[Figure: redundant path. Labels: Outage, Failover.]


• A redundant solution must:
– Eliminate single point of failure
– Have no single point of failure
– Have an adequate failover process
– Provide equivalent level of service
– Ideally, be diverse
• Beware of false redundancy!
[Figure label: Redundant, diverse parallel path]
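The difference between a serial path, a redundant path, and false redundancy can be made concrete with standard availability arithmetic. This is a rough sketch under a strong assumption that is not in the slides: elements fail independently, and failover is instantaneous and perfect. The function names and the 0.99 figures are illustrative only.

```python
from math import prod


def serial_availability(availabilities):
    """Serial path: every element must work, so availabilities multiply."""
    return prod(availabilities)


def redundant_availability(availabilities):
    """Redundant paths: service survives if at least one independent path works."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    a = 0.99  # assumed availability of one element

    # Two elements in series: each one is a single point of failure.
    print(round(serial_availability([a, a]), 4))

    # Two truly independent parallel paths.
    print(round(redundant_availability([a, a]), 4))

    # False redundancy: both "redundant" paths depend on one shared element,
    # so the shared element is still a single point of failure.
    print(round(serial_availability([a, redundant_availability([a, a])]), 4))
```

Under these assumptions the serial pair is less available than either element alone, the independent pair is far more available, and the falsely redundant arrangement is capped by the shared element no matter how many parallel paths sit behind it.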

Blind Spots
• Problems can happen in any system that has a blind spot
– You can’t see what’s going on or detect a problem

• Show up in various forms, such as
– A system whose behavior is hidden from…