What is Network Resilience?
Network resilience is the practice of preparing for an unpredictable future in ways that safeguard a network from impact.
Network resilience is broader than network redundancy or network survivability, which are just pieces of a resilient network strategy. A resilient network should be able to respond to anything that comes along.
Sometimes the things that happen are anticipated. These are called knowns. For example, a planned maintenance period impacts your availability during hours of operation.
Other times network engineers experience known unknowns. A known unknown is something you are aware of not knowing, like exactly when an aging router will finally fail.
The most challenging disruptions can be unknown unknowns. These are things that we don’t know to anticipate, like a hurricane or a space invasion.
The idea behind network resilience is that putting the proper precautions in place allows for a quick bounce back from attacks, outages, or other network events.
Resilience is the capacity to recover quickly from difficult situations.
Network resilience strategies are not unlike personal resilience strategies. One example of resilience in action is doomsday preppers, who store cans, generators, batteries, water, firewood, and gold in case the apocalypse happens. They often stash things in different places, just in case. If the world ends, they will be better prepared to live 30 days longer than the rest of us.
Another example of resilience in action is the Italian grandmother I see at Little League games. She carries band-aids, snacks, nail clippers, a pocketknife, and holy water in her oversized purse and wires a spare key under her car, just in case. Whatever happens between a hangnail and Satan, she’s prepared. These two examples illustrate redundancy and failure isolation concepts, which we’ll discuss below.
How resilience works with your network architecture is similar.
You want to have all your supplies prepared and strategically located in case anything happens. Resilient networks strive to continue delivering service, despite disruptions.
Characteristics of a Resilient Network
We’ve done the research and come up with the four most common behaviors of organizations with resilient networks. You may notice a hyper-vigilant trend…
Periodic status audits of all network components.
- Turn on automatic updates to make sure your software and device firmware are running the latest patches
- Check your circuit health by running ping tests for latency and packet loss, plus a speed test for upload and download throughput
- Look into how close your routers and firewalls are to end of life and if support expires at that point
- Keep your Ethernet cables updated. The standard is still Cat5e even though there are Cat6 and Cat6a. If the cable says Cat5 or a lower number on its side, it’s time to replace the cable
- Check your data center cross-connects – they are just cords and cables and are also subject to aging
- Create reports on network traffic patterns over time to understand baseline network health and indicators of disruptions for predictive interventions
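The baseline report in the last item can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring product: the throughput numbers are made up, and the three-standard-deviation threshold is an assumed rule of thumb that you would tune to your own traffic.

```python
from statistics import mean, stdev

def flag_anomalies(history, recent, threshold=3.0):
    """Return recent samples that sit more than `threshold` standard
    deviations away from the mean of the historical samples."""
    avg = mean(history)
    spread = stdev(history)  # sample standard deviation of the baseline
    if spread == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [s for s in recent if s != avg]
    return [s for s in recent if abs(s - avg) / spread > threshold]

# Hypothetical hourly throughput readings in Mbps (illustrative values only).
history = [100, 104, 98, 101, 97, 103, 99, 102]
recent = [101, 96, 180]

print(flag_anomalies(history, recent))  # [180]
```

The longer the history you feed in, the more trustworthy the baseline becomes, which is why the reports need to run over time rather than once.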
Redundancy and just-in-case stash
- You can build redundancy by using diverse circuits from different service providers that take unique paths. Have an engineer map out the route on a Visio diagram to confirm you aren’t purchasing a white-labeled product taking the same route.
- Store extra routers and switches to avoid delivery delays if something in the network fails
- Overprovision your circuits with additional bandwidth in case of DDoS attacks or other unknown unknowns.
- If you have an in-house server or data center, make sure the needed generators and battery backup units are available to kick in during a power failure
- Schedule daily backups to the cloud (or better yet, two different clouds) that run after regular business hours
- Distribute business-critical applications over multiple geographically diverse data centers so operations can continue during any data center maintenance period
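Once an engineer has mapped each circuit's route, checking for hidden overlap is a simple comparison. The sketch below assumes you have written each path down as a list of segment names; the names used here (conduits, points of presence) are hypothetical placeholders, not real provider data.

```python
def shared_segments(path_a, path_b):
    """Return the segments that appear in both circuit paths,
    in the order they occur along path_a."""
    in_b = set(path_b)
    return [segment for segment in path_a if segment in in_b]

# Hypothetical route maps for two circuits bought from different providers.
primary = ["office-demarc-1", "conduit-12", "pop-east", "carrier-a-core"]
backup = ["office-demarc-2", "conduit-12", "pop-west", "carrier-b-core"]

overlap = shared_segments(primary, backup)
print(overlap)  # ['conduit-12']
```

Any segment in the result is a single point of failure: in this made-up case, one backhoe through conduit-12 would take out both circuits at once.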
Failure Isolation
- Decoupling network elements keeps network outages localized to specific areas instead of spreading to other network elements.
- Implement network design principles and frameworks that lead to greater resilience. Maintain separation between critical elements and design in clusters or modules. Follow a decentralized or distributed network model rather than the traditional hub and spoke central architecture.
- Configure dynamic routing protocols (such as OSPF or BGP) on your TCP/IP network so traffic automatically reroutes around failed links or routers.
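To see why rerouting depends on having more than one path, here is a toy model in Python. It is not how any real routing protocol is implemented; it just runs a breadth-first search for the fewest-hop path over a made-up five-node topology and shows what happens when one link is marked as failed.

```python
from collections import deque

def shortest_path(links, src, dst, failed=()):
    """Fewest-hop path from src to dst over undirected links,
    ignoring any link listed in `failed`. Returns None if unreachable."""
    down = {frozenset(link) for link in failed}
    neighbors = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue  # skip links that are out of service
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical topology: two routes from site A to site D.
links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]

print(shortest_path(links, "A", "D"))                       # ['A', 'B', 'D']
print(shortest_path(links, "A", "D", failed=[("B", "D")]))  # ['A', 'C', 'E', 'D']
```

If the A-C link did not exist, the second call would return None. No protocol setting can route around a failure when the design offers no alternate path, which is why failure isolation and redundancy go hand in hand.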
Crisis Communication Plan
- Distribute a uniform internal message with guidelines for communicating with customers in the event you go offline. These prepared statements can help with damage control.
- A resilient network will have downtime procedures and specific points of contact assigned specific roles if an incident occurs. Practice these responses annually, like a fire drill, and work out any kinks.
- Operating manual policies and any critical organization information should be available offline in hardcopy formats for reference.
- Use communication alternatives like employee-only LinkedIn, Discord, or Twitter groups to keep an open line of communication.
- Set up an emergency mass notification system (EMNS) that can be triggered to alert clients and employees, so the phone lines don't flood while you work to remediate the situation.
- Create spreadsheets to manually track items normally entered into applications, like hours worked, calls made, sales closed, or out-of-office schedules, while those applications are unavailable.
About LiveAction
At LiveAction, we understand that anything can happen at any time. With the broadest telemetry on the market, we’ve got our eyes open to any network anomalies or unusual IP behavior. Our live topographical map allows for global traffic monitoring, quick QoS tweaking, and rerouting to minimize the impact of network events.
Learn more about how our network and application monitoring and threat detection products can help you prepare for the inevitable “something” and come through it a resilient champion. Talk to us today.