What Happens When the Cloud Fails?

Customers of a major message hygiene vendor, including many law firms of all sizes, experienced an unexpected service outage recently. Affected firms were unable to send or receive email from the internet. The outage had attorneys and staff looking to their IT departments for guidance.

The vendor did not handle this outage well. It gave clients almost no meaningful information with which to make informed decisions. This left IT departments in a bad position. With no idea what was happening or when the problem would be fixed, firms found it very difficult to decide which corrective actions to take and when to execute them. So what could firms do?

Communicate!

This is the single biggest thing you can do to help the users of your firm. If an event like this happens, tell them what is happening and what it means to them. Attorneys and staff don’t need the technical details of the outage, but they do need to understand what isn’t working and what, if anything, they can do to work around the issues and do their jobs. They also need to be kept informed throughout the outage with updates as appropriate.

It’s also worth reminding users what they should not be doing. It is likely that many attorneys used their personal email accounts on public email networks (Gmail, iCloud, Outlook.com) during the outage to carry on firm business. While it seems like a good idea at the time, this behavior can expose the firm to risk that is easily avoidable.

Also, make sure the message will actually get to them – don’t be the IT Director who sends out an email to tell the firm that email is down.

Have a fallback plan ahead of time.

In a crisis, events tend to move either very fast or very slowly, and never in your favor. Services that depend on one another fall like dominoes, and remediation always takes longer than expected. It is important to have a plan in place before an outage like this. Think of it like an emergency drill for fire, earthquake, or tornado. Having a plan to handle the unexpected allows your team to move more quickly and accurately through an outage. Because many of the decisions have been made ahead of time under less intense circumstances, the IT department will understand what it needs to do when an outage happens.

In addition to providing technical guidance during an outage like this, building a plan gives the firm’s management the opportunity to weigh in on which services are most important to the firm. Having advance direction on how IT will react to service failures allows the department to act confidently without being second-guessed during or after an outage.

While it would be ideal to have a plan for every service the IT department offers, a significant portion of the risk can be mitigated by identifying the top services IT provides and starting there. Email, phones, and document management are critical to the operation of any law firm and are a good place for any firm to begin a service outage plan.

Know when to fail.

Almost as important as having a plan is knowing how and when to execute that plan. Unfortunately, not all disaster recovery solutions and scenarios are created equal, and in some cases the fail-back to your standard service model is a painful one. Understanding the implications of pulling the disaster recovery ripcord is essential, as is communicating the level of service that will be available to end users. When the conditions that forced the emergency no longer exist and you are ready to move your services back to the cloud, do so carefully.

Additionally, be sure that your contingency solution has been tested and is maintained like any other critical service. This should include regular tests to ensure both that your fallback plan is viable and that your team knows how to execute it properly. Finding the budget to implement a proper disaster recovery solution can be difficult. If your firm has allocated funds for service resiliency, it will expect the solution to work. Having to explain that the plan was not ready to pick up the slack or that your team was unable to execute it is a quick way to find yourself executing your own career fallback plan.

Do I need additional protection?

A few of our clients came through the outage without any negative effects because they had purchased an on-premises email hygiene appliance that they use in addition to the affected cloud service. For these firms, the added cost of purchasing and maintaining this device proved to be a very useful insurance policy that day.

For services that are truly mission critical to your firm, this may prompt the question, “Am I doing enough to protect this service?” There are fair arguments for and against adding protection, but simply asking the question allows you to respond to the vocal minority after an outage. If you chose not to add protection, you can say that yes, you did consider other options and decided that the risk did not justify the cost, and that the IT team did everything it could to minimize the impact of the outage and restore service as quickly as it could.

We all depend on a variety of technologies that integrate with each other. Despite our collective best efforts, these services will occasionally fail. It is part of our job in the IT community to be prepared for these failures and to help our users through them.