I came across an interesting blog post over at Finextra which got me thinking about a topic that has been in the back of my mind for a while now: the systemic risks of cloud computing concentration. It seems like everyone has made or is making the move from maintaining big, expensive data centers to letting Amazon, Microsoft, or Google worry about the buildings, infrastructure and hardware. I can’t say I blame them, especially since getting new servers and other hardware has become a much more difficult and time-consuming process now that so many of our supply chains seem to be broken.
But there is a downside as well. When one of the big cloud providers is having a bad day, people notice, because most of the websites and services we depend on rely on at least one of these providers being up and running. And there have been some major outages in the past year. So far, these outages have not had a systemic impact on the financial system. So far.
While the big cloud providers have all sorts of options to make systems within their perimeters fault-tolerant to a degree, we have seen provider-level outages which disrupted large parts of the Internet. To stay up when one of these events happens, organizations need to be thinking about true multi-cloud solutions – and there are some significant hurdles to clear along the way.
The biggest hurdle is the cloud vendors’ tempting managed offerings: managed Kubernetes clusters, databases, serverless services. These are great for standing up new services quickly, but they make multi-cloud operation difficult, if not impossible. Even if another vendor has the same kind of managed database, it is going to be just different enough from your primary vendor’s to make porting your systems over expensive and time-consuming. This is not a bug – it is a feature. Vendors want to lock customers into their products (and who can blame them?).
In the financial world, regulators are taking notice, and institutions and their service providers (as well as cloud providers) need to be thinking about true multi-cloud resilience before the next big outage hits.
If you are at the beginning of your cloud journey and your application is critical, design it to be multi-cloud from day one – this will be way less expensive and complex than trying to address the issue after you have a million customers.
When making architectural decisions, consider the benefits – and the costs – of adopting core services which are specific to your primary cloud provider. Think about how you would/could replicate them in another provider’s environment BEFORE you get locked in.
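To make that a bit more concrete, here is a minimal sketch of one way to keep a provider-specific service behind your own interface. Everything in it is an assumption for illustration: the names `BlobStore`, `InMemoryStore` and `ReplicatedStore` are hypothetical, and the in-memory class merely stands in for an adapter that would wrap a real provider's SDK. It also ignores the hard parts (reconciling writes made during an outage, consistency, cost), so treat it as a thinking aid rather than a design.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical provider-neutral interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a provider adapter (assumption: a real one wraps that provider's SDK)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True  # set to False to simulate a provider outage
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._blobs[key]


class ReplicatedStore:
    """Writes to every provider and reads from the first one that can answer."""

    def __init__(self, stores) -> None:
        self._stores = list(stores)

    def put(self, key: str, data: bytes) -> None:
        failures = 0
        for store in self._stores:
            try:
                store.put(key, data)
            except ConnectionError:
                failures += 1  # tolerate a single provider being down
        if failures == len(self._stores):
            raise ConnectionError("all providers are unavailable")

    def get(self, key: str) -> bytes:
        for store in self._stores:
            try:
                return store.get(key)
            except (ConnectionError, KeyError):
                continue  # fall through to the next provider
        raise ConnectionError(f"no provider could serve {key!r}")


# Usage: the application only ever sees the BlobStore interface.
primary = InMemoryStore("provider-a")
secondary = InMemoryStore("provider-b")
store = ReplicatedStore([primary, secondary])
store.put("statement-2024-01", b"example payload")
primary.available = False  # simulate an outage at the primary provider
payload = store.get("statement-2024-01")  # served by the secondary
```

The point is less the code than the habit: if the application depends only on an interface you own, swapping or adding a provider is an adapter-sized job instead of a rewrite.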
Given the increasing automation and speed we are seeing in financial services, it is only a matter of time before there is an event which really gets regulators’ attention; the time to be thinking about diversifying your cloud infrastructure is now.