In this article, I’ll introduce you to Cloud platforms and discuss the services they provide, their cost (not just the monetary cost) and the problem of lock-in. I’ll also discuss hybrid systems, where the whole system or only some of its components run in the Cloud. By the end of this article you should have a solid understanding of the current state of Cloud platforms, how to integrate the Cloud into your systems and how to manage the risks.
Why Go to the Cloud?
The number one reason to go to the Cloud is that Cloud platforms provide a great deal of value that matters even to small companies. If you had to build even the most essential parts yourself, you would spend a lot of time building them and even more time maintaining them and addressing all the issues that your half-baked system causes. Today’s systems handle more and more data and face higher expectations in terms of uptime, availability and responsiveness. Even startups in beta must provide a reliable service, even if it is not very feature-rich. Letting the system crash and discovering it in the morning through 50 angry user emails is not an option anymore. That said, the Cloud is not a panacea. You still have to work hard to put things together and use the Cloud offerings intelligently, but all the building blocks, as well as integrated solutions, are available to you.
Compute and Local Storage
At the most basic level, all Cloud platforms give you a slew of VMs with different configurations of CPU, memory and disk space. CPU and memory are fairly straightforward. Most Cloud platforms have started to offer GPUs too for specialized workloads. Disks present another dimension to explore. You can attach volumes, SSD disks and sometimes even local disks (tied to a physical node). The main attraction is the elasticity with which you can start more machines or stop machines based on demand. If you have highly fluctuating demand, such as weekend or seasonal spikes, you can match your capacity to your demand and respond quickly to unexpected spikes.
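To make the elasticity idea concrete, here is a minimal sketch of the kind of sizing rule an autoscaling policy applies: given the current load, decide how many machines to run. The function name, the per-instance capacity, the 20% headroom and the instance limits are all assumptions chosen for illustration; real Cloud autoscalers expose their own policy settings.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=20, headroom=0.2):
    """Return how many VMs to run for the current load (hypothetical policy).

    headroom is the fraction of spare capacity kept for sudden spikes.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Instances needed to serve the load plus the spare headroom.
    needed = math.ceil(requests_per_sec * (1 + headroom) / capacity_per_instance)
    # Never drop below the floor or exceed the budgeted ceiling.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Quiet night, normal weekday and a seasonal spike (made-up numbers).
    for load in (50, 400, 5000):
        print(load, "req/s ->", desired_instances(load, capacity_per_instance=100), "instances")
```

The upper limit matters as much as the lower one: it caps what an unexpected spike (or a bug) can cost you.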
Cloud storage is offered by all Cloud platforms. Interestingly, all the big Cloud providers have very similar offerings, and their APIs and requirements are pretty much compatible. AWS S3 set the standard and everybody else followed suit. Google Cloud Platform is supposed to be fully compatible. Azure Blob Storage is very similar, as is Alibaba Cloud (the largest Chinese Cloud platform).
Containers are the latest hot technology. All Cloud platforms provide container hosting services. Google’s GKE (based on Kubernetes) leads the pack, but everybody else is following closely with hosting, registries and orchestration solutions based on Kubernetes (Google), Mesos (Microsoft) and Docker Swarm (Ali Cloud). AWS has its own orchestration solution.
All Cloud platforms provide an easy way for you to segregate your fleet of machines into independent networks and control access in a very fine-grained manner. This is not trivial to set up on your own. With Cloud platforms, you enjoy the benefit of having the best experts thinking about how to manage your network for you and, in many cases, responding to issues without you even knowing that there was a problem. Load balancing is another standard feature you get out of the box, and it is easy to customize.
APIs and Monitoring
All that goodness is exposed via APIs that you can control programmatically, in addition to using a Web console. You get fantastic monitoring that lets you understand the state of your infrastructure at several levels.
There are many more services that are more specific, such as notifications, CDN, DNS, special data stores and various machine learning or IoT offerings. Those are less uniform and in different stages of maturity. Pick and choose carefully.
The Dreaded Lock-in
A lot of organizations are concerned about being locked in to their Cloud provider. This is a real concern. One solution is to just embrace it. See how Netflix bet the company on AWS. Another approach is to be conservative in which Cloud services you use. If you rely mostly on compute, Cloud storage and networking, it should be relatively easy to switch. Your DevOps automation should try to abstract away the specific Cloud platform as much as possible. I recently moved a small system from Azure to Ali Cloud without a hitch.
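One way to abstract away the specific Cloud platform is to have application code depend on a small interface of your own rather than on a provider SDK. The sketch below is only an illustration: `BlobStore`, `InMemoryStore`, `LocalDiskStore` and `save_report` are hypothetical names, and a real deployment would add one more class wrapping whichever Cloud storage SDK you use.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """The only storage operations the application is allowed to rely on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Backend for unit tests; keeps blobs in a dictionary."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDiskStore:
    """Backend for running on a laptop; stores blobs as files under a root directory."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_report(store: BlobStore, name: str, text: str) -> str:
    """Application code: knows about BlobStore, not about any provider."""
    key = f"reports/{name}.txt"
    store.put(key, text.encode("utf-8"))
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for store in (InMemoryStore(), LocalDiskStore(root)):
            key = save_report(store, "q1", "all systems nominal")
            print(type(store).__name__, store.get(key))
```

With this shape, switching providers means writing one new backend class instead of touching every call site.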
Cost is a big concern. The Cloud can be expensive. It is easy to just start a bunch of VMs and let them sit idle, or to allocate tons of storage you don’t need. You need discipline and active management. All Cloud platforms offer cost calculators and very clear pricing information. But unlike traditional capacity planning, this is an ongoing activity, especially since Cloud platform providers keep introducing new configurations and various discounts.
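A rough sketch of what active cost management can look like in practice: flag VMs that sit mostly idle and estimate what they cost per month. The hourly rates, utilization figures and the 5% idle threshold are made-up values for illustration; in a real setup you would pull them from your provider’s billing and monitoring data.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month


@dataclass
class Vm:
    name: str
    hourly_rate: float          # hypothetical price per hour
    avg_cpu_utilization: float  # 0.0 to 1.0, averaged over the month


def monthly_cost(vm: Vm) -> float:
    """Cost of keeping the VM running for the whole month."""
    return vm.hourly_rate * HOURS_PER_MONTH


def idle_report(fleet, idle_threshold=0.05):
    """Return the VMs below the utilization threshold and their combined monthly cost."""
    idle = [vm for vm in fleet if vm.avg_cpu_utilization < idle_threshold]
    return idle, sum(monthly_cost(vm) for vm in idle)


if __name__ == "__main__":
    fleet = [
        Vm("web-1", hourly_rate=0.10, avg_cpu_utilization=0.60),
        Vm("batch-old", hourly_rate=0.20, avg_cpu_utilization=0.01),
    ]
    idle, wasted = idle_report(fleet)
    print("idle VMs:", [vm.name for vm in idle])
    print(f"estimated monthly spend on idle VMs: {wasted:.2f}")
```

Running a report like this on a schedule turns cost review into a routine rather than a one-off exercise.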
Most systems will not be fully on the Cloud. Even if you start from scratch you may want to be able to run your system or part of it on your laptop. Sometimes you must integrate with an existing system. Migration projects of large systems can take years. During the migration, you will need your existing systems to collaborate with the Cloud.
The Cloud can solve a whole host of difficult problems and allows you to benefit from best-in-class infrastructure. But the process is big and complex, and you need to make sure you’re using the Cloud effectively and economically. Vendor lock-in is not as serious a problem if you plan for it and engineer your system accordingly.