Recently we migrated our cloud infrastructure from Amazon to a different cloud vendor. I won’t get into the details of why we had to do it, but the migration itself was an interesting experience, and I want to offer some guidelines on what to consider, particularly around infrastructure automation, if you find yourself in a similar situation.
Going into this migration, we knew that Amazon was a more capable cloud vendor than the new one. We looked at a comparison site to compare the features of the two vendors, and Amazon was the clear winner. This comparison gave us some pointers on where the new cloud vendor would be lacking. But rather than focusing on individual features, we decided to come up with our own “requirements spec” for our infrastructure and then see how the new vendor fared against it. We knew we would have to make some compromises, but more importantly, we understood where we would not compromise at any cost.
Our application is a fairly straightforward Rails app backed by a Postgres database and Memcache, hosted on Amazon virtual machines. We use a lot of Amazon services such as S3 (storage), ELB (load balancing), Route 53 (DNS), and SES (email).
One of the big things we were concerned about from the get-go was the “ease of automation” in setting up our infrastructure with the new cloud vendor. Our existing infrastructure setup is automated to a large extent using Puppet. Infrastructure setup falls into three steps: provisioning, configuration, and deployment, which I will explain in a bit. These steps have different degrees of tolerance for manual work, and we decided early on which of them were merely “should-be-automated” versus “must-be-automated”. Let’s talk about the steps:
Provisioning is the first step in infrastructure setup: creating the virtual machines themselves. Once you have provisioned an instance, you get a virtual machine with a base OS installed, an IP address, and credentials to access it. With our new cloud provider this would be a manual step, whereas Amazon lets you automate it very nicely through the AWS API. We decided this falls under “should-be-automated” because we did not see ourselves spinning up new machines frequently. Admittedly, we were giving up the capability of “auto-scaling” our infrastructure, but we were OK with that. With auto-scaling, Amazon monitors the load on your machines and automatically creates new ones to handle the extra load. It is actually a pretty cool feature, but we decided we did not need it, at least not in the near term.
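To illustrate what automating this step looks like on AWS, here is a minimal sketch of provisioning through the EC2 API. The AMI id and instance type are placeholders; in practice the client would be `boto3.client("ec2")`, but it is passed in here so the function can be exercised without real credentials.

```python
# Minimal provisioning sketch against the EC2 API.
# In real use: ec2 = boto3.client("ec2") and a real AMI id.

def provision_instance(ec2, ami_id, instance_type="t2.micro"):
    """Launch one instance and return its id and private IP."""
    resp = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    instance = resp["Instances"][0]
    return instance["InstanceId"], instance.get("PrivateIpAddress")
```

With AWS this one function replaces the manual console workflow; with a vendor that has no provisioning API, there is simply nothing to call here, which is exactly the compromise we accepted.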
Configuration is the step where a raw virtual machine is transformed into what it is meant to be. For example, a virtual machine that is supposed to be the database server gets the database server software and everything else it needs installed on it. This part is probably the most complicated, or rather the most time-consuming, to set up, because it involves configuring a virtual machine to be an application server, web server, database server, cache server, load balancer, email server, router, etc. We did not automate all of it to start with; things like the email server and router are pretty much one-time setup activities, and we did not find automating them worth our time. So this step falls somewhere between “should-be-automated” and “must-be-automated”. As I said, one-time setups like the email server and router we were OK with leaving manual. But the web server and database server fall under the “must-be-automated” category, because we set them up and tear them down frequently, not just in production but in all the downstream environments like staging, integration, and development. The other advantage is that if we ever bring up new servers (web or database) in response to a scaling or outage situation, we can do it quickly and, most importantly, end up with an exact replica of what we had before.
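Since our configuration is driven by Puppet, each of these roles is captured as a manifest. The sketch below is purely illustrative (the class name and the choice of nginx are placeholders, not our actual manifest), but it shows why re-running the same manifest yields an exact replica of a server:

```puppet
# Illustrative Puppet class for a web server role; names are placeholders.
class role::webserver {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'],
  }
}
```

Because the manifest is declarative, applying it to a fresh virtual machine converges it to the same state every time, which is what makes frequent setup and teardown across staging, integration, and development cheap.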
Deployment, the last step in infrastructure setup, falls squarely under the “must-be-automated” category. Deployment means that every time we make a change to our code base, an automated process builds the code, runs the tests, and deploys it to all the different machines: web servers, application servers, database servers, etc. Having this step automated is the cornerstone of continuous delivery, which is something we highly value. Continuous delivery means being able to deploy changes to an environment quickly and with minimal manual intervention. This gives us the ability to make rapid changes to the production environment, get feedback quickly from users, and adjust accordingly. Luckily for us, this step was going to be fully automated with our new vendor; anything less would have been a showstopper.
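The build-test-deploy flow described above can be sketched as a small script. The commands and host names here are hypothetical, not our actual pipeline; the command runner is injectable so the flow can be exercised without a real environment.

```python
# Sketch of a build-test-deploy pipeline. The rake tasks, the cap
# invocation, and the hosts are placeholders for illustration.
import subprocess

def shell(cmd):
    # Abort the whole pipeline if any step fails.
    subprocess.run(cmd, shell=True, check=True)

def deploy(hosts, run=shell):
    run("bundle exec rake build")   # build the app
    run("bundle exec rake spec")    # run the test suite; stop on failure
    for host in hosts:              # deploy only once the tests pass
        run(f"cap deploy HOSTS={host}")
```

The key property is ordering: a failed build or test run raises before any host is touched, so a broken change never reaches an environment.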
The other things that we considered when moving to the new cloud vendor were:
- How do we migrate data to the new cloud infrastructure?
- What are the data backup solutions available with the new cloud provider?
- Is the new cloud PCI compliant?
- What are the SLAs (Service Level Agreements) for the new cloud? What are the escalation routes? Whom will the development team have access to when an emergency arises?
- Does the new cloud use OpenStack?
- Does it provide services like an email server, load balancer, and router, or do we have to build these ourselves?
- Does it support encrypted backups?
- What kind of file storage does it provide? Does it provide streaming capability for, say, video and audio?
- Does it provide identity and access management solution?
- What kind of monitoring solutions does the cloud vendor provide?