Environments gone wild

It is common for projects to have "environments" for the different stages of development: Development; System Integration Testing (SIT); Quality Assurance (QA); and Production (Prod).

Environments per stage

Working with larger enterprise clients we often see this taken to the next level, with environments like SIT1, SIT2, SIT3, QA1, QA2, QA3, PenTest, Training, you-name-it. Also, with separate teams working on many different but interconnected components we even see each team having their own environment collections.

The number of environments can get out of control.


This often leads to confusion on many fronts.

"What versions are running in QA3?"

"Is anyone even using SIT5?"

"Can we use QA5 for this month's Performance Tests?"

(Even worse) "Can we use SIT2 for pre-production training and SIT3 for QA of version 3?"

This isn't an uncommon problem; over the years many enterprises have slowly gotten themselves into this situation in various ways. Many even take steps to mitigate it: environment dashboards, and environment and service catalogs that make current usage, purpose, and deployed versions visible at a glance. My favourite "hack", which inevitably gets implemented, is unlinking environment names from their purpose.

"Let's use abstract environment names"

"We can limit the number of environments by repurposing them as needed"

"We'll name our environments after [world capitals, US states, Zodiac signs, mountains of the world, cat breeds, California landmarks], then our teams can book the environments when they need them."

Brilliant, right? Let's see:

"What versions are running in Olympus?"

"Is anyone even using Fuji?"

"Can we use Everest for pre-production training and Kilimanjaro for QA of version 3?"

I don't believe that this solves the problem.


I don't know which came first, but in recent years we've seen "Infrastructure as code", "Config as code", and even "Everything as code". "-As code" tools like Ansible, Puppet, Chef, and CloudFormation are great at solving the problem of inconsistent environments. Setting up a new production-like environment is easy these days.

If these tools are already being used, then the first piece of the solution is done. With a little more effort, we should be able to add (1) scripted tear-down and (2) scripted restore of environments.

Once we can easily create a new environment, tear it down when not in use, and restore it to its previous state when needed again, why not give users the same ability?

Get your environments under control

Let's require every team to maintain four scripts for every environment type:

  1. Launch -- Initial setup and launch
  2. Stop -- Tear down / Backup and shut down everything
  3. Start -- Restore environment to last known state
  4. Delete -- Shutdown and remove all traces (Optional)
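The four scripts can be thought of as transitions in a small lifecycle state machine. A minimal sketch in Python (the class and method names here are illustrative, not part of any particular tool; in practice each method would shell out to your "-as code" tooling):

```python
# Hypothetical model of the environment lifecycle the four scripts manage.
# Only the state transitions are modelled here; real implementations would
# call CloudFormation, Ansible, etc.

class Environment:
    def __init__(self, name):
        self.name = name
        self.state = "new"

    def launch(self):
        # 1. Launch -- initial setup and launch
        assert self.state == "new", "launch is only valid for a new environment"
        self.state = "running"

    def stop(self):
        # 2. Stop -- back up and shut down everything
        assert self.state == "running", "only a running environment can be stopped"
        self.state = "stopped"

    def start(self):
        # 3. Start -- restore environment to last known state
        assert self.state == "stopped", "only a stopped environment can be started"
        self.state = "running"

    def delete(self):
        # 4. Delete -- shut down and remove all traces
        assert self.state in ("running", "stopped")
        self.state = "deleted"
```

Modelling the transitions explicitly also makes it obvious which actions are valid from which state -- e.g. you can't Start an environment that was never Launched.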

With these in place, we can put together an "Environment Control Center": a self-service portal where your teams can launch their environments when and as they need them. Furthermore, we can schedule the Stop scripts at the end of each day, and could even, if necessary, schedule the Start scripts each morning, Monday to Friday.

Introducing wildcardenv


wildcardenv is a simple example of an "Environment Control Center" that allows us to start and stop environments on demand, built with a simple S3 static website, a Python backend running on Lambda, and some cool Route 53 magic.


From the wildcardenv UI, a user can start and stop the existing environments or launch a new environment from a list of CloudFormation templates (loaded from an S3 bucket).

My personal favourite part is the wildcard DNS record (*.env.example.com) that points to the wildcardenv control center. This is the first half of the "Route 53 magic" I mentioned earlier. The other half is that wildcardenv adds a DNS record for each environment in Route 53 when it is created and removes it when the environment is stopped.
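Managing those per-environment records comes down to sending an UPSERT or DELETE change batch to the hosted zone. A rough sketch of what such a batch looks like (the domain, record type, and TTL below are illustrative placeholders, not necessarily what wildcardenv uses):

```python
# Build a Route 53 ChangeBatch for an environment's DNS record.
# "env.example.com" and the CNAME/TTL details are illustrative assumptions.

def build_change_batch(action, env_name, target):
    """action is "UPSERT" (environment created) or "DELETE" (environment stopped)."""
    assert action in ("UPSERT", "DELETE")
    return {
        "Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": f"{env_name}.env.example.com.",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": target}],
            },
        }]
    }
```

With boto3, a batch like this would be passed to `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.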

The result is that users can still use the URL of an environment even when it is not running. For example, if we visit qa.env.example.com when the corresponding stack has been stopped, DNS will instead resolve to the wildcard record, and the URL will take us to the control center, from where we can relaunch the QA environment. (To achieve this, wildcardenv does require each CloudFormation template to return a DNS target as an Output.)
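The fallback behaviour can be modelled as a simple lookup: an environment with its own record resolves to its target, while anything else is caught by the wildcard and lands on the control center. A sketch of that logic (DNS itself does the real resolution; the names here are illustrative):

```python
# Illustrative model of the wildcard-record fallback.
# "control-center.example.com" is a placeholder target.

CONTROL_CENTER = "control-center.example.com"

def resolve(hostname, running_envs):
    """running_envs maps environment names to their DNS targets."""
    env_name = hostname.split(".")[0]
    # A running environment has its own specific record...
    if env_name in running_envs:
        return running_envs[env_name]
    # ...otherwise *.env.example.com matches, taking the user to the portal.
    return CONTROL_CENTER
```

This is why stopping an environment never leaves users with a dead URL: the wildcard record is always there to catch the request.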

Lastly, it's worth mentioning that the included Launch/Stop/Start/Delete actions are simple CloudFormation create/delete commands. You may have more complicated requirements for each of these actions; if so, I encourage you to fork wildcardenv on GitHub and create your own personalised version.