Know Thy[self] Environment
- Apr 27, 2016
- Michelangelo Sidagni
Securing an environment is a constant game of cat-and-mouse. Safety measures of all kinds can (and should) be put in place to protect against malicious actors, downtime, and other business-impacting variables. A production stack can be an extremely complex system with dozens of applications, databases, networking security groups, user permissions, and so on. So what are some of these safety measures and practices, and whose responsibility is it to make sure the stack is safe? The developers? Yes. Sysadmins? Yep. DevOps? You bet. The point here is that everyone has a role in an organization’s security, whether it’s obvious or not. Let’s take a look into the DevOps realm and see what these engineers can do to prevent, detect, and remediate potentially damaging issues.
Let’s first consider DevOps as the gatekeeper of Production – the Gandalf yelling “You shall not pass” at any engineer whose changes don’t meet some criteria. Typically one of these gatekeepers should be looking for a few things and asking some important questions: Has the change been peer reviewed? Do the tests pass? Are third-party dependencies up to date and free of known vulnerabilities?
Holding your own application code base to a high standard can be tedious at times, but the return on investment, even if it’s just peace of mind, is tremendous.
Now we’ll assume changes are in, approved, and ready to move up the environment chain destined for Production greatness, and the DevOps engineer’s role is to usher them into the next environment. We could ‘scp’ a .tar.gz over to our host, or run a pull/checkout with our favorite source control tool directly on the host, but then we’d realize it’s not 2006 anymore. The fundamental problems with these manual approaches are that they are not repeatable at scale, they’re very susceptible to user error (one of the many reasons ‘manual’ is a four-letter word in Ops), and they’re not guaranteed to yield the same results between executions. A better solution is to leverage an automation/orchestration framework such as Ansible, Puppet, or Chef that will give order to your deployments and eliminate the chaos. Chaos introduces uncertainty in an environment, and uncertainty means insecurity. These tools provide us with human-readable “playbooks”, “manifests”, “recipes”, and other trendily named configuration mechanisms that describe exactly what a deployment should do, every time it runs. They can not only deploy your own application but also keep all operating system packages up to date, effectively driving your hosts to an exact desired state.
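To make that concrete, here’s a minimal sketch of what such a playbook might look like in Ansible. The host group ‘web’, the release archive, the install path, and the service name are placeholders invented for illustration, not taken from any real environment:

```yaml
---
# Hypothetical deployment playbook. The host group "web", the release
# archive, the install path, and the service name are placeholders
# invented for illustration.
- hosts: web
  become: true
  tasks:
    - name: Bring all OS packages up to date
      apt:
        upgrade: dist
        update_cache: true

    - name: Ensure the application directory exists
      file:
        path: /opt/myapp
        state: directory

    - name: Unpack the new release onto the host
      unarchive:
        src: releases/myapp-1.2.3.tar.gz   # shipped from the control machine
        dest: /opt/myapp

    - name: Restart the service to pick up the new code
      service:
        name: myapp
        state: restarted
```

Running that play against one host or a hundred yields the same end state every time, which is exactly the point.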
Our updates are now in production with the latest and greatest features. The odds are pretty good that our production environment isn’t just a single host running a static website without any other services supporting it behind the scenes. There’s probably a database to hold some dynamic content, a session-caching application to let users log in, maybe an Elasticsearch server for fast search functionality, and some other auxiliary services. Now the job is to stand Production up and protect it from the outside world. Someone new to setting up a Production environment may stand all of these applications up on separate hosts, see that everything’s working, and call it a successful day.
Where this falls short is the [implied] lack of consideration for putting services on internet-facing hosts. If the backend database that holds your customer data is accessible from the general internet, you’re going to have a bad time. Knowing exactly which applications communicate with which services lets an engineer set up private networks and firewall rules so that no service in the stack is exposed more broadly than it needs to be. An additional, often overlooked strategy is creating dedicated ‘service users’ on a server to apply even tighter restrictions to sensitive applications.
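As a sketch of how such restrictions might be expressed with the same tooling, the Ansible tasks below create an unprivileged service user and limit a database port to a private subnet. The group name, user name, port, and subnet are illustrative assumptions, not a prescription for any real stack:

```yaml
---
# Hypothetical hardening tasks: run the database under an unprivileged
# service user and accept connections only from the private subnet.
# The names, port, and subnet below are illustrative assumptions.
- hosts: db
  become: true
  tasks:
    - name: Create a dedicated, unprivileged service user
      user:
        name: svc-db
        system: true
        shell: /usr/sbin/nologin
        create_home: false

    - name: Allow database connections only from the private subnet
      ufw:
        rule: allow
        port: "5432"
        proto: tcp
        from_ip: 10.0.0.0/16

    - name: Default-deny everything else inbound
      ufw:
        state: enabled
        policy: deny
```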
Having to actively manage all of these moving parts can seem daunting, but as we discussed earlier, automation is your friend. All of these setups and configurations can be organized and implemented with the orchestration tools mentioned before, and once they are, future updates become almost seamless and complete in a matter of minutes. Imagine having to patch Java (not hard, I know) manually across an environment of hundreds of Linux servers. It could take forever, but with Ansible you can roll that update out to all hosts at once, in batches, or however you’d like, in a repeatable, predictable manner (a sketch of this appears below). Personally, I love these tools. They cut deployment time to every environment (including development with Vagrant!) down massively, are very easy to use, and keep work and iteration moving quickly. For all their power, though, these tools can wreak irrevocable havoc when used incorrectly. After all, there’s still the end-user element, and we mustn’t forget the PEBCAK phenomenon. A recent report on the serverfault.com forum turned out to be a hoax, but the risk it described is not far-fetched: user bleemboy claimed to have “rm -rf”-ed his entire company away with Ansible automation. Because these tools are designed to run whatever you, the user, tell them to, this could happen under the right (or very, very wrong) conditions.
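As promised, here’s a minimal sketch of that batched Java rollout. The host group ‘app’, the batch size, and the package name are assumptions for illustration:

```yaml
---
# Hypothetical rolling update: patch Java across a large fleet in small
# batches. The group "app", the batch size, and the package name are
# assumptions for illustration.
- hosts: app
  become: true
  serial: 10                 # update ten hosts at a time
  max_fail_percentage: 0     # abort the rollout if any host in a batch fails
  tasks:
    - name: Ensure the Java runtime is at the latest patched version
      apt:
        name: openjdk-8-jre-headless
        state: latest
        update_cache: true
```

The `serial` keyword is what turns a fleet-wide blast into a controlled, batch-by-batch rollout: a bad package stops after ten hosts instead of taking down hundreds.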
To wrap up: enforcing policies and procedures geared toward certainty of code, packages, and permissions will help mitigate security risk tremendously. Automating and orchestrating those policies opens up a world of heightened control over and insight into an environment, putting a maintainer in a fantastic spot to deploy and manage source code and packages, and to respond to security issues easily and efficiently. Now go… automate all the things and such…