Creating a Virtual Clean Room
10 August, 2020
by Mike
Whether it's due to regulatory requirements, the protection of our customers' data or the prevention of financial fraud, we all have a burden to shoulder for the sake of security. In many of the environments I have managed, this comes in the form of separated responsibilities between development, trading, operations and technical operations. That has traditionally created cascading tiers of administrative power, with increasing levels of trust and responsibility centred around one department and, ultimately, one tier of seniority within that department. Not only has the popularity of DevOps culture made that pattern untenable, but working in a purely remote way on ephemeral infrastructure also removes many of the pillars of security one would naturally find in a traditional secure environment.
As a small, remote-working engineering team, potentially dealing with trillions of dollars of people's money, we had to find another way. I trust myself and I trust my team, but none of us are Liam Neeson, and the particular set of skills we have is centred entirely around IT, not tracking down family members. Moreover, a single user can be phished or subverted (see recent Twitter events) much more easily than multiple simultaneous targets.
So let's look at the usual solution to this sort of problem. An organisation dealing with sensitive data will hopefully have some sort of clean room. Access to this room will be controlled by physical security, with access granted by somebody other than the person for whom it is requested. That person will be able to enter the room only once peripherals such as USB drives, mobile phones and so on have been left behind, and they will have access to a machine inside which is their interface to the protected data. You would expect access to the room to be logged, access to the machine therein to be logged, and probably all activity to be logged as well. You would expect the USB ports on that machine to be inactive (in the passport office in the UK, I've seen these disabled with sticky tape - take that, attackers!) and the networking to be locked down to only that which is required. Additionally, activity in the room may well be video recorded. All this can provide a good balance of restriction and auditing; all too often, companies show a strong bias towards auditing so that incidents can be investigated, rather than towards preventing them.
Whilst the development and other teams have their access restricted, my team needs this kind of access. But we are a distributed team with no office to build a clean room in, and no location convenient enough for quick access by multiple people even if we were to build one. We are not happy with the "admin party" attitudes that can accompany cloud computing, so it has been necessary for us to build a virtual clean room, using technology to provide equivalent protections. Here I would like to share a little of how that works.
We need to restrict access both to our production systems and to the cloud computing consoles. Thankfully, we already take a fully Infrastructure as Code approach and have little need for day-to-day ssh access to our servers and services, but we still have the issue that if any one person has access to these elements, they can too easily circumvent the protections we put in place. Our production credentials are not held on our server base images but are injected into new instances by an autonomous process. Only this process knows the keys to unlock these credentials, and it requires multiple DevOps personnel to start it, after which it can simply make itself inaccessible. So that part is fairly well wrapped up.
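To make the multi-person start a little more concrete, here is a minimal Python sketch of one way such an unlock can work (purely illustrative, and an assumption on my part rather than our exact mechanism): the key that decrypts the credentials never exists anywhere whole, only as the XOR of one share per operator, so the process cannot begin until everybody has contributed.

```python
# A minimal sketch of an n-of-n unlock, assuming the credentials are
# encrypted with a single symmetric key. All names here are illustrative;
# this is not our actual tooling.
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, holders: int) -> list[bytes]:
    """Split key into `holders` shares; every share is needed to rebuild it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(holders - 1)]
    shares.append(reduce(xor_bytes, shares, key))  # last share completes the XOR
    return shares

def recover_key(shares: list[bytes]) -> bytes:
    """XOR all shares together; with any share missing, only noise remains."""
    return reduce(xor_bytes, shares)

key = secrets.token_bytes(32)        # unlocks the credential store
shares = split_key(key, holders=3)   # one share per DevOps member
assert recover_key(shares) == key    # all three must be present
```

The appeal of an all-shares scheme like this is its simplicity; a threshold scheme such as Shamir's secret sharing would allow, say, any three of five operators instead.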
So let's talk about our virtual clean room (the VCR, from here on). In an emergency, this service is brought up, immediately patching itself and rebooting to ensure any local exploits in the base image are taken care of. It then starts a VNC session on a virtual X server which is only available locally, along with a proxy service for the VNC session which records all activity for later replay; those recordings are shipped off to storage outside of DevOps control. The proxy is the only open port on the box and the only way to use its facilities. So that's our video camera sorted.
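To show the shape of that proxy, here is a stripped-down Python sketch (the ports, file names and record format are all assumptions for illustration, and a real deployment would add TLS, authentication and off-box log shipping): it sits between the single public port and the loopback-only VNC server, stamps every chunk of traffic with a direction and timestamp so the session can be replayed, and passes it along.

```python
# A bare-bones recording proxy: VNC is assumed to listen only on
# localhost:5901, and this proxy is assumed to be the sole public port.
import socket, struct, threading, time

LISTEN, TARGET = ("0.0.0.0", 5900), ("127.0.0.1", 5901)

def pump(src, dst, tag, log, lock):
    """Copy bytes one way, recording direction, timestamp and payload."""
    try:
        while data := src.recv(4096):
            with lock:  # serialise records from the two directions
                log.write(struct.pack("!cdI", tag, time.time(), len(data)))
                log.write(data)
                log.flush()
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()  # closing both ends also stops the opposite pump

def handle(client):
    server = socket.create_connection(TARGET)
    lock = threading.Lock()
    with open(f"session-{int(time.time())}.bin", "ab") as log:
        a = threading.Thread(target=pump, args=(client, server, b"C", log, lock))
        b = threading.Thread(target=pump, args=(server, client, b"S", log, lock))
        a.start(); b.start(); a.join(); b.join()

with socket.create_server(LISTEN) as srv:
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```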
Now let's turn our attention to the USB attack vector. Obviously there's no literal USB drive here, but we are concerned about data being offloaded to other storage. The VCR uses a small, ephemeral file system, effectively wiped clean on each invocation. The user that runs the X session on this box has absolutely no special privileges, so there's no mounting of remote filesystems or attaching of extra cloud storage. There is no sudo access, and root has no valid password or authorised ssh key (and sshd is not running anyway). So that's our disabled-USB-ports analogue ticked off.
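None of this is exotic, which means it is also easy to verify mechanically. As an illustration (a sketch under assumed paths and port numbers, not our actual checks), the instance could assert these invariants at boot and refuse to serve if any of them fail:

```python
# A hedged sketch of a boot-time self-check for the invariants above. The
# mount point and port are assumptions about a typical Linux image.
import os, shutil, socket, sys

def port_closed(port: int) -> bool:
    with socket.socket() as s:
        s.settimeout(0.5)
        return s.connect_ex(("127.0.0.1", port)) != 0  # non-zero: nothing listening

def is_tmpfs(mount_point: str) -> bool:
    with open("/proc/mounts") as mounts:
        return any(line.split()[1:3] == [mount_point, "tmpfs"] for line in mounts)

checks = {
    "session user is unprivileged": os.geteuid() != 0,
    "sudo is not installed": shutil.which("sudo") is None,
    "sshd is not listening": port_closed(22),
    "/home is ephemeral tmpfs": is_tmpfs("/home"),
}

for name, ok in checks.items():
    print("PASS" if ok else "FAIL", name)
sys.exit(0 if all(checks.values()) else 1)
```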
Next up was restricted networking, which was an interesting one. We wanted the VCR instance itself to have fairly free internet access, to ensure that security patches and other concerns were taken care of, but to restrict the user to only those websites and ssh destinations authorised by the VCR. We particularly wanted to avoid having two instances, one arbitrating the other, as that felt like needless and messy complexity. Thankfully, this is where the iptables "owner" module came into play, enabling filtering based on the Unix user that originated the traffic. This means we can force all of the VCR user's web traffic through a local proxy to restrict access and provide full activity logs, and restrict ssh and other traffic to strict requirements.
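For flavour, here is roughly what such rules can look like when driven from Python (an illustrative sketch only; the user name, proxy port and bastion address are invented for the example, and our real rule set differs):

```python
# Confine just the session user with the iptables "owner" match; root and
# system services (patching, log shipping) remain unrestricted.
import subprocess

USER, PROXY_PORT, BASTION = "vcruser", "3128", "203.0.113.10"

RULES = [
    # Divert the session user's plain HTTP into the local filtering/logging
    # proxy (HTTPS would typically reach the same proxy via browser config).
    ["-t", "nat", "-A", "OUTPUT", "-m", "owner", "--uid-owner", USER,
     "-p", "tcp", "--dport", "80", "-j", "REDIRECT", "--to-ports", PROXY_PORT],
    # Allow the user ssh, but only to the approved bastion host.
    ["-A", "OUTPUT", "-m", "owner", "--uid-owner", USER,
     "-p", "tcp", "--dport", "22", "-d", BASTION, "-j", "ACCEPT"],
    # Let the redirected traffic actually reach the local proxy port.
    ["-A", "OUTPUT", "-m", "owner", "--uid-owner", USER,
     "-p", "tcp", "--dport", PROXY_PORT, "-d", "127.0.0.1", "-j", "ACCEPT"],
    # Everything else from this user is rejected outright.
    ["-A", "OUTPUT", "-m", "owner", "--uid-owner", USER, "-j", "REJECT"],
]

for rule in RULES:
    subprocess.run(["iptables", *rule], check=True)
```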
Next up, we need to restrict access to the web consoles. There are a few approaches to this and I will gloss over the one we have chosen, but the most trivial way to visualise it is to ensure that each DevOps member only has access to half of the password; since multiple people can connect to the VNC session at once, each can type in their half without the other seeing what it is. Of course this doesn't scale terribly well to multiple team members, has issues with logging and so on, so alternatives involving encrypted client-side certificates, password injection, messing around with browser configuration and others may be of interest. Although the user is unprivileged, not every process need be.

With access to the machine images restricted to build systems, and check-ins verified by multiple eyes, we have been able to create an environment with only the exact tools and facilities that are permitted. It requires a business process to invoke and multiple people in different regions to use; sending data to arbitrary websites, storage systems and servers is impossible; a four-eyes policy is enforced; all activity is recorded and can be replayed; and access to the environment and the ongoing servers is logged. This keeps our customers' data safe, keeps us safe and meets a host of regulatory requirements. It even has a nice desktop background.