Turbot CTO Newsletter: October 2020
In American football, the prevent defense is a type of defensive play called when a team is leading in the game with very little time remaining. The core idea is to allow the opponent to execute short plays and focus all energy on preventing a very long scoring play. John Madden famously chided teams that used the strategy with this zinger: “All a prevent defense does is prevent you from winning.”
Similarly, I believe that the current trend of cloud operations and security teams relying heavily on preventative controls for cloud governance will ultimately undermine the organization’s business objectives and create new risks for the organization. Before I defend that statement, let’s quickly get everyone up to speed with a couple of definitions.
Business Objective: A business objective is a (hopefully measurable) positive result that an organization hopes to achieve through execution of a business strategy.
To achieve those business objectives, every organization needs to manage risk. For public cloud governance, the risks typically fall into one of a few key categories:
Reliability and availability of systems
Effectiveness and efficiency of operations
Compliance with internal processes, laws and regulations
Protection of data and systems from internal and external threats
Control objective: A control objective is a (hopefully measurable) statement that describes a desired outcome for the organization as it pertains to managing or mitigating risk, and we achieve that objective by implementing one or more controls. Types of controls:
Preventative controls: Mechanisms that are designed to keep undesirable events from occurring.
Detective controls: Mechanisms that are designed to find and raise awareness of undesirable events.
Corrective controls: Mechanisms designed to compensate for and reduce risk when an undesirable event occurs. Side note: By definition, this implies that a detective control was in place and effective as well.
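To make the three control types concrete, here is a minimal Python sketch applied to a single control objective ("no storage bucket is publicly accessible"). The Bucket class, the ChangeDenied exception and the function names are all hypothetical, invented for illustration; they do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical stand-in for a cloud storage resource."""
    name: str
    public: bool = False


class ChangeDenied(Exception):
    """Raised by the preventative control when a change is blocked."""


def preventative_control(bucket: Bucket, make_public: bool) -> None:
    """Keep the undesirable event from occurring: refuse the change up front."""
    if make_public:
        raise ChangeDenied(f"{bucket.name}: public access is not allowed")
    bucket.public = make_public


def detective_control(buckets: list[Bucket]) -> list[str]:
    """Find and report buckets that are already public; changes nothing."""
    return [b.name for b in buckets if b.public]


def corrective_control(buckets: list[Bucket]) -> list[str]:
    """Reduce risk after the fact: detect public buckets, then make them private."""
    found = detective_control(buckets)  # correction depends on detection
    for b in buckets:
        if b.name in found:
            b.public = False
    return found
```

Note that corrective_control calls detective_control first, which mirrors the side note above: a corrective control only works if detection is in place and effective.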
With the advent of shiny new tools in the last few years, it is becoming increasingly common for me to see an organization’s entire governance control framework designed around preventative controls, frequently to the extent that it becomes difficult to actually apply detective and corrective controls to the environment. This often leads to slower cloud adoption, developer dissatisfaction with IT, and an overall reduction in the business benefits the organization was seeking from its cloud strategy. Cloud advocates inside organizations need to raise the alarm early on the pitfalls of this approach; hopefully I can give you some talking points to create awareness within your organization.
Green field vs. Brown field
By the time that compliance, security and IT organizations get around to implementing governance controls, there will already be a significant footprint of applications in your public cloud. Preventative controls cause headaches at that point because they are often closing the barn door after the horses have got out. The early adopters of cloud in your organization are critical allies in building out your governance strategy, so a framework that ignores them completely, or blocks them from executing, is not viable.
Exception Management
A hard-fought lesson for many enterprise IT groups is that one size does not fit all. Your marketing organization’s public websites, your R&D organization’s data lake and legacy data center migration workloads need completely different governance controls. Teams that start with a single set of “best-practice” preventative controls for their whole organization/tenant structure will spend 10x as long tweaking and relaxing those policies as they come into contact with real-world use cases. The time spent doing that, and the back and forth between the app teams and the cloud team, is non-value-added work that slows organizational velocity.
Who’s watching the watchers?
Someone in your organization (or contracted to your org) is writing and implementing those preventative controls. Too often I see a very small group, or even a single individual, holding the visibility, technical understanding, and access needed to change those controls. What happens if they are targeted and compromised by bad actors, or just make a mistake?
Even if you are not worried about insider threats, your auditors will be. In our experience, organizations that rely primarily on preventative controls for cloud governance completely underestimate the time and resources required to implement elevated access controls and to make changes to the control framework.
Friction with your business teams
Preventative controls (by their very nature) cause friction in someone’s workflow. Many implementations of preventative controls simply deny the user’s action with no explanation given.
This can lead to an unexpected consequence that undermines the concept of “least privilege”. When confronted with an error message, developers and data scientists will grant higher and higher levels of privilege to their users and service accounts to work around the access-denied block. When it is finally discovered that it was a preventative control causing the blockage, how many teams will have the discipline to go back and remove the expanded privileges?
This friction impacts your development teams and data scientists, with knock-on effects for the projects they are working on. Your organization is competing to attract and retain these highly paid professionals; if their productivity is impacted, you will see measurable reductions in the organization’s business agility. A recent McKinsey study found that organizations that were leading with regard to enterprise agility also had better business outcomes.
What is the leading practice?
One approach does not fit all. I will avoid falling into the trap of saying there is one way to solve these problems for all organizations. The approach will need to change based on your industry, your organization’s culture and where you are starting from, but I think these key tenets of implementing a cloud governance strategy are broadly applicable and can serve as a template to start from:
1. Visibility and observability are mandatory. Every cloud resource that has the potential to be misconfigured needs to be cataloged in a cloud-scale CMDB and changes to that resource tracked over time. This should be solved before you put a single control in place.
2. Define a list of services that are acceptable for use in your cloud(s), and a process for evaluating and approving new services as they become available.
3. Define a data classification strategy. Many controls need to behave differently, based on the classification of data they interact with (e.g. PII, Confidential, etc.). If you don’t already have a robust data classification strategy, go build one.
4. Document control objectives for each service. You should have a short list of control objectives for each approved service (e.g. public access, encryption, tagging).
5. Implement detective controls for existing and new environments. Your detective controls should be effective in identifying any resource in the CMDB that is not meeting your control objectives. Query your environment and test that your control logic can correctly identify non-adherence and misconfiguration scenarios. Automating this testing now will pay dividends for years to come.
6. Notify and teach. Implement a process for detective controls to trigger a notification to application teams, with the details of the control violation and documentation on how to resolve the issue (or request an exception).
7. Exception process. Implement a process to assess and approve exceptions to controls. Resources with approved exceptions should be flagged in the CMDB to prevent additional alarms from being created on that resource during the time period of the approved exception.
8. Green means go. Governance approval for production release should be tied to proof that the pre-production environment (e.g. dev, test, qa, validation, etc.) is 100% green across all controls. This means that they are either meeting the control or have an approved exception in place.
9. Implement automated corrective controls for your most critical control objectives in production. Relying on manual detect -> alert -> fix workflows in production environments is not good enough for critical controls like public access.
10. Create governance accelerators for new projects. Implement automation and corrective controls that ensure that key foundational capabilities (especially ones that are commonly misconfigured) are created in advance for application teams. This helps them move faster while complying with organizational controls.
11. Strategically implement preventative controls in production for unapproved services and for capabilities where you have created governance accelerators. At this point we are in great shape: all our application teams are seeing green from a governance standpoint, and we have built accelerators where compliance with organization standards is difficult or time-consuming. Implementing preventative controls at this point becomes a way to increase velocity and add value to your business teams, instead of becoming an invisible barrier to their work.
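As a rough illustration of how steps 5, 7 and 8 fit together, the Python sketch below evaluates detective controls against CMDB records, suppresses alarms for resources with an approved exception, and gates release on an all-green environment. Everything here is an assumption made for illustration: the Resource fields, the two control names and the function names are hypothetical, and for simplicity an approved exception is treated as covering every control on that resource.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resource:
    """Hypothetical CMDB record; field names are illustrative only."""
    resource_id: str
    environment: str
    public_access: bool
    encrypted: bool
    exception_until: Optional[date] = None  # expiry of an approved exception


# Control objectives as name -> predicate returning True when the control is met.
CONTROLS = {
    "no_public_access": lambda r: not r.public_access,
    "encryption_enabled": lambda r: r.encrypted,
}


def evaluate(resource: Resource, today: date) -> dict:
    """Detective control: return 'ok', 'exception' or 'alarm' per control."""
    has_exception = (
        resource.exception_until is not None
        and today <= resource.exception_until
    )
    status = {}
    for name, is_met in CONTROLS.items():
        if is_met(resource):
            status[name] = "ok"
        elif has_exception:
            status[name] = "exception"  # flagged, so no alarm is raised
        else:
            status[name] = "alarm"
    return status


def is_green(resources: list, environment: str, today: date) -> bool:
    """Release gate: green when no resource in the environment is in alarm."""
    return all(
        state != "alarm"
        for r in resources
        if r.environment == environment
        for state in evaluate(r, today).values()
    )
```

In a real implementation the "alarm" results would feed the notification process from step 6, and an expired exception would return the resource to "alarm", as it does here once today passes exception_until.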
TL;DR
Preventative controls are necessary but not sufficient for cloud governance. Similar to the “prevent defense” in football, overuse of the approach can cause unexpected and deleterious consequences. When starting to implement a cloud governance strategy, you should focus first on getting visibility, observability, communication and detective controls in place before tackling preventative lockdown.
If you enjoy cloud governance topics like this, be sure to subscribe to this newsletter and share it with others! Are you a control freak too? Join the conversation by sharing your comments below or contacting us at cto@turbot.com.