Defining Security Invariants

Sunday, February 18, 2024   Chris Farris   AWS CloudSecurity Technology
Akershus Fortress - Oslo Norway - Feb 2024

I’ve been thinking about where and how to best use Service Control Policies (SCPs) and Auto-Remediation as part of a complete cloud security breakfast.

Author's closing slide from RSAC 2023

However, I’m still troubled by the pitfalls of prevention and the dangers of auto-remediation. Both are very large hammers that risk impacting production in ways that may not be proportionate to the risks being addressed.

Thus, my thinking has evolved: Service Control Policies and auto-remediation are best reserved for in-the-moment prevention or remediation of the organization’s Security Invariants, and nothing else. This isn’t new; I was first introduced to the concept of Security Invariants in the context of SCPs.

A security invariant is a system property that relates to the system’s ability to prevent security issues from happening. Security invariants are statements that will always hold true for your business and applications. - AWS

What are some of the Security Invariants?

When crafting invariants, you want to ensure they’re written so that they are always true. “No one can attach an IGW” is an impractical invariant: you probably do need internet gateways, and what you actually want is to control their usage. Saying “Only the network team can attach an IGW” instead lets you craft a policy that will always be true, without having to remove controls when the inevitable exceptions occur.

Other examples of Security Invariants might be:

  1. Only the Network Engineering team may create a VPC, alter route tables, or attach an IGW.
  2. Only the Security and Privacy team may make an S3 Bucket Public.
  3. Only Procurement may subscribe to or accept an offer in AWS Marketplace.
  4. Port 3389 on a Windows machine may never be exposed to the world.
  5. AMIs and Snapshots may never be shared with all AWS accounts.
  6. Only Cloud Engineering can enable new opt-in regions (after ensuring GRC sign-off and the implementation of appropriate security telemetry and governance controls).

Some invariants are absolute. Unless I’m building a honeypot, there is no reason to expose 3389 to 0.0.0.0/0. Others need more thought: the AMI and Snapshot example above may be too restrictive, since some organizations need to share or publish an image.

Mapping Invariants to Controls

As mentioned above, a security invariant must always hold true, so a violation is something you never want to happen. Ideally, your builders know not to violate them, and your IaC scanning has warned them that such changes won’t be allowed. However, not every builder is fully aware of cloud security risks, and ClickOps is still a valid way to get things done.

There are two ways to enforce invariants: Service Control Policies and event-based auto-remediation. An SCP denies the API call, so the builder never gets to violate the invariant, while event-based remediation reverts the violation immediately after it happens. SCPs are the preferred method, but as previously documented, they aren’t flexible enough to cover every invariant on the list.

AWS has implemented a possible third form of invariant control: the “Block Public Access” (BPA) settings. So far, BPA is available for S3 buckets (on a per-bucket and per-account basis) and for EBS snapshots and AMIs (on a per-account, per-region basis). BPA is a helpful way to implement invariants, but it must be paired with SCPs so that the BPA settings can’t simply be turned off.
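Because the AMI and snapshot settings are regional, they need to be enabled in every region an account uses. Here is a minimal boto3 sketch; `enable_ec2_bpa` and the `make_ec2` client factory are hypothetical names, while `enable_image_block_public_access` and `enable_snapshot_block_public_access` are the EC2 API calls. As with S3, turning on snapshot BPA in an account that already shares snapshots publicly will break that sharing, so check first.

```python
from typing import Callable, Iterable


def enable_ec2_bpa(regions: Iterable[str], make_ec2: Callable[[str], object]) -> dict:
    """Turn on AMI and EBS snapshot Block Public Access in each region.

    make_ec2 is a hypothetical factory that returns an EC2 client for a region.
    Returns {region: (image_state, snapshot_state)} as reported by the API.
    """
    results = {}
    for region in regions:
        ec2 = make_ec2(region)
        # AMIs: "block-new-sharing" is the only state the API offers; AMIs that
        # are already public stay public and must be cleaned up separately.
        image = ec2.enable_image_block_public_access(
            ImageBlockPublicAccessState="block-new-sharing"
        )
        # Snapshots: "block-all-sharing" also makes already-public snapshots private.
        snap = ec2.enable_snapshot_block_public_access(State="block-all-sharing")
        results[region] = (image["ImageBlockPublicAccessState"], snap["State"])
    return results
```

In practice `make_ec2` could be `lambda region: boto3.client("ec2", region_name=region)`, run once per account.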

Let’s take a few examples of invariants and see how we can implement them.

Root Usage

Invariant: “Only Cloud Engineering, coming from the Atlanta office or VPN, may log in as root”

To be completely pedantic, I have to point out that you cannot prevent a root login; you can only prevent the root user from doing anything after logging in. Putting on my threat detection hat, that’s not the end of the world: if a threat actor does log in as root and cannot do anything, that’s an excellent data point to start an incident investigation.

Implementation of this invariant is pretty easy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootUsage",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": ["arn:aws:iam::*:root"]
        },
        "NotIpAddress": {
          "aws:SourceIp": ["List","of","approved","CIDRs"]
        }
      }
    }
  ]
}
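To sanity-check the logic before deploying it, here is a small local approximation of how the two conditions combine. It is not the IAM policy engine: it only mimics `StringLike` (`*` and `?` wildcards) and `NotIpAddress`, and the approved CIDRs are hypothetical RFC 5737 documentation ranges standing in for the Atlanta office and VPN. Within one statement, different condition operators are ANDed, so the Deny fires only for a root principal calling from outside the approved ranges.

```python
import ipaddress
import re

# Hypothetical stand-ins for the Atlanta office and VPN egress ranges.
APPROVED_CIDRS = ("192.0.2.0/24", "198.51.100.0/24")


def iam_string_like(value: str, pattern: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def root_deny_applies(principal_arn: str, source_ip: str, approved=APPROVED_CIDRS) -> bool:
    """Return True if the DenyRootUsage statement would deny this request."""
    is_root = iam_string_like(principal_arn, "arn:aws:iam::*:root")
    ip = ipaddress.ip_address(source_ip)
    outside = not any(ip in ipaddress.ip_network(cidr) for cidr in approved)
    # Separate condition operators in one statement must all match (logical AND).
    return is_root and outside


print(root_deny_applies("arn:aws:iam::111122223333:root", "203.0.113.7"))  # True: unapproved IP
print(root_deny_applies("arn:aws:iam::111122223333:root", "192.0.2.10"))   # False: from the office
print(root_deny_applies("arn:aws:iam::111122223333:role/Admin", "203.0.113.7"))  # False: not root
```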

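Attach this SCP to the root of your AWS Organization and it will protect every account except the management account, which SCPs never restrict. Note that the SCP only enforces the network-location half of the invariant: root is a single identity per account, so limiting it to Cloud Engineering comes down to who holds the root credentials and MFA device.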
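If you manage SCPs from code rather than the console, a hedged boto3 sketch of creating the policy and attaching it to the organization root could look like this. `deploy_scp` is a hypothetical helper; `list_roots`, `create_policy`, and `attach_policy` are AWS Organizations API calls. It assumes the SCP policy type is already enabled on the root, and passing an OU ID as `target_id` attaches the policy somewhere narrower than the root.

```python
import json


def deploy_scp(org, name: str, description: str, document: dict, target_id=None) -> str:
    """Create an SCP and attach it, defaulting to the organization root.

    `org` is expected to be boto3.client("organizations"), called from the
    management account (or a delegated administrator for policy management).
    Assumes the SERVICE_CONTROL_POLICY type is already enabled on the root.
    """
    if target_id is None:
        # An organization has exactly one root.
        target_id = org.list_roots()["Roots"][0]["Id"]
    policy = org.create_policy(
        Content=json.dumps(document),
        Description=description,
        Name=name,
        Type="SERVICE_CONTROL_POLICY",
    )
    policy_id = policy["Policy"]["PolicySummary"]["Id"]
    org.attach_policy(PolicyId=policy_id, TargetId=target_id)
    return policy_id
```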

Marketplace

Invariant: “Only Procurement may subscribe to or accept an offer in AWS Marketplace.”

This is an SCP I implemented at previous organizations after a development team spent a ridiculous amount of money because they wanted to use a different flavor of MQ than the engineering team’s standard.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MarketplaceWriteActions",
      "Effect": "Deny",
      "Action": [
        "aws-marketplace:AcceptAgreementApprovalRequest",
        "aws-marketplace:CancelAgreementRequest",
        "aws-marketplace:RejectAgreementApprovalRequest",
        "aws-marketplace:Subscribe",
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:UpdateAgreementApprovalRequest",
        "aws-marketplace:CreatePrivateMarketplace",
        "aws-marketplace:StartPrivateMarketplace",
        "aws-marketplace:StopPrivateMarketplace",
        "aws-marketplace:DescribePrivateMarketplaceStatus",
        "aws-marketplace:AssociateProductsWithPrivateMarketplace",
        "aws-marketplace:DisassociateProductsFromPrivateMarketplace",
        "aws-marketplace:ListPrivateMarketplaceProducts",
        "aws-marketplace:DescribePrivateMarketplaceProducts",
        "aws-marketplace:ListPrivateMarketplaceRequests",
        "aws-marketplace:DescribePrivateMarketplaceRequests",
        "aws-marketplace:UpdatePrivateMarketplaceSettings",
        "aws-marketplace:DescribePrivateMarketplaceSettings",
        "aws-marketplace:CreatePrivateMarketplaceProfile",
        "aws-marketplace:UpdatePrivateMarketplaceProfile",
        "aws-marketplace:DescribePrivateMarketplaceProfile"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_*",
            "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Procurement_*"
          ]
        }
      }
    }
  ]
}

In this example, the condition exempts two AWS Identity Center roles: one dedicated to the Procurement team and one for Cloud Engineering. Wildcards in place of the AWS account ID and the Identity Center role suffix allow the SCP to apply to all accounts. Marketplace is one of those services that will probably be managed via ClickOps, so using Identity Center roles makes sense.
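The “deny unless the caller is one of these Identity Center roles” shape recurs across invariants, so it can be worth generating rather than hand-editing. A minimal sketch: `only_teams_may` and `scp_json` are hypothetical helpers that reproduce the structure of the Marketplace SCP above and check the compact serialization against the 5,120-character maximum AWS documents for SCPs.

```python
import json

# AWS documents a maximum SCP document size of 5,120 characters.
SCP_MAX_CHARS = 5120
# Identity Center roles are named AWSReservedSSO_<PermissionSetName>_<random suffix>.
SSO_ROLE_PATTERN = "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_{}_*"


def only_teams_may(sid: str, actions: list, permission_sets: list) -> dict:
    """Build a 'Deny these actions unless the caller uses one of these permission sets' SCP."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
                "Condition": {
                    "StringNotLike": {
                        "aws:PrincipalArn": [
                            SSO_ROLE_PATTERN.format(ps) for ps in permission_sets
                        ]
                    }
                },
            }
        ],
    }


def scp_json(policy: dict) -> str:
    """Serialize without whitespace and fail early if the SCP exceeds the size limit."""
    text = json.dumps(policy, separators=(",", ":"))
    if len(text) > SCP_MAX_CHARS:
        raise ValueError(f"SCP is {len(text)} characters; the limit is {SCP_MAX_CHARS}")
    return text


# Usage: a trimmed-down version of the Marketplace policy above.
marketplace = only_teams_may(
    "MarketplaceWriteActions",
    ["aws-marketplace:Subscribe", "aws-marketplace:Unsubscribe"],
    ["CloudEngineering", "Procurement"],
)
print(scp_json(marketplace))
```

One consequence of the wildcard suffix worth knowing: a permission set named, say, `Procurement_ReadOnly` would produce a role `AWSReservedSSO_Procurement_ReadOnly_…` that also matches `AWSReservedSSO_Procurement_*`, so permission-set naming becomes part of the control.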

Public Buckets

Invariant: “Only the Security and Privacy team may make an S3 Bucket Public”

This invariant leverages both Block Public Access and a Service Control Policy. AWS’s best practice advice is to apply S3 Block Public Access at the account level. This single setting can be applied when an account is created. The issue is that anyone with admin access to the account can disable BPA and make a bucket public. So this invariant comes in two parts:

Step one is to enable BPA at the account level[1]:

aws s3control put-public-access-block --account-id 123456789012 \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
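Since account-level BPA cuts off any existing public bucket, it is worth checking before flipping the switch. A hedged boto3 sketch: `public_policy_buckets` and `enable_account_bpa` are hypothetical helpers built on the real `list_buckets`, `get_bucket_policy_status`, and S3 Control `put_public_access_block` calls. The check only covers buckets whose bucket policy S3 reports as public; public ACLs are not inspected, so an empty result is a necessary condition, not proof of safety.

```python
ALL_ON = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def public_policy_buckets(s3) -> list:
    """Names of buckets whose bucket policy S3 currently evaluates as public.

    Only policy-based exposure is checked; public ACLs are not inspected.
    """
    public = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            status = s3.get_bucket_policy_status(Bucket=name)
        except Exception as err:  # botocore's ClientError when called for real
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code == "NoSuchBucketPolicy":
                continue  # no bucket policy, so not public via policy
            raise
        if status["PolicyStatus"]["IsPublic"]:
            public.append(name)
    return public


def enable_account_bpa(s3, s3control, account_id: str) -> list:
    """Enable account-level S3 BPA unless a publicly shared bucket would break.

    Returns the blocking bucket names; an empty list means BPA was applied.
    """
    blockers = public_policy_buckets(s3)
    if not blockers:
        s3control.put_public_access_block(
            AccountId=account_id, PublicAccessBlockConfiguration=ALL_ON
        )
    return blockers
```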

If an account already has public buckets, do NOT run the command above: the account-level setting overrides the bucket-level settings and will cut off their public access. Since April 2023, new S3 buckets have Block Public Access enabled by default, but take care when enabling it for existing buckets.

Step two is to prevent anyone but the security team from deactivating it via SCP:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PreventPublicBuckets",
      "Effect": "Deny",
      "Action": [
        "s3:PutAccountPublicAccessBlock",
        "s3:PutAccessPointPublicAccessBlock",
        "s3:PutBucketPublicAccessBlock"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudSecurity_*"
          ]
        }
      }
    }
  ]
}

This policy prohibits anyone except Security from removing or altering the account-wide or per-bucket Block Public Access configuration. Removal is covered too, because the DeletePublicAccessBlock and DeleteBucketPublicAccessBlock APIs are authorized by the corresponding Put permissions. The permitted role is an AWS Identity Center role, which implies that all public bucket exceptions will be processed via ClickOps. A more streamlined and governed method should be considered, but I will leave that for a future blog post.

Ransomware Protection

Invariant: “Port 3389 on a Windows machine may never be exposed to the world.”

Based on the invariant as written, the following must all be true:

  • Ingress port is 3389, or a range that contains 3389
  • Source IP must be 0.0.0.0/0
  • The security group in question must be attached to a Windows-based EC2 instance.

This invariant references 3389, but several ports fall into the RDP (ransomware deployment protocol) category. Here, auto-remediation is the best option: there is no condition key on ec2:AuthorizeSecurityGroupIngress that can filter on the port, the source CIDR, or the operating system of attached instances, so an SCP can’t express this invariant. As mentioned above, that leaves event-based enforcement. The invariant can be violated in two ways:

  1. A security group that’s already attached to a Windows instance is modified to expose 3389.
  2. An existing security group with 3389 already exposed is attached to a Windows instance.
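The three conditions above, and both violation paths, can be checked with one function over data EC2 already returns. A sketch under stated assumptions: `perms` is a security group’s `IpPermissions` list as returned by `describe_security_groups`, and `instance_platforms` holds the `Platform` field of each attached instance (EC2 reports `"windows"` for Windows instances and omits the field for Linux). It treats both `0.0.0.0/0` and `::/0` as “the world” and considers TCP and UDP, since RDP can use either. Running it both when a rule is added and when a group is attached to an instance covers both paths. The function names are hypothetical.

```python
WORLD = {"0.0.0.0/0", "::/0"}
PORTED_PROTOCOLS = {"tcp", "udp", "6", "17"}  # protocols whose rules carry port ranges


def rule_exposes_port(perm: dict, port: int) -> bool:
    """True if one IpPermissions entry opens `port` to the whole internet."""
    proto = str(perm.get("IpProtocol"))
    if proto == "-1":
        covers = True  # "all traffic" rules carry no port range
    elif proto in PORTED_PROTOCOLS:
        covers = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
    else:
        covers = False
    cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
    cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
    return covers and bool(cidrs & WORLD)


def violates_rdp_invariant(perms: list, instance_platforms: list, port: int = 3389) -> bool:
    """Check all three conditions for one security group.

    perms: the group's IpPermissions (describe_security_groups shape).
    instance_platforms: the Platform value of each attached instance;
    "windows" for Windows, None where EC2 omits the field (Linux).
    """
    on_windows = "windows" in instance_platforms
    return on_windows and any(rule_exposes_port(p, port) for p in perms)
```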

The following Cloud Custodian policy removes world-open ingress permissions covering port 3389 or 445 when it detects them. However, it does so regardless of whether the group is attached to Windows or Linux machines, so it could alter a security group attached to a Linux instance that had ports 0-65535 open (in violation of the Three Laws).

policies:
  - name: ops-access-via
    resource: aws.security-group
    filters:
      # Sibling filters are ANDed; "or" acts on a world-open rule over IPv4 OR IPv6.
      - or:
          - type: ingress
            IpProtocol: "-1"
            Ports: [445, 3389]
            Cidr: "0.0.0.0/0"
          - type: ingress
            Ports: [445, 3389]
            CidrV6:
              value: "::/0"
    actions:
      - type: set-permissions
        # remove the permission matched by a previous ingress filter.
        remove-ingress: matched

Safely leveraging these controls

Preventing security risk comes at the expense of increased operational risk. It is difficult to get organizational buy-in to restrict what builders can do or to alter their environments. Therefore, the security team must carefully choose which security invariants to enforce and how to implement them.

With service control policies, you can review CloudTrail beforehand to determine who is calling the specific actions that will be denied. This allows for a communications plan, or even a temporary exception to the invariant if a team needs time to adjust. If your AWS Organizations OU structure separates production and non-production accounts, consider applying the SCPs to the non-production OUs for a bake-in period before applying them to the production OUs.
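A minimal sketch of that CloudTrail review, assuming you have already loaded the records as dicts (for example, the `CloudTrailEvent` JSON strings returned by `lookup_events`, or log files from the trail’s S3 bucket). Two approximations are labeled in the code: the IAM action is taken to be `<service prefix>:<eventName>`, which holds for many but not all APIs, and `aws:PrincipalArn` is approximated by the role ARN in `sessionIssuer`. `would_be_denied` is a hypothetical helper.

```python
import fnmatch
from collections import Counter


def principal_arn(record: dict) -> str:
    """Approximate aws:PrincipalArn from a CloudTrail record's userIdentity."""
    identity = record.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    # For assumed roles (including Identity Center), the role ARN is in sessionIssuer.
    return issuer.get("arn") or identity.get("arn", "")


def would_be_denied(records, actions, exempt_patterns) -> Counter:
    """Count (caller, action) pairs that a deny-unless-exempt SCP would block."""
    hits = Counter()
    for rec in records:
        # Approximation: IAM action == "<eventSource prefix>:<eventName>".
        action = rec.get("eventSource", "").split(".")[0] + ":" + rec.get("eventName", "")
        if action not in actions:
            continue
        caller = principal_arn(rec)
        # fnmatch '*' behaves like IAM's '*' for these ARNs (no '[' characters).
        if any(fnmatch.fnmatchcase(caller, p) for p in exempt_patterns):
            continue
        hits[(caller, action)] += 1
    return hits
```

The resulting counts are a starting point for the communications plan: each key names a caller who would start seeing AccessDenied.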

Auto-remediation generates even more operational risk. The longer a misconfiguration exists, the more likely it is that something comes to depend on it, so an auto-remediation should occur nearly instantly. Therefore, only event-based (not scheduled) remediations should be considered. Taking inspiration from Isaac Asimov, here are the Three Laws[2] of Auto-Remediation:

Three Laws

  1. A bot must not harm production by taking an irrevocable action (such as deleting stateful resources).
  2. A bot must execute its orders (to secure the environment) in a way that minimizes the risk of harm to production.
  3. A bot must announce its own existence and actions whenever it acts on the first or second law.
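To make the laws concrete, here is a hedged sketch of one remediation step for the RDP invariant. It assumes an `AuthorizeSecurityGroupIngress` CloudTrail record (the `requestParameters` shape below is how CloudTrail typically records it; verify against your own events) and a caller that has already determined whether the group is attached to a Windows instance. It never deletes anything (first law), revokes only the world-open CIDR entries covering the port and re-adds the rest of any wider range (second law), and always returns a notice for humans (third law). `plan_remediation` is a hypothetical name; the returned dicts are shaped for EC2’s `revoke_security_group_ingress` and `authorize_security_group_ingress`.

```python
V4_WORLD, V6_WORLD = "0.0.0.0/0", "::/0"
PORTED_PROTOCOLS = {"tcp", "udp", "6", "17"}


def _perm(proto, lo, hi, v4, v6):
    """Build one EC2 IpPermissions entry for the world-open CIDRs only."""
    perm = {"IpProtocol": proto}
    if lo is not None:  # protocol "-1" rules carry no ports
        perm["FromPort"], perm["ToPort"] = lo, hi
    if v4:
        perm["IpRanges"] = [{"CidrIp": V4_WORLD}]
    if v6:
        perm["Ipv6Ranges"] = [{"CidrIpv6": V6_WORLD}]
    return perm


def plan_remediation(event: dict, attached_to_windows: bool, port: int = 3389):
    """Plan the smallest fix for one AuthorizeSecurityGroupIngress CloudTrail event.

    Returns (revoke, reauthorize, notice), or (None, None, None) if nothing to do.
    """
    params = event["requestParameters"]
    group_id = params["groupId"]
    revoke, reauthorize = [], []
    for item in params.get("ipPermissions", {}).get("items", []):
        proto = str(item.get("ipProtocol"))
        if proto == "-1":
            lo = hi = None  # all traffic cannot be split, so it is revoked whole
        elif proto in PORTED_PROTOCOLS:
            lo, hi = item.get("fromPort", 0), item.get("toPort", 65535)
            if not lo <= port <= hi:
                continue
        else:
            continue
        v4 = any(r.get("cidrIp") == V4_WORLD
                 for r in item.get("ipRanges", {}).get("items", []))
        v6 = any(r.get("cidrIpv6") == V6_WORLD
                 for r in item.get("ipv6Ranges", {}).get("items", []))
        if not (v4 or v6):
            continue
        revoke.append(_perm(proto, lo, hi, v4, v6))
        # Second law: give back the rest of a wider range, minus `port`.
        if lo is not None and lo < port:
            reauthorize.append(_perm(proto, lo, port - 1, v4, v6))
        if hi is not None and port < hi:
            reauthorize.append(_perm(proto, port + 1, hi, v4, v6))
    if not (revoke and attached_to_windows):
        return None, None, None
    actor = event.get("userIdentity", {}).get("arn", "unknown principal")
    # Third law: every action is announced.
    notice = (f"Auto-remediation: removed world access to port {port} on {group_id} "
              f"(rule added by {actor}); re-added {len(reauthorize)} narrower rule(s).")
    return (
        {"GroupId": group_id, "IpPermissions": revoke},
        {"GroupId": group_id, "IpPermissions": reauthorize} if reauthorize else None,
        notice,
    )
```

The caller would pass `revoke` to `revoke_security_group_ingress(**revoke)`, then `reauthorize` (if any) to `authorize_security_group_ingress(**reauthorize)`, and send `notice` wherever the owning team will actually see it. Between those two calls the other ports in a wide range are briefly closed, which is the price of not touching anything else.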

Minimizing harm to production comes from reducing the time between the introduction of an issue and its fix. The bot should also take the smallest action that restores the environment’s security.

When leveraging auto-remediation, be aware that there may be a race condition between the IaC deploying the misconfiguration and the auto-remediation event, and the two can end up fighting over the same resource. That is why the third law is critical. A human must know and respond when this occurs to fix the underlying issue.

Unlike a mythical general intelligence, these laws are intended to be obeyed by the security engineer deploying auto-remediation. It’s the human who must decide what resources to alter and how they are altered. It’s the human who must ensure there is a path from the bot to the carbon-based lifeforms who need to know an action occurred in their environment.

Conclusion

With these safety mechanisms in place, combined with a broad communications strategy and matching rules to alert IaC authors pre-deployment, implementing these hard controls for Security Invariants can make the cloud a safer place.


  1. I’m not sure why the command requires --account-id, since the calling principal has to be in the account, and there are no resource policies (that I’m aware of) that would allow cross-account access.

  2. This is a slight alteration of the original three laws I proposed in 2022’s “The allure of auto-remediation.”