PrimeHarbor | Defining Security Invariants

Note: This post has been revised to include the new capabilities released by AWS prior to re:Invent 2024.
You can also check out the re:Invent presentation we did with Securosis: “Security invariants: From enterprise chaos to cloud order” slides - video

Security Invariants are a key component of any cloud security or governance strategy. They are strong preventative or reactive controls that enforce the security state of your environment. AWS defines them as:

A security invariant is a system property that relates to the system’s ability to prevent security issues from happening. Security invariants are statements that will always hold true for your business and applications.

There are several ways to enforce Security Invariants:

Service Control Policies (SCPs) - the OG of Security Invariants - define the maximum permissions of identities in your organization.
Resource Control Policies (RCPs) - a new type of Organization Policy that defines the maximum permissions of resources in your organization.
Declarative Policies (DPs) - These policies exist outside of IAM evaluation and enforce specific controls at the AWS Service.
Permissions Boundaries - These IAM Policies don’t grant permissions but rather define the maximum permissions of the principal (IAM User or Role) to which they are attached.
Auto-Remediation - Not every invariant you want to define can be managed via IAM or Declarative controls. Some of your invariants have to be corrected immediately after being detected.

Collectively, SCPs, RCPs, and DPs are Organizational Policies because they are managed and enforced via AWS Organizations. Organizational Policies and Permissions Boundaries prevent a builder from violating the invariant in the first place, while an event-based remediation reverses a violation immediately after it occurs. Organizational Policies are the preferred method, but as previously documented, they aren’t flexible enough to cover the full list of invariants.

What are some of the Security Invariants?

When crafting the invariants, you want to write them so they are always true. “No one can attach an IGW” is an impractical invariant: you probably do need internet gateways, and what you really want is to control their usage. Saying “Only the network team can attach an IGW” instead allows you to craft a policy that will always be true without the need to remove controls when inevitable exceptions occur.

Other examples of Security Invariants might be:

Only the Network Engineering team may create a VPC, alter route tables, or attach an IGW.
Only buckets approved by the Security and Privacy team may be made public.
Only Cloud Engineering, coming from the Atlanta office or VPN, may log in as root.
Only Procurement may subscribe to or accept an offer in AWS Marketplace.
Port 3389 on a Windows machine may never be exposed to the world.
AMIs and Snapshots may never be shared with all AWS accounts (Optional Clause: “unless shared from the product publishing AWS account”).
Only Cloud Engineering can enable new opt-in regions (after ensuring GRC sign-off and the implementation of appropriate security telemetry and governance controls).
All new S3 Buckets must be created with ACLs disabled.
Only approved External AWS Accounts may assume into our organization.

Some statements are very explicit. Unless I’m building a Honeypot, there is no reason to expose 3389 to 0.0.0.0/0. The AMI and Snapshot example above may be a bit too restrictive; some organizations might require the ability to share or publish an image. Thus, we append “unless shared from the product publishing AWS account.”

Let’s take a few examples of invariants and see how we can implement them.

Service Control Policies

Root Usage

Invariant: “Only Cloud Engineering, coming from the Atlanta office or VPN, may log in as root.”

In November 2024, AWS released Centralized Root Management, a capability that allows you to delete the root credentials for member accounts in your organization. However, those credentials can be re-created, and principals with access to the Organizational Management Account (aka Payer) can call sts:AssumeRoot. So, protecting root usage behind an invariant such as this is still necessary.

Now, if I were completely pedantic, I’d have to point out that you cannot prevent a root login; you can just deny the root user access after logging in. But if I put on my threat detection hat, that’s not the end of the world because if a threat actor does log in as root and cannot do anything, that’s an excellent data point to start an incident investigation.

Implementation of this invariant is pretty straightforward:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootUsage",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": ["arn:aws:iam::*:root"]
        },
        "NotIpAddress": {
          "aws:SourceIp": ["List","of","approved","CIDRs"]
        }
      }
    }
  ]
}

Apply this to the root OU of your AWS organization, and it will protect all the accounts except the organizational management account, because SCPs do not apply to the management account.
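To make the evaluation logic concrete, here is a minimal Python sketch of when the DenyRootUsage statement matches. It is a simplified model, not the IAM policy engine: `fnmatch` only approximates `StringLike`, and the CIDR ranges and the `root_request_denied` helper are hypothetical placeholders of my own.

```python
import fnmatch
import ipaddress

# Hypothetical placeholder ranges; substitute the office and VPN CIDRs.
APPROVED_CIDRS = ["203.0.113.0/24", "198.51.100.0/24"]


def root_request_denied(principal_arn: str, source_ip: str) -> bool:
    """Return True when the DenyRootUsage statement would match the request."""
    # StringLike on aws:PrincipalArn (approximated with shell-style matching).
    is_root = fnmatch.fnmatchcase(principal_arn, "arn:aws:iam::*:root")
    # NotIpAddress on aws:SourceIp.
    ip = ipaddress.ip_address(source_ip)
    from_approved = any(ip in ipaddress.ip_network(c) for c in APPROVED_CIDRS)
    # Every condition operator in a statement must match for the Deny to apply.
    return is_root and not from_approved


for arn, ip in [
    ("arn:aws:iam::111122223333:root", "192.0.2.10"),
    ("arn:aws:iam::111122223333:root", "203.0.113.5"),
    ("arn:aws:iam::111122223333:role/Admin", "192.0.2.10"),
]:
    print(arn, ip, "denied" if root_request_denied(arn, ip) else "not denied by this SCP")
```

Only the first request is denied: root from an unapproved address. Root from the office passes, and non-root principals are untouched by this statement.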

Marketplace

Invariant: “Only Procurement may subscribe to or accept an offer in AWS Marketplace.”

This is an SCP I implemented at a previous organization after a development team spent a ridiculous amount of money because they wanted to use a different flavor of MQ than the engineering team’s standard.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MarketplaceWriteActions",
      "Effect": "Deny",
      "Action": [
        "aws-marketplace:AcceptAgreementApprovalRequest",
        "aws-marketplace:CancelAgreementRequest",
        "aws-marketplace:RejectAgreementApprovalRequest",
        "aws-marketplace:Subscribe",
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:UpdateAgreementApprovalRequest",
        "aws-marketplace:CreatePrivateMarketplace",
        "aws-marketplace:StartPrivateMarketplace",
        "aws-marketplace:StopPrivateMarketplace",
        "aws-marketplace:DescribePrivateMarketplaceStatus",
        "aws-marketplace:AssociateProductsWithPrivateMarketplace",
        "aws-marketplace:DisassociateProductsFromPrivateMarketplace",
        "aws-marketplace:ListPrivateMarketplaceProducts",
        "aws-marketplace:DescribePrivateMarketplaceProducts",
        "aws-marketplace:ListPrivateMarketplaceRequests",
        "aws-marketplace:DescribePrivateMarketplaceRequests",
        "aws-marketplace:UpdatePrivateMarketplaceSettings",
        "aws-marketplace:DescribePrivateMarketplaceSettings",
        "aws-marketplace:CreatePrivateMarketplaceProfile",
        "aws-marketplace:UpdatePrivateMarketplaceProfile",
        "aws-marketplace:DescribePrivateMarketplaceProfile"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_*",
            "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Procurement_*"
          ]
        }
      }
    }
  ]
}

In this example, the condition excludes the AWS Identity Center roles dedicated to the procurement team and to Cloud Engineering. Using wildcards in place of the AWS Account ID and the AWS SSO role suffix allows the same SCP to apply to all accounts. Marketplace is one of those services that will probably be managed via ClickOps, so using Identity Center roles makes sense.
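The wildcard matching can be sanity-checked offline before the SCP is attached. The sketch below assumes `fnmatch` is a close enough stand-in for `StringNotLike`; the account IDs, role suffixes, and the `marketplace_denied` helper are made up for illustration.

```python
import fnmatch

# Patterns copied from the SCP's StringNotLike condition.
EXEMPT_PATTERNS = [
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_*",
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Procurement_*",
]


def marketplace_denied(principal_arn: str) -> bool:
    """A negated operator with several values matches only if NONE of them match."""
    return not any(fnmatch.fnmatchcase(principal_arn, p) for p in EXEMPT_PATTERNS)


# Hypothetical role ARNs in two different accounts.
samples = [
    "arn:aws:iam::111122223333:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Procurement_0123456789abcdef",
    "arn:aws:iam::444455556666:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Developer_fedcba9876543210",
    "arn:aws:iam::444455556666:role/aws-reserved/sso.amazonaws.com/us-west-2/AWSReservedSSO_Procurement_0123456789abcdef",
]
for arn in samples:
    print("DENY " if marketplace_denied(arn) else "ALLOW", arn)
```

The third sample is worth noting: Identity Center role paths can include a Region segment depending on the home Region, and the pattern as written would not match such a role. Verify the pattern against a real role ARN from your environment before relying on it.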

Resource Control Policies

Resource Control Policies apply to the resources in your organization and can control any principal, even those outside of your control. Only a few services support RCPs at this time: S3, KMS, SQS, Secrets Manager, and IAM Roles (via STS).

Public Buckets

Invariant: “Only buckets approved by the Security and Privacy team may be made public.”

There are several ways this could be done. A previous version of this invariant leveraged S3 Block Public Access and SCPs. However, leveraging the new RCPs may be the best way to govern public buckets while centrally managing exceptions to the “no public buckets” rule.

This RCP defines a list of approved public buckets while blocking access from outside the organization to all others:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3DataPerimeterWithApprovedExceptions",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "NotResource": [
                "arn:aws:s3:::my-public-bucket/*"
            ],
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:PrincipalOrgID": "MY_ORG_ID"
                },
                "BoolIfExists": {
                    "aws:PrincipalIsAWSService": "false"
                }
            }
        }
    ]
}

The RCP denies any principal outside the organization unless it’s an AWS service or the resource matches one of the NotResource ARNs. We use the object ARN notation (my-public-bucket/*) here so that only the objects are exempted. Bucket-level actions remain denied, which ensures that even public buckets cannot be misconfigured to the point that outsiders can alter the bucket itself.
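The interaction between NotResource and the two IfExists conditions is easy to misread, so here is a small Python model of the statement. It is a sketch under simplifying assumptions: `fnmatch` stands in for ARN matching, `None` represents a condition key that is absent from the request, and `ORG_ID` and `rcp_denies` are hypothetical names.

```python
import fnmatch
from typing import Optional

ORG_ID = "o-exampleorgid"  # hypothetical organization ID
APPROVED_PUBLIC = ["arn:aws:s3:::my-public-bucket/*"]  # the NotResource list


def rcp_denies(resource_arn: str,
               principal_org_id: Optional[str],
               is_aws_service: Optional[bool]) -> bool:
    """Return True when the S3DataPerimeterWithApprovedExceptions Deny matches."""
    # NotResource: the statement covers everything EXCEPT the listed ARNs.
    in_scope = not any(fnmatch.fnmatchcase(resource_arn, p) for p in APPROVED_PUBLIC)
    # StringNotEqualsIfExists: matches when the key is absent or differs.
    outside_org = principal_org_id is None or principal_org_id != ORG_ID
    # BoolIfExists "false": matches when the key is absent or false.
    not_a_service = is_aws_service is not True
    return in_scope and outside_org and not_a_service


# Anonymous read of an object in the approved bucket: not denied by the RCP.
print(rcp_denies("arn:aws:s3:::my-public-bucket/index.html", None, None))
# Anonymous call against the approved bucket itself: still denied.
print(rcp_denies("arn:aws:s3:::my-public-bucket", None, None))
# Principal inside the organization touching any other bucket: not denied.
print(rcp_denies("arn:aws:s3:::internal-data/report.csv", ORG_ID, False))
```

The second call shows why the exception is written with the `/*` suffix: the bucket ARN without a key does not match the pattern, so bucket-level actions from outsiders stay blocked.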

KMS Key Ransomware protection

Another ransomware protection involves restricting how KMS can be used.

Invariant: “Only the Cloud Engineering Team may delete a custom key store or imported key material, and the deletion must be done from the office.”

This can be implemented by either an SCP or an RCP. As an RCP, it would look like:

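{
    "Version":"2012-10-17",
    "Statement":[
       {
          "Sid":"DenyKeyDeletionUnlessCloudEngineering",
          "Effect":"Deny",
          "Principal":"*",
          "Action":[
             "kms:DeleteCustomKeyStore",
             "kms:DeleteImportedKeyMaterial"
          ],
          "Resource":"*",
          "Condition": {
            "ArnNotLike": {
               "aws:PrincipalArn": "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_*"
            }
          }
       },
       {
          "Sid":"DenyKeyDeletionOutsideOffice",
          "Effect":"Deny",
          "Principal":"*",
          "Action":[
             "kms:DeleteCustomKeyStore",
             "kms:DeleteImportedKeyMaterial"
          ],
          "Resource":"*",
          "Condition": {
            "NotIpAddress": {
               "aws:SourceIp": [ "List", "of", "approved", "CIDRs" ]
            }
          }
       }
    ]
 }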
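Because this invariant has two requirements (who and from where), it helps to write it down as a truth table and compare any candidate policy against it. Keep in mind that condition operators inside a single statement are ANDed, while separate Deny statements each apply on their own. The sketch below encodes only the invariant as stated; the office CIDR, role suffix, and function name are hypothetical.

```python
import fnmatch
import ipaddress

CE_ROLE = "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_*"
OFFICE_CIDRS = ["203.0.113.0/24"]  # hypothetical office range


def deletion_permitted(principal_arn: str, source_ip: str) -> bool:
    """The invariant: Cloud Engineering AND from the office, nothing less."""
    is_cloud_eng = fnmatch.fnmatchcase(principal_arn, CE_ROLE)
    ip = ipaddress.ip_address(source_ip)
    in_office = any(ip in ipaddress.ip_network(c) for c in OFFICE_CIDRS)
    return is_cloud_eng and in_office


CE = "arn:aws:iam::111122223333:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_CloudEngineering_0123456789abcdef"
DEV = "arn:aws:iam::111122223333:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Developer_0123456789abcdef"

for who, arn in (("CloudEngineering", CE), ("Developer", DEV)):
    for where, ip in (("office", "203.0.113.7"), ("elsewhere", "192.0.2.44")):
        verdict = "allow" if deletion_permitted(arn, ip) else "deny"
        print(f"{who:17} {where:10} {verdict}")
```

Only the Cloud Engineering role calling from the office should be allowed; the other three combinations must be denied by whatever policy implements the invariant.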

Declarative Policies

Public Resources

Invariant: “AMIs and Snapshots may never be shared with all AWS accounts.”

Previous methods for enforcing this invariant required setting Block Public Access in each account and region and then applying an SCP to prevent anyone from turning off those declarative controls. With Declarative Policies, we can enforce this across the entire organization.

{
  "ec2_attributes": {
    "exception_message": {
      "@@assign": "Sharing of AMIs is denied by Organizational Policy"
    },
    "image_block_public_access": {
      "state": {
        "@@assign": "block_new_sharing"
      }
    },
    "snapshot_block_public_access": {
      "state": {
        "@@assign": "block_all_sharing"
      }
    }
  }
}

If your business involves shipping software to customers as an AMI, you can add an optional clause to the invariant: “unless shared from the product publishing AWS account”. Unlike authorization policies like SCPs and RCPs, these Management Policies can override directives at a higher level in the organization. That means we can apply the above policy to the root OU and apply this policy directly to a dedicated AMI sharing account:

{
  "ec2_attributes": {
    "image_block_public_access": {
      "state": {
        "@@assign": "unblocked"
      }
    },
    "snapshot_block_public_access": {
      "state": {
        "@@assign": "unblocked"
      }
    }
  }
}
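To illustrate how the account-level policy overrides the root-level one, here is a deliberately simplified Python model of the merge. It assumes a child `@@assign` simply replaces the parent's value and ignores the other inheritance operators and any restrictions a parent can place on child policies; `effective_policy` is a hypothetical helper, not an AWS API.

```python
import json

ROOT_POLICY = {
    "ec2_attributes": {
        "exception_message": {"@@assign": "Sharing of AMIs is denied by Organizational Policy"},
        "image_block_public_access": {"state": {"@@assign": "block_new_sharing"}},
        "snapshot_block_public_access": {"state": {"@@assign": "block_all_sharing"}},
    }
}

AMI_ACCOUNT_POLICY = {
    "ec2_attributes": {
        "image_block_public_access": {"state": {"@@assign": "unblocked"}},
        "snapshot_block_public_access": {"state": {"@@assign": "unblocked"}},
    }
}


def effective_policy(parent: dict, child: dict) -> dict:
    """Merge child into parent; values set by the child win over inherited ones."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = effective_policy(merged[key], value)
        else:
            merged[key] = value  # includes "@@assign" leaves
    return merged


print(json.dumps(effective_policy(ROOT_POLICY, AMI_ACCOUNT_POLICY), indent=2))
```

In this model the AMI sharing account ends up with both states set to `unblocked` while still inheriting the exception message, and every other account keeps the root-level blocks.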

Permissions Boundaries

Permissions Boundaries, like SCPs, define the maximum permissions of an IAM User or Role. However, where SCPs are managed by AWS Organizations and apply to entire accounts, Permissions Boundaries are policies inside an AWS Account and apply to principals individually. You can read more about how these can fit into your security invariant strategy in our blog post, Implementing Security Invariants in an AWS Management Account.

Auto Remediation

Auto Remediation generates some operational risk. The longer a misconfiguration exists, the more likely its functionality is required. Any auto-remediation should occur nearly instantly. Therefore, only event-based (not scheduled) remediations should be considered. Taking inspiration from Isaac Asimov, here are Farris’s Three Laws¹ of Auto Remediation:

A bot must never harm stateful data or allow stateful data to come to harm.
A bot must act with utmost haste so functionality doesn’t become dependent on a misconfiguration.
A bot must announce its existence and tell a carbon-based life form what it did and why.

Minimizing harm to production means reducing the time between an issue’s introduction and its fix. The bot should also take the least action necessary to secure the environment.

When leveraging auto-remediation, realize there may be a race condition between the IaC deploying the misconfiguration and the auto-remediation event. That is why rule #3 is critical. A human must know and respond when this occurs to fix the underlying issue.

Unlike mythical general intelligence, these laws are intended to be obeyed by the security engineer deploying auto-remediation. The human must decide what resources to alter and how they are altered. The human must ensure there is a path from the bot to the carbon-based lifeforms who need to know an action occurred in their environment.
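As a rough illustration of how the three laws might shape a remediation handler, here is a minimal Python skeleton. Everything in it is hypothetical: the `Finding` shape, the `fix` and `notify` callables, and the resource IDs are stand-ins for whatever your event pipeline and chat or ticketing integration provide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    resource_id: str
    description: str
    stateful: bool  # True when the fix would touch data (volumes, buckets, databases)


def remediate(finding: Finding,
              fix: Callable[[str], str],
              notify: Callable[[str], None]) -> bool:
    """Handle one detection event; return True if the bot changed anything."""
    # Law 1: never put stateful data at risk. Escalate to a human instead.
    if finding.stateful:
        notify(f"NOT remediated {finding.resource_id}: {finding.description} "
               "(stateful resource, needs a human)")
        return False
    # Law 2: act now, inside the event handler, not on a schedule.
    action_taken = fix(finding.resource_id)
    # Law 3: tell a carbon-based life form what was done and why.
    notify(f"Remediated {finding.resource_id}: {action_taken}. "
           f"Reason: {finding.description}")
    return True


messages = []
remediate(
    Finding("sg-0123456789abcdef0", "3389 open to 0.0.0.0/0", stateful=False),
    fix=lambda resource_id: "revoked the offending ingress rule",
    notify=messages.append,
)
print(messages[0])
```

The `fix` callable is where the "least amount of action" decision lives: revoke one rule rather than delete the security group, for example.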

Ransomware Protection

Invariant: “Port 3389 on a Windows machine may never be exposed to the world.”

Based on the invariant as written, the following must all be true:

Ingress port is 3389, or a range that contains 3389
Source IP must be 0.0.0.0/0
The EC2 Instance has a public IP address
The security group must be attached to a Windows-based EC2 instance.
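The four conditions translate directly into a predicate. The sketch below assumes the rule and instance are dictionaries shaped like the EC2 `IpPermissions` and instance descriptions returned by the API; the function names are mine, and I also treat `::/0` as "the world" even though the list above only names `0.0.0.0/0`.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}  # ::/0 is an added assumption
RDP_PORT = 3389


def rule_exposes_rdp(rule: dict) -> bool:
    """True if an ingress rule opens 3389 (alone or inside a range) to the world."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # all protocols, all ports
        covers_port = True
    elif protocol in ("tcp", "udp"):
        covers_port = rule.get("FromPort", 0) <= RDP_PORT <= rule.get("ToPort", 65535)
    else:
        covers_port = False
    sources = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
    sources += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
    return covers_port and any(s in WORLD_CIDRS for s in sources)


def violates_invariant(rule: dict, instance: dict) -> bool:
    """All four conditions must hold for the invariant to be violated."""
    is_windows = str(instance.get("Platform", "")).lower() == "windows"
    has_public_ip = bool(instance.get("PublicIpAddress"))
    return rule_exposes_rdp(rule) and is_windows and has_public_ip


rule = {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 4000,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
windows_box = {"Platform": "windows", "PublicIpAddress": "198.51.100.20"}
linux_box = {"PublicIpAddress": "198.51.100.21"}
print(violates_invariant(rule, windows_box), violates_invariant(rule, linux_box))
```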

This invariant references 3389, but several ports fall into the RDP (ransomware deployment protocol) category. No condition key on ec2:AuthorizeSecurityGroupIngress can filter for all of the above, so an SCP cannot express this invariant and auto-remediation is the best option. As I mentioned above, these invariants should be enforced in an event-based manner. The invariant can be violated in two ways:

The security group that’s already attached to an instance is modified to expose 3389
An existing security group with 3389 already exposed is now attached to a Windows instance.

The following Cloud Custodian policy would remove 3389 or 445 permissions if detected.

policies:
  - name: ops-access-via
    resource: aws.security-group
    filters:
      # Top-level filters are ANDed, so wrap the IPv4 and IPv6 checks in an
      # "or" block to match a group that is open over either address family.
      - or:
          - type: ingress
            IpProtocol: "-1"
            Ports: [445, 3389]
            Cidr: "0.0.0.0/0"
          - type: ingress
            Ports: [445, 3389]
            CidrV6:
              value: "::/0"
    actions:
      - type: set-permissions
        # remove the permission matched by a previous ingress filter.
        remove-ingress: matched

However, the above would remediate Windows and Linux machines, potentially altering a security group attached to a Linux instance with port 0-65535 open.

Safely leveraging these controls

Preventing security risk comes at the expense of increasing operational risk. Getting organizational buy-in to restrict what builders can do, or to alter their environment, is difficult. Therefore, the security team must carefully consider which security invariants to enforce and how to implement them.

With Service Control Policies, it is possible to review CloudTrail to determine who may be calling the specific actions that will be denied. Resource Control Policies are more complicated to preview because you’ll need to enable very expensive CloudTrail data events and review the vast amount of data produced. For some RCPs, you can probably use IAM Access Analyzer to determine what resources would be impacted, then either exclude them or conduct a more focused impact analysis.
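A CloudTrail review for an SCP can be as simple as counting who calls the actions you plan to deny. The sketch below works on already-exported CloudTrail records (a list of dictionaries). `eventSource`, `eventName`, and `userIdentity.arn` are standard record fields; the Marketplace event source value and the sample records are assumptions for illustration, so confirm the value against your own trail.

```python
from collections import Counter
from typing import Iterable

# Assumed event source for AWS Marketplace; verify against real records.
MARKETPLACE_SOURCE = "aws-marketplace.amazonaws.com"
ACTIONS_TO_DENY = {"Subscribe", "Unsubscribe", "AcceptAgreementApprovalRequest"}


def callers_of(events: Iterable[dict], event_source: str, event_names: set) -> Counter:
    """Count calls to the soon-to-be-denied actions, grouped by principal ARN."""
    counts = Counter()
    for event in events:
        if event.get("eventSource") == event_source and event.get("eventName") in event_names:
            arn = event.get("userIdentity", {}).get("arn", "unknown")
            counts[arn] += 1
    return counts


# Hypothetical records standing in for an exported CloudTrail log.
sample_events = [
    {"eventSource": MARKETPLACE_SOURCE, "eventName": "Subscribe",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/dev-team/alice"}},
    {"eventSource": MARKETPLACE_SOURCE, "eventName": "Subscribe",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/dev-team/alice"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "CreateBucket",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/dev-team/bob"}},
]

for arn, count in callers_of(sample_events, MARKETPLACE_SOURCE, ACTIONS_TO_DENY).most_common():
    print(count, arn)
```

The resulting list of principals is the audience for the communications plan, and any names that are not Procurement are the teams to talk to before the SCP goes live.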

As of this writing, only a few Declarative Policies are available. You can determine the impact with a simple CSPM review for public images and snapshots. For IMDSv2 enforcement, you can review the CloudWatch metric MetadataNoToken to see how many instance metadata calls still use IMDSv1 (requests made without a session token).
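For the IMDSv2 case, a short script can total the MetadataNoToken metric per instance. This is a sketch: the function takes a CloudWatch client as a parameter (for example, one created with `boto3.client("cloudwatch")`), and the 14-day window and function name are arbitrary choices of mine.

```python
from datetime import datetime, timedelta, timezone


def imdsv1_call_count(cloudwatch, instance_id: str, days: int = 14) -> int:
    """Sum the MetadataNoToken metric (token-less IMDS calls) for one instance."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="MetadataNoToken",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    return int(sum(point["Sum"] for point in response["Datapoints"]))
```

An instance that returns a non-zero count is still making IMDSv1 calls and would likely break if IMDSv2 were enforced, so those instances are the ones to fix or temporarily exempt first.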

All of these would allow for a communications plan or potentially even a temporary exception to the invariant if a team needs time to adjust. If a company’s AWS Organizations OU structure is implemented with production and non-production OUs, consider applying the SCPs to the non-production OUs for a bake-in period before applying them to Production OUs.

Conclusion

With these safety mechanisms in place, combined with a broad communications strategy and matching rules to alert IaC authors pre-deployment, implementing these hard controls for Security Invariants can make the cloud a safer place.

The PrimeHarbor aws-organizational-policies repo contains a more comprehensive collection of security and governance invariants and sample policies. If you need help defining and deploying your own invariants, contact us. Cloud Governance is our happy place, and we’re always keen to help AWS customers be more secure.

¹ This is a slight alteration of the original three laws I proposed in 2022’s The allure of auto-remediation.