Cloud Hardening as a Proactive Defense Against Adversarial AI

Cloud architecture · field guide · 15 min read · Updated May 2026
Talk delivered at the AWS Meetup, May 2026 · Recording coming soon

A walk-through of a live AWS breach driven by an LLM, the four shifts that made it possible, and the three architectural gaps it exploits — tenancy, perimeter, and blast radius. Then the AWS-native primitives that close them, mapped to a five-phase ladder you can climb on Monday morning.

The 90-second adversarial AI playbook

It's a Tuesday in 2026. An attacker has 90 seconds.

A stolen access key falls into the hands of an LLM-driven agent. Not a human red-teamer working a week-long engagement — an autonomous loop that calls AWS APIs, reads the responses, decides what to do next, and never gets tired.

  1. 0:00 · Stolen access key dropped into the agent.
  2. 0:15 · Agent enumerates IAM roles, parses every trust policy.
  3. 0:35 · Identifies a cross-account sts:AssumeRole into a sibling production account.
  4. 0:55 · Generates the assume-role chain. Tests it.
  5. 1:20 · Lateral move complete. Reads customer data from S3 in the production account.
  6. 1:30 · GuardDuty hasn't fired yet. Five AWS API calls. No exploit.

This isn't science fiction. The cast below is the actual recording — real AWS accounts, real IAM, no exploit. Open-source pentest frameworks paired with frontier models can do this work at this pace today.

The same attack, recorded

Cross-account privilege escalation. ~30 seconds. Zero exploits.

Played at 0.7× speed so you can follow along. The agent enumerates IAM, finds the cross-account assume-role gift, performs the lateral move, and reads customer data — all valid AWS API calls.

The weakness is the defender's misconfigured trust boundary, not malware.

What changed in 18 months

Four things broke at once.

Three of these are about who's doing the work — attacker, defender, the people inside the org. The fourth is the consequence. Detection stops keeping up.

  1. Shift 1

    Attackers got AI.

    Reconnaissance work that used to take a red team a week — enumerating IAM paths, parsing trust chains, finding misconfigurations — now happens in minutes. LLMs are excellent at parsing JSON IAM policies and chaining valid AWS API calls.

    Frontier models from Anthropic and OpenAI have demonstrated something more dangerous: stitching multiple low-severity findings into a complete kill chain. A wildcard role here, a permissive bucket policy there, an over-broad cross-account assume — individually shrugged off in a pentest report; collectively a breach. The agent doesn't get tired and doesn't lose the thread.

  2. Shift 2

    Defenders got more credentials.

    Non-human identities — service roles, automation, third-party integrations, AI agents — now outnumber human principals roughly 45 to 1 in a typical enterprise (synthesis from CyberArk and Astrix industry data). Every new agent is a new credential. Humans become a rounding error in your principal count.

    Most NHIs are over-permissioned at creation and never reviewed. Each one is a potential adversary you've already authenticated.

  3. Shift 3

    Every employee is a developer now.

    Coding assistants, AI co-pilots, and low-code AI tooling have collapsed the line between "developer" and "everyone else". Marketing builds workflows. Finance builds analytics jobs. Sales builds chatbots. Each of these creates new credentials, new IAM roles, new third-party integrations — outside the change-management process security teams built for the old developer cohort.

    The attack surface that used to live in code review now lives in a thousand small pieces of generated config that nobody reviews.

  4. Shift 4 · the consequence

    Detection economics broke.

    Mean-time-to-compromise has collapsed. Mean-time-to-detect, in most organizations, is flat — minutes-of-attack against hours-of-response. The gap between them is what attackers live in. Trying to out-detect AI-speed adversaries with human-speed response is a losing race.

    Which is why the rest of this guide isn't about better detection. It's about closing the architectural gaps that make the breach possible in the first place.

The architecture under it

Cloud has three structural problems. AI didn't create them — it weaponized them.

If you trace any modern cloud breach back through its stages, you'll find the same three architectural gaps showing up — alone or in combination. The cast you just watched touched all three.

Problem 1

Tenancy

Where is the trust boundary? Cloud accounts are a thin form of isolation — a developer's personal sandbox and an attacker's account look identical to AWS unless you tell it otherwise.

Problem 2

Perimeter

Network perimeters protect compute. They don't protect the cloud control plane. Cloud APIs are public endpoints; valid credentials work from anywhere.

Problem 3

Blast Radius

What can a compromised identity actually do? AI moves this from an external-attacker problem to an insider one — every NHI is a principal you've already authenticated.

Problem 1 · Tenancy

Where is your trust boundary?

In an on-prem world, tenancy is straightforward — you own the building. Networks isolate by physical reality. In the cloud, tenancy is logical, multi-layered, and almost entirely up to you to enforce.

AWS sees only a 12-digit account number. Whether that account belongs to a sibling team, an attacker, or a developer's weekend sandbox, it looks structurally identical at the API layer. Without explicit organization boundaries (SCPs, RCPs, aws:PrincipalOrgID conditions on assume-role trust policies), you treat them all the same.

Three accounts. Same shape.

Attacker · acct 111111111111
12-digit account · IAM · S3 · etc.
Different org. Different ownership. Identical primitives.

Your company · acct 222222222222
12-digit account · IAM · S3 · etc.
Inside your org. Same primitives.

Developer's personal · acct 333333333333
12-digit account · IAM · S3 · etc.
Same employee. Personal billing. Same primitives.

Structurally identical. AWS sees a 12-digit account number. Whether that's an attacker, a sibling team, or a dev's weekend project is up to you to enforce — through SCPs, RCPs, and explicit org-boundary conditions on every cross-account trust.
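
One way to make that org boundary explicit is a Resource Control Policy that denies calls from principals outside your organization. The sketch below only builds such a policy as a Python dict and prints it. The function name `org_boundary_rcp`, the placeholder organization ID `o-exampleorgid`, and the choice of actions are assumptions for illustration, not something shown in the talk. The shape follows AWS's published data-perimeter samples, so compare it against those before attaching anything.

```python
import json


def org_boundary_rcp(org_id: str) -> dict:
    """Build an RCP that denies non-org principals on a few sensitive actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceOrgIdentities",
                "Effect": "Deny",
                "Principal": "*",
                # Assumption: a deliberately short action list for the sketch.
                "Action": ["sts:AssumeRole", "s3:*"],
                "Resource": "*",
                "Condition": {
                    # Deny when the caller's org ID is present and is not ours.
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    # Leave AWS service principals out of the deny.
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


policy = org_boundary_rcp("o-exampleorgid")  # hypothetical organization ID
print(json.dumps(policy, indent=2))
```

Note the limit of this control: it separates your organization from everyone else. It does nothing about over-broad trust between two accounts inside the same organization, which still has to be narrowed in each role's trust policy.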

Problem 2 · Perimeter

Cloud APIs are public endpoints. Your network perimeter doesn't apply.

When teams move to the cloud, they bring on-prem network thinking with them. They build VPCs. They split public and private subnets. They put WAFs in front of load balancers. All of that protects compute — the workload network surface. It does nothing for the cloud control plane, where most of the data actually lives.

Same internet hits both surfaces. Only one has a firewall.

It helps to draw the asymmetry explicitly. Your AWS estate has two halves of network attack surface, both reachable from the same public internet:

Public internet · no perimeter
Attacker laptop · Dev personal AWS · Engineer's coffee-shop wifi

↓ To the AWS-managed surface: valid creds → API call → straight in
↓ To your VPC: blocked by firewall · WAF · NACLs

AWS-managed · no customer perimeter
Control plane & PaaS
Control plane: sts · iam · organizations · sso · cloudtrail
PaaS: s3 · dynamodb · lambda · sqs · sns · kms · secretsmanager · bedrock
Where most of your data actually lives. Reached via credentials, not via your network.

Customer-managed · your perimeter
IaaS · your VPC
Network: security groups · NACLs · WAF · ALB · NAT · firewall
Compute: EC2 · ECS · EKS · Lambda-in-VPC
Where you spent most of the network-security budget. The smaller of the two surfaces.

Same threats, two paths. An attacker with valid creds never meets the customer-managed side at all: there's nothing to bypass, because your firewall doesn't sit on the path. A compromised EC2 instance inside your VPC reaches out via the same AWS-managed endpoints, and your perimeter sees nothing. In either direction, the AWS-managed surface is where the data sits, and it has no perimeter you control unless you build one.

A token works equally well from your prod VPC, your developer laptop, or an attacker's coffee shop. This is the credential portability problem, and it's why a leaked key from a CI runner is a breach the moment it's leaked, not the moment it's used.
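
A minimal sketch of what building that missing perimeter can look like: an SCP that denies requests arriving from anywhere other than expected networks. The function name, the CIDR `203.0.113.0/24` (a documentation range), and the VPC ID are hypothetical. The condition keys `aws:SourceIp`, `aws:SourceVpc`, and `aws:ViaAWSService` are real IAM global condition keys, but production policies normally carry extra exceptions that are left out here.

```python
import json


def network_perimeter_scp(expected_cidrs, expected_vpc_ids) -> dict:
    """Build an SCP that denies calls made from outside expected networks."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceNetworkPerimeter",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # All three conditions must hold for the deny to apply.
                "Condition": {
                    # Source IP, when present, is outside the expected CIDRs.
                    "NotIpAddressIfExists": {"aws:SourceIp": list(expected_cidrs)},
                    # Source VPC, when present, is not one of ours.
                    "StringNotEqualsIfExists": {"aws:SourceVpc": list(expected_vpc_ids)},
                    # The call is not an AWS service acting on our behalf.
                    "BoolIfExists": {"aws:ViaAWSService": "false"},
                },
            }
        ],
    }


scp = network_perimeter_scp(
    expected_cidrs=["203.0.113.0/24"],           # hypothetical corporate egress range
    expected_vpc_ids=["vpc-0123456789abcdef0"],  # hypothetical VPC ID
)
print(json.dumps(scp, indent=2))
```

With a policy of this shape in force, a key replayed from a coffee-shop network matches the deny, while the same key used through your VPC endpoints does not. Because the statement covers every action, it is worth trialling on a sandbox OU first.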

Problem 3 · Blast Radius

AI made this an insider problem.

Pre-AI, cloud security focused on external attackers. Detection was about catching the unauthenticated outsider. Once an attacker landed, blast radius was a secondary concern — most NHIs lived inside trusted boundaries and got the benefit of the doubt.

Post-AI, your NHI count exploded — every AI agent, every automation pipeline, every third-party integration is a new credential. Each one is a potential adversary you've already authenticated. The threat moved inside the perimeter without anyone deciding to let it in.

Which means the question shifts from "how do we keep attackers out?" to "what damage can each principal actually do, and how do we structurally cap that?"

Plus the configuration tax

350+ services. 17,000+ IAM actions. 1,500+ resource types.

"Most cloud breaches aren't exploits.
They're misconfigurations."

AI just makes finding them faster. The AWS API surface keeps growing: several policy types, conditions that interact in non-obvious ways, a steady drip of new permissions every quarter. You don't need a CVE to compromise an AWS account; you need an over-broad role, a wildcard, and patience.

Tracked at aws.permissions.cloud

Compromised credential, end to end

How four configuration gaps become one breach.

Each step in the cast you watched maps to one of the architectural problems above. The attack isn't a single mistake — it's the chain.

Step 1
Credential exfiltrated

An EC2 instance with an IAM role is compromised — SSRF, malicious dependency, leaked GitHub key. The session token leaves the environment.

Step 2 · Perimeter
Credential is portable

The attacker calls the AWS API from a laptop on a coffee-shop network. AWS accepts — there's no condition on source network, source VPC, or source IP. The control plane treats every signed request the same.

Step 3 · Blast Radius
Privilege escalation

The compromised role has iam:PassRole and sts:AssumeRole. The attacker chains into a more privileged role meant for the deploy pipeline. Now they're effectively admin.
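
A hedged sketch of one way to cap this step: an SCP that denies a handful of IAM actions commonly used for privilege escalation unless the caller is a designated IAM admin role. The role name `iam-admin` and the function name are hypothetical, and the action list is a small illustrative subset rather than a complete catalogue.

```python
import json

# Illustrative subset of IAM actions that let a principal widen its own access.
ESCALATION_ACTIONS = [
    "iam:AttachRolePolicy",
    "iam:PutRolePolicy",
    "iam:CreatePolicyVersion",
    "iam:SetDefaultPolicyVersion",
    "iam:UpdateAssumeRolePolicy",
    "iam:AttachUserPolicy",
    "iam:PutUserPolicy",
    "iam:CreateAccessKey",
]


def deny_escalation_scp(admin_role_name: str) -> dict:
    """Build an SCP that reserves escalation-prone IAM actions for one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPrivilegeEscalationPrimitives",
                "Effect": "Deny",
                "Action": ESCALATION_ACTIONS,
                "Resource": "*",
                "Condition": {
                    # The deny applies to every caller except the admin role.
                    "ArnNotLike": {
                        "aws:PrincipalArn": f"arn:aws:iam::*:role/{admin_role_name}"
                    }
                },
            }
        ],
    }


scp = deny_escalation_scp("iam-admin")  # hypothetical role name
print(json.dumps(scp, indent=2))
```

iam:PassRole is left out on purpose: many ordinary workflows need it, so it is usually constrained per role with resource and condition scoping rather than denied across the organization.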

Step 4 · Tenancy
Cross-account data access

The privileged role can assume a role in a sibling production account — no organization-level boundary blocks it. The attacker reads customer data from S3 in prod. They never touched prod's perimeter directly; tenancy never enforced one.

Step 5
GuardDuty fires. Too late.

Detection eventually catches the anomalous behavior. The data is already gone. The clean-up is a six-figure incident response bill plus regulatory disclosure.

Three of the four attack steps are architectural. None of them requires an exploit. All they need is the absence of configuration that would have made them impossible.

The trust gap, named

Six dimensions of trust. None bounded by AWS by default.

Every AWS API call asks three trust questions — about the identity making the call, the resource being touched, and the network the call comes from. Each one cuts two ways. That's six concrete control objectives — what AWS calls a data perimeter. The framework names where the gap lives. Closing each cell is your job.

Identity trust
Only trusted identities can access my resources
Default: any principal in any AWS account on Earth can call your buckets, queues, KMS keys.
Only trusted identities allowed from my network
Default: a VPC endpoint accepts any external IAM identity that signs the request.

Resource trust
My identities access only trusted resources
Default: a leaked role can read or write any S3 bucket on Earth.
Only trusted resources accessed from my network
Default: a host inside your VPC can exfiltrate to any external bucket — invisible to network controls.

Network trust
My identities access resources only from expected networks
Default: stolen creds work from any laptop on any network.
My resources only accessed from expected networks
Default: every AWS API endpoint is on the public internet.

Framework: aws.amazon.com/identity/data-perimeters-on-aws
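
As a sketch of how two of these cells can be closed from the network side, a VPC endpoint policy can allow only requests where both the calling identity and the target resource belong to your organization. The function name and the organization ID below are placeholders. `aws:PrincipalOrgID` and `aws:ResourceOrgID` are real condition keys, though not every service populates `aws:ResourceOrgID` in the same way, so treat this as a starting shape rather than a finished policy.

```python
import json


def org_only_endpoint_policy(org_id: str) -> dict:
    """Build a VPC endpoint policy limited to org identities and org resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgIdentitiesToOrgResources",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "*",
                "Resource": "*",
                # Both keys must match: trusted identity and trusted resource.
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": org_id,
                        "aws:ResourceOrgID": org_id,
                    }
                },
            }
        ],
    }


endpoint_policy = org_only_endpoint_policy("o-exampleorgid")  # hypothetical org ID
print(json.dumps(endpoint_policy, indent=2))
```

Because an endpoint policy is an allow list, anything that does not match the statement is implicitly refused at the endpoint. That covers the "only trusted identities allowed from my network" and "only trusted resources accessed from my network" cells for traffic that actually goes through the endpoint.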

Six concrete scenarios. Watch which gate blocks each one.

Each scenario maps to one of the six trust cells above. The diagram shows you which AWS policy mechanism — SCP, RCP, or VPC endpoint policy — closes that specific cell.

Where to start · climb one phase at a time

A five-phase ladder, with example SCPs you can copy.

You don't need to ship every control at once. The ladder below is a practical sequence — most organizations have phase 1 done implicitly via Control Tower, then stall. Each row links to a directory of working example SCPs you can adapt.

| Phase | What it does | Solves | Example SCPs |
|---|---|---|---|
| 1 · Foundation | Protect security services from being disabled (CloudTrail, GuardDuty, Config) | All three | Deny-changes-to-security-services |
| 2 · Scope | Deny non-approved regions, services, public-by-default resources | Tenancy, Perimeter | Region-controls |
| 3 · Hygiene | Mandatory tags, encryption, IMDSv2, no IAM users, no long-lived keys | Tenancy | Service-specific-controls |
| 4 · Depth | Block privilege escalation primitives, restrict destructive actions | Blast Radius | Privileged-access-controls |
| 5 · Perimeter | Identity / resource / network trust boundaries (Data Perimeter) | Tenancy, Perimeter | data-perimeter-policy-examples |

Phases 1–4 link to subdirectories of aws-samples/service-control-policy-examples. Phase 5 lives in the dedicated data-perimeter-policy-examples repo.

Every phase is implementable with AWS-native primitives — Service Control Policies, Resource Control Policies (GA late 2024), VPC endpoint policies, IAM conditions. No third-party tooling required to start.
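
To give a feel for the first three rungs, here is a rough sketch that assembles one sample statement per phase into a single SCP. The function name, the region list, and the short `NotAction` list of global services are assumptions for illustration. The linked sample repositories carry fuller versions, including the exemptions a real organization needs.

```python
import json


def baseline_scp(allowed_regions) -> dict:
    """Build one SCP with a sample statement for each of phases 1 to 3."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Phase 1: keep logging and detection from being switched off.
                "Sid": "ProtectSecurityServices",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                    "guardduty:DeleteDetector",
                    "guardduty:UpdateDetector",
                    "config:StopConfigurationRecorder",
                    "config:DeleteConfigurationRecorder",
                    "config:DeleteDeliveryChannel",
                ],
                "Resource": "*",
            },
            {
                # Phase 2: deny regional services outside approved regions.
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                # Assumption: a short, incomplete list of global services.
                "NotAction": [
                    "iam:*",
                    "organizations:*",
                    "sts:*",
                    "support:*",
                    "cloudfront:*",
                    "route53:*",
                ],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            },
            {
                # Phase 3: refuse EC2 launches that still allow IMDSv1.
                "Sid": "RequireImdsV2",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {"ec2:MetadataHttpTokens": "required"}
                },
            },
        ],
    }


scp = baseline_scp(["eu-west-1", "eu-central-1"])  # hypothetical approved regions
print(json.dumps(scp, indent=2))
```

As written, the phase 1 statement also blocks your own security administrators. A real policy would usually add an `aws:PrincipalArn` exemption for the role that legitimately manages those services.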

Take-home checklist

Ten things to check tonight.

  1. Run IAM Access Analyzer org-wide. Look at the public-resource and cross-account-access findings.
  2. Audit how many IAM users exist. Plan to delete them.
  3. Audit assume-role trust policies for Principal: * or wide patterns. Add aws:PrincipalOrgID conditions.
  4. Check whether CloudTrail can be disabled by any non-admin role. Ship the SCP that says no.
  5. List every active AWS region. Deny the ones you don't use.
  6. Require IMDSv2 on EC2 launches. SSRF + IMDSv1 is exactly how Capital One happened.
  7. Block S3 buckets from being made public via SCP, even if a misconfig tries.
  8. Identify your privilege-escalation IAM actions and deny them outside your IAM admin role.
  9. Inventory every NHI. For each, ask: does it ever call AWS from outside your VPC? If no, scope it to your VPCs.
  10. Ship a Data Perimeter. Start with the AWS sample policies above.
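
The trust-policy audit in the checklist can be started offline against exported policy documents. The sketch below is one simplified way to do it, not a replacement for IAM Access Analyzer: it flags Allow statements that trust a wildcard or another account without an `aws:PrincipalOrgID` condition. The function names, account IDs, and the sample policy are all hypothetical, and the "foreign account" test is a plain substring check on the principal string.

```python
def _as_list(value):
    """IAM JSON allows a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def risky_trust_statements(trust_policy: dict, own_account_id: str) -> list:
    """Return (statement index, principal) pairs that lack an org-ID condition."""
    findings = []
    for index, stmt in enumerate(_as_list(trust_policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            aws_principals = ["*"]
        else:
            aws_principals = _as_list(principal.get("AWS", []))
        # Simplification: any StringEquals/StringLike test on the key counts.
        has_org_condition = any(
            key.lower() == "aws:principalorgid"
            for operator, block in stmt.get("Condition", {}).items()
            if operator.startswith(("StringEquals", "StringLike"))
            for key in block
        )
        for p in aws_principals:
            outside_account = p == "*" or own_account_id not in p
            if outside_account and not has_org_condition:
                findings.append((index, p))
    return findings


sample_policy = {  # hypothetical trust policy for illustration
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"AWS": "*"}, "Action": "sts:AssumeRole"},
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::333333333333:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ],
}

findings = risky_trust_statements(sample_policy, own_account_id="222222222222")
print(findings)
```

Here only the first statement is reported: the second is cross-account but pinned to the organization, and the third trusts a service principal, which this sketch does not inspect.
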

About this guide

Adapted from a talk delivered at the AWS Meetup, May 2026 — recording coming soon. Written by the team at InstaSecure. We build a guardrails platform for AWS that operationalizes the techniques in this guide — but the techniques themselves are AWS-native and vendor-neutral. Everything here is implementable without InstaSecure. If you'd like help operationalizing it across a real AWS organization, reach out.