In an AWS Security interview, structure beats memorisation: when a question stretches you, reason out loud from fundamentals instead of guessing. Use the visual cheat-sheets below to lock in the diagrams interviewers love, and note that every answer ends with a 👉 Interview tip giving the exact line to say.
Visual cheat-sheets — the whiteboard answers
Network Security & VPC (9)
L11. What is the difference between a Security Group and a Network ACL? Cover stateful vs stateless and instance vs subnet level.
Both control traffic inside a VPC, but at different layers. A Security Group (SG) is a firewall at the instance / ENI level and is stateful — if you allow inbound traffic, the return traffic is automatically allowed without a matching outbound rule. SGs are allow-only (there is no deny rule), and all rules across all SGs attached to an instance are evaluated together as a union.
A Network ACL (NACL) is a firewall at the subnet level and is stateless — return traffic is not auto-allowed, so you must add explicit inbound and outbound rules. NACLs support both allow and deny rules and are evaluated in order by rule number, lowest first, stopping at the first match.
Think of an SG as a doorman who remembers everyone he let in; a NACL is a turnstile that checks every direction independently and never remembers.
Interview tip: Lead with one line — SG = stateful, instance-level, allow-only, all-rules-evaluated and NACL = stateless, subnet-level, allow+deny, numbered/first-match. That sentence answers the whole question.
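The two evaluation styles can be sketched with a toy model. This is a minimal illustration rather than AWS's implementation: `NaclRule`, `nacl_allows`, `sg_allows`, and the sample addresses (documentation ranges) are hypothetical, and statefulness is deliberately left out so the focus stays on first-match versus union-of-allows.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str
    port: int
    allow: bool   # True = ALLOW, False = DENY


def nacl_allows(rules, src_ip, port):
    """NACL style: the first matching rule (lowest number) decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if ip_address(src_ip) in ip_network(rule.cidr) and port == rule.port:
            return rule.allow
    return False  # nothing matched: the implicit final deny


def sg_allows(security_groups, src_ip, port):
    """SG style: allow-only, any rule in any attached group permits the traffic."""
    return any(
        ip_address(src_ip) in ip_network(cidr) and port == rule_port
        for group in security_groups
        for cidr, rule_port in group
    )


# Hypothetical example data
nacl = [
    NaclRule(90, "203.0.113.0/24", 443, allow=False),  # block one range first
    NaclRule(100, "0.0.0.0/0", 443, allow=True),       # then allow everyone else
]
web_sg = [("0.0.0.0/0", 443)]
admin_sg = [("198.51.100.0/24", 22)]

print(nacl_allows(nacl, "203.0.113.7", 443))              # hits rule 90 first
print(sg_allows([web_sg, admin_sg], "198.51.100.5", 22))  # allowed by admin_sg
```

The DENY at rule 90 only works because it is numbered below the broad ALLOW; the Security Group model has no way to express that exclusion.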
L12. What is the difference between a public subnet and a private subnet, and what makes a subnet public?
A subnet is just a range of IP addresses inside your VPC — AWS has no checkbox labelled public. What makes a subnet public is its route table: a public subnet has a route like 0.0.0.0/0 -> Internet Gateway (IGW). A private subnet has no route to an IGW.
Resources in a public subnet that also have a public IP can talk directly to the internet — ideal for load balancers and bastion hosts. Private subnets hold databases and app servers that should never be directly reachable from outside.
Think of the IGW route as the only door to the street: public subnets have that door; private subnets are inner rooms.
- To be reachable from the internet a resource needs both an IGW route and a public/Elastic IP.
- A private subnet reaches the internet for outbound only, via a NAT Gateway.
Interview tip: The single deciding factor is the route table's IGW route — never the subnet name.
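As a rough sketch of that rule, the check below classifies a subnet from its route table alone. The dictionaries mimic the shape of a route table description (`Routes`, `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`); the function name and the resource IDs are hypothetical, and IPv6 default routes are ignored for brevity.

```python
def is_public_subnet(route_table):
    """True if any route sends 0.0.0.0/0 to an Internet Gateway (igw-...)."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId", "").startswith("igw-")
        for route in route_table["Routes"]
    )


# Hypothetical route tables
public_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"},
]}
private_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"},
]}

print(is_public_subnet(public_rt), is_public_subnet(private_rt))
```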
L23. Why is a Security Group allow-only while a Network ACL supports both allow and deny rules? When does that distinction matter?
Because they solve different problems. A Security Group is stateful and per-instance — it answers "who is allowed to talk to this resource?" Anything not explicitly allowed is implicitly denied, so an explicit deny would be redundant and could create contradictory rules. That is why SGs are allow-only, and why multiple SGs on an instance combine as a union of allows.
A NACL is stateless and per-subnet, acting as a coarse perimeter. To block a specific bad IP or CIDR while still allowing everyone else, you need an explicit DENY rule at a low rule number — something an SG simply cannot express.
The distinction matters when you must blacklist an attacker's IP range, enforce a subnet-wide guardrail, or add defense-in-depth that an application team cannot accidentally undo by editing an SG.
Interview tip: One line wins it — SGs whitelist; the NACL is the only place you can blacklist a CIDR.
L24. What is a NAT Gateway and why would a private subnet need one? What are the security implications?
A NAT Gateway is a managed AWS service placed in a public subnet that lets instances in private subnets make outbound connections to the internet — OS patches, package downloads, calling external APIs — while blocking all unsolicited inbound connections from the internet.
Think of it as a one-way mailbox: your private servers can send letters out and get the replies, but strangers cannot initiate contact.
Security implications:
- Benefit: servers stay unreachable from outside yet can still be patched — a strong default posture.
- Risk: egress is wide open, so a compromised host can exfiltrate data or reach command-and-control servers. Pair it with egress filtering, VPC Endpoints for AWS services, and DNS/firewall controls.
Interview tip: Say it permits outbound only — and add the senior point: it does not make a subnet "safe," because unrestricted egress is itself a real exfiltration vector.
L25. Design a secure VPC for a three-tier web application. Cover subnets, route tables, NAT, endpoints, and tiered Security Groups.
Use a multi-AZ VPC with three tiers spread across at least two Availability Zones:
- Public subnets: hold only the Application Load Balancer and the NAT Gateway. Route table: 0.0.0.0/0 -> IGW.
- Private app subnets: web/app servers. Route table: 0.0.0.0/0 -> NAT Gateway for outbound patching only.
- Private data subnets: RDS / databases. No internet route at all.
Add VPC Endpoints (Gateway for S3 and DynamoDB, Interface/PrivateLink for ECR, Secrets Manager, etc.) so traffic to AWS services stays off the internet.
Tiered Security Groups, chained by reference rather than CIDR:
- ALB-SG: allow 443 from the internet.
- App-SG: allow the app port only from ALB-SG.
- DB-SG: allow 3306/5432 only from App-SG.
The result is a layered fortress: the public can only knock on the ALB, never on the database.
Interview tip: Reference SGs by their ID, not IP ranges — the rules self-adjust as instances scale in and out.
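The chained groups can be written down as plain data. The sketch below builds rule dictionaries in the `IpPermissions` shape used by the EC2 API (`UserIdGroupPairs` for a source group, `IpRanges` for a CIDR); the group IDs, the app port 8080, and the helper names are assumptions for illustration, and nothing here calls AWS.

```python
def ingress_from_cidr(port, cidr):
    """Rule that admits a CIDR range on one TCP port."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


def ingress_from_sg(port, source_sg_id):
    """Rule that admits members of another security group on one TCP port."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_sg_id}]}


# Hypothetical group IDs and app port
TIERS = {
    "sg-alb-example": [ingress_from_cidr(443, "0.0.0.0/0")],
    "sg-app-example": [ingress_from_sg(8080, "sg-alb-example")],
    "sg-db-example": [ingress_from_sg(5432, "sg-app-example")],
}


def internet_facing_groups(tiers):
    """List the groups that accept traffic from anywhere."""
    return [
        group_id
        for group_id, rules in tiers.items()
        if any(r.get("CidrIp") == "0.0.0.0/0"
               for rule in rules for r in rule.get("IpRanges", []))
    ]


print(internet_facing_groups(TIERS))
```

A check like `internet_facing_groups` is a cheap review step: in this design only the ALB group should ever appear in its output.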
L26. What is the difference between a Gateway VPC Endpoint and an Interface (PrivateLink) endpoint, and when do you use each?
Both let your VPC reach AWS services privately, without an Internet or NAT Gateway, but they work differently.
- Gateway Endpoint: a target you add to your route table. It supports only S3 and DynamoDB, is free, has no ENI and no private IP, and cannot be reached from on-prem or peered/other-Region VPCs.
- Interface Endpoint (PrivateLink): an elastic network interface (ENI) with a private IP in your subnet, reached via DNS. It supports most AWS services plus your own and partner services, works across VPCs and from on-prem (VPN / Direct Connect), and is billed hourly plus per GB.
Analogy: a Gateway Endpoint is a signpost added to the road map; an Interface Endpoint is a private phone line plugged straight into your subnet.
Interview tip: S3/DynamoDB only and want it free, in-Region, route-based -> Gateway. Anything else, cross-account, or on-prem reachability -> Interface/PrivateLink. (S3 also offers an Interface endpoint when you need those PrivateLink features.)
L37. How would you use VPC Endpoints and egress controls to prevent data exfiltration to the public internet?
Goal: workloads can reach the AWS services they need, but cannot ship data to an attacker's server. Build it in layers:
- Remove the open door: drop the NAT/IGW route for sensitive subnets so there is no path to 0.0.0.0/0 at all.
- VPC Endpoints: reach S3, DynamoDB, ECR, etc. privately. On the endpoint, add an endpoint policy that allows only your approved buckets, so data cannot flow to arbitrary ones; on your S3 buckets, use the aws:SourceVpce / aws:SourceVpc condition keys so they accept requests only through your endpoint or VPC.
- Egress filtering: use AWS Network Firewall or Route 53 Resolver DNS Firewall to allow only an approved domain list and block known-bad destinations.
- SCPs / IAM: deny actions that would disable these guardrails (SCPs require AWS Organizations).
Think of it as sealing every exit except a few inspected, allow-listed gates.
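Interview tip: The killer control is a pairing: an endpoint policy restricted to your own buckets blocks copying data into an external account's bucket even from inside your VPC, and a bucket policy with aws:SourceVpce ensures your buckets are reachable only through that endpoint.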
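A minimal sketch of the two policies, built as Python dictionaries and printed as JSON. The bucket name and endpoint ID are placeholders, and the statements are illustrative rather than a complete production policy. The endpoint policy limits which buckets the endpoint will talk to; the bucket policy limits where requests to the bucket may come from.

```python
import json

BUCKET = "example-approved-bucket"   # hypothetical bucket name
VPCE_ID = "vpce-0example"            # hypothetical endpoint ID

# Attached to the S3 VPC endpoint: only the approved bucket is reachable.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowOnlyApprovedBucket",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

# Attached to the bucket: reject any request that did not arrive
# through the approved endpoint. Careful: this also blocks console
# access that does not traverse the endpoint.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideApprovedEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        "Condition": {"StringNotEquals": {"aws:SourceVpce": VPCE_ID}},
    }],
}

print(json.dumps(endpoint_policy, indent=2))
print(json.dumps(bucket_policy, indent=2))
```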
L18. What are VPC Flow Logs and how do they help during a network forensic investigation?
VPC Flow Logs capture metadata about IP traffic going to and from network interfaces in your VPC: they record the who, where, and whether-allowed, not the packet contents. Each record includes source/destination IP, source/destination port, protocol, byte and packet counts, start/end timestamps, and the ACCEPT or REJECT action. You can enable them at the VPC, subnet, or ENI level and publish to CloudWatch Logs, S3, or Amazon Data Firehose (formerly Kinesis Data Firehose).
Think of them as an itemised phone bill for your network: you see who called whom and for how long, but never what was said.
In a forensic investigation they help you:
- Confirm whether a suspicious external IP actually connected (ACCEPT) or was blocked (REJECT).
- Spot data exfiltration via unusually large outbound byte volumes.
- Build a timeline and trace lateral movement between instances.
Interview tip: Stress that Flow Logs are metadata only — no payload; for full packet capture you use VPC Traffic Mirroring.
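To make the forensic use concrete, here is a small sketch that parses records in the default (version 2) flow log format and totals accepted outbound bytes per destination. The sample records, IP addresses, and function names are invented for illustration; custom log formats would need a different field list.

```python
from collections import Counter

# Field order of the default flow log format
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def parse_record(line):
    """Split one space-separated record into a field dictionary."""
    return dict(zip(FIELDS, line.split()))


def bytes_by_destination(lines, source_ip):
    """Sum accepted bytes sent from source_ip, grouped by destination."""
    totals = Counter()
    for line in lines:
        record = parse_record(line)
        if record["srcaddr"] == source_ip and record["action"] == "ACCEPT":
            totals[record["dstaddr"]] += int(record["bytes"])
    return totals


# Hypothetical sample records
sample = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 198.51.100.9 49152 443 6 1200 9000000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 198.51.100.9 49153 443 6 800 6000000 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 203.0.113.7 10.0.1.5 40000 22 6 3 180 1700000000 1700000060 REJECT OK",
]

print(bytes_by_destination(sample, "10.0.1.5").most_common(1))
```

An unusually large total towards a single external address is the kind of signal that warrants a closer look for exfiltration.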
L39. Map AWS networking and identity services (Security Groups, PrivateLink, IAM, Verified Access) to the pillars of a Zero Trust architecture.
Zero Trust means "never trust, always verify," enforce least privilege, and assume breach. Mapping AWS services to its pillars:
- Identity (verify the principal): IAM roles and policies plus IAM Identity Center issue per-request, least-privilege, short-lived credentials instead of standing trust.
- Device + context-aware access: AWS Verified Access evaluates user identity and device posture — via third-party trust providers such as Jamf, CrowdStrike, or JumpCloud — on every request before granting app access, with no broad VPN-wide trust.
- Micro-segmentation (network): Security Groups referenced by ID give per-workload, identity-like network segmentation so one tier cannot freely reach another.
- Private connectivity (remove the public surface): PrivateLink exposes services over private ENIs, keeping traffic off the internet.
Put together: IAM and Verified Access verify who, Security Groups limit where, and PrivateLink removes the open path.
Interview tip: Frame Verified Access as Zero Trust for application access (it replaces the VPN) and Security Groups as micro-segmentation.
Data Protection & Encryption (9)
L110. How do you secure a publicly exposed S3 bucket? Walk through the remediation steps.
A public S3 bucket is like leaving your filing cabinet open on the street. Remediate methodically, not by panic-deleting:
- Confirm exposure using aws s3api get-bucket-acl, get-bucket-policy, and get-public-access-block; cross-check with IAM Access Analyzer (which flags public/cross-account buckets) and Trusted Advisor.
- Assess data sensitivity and review CloudTrail data events plus S3 server access logs to see who already read it (breach scope).
- Contain immediately: enable Block Public Access at the bucket level (and account level if safe) to instantly neutralize any public ACLs and bucket policies.
- Fix the root cause: remove Principal:"*" statements, drop public-read ACLs, and replace them with least-privilege IAM/bucket policies. Note new buckets have ACLs disabled by default since April 2023.
- Serve legitimate public content via CloudFront + Origin Access Control (OAC), keeping the bucket private.
- Add guardrails: enforce default encryption and versioning, and apply an SCP that denies changes to Block Public Access settings org-wide.
Interview tip: Say "contain first (Block Public Access), then fix the root cause, then add preventive guardrails" — that order signals incident-response maturity.
L111. What is S3 Block Public Access, and how does it interact with bucket policies and ACLs at the account and bucket level?
S3 Block Public Access (BPA) is a safety switch that overrides any policy or ACL granting public access. Think of it as a master kill-switch: even if someone writes a careless Principal:"*" policy, BPA stops it from taking effect. It has four independent settings:
- BlockPublicAcls: rejects requests that would set a new public ACL on a bucket or object.
- IgnorePublicAcls: ignores any existing public ACLs.
- BlockPublicPolicy: rejects new public bucket policies.
- RestrictPublicBuckets: restricts access on buckets with public/cross-account policies to AWS service principals and authorized users in the same account only.
BPA applies at both the account and bucket level, and S3 enforces the most restrictive combination of the two: a setting enabled at the account level applies to every bucket regardless of bucket-level config. Since April 2023, new buckets have BPA enabled by default (and ACLs disabled by default).
Interview tip: Stress that BPA is evaluated as an overriding restriction on top of policy/ACL evaluation — it behaves like an explicit deny that always wins, which is why it's the fastest remediation.
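The account/bucket interaction can be sketched as a per-setting OR, on the understanding that S3 applies the most restrictive combination of the two levels. The function name and sample configurations are hypothetical; the four setting names are the real ones.

```python
SETTINGS = ("BlockPublicAcls", "IgnorePublicAcls",
            "BlockPublicPolicy", "RestrictPublicBuckets")


def effective_bpa(account_config, bucket_config):
    """A setting is in force if either the account or the bucket enables it."""
    return {
        name: account_config.get(name, False) or bucket_config.get(name, False)
        for name in SETTINGS
    }


# Hypothetical configurations
account = {name: True for name in SETTINGS}    # everything blocked account-wide
bucket = {name: False for name in SETTINGS}    # bucket tries to opt out

print(effective_bpa(account, bucket))
```

Turning a setting off on a single bucket therefore has no effect while the account-level switch is on.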
L212. Compare SSE-S3, SSE-KMS, and SSE-C for S3 encryption at rest. When would you choose each?
All three encrypt objects at rest with AES-256; the difference is who controls the keys — like choosing between the hotel keeping your safe key, a bank vault you co-control, or you carrying the only key.
- SSE-S3 (AES256): AWS owns and manages the keys end-to-end. Zero key management, no extra cost. This is the default for new objects when no encryption is specified. Use for general data with no compliance/audit-key requirement.
- SSE-KMS (aws:kms): keys live in AWS KMS. You get key policies, CloudTrail audit logs, rotation, and per-key access control. Slight cost and KMS request limits (mitigate with S3 Bucket Keys). Use when you need auditability, separation of duties, or compliance. For the highest assurance, DSSE-KMS applies two independent layers of KMS encryption.
- SSE-C: you supply the key on every request over TLS; AWS uses it and never stores it. Maximum control, but you bear all key storage/loss risk: lose the key, lose the data. Note S3 now disables SSE-C by default on new general-purpose buckets (deployed April 2026), so it's an explicit opt-in for policy-mandated customer-held keys only.
Interview tip: Say "SSE-KMS is the default best practice — auditability and rotation without managing the key material yourself."
L213. What is the difference between an AWS-managed KMS key and a customer-managed CMK, and why does key policy control matter?
Both are KMS keys, but they differ in who controls them. Think of an AWS-managed key as a rental car (AWS sets the rules) and a customer-managed key as your own car (you set everything). Note AWS now calls these "KMS keys" rather than the older term CMK, though interviewers still use CMK interchangeably.
- AWS-managed key (e.g. aws/s3): auto-created per service, AWS controls the key policy, rotation is automatic (yearly), and you cannot change permissions or delete it on demand. No monthly charge.
- Customer-managed key (CMK): you create it, own the key policy, choose rotation (and the rotation period), set aliases, enable/disable, schedule deletion, and grant cross-account access. Costs ~$1/month plus request charges.
Key policy control matters because in KMS the key policy is the root of authority — IAM permissions alone don't grant access unless the key policy allows it (directly, or by delegating to IAM). This enables true separation of duties: a security team can hold key-admin rights while developers only get encrypt/decrypt, so even a powerful IAM user can't read data without key access.
Interview tip: Emphasize "the KMS key policy is authoritative — IAM alone cannot grant key access unless the key policy permits it."
L214. Explain envelope encryption and how KMS uses data keys to encrypt large objects.
Envelope encryption means encrypting your data with a data key, then encrypting that data key with a master key (the KMS key) — like locking your documents in a box, then locking the box's key inside a stronger safe. You ship the locked box and the locked key together.
Why? KMS won't encrypt large payloads directly (4 KB limit on direct Encrypt), and round-tripping gigabytes through KMS would be slow and costly. So the flow is:
- App calls GenerateDataKey; KMS returns a plaintext data key and an encrypted (wrapped) copy of it.
- App encrypts the large object locally with the plaintext data key (fast, no size limit).
- App stores the encrypted object alongside the encrypted data key, then deletes the plaintext key from memory.
- To decrypt: send the wrapped key to KMS via Decrypt, get the plaintext key back, decrypt the object locally.
Interview tip: Mention the master key never leaves KMS, and rotating the master key only changes which key version wraps future data keys — existing ciphertext still decrypts, so you never re-encrypt the actual data.
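The flow above can be traced end to end with a toy model. Everything here is a stand-in chosen so the sketch runs on the standard library alone: `FakeKms` is a hypothetical in-memory substitute for KMS, and `_toy_cipher` is an insecure XOR keystream used in place of a real cipher such as AES-GCM. Only the shape of the flow (generate data key, encrypt locally, store the wrapped key, unwrap to decrypt) is the point.

```python
import hashlib
import os


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY ONLY: XOR with a SHA-256 counter keystream. Never use for real data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class FakeKms:
    """Hypothetical stand-in for KMS; the master key never leaves this object."""

    def __init__(self):
        self._master_key = os.urandom(32)

    def generate_data_key(self):
        plaintext_key = os.urandom(32)
        return plaintext_key, _toy_cipher(self._master_key, plaintext_key)

    def decrypt(self, wrapped_key):
        return _toy_cipher(self._master_key, wrapped_key)


def encrypt_object(kms, data):
    plaintext_key, wrapped_key = kms.generate_data_key()
    ciphertext = _toy_cipher(plaintext_key, data)  # local, no size limit
    del plaintext_key  # drop our reference; real code should also wipe buffers
    return {"ciphertext": ciphertext, "wrapped_key": wrapped_key}


def decrypt_object(kms, envelope):
    plaintext_key = kms.decrypt(envelope["wrapped_key"])
    return _toy_cipher(plaintext_key, envelope["ciphertext"])


kms = FakeKms()
document = b"customer record " * 10_000   # far larger than 4 KB
envelope = encrypt_object(kms, document)
print(decrypt_object(kms, envelope) == document)
```

Note that only the 32-byte data key ever crosses the boundary to the key service; the large payload is processed locally.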
L215. How do you enforce encryption in transit? Explain the role of ACM, TLS termination at ELB/CloudFront, and the aws:SecureTransport condition.
Encryption in transit protects data on the wire so it can't be sniffed — like sending mail in a sealed, tamper-proof envelope instead of a postcard. AWS layers three things:
- ACM (AWS Certificate Manager) issues and auto-renews free public TLS certificates, so you never manually rotate certs. It supplies the certs that ELB and CloudFront present to clients.
- TLS termination at ELB/CloudFront: clients connect over HTTPS, and the load balancer or CDN decrypts there. You can re-encrypt to the backend (end-to-end TLS) for sensitive traffic, or use HTTP internally inside a trusted VPC.
- aws:SecureTransport is a global IAM/resource-policy condition key that's true only for requests sent over TLS. You add an explicit Deny when aws:SecureTransport is false, forcing all S3 access over HTTPS.
Together: ACM provides certs, ELB/CloudFront terminate TLS, and policy conditions reject any plaintext request.
Interview tip: Quote the pattern — an Effect:Deny statement with a Bool condition of aws:SecureTransport set to false — that one statement enforces HTTPS-only on a bucket.
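Written out, that statement looks roughly like the following. The helper name and bucket name are placeholders; the statement structure (Deny, Bool condition, aws:SecureTransport set to false) is the pattern described above.

```python
import json


def https_only_policy(bucket_name):
    """Bucket policy that denies every request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(https_only_policy("example-bucket"), indent=2))
```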
L116. What is the difference between AWS Secrets Manager and SSM Parameter Store, and when would you pick each for storing credentials?
Both store secrets encrypted with KMS, but they target different needs — Parameter Store is a config/secret store (free standard tier); Secrets Manager is a managed credential vault with a rotation engine built in.
- SSM Parameter Store: stores config and secrets (SecureString via KMS). Standard tier is free; an advanced tier (larger values, more parameters, higher throughput) is paid. No native rotation (you'd wire your own Lambda). Great for app config, feature flags, and low-churn secrets where cost matters.
- Secrets Manager: purpose-built for credentials with native automatic rotation (RDS, Aurora, Redshift, DocumentDB via managed templates), cross-region replication, and fine-grained resource policies. Costs ~$0.40/secret/month plus API calls.
Pick Secrets Manager when you need automatic rotation, database credential management, or compliance-grade lifecycle. Pick Parameter Store for static config, API endpoints, or budget-sensitive secrets that rarely change.
Interview tip: The one-line differentiator interviewers want: "Secrets Manager has built-in rotation; Parameter Store does not."
L317. Design a system to store and process customer PII securely end to end. Cover encryption, access, key management, logging, and DLP.
Treat PII like cash in a bank: locked at every step, every touch recorded, and only tellers with a reason can open the vault. A layered design:
- Encryption at rest: S3/RDS/DynamoDB with SSE-KMS using customer-managed keys; envelope encryption for large objects; client-side or field-level encryption for the most sensitive columns.
- Encryption in transit: TLS everywhere via ACM; deny non-TLS requests with a policy condition on aws:SecureTransport set to false.
- Access: least-privilege IAM roles, no long-lived keys; separation of duties via KMS key policies; isolate PII in a dedicated account/VPC with private (PrivateLink/gateway) endpoints; tokenize or pseudonymize where possible.
- Key management: per-domain customer-managed keys, automatic rotation, scoped key policies, CloudTrail on key use.
- Logging/monitoring: CloudTrail (management + data events), VPC Flow Logs, S3 server access logs, GuardDuty, Security Hub; alarms on anomalous Decrypt activity.
- DLP: Amazon Macie to discover/classify PII; IAM Access Analyzer for exposure; SCPs to block public buckets and constrain cross-region/cross-account exfiltration.
Interview tip: Frame it as defense-in-depth mapped to a compliance regime (GDPR/PCI DSS) — encryption, least-privilege, auditability, and detection working together.
L318. How would you implement automatic secret rotation for an RDS database password without breaking running applications?
The trick is rotating without a window where the old password is dead but the new one isn't live yet. AWS Secrets Manager solves this with an alternating-users rotation strategy — like having two house keys so you're never locked out while changing locks.
- Store the RDS credential in Secrets Manager and attach a rotation Lambda (AWS provides managed RDS rotation templates).
- Rotation runs four steps: createSecret (new password staged in AWSPENDING), setSecret (apply it to a second, alternate DB user), testSecret (verify a real connection works), finishSecret (promote AWSPENDING to AWSCURRENT).
- Apps fetch the secret at runtime (e.g. via the Secrets Manager caching client or the Lambda/agent extension) instead of hardcoding it, so they always read AWSCURRENT, with no redeploy needed.
The alternating-users approach means a live, tested credential always exists, so in-flight connections never break. (The simpler single-user strategy exists but briefly risks failed connections, so alternating-users is preferred for zero downtime.)
Interview tip: Name the four Lambda steps and stress "apps read the secret at runtime + alternating-users rotation = zero downtime."
Roles, STS & Org-Scale Governance (9)
L119. What is AWS STS and what does AssumeRole give you that long-lived access keys do not?
AWS STS (Security Token Service) is the AWS service that hands out temporary credentials. When you call AssumeRole, STS returns a short-lived bundle: an access key ID, a secret access key, and a session token, all of which expire together (default 1 hour, configurable from 15 minutes up to 12 hours).
Think of long-lived access keys like a permanent house key that never changes: if someone copies it, they own your house forever. AssumeRole is like a hotel keycard: it works only for a few hours, then becomes useless.
What you gain over long-lived keys:
- Auto-expiry: a leaked token dies on its own.
- No secrets to store: EC2 and Lambda fetch credentials at runtime.
- Cross-account access without sharing keys.
- CloudTrail attribution via the role session name.
Interview tip: Say STS gives temporary, scoped, auditable credentials.
L220. What is an EC2 instance profile and why is it preferred over storing access keys on the instance?
An EC2 instance profile is a container that attaches an IAM role to an EC2 instance. Once attached, any application on that instance can fetch temporary credentials automatically from the Instance Metadata Service (IMDS) at http://169.254.169.254, so no keys ever touch the disk.
It is like a company badge that the building issues you each morning and revokes each night, versus taping a master key under the keyboard where anyone with access to the machine could grab it.
Why it beats hardcoded keys:
- No secrets on disk: nothing to leak in code, logs, or a stolen AMI.
- Auto-rotation: IMDS refreshes the credentials before they expire.
- Easy revocation: detach or edit the role centrally.
Always enforce IMDSv2 (session-token-based, now the default on new launches) to block SSRF attacks that try to steal these credentials.
Interview tip: Mention IMDSv2; interviewers love that detail.
L221. How does cross-account access work with IAM roles? Walk me through the trust policy and the permission policy.
Cross-account access uses an IAM role in the target account that a principal in the source account assumes. Two policies work together:
- Trust policy (the who): attached to the role, it names who is allowed to assume it, for example a Principal of arn:aws:iam::SOURCE_ACCOUNT:root with the action sts:AssumeRole. This answers "who may walk through the door?"
- Permission policy (the what): also attached to the role, it grants the actual permissions, for example read access to a specific S3 bucket. This answers "what can they do once inside?"
The source side must also grant its user or role permission to call sts:AssumeRole on the target role ARN. Both sides must agree, like a visitor pass: their building lets them leave, yours lets them in.
Harden it with an ExternalId condition (stops the confused-deputy problem) and an MFA condition such as aws:MultiFactorAuthPresent.
Interview tip: Trust = who, permission = what, and both sides must allow it.
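A trust policy with the ExternalId hardening might look like the sketch below. The account ID and external ID are placeholder values and the helper name is hypothetical; the policy structure (Principal, sts:AssumeRole, a StringEquals condition on sts:ExternalId) follows the description above.

```python
import json


def cross_account_trust_policy(source_account_id, external_id):
    """Trust policy letting one account assume the role, gated by an ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


# Placeholder values for illustration
print(json.dumps(
    cross_account_trust_policy("111122223333", "example-external-id"), indent=2))
```

The permission policy is a separate document attached to the same role; the trust policy above never grants access to any resource by itself.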
L222. What is a permission boundary, and how does it differ from an SCP? Can a permission boundary grant new permissions?
A permission boundary is an IAM policy attached to a user or role that sets the maximum permissions that identity can ever have. The effective permissions are the intersection of the boundary and the identity's own policies, never the union.
No, a permission boundary cannot grant anything. It only caps. If the boundary allows S3 but the identity policy grants nothing, the identity gets nothing. Think of it as a ceiling: it limits how high you can jump but never lifts you off the ground.
Difference from an SCP (Service Control Policy):
- SCP applies to whole accounts and OUs in AWS Organizations, an org-level guardrail that even affects the account root user.
- Boundary applies to a single IAM principal, often used so developers can create roles without escalating their own privileges.
Interview tip: Both are filters, not grants. Neither ever adds permissions.
L323. Walk through the IAM policy evaluation logic when an SCP, a permission boundary, an identity policy, and a resource policy all apply to the same request.
AWS evaluates a request in layers, and an explicit Deny anywhere always wins. The flow:
- Explicit Deny check: scan every policy type. Any Deny ends it immediately.
- SCP: must Allow the action (deny-by-default at the org level). If no SCP permits it, the request is denied even when IAM allows it.
- Resource Control Policy (RCP): if one applies, it must also Allow (newer org-level guardrail on resources).
- Permission boundary: must Allow the action (it caps the identity).
- Identity policy: the user or role policy must Allow.
- Resource policy: for example an S3 bucket policy, may also Allow.
For same-account access, the request needs an Allow in either the identity policy or the resource policy, but it must still pass SCP, RCP, and the boundary. True cross-account access requires an Allow on both the identity side and the resource side.
Picture a row of gates: any guard can slam its gate shut (Deny), and you must get a green light from every layer to pass.
Interview tip: Lead with "explicit Deny always wins, default is implicit deny."
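The layering can be modelled in a few lines. This is a deliberately simplified sketch of the same-account case: each layer is reduced to a set of exact action names, wildcards, conditions, and cross-account rules are ignored, and the function and variable names are hypothetical.

```python
def evaluate(action, *, scp, rcp, boundary, identity, resource, denies=frozenset()):
    """Simplified same-account decision; every argument is a set of action names."""
    if action in denies:
        return "explicit deny"
    # Each guardrail layer must allow the action before any grant counts.
    for name, layer in (("SCP", scp), ("RCP", rcp), ("boundary", boundary)):
        if action not in layer:
            return f"implicit deny ({name})"
    # Same account: a grant from either side is enough.
    if action in identity or action in resource:
        return "allow"
    return "implicit deny (no grant)"


# Hypothetical layers for one request context
layers = dict(
    scp={"s3:GetObject", "s3:PutObject"},
    rcp={"s3:GetObject", "s3:PutObject"},
    boundary={"s3:GetObject"},
    identity={"s3:GetObject", "s3:PutObject"},
    resource=set(),
)

print(evaluate("s3:GetObject", **layers))
print(evaluate("s3:PutObject", **layers))
print(evaluate("s3:GetObject", denies={"s3:GetObject"}, **layers))
```

In the second call the identity policy grants s3:PutObject, yet the boundary does not include it, so the request never reaches the grant check.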
L224. What are Service Control Policies, and why are they deny-only guardrails rather than a way to grant access?
Service Control Policies (SCPs) are AWS Organizations policies attached to the org root, an OU, or an account. They define the maximum set of permissions any IAM principal in those accounts can use, including most actions by the account root user.
They are called guardrails because they never grant anything. An SCP that "Allows" S3 does not give anyone S3 access; it only keeps S3 within the allowed boundary. A principal still needs an IAM identity policy to actually use S3. The SCP just decides whether that grant is even possible.
Think of it as a fence around a property: the fence defines where you can walk, but you still need a key to enter each room. The fence alone opens no doors.
Common use: a Deny that blocks leaving approved regions or disabling CloudTrail, applied org-wide so even account admins cannot override it.
Interview tip: SCP = ceiling on what IAM can grant, never the grant itself.
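A guardrail of that kind could be sketched as follows. The region list and helper name are assumptions, and the policy is intentionally minimal: production region-restriction SCPs usually exempt global services (IAM, for example) so that they keep working, which this sketch omits.

```python
import json


def guardrail_scp(approved_regions):
    """SCP sketch: deny activity outside approved regions and protect CloudTrail."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(approved_regions)}
                },
            },
            {
                "Sid": "DenyDisablingCloudTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
        ],
    }


# Hypothetical approved regions
print(json.dumps(guardrail_scp(["eu-west-1", "eu-central-1"]), indent=2))
```

Both statements are Deny statements: the SCP narrows what IAM policies in member accounts can ever permit and grants nothing on its own.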
L325. How would you detect and prevent an IAM privilege-escalation path, such as a role that can attach policies to itself?
Detect: Hunt for dangerous permissions that let an identity rewrite its own access. The classic ones are iam:AttachRolePolicy, iam:PutRolePolicy, iam:CreatePolicyVersion, iam:PassRole, and sts:AssumeRole with a wildcard resource. Run an open-source escalation scanner like PMapper or Cloudsplaining, and review IAM Access Analyzer findings (including its unused-access findings). CloudTrail plus AWS Config rules catch changes at runtime.
The risk: a role with iam:AttachRolePolicy on itself can attach AdministratorAccess (or, with iam:PutRolePolicy, write an equivalent inline policy), like an employee who can edit their own job permissions and promote themselves to CEO.
Prevent:
- Apply a permission boundary so self-modification cannot exceed the cap.
- Scope iam:PassRole to specific role ARNs, never *.
- Add an SCP that denies IAM write actions outside a central admin role.
- Enforce least privilege and review with Access Analyzer's unused-access findings.
Interview tip: Name iam:PassRole and iam:PutRolePolicy explicitly.
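As a rough first pass before reaching for a dedicated scanner, a policy document can be checked for the risky actions granted on a wildcard resource. The sketch below is deliberately naive: it matches exact action names only (no `iam:*` expansion, no case folding) and it does not catch a role scoped to modify itself, so it complements rather than replaces the tools named above. The function name and example policy are hypothetical.

```python
RISKY_ACTIONS = {
    "iam:AttachRolePolicy", "iam:PutRolePolicy", "iam:CreatePolicyVersion",
    "iam:PassRole", "sts:AssumeRole",
}


def _as_list(value):
    """IAM allows a single string or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_escalation_risks(policy):
    """Return the risky actions of each Allow statement that targets Resource '*'."""
    findings = []
    for statement in _as_list(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        actions = set(_as_list(statement.get("Action", [])))
        resources = _as_list(statement.get("Resource", []))
        risky = sorted(actions & RISKY_ACTIONS)
        if risky and "*" in resources:
            findings.append(risky)
    return findings


# Hypothetical policy under review
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["iam:PassRole", "s3:GetObject"],
         "Resource": "*"},
        {"Effect": "Allow", "Action": "iam:PutRolePolicy",
         "Resource": "arn:aws:iam::111122223333:role/app-role"},
    ],
}

print(find_escalation_risks(example_policy))
```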
L326. Design a multi-account governance strategy using AWS Organizations, Control Tower, and IAM Identity Center. What goes where?
Build a landing zone with each tool doing one job:
- AWS Organizations, the foundation. Create OUs (Security, Infrastructure, Workloads/Prod, Workloads/Dev, Sandbox) and apply SCP guardrails per OU (deny disabling CloudTrail, block non-approved regions); add RCPs for resource-level org guardrails where needed.
- Control Tower, the automation layer on top of Organizations. It deploys the landing zone, vends new accounts via Account Factory with baked-in controls, and centralizes logging into a dedicated Log Archive account plus an Audit/Security account.
- IAM Identity Center (formerly AWS SSO), the human access layer. Connect your IdP (Okta or Microsoft Entra ID), define permission sets (ReadOnly, PowerUser, Admin), and assign them to groups across accounts, with no per-account IAM users.
Think of it as a planned city: Organizations draws the zoning, Control Tower is the city planner enforcing codes, Identity Center is the single ID office issuing resident passes.
Interview tip: Separate Log Archive and Audit accounts; that is the AWS-recommended pattern.
L227. What does IAM Access Analyzer tell you about external and unused access, and how would you act on its findings?
IAM Access Analyzer has two main capabilities:
- External access findings: it uses automated reasoning (provable security) to flag any resource (S3 bucket, IAM role, KMS key, SQS queue, and more) that can be accessed from outside your account or organization. It names exactly which principal and which policy statement opened the door.
- Unused access findings: it reports roles, users, access keys, and permissions that have not been used within a configurable window, which are your over-privileged or stale identities.
Acting on them: for an external finding, confirm whether the sharing is intended. If yes, create an archive rule to suppress it as a known-good baseline; if no, tighten the policy. For unused access, remove the stale permission or deactivate the key to shrink your blast radius. It is like a home security audit that lists every unlocked window and every spare key you forgot about.
Interview tip: Mention archive rules to suppress known-safe external access.
Troubleshooting & Real Scenarios (10)
L228. A set of IAM access keys was leaked in a public GitHub repo. Walk me through your incident response runbook.
Move fast: bots scrape public keys within minutes.
- Contain: immediately deactivate the access key (do not delete it yet, you need it for tracing). If it belongs to a user, also attach an explicit Deny-all policy.
- Investigate: query CloudTrail for every action by that accessKeyId: new IAM users, EC2 launches (crypto-mining), data exfiltration, role assumptions. Check GuardDuty for UnauthorizedAccess and credential-exfiltration findings.
- Eradicate: delete any resources the attacker created, revoke roles they assumed, and rotate every credential they could have reached.
- Recover: issue a fresh key (or better, switch to IAM roles), and purge the secret from Git history with git filter-repo, since deleting the file alone is not enough.
- Prevent: enable secret scanning and push protection, move secrets to AWS Secrets Manager, and adopt IAM roles instead of long-lived keys.
It is like a lost credit card: freeze it first, dispute the charges, then reissue.
Interview tip: Say deactivate-before-delete to preserve the forensic trail.
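The Investigate step above can be sketched in Python. This is a minimal illustration that assumes the CloudTrail records have already been exported and parsed into dictionaries; the helper name actions_by_key and the sample records are hypothetical, while userIdentity.accessKeyId, eventSource, eventName, and eventTime are standard CloudTrail record fields.

```python
from collections import Counter

def actions_by_key(records, access_key_id):
    """Return (timeline, counts) for every CloudTrail record made with one access key."""
    hits = [
        r for r in records
        if r.get("userIdentity", {}).get("accessKeyId") == access_key_id
    ]
    # ISO-8601 UTC timestamps sort correctly as plain strings.
    hits.sort(key=lambda r: r["eventTime"])
    counts = Counter(f'{r["eventSource"]}:{r["eventName"]}' for r in hits)
    return hits, counts

# Hypothetical sample records, trimmed to the fields used above.
# The key IDs are the placeholder values used in AWS documentation.
sample = [
    {"eventTime": "2024-05-01T10:05:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances", "userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"}},
    {"eventTime": "2024-05-01T10:01:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateUser", "userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"}},
    {"eventTime": "2024-05-01T10:02:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "ListBuckets", "userIdentity": {"accessKeyId": "AKIAI44QH8DHBEXAMPLE"}},
]

timeline, counts = actions_by_key(sample, "AKIAIOSFODNN7EXAMPLE")
for record in timeline:
    print(record["eventTime"], record["eventSource"], record["eventName"])
```

Sorting by time gives the attacker's timeline; the counts make unusual activity (new users, instance launches) easy to spot at a glance.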
L229. GuardDuty alerts that an EC2 instance is communicating with a known crypto-mining endpoint. How do you isolate, investigate, and recover?
Isolate (do not terminate, you would lose evidence):
- Attach a forensic security group with no inbound or outbound rules (or one allowing only a forensic subnet) to cut the command-and-control channel.
- Remove the instance from any load balancer and Auto Scaling group so it is not replaced or serving traffic.
- Disable or quarantine the attached IAM instance role (for example, attach an explicit Deny-all policy to the role or revoke its active sessions) so stolen credentials are useless.
Investigate: Snapshot the EBS volume and capture memory before touching the OS. Review the GuardDuty CryptoCurrency finding, VPC Flow Logs for the mining IPs, and CloudTrail for how it got in (leaked key, SSRF on IMDS, exposed RDP/SSH).
Recover: Treat the host as compromised and rebuild from a known-good AMI, never "clean" it. Rotate all credentials it touched, patch the entry vector, enforce IMDSv2, and tighten the security group.
It is like quarantining a sick patient: isolate, run tests, then bring in a healthy replacement.
Interview tip: Isolate, never terminate, evidence first.
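Revoking a role's active sessions comes down to attaching a Deny policy conditioned on aws:TokenIssueTime, which is the same shape the IAM console's "Revoke active sessions" action adds. A minimal sketch, assuming a timezone-aware cutoff; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone

def revoke_older_sessions_policy(cutoff):
    """Build a Deny-all policy for sessions whose credentials were issued before `cutoff`.

    `cutoff` must be a timezone-aware datetime; it is normalised to UTC.
    """
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued_before}},
        }],
    }

policy = revoke_older_sessions_policy(datetime(2024, 5, 1, 10, 30, tzinfo=timezone.utc))
print(json.dumps(policy, indent=2))
```

Credentials issued after the cutoff are unaffected, so this complements the isolation security group rather than replacing it.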
L230. An application suddenly cannot reach an S3 bucket after a security change. How do you troubleshoot whether it is the bucket policy, an SCP, a VPC endpoint, or IAM?
Work the layers methodically, narrowing from identity outward:
- Read the error: an AccessDenied points to policy; a timeout or connection error points to networking (endpoint, route, or security group).
- IAM policy simulator or the newer IAM Access Analyzer policy checks: test the app's role against the exact S3 action and bucket ARN. This confirms whether the identity policy allows it.
- Bucket policy: check for a new explicit Deny, a tightened aws:SourceVpce or aws:SourceIp condition, or a removed principal.
- SCP or RCP: look at the error message recorded in CloudTrail for a deny attributed to an SCP or RCP, or check the OU's policies for a new S3 restriction.
- VPC endpoint: if the app is in a private subnet, confirm the S3 gateway (or interface) endpoint exists, its endpoint policy allows the bucket, and the route table or DNS points to it.
CloudTrail is your fastest oracle: it logs the deny and often hints which layer caused it (object-level S3 calls appear only if data events are enabled). It is like tracing a power outage: check the appliance, the breaker, then the street line.
Interview tip: AccessDenied = policy; timeout = networking.
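A tightened aws:SourceVpce condition of the kind named in the bucket-policy step often looks like the statement below. This is a sketch only: the bucket name, endpoint ID, and helper names are hypothetical, and is_denied checks just this one condition rather than reproducing the full IAM evaluation logic.

```python
def vpce_only_statement(bucket, vpce_id):
    """Deny every S3 action on the bucket unless the request arrives via one VPC endpoint."""
    return {
        "Sid": "DenyOutsideEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
    }

def is_denied(statement, request_vpce):
    """Toy check of the StringNotEquals condition only.

    A request that bypasses the endpoint carries no aws:SourceVpce key
    (modelled here as None); a negated operator such as StringNotEquals
    matches when the key is absent, so the Deny applies.
    """
    allowed_vpce = statement["Condition"]["StringNotEquals"]["aws:SourceVpce"]
    return request_vpce != allowed_vpce

stmt = vpce_only_statement("reports", "vpce-0abc1234")
print(is_denied(stmt, "vpce-0abc1234"))  # request through the expected endpoint
print(is_denied(stmt, None))             # request over the public S3 endpoint
```

If the app's subnet lost its endpoint route, its requests arrive without the expected endpoint ID and this statement turns a networking change into an AccessDenied.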
L331. You need to take a forensic EBS snapshot of a live production instance without disrupting it. How do you do this and preserve chain of custody?
Capture without disruption: EBS snapshots are designed for live volumes, so you can call CreateSnapshot (or CreateSnapshots for multi-volume, crash-consistent capture) while the instance keeps running, with no downtime. For the cleanest memory image, capture RAM first with a forensic agent before snapshotting disk.
Preserve chain of custody:
- Copy the snapshot into an isolated forensics account so analysts never touch production.
- Encrypt it with a dedicated KMS key and lock down its permissions.
- Record an immutable audit trail (timestamp, operator, snapshot ID, and source instance), ideally written to a WORM store (S3 Object Lock in compliance mode) so it cannot be altered.
- Compute and store a cryptographic hash of the exported image to prove integrity.
- Apply tags like Forensics=DoNotDelete and restrict access to investigators only.
Chain of custody is the evidence diary: who touched it, when, and proof it was never tampered with, like sealing and signing an evidence bag.
Interview tip: Isolated account + immutable logs + hashing = court-defensible.
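The hashing and audit-trail steps above could be sketched as follows. This is an illustration under assumptions: the image is assumed to be already exported to a local file, and the record's field names and both function names are hypothetical rather than any standard format.

```python
import hashlib
from datetime import datetime, timezone

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def custody_record(path, snapshot_id, instance_id, operator):
    """Build one audit-trail entry; field names are illustrative, not a standard."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "snapshot_id": snapshot_id,
        "source_instance": instance_id,
        "sha256": sha256_of_file(path),
    }
```

Writing each record to the WORM store at the moment of capture, then re-hashing the image before analysis, lets you show the evidence did not change in between.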
L232. Two EC2 instances in the same VPC cannot talk to each other. Walk through how you debug Security Groups, NACLs, and route tables.
Debug from the instance outward, checking each layer:
- Security Groups (stateful): the usual culprit. The target's inbound SG must allow the source's traffic (allow by source SG ID, not just CIDR). Because SGs are stateful, return traffic is automatic, so you only set the request direction. Also confirm the source's outbound rules allow it (the default SG allows all outbound).
- NACLs (stateless): subnet-level. Here you must allow both directions explicitly: inbound for the request and outbound for the reply on ephemeral ports (typically 1024-65535). A missing ephemeral-port rule is a classic gotcha.
- Route tables: instances in the same VPC share the local route, so intra-VPC routing usually just works; verify only if they are in different subnets with custom routing.
Then check the OS-level firewall (iptables, nftables, ufw, or Windows Firewall) and use VPC Reachability Analyzer to pinpoint the blocking hop automatically.
Security Groups are like a bouncer who remembers you; NACLs are a checkpoint that forgets, so you stamp both ways.
Interview tip: SG stateful, NACL stateless, ephemeral ports trip people up.
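The ephemeral-port gotcha is easier to see in a toy model of NACL evaluation. The sketch below is a simplification that matches on port only (real rules also match protocol and CIDR), and all names in it are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NaclRule:
    number: int
    action: str       # "allow" or "deny"
    port_from: int
    port_to: int

def nacl_decision(rules, port):
    """Lowest rule number wins; anything unmatched hits the implicit final deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"

def round_trip_ok(server_inbound, server_outbound, service_port, client_ephemeral_port):
    """Stateless check on the server's subnet: request in and reply out are judged separately."""
    request = nacl_decision(server_inbound, service_port)
    reply = nacl_decision(server_outbound, client_ephemeral_port)
    return request == "allow" and reply == "allow"

inbound = [NaclRule(100, "allow", 443, 443)]
outbound_missing = []                                   # no ephemeral-port rule
outbound_fixed = [NaclRule(100, "allow", 1024, 65535)]

print(round_trip_ok(inbound, outbound_missing, 443, 50123))  # False
print(round_trip_ok(inbound, outbound_fixed, 443, 50123))    # True
```

The request is allowed in both cases; only the reply on the client's ephemeral port differs, which is why the failure looks like a silent timeout.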
L333. Design an automated, human-out-of-the-loop response to a GuardDuty finding using EventBridge and Lambda or SSM. What guardrails prevent it from causing an outage?
The pipeline: GuardDuty publishes findings to EventBridge. An EventBridge rule filters on finding type and severity, then triggers a Lambda (or an SSM Automation runbook) that remediates, for example swapping a compromised EC2 to an isolation security group, disabling a leaked key, or revoking a session, and posts to Slack/Telegram plus a ticket.
Guardrails so automation never causes an outage:
- Severity and type filtering: only auto-act on high-confidence findings; low-severity goes to humans.
- Tag-based scoping: never quarantine anything tagged Critical=Prod or NoAutoRemediate; alert instead.
- Reversible actions: prefer isolation or quarantine over termination so nothing is destroyed.
- Idempotency and rate limiting: prevent a finding storm from disabling half your fleet.
- Least-privilege Lambda role scoped to the exact actions.
- Approval gate for higher-blast-radius steps via Step Functions.
It is like an automatic sprinkler, fast on a real fire, but zoned so it does not flood the whole building on a false alarm.
Interview tip: Stress tag exclusions and reversible actions.
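A rough sketch of the filter and guardrails, assuming a severity threshold of 7 (GuardDuty's "high" band) and the exclusion tags from the list above. The decide function and its return values are hypothetical; the event pattern uses EventBridge's numeric matching on the finding's severity field.

```python
import json

# EventBridge event pattern: GuardDuty findings with severity 7.0 or higher.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

# Exclusion tags taken from the guardrail list; adjust to your own tagging scheme.
EXCLUDED_TAGS = {("Critical", "Prod")}
EXCLUDED_TAG_KEYS = {"NoAutoRemediate"}

def decide(finding_detail, resource_tags, already_handled):
    """Return 'isolate', 'alert', or 'skip' for one finding."""
    finding_id = finding_detail["id"]
    if finding_id in already_handled:          # idempotency: act once per finding
        return "skip"
    if finding_detail["severity"] < 7:         # low severity goes to humans
        return "alert"
    for key, value in resource_tags.items():
        if key in EXCLUDED_TAG_KEYS or (key, value) in EXCLUDED_TAGS:
            return "alert"                     # tag-based scoping: never auto-quarantine
    already_handled.add(finding_id)
    return "isolate"

print(json.dumps(EVENT_PATTERN))
```

In a real Lambda the set of handled finding IDs would live in a durable store, since function memory does not persist reliably between invocations.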
L234. Macie flags that an S3 bucket contains unencrypted PII that should not be there. What is your containment and remediation sequence?
Contain first, stop the exposure:
- Lock down access: enable S3 Block Public Access, remove any public or over-broad bucket policy or ACL, and confirm there is no external access via IAM Access Analyzer.
- Assess scope: use the Macie finding to identify which objects and what PII type, then review CloudTrail data events and S3 server access logs to see if anyone already read them (a confirmed read may trigger breach-notification obligations).
Remediate:
- Encrypt the data: apply SSE-KMS (re-copy objects to apply the new key, and set default bucket encryption going forward).
- If the PII should not be there at all, move it to the correct classified or restricted bucket, or securely delete it per policy.
- Tighten the bucket policy to least privilege and enforce aws:SecureTransport (HTTPS-only).
Prevent recurrence: enforce default encryption via an SCP or AWS Config rule, keep Macie scanning scheduled, and add data classification at ingestion.
Think containment like stopping a leak before mopping the floor.
Interview tip: Check access logs, the exposure may be a reportable breach.
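Enforcing aws:SecureTransport usually means adding a Deny statement for any request made without TLS. A minimal sketch; the bucket name and function name are placeholders.

```python
import json

def https_only_statement(bucket):
    """Deny any S3 request to the bucket that does not use TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [https_only_statement("example-pii-bucket")],
}
print(json.dumps(bucket_policy, indent=2))
```

Because it is a Deny, the statement can be appended to an existing bucket policy without widening anyone's access.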
L335. A user reports they suddenly lost access to a resource they had yesterday, with no IAM policy change on their user. How do you trace the cause across SCPs, boundaries, and resource policies?
If the user's own IAM policy did not change, the deny is coming from another layer. Trace top-down:
- CloudTrail: find the failed call. The errorCode and errorMessage often state the source, for example "with an explicit deny in a service control policy" versus a resource-based or boundary deny. This alone usually pinpoints the layer.
- SCP or RCP: check whether a new org policy was attached to the account or OU, or an existing one edited (a tightened condition or an added Deny). Org-level changes silently affect every user.
- Permission boundary: see whether a boundary on the role or user was modified to exclude the action.
- Resource policy: inspect the resource (S3 bucket policy, KMS key policy, VPC endpoint policy) for a new explicit Deny or a tightened aws:SourceVpce, IP, or aws:PrincipalOrgID condition.
Also check time-based conditions and IAM Identity Center permission-set changes. Use the IAM policy simulator to confirm the fix.
It is detective work, the gate that closed is not always the one nearest the user.
Interview tip: CloudTrail's deny message names the policy layer, read it first.
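Reading the deny message can be partly automated. The sketch below maps phrases that commonly appear in AccessDenied messages to a policy layer; the phrase list is a best-effort assumption (exact wording can vary by service), and the function name is hypothetical.

```python
# Phrases seen in AccessDenied messages, mapped to the layer they point at.
# Treat this mapping as an assumption to verify against your own logs.
LAYER_PHRASES = {
    "service control policy": "SCP",
    "resource control policy": "RCP",
    "permissions boundary": "permissions boundary",
    "resource-based policy": "resource policy",
    "identity-based policy": "identity policy",
    "session policy": "session policy",
    "vpc endpoint policy": "VPC endpoint policy",
}

def blocking_layer(error_message):
    """Return (layer, kind) where kind is 'explicit deny' or 'implicit deny'."""
    text = error_message.lower()
    kind = "explicit deny" if "explicit deny" in text else "implicit deny"
    for phrase, layer in LAYER_PHRASES.items():
        if phrase in text:
            return layer, kind
    return "unknown", kind

message = (
    "User: arn:aws:iam::111122223333:user/priya is not authorized to perform: "
    "s3:GetObject on resource: arn:aws:s3:::reports/q1.csv "
    "with an explicit deny in a service control policy"
)
print(blocking_layer(message))
```

An "unknown" result is itself a signal: fall back to walking the layers by hand in the order listed above.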
L236. After an EC2 compromise, you must rebuild cleanly. Walk through eradication and recovery from a known-good AMI plus the guardrails you add afterward.
Eradicate, assume the host is fully untrusted:
- Confirm forensics are captured (EBS snapshot plus memory), then do not clean in place: malware and persistence may hide in cron, systemd units, kernel modules, or backdoored binaries.
- Terminate the compromised instance, revoke its IAM role, and rotate every credential, key, and secret it could have touched.
Recover from a known-good AMI:
- Launch a fresh instance from a hardened, patched golden AMI (built by your pipeline, not the victim's image).
- Re-deploy application code from version control or an artifact repository, and restore data from a clean backup taken before the compromise.
- Validate with vulnerability and integrity scans before returning it to service.
Guardrails afterward: enforce IMDSv2, tighten security groups to least privilege, patch the entry vector, enable GuardDuty and Inspector, use SSM Session Manager instead of open SSH, and add detective AWS Config rules.
It is like rebuilding a house after a break-in, new locks, not just a swept floor.
Interview tip: Rebuild, never clean, and restore from a pre-incident backup.
L337. Describe a real situation where you balanced a security control against cost or developer velocity, and how you reached the final decision. (STAR format)
Use the STAR structure so the answer stays crisp:
- Situation: our dev teams wanted broad wildcard (*) IAM permissions in the sandbox account to move fast, but security wanted strict least-privilege everywhere, which was slowing releases and generating constant access tickets.
- Task: I had to protect production-grade data without becoming the bottleneck developers blamed.
- Action: I segmented by risk. In the isolated sandbox (no real data, capped by SCP guardrails and a budget alarm) I allowed wide permissions for velocity; in prod I enforced least privilege, permission boundaries, and CI/CD-only deploys. I automated access via IAM Identity Center permission sets so requests were self-service rather than tickets.
- Result: release friction dropped, prod stayed locked down, and we passed audit. Cost stayed flat because SCPs blocked expensive regions and services in the sandbox.
The lesson: security and velocity are not opposites, you tier the controls to the risk.
Interview tip: Pick one real story, quantify the result, and show a trade-off you owned.
Shared Responsibility & IAM Fundamentals (9)
L138. Explain the AWS Shared Responsibility Model. What does AWS secure versus what the customer secures?
The Shared Responsibility Model splits security between AWS and you. The easy way to remember it: AWS is responsible for security of the cloud; you are responsible for security in the cloud.
- AWS (security OF the cloud): the physical data centres, hardware, the hypervisor, the networking backbone, and the managed-service software. You never patch their servers or guard their buildings.
- Customer (security IN the cloud): your data, IAM users and permissions, guest-OS patching (on EC2), security group and firewall rules, encryption choices, and application code.
Think of AWS as the landlord who secures the building, locks, and walls; you are the tenant who locks your own flat and decides who gets a key. A misconfigured S3 bucket or a leaked access key is your fault, not AWS's.
Interview tip: Always say the split shifts depending on the service — the more managed the service, the more work moves to AWS.
L139. What is the difference between an IAM user, an IAM group, and an IAM role?
All three control access, but they work differently:
- IAM user: a permanent identity for one person or workload, with long-lived credentials (a console password and/or access keys). Example: a developer named priya.
- IAM group: a collection of users that share the same permissions. You attach a policy once to the group instead of to each user. Example: an Admins group. A group is not an identity: nothing can sign in as a group, and groups cannot be nested.
- IAM role: an identity with no permanent credentials of its own. It is assumed by a trusted entity (an EC2 instance, a Lambda function, a federated user, or another account), and STS hands out short-lived credentials for each session.
Analogy: a user is your named office ID badge, a group is a department everyone in it inherits access from, and a role is a temporary visitor pass you borrow for a specific task.
Interview tip: Best practice — give EC2/Lambda roles (and prefer human access via federation/IAM Identity Center), never hardcode user access keys.
L140. What are the main components of an IAM policy JSON document? Walk me through Effect, Action, Resource, and Condition.
An IAM policy is a JSON document made of one or more statements. Each statement has these core elements:
- Effect: either Allow or Deny. Does this statement grant or block access?
- Action: the API operations being controlled, like s3:GetObject or ec2:StartInstances. Wildcards like s3:* are allowed but risky.
- Resource: the ARN(s) the action applies to, e.g. arn:aws:s3:::reports/*.
- Condition: optional rules that must evaluate true, like aws:SourceIp or aws:MultiFactorAuthPresent.
Read it as a sentence: Effect (allow) this Action (read) on this Resource (these objects) only when the Condition is met (from the office IP). Remember the evaluation order: an explicit Deny always overrides any Allow.
Interview tip: Mention Sid and Principal too — Principal appears only in resource-based (and trust) policies, never in identity-based ones.
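The "explicit Deny overrides any Allow" rule can be shown with a deliberately simplified evaluator. This sketch handles a single policy and only the Effect, Action, and Resource elements; it ignores Condition, NotAction, Principal, and the case-insensitivity of real action matching, and every name in it is hypothetical.

```python
from fnmatch import fnmatchcase

def _matches(patterns, value):
    """Wildcard match; accepts a single string or a list of strings."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)

def evaluate(policy, action, resource):
    """Simplified single-policy decision: explicit Deny > Allow > implicit deny."""
    decision = "ImplicitDeny"
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"      # a matching Deny ends evaluation immediately
            decision = "Allow"
    return decision

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/secret/*"},
    ],
}

print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::reports/secret/plan.csv"))
print(evaluate(policy, "s3:PutObject", "arn:aws:s3:::reports/q1.csv"))
```

The second call matches both statements, and the Deny wins even though the Allow appears first; the third matches nothing and falls through to the implicit deny.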
L241. What is the difference between an identity-based policy and a resource-based policy? Give an example of each.
Both grant permissions, but they are attached to different things and answer different questions.
- Identity-based policy: attached to an IAM user, group, or role. It answers what can this identity do? It has no Principal element because the principal is whoever the policy is attached to. Example: a policy on user priya allowing s3:GetObject on the reports bucket.
- Resource-based policy: attached to a resource (S3 bucket, SQS queue, Lambda function, KMS key). It answers who can touch this resource? and must include a Principal. Example: an S3 bucket policy allowing account 222222222222 to read objects.
Analogy: identity-based is your ID badge listing rooms you may enter; resource-based is the guest list pinned to a specific room's door. Resource-based policies also enable cross-account access without the caller having to assume a role.
Interview tip: For cross-account access, a resource-based policy avoids an extra role-assumption hop. Note KMS key policies are mandatory — a key cannot be used unless its key policy permits it.
L242. How does the Shared Responsibility Model shift between EC2, RDS, S3, and Lambda? Where does the customer carry the most responsibility?
The more managed the service, the less you handle. The model slides along that scale:
- EC2 (IaaS): heaviest customer load. You patch the guest OS, configure security groups, manage encryption, and harden the app. Most customer responsibility lives here.
- RDS (managed DB): AWS patches and backs up the database engine and the underlying OS. You still handle network access, encryption settings, credentials, and parameter groups.
- S3 (managed storage): AWS runs the storage platform; you control bucket policies, Block Public Access, and encryption. Most breaches here are misconfiguration.
- Lambda (serverless): AWS owns the OS and runtime entirely. You only secure your code, the execution role, and your dependencies.
Think of it as renting a car (EC2: you drive and fuel) versus taking a taxi (Lambda: just say where to go).
Interview tip: Say "as you move from IaaS to serverless, responsibility shifts toward AWS — but your data and IAM are always yours."
L243. What is the difference between an inline policy and a managed policy, and when would you prefer one over the other?
Both are identity-based policies, but they differ in reusability and lifecycle.
- Managed policy: a standalone object with its own ARN that you attach to many users, groups, or roles. Two kinds: AWS-managed (maintained by AWS, e.g. AmazonS3ReadOnlyAccess) and customer-managed (you create and version it). One edit updates everyone attached, and it supports versioning and rollback (up to 5 stored versions).
- Inline policy: embedded directly inside a single user, group, or role. It has a strict 1-to-1 lifecycle: delete the identity and the policy is deleted with it. No reuse, no separate ARN, no versioning.
Prefer managed policies for almost everything — reusable, auditable, versioned. Use inline only when you need a permission tightly bound to one identity that must never leak elsewhere (e.g. a strict 1:1 service role).
Interview tip: Say managed = default best practice; inline = enforce a strict one-to-one relationship.
L144. Why is the root account a special risk, and what controls would you put around it?
The root user is the email-based identity created when you open the AWS account. It has unrestricted power — IAM identity policies cannot limit it, and a few actions (closing the account, changing the account name/email, some billing/support changes) require it. If root credentials leak, the attacker owns everything: it is an account-level master identity, not just a normal admin.
Controls to lock it down:
- Enable MFA on root — ideally a hardware security key; AWS now supports multiple MFA devices per root user.
- Never use root for daily work — create IAM admin roles (via IAM Identity Center) instead.
- Delete root access keys entirely (root should have none).
- Use a strong, unique password and a monitored email; alert on root sign-in via CloudTrail and EventBridge.
- In AWS Organizations, apply Service Control Policies (and, since 2024, centrally managed root access) to restrict and even remove member-account root credentials.
Analogy: root is the master key to the whole building — you store it in a safe and use everyday keys for daily work.
Interview tip: Stress "MFA on root + no root access keys + alert on root sign-in," and mention Organizations centralized root management for member accounts.
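An SCP that restricts member-account root users is commonly written as a Deny conditioned on aws:PrincipalArn. The sketch below shows that shape; scp_blocks is a hypothetical toy check of the one condition, not a full policy evaluation.

```python
from fnmatch import fnmatchcase

ROOT_DENY_SCP = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyRootUser",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"StringLike": {"aws:PrincipalArn": ["arn:aws:iam::*:root"]}},
    }],
}

def scp_blocks(principal_arn):
    """Toy evaluation of the StringLike condition above against one caller ARN."""
    patterns = ROOT_DENY_SCP["Statement"][0]["Condition"]["StringLike"]["aws:PrincipalArn"]
    return any(fnmatchcase(principal_arn, p) for p in patterns)

print(scp_blocks("arn:aws:iam::111122223333:root"))        # root user of a member account
print(scp_blocks("arn:aws:iam::111122223333:role/Admin"))  # ordinary admin role
```

SCPs do not apply to the management account, so its root user still needs the MFA, no-access-key, and alerting controls listed above.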
L245. Walk me through how you would explain the principle of least privilege to a developer team that keeps requesting wildcard (*) permissions.
I'd start with the why, not the no. Least privilege means each identity gets only the permissions it needs to do its job — nothing more. A wildcard like s3:* or Action: "*" means that if the key or role is ever leaked, the blast radius is your entire account, not one bucket.
Then I'd make it easy to comply:
- Show the data: use IAM Access Analyzer policy generation to build a policy from the role's actual CloudTrail activity — they keep only what they really call.
- Start broad in dev, tighten before prod: begin permissive, then scope down using IAM last-accessed (Action Last Accessed) data.
- Use guardrails: permission boundaries and SCPs cap the maximum, so even a wildcard request can't escape the box. IAM Access Analyzer can also flag overly permissive or unused access.
Analogy: I won't give you the master key to the whole building just because you need to enter one room.
Interview tip: Name IAM Access Analyzer policy generation, unused-access findings, and last-accessed data as the practical tooling.
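To show developers what "keep only what you really call" means, here is a rough Python sketch of the idea behind activity-based policy generation. It is not how IAM Access Analyzer is implemented: it assumes the service prefix can be read off eventSource and that eventName equals the IAM action, neither of which holds for every service, and the function name is hypothetical.

```python
def policy_from_activity(records):
    """Collect the distinct service:action pairs seen in CloudTrail-style records."""
    actions = set()
    for r in records:
        service = r["eventSource"].split(".")[0]   # "s3.amazonaws.com" -> "s3"
        actions.add(f'{service}:{r["eventName"]}')
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": sorted(actions),
            "Resource": "*",   # placeholder: scope to specific ARNs before use
        }],
    }

# Hypothetical activity for one role over the observation window.
activity = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "dynamodb.amazonaws.com", "eventName": "GetItem"},
]
generated = policy_from_activity(activity)
print(generated["Statement"][0]["Action"])
```

Two actions replace a wildcard, which makes the blast-radius argument concrete for the team.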
L346. Compare AWS shared responsibility against Azure or GCP. What stays constant across clouds and what differs?
All three hyperscalers use the same core idea: the provider secures the underlying infrastructure, and the customer secures their data, identities, and configurations. What stays constant: the provider always owns physical security and the hypervisor; the customer always owns their data, identity and access management, and how they configure services. Data classification and IAM never leave the customer.
What differs is mostly framing and the identity model:
- Azure publishes a clearer per-tier matrix (on-prem / IaaS / PaaS / SaaS) and centres identity on Microsoft Entra ID (formerly Azure AD); SaaS like Microsoft 365 shifts more to Microsoft.
- GCP frames it as shared fate — Google leans in with secure-by-default blueprints, posture tooling, and risk-transfer programs, rather than handing you a cold matrix.
- AWS ties the split tightly to each service's IaaS-to-serverless position.
Interview tip: The killer line: "the model is conceptually identical everywhere — identity, data, and configuration are always the customer's; only the boundary lines and the shared-fate framing shift."
Detective & Edge Protection Controls (9)
L147. What does CloudTrail record, and what is the difference between management events and data events?
AWS CloudTrail is the account's audit log — it records the API calls made in your account: who did what, when, from which IP, and whether it succeeded. It's your who-did-it evidence trail for security and compliance.
It splits events into two main types:
- Management events (control plane): operations that manage resources — creating an EC2 instance, changing a security group, attaching an IAM policy. The first copy of management events is logged free in Event history (last 90 days) and via your first trail.
- Data events (data plane): high-volume operations on the data inside resources, such as s3:GetObject, s3:PutObject, Lambda Invoke, and DynamoDB item access. These are not logged by default (volume and cost) and must be enabled explicitly.
(CloudTrail also has a third type, Insights events, which flag unusual API-call-rate patterns.) Analogy: management events are CCTV of who entered which room; data events are CCTV of every file someone opened inside the room.
Interview tip: If asked "why didn't we see who read the S3 object?" — the answer is that S3 data events weren't enabled.
L148. What does Amazon GuardDuty do, and what data sources does it analyze to detect threats?
Amazon GuardDuty is AWS's managed threat-detection service. It continuously watches your account for malicious or suspicious activity — things like crypto-mining, credential compromise, communication with known-bad IPs, or reconnaissance — and raises findings with a severity score. It uses machine learning, anomaly detection, and curated threat intelligence, with no agents to install.
By default it analyzes log sources you already produce (called foundational data sources):
- VPC Flow Logs — network traffic patterns (GuardDuty consumes these independently; you don't have to enable Flow Logs yourself).
- CloudTrail management events — suspicious API and IAM behaviour.
- DNS query logs — calls to malicious domains.
Optional protection plans extend coverage: S3 Protection (CloudTrail S3 data events), EKS/Kubernetes Protection, RDS Protection (login activity), Lambda Protection (network activity), Malware Protection (EBS and S3 objects), and Runtime Monitoring for EC2/EKS/ECS. Think of it as a smart burglar alarm that reads the logs you already generate.
Interview tip: Memorize the three foundational sources — VPC Flow Logs, CloudTrail management events, and DNS logs — that's the classic answer.
L149. What is the difference between AWS WAF and AWS Shield? Which layer does each protect?
They protect different layers and different threats:
- AWS WAF (Web Application Firewall): works at Layer 7 (HTTP/HTTPS). It inspects the content of web requests and blocks application attacks — SQL injection, cross-site scripting, bad bots — using rules. It guards CloudFront, Application Load Balancer, API Gateway, AppSync, Cognito, App Runner, and Verified Access.
- AWS Shield: protects against DDoS attacks, mainly at Layers 3 and 4 (network/transport floods like SYN or UDP floods). Shield Standard is free and automatic for all customers; Shield Advanced (paid) adds Layer-7 DDoS protection, access to the Shield Response Team (SRT), and DDoS cost protection.
Analogy: Shield is the bouncer stopping a stampede crowd from jamming the door (volume); WAF is the security guard checking each guest's ID for forged or malicious content (per-request inspection). They're complementary, not either/or — Shield Advanced even includes WAF at no extra WAF charge.
Interview tip: WAF = L7 app attacks; Shield = L3/4 DDoS (Advanced extends to L7). Say they're layered together.
L250. How does AWS Config differ from CloudTrail? Explain configuration history versus API audit logging.
They answer different questions and complement each other.
- CloudTrail = who did what, and when? It's an API audit log. It records the action, e.g. user priya called AuthorizeSecurityGroupIngress at 10:42 from this IP. It captures the event, not the resulting state.
- AWS Config = what does this resource look like now, and how did it change over time? It records the configuration state of resources as point-in-time snapshots and a change timeline, and evaluates them against Config rules (e.g. "is this S3 bucket public?"). It captures the state, not who triggered it.
Analogy: CloudTrail is the CCTV showing the action of someone opening a window; Config is the before/after photo of the room showing the window is now open and flagging that it breaks policy.
Interview tip: Pair them — CloudTrail tells you who caused a drift, Config tells you what drifted (and Config can even reference the CloudTrail event that caused the change).
L251. How does AWS Security Hub fit alongside GuardDuty, Config, Macie, and Detective? Describe the detective-controls stack as one story.
Think of it as a security operations team where each service has one job and Security Hub is the manager who reads everyone's reports.
- GuardDuty — the threat detector, flagging malicious activity from logs.
- AWS Config — the compliance auditor, checking resource configurations against rules and benchmarks.
- Amazon Macie — the data-privacy specialist, discovering sensitive data (PII) in S3.
- Amazon Detective — the investigator, building visual link-analysis graphs for root-cause analysis of a finding.
Each generates its own findings. Security Hub aggregates and normalizes all of them (plus partner tools) into one place using the AWS Security Finding Format (ASFF) — and now also supports the open OCSF schema — runs continuous security-standard checks, and gives a prioritized view. The flow: GuardDuty/Macie/Config detect → Security Hub aggregates and prioritizes → Detective investigates → you remediate.
Interview tip: Say "Security Hub is the single pane of glass; the others are specialized sensors feeding it via ASFF."
L252. What is CloudTrail log file integrity validation and why does it matter for forensics?
Log file integrity validation is a CloudTrail feature that lets you prove your log files were not tampered with, deleted, or modified after AWS delivered them. When enabled, CloudTrail creates a SHA-256 hash of each delivered log file and periodically writes a digest file that is digitally signed (SHA-256 with RSA) using a private key. Each digest also references the previous one, forming a chain, so a missing or altered file is detectable.
Why it matters for forensics:
- Court-grade evidence: during an incident you must prove the logs are authentic — an attacker who gains access often tries to wipe their tracks.
- Detects deletion: the chained digests reveal if a whole log file went missing.
- Chain of custody: you can verify with the AWS CLI (aws cloudtrail validate-logs).
Analogy: it's a tamper-evident seal on each evidence bag — break the seal and everyone can tell. (Pair it with an S3 bucket that uses Object Lock / MFA Delete to harden storage further.)
Interview tip: Mention SHA-256 hashing + signed, chained digest files + protecting against insider/attacker log tampering.
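The chaining idea can be illustrated with a toy hash chain. This sketch is not the real CloudTrail digest format and omits the RSA signature entirely; it only shows why a modified or missing file breaks validation. All names are hypothetical.

```python
import hashlib

def make_digest(log_bytes, previous_entry_hash):
    """Digest entry = hash of the log file plus a pointer to the previous digest."""
    log_hash = hashlib.sha256(log_bytes).hexdigest()
    entry_hash = hashlib.sha256((log_hash + previous_entry_hash).encode()).hexdigest()
    return {"log_hash": log_hash, "previous": previous_entry_hash, "entry_hash": entry_hash}

def verify_chain(log_files, digests):
    """Return the index of the first log that fails validation, or None if all pass."""
    previous = ""
    for i, (log_bytes, digest) in enumerate(zip(log_files, digests)):
        if make_digest(log_bytes, previous) != digest:
            return i
        previous = digest["entry_hash"]
    if len(log_files) != len(digests):
        return min(len(log_files), len(digests))   # a file or digest is missing
    return None

# Build a three-entry chain from placeholder log contents.
logs = [b"event batch 1", b"event batch 2", b"event batch 3"]
digests, prev = [], ""
for blob in logs:
    entry = make_digest(blob, prev)
    digests.append(entry)
    prev = entry["entry_hash"]

print(verify_chain(logs, digests))
```

In the real feature the signature on each digest is what stops an attacker from simply recomputing the chain after editing a log.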
L253. How would you design WAF rules to mitigate SQL injection, XSS, and an application-layer flood? Cover managed, custom, and rate-based rules.
I'd layer three rule types in a single WAF web ACL, ordered carefully since WAF evaluates rules by numeric priority.
- Managed rule groups (first line): enable AWS Managed Rules — the SQL database rule group for SQL injection, plus the Core rule set (CRS) and Known Bad Inputs for XSS and common exploits. These are maintained by AWS and cover the common patterns instantly.
- Custom rules (tuning): write app-specific rules — block requests where a field doesn't match an expected pattern, restrict by geo, allowlist URI paths, and add custom string/regex match conditions managed groups miss. Use these to fix false positives via scope-down statements too.
- Rate-based rules (flood): for an application-layer (L7) flood, a rate-based rule counts requests per source IP over a sliding window and blocks IPs exceeding a threshold (e.g. 2000 requests per 5 minutes), optionally scoped to a path like /login. You can also aggregate by custom keys (e.g. header or cookie) and add a CAPTCHA action.
I'd deploy each new rule in Count mode first to tune, then switch to Block.
Interview tip: Always mention testing in Count mode before Block to avoid blocking real users.
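The behaviour of a rate-based rule scoped to /login can be modelled with a small per-IP sliding-window counter. This is a toy model for explanation only: AWS WAF does its own counting, which is approximate, and the class and method names here are hypothetical.

```python
from collections import defaultdict, deque

class RateBasedRule:
    """Toy per-IP sliding-window counter with a path scope-down."""

    def __init__(self, limit=2000, window_seconds=300, path_prefix="/login"):
        self.limit = limit
        self.window = window_seconds
        self.path_prefix = path_prefix
        self.hits = defaultdict(deque)   # source IP -> timestamps of in-scope requests

    def action(self, ip, path, now):
        """Return 'BLOCK' once an IP exceeds the limit inside the window, else 'ALLOW'."""
        if not path.startswith(self.path_prefix):
            return "ALLOW"                       # outside the scope-down statement
        q = self.hits[ip]
        while q and now - q[0] >= self.window:   # drop requests older than the window
            q.popleft()
        q.append(now)
        return "BLOCK" if len(q) > self.limit else "ALLOW"

# Small numbers so the behaviour is easy to follow.
rule = RateBasedRule(limit=3, window_seconds=10)
results = [rule.action("203.0.113.7", "/login", t) for t in (0, 1, 2, 3)]
print(results)
```

Swapping the BLOCK return for a counter increment is the equivalent of running the rule in Count mode while you tune the threshold.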
L354. Position Security Hub plus Config as a continuous CSPM/CNAPP capability against CIS Benchmarks. How would you auto-detect and auto-fix misconfigurations at scale?
Together, AWS Config + Security Hub form a native Cloud Security Posture Management (CSPM) layer. Config continuously records resource state and evaluates it against rules; Security Hub runs packaged standards — the CIS AWS Foundations Benchmark, AWS Foundational Security Best Practices (FSBP), NIST 800-53, PCI DSS — and produces a normalized, scored compliance view across all accounts.
To auto-detect and auto-fix at scale:
- Detect: enable Security Hub standards org-wide through a delegated administrator account; Config rules continuously flag drift (public S3, open security group, unencrypted EBS).
- Auto-fix: route findings via EventBridge to SSM Automation runbooks or Lambda for remediation (close the port, enable encryption, block public access), or use Config remediation actions for direct rule-to-fix mapping. AWS also ships pre-built Automated Security Response (ASR) playbooks.
- Scale: deploy via AWS Organizations plus Config Conformance Packs so every new account inherits the baseline.
For full CNAPP, fold in GuardDuty (runtime threats), Inspector (vulnerability scanning), and Macie (sensitive data).
Interview tip: The pattern to name is finding → EventBridge → SSM/Lambda auto-remediation.
L355. As your environment grows, GuardDuty and Security Hub generate too many findings. How do you reduce false positives without losing real signal?
The goal is to cut noise without muting a real attack. I'd tackle it in layers:
- Suppress, don't ignore: in GuardDuty create suppression rules for known-benign patterns (e.g. an approved vulnerability scanner's recon), and in Security Hub use automation rules to auto-set workflow status or severity for expected findings — they're still recorded, just not alerted on.
- Filter and prioritize: use trusted IP lists in GuardDuty so your own scanners aren't flagged; in Security Hub filter by severity and resource criticality, and route only high/critical findings to the SOC.
- Consolidate: aggregate cross-account findings into one delegated-administrator account (and a single aggregation Region) so duplicates collapse.
- Tune continuously: review suppressed categories monthly so a suppression doesn't quietly hide a real incident; alert on volume spikes and on any change to suppression rules.
Analogy: you don't unplug the smoke alarm — you stop it triggering on toast while keeping it loud for a real fire.
Interview tip: Stress "suppress and document, never disable detection" — auditors check that.
20-minute drill: Pick one question from each section, set a 90-second timer, and answer out loud. If you can sketch the key AWS Security diagram from memory and land each 👉 Interview tip, you’re interview-ready.