Most engineers think a security group is the VPC firewall — "I locked down the SG, so my VPC is firewalled." So they picture one control doing every job: blocking inbound, blocking outbound, and catching attacks.
Wrong — and this gap is exactly what gets exploited. A security group is a stateful allow-list on ONE elastic network interface; a NACL is a stateless allow/deny list on a subnet boundary. Neither does deep inspection, IPS signatures, or domain-based egress filtering. When an attacker is already inside and quietly beaconing out to a random domain, only a real IPS-grade, stateful, VPC-level control catches it. That control is AWS Network Firewall — a managed Suricata engine that sits in its own subnet and inspects traffic the SG and NACL never see in detail.
① The VPC security layers — SG, NACL & Network Firewall
Meet Sneha, an L1 cloud engineer at Flipkart. Her VPC 10.20.0.0/16 hosts a payments app. On day one she's told "the VPC is secured — the security groups are tight." Then an incident lands: a compromised container is quietly sending data out to data-collector.unknown-site.example over port 443. The security group allowed all outbound (the default), so nothing stopped it. This is the lesson in one story: the three VPC layers do different jobs, and most leaks happen in the gap between them.
Layer one is the security group (SG). It's attached to an ENI (an instance, a load balancer, an RDS endpoint). It is stateful — if you allow an inbound request, the reply is allowed automatically — and it only has allow rules. Think of it as the guard standing at your individual flat's door. The classic default mistake: outbound is left wide open (0.0.0.0/0 allow all), so a host can talk to anything on the internet.
Layer two is the network ACL (NACL). It sits on the subnet boundary, is stateless (you must write both the inbound AND the matching outbound/ephemeral-port rule yourself), and supports explicit deny. Think of it as the society's main gate register — it sees everyone entering or leaving the whole building block, but it only checks IP, port and protocol numbers. It still can't tell that one HTTPS flow is a legitimate API call and another is exfiltration to a strange domain.
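To make the stateless point concrete, here is a minimal Python sketch of how a NACL-style rule list might be evaluated. It is a simplified model, not AWS code: the `NaclRule` class, the rule numbers and the ephemeral port `50123` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower numbers are evaluated first
    direction: str   # "in" or "out"
    port_from: int
    port_to: int
    action: str      # "allow" or "deny"


def nacl_decision(rules, direction, port):
    """First matching rule (lowest number) wins; no match means implicit deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.direction == direction and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


# Inbound HTTPS is allowed, but nobody wrote the outbound rule for the reply.
rules = [NaclRule(100, "in", 443, 443, "allow")]
client_port = 50123  # hypothetical ephemeral port chosen by the client

request_in = nacl_decision(rules, "in", 443)
reply_out_before = nacl_decision(rules, "out", client_port)

# Add the matching outbound rule for the ephemeral port range.
rules.append(NaclRule(100, "out", 1024, 65535, "allow"))
reply_out_after = nacl_decision(rules, "out", client_port)

print(request_in, reply_out_before, reply_out_after)
```

A security group would not need the second rule: because it tracks connection state, the reply to an allowed request is permitted automatically.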
Layer three is AWS Network Firewall. It is a managed service — AWS runs the engine, scales it, and patches it — that you deploy as firewall endpoints inside dedicated subnets of your VPC. Under the hood it runs a Suricata-compatible engine, so it can match attack signatures (true IPS), filter by domain name (the SNI in a TLS handshake or the HTTP Host header), and — if you enable it — do TLS inspection to look inside encrypted traffic. This is the layer that catches Sneha's exfiltration.
The three layers, one card each
If you can rattle these off, half the SCS-C02 Infrastructure Security domain stops being scary.
Security group: stateful, attached to an ENI, allow-only. Reply traffic auto-allowed. So: per-server guard, IP/port only, blind to content.
NACL: stateless, on the subnet, allow + deny, both directions written by hand. So: cheap broad deny, still IP/port only.
Network Firewall: managed, stateful, VPC-level Suricata IPS. Domain filtering + optional TLS inspection. So: the only layer that reads what's inside.
Defence in depth: use all three. SG for the host, NACL for blunt subnet deny, firewall for inspection + egress. So: a leak must beat every layer.
Picture a Bengaluru apartment society. The security group is the lock on your individual flat — it decides who gets into your door. The NACL is the society's main-gate register that notes everyone entering or leaving the whole block, but only checks the vehicle number (IP/port). The Network Firewall is the trained guard who actually opens the delivery box, checks WHAT is being carried out, and recognises the courier's company name (the domain). A thief can have a valid vehicle number and still get caught at the third check — which is exactly why you need all three.
Rahul at Infosys says: "My EC2 has a tight security group and the subnet has a NACL. Am I protected against an infected host beaconing out to a malicious domain on 443?" Best answer?
Pause & Predict
Predict: a security group is stateful but a NACL is stateless. If you ALLOW inbound HTTPS (443) on a NACL, will the reply traffic flow back automatically?
② Architecture patterns — inspection VPC, TGW & the route-table dance
One firewall per VPC works for a single app, but real estates have dozens of VPCs across many accounts. You don't deploy a firewall in each — you build a central inspection VPC and route everyone's traffic through it. The hub that makes this possible is the Transit Gateway (TGW): every spoke VPC attaches once, and the TGW route tables steer traffic into the inspection VPC before it goes anywhere else.
Inside the inspection VPC, AWS requires a specific subnet layout. In each Availability Zone you put two subnets: one for the TGW attachment and a separate one that holds only the firewall endpoint. That firewall subnet must contain no other resources — Network Firewall cannot inspect traffic that originates in the same subnet as its own endpoint. You also deploy an endpoint per AZ for resilience: if you only put one in AZ-a, an AZ-a outage takes your inspection offline.
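The two-subnets-per-AZ layout can be planned with a few lines of Python. This is only a sketch under stated assumptions: the inspection VPC CIDR `10.100.0.0/16`, the `/28` subnet size and the three `ap-south-1` AZ names are illustrative choices, and `plan_inspection_subnets` is a hypothetical helper, not an AWS API.

```python
import ipaddress


def plan_inspection_subnets(vpc_cidr, azs, prefix=28):
    """Carve two non-overlapping subnets per AZ out of the inspection VPC."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=prefix)
    plan = {}
    for az in azs:
        plan[az] = {
            "tgw_attachment": str(next(blocks)),     # holds the TGW attachment
            "firewall_endpoint": str(next(blocks)),  # holds ONLY the firewall endpoint
        }
    return plan


plan = plan_inspection_subnets(
    "10.100.0.0/16", ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
)
for az, subnets in plan.items():
    print(az, subnets)
```

Keeping the firewall subnet as its own block per AZ makes it easy to enforce the rule that nothing else is ever launched there.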
Now the part that trips everyone up: appliance mode. A stateful firewall must see both directions of a flow on the same endpoint. Without appliance mode, the TGW keeps traffic in the Availability Zone it arrived from, so when the source and destination sit in different AZs the request and the reply can be sent to firewall endpoints in different AZs. Each endpoint then sees only half a conversation, has no matching state for it, and drops it. You enable appliance mode on the inspection VPC's TGW attachment so the TGW pins each flow to one AZ, and therefore one endpoint, for its whole life.
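The symmetry problem can be illustrated with a toy model. The sketch below is not the Transit Gateway's actual algorithm: the endpoint names are hypothetical, "stay in the source AZ" is a simplification of behaviour without appliance mode, and the direction-independent SHA-256 hash is just one assumed way to pin both directions of a flow to the same endpoint.

```python
import hashlib

# Hypothetical endpoint names, one per AZ.
ENDPOINTS = ["fw-endpoint-az-a", "fw-endpoint-az-b", "fw-endpoint-az-c"]


def pick_by_source_az(source_az):
    """Modelled behaviour without appliance mode: traffic stays in its source AZ."""
    return f"fw-endpoint-{source_az}"


def pick_by_flow_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Modelled appliance mode: hash the flow so direction does not matter."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}|{b}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return ENDPOINTS[int.from_bytes(digest[:4], "big") % len(ENDPOINTS)]


# A client in az-a talks to a server in az-b.
request = ("10.20.1.10", 50123, "10.30.2.20", 443, "tcp")
reply = ("10.30.2.20", 443, "10.20.1.10", 50123, "tcp")

without_mode = (pick_by_source_az("az-a"), pick_by_source_az("az-b"))
with_mode = (pick_by_flow_hash(*request), pick_by_flow_hash(*reply))

print("without appliance mode:", without_mode)
print("with appliance mode:   ", with_mode)
```

In the first case the request and reply reach different endpoints; in the second both directions resolve to the same one, which is what a stateful engine needs.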
The "route-table dance" is just making the next hop be the firewall. In the inspection VPC, the TGW-attachment subnet's route table sends 0.0.0.0/0 to the firewall endpoint; the firewall subnet's route table then sends the inspected traffic onward (to a NAT gateway for egress, or back to the TGW for east-west). If ReceivedPackets on the firewall stays at zero, it's almost always a route table that doesn't actually point at the endpoint — not a rule problem. Compare this to a third-party NGFW appliance: with a vendor NGFW you run EC2 instances behind a Gateway Load Balancer and own the patching/scaling; with Network Firewall, AWS owns all of that — you only own the rules and routes.
Symptom one: the firewall shows healthy but the ReceivedPackets CloudWatch metric sits at zero — the route table never actually sends traffic to the firewall endpoint, so nothing reaches it. Fix: point the source subnet's route (and the IGW ingress route, for inbound designs) at the firewall VPC endpoint ID. Symptom two: traffic flows but a chunk gets dropped for no obvious reason across AZs — that's asymmetric routing because appliance mode is off on the inspection TGW attachment, so return packets land on a different-AZ endpoint. Fix: enable appliance mode on that attachment.
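The "routes don't point at the endpoint" failure can be checked on paper before touching the console. The following is a minimal sketch that models a route table as a dictionary and applies longest-prefix matching; the endpoint ID `vpce-0123456789abcdef0`, the NAT target `nat-0aaa` and both route tables are made-up examples, not real resources.

```python
import ipaddress


def next_hop(route_table, dest_ip):
    """Longest-prefix match over a {cidr: target} route table."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


FIREWALL_ENDPOINT = "vpce-0123456789abcdef0"  # hypothetical endpoint ID

# Broken: the default route skips the firewall and goes straight to NAT.
broken = {"10.100.0.0/16": "local", "0.0.0.0/0": "nat-0aaa"}
# Fixed: the default route's next hop is the firewall endpoint.
fixed = {"10.100.0.0/16": "local", "0.0.0.0/0": FIREWALL_ENDPOINT}


def bypasses_firewall(route_table, dest_ip="203.0.113.45"):
    """True if internet-bound traffic would never reach the firewall."""
    return next_hop(route_table, dest_ip) != FIREWALL_ENDPOINT


print(bypasses_firewall(broken), bypasses_firewall(fixed))
```

A table like `broken` is what a ReceivedPackets value stuck at zero usually looks like: the firewall is healthy, but no route ever selects it.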
Priya at HCL builds a central inspection VPC with firewall endpoints in three AZs. East-west traffic works most of the time but randomly drops on long-lived connections. Single most likely root cause?
Pause & Predict
Predict: why does AWS insist the firewall endpoint live in its OWN subnet with nothing else in it?
③ Rule groups + egress control — stateless, stateful, Suricata & domains
A firewall does nothing until you give it a firewall policy, which bundles two kinds of rule groups. Stateless rule groups run first and look at each packet alone using the 5-tuple (source/destination IP, source/destination port, protocol). Their job is a quick triage: pass obviously-fine traffic, drop obvious junk, and — crucially — forward everything else to the stateful engine. If you forget to forward, the stateful rules never see the traffic.
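The stateless triage step can be sketched as a small priority-ordered matcher. This is a simplified model, not the service's implementation: the `Packet` and `StatelessRule` classes, the action strings and the example rules are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    proto: str


@dataclass(frozen=True)
class StatelessRule:
    priority: int
    dst_cidr: str
    dport: Optional[int]   # None matches any port
    proto: Optional[str]   # None matches any protocol
    action: str            # "pass", "drop" or "forward_to_stateful"


def triage(rules, packet, default_action="forward_to_stateful"):
    """Return the action of the first matching rule, else the default action."""
    dst = ipaddress.ip_address(packet.dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (dst in ipaddress.ip_network(rule.dst_cidr)
                and rule.dport in (None, packet.dport)
                and rule.proto in (None, packet.proto)):
            return rule.action
    return default_action


https = Packet("10.20.10.5", "203.0.113.45", 443, "tcp")

# Sensible triage: drop obvious junk (telnet), forward everything else.
good_rules = [StatelessRule(10, "0.0.0.0/0", 23, "tcp", "drop")]
# The classic mistake: a blanket "pass" means stateful rules never run.
bad_rules = [StatelessRule(1, "0.0.0.0/0", None, None, "pass")]

print(triage(good_rules, https), triage(bad_rules, https))
```

With `bad_rules`, the HTTPS packet is passed at the stateless stage, so no domain or signature rule ever gets a chance to inspect it.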
Stateful rule groups are where the real inspection happens. They run on the Suricata engine and come in three flavours: a 5-tuple standard group, a domain list group (allow or deny by domain), and raw Suricata rule strings for full control. You also choose an evaluation order: strict order (recommended — rules run in the priority you set, and you declare default actions for non-matching packets) versus the older default action-order.
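As a rough mental model of strict order, the sketch below walks rules in priority order, treats pass and drop as terminating, lets alert log and continue, and falls back to a default action. It is a deliberate simplification: rules match an exact hostname only, the plain `"drop"` default stands in for the real default-action settings, and `evaluate_strict` is a hypothetical function.

```python
def evaluate_strict(rules, sni, default_action="drop"):
    """rules: list of (priority, action, domain); action is pass, drop or alert.

    Returns (final_action, alerted_domains).
    """
    alerts = []
    for _priority, action, domain in sorted(rules):
        if sni != domain:
            continue
        if action == "alert":
            alerts.append(domain)  # alert logs the match and keeps evaluating
            continue
        return action, alerts      # pass/drop end evaluation for this flow
    return default_action, alerts


rules = [
    (1, "alert", "api.partner.example"),  # log first ...
    (2, "pass", "api.partner.example"),   # ... then allow
]

print(evaluate_strict(rules, "api.partner.example"))
print(evaluate_strict(rules, "data-collector.unknown-site.example"))
```

The ordering matters: placing the alert rule before the pass rule is what produces a log line for traffic that is ultimately allowed.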
Here's the egress-control move that stops data exfiltration. A stateful domain list rule group lets you write an allow-list: only let traffic out to .amazonaws.com, your partner APIs, and your package mirrors — everything else is dropped. For HTTPS the firewall reads the SNI in the TLS handshake; for HTTP it reads the Host header. By default the domain rule group only inspects traffic from the VPC's own CIDR — to inspect other CIDRs (a centralized design) you must widen the HOME_NET variable. A freshly-shipped 2025 feature even lets domain rules use Reject and Alert actions, not just drop.
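The allow-list decision itself is easy to sketch. The Python below assumes the leading-dot convention (a pattern such as `.amazonaws.com` covers the domain and all of its subdomains, while a name without the dot matches exactly); `domain_matches` and `egress_verdict` are hypothetical helpers that model the decision, not the firewall's own code.

```python
def domain_matches(hostname, pattern):
    """Leading-dot patterns match the base domain and any subdomain."""
    hostname = hostname.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        return hostname == pattern[1:] or hostname.endswith(pattern)
    return hostname == pattern


ALLOW = [".amazonaws.com", ".ubuntu.com", ".flipkart-internal.example"]


def egress_verdict(sni, allow_list=ALLOW):
    """Pass only names on the allow-list; everything else is dropped."""
    return "pass" if any(domain_matches(sni, p) for p in allow_list) else "drop"


for name in [
    "s3.ap-south-1.amazonaws.com",
    "data-collector.unknown-site.example",
    "evilamazonaws.com",
]:
    print(name, "->", egress_verdict(name))
```

Note the third case: a lookalike name that merely ends in the same letters is not a subdomain, so a suffix check has to include the dot.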
You don't have to write every attack signature yourself. AWS publishes managed threat-signature rule groups (botnet command-and-control domains, known malware signatures) that AWS keeps updated — you just add them to the policy. And every decision can be logged: Network Firewall writes FLOW logs (a record of every flow) and ALERT logs (only traffic that matched a drop/alert rule) to one of three destinations — an S3 bucket, a CloudWatch Logs group, or a Kinesis Data Firehose. Logging is what turns the firewall from a silent gate into evidence you can hunt through.
# A raw Suricata rule string in a stateful rule group (strict order).
# Drops any HTTPS to the bad SNI and writes an alert log line.
drop tls $HOME_NET any -> $EXTERNAL_NET any (tls.sni; \
    content:"data-collector.unknown-site.example"; \
    msg:"Blocked exfil to unknown domain"; sid:1000001; rev:1;)

# Same idea as a domain-list ALLOW (egress allow-list, deny everything else):
#   Type: Domain list | Action: Allow
#   Domains: .amazonaws.com, .flipkart-internal.example, .ubuntu.com
# CloudWatch ALERT log (JSON, trimmed) when the EC2 host tries the bad domain:
{"event_timestamp":"1749600000","event":{
"src_ip":"10.20.10.5","dest_ip":"203.0.113.45","app_proto":"tls",
"tls":{"sni":"data-collector.unknown-site.example"},
"alert":{"action":"blocked","signature":"Blocked exfil to unknown domain","signature_id":1000001}}}

Watch one egress packet run the firewall policy
An EC2 host at 10.20.10.5 opens HTTPS to an unknown domain. Follow it through the stateless triage, the stateful engine and the domain check.
Three quick proofs. (1) In CloudWatch, the firewall's ReceivedPackets metric should be climbing — if it's zero, your routes don't point at the endpoint. (2) Generate a test hit to a domain that's NOT allow-listed and confirm an ALERT log line appears in your S3/CloudWatch destination. (3) Check the firewall's status is READY and the policy shows your stateful rule group attached. If all three hold, traffic is genuinely being inspected — not just routed past a firewall that's doing nothing.
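For proof (2), a few lines of Python are enough to pull the useful fields out of an ALERT log line. This sketch parses the trimmed sample shown earlier in this lesson; real log lines carry more fields, and `summarize_alert` is a hypothetical helper.

```python
import json

# The trimmed ALERT log sample from this lesson.
raw = """{"event_timestamp":"1749600000","event":{
"src_ip":"10.20.10.5","dest_ip":"203.0.113.45","app_proto":"tls",
"tls":{"sni":"data-collector.unknown-site.example"},
"alert":{"action":"blocked","signature":"Blocked exfil to unknown domain","signature_id":1000001}}}"""


def summarize_alert(line):
    """Extract who tried to reach what, and what the firewall did about it."""
    event = json.loads(line).get("event", {})
    alert = event.get("alert", {})
    return {
        "src_ip": event.get("src_ip"),
        "sni": event.get("tls", {}).get("sni"),
        "action": alert.get("action"),
        "signature_id": alert.get("signature_id"),
    }


summary = summarize_alert(raw)
print(summary)
```

Using `.get` with defaults keeps the helper from failing on log lines that have no `tls` or `alert` section.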
Karthik at Zomato faces this
Karthik, an L2 analyst, gets an alert that an EC2 worker is talking to a suspicious domain. He adds a stateful domain DENY rule for it, but the next day the same host reaches the domain again — the rule didn't bite.
The traffic came from a different VPC CIDR than the firewall's own VPC. By default the stateful domain rule group's HOME_NET is set to the firewall VPC's CIDR, so traffic sourced from other CIDRs (the spoke VPCs in his centralized design) was never matched by the domain inspection at all.
He checks the FLOW logs and sees the worker's source IP is in a spoke CIDR (10.30.x) that isn't inside the firewall VPC's CIDR (10.100.x). The domain rule simply skipped it.
Amazon VPC Console → Network Firewall → Firewall policies → (policy) → Rule variables → HOME_NET
He widens the HOME_NET rule variable to include every spoke CIDR being inspected (10.20.0.0/16, 10.30.0.0/16, …) alongside the firewall VPC CIDR, so the domain list inspects traffic from all of them.
He re-tests from the spoke worker → the domain is now DROPPED and an ALERT log line appears; FLOW logs confirm the spoke CIDR's egress is being evaluated.
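Karthik's diagnosis boils down to a CIDR membership check, which can be sketched with Python's `ipaddress` module. The worker address `10.30.4.17` and the `/16` sizes for the firewall VPC and spokes are assumptions for illustration; `in_home_net` is a hypothetical helper.

```python
import ipaddress


def in_home_net(src_ip, home_net):
    """True if the source IP falls inside any CIDR listed in HOME_NET."""
    ip = ipaddress.ip_address(src_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in home_net)


default_home_net = ["10.100.0.0/16"]  # firewall VPC only
widened_home_net = ["10.100.0.0/16", "10.20.0.0/16", "10.30.0.0/16"]

worker_ip = "10.30.4.17"  # hypothetical spoke worker

print(in_home_net(worker_ip, default_home_net))  # before the fix
print(in_home_net(worker_ip, widened_home_net))  # after the fix
```

Running the same check against every source IP seen in the FLOW logs is a quick way to spot spoke CIDRs that HOME_NET still misses.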
Neha at Airtel writes a stateless rule that says 'pass' for all traffic and a great set of stateful domain rules. Users report the malicious domains are NOT being blocked. Why?
Pause & Predict
Predict: domain filtering reads the SNI from the TLS handshake without decrypting. So when would you actually need to turn on full TLS inspection?
④ Putting it together — a leak-proof multi-tier VPC & cheat-sheet
Now we assemble everything into a secure multi-tier VPC. Three tiers, three subnet types: a public subnet (only the load balancer and NAT gateway live here), a private app subnet (your EC2/containers, no direct route to the internet gateway), and a data subnet (RDS/databases, the most locked-down, no internet path at all). App and data tiers reach the internet only outbound, via the NAT gateway — and that egress is forced through the Network Firewall first. There is no inbound path from the internet gateway to the app or data tiers.
Wrap it with telemetry: enable VPC Flow Logs on the VPC (to CloudWatch or S3) so you have a record of every accepted/rejected flow, and turn on the Network Firewall's own FLOW + ALERT logging. Now you have two complementary trails: Flow Logs say who talked to whom, and the firewall ALERT logs say what we blocked and why. Together they're what an incident responder actually needs at 2 a.m.
The worked example that ties it off: a "block exfil to unknown domain" egress policy. You create a stateful domain list ALLOW group naming only the handful of domains your app legitimately needs (your AWS service endpoints, your package mirror, your partner API), set the firewall policy's stateful default action to drop, and widen HOME_NET to cover every CIDR you inspect. Result: a compromised host can try to beacon out all it wants — anything not on the short allow-list is dropped and logged. That single pattern defeats the most common cloud-breach step: quiet data exfiltration to attacker infrastructure.
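The rule-group document for this pattern can be assembled and sanity-checked in Python before it is handed to the CLI's `--rule-group` argument. This is a sketch under assumptions: `build_allowlist_rule_group` is a hypothetical helper, the CIDRs are the example ranges used in this lesson, and the `RuleVariables` block shows one way to widen HOME_NET inside the rule group itself.

```python
import json


def build_allowlist_rule_group(domains, home_net_cidrs):
    """Build a domain allow-list rule group with a widened HOME_NET."""
    return {
        "RuleVariables": {
            "IPSets": {"HOME_NET": {"Definition": list(home_net_cidrs)}}
        },
        "RulesSource": {
            "RulesSourceList": {
                "Targets": list(domains),
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "GeneratedRulesType": "ALLOWLIST",
            }
        },
    }


rule_group = build_allowlist_rule_group(
    [".amazonaws.com", ".ubuntu.com", ".flipkart-internal.example"],
    ["10.100.0.0/16", "10.20.0.0/16", "10.30.0.0/16"],
)
payload = json.dumps(rule_group)
print(payload)
```

Generating the JSON from code avoids quoting mistakes in a hand-typed shell string and keeps the allow-list and HOME_NET in one reviewable place.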
# Create the stateful domain allow-list rule group.
aws network-firewall create-rule-group \
  --rule-group-name egress-allowlist-prod --type STATEFUL --capacity 100 \
  --rule-group '{"RulesSource":{"RulesSourceList":{
    "Targets":[".amazonaws.com",".ubuntu.com",".flipkart-internal.example"],
    "TargetTypes":["TLS_SNI","HTTP_HOST"],"GeneratedRulesType":"ALLOWLIST"}}}'

# Output (trimmed):
{
  "RuleGroupResponse": {
    "RuleGroupArn": "arn:aws:network-firewall:ap-south-1:123456789012:stateful-rulegroup/egress-allowlist-prod",
    "RuleGroupName": "egress-allowlist-prod",
    "Type": "STATEFUL",
    "Capacity": 100
  }
}

# Confirm the firewall is ready.
aws network-firewall describe-firewall --firewall-name prod-inspection-fw \
  --query 'FirewallStatus.Status'

# Output:
"READY"

For your certification path, this lesson maps onto the AWS Certified Security – Specialty (SCS-C02) blueprint, specifically Domain 3: Infrastructure Security (20%), the single largest scored domain, which is all about network segmentation, edge protection and secure VPC design. Knowing SG vs NACL vs Network Firewall, the inspection-VPC pattern, and egress domain filtering covers a real slice of it, and the FLOW/ALERT logging ties into Domain 2: Security Logging & Monitoring (18%). This is high-yield exam material, not trivia.
Picture a building that only lets out parcels going to verified addresses. The domain allow-list is that approved address book — a courier (your app) can hand over a parcel only if the destination is on the list, exactly like an Aadhaar-style verification at the gate. Anything addressed to an unknown place is held and a note is logged (the ALERT log). Even if an insider tries to smuggle data out, the gate checks the destination name, not just the vehicle — which is precisely how the firewall's domain egress control beats a thief who has a 'valid' (open) port.
Cold, in 30 seconds: name the three layers and their scope (SG = ENI, NACL = subnet, Network Firewall = VPC inspection); say why an inspection VPC needs appliance mode (return path symmetry); explain the egress allow-list move (domain list Allow + default drop) and why it stops exfiltration; and list the two log types (FLOW and ALERT) and three destinations (S3/CloudWatch/Kinesis). If you can do that without notes, you've got the Infrastructure Security core of SCS-C02.
An interviewer asks Arjun: "In one sentence, what's the single most valuable thing AWS Network Firewall does that a security group and NACL can't?" Best answer?
🧠 In your own words
In one line: why does an egress domain allow-list with a default 'drop' action stop data exfiltration that a tight security group never would?
📖 Glossary
- AWS Network Firewall: Managed, stateful, VPC-level firewall running a Suricata-compatible IPS engine, with domain filtering and optional TLS inspection. Deployed as endpoints in dedicated subnets.
- Security group (SG): Stateful virtual firewall on an ENI. Allow rules only; return traffic is auto-permitted. Judges by IP/port/protocol.
- Network ACL (NACL): Stateless allow/deny list on a subnet boundary. You must write both directions by hand. IP/port/protocol only.
- Inspection VPC: A dedicated VPC that hosts firewall endpoints and inspects traffic for many spoke VPCs in a centralized design.
- Transit Gateway (TGW): A cloud router connecting many VPCs and on-prem networks hub-and-spoke; the hub that routes traffic into the inspection VPC.
- Appliance mode: A TGW attachment setting that pins a flow to one firewall endpoint/AZ for its life, keeping request and reply symmetric. Required for stateful cross-AZ inspection.
- Firewall endpoint: The VPC endpoint Network Firewall creates per AZ. Traffic must be routed to it; its subnet must hold nothing else.
- Stateless rule group: First-pass rules evaluating each packet alone on the 5-tuple. Actions: pass, drop, or forward-to-stateful.
- Stateful rule group: Suricata-engine rules evaluating a packet in its flow context. Home of IPS signatures, 5-tuple, domain lists and raw Suricata strings.
- Domain list rule group: A stateful group that allows or denies traffic by domain (TLS SNI / HTTP Host). The egress allow-list tool.
- HOME_NET: Suricata variable defining your 'internal' network. Defaults to the firewall's VPC CIDR; widen it to inspect other (spoke) CIDRs.
- FLOW vs ALERT logs: FLOW logs record every flow; ALERT logs record only traffic matching a drop/alert rule. Sent to S3, CloudWatch Logs, or Kinesis Data Firehose.
📚 Sources
- AWS Network Firewall Developer Guide — "Network Firewall stateless and stateful rules engines" + "Working with stateful rule groups" (Suricata-compatible engine; stateless 5-tuple triage forwards to stateful; strict order; pass/drop/reject/alert actions). docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-rules-engines.html · docs.aws.amazon.com/network-firewall/latest/developerguide/stateful-rule-groups-ips.html
- AWS Network Firewall Developer Guide — "Stateful domain list rule groups" + "URL and Domain Category Filtering" (domain allow/deny via TLS SNI and HTTP Host; HOME_NET defaults to the deployment VPC CIDR and must be widened to inspect other CIDRs). docs.aws.amazon.com/network-firewall/latest/developerguide/stateful-rule-groups-domain-names.html · docs.aws.amazon.com/network-firewall/latest/developerguide/rule-groups-url-filtering.html
- AWS Whitepaper — "Building a Scalable and Secure Multi-VPC AWS Network Infrastructure": centralized network security with Transit Gateway, the inspection VPC two-subnets-per-AZ layout, appliance mode for symmetric routing, and NAT-gateway centralized egress. docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/centralized-network-security-for-vpc-to-vpc-and-on-premises-to-vpc-traffic.html
- AWS re:Post + AWS Network Firewall Developer Guide troubleshooting — "Troubleshoot issues with Network Firewall rules" / "Configure Network Firewall rules for specific domains" (ReceivedPackets=0 means routes don't point at the endpoint; firewall subnet must contain only the endpoint; no asymmetric routing; stateless must forward to stateful). repost.aws/knowledge-center/network-firewall-troubleshoot-rule-issue · docs.aws.amazon.com/network-firewall/latest/developerguide/troubleshooting-rules.html
- AWS What's New (Sept 2025) — "AWS Network Firewall adds Reject and Alert actions for stateful domain list rule groups" (recent feature giving domain rules more granular actions than drop alone). aws.amazon.com/about-aws/whats-new/2025/09/aws-network-firewall-reject-alert-domain-list-rule-groups/
- AWS Certified Security – Specialty (SCS-C02) Exam Guide — Domain 3 Infrastructure Security (20%, the largest scored domain: network segmentation, edge protection, secure VPC design) and Domain 2 Security Logging & Monitoring (18%). d1.awsstatic.com/training-and-certification/docs-security-spec/AWS-Certified-Security-Specialty_Exam-Guide_C02.pdf
- AWS Networking & Content Delivery Blog — "Deployment models for AWS Network Firewall" (distributed vs centralized inspection, route-table design, comparison with third-party NGFW behind a Gateway Load Balancer). aws.amazon.com/blogs/networking-and-content-delivery/deployment-models-for-aws-network-firewall/
What's next?
Your firewall now blocks and LOGS the bad egress — but who reads those alerts and connects them to a real attack? Next we wire detection and posture management: GuardDuty for threat findings and Security Hub to aggregate, score and act on them.