Why architecture matters — in interview rooms and at 2 AM
Picture an Indian Speed Post sorting centre. The old way: a parcel crosses eight conveyor stations — one weighs it, the next x-rays it, the next checks the label, the next scans the barcode, and so on. Each station re-reads the parcel from scratch. Now picture a single counter where one trained operator opens the parcel, weighs it, x-rays it, reads the label, and routes it — all in one motion. Same work, one pass, no duplicated effort.
That second picture is Palo Alto's SP3 architecture. It's why a PA-5220 can deliver 9 Gbps of threat-prevention throughput with App-ID, IPS, URL filtering, and AV all enabled simultaneously — while legacy "UTM" boxes from the same era topped out around 1–2 Gbps with the same features turned on.
You'll be asked about this in the first round of nearly every Palo Alto interview. More importantly, when you're on a 2 AM bridge call and the customer says "throughput has tanked on the PA-5220", the diagnosis lives in this same architecture, usually in the session-offload behavior we'll cover further down.
The three planes — management, control, data
A PAN-OS firewall is one chassis but three planes, with the management plane and the dataplane running on separate CPU and memory. This separation is the most important architectural fact to commit to memory.
The management plane runs the WebUI you log into, the SSH CLI, the XML API, the commit process, the routing daemon (yes — OSPF / BGP run on MP, not DP), and ships logs to Panorama or syslog. It runs on a regular Intel CPU with its own RAM.
The dataplane is where the firewall actually does its job. It's purpose-built — network processors handle Layer 1–4, security processors do signature matching and SSL crypto, and a flow engine offloads established sessions. On larger models (PA-5450, PA-7050, PA-7080) you get multiple dataplanes per chassis — flows are distributed across them.
Between them sits a small control plane — the messaging fabric that lets MP push config updates to DP and lets DP send "I saw this user-to-IP mapping" lookups back to MP. On smaller boxes (PA-400 series) the control logic is folded into the management CPU; on larger boxes it's distinct silicon.
📍 Scenario · Rahul, NOC engineer at TCS Mumbai
Rahul commits a Security Policy change to the active PA-5220 at 14:30 IST, peak hours. The business team panics: "Will the trading session go down?" Reality: the commit takes 47 seconds on MP. During those 47 seconds, the DP keeps forwarding traffic using the previous ruleset. When the commit completes, DP atomically switches to the new ruleset. Existing sessions keep flowing per the old rules until they age out; new sessions match the new rules. Net production impact: zero. This separation is why MP-only changes are safe during business hours, but watch out for the exceptions listed under Common mistakes below.
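To make Rahul's commit concrete, here is a minimal Python sketch of the behaviour the scenario describes: the new ruleset is built off to the side and installed with a single reference swap. `Ruleset`, `Dataplane`, and `commit` are hypothetical names for illustration only; this is a toy model, not PAN-OS internals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ruleset:
    version: int
    denied_ports: frozenset


@dataclass
class Dataplane:
    active: Ruleset
    # flow id -> version of the ruleset that admitted the session
    sessions: dict = field(default_factory=dict)

    def new_session(self, flow_id: str, dst_port: int) -> bool:
        """Match a new flow against whichever ruleset is active right now."""
        if dst_port in self.active.denied_ports:
            return False
        self.sessions[flow_id] = self.active.version
        return True

    def install(self, ruleset: Ruleset) -> None:
        """One reference assignment: forwarding never sees a half-built ruleset."""
        self.active = ruleset


def commit(dp: Dataplane, denied_ports) -> Ruleset:
    # The slow part (the 47 seconds in the scenario) happens here, away from forwarding.
    candidate = Ruleset(dp.active.version + 1, frozenset(denied_ports))
    dp.install(candidate)
    return candidate


dp = Dataplane(active=Ruleset(1, frozenset({23})))
dp.new_session("trade-1", 443)            # admitted under ruleset v1
commit(dp, {23, 8080})                    # Rahul's change
dp.new_session("trade-2", 443)            # admitted under ruleset v2
blocked = dp.new_session("scan-1", 8080)  # denied by the new rule
print(dp.sessions, blocked)
```

The existing `trade-1` session stays recorded against v1 while new flows are judged by v2, which is the split the scenario relies on.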
SP3 — single pass + parallel processing, without the marketing
Palo Alto Networks coined the term Single Pass Parallel Processing (SP3) in their original whitepaper. Marketing strips it down to "fast and secure" — which tells you nothing. Here's what it actually means.
Single pass software
On every other firewall of the early 2010s, a packet got parsed by the IPS engine, then re-parsed by the URL filter, then re-parsed by AV, then re-parsed by the DLP module. Each engine ran in series. Each ran its own decode. A 1500-byte HTTP packet got fully reassembled and re-walked four times.
PAN-OS instead does one stream-based decode. The packet is reassembled once into an application stream. Then App-ID, URL category, AV signatures, IPS signatures, file-type checks, and DLP patterns all match against the same stream, in parallel, in a single hardware pass. The hardware then makes a single allow/drop/log decision.
This is why enabling URL filtering on a PA-5220 costs ≈10% throughput — not 40%. The URL match is happening in the same hardware pass as everything else.
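A rough way to see the difference is to count decodes. The Python sketch below is a toy model under loud assumptions: the four matchers are made-up substring checks, not real signatures, and Python runs them one after another rather than in parallel hardware. It only shows that the verdicts are identical while the decode work drops from four passes to one.

```python
def decode(packet: bytes, counter: dict) -> str:
    """Stand-in for stream reassembly and protocol decode (the expensive step)."""
    counter["decodes"] += 1
    return packet.decode("ascii", errors="replace").lower()


# Hypothetical matchers, one per engine named in the text.
MATCHERS = {
    "url": lambda stream: "bad.example" in stream,
    "av": lambda stream: "eicar" in stream,
    "ips": lambda stream: "../" in stream,
    "dlp": lambda stream: "ssn=" in stream,
}


def multi_pass(packet: bytes):
    """Legacy model: every engine runs its own decode."""
    counter = {"decodes": 0}
    verdicts = {name: match(decode(packet, counter)) for name, match in MATCHERS.items()}
    return verdicts, counter["decodes"]


def single_pass(packet: bytes):
    """SP3 model: decode once, then every engine matches the same stream."""
    counter = {"decodes": 0}
    stream = decode(packet, counter)
    verdicts = {name: match(stream) for name, match in MATCHERS.items()}
    return verdicts, counter["decodes"]


pkt = b"GET /../etc/passwd HTTP/1.1\r\nHost: bad.example\r\n\r\n"
print(multi_pass(pkt))   # same verdicts, 4 decodes
print(single_pass(pkt))  # same verdicts, 1 decode
```

Adding a fifth engine to `MATCHERS` adds a fifth decode in the legacy model but none in the single-pass one, which is the intuition behind the small per-feature cost.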
Parallel processing hardware
Inside the DP, Palo Alto uses dedicated silicon for different jobs running concurrently:
- Network Processors (NPUs) — Layer 1–4 work: parse headers, look up sessions in the session table, decrement TTL, apply NAT, transmit.
- Security Processors (SPUs) — Layer 7 work: pattern match against signature databases for AV / IPS / Anti-Spyware, decode App-ID protocols, run regex.
- Crypto Engines — SSL/TLS encrypt + decrypt offload, IPSec ESP encrypt + decrypt. Without these, decryption-heavy boxes would melt their CPUs.
- Flow Engine / Offload — once a session is fully classified (App-ID done, Content-ID scanning settled), the flow engine can take over packet forwarding entirely, bypassing the SPU for subsequent packets in that session.
That last bullet — the flow engine offload — is where most production performance lives or dies. We'll dissect it below.
Day in the life of a packet — the six stages
Every Palo Alto knowledge-base article and every PCNSE study guide refers back to one diagram: the day in the life of a packet. Memorize this. Recite it under interview pressure.
Here's the full table — print this and pin it above your desk for the first month on the job:
| Stage | What happens | What gets dropped here |
|---|---|---|
| 1 · Ingress | NIC RX → packet header parsed → ingress zone determined from interface | Malformed packets, MAC anti-spoof drops |
| 2 · Slowpath (new) | FIB route lookup → NAT policy match → Security policy match (without App-ID) → session created in the session table | No-route, no matching sec rule, zone-protection drops |
| 3 · Fastpath (existing) | 5-tuple session lookup → decrement TTL → apply NAT → enqueue for L7 inspection (unless offloaded) | Session expired, TCP out-of-state, mismatched zone |
| 4 · App-ID | Read application bytes → match against App-ID signatures → re-evaluate sec policy with the now-known application | Application deny, App-ID shift drops session |
| 5 · Content-ID | Stream-decode → run AV / IPS / Anti-Spyware / URL category / File-blocking / DNS Security / WildFire forwarding in parallel | Threat-prevention block, URL category deny, file-blocking, sinkhole |
| 6 · Egress | Egress route lookup → apply egress NAT → ARP / next-hop → transmit | No-egress-route, MTU drops, QoS drops |
If a packet is killed silently in production, the diagnosis always traces back to one of these six stages. Knowing which stage = knowing which CLI command to run. We'll build the full diagnostic ladder in Blog 15 — Traffic Not Passing; for now, just remember the order.
Slowpath vs Fastpath — why the first packet costs more
"Slowpath" doesn't mean slow as in lagging. It means the firewall does everything from scratch for the first packet of a flow: route lookup, NAT policy match, security policy match, session creation. For a TCP three-way handshake, only the SYN goes through slowpath; the SYN-ACK and ACK already have a session.
"Fastpath" is the lookup-and-forward path. The firewall finds the session row, applies the stored NAT translation, decrements TTL, and ships the packet out the egress interface. No policy re-evaluation. No App-ID recomputation. That's why a PA-5220 can do millions of packets per second on fastpath but only thousands of session creations per second on slowpath.
The interview punchline: Connections-per-second (CPS) is a slowpath metric; throughput is a fastpath metric. A box rated 6 Gbps throughput with 130,000 CPS will choke if you slam it with 200,000 short HTTP transactions a second — even at 100 Mbps total bandwidth.
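The slowpath / fastpath split can be sketched as a session-table lookup. This is a simplified Python model, not the real PAN-OS data structure: `ToyFirewall`, `Packet`, and `session_key` are hypothetical names, and policy is reduced to a set of allowed destination ports. The slowpath counter is what a CPS rating limits; the fastpath counter is what a throughput rating limits.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str = "tcp"


def session_key(p: Packet):
    # Both directions of a flow map to the same key, so return traffic finds the session.
    return (p.proto, frozenset({(p.src, p.sport), (p.dst, p.dport)}))


class ToyFirewall:
    def __init__(self, allowed_dports):
        self.allowed_dports = set(allowed_dports)
        self.sessions = {}
        self.slowpath = 0   # full policy evaluation + session creation
        self.fastpath = 0   # session lookup + forward

    def process(self, p: Packet) -> str:
        key = session_key(p)
        if key in self.sessions:
            self.fastpath += 1
            return "forward"
        self.slowpath += 1
        if p.dport not in self.allowed_dports:
            return "drop"   # no session is created, so a retry pays slowpath again
        self.sessions[key] = {"first_packet": p}
        return "forward"


fw = ToyFirewall(allowed_dports={443})
handshake = [
    Packet("10.0.0.5", 50000, "203.0.113.10", 443),   # SYN
    Packet("203.0.113.10", 443, "10.0.0.5", 50000),   # SYN-ACK
    Packet("10.0.0.5", 50000, "203.0.113.10", 443),   # ACK
]
results = [fw.process(p) for p in handshake]
after_handshake = (fw.slowpath, fw.fastpath)

scan = Packet("198.51.100.7", 40000, "10.0.0.9", 8080)
denied = [fw.process(scan) for _ in range(3)]   # same denied packet, three times
print(after_handshake, fw.slowpath, len(fw.sessions))
```

Only the SYN hits slowpath during the handshake, while each repeat of the denied packet hits it again because there is never a session to match.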
Session offload — the hidden performance lever
Here's where senior engineers separate themselves from L1 troubleshooters. Even fastpath is not free — every packet still hits the SPU for re-evaluation against active App-ID and Content-ID inspectors. To go faster, PAN-OS supports session offload to the Flow Engine.
A session stuck at unknown-tcp or incomplete can never be offloaded. It will stay on the SPU until it ages out, consuming DP cycles per packet for its entire life.
Three conditions must hold simultaneously for a session to offload:
- App-ID has reached a final verdict. Not "in progress", not "unknown-tcp", not "incomplete". A concrete application label.
- Content-ID inspectors are settled. All the security profiles attached to the session's matched rule have finished their early-stream work.
- No deep inspection still required. If SSL decryption is active or a WildFire file submission is mid-flight, the session stays on the SPU.
Per the official Palo Alto KB on session offload: "A session where APP-ID has not been completed cannot be offloaded. A session where content inspection is not yet finished cannot be offloaded."
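The three conditions read naturally as a single boolean check. The Python sketch below is illustrative only: the `Session` fields and the `NOT_FINAL` state names are assumptions made for the example, not fields PAN-OS exposes.

```python
from dataclasses import dataclass

# Assumed labels for "App-ID has not reached a final verdict".
NOT_FINAL = {"unknown-tcp", "incomplete", "in-progress"}


@dataclass
class Session:
    app: str
    content_id_settled: bool
    decrypting: bool = False
    wildfire_pending: bool = False


def can_offload(s: Session) -> bool:
    """All three conditions from the list above must hold at the same time."""
    app_final = s.app not in NOT_FINAL
    no_deep_inspection = not (s.decrypting or s.wildfire_pending)
    return app_final and s.content_id_settled and no_deep_inspection


print(can_offload(Session("web-browsing", content_id_settled=True)))
print(can_offload(Session("unknown-tcp", content_id_settled=True)))
```

Because the check is a plain AND, one unmet condition is enough to keep a session on the SPU for its whole life.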
📍 Scenario · Sneha, NOC engineer at Infosys Bangalore
Sneha's PA-5220 throughput drops from 9 Gbps to 4.2 Gbps overnight. Nothing changed on her side. show running resource-monitor minute shows DP-CPU pegged at 88%. She runs show session info: session table at 1.8M of 2M max. Then she counts the sessions returned by show session all filter application unknown-tcp: 460,000. A misbehaving Java service in the application tier is opening short-lived TCP connections to a new partner endpoint that PAN-OS can't fingerprint. Every session sticks at unknown-tcp, never offloads, and eats DP cycles until it ages out. Fix: a custom App-ID for the partner's signature, or an Application Override for that specific 5-tuple. Throughput climbs back the moment offload starts working again.
📍 Scenario · Priya, security operations at HCL Noida
Priya's PA-3260 hits 95% DP-CPU during a 10-minute window every Tuesday at 22:00. ACC shows traffic is mostly being denied by an interzone-deny rule — a script-kiddie scanner hammering port 8080. Priya assumes "denied traffic is free" and ignores it. Wrong. Every denied packet still walks slowpath: route lookup, NAT eval, policy match. With no session to offload to, denied flood-traffic costs more per-packet than allowed established flows. Fix: drop the scanner at the perimeter Zone-Protection profile (SYN flood rate-limit) before the policy lookup runs. DP-CPU returns to 30%.
PA-Series hardware — slots, line cards, and where the silicon lives
Palo Alto's hardware family ranges from the desk-sized PA-410 to the chassis-class PA-7080. For job-hunting, you only need to remember three families:
- PA-400 / PA-440 / PA-460 series — branch / SMB. Single dataplane, fixed config, no line cards. Throughputs 800 Mbps – 3 Gbps.
- PA-3200 / PA-5200 / PA-5400 series — campus / mid-enterprise / data centre edge. Multiple dataplanes (PA-5220 has 3, PA-5260 has 3, PA-5450 modular). Throughputs 5 – 40 Gbps with App-ID + threat prevention.
- PA-7000 / PA-7050 / PA-7080 — chassis-class core. Modular line cards — each Network Processing Card (NPC) is effectively its own dataplane equivalent. Up to 10 NPCs per chassis. Multi-hundred-Gbps throughput.
The PA-5450 introduced a new architecture: Management Processor Card (MPC), Networking Card (NC), and Data Processor Card (DPC) are now physically replaceable blades. If a DPC fails on a PA-5450, you swap that single card — no full chassis RMA. This is per the official PA-5400 hardware reference guide.
VM-Series — dataplane cores in the cloud
On VM-Series (Palo Alto's virtual NGFW for AWS, Azure, GCP, KVM, ESXi), the same plane separation exists — but now you choose how many vCPUs go to MP vs DP. The licence ties to total vCPU count; the split between MP and DP is configurable on supported tiers.
set system setting dp-cores 6
commit
admin@PA-VM> show plugins vm_series mode
Mode: standalone
DP cores: 6
MP cores: 2
Total cores: 8
Status: Reboot required if dp-cores changed.
The default on a fresh 8-vCPU deployment is 2 MP + 6 DP — sane for most workloads. If you're seeing high MP CPU because of heavy log forwarding to a Panorama collector, reduce DP to 5 and bump MP. If you're a decryption-heavy edge box, keep MP at 2 and push everything else to DP. The official VM-Series customization docs cover the per-licence-tier maximums.
Verification — the commands you'll actually run
These four commands are your first-response toolkit when someone reports a "Palo Alto throughput / CPU" issue. Memorize the expected-output shape — pattern-matching against it is faster than reading every field.
show running resource-monitor minute
Resource monitoring sampling data (per minute):
DP 0:
CPU load (%) during last 60 seconds:
core 0 1 2 3 4 5
28 31 29 34 27 30 (avg ~30%)
MP:
CPU load (%) during last 60 seconds:
core 0 1
18 22 (avg ~20%)
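If you collect this output from many boxes, a few lines of Python can turn it into per-plane averages. The parser below is a sketch that assumes the simplified output shape shown above (a plane header, a `core` line, then one line of loads); real output has more sections, so treat `average_cpu_by_plane` as a hypothetical starting point.

```python
import re

SAMPLE = """\
Resource monitoring sampling data (per minute):
DP 0:
CPU load (%) during last 60 seconds:
core 0 1 2 3 4 5
28 31 29 34 27 30 (avg ~30%)
MP:
CPU load (%) during last 60 seconds:
core 0 1
18 22 (avg ~20%)
"""


def average_cpu_by_plane(text: str) -> dict:
    """Return {plane name: mean per-core CPU %} for the sample layout above."""
    averages = {}
    plane = None
    loads_next = False
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"(DP \d+|MP):", line):
            plane = line[:-1]                 # "DP 0" or "MP"
        elif plane and line.startswith("core"):
            loads_next = True                 # the next line carries the load values
        elif loads_next:
            loads = [int(tok) for tok in line.split("(")[0].split()]
            averages[plane] = sum(loads) / len(loads)
            loads_next = False
    return averages


cpu = average_cpu_by_plane(SAMPLE)
print({plane: round(avg, 1) for plane, avg in cpu.items()})
```

Comparing the two averages side by side is the quickest way to tell a dataplane problem from a management-plane one.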
show session info
Number of sessions supported: 2097152
Number of allocated sessions: 421,308
Number of active sessions: 418,742
Number of active TCP sessions: 312,890
Number of active UDP sessions: 104,221
Number of active ICMP sessions: 1,631
Session table utilization (%) : 20%
Hardware session offload : True
Hardware UDP session offload : True
Number of sessions in offload : 287,994
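Two numbers are worth deriving from this output: how full the session table is, and what share of active sessions is offloaded. The Python sketch below parses the sample fields shown above; `parse_session_info` is a hypothetical helper, and the field names are taken from this article's sample rather than verified against every PAN-OS release.

```python
SAMPLE = """\
Number of sessions supported: 2097152
Number of allocated sessions: 421,308
Number of active sessions: 418,742
Number of active TCP sessions: 312,890
Number of active UDP sessions: 104,221
Number of active ICMP sessions: 1,631
Session table utilization (%) : 20%
Hardware session offload : True
Hardware UDP session offload : True
Number of sessions in offload : 287,994
"""


def parse_session_info(text: str) -> dict:
    """Turn 'key: value' lines into a dict, converting plain counts to int."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip().replace(",", "")
        info[key.strip()] = int(value) if value.isdigit() else value
    return info


info = parse_session_info(SAMPLE)
table_util = info["Number of allocated sessions"] / info["Number of sessions supported"]
offload_ratio = info["Number of sessions in offload"] / info["Number of active sessions"]
print(f"table {table_util:.1%}, offloaded {offload_ratio:.1%}")
```

A falling offload ratio with a steady session count is the pattern to watch for, since it points at sessions that never finish classification.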
show session all filter application unknown-tcp | match "id "
(no lines returned: 0 sessions stuck at unknown-tcp)
# If you see thousands of "id N" lines here, App-ID is not concluding
# on a major flow; investigate the source app / endpoint.
show counter global filter severity drop packet-filter yes delta yes
Global counters:
Elapsed time since last sampling: 1.234 seconds
name                    value   rate  severity  category  aspect   description
----------------------------------------------------------------------------------------------
flow_policy_deny        1,847   1497  drop      flow      pktproc  Session setup: denied by policy
flow_no_session_match      92     74  drop      flow      pktproc  No session match
flow_action_close          31     25  info      flow      pktproc  TCP sessions closed via RST
The show counter global command is the single most powerful PAN-OS diagnostic. Every drop has a named counter; every named counter tells you which stage killed the packet. We'll build the full counter map in Blog 15.
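When the counter list runs to dozens of rows, ranking the drop counters by rate saves time. The Python sketch below parses rows in the column layout of the sample output above; the regular expression and the helper names `parse_counters` and `top_drops` are assumptions for illustration, not an official parser.

```python
import re

SAMPLE = """\
Global counters:
Elapsed time since last sampling: 1.234 seconds
name value rate severity category aspect description
----------------------------------------------------
flow_policy_deny 1,847 1497 drop flow pktproc Session setup: denied by policy
flow_no_session_match 92 74 drop flow pktproc No session match
flow_action_close 31 25 info flow pktproc TCP sessions closed via RST
"""

# name, value (may contain commas), rate, severity, category, aspect, description
ROW = re.compile(r"^(\S+)\s+([\d,]+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.+)$")


def parse_counters(text: str) -> list:
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue   # headers, separators, and the timing line do not match
        name, value, rate, severity, category, aspect, description = m.groups()
        rows.append({
            "name": name,
            "value": int(value.replace(",", "")),
            "rate": int(rate),
            "severity": severity,
            "category": category,
            "aspect": aspect,
            "description": description,
        })
    return rows


def top_drops(rows: list) -> list:
    """Drop-severity counters, highest rate first."""
    drops = [r for r in rows if r["severity"] == "drop"]
    return sorted(drops, key=lambda r: r["rate"], reverse=True)


rows = parse_counters(SAMPLE)
for r in top_drops(rows):
    print(r["name"], r["rate"], r["description"])
```

The counter at the top of that list names the stage to investigate first.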
Common mistakes that cost you the L1 / L2 interview
Mistake: treating denied traffic as free. It isn't. Denied packets still walk slowpath: route lookup, NAT eval, security-policy match. With no session ever created, every packet pays the full cost. If a scanner is hammering you with 10,000 packets per second of denied traffic, your DP is spending real cycles on it. Drop attackers at Zone Protection (SYN flood, packet-based attacks) before the security policy lookup.
Mistake: assuming every commit is harmless to live traffic. Most commits are MP-only and have zero data-plane impact. But a few do reset sessions: DNS Proxy enable/disable, GlobalProtect gateway changes, vsys creation, certain HA configuration changes, MTU changes on egress interfaces. Always schedule these after-hours. Reference: PAN-OS Configurations that Require Reboot KB.
Mistake: conflating throughput with connections per second. Datasheet numbers separate throughput (Gbps, a fastpath metric) from new sessions per second (a slowpath metric) for a reason. A PA-3260 rated 5 Gbps does ~57,500 CPS. If your application opens 100,000 short-lived TCP connections per second (think micro-services with no keepalive), you'll hit the CPS wall at 10% bandwidth utilisation. Architecture choice matters: that workload belongs on a PA-5220-class box, not a PA-3260.
Mistake: sizing from the raw datasheet number. The datasheet "Threat Prevention Throughput" number assumes SSL decryption is OFF. Enable decryption and effective throughput drops 40–60%. Enable URL filtering on top and you lose another 10%. Always size for the feature mix you'll actually run in production, not the raw datasheet number. PA's official sizing tool factors this in, so use it.
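The last two sizing mistakes reduce to simple arithmetic. The Python sketch below is a back-of-envelope check, not a replacement for the official sizing tool: `sizing_check` is a hypothetical helper, and it assumes the decryption and URL-filtering penalties multiply, which the text does not state explicitly.

```python
def sizing_check(datasheet_tp_gbps, rated_cps, offered_gbps, offered_cps,
                 decrypt_penalty=0.5, url_penalty=0.10):
    """Compare offered load against both limits: bandwidth and session setup rate."""
    # Assumption: penalties compound multiplicatively.
    effective_gbps = datasheet_tp_gbps * (1 - decrypt_penalty) * (1 - url_penalty)
    bandwidth_util = offered_gbps / effective_gbps
    cps_util = offered_cps / rated_cps
    return {
        "effective_gbps": round(effective_gbps, 2),
        "bandwidth_util": round(bandwidth_util, 2),
        "cps_util": round(cps_util, 2),
        "cps_bound": cps_util > bandwidth_util,
    }


# The PA-3260 example from the text: 5 Gbps rated, ~57,500 CPS,
# 100,000 new connections per second at 10% of rated bandwidth (0.5 Gbps).
report = sizing_check(datasheet_tp_gbps=5.0, rated_cps=57_500,
                      offered_gbps=0.5, offered_cps=100_000)
print(report)
```

A `cps_util` above 1.0 means the box runs out of session setups long before it runs out of bandwidth, which is exactly the CPS wall described above.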
Pro tips from the field
Add show session all filter application unknown-tcp | match "id " | count to your weekly health check. Any sustained climb above a few hundred is an early indicator of a new app the firewall can't fingerprint — either a custom App-ID or an Application Override is needed before that flow blows out your offload ratio.
When users say "the firewall is slow", check show running resource-monitor minute first. If DP is at 30% and MP is at 90%, your problem is log forwarding, GP authentication storms, or a hung commit — not packet processing. Fix MP issues with MP-side tools; don't waste time tuning DP.
If you suspect offload is misbehaving (rare — but happens after certain content updates), you can temporarily disable hardware session offload with set session offload no to confirm. CPU will rise; problem-flow may resolve. Re-enable as soon as you've identified the bad flow. Never run a box in offload-disabled mode long-term.
📋 Quick reference — burn this into memory
| Concept | One-liner you can repeat in an interview |
|---|---|
| SP3 | Single Pass software (one decode for all engines) + Parallel Processing hardware (NPU + SPU + crypto in parallel). |
| MP vs DP | Separate CPUs, separate RAM. A typical commit on MP does not interrupt DP forwarding; a few session-resetting changes are the exception. |
| 6 stages | Ingress → Slowpath (new) / Fastpath (existing) → App-ID → Content-ID → Egress. |
| Slowpath cost | First packet of a new session — full policy + NAT + session creation. |
| Fastpath cost | Every subsequent packet — 5-tuple lookup, NAT apply, forward. |
| Offload prereq | App-ID concluded + Content-ID settled + no pending decryption. |
| unknown-tcp | Never offloads. Always burns DP. Custom App-ID or Override required. |
| Denied traffic | Still costs DP cycles. Drop floods at Zone Protection, not at policy. |
| CPS vs Gbps | CPS = slowpath capacity. Gbps = fastpath capacity. Don't conflate. |
| VM-Series cores | Default 2 MP + 6 DP on 8-vCPU. Tunable with set system setting dp-cores N. |
📚 Sources cited inline
- Palo Alto Networks — Single-Pass Parallel Processing Architecture whitepaper. paloaltonetworks.com/resources/whitepapers/single-pass-parallel-processing-architecture
- Palo Alto Knowledge Base — Packet Flow Sequence in PAN-OS (Article ID 56081). knowledgebase.paloaltonetworks.com
- Palo Alto Knowledge Base — Session Offload, Fastpath and Slowpath behavior.
- Palo Alto Networks — PA-5400 Series Hardware Reference Guide.
- Palo Alto Networks Docs — VM-Series Customize Dataplane Cores. docs.paloaltonetworks.com/vm-series
- Palo Alto Networks — PCNSE Exam Blueprint (current). paloaltonetworks.com/content/dam/pan/en_US/assets/pdf/datasheets/education/pcnse-blueprint.pdf
- Threat Filtering blog — Packet Flow and Order of Operations in PAN-OS. threatfiltering.com
What's next?
Now you know how a packet moves through PAN-OS. Next we'll learn where it enters the firewall: zones, interface types (L3 / L2 / vwire / tap / sub-IF), and virtual routers — the forwarding skeleton every security policy sits on top of.