The mental model — before you touch a single command
"Traffic not passing" is the single most common ticket on every Palo Alto firewall in production. Sneha at Infosys gets one at 11:47 AM on a Tuesday: "users can't reach the SAP server, fix it now." She has two choices — guess (open the GUI, eyeball rules, hope) or run the ladder. The ladder is faster, repeatable, and works the same on every PAN-OS version from 9.1 to 11.2.
One sentence to memorise: "Match the rule → match the NAT → find the session → read the counter → capture the packet → look at ACC → check threat logs." Seven steps. That's the entire blog in one line. The rest is "exactly how" and "exactly when."
Each step eliminates one possible cause. Step 1 rules out a policy mismatch. Step 2 rules out NAT. Step 3 confirms the firewall actually saw a session. Step 4 tells you which dataplane stage dropped a packet. Step 5 proves whether the packet is arriving at all. Steps 6 and 7 find the silent killers: App-ID re-evaluation and Security Profile resets. Skip a step and you'll loop back to it in 20 minutes anyway. Run the ladder once, finish the ticket.
① The canonical 7-step ladder
Sneha walks the ladder for a real ticket: Infosys user 10.10.10.5 can't reach sap.infosys.in (resolves to 10.50.5.20) on TCP 443.
7 commands, 90 seconds. Most tickets resolve before step 5.
Step 1 (match the rule): test security-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6
Step 2 (match the NAT): test nat-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6
Step 3 (find the session): show session all filter source 10.10.10.5 destination 10.50.5.20
Step 4 (read the counter): show counter global filter severity drop delta yes
With delta yes, each run shows only the increments since the previous run. Watch for flow_policy_deny, flow_fwd_l3_noroute, flow_fwd_zonechange.
Step 5 (capture the packet): debug dataplane packet-diag set filter match source 10.10.10.5 destination 10.50.5.20
Step 6 (look at ACC): watch for application = incomplete or insufficient-data.
Step 7 (check threat logs): session-end-reason=threat means a Security Profile (AV / Anti-Spy / Vulnerability / URL / WildFire / DNS-Sec) reset the session. Find which signature.
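The CLI part of the ladder can be scripted so the same flow details go into every command. The sketch below is a hypothetical helper (ladder_commands is not a PAN-OS tool); it only formats the command strings shown above for one flow and leaves steps 6 and 7 as log checks in the GUI.

```python
def ladder_commands(src, dst, port, from_zone="trust", to_zone="dmz", protocol=6):
    """Return the CLI commands for steps 1-5 of the ladder for one flow.

    Hypothetical helper: it only formats the command strings used in this
    post. It does not connect to a firewall.
    """
    flow = (f"from {from_zone} to {to_zone} source {src} destination {dst} "
            f"destination-port {port} protocol {protocol}")
    return [
        f"test security-policy-match {flow}",
        f"test nat-policy-match {flow}",
        f"show session all filter source {src} destination {dst}",
        "show counter global filter severity drop delta yes",
        f"debug dataplane packet-diag set filter match source {src} destination {dst}",
    ]


if __name__ == "__main__":
    commands = ladder_commands("10.10.10.5", "10.50.5.20", 443)
    for step, cmd in enumerate(commands, start=1):
        print(f"Step {step}: {cmd}")
```

Keeping the flow details in one place avoids the classic slip of testing the rule with one address and filtering the session table with another.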
Steps 1 & 2 in real CLI
test security-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6 application ssl
"Allow-Trust-to-DMZ" {
from trust;
source [ corp-clients ];
source-region none;
to dmz;
destination [ dmz-servers ];
destination-region none;
user any;
category any;
application/service [ssl/tcp/any/443];
action allow;
icmp-unreachable no;
terminal yes;
}
"interzone-default" {
action deny;
...
}
If you see interzone-default or intrazone-default in the match, your custom rule never fired. Common culprits: wrong zone, wrong service (TCP/443 vs TCP/8443), wrong application (rule says ssl but client speaks web-browsing), or the rule sits below another rule that already matched. Always test with the pre-NAT source and destination addresses; entering post-NAT addresses is a top-5 mistake on day one.
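If you collect the test output in a script, the "did my custom rule fire?" check can be automated. This is a minimal sketch that assumes the output layout shown above (a rule name followed by a braced block with an action line); the function names are hypothetical and real output may differ between PAN-OS versions.

```python
import re

# Default rules that mean "no custom rule matched".
DEFAULT_RULES = {"interzone-default", "intrazone-default"}


def parse_policy_match(output):
    """Extract (rule_name, action) from test security-policy-match output.

    Assumes the layout shown in this post: a rule name (optionally quoted)
    followed by '{' and, inside the block, a line 'action <value>;'.
    Returns None when either part is missing.
    """
    name = re.search(r'^\s*"?([^"{\n]+?)"?\s*\{', output, re.MULTILINE)
    action = re.search(r'^\s*action\s+([\w-]+);', output, re.MULTILINE)
    if not name or not action:
        return None
    return name.group(1).strip(), action.group(1)


def custom_rule_fired(output):
    """True only when a non-default rule matched with action allow."""
    parsed = parse_policy_match(output)
    if parsed is None:
        return False
    rule, action = parsed
    return rule not in DEFAULT_RULES and action == "allow"


if __name__ == "__main__":
    sample = '"Allow-Trust-to-DMZ" {\n from trust;\n to dmz;\n action allow;\n}'
    print(parse_policy_match(sample), custom_rule_fired(sample))
```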
test nat-policy-match from trust to untrust source 10.10.10.5 destination 8.8.8.8 destination-port 443 protocol 6
NAT-Internet-Outbound {
from trust;
to untrust;
source [ corp-clients ];
destination [ any ];
service any;
nat-type ipv4;
to-interface ethernet1/2;
source-translation: dynamic-ip-and-port (interface-address)
}
"No matching NAT policy" is the most common bug on a freshly-rebuilt outbound rule. If you expected NAT and don't get a match, the rule's source/destination zone is likely flipped: Palo Alto NAT rules use PRE-NAT zones for both directions. If you have already done that lesson, skim back to Blog 4 (NAT Deep-Dive) for the zone gotcha.
Rahul at TCS runs test security-policy-match for a flow that should match his "Allow-DB-Replication" rule. Output says "interzone-default" with action deny. What's the FIRST thing he should check?
interzone-default means no custom rule matched. 95% of the time it's a zone mismatch (ingress zone is not what Rahul thought), wrong service (TCP/3306 vs 33060), or the rule uses application=mysql while the actual flow appears as web-browsing during App-ID identification. Use test security-policy-match iteratively: change one field, re-test, watch the match.
② Symptom → next command: the decision tree
Karthik at Flipkart sees a session in show session all with c2s bytes = 4,328 but s2c bytes = 0. The session is in state ACTIVE. What's the most likely cause?
Run show session id <id> for full details, then SSH to the server and confirm the service is listening with ss -lntp.
③ The 10 production root causes (in order of frequency)
Across PAN-OS 9.1 to 11.2, support tickets, and Live Community threads — these 10 cover ~90% of "traffic not passing" incidents. Memorise them as a checklist.
1. Zone mismatch. Rule says from trust to dmz but the real ingress zone is corp. test security-policy-match catches it instantly. Top cause for L1 escalations.
2. Missing NAT rule. Someone forgot to update NAT after adding a new zone / subnet. test nat-policy-match says "No matching NAT". 80% of new-rule outages.
3. application-default port mismatch. Rule uses application=ssl + service application-default, but the app runs on TCP 8443, so the rule never matches. Set service to any or add a custom service.
4. Missing route or ARP. Counter: flow_fwd_l3_noroute or flow_fwd_l3_noarp. Fix: check show routing route, then add a static route or fix the dynamic protocol.
5. Asymmetric routing. Counter: flow_fwd_zonechange. SYN went through fw1, SYN-ACK arrived at fw2. Fix: enable HA3 packet forwarding on Active/Active, switch to Active/Passive, or use PBF with Symmetric Return.
6. Security Profile reset. Traffic log: action=allow but session-end-reason=threat. AV / Anti-Spy / Vuln / URL / WildFire / DNS-Sec killed it. Check the Threat log.
7. DIPP port exhaustion. Counter: nat_dyn_port_xlat_full. New sessions are silently dropped. Fix: increase DIPP oversubscription (2x → 4x → 8x) or add IPs to the pool.
8. MTU / fragmentation. Counter: flow_ipfrag_recv_err. IPSec / GRE egress with no MSS-adjust drops payloads > 1400B. Fix: tcp-mss-adjust on the tunnel zone interface.
9. SSL decryption failure. session-end-reason=decrypt-error / decrypt-cert-validation. Forward Proxy hit an untrusted cert. Fix: add the destination to the No-Decrypt list or fix the upstream cert chain.
10. Stale ARP entry. After a server NIC swap, the firewall ARP cache still has the old MAC. Check show arp first; clear arp interface ethernet1/3 fixes it immediately.
Step 4 deep-dive — the global-counter buckets you must recognise
The show counter global filter severity drop delta yes output is your fingerprint. Each counter maps to one specific stage of the dataplane pipeline. The ones below cover ~85% of real-world drops.
| Counter | What it really means |
|---|---|
| flow_policy_deny | Session setup denied by Security policy (a deny rule or a default rule). Fix: re-run test security-policy-match, edit the rule. |
| flow_policy_nofwd | No destination zone resolved from FIB lookup. Route missing or VR misconfigured. |
| flow_fwd_l3_noroute | FIB has no route for that destination. Add static or fix dynamic protocol. |
| flow_fwd_l3_noarp | Route exists but ARP for next-hop is failing. Check L2 / cable / VLAN. |
| flow_fwd_zonechange | Existing session sees a packet arriving from a different zone. Classic asymmetric-routing fingerprint. |
| flow_tcp_non_syn_drop | Mid-flow TCP packet with no matching session. Usually the return half of an asymmetric flow. |
| nat_dyn_port_xlat_full | DIPP port pool exhausted. Raise oversubscription or add public IPs. |
| flow_action_close / flow_action_reset | Firewall injected RST. Threat profile or App-ID block fired. |
| flow_ipfrag_recv_err | Bad/missing fragments. MTU or asymmetric path. Adjust MSS. |
| flow_parse_l4_cksm | L4 checksum bad. Usually NIC offload bug on VM-Series; disable offload at the hypervisor. |
| flow_host_service_deny | Packet to firewall's own management service blocked (no permitting Management Profile on interface). |
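The table lends itself to a lookup. The sketch below pairs each counter with the hint from the table and ranks whatever appears in pasted CLI output. It assumes each counter line starts with the counter name followed by an integer value; the sample text in the demo is illustrative, not captured from a real firewall, and triage_counters is a hypothetical name.

```python
# Counter -> next action, taken from the table above.
DROP_COUNTERS = {
    "flow_policy_deny": "Re-run test security-policy-match and edit the rule",
    "flow_policy_nofwd": "No destination zone: check the route and virtual router",
    "flow_fwd_l3_noroute": "Add a static route or fix the dynamic protocol",
    "flow_fwd_l3_noarp": "Next-hop ARP failing: check L2, cable, VLAN",
    "flow_fwd_zonechange": "Asymmetric routing: packet arrived from a different zone",
    "flow_tcp_non_syn_drop": "Mid-flow packet without a session: suspect asymmetric routing",
    "nat_dyn_port_xlat_full": "DIPP pool exhausted: raise oversubscription or add public IPs",
    "flow_action_close": "Firewall injected RST: check Threat log and App-ID",
    "flow_action_reset": "Firewall injected RST: check Threat log and App-ID",
    "flow_ipfrag_recv_err": "Fragment problem: check MTU and adjust MSS",
    "flow_parse_l4_cksm": "Bad L4 checksum: check NIC offload on VM-Series",
    "flow_host_service_deny": "Management service blocked: check the Management Profile",
}


def triage_counters(cli_output):
    """Return [(counter, increment, hint)] sorted by increment, largest first.

    Assumption: each counter line begins with the counter name and an
    integer value, separated by whitespace. Unknown counters are skipped.
    """
    hits = []
    for line in cli_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in DROP_COUNTERS and fields[1].isdigit():
            hits.append((fields[0], int(fields[1]), DROP_COUNTERS[fields[0]]))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Illustrative text only, shaped like the counter listing.
    sample = (
        "flow_policy_deny 12 0 drop flow session\n"
        "flow_fwd_zonechange 184 1 drop flow forward\n"
    )
    for name, count, hint in triage_counters(sample):
        print(f"{name} (+{count}): {hint}")
```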
Karthik at Flipkart runs show counter global filter severity drop delta yes during a 2-minute test. Counter flow_fwd_zonechange increments by 184. What's the most likely root cause?
flow_fwd_zonechange means an existing session saw a packet arriving from a zone different from the one it originally established on. Top causes: (1) Active/Active HA without HA3 packet forwarding, (2) misconfigured VR with overlapping subnets, (3) downstream device load-balancing return traffic via a different uplink. Fix: enable HA3 on A/A pairs, or switch to A/P, or use PBF with Symmetric Return.
Step 5 deep-dive: the 4 packet-diag stages
The debug dataplane packet-diag capture splits a packet's journey into 4 stages. The combination of which stages have packets — and which don't — tells you where the drop happened, even before you read the counter.
Aditya at Wipro captures rx + fw + tx + drop. The pattern tells him where to focus.
When the drop stage has packets, run show counter global filter severity drop delta yes: the counter tells you WHY.
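The stage pattern can be read as a small decision function. The branches below are an assumed reading of the four stages (receive, firewall, transmit, drop), not an official Palo Alto decision table, and diagnose_stages is a hypothetical name.

```python
def diagnose_stages(rx, fw, tx, drop):
    """Suggest where to look, given packet counts per capture stage.

    Assumed logic for illustration:
    - nothing in receive  -> the packet never reached the firewall
    - anything in drop    -> the firewall dropped it, read the counters
    - anything in transmit-> the firewall forwarded it, look downstream
    """
    if rx == 0:
        return "Nothing received: the packet is not arriving, check the upstream path"
    if drop > 0:
        return "Dropped on the firewall: run show counter global filter severity drop delta yes"
    if tx > 0:
        return "Forwarded: the firewall transmitted the packet, check the server and return path"
    return "Received but neither transmitted nor dropped: recheck the capture filter and session"


if __name__ == "__main__":
    # Pattern from this post: RX and FW have packets, TX is empty, DROP has 12.
    print(diagnose_stages(rx=12, fw=12, tx=0, drop=12))
```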
debug dataplane packet-diag set filter match source 10.10.10.5 destination 10.50.5.20
debug dataplane packet-diag set filter on
debug dataplane packet-diag set capture stage receive file rx.pcap
debug dataplane packet-diag set capture stage firewall file fw.pcap
debug dataplane packet-diag set capture stage transmit file tx.pcap
debug dataplane packet-diag set capture stage drop file drop.pcap
debug dataplane packet-diag set capture on
! ... reproduce the issue from the client (2-3 attempts is enough) ...
debug dataplane packet-diag set capture off
debug dataplane packet-diag set filter off
debug dataplane packet-diag clear all
! Now download via SCP (repeat for each capture file):
scp export filter-pcap from rx.pcap to admin@10.10.10.5:/tmp/
Forgotten captures on busy firewalls fill the disk and stall the management plane. Always pair set capture on with set capture off + clear all in your runbook. Stopping the capture within 10 minutes is a good safety habit; set a Slack reminder.
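One way to make the pairing hard to forget is to generate the start and stop halves of the runbook together. This is a sketch only: capture_runbook is a hypothetical helper, and the file name fw.pcap for the firewall stage is an assumption added for illustration.

```python
# Capture stage -> file name. fw.pcap is an assumed name for illustration.
STAGES = {
    "receive": "rx.pcap",
    "firewall": "fw.pcap",
    "transmit": "tx.pcap",
    "drop": "drop.pcap",
}

PREFIX = "debug dataplane packet-diag"


def capture_runbook(src, dst):
    """Return (start_commands, stop_commands) for one filtered capture.

    Both halves are always produced together, so the teardown cannot be
    left out of the runbook by accident.
    """
    start = [
        f"{PREFIX} set filter match source {src} destination {dst}",
        f"{PREFIX} set filter on",
    ]
    start += [f"{PREFIX} set capture stage {stage} file {name}"
              for stage, name in STAGES.items()]
    start.append(f"{PREFIX} set capture on")
    stop = [
        f"{PREFIX} set capture off",
        f"{PREFIX} set filter off",
        f"{PREFIX} clear all",
    ]
    return start, stop


if __name__ == "__main__":
    start, stop = capture_runbook("10.10.10.5", "10.50.5.20")
    print("\n".join(start))
    print("! ... reproduce the issue ...")
    print("\n".join(stop))
```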
Sneha at Infosys runs packet-diag and sees packets in stages RX and FW, but TX is empty and DROP has 12 packets. What's the next command?
Run show counter global filter severity drop delta yes, then act on the counter: flow_policy_deny → fix rule; flow_action_reset → check Threat log; flow_ipfrag_recv_err → adjust MSS.
④ Session-end-reason: the field that solves 70% of "allow but doesn't work"
Step 7 of the ladder reads the single most-overlooked log field on the entire firewall. The session-end-reason column in Monitor → Logs → Traffic tells you why a session ended, even when action says allow. Read it as a chain of evidence: an "allow" action with a non-clean end-reason is the firewall saying "I allowed it, but something else killed it."
| Value | What's really happening |
|---|---|
| aged-out | Idle timer expired (normal for UDP and short flows). Suspect on long TCP flows that had no FIN: possible silent hang downstream. |
| tcp-fin | Clean close. Both sides sent FIN. Healthy. |
| tcp-rst-from-client | Client sent RST. Endpoint problem: chase the user's app, not the firewall. |
| tcp-rst-from-server | Server sent RST. Service unavailable, port closed, or backend rejected. |
| tcp-reuse | New SYN reused the 5-tuple of an existing session, so the firewall closed the old session. Often benign. |
| policy-deny | Security policy denied. Can occur on an "allow" rule when application-default service mismatches the actual port (top trick question on PCNSE). |
| threat | A Security Profile (AV / Anti-Spy / Vuln / URL / WildFire / DNS-Sec) reset the session. Open the Threat log, same session ID, to find which signature. |
| decrypt-error / decrypt-cert-validation / decrypt-unsupport-param | SSL decryption pipeline killed it. Add the destination to No-Decrypt or fix the cert chain. |
| resources-unavailable | Session table full, DIPP pool exhausted, or proxy memory exceeded. Capacity issue, not a config bug. |
| unknown / n/a | Session not torn down cleanly, or PAN-OS < 7.1 (legacy). |
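The table reduces to a lookup keyed on the two log fields. A minimal sketch, assuming you already have the action and session-end-reason values from an exported Traffic log entry; triage_traffic_log is a hypothetical name and the hints are paraphrased from the table.

```python
# session-end-reason -> next step, paraphrased from the table above.
END_REASON_HINTS = {
    "aged-out": "Idle timeout: normal for UDP, suspicious on long TCP flows",
    "tcp-rst-from-client": "Client sent RST: chase the user's application",
    "tcp-rst-from-server": "Server sent RST: check the service and port on the server",
    "tcp-reuse": "5-tuple reused by a new SYN: often benign",
    "policy-deny": "Check application-default service against the actual port",
    "threat": "Open the Threat log for the same session ID and find the signature",
    "decrypt-error": "Add the destination to No-Decrypt or fix the cert chain",
    "decrypt-cert-validation": "Add the destination to No-Decrypt or fix the cert chain",
    "decrypt-unsupport-param": "Add the destination to No-Decrypt or fix the cert chain",
    "resources-unavailable": "Capacity issue: check session table and DIPP pool usage",
}


def triage_traffic_log(action, end_reason):
    """Map one Traffic log entry to the next troubleshooting step."""
    if action != "allow":
        return "Session was not allowed: re-run test security-policy-match"
    if end_reason == "tcp-fin":
        return "Healthy: clean TCP close"
    return END_REASON_HINTS.get(end_reason, "No specific hint: compare with the table")


if __name__ == "__main__":
    print(triage_traffic_log("allow", "threat"))
    print(triage_traffic_log("allow", "policy-deny"))
```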
"A Traffic log entry shows action=allow but session-end-reason=policy-deny. How is this possible?" Answer: the security rule used application=ssl with service application-default (TCP/443 by default). The actual flow used SSL on TCP/8443. App-ID matched ssl early, but the service-default check failed once the firewall confirmed the port — resulting in a late policy-deny on what looked like an allow rule. Fix: change service to any or define a custom service. This pattern appears once per PCNSE cycle.
What's next?
Now that you can find where a packet died, the next blog teaches the session-table internals — how sessions are born, offloaded, aged, and predicted (FTP / SIP). When you understand the session lifecycle, troubleshooting becomes routine.