Traffic Not Passing — The 7-Step PA Diagnostic Ladder

Ticket says "user can't reach the app." You SSH the firewall. What's the first command? The third? In what order do you NOT waste 40 minutes? Build the muscle memory here in 14 minutes — symptom → next CLI command → expected output → decision.

📅 2026-05-25 · ⏱ 14 min · 7-step interactive ladder · 🏷 10-Q assessment + AI Tutor inline

Pick where you're stuck and jump straight in

  1. 7-Step Ladder. The canonical command sequence. Memorise the order. Skip nothing.
  2. Symptom → Command. Pick what the user actually reported. The widget gives you the next command.
  3. Top 10 Causes. The 10 production root causes that account for 90% of "traffic not passing" tickets.
  4. Session-End-Reason. Decode the single most important log field, fast.

The mental model — before you touch a single command

"Traffic not passing" is the single most common ticket on every Palo Alto firewall in production. Sneha at Infosys gets one at 11:47 AM on a Tuesday: "users can't reach the SAP server, fix it now." She has two choices — guess (open the GUI, eyeball rules, hope) or run the ladder. The ladder is faster, repeatable, and works the same on every PAN-OS version from 9.1 to 11.2.

One sentence to memorise: "Match the rule → match the NAT → find the session → read the counter → capture the packet → look at ACC → check threat logs." Seven steps. That's the entire blog in one line. The rest is "exactly how" and "exactly when."

Why this order matters

Each step eliminates one possible cause. Step 1 rules out policy mismatch. Step 2 rules out NAT. Step 3 confirms the firewall actually saw a session. Step 4 tells you which dataplane stage dropped a packet. Step 5 proves the packet is or isn't arriving at all. Steps 6 and 7 find the silent killers: App-ID re-evaluation and Security Profile resets. Skip a step and you'll loop back to it in 20 minutes anyway. Run the ladder once, finish the ticket.

① The canonical 7-step ladder

Sneha walks the ladder for a real ticket: Infosys user 10.10.10.5 can't reach sap.infosys.in (resolves to 10.50.5.20) on TCP 443.

7 commands, 90 seconds. Most tickets resolve before step 5.

① POLICY MATCH test security-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6
Does the rule Sneha THINKS allows this traffic actually match?
② NAT MATCH test nat-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6
Is there a NAT rule? Should there be? If "No matching NAT" but the destination is on a different subnet, you may still be fine (no-NAT). If you EXPECT NAT and see no match — there's your bug.
③ SESSION TABLE show session all filter source 10.10.10.5 destination 10.50.5.20
Zero sessions → firewall never created state (back to step 1/2). One session with c2s bytes but no s2c bytes → return path broken.
④ GLOBAL COUNTERS show counter global filter severity drop delta yes
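The most diagnostic command on the entire firewall. delta yes reports only what changed since the previous run, so run it once to set the baseline, reproduce the issue, then run it again. Watch for flow_policy_deny, flow_fwd_l3_noroute, flow_fwd_zonechange.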
⑤ PACKET CAPTURE debug dataplane packet-diag set filter match source 10.10.10.5 destination 10.50.5.20
4 capture stages: rx → fw → tx → drop. rx empty = packet never arrived. drop populated = dataplane killed it (run step 4 again to learn why).
⑥ ACC PIVOT ACC → Network Activity → filter on source/dest IP. Look for application = incomplete or insufficient-data.
"incomplete" = TCP handshake never finished (server problem). "insufficient-data" = handshake OK but payload too small to App-ID.
⑦ THREAT LOGS Monitor → Logs → Threat. Cross-reference session ID. session-end-reason=threat means a Security Profile (AV / Anti-Spy / Vulnerability / URL / WildFire / DNS-Sec) reset the session — find which signature.
This is where "allowed but didn't pass" lives. Allow rule + bad signature = silent block from the user's POV.
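The CLI half of the ladder (steps 1 to 5) can be generated from a single flow description, which keeps the order fixed and the arguments consistent. The following is a minimal Python sketch, not a Palo Alto tool: the Flow class and ladder_commands function are hypothetical names, and the command strings are copied from the ladder above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One suspect flow, described the way the ticket describes it."""
    src_zone: str
    dst_zone: str
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: int = 6  # 6 = TCP


def ladder_commands(flow: Flow) -> list[str]:
    """Return the CLI commands for ladder steps 1-5, in ladder order."""
    match_args = (
        f"from {flow.src_zone} to {flow.dst_zone} "
        f"source {flow.src_ip} destination {flow.dst_ip} "
        f"destination-port {flow.dst_port} protocol {flow.protocol}"
    )
    pair = f"source {flow.src_ip} destination {flow.dst_ip}"
    return [
        f"test security-policy-match {match_args}",
        f"test nat-policy-match {match_args}",
        f"show session all filter {pair}",
        "show counter global filter severity drop delta yes",
        f"debug dataplane packet-diag set filter match {pair}",
    ]


# Sneha's ticket from the walkthrough above.
sneha_ticket = Flow("trust", "dmz", "10.10.10.5", "10.50.5.20", 443)
for step, command in enumerate(ladder_commands(sneha_ticket), start=1):
    print(f"step {step}: {command}")
```

Steps 6 and 7 are GUI pivots (ACC and the Threat log), so they are deliberately left out of the generated list.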

Steps 1 & 2 in real CLI

Step 1 — Does my security policy match?
test security-policy-match from trust to dmz source 10.10.10.5 destination 10.50.5.20 destination-port 443 protocol 6 application ssl
Good output — your custom rule matched
"Allow-Trust-to-DMZ" {
    from                  trust;
    source                [ corp-clients ];
    source-region         none;
    to                    dmz;
    destination           [ dmz-servers ];
    destination-region    none;
    user                  any;
    category              any;
    application/service   [ssl/tcp/any/443];
    action                allow;
    icmp-unreachable      no;
    terminal              yes;
}
Bad output — fell to interzone-default
"interzone-default" {
    action                deny;
    ...
}

If you see interzone-default or intrazone-default in the match, your custom rule never fired. Common culprits: wrong zone, wrong service (TCP/443 vs TCP/8443), wrong application (rule says ssl but client speaks web-browsing), or the rule sits below another rule that already matched. Always test with the pre-NAT source and destination IP addresses (the destination zone, by contrast, is the post-NAT zone). Getting this wrong is a top-5 mistake on day one.
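If you collect the step 1 output in a script, the "did it fall to a default rule?" check is easy to automate. This is a rough Python sketch that assumes the output looks like the two samples above (a rule name followed by a brace block with an action line); parse_policy_match is a hypothetical helper, and real output may differ between PAN-OS versions.

```python
import re

DEFAULT_RULES = {"interzone-default", "intrazone-default"}


def parse_policy_match(output: str) -> dict:
    """Extract the matched rule name and action from the CLI output text."""
    # Rule name: optional quotes, then an opening brace on the same line.
    name = re.search(r'^\s*"?([^"{\n]+?)"?\s*\{', output, re.MULTILINE)
    # Action line, e.g. "    action                allow;"
    action = re.search(r"^\s*action\s+(\S+?);", output, re.MULTILINE)
    rule = name.group(1).strip() if name else None
    return {
        "rule": rule,
        "action": action.group(1) if action else None,
        "fell_to_default": rule in DEFAULT_RULES,
    }


good = '"Allow-Trust-to-DMZ" {\n    from    trust;\n    action  allow;\n}'
bad = '"interzone-default" {\n    action  deny;\n}'

print(parse_policy_match(good))
print(parse_policy_match(bad))
```

A result with fell_to_default set to True is the cue to go through the culprit list above: zone, service, application, rule order.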

Step 2 — Does my NAT rule match?
test nat-policy-match from trust to untrust source 10.10.10.5 destination 8.8.8.8 destination-port 443 protocol 6
Good output
NAT-Internet-Outbound {
    from        trust;
    to          untrust;
    source      [ corp-clients ];
    destination [ any ];
    service     any;
    nat-type    ipv4;
    to-interface  ethernet1/2;
    source-translation: dynamic-ip-and-port (interface-address)
}

"No matching NAT policy" is the most common bug on a freshly rebuilt outbound rule. If you expected NAT and don't get a match, the rule's source/destination zone is likely flipped: Palo Alto NAT rules use PRE-NAT zones in both the source-zone and destination-zone fields. For the zone gotcha in detail, skim back to Blog 4 (NAT Deep-Dive).

Quick check · Q1 of 10

Rahul at TCS runs test security-policy-match for a flow that should match his "Allow-DB-Replication" rule. Output says "interzone-default" with action deny. What's the FIRST thing he should check?

Correct: b. Falling to interzone-default means no custom rule matched. 95% of the time it's a zone mismatch (ingress zone is not what Rahul thought), wrong service (TCP/3306 vs 33060), or the rule uses application=mysql while the actual flow appears as web-browsing during App-ID identification. Use test security-policy-match iteratively — change one field, re-test, watch the match.
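The "change one field, re-test" loop is mechanical enough to script. The sketch below, in Python, builds one test command per alternative value, each differing from the base flow in exactly one field. Every value in the example (zones app, db and corp, plus the two IP addresses) is hypothetical, since the question does not give Rahul's real flow; one_field_variants is an illustrative helper name.

```python
FIELD_ORDER = ["from", "to", "source", "destination",
               "destination-port", "protocol", "application"]


def one_field_variants(base: dict, alternatives: dict) -> list[str]:
    """Build test commands that each differ from the base flow in one field."""
    commands = []
    for field, values in alternatives.items():
        for value in values:
            trial = {**base, field: value}  # copy the base, swap a single field
            args = " ".join(f"{key} {trial[key]}" for key in FIELD_ORDER if key in trial)
            commands.append(f"test security-policy-match {args}")
    return commands


# Hypothetical flow for the replication rule; substitute your own values.
base_flow = {
    "from": "app", "to": "db",
    "source": "10.20.1.10", "destination": "10.30.1.10",
    "destination-port": 3306, "protocol": 6, "application": "mysql",
}
suspects = {
    "from": ["corp"],                 # wrong ingress zone?
    "destination-port": [33060],      # wrong service?
    "application": ["web-browsing"],  # wrong App-ID?
}

for command in one_field_variants(base_flow, suspects):
    print(command)
```

Whichever variant stops falling to interzone-default tells you which field the rule disagrees with.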

② Symptom → next command — the decision tree

Pick a symptom Priya at HCL hears on Monday morning. The symptom-driven command picker walks each one through the same path: first command → decision → likely root cause → fix.
Quick check · Q2 of 10

Karthik at Flipkart sees a session in show session all with c2s bytes = 4,328 but s2c bytes = 0. The session is in state ACTIVE. What's the most likely cause?

Correct: b. The c2s direction has bytes — firewall is forwarding client → server fine. Zero s2c bytes means the server's reply never came back through this firewall. Three usual suspects: (1) server's default gateway points to a different L3 device (asymmetric), (2) server is actually down / not listening on that port, (3) a static route is missing on the firewall for return traffic. Run show session id <id> for full details, then SSH the server and confirm with ss -lntp.
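The step 3 decision (no session, one-way bytes, or two-way bytes) fits in a few lines. This Python sketch assumes you have already pulled the byte counters out of show session id into dictionaries; the keys c2s_bytes and s2c_bytes and the function triage_sessions are illustrative names, not PAN-OS field names.

```python
def triage_sessions(sessions: list[dict]) -> str:
    """Apply the step 3 decision rule to a list of matching sessions."""
    if not sessions:
        return "no session: firewall never created state, go back to steps 1 and 2"
    for session in sessions:
        # Client-to-server bytes but nothing coming back: the reply is lost.
        if session["c2s_bytes"] > 0 and session["s2c_bytes"] == 0:
            return ("return path broken: check the server's default gateway, "
                    "whether the service is listening, and the firewall's return route")
    return "bytes in both directions: continue to step 4"


# Karthik's session from the question above.
print(triage_sessions([{"c2s_bytes": 4328, "s2c_bytes": 0}]))
```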

③ The 10 production root causes (in order of frequency)

Across PAN-OS 9.1 to 11.2, support tickets, and Live Community threads — these 10 cover ~90% of "traffic not passing" incidents. Memorise them as a checklist.

Wrong zone in rule

Rule says from trust to dmz but real ingress is from corp. test security-policy-match instantly catches it. Top cause for L1 escalations.

NAT not matching

Forgot to update NAT after adding a new zone / subnet. test nat-policy-match says "No matching NAT". 80% of new-rule outages.

Application-default trap

Rule uses application=ssl + service application-default. App runs on TCP 8443. Rule never matches. Set service to any or add custom service.

Route lookup fail

Counter: flow_fwd_l3_noroute or flow_fwd_l3_noarp. Fix: show routing route + add static route or fix dynamic protocol.

Asymmetric A/A

Counter: flow_fwd_zonechange. SYN went fw1, SYN-ACK arrived at fw2. Fix: enable HA3, or Active/Passive, or PBF with Symmetric Return.

Security-profile reset

Traffic log: action=allow but session-end-reason=threat. AV / Anti-Spy / Vuln / URL / WildFire / DNS-Sec killed it. Check Threat log.

DIPP exhaustion

Counter: nat_dyn_port_xlat_full. New sessions silently dropped. Fix: increase DIPP oversubscription (2x → 4x → 8x) or add IPs to pool.

MTU / fragmentation

Counter: flow_ipfrag_recv_err. IPSec / GRE egress with no MSS-adjust drops payloads > 1400B. Fix: tcp-mss-adjust on tunnel zone interface.

Decrypt error

session-end-reason=decrypt-error / decrypt-cert-validation. Forward Proxy hit an untrusted cert. Fix: add to No-Decrypt list or fix the upstream cert chain.

Stale ARP / MAC

After a server NIC swap, the firewall ARP cache still holds the old MAC. Check show arp ethernet1/3 first; clear arp ethernet1/3 then fixes it immediately.

Step 4 deep-dive — the global-counter buckets you must recognise

The show counter global filter severity drop delta yes output is your fingerprint. Each counter maps to one specific stage of the dataplane pipeline. The ones below cover ~85% of real-world drops.

| Counter | What it really means |
|---|---|
| flow_policy_deny | Explicit Security-policy deny matched. Fix: re-run test security-policy-match, edit the rule. |
| flow_policy_nofwd | No destination zone resolved from FIB lookup. Route missing or VR misconfigured. |
| flow_fwd_l3_noroute | FIB has no route for that destination. Add static or fix dynamic protocol. |
| flow_fwd_l3_noarp | Route exists but ARP for next-hop is failing. Check L2 / cable / VLAN. |
| flow_fwd_zonechange | Existing session sees a packet arriving from a different zone. Classic asymmetric-routing fingerprint. |
| flow_tcp_non_syn_drop | Mid-flow TCP packet with no matching session. Usually the return half of an asymmetric flow. |
| nat_dyn_port_xlat_full | DIPP port pool exhausted. Raise oversubscription or add public IPs. |
| flow_action_close / flow_action_reset | Firewall injected RST. Threat profile or App-ID block fired. |
| flow_ipfrag_recv_err | Bad/missing fragments. MTU or asymmetric path. Adjust MSS. |
| flow_parse_l4_cksm | L4 checksum bad. Usually NIC offload bug on VM-Series. Disable offload at hypervisor. |
| flow_host_service_deny | Packet to firewall's own management service blocked (no permitting Management Profile on interface). |
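Once you know the counter names, reading a delta run can be scripted: parse the name and value columns, rank by increment, and attach the meaning from the table above. The Python sketch below assumes the output is whitespace-separated with the counter name first and the delta value second; the SAMPLE text is illustrative (descriptions elided), not captured from a real firewall, and parse_drop_counters and explain are hypothetical helpers.

```python
COUNTER_HINTS = {
    "flow_policy_deny": "security policy deny: re-run test security-policy-match",
    "flow_policy_nofwd": "no destination zone from FIB lookup: check routes and VR",
    "flow_fwd_l3_noroute": "no route to destination: add static or fix dynamic protocol",
    "flow_fwd_l3_noarp": "next-hop ARP failing: check L2, cable, VLAN",
    "flow_fwd_zonechange": "packet from a different zone: suspect asymmetric routing",
    "flow_tcp_non_syn_drop": "mid-flow TCP with no session: suspect asymmetric return",
    "nat_dyn_port_xlat_full": "DIPP pool exhausted: raise oversubscription or add IPs",
    "flow_action_reset": "firewall injected RST: check the Threat log",
    "flow_ipfrag_recv_err": "bad or missing fragments: check MTU, adjust MSS",
}


def parse_drop_counters(output: str) -> dict[str, int]:
    """Collect {counter_name: delta} from lines shaped like 'name value ...'."""
    counters: dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split()
        # Counter names contain underscores; headers and separators are skipped.
        if len(parts) >= 2 and "_" in parts[0] and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def explain(counters: dict[str, int]) -> list[str]:
    """Rank counters by increment and attach the hint from the lesson table."""
    ranked = sorted(counters.items(), key=lambda item: item[1], reverse=True)
    return [
        f"{name} +{delta}: {COUNTER_HINTS.get(name, 'not in the lesson table')}"
        for name, delta in ranked
    ]


SAMPLE = """\
name                    value  rate severity category aspect  description
flow_fwd_zonechange       184     1 drop     flow     forward ...
flow_policy_deny           12     0 drop     flow     session ...
"""

for line in explain(parse_drop_counters(SAMPLE)):
    print(line)
```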
Quick check · Q3 of 10

Karthik at Flipkart runs show counter global filter severity drop delta yes during a 2-minute test. Counter flow_fwd_zonechange increments by 184. What's the most likely root cause?

Correct: a. flow_fwd_zonechange means an existing session saw a packet arriving from a zone different from the one it originally established on. Top causes: (1) Active/Active HA without HA3 packet forwarding, (2) misconfigured VR with overlapping subnets, (3) downstream device load-balancing return traffic via a different uplink. Fix: enable HA3 on A/A pairs, or switch to A/P, or use PBF with Symmetric Return.

Step 5 deep-dive — the 4 packet-diag stages

The debug dataplane packet-diag capture splits a packet's journey into 4 stages. The combination of which stages have packets — and which don't — tells you where the drop happened, even before you read the counter.

▶ Packet-diag stage decision matrix

Aditya at Wipro captures rx + fw + tx + drop. The pattern tells him where to focus.

① RX (receive) Captures every packet entering the dataplane from the NIC, before any policy lookup. Empty RX = packet never arrived. Look upstream (switch, cable, ARP).
② FW (firewall lookup) Captures after route + zone + policy lookup. If RX has packets but FW is empty — policy or NAT dropped early (re-run step 4).
③ TX (transmit) Captures egress packets after all transformations. If FW has packets but TX is empty — the packet was processed but never sent out (egress interface down? PBF redirect to a black-hole?).
④ DROP (drop) Captures everything the dataplane dropped at ANY stage. If you have packets in DROP, immediately re-run show counter global filter severity drop delta yes — the counter tells you WHY.
Walk the four stages — the absence of packets in a stage is as informative as their presence.
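The stage matrix above reduces to a short decision function. This is a small Python sketch of that logic, taking the packet count seen in each capture file; locate_drop is a hypothetical helper, and the final "look downstream" branch is an inference from the matrix rather than something the lesson states.

```python
def locate_drop(rx: int, fw: int, tx: int, drop: int) -> str:
    """Map packet counts per capture stage to the next place to look."""
    if rx == 0:
        return "packet never arrived: look upstream (switch, cable, ARP)"
    if drop > 0:
        return ("dataplane dropped it: re-run "
                "show counter global filter severity drop delta yes")
    if fw == 0:
        return "dropped before the firewall stage: policy or NAT, re-run step 4"
    if tx == 0:
        return "processed but never sent: check egress interface and PBF"
    return "firewall transmitted the packet: look downstream"


# RX and FW populated, TX empty, 12 packets in DROP.
print(locate_drop(rx=12, fw=12, tx=0, drop=12))
```

The drop check comes before the fw and tx checks because a populated DROP stage always sends you back to the global counters, whatever the other stages show.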
Real capture session — Aditya's full sequence
debug dataplane packet-diag set filter match source 10.10.10.5 destination 10.50.5.20
debug dataplane packet-diag set filter on
debug dataplane packet-diag set capture stage receive  file rx.pcap
debug dataplane packet-diag set capture stage firewall file fw.pcap
debug dataplane packet-diag set capture stage transmit file tx.pcap
debug dataplane packet-diag set capture stage drop     file drop.pcap
debug dataplane packet-diag set capture on

! ... reproduce the issue from the client (2-3 attempts is enough) ...

debug dataplane packet-diag set capture off
debug dataplane packet-diag set filter off
debug dataplane packet-diag clear all

! Now export each capture file via SCP (repeat for fw.pcap, tx.pcap, drop.pcap):
scp export filter-pcap from rx.pcap to admin@10.10.10.5:/tmp/
Don't forget to turn capture OFF

Forgotten captures on busy firewalls fill the disk and stall mgmt-plane. Always pair set capture on with set capture off + clear all in your runbook. Auto-stop after 10 minutes is a good safety habit — set a Slack reminder.
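If you drive the firewall from a script, the on/off pairing can be enforced by structure rather than memory. The Python sketch below wraps the capture in a context manager so the three cleanup commands run even when the reproduction step fails. The send callable is an assumption: it stands in for whatever function delivers a command to your SSH session, and here it just appends to a list.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def packet_capture(send: Callable[[str], None]) -> Iterator[None]:
    """Turn capture on, and always turn it off and clear settings on exit."""
    send("debug dataplane packet-diag set capture on")
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        send("debug dataplane packet-diag set capture off")
        send("debug dataplane packet-diag set filter off")
        send("debug dataplane packet-diag clear all")


sent: list[str] = []
with packet_capture(sent.append):
    pass  # reproduce the issue from the client here

print("\n".join(sent))
```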

Quick check · Q4 of 10

Sneha at Infosys runs packet-diag and sees packets in stages RX and FW, but TX is empty and DROP has 12 packets. What's the next command?

Correct: c. Packets in DROP means the dataplane killed them. The global counter is the dataplane's diary — every dropped packet increments at least one counter. Run delta-yes immediately after the capture stops so the increments correlate exactly with the captured packets. From there: flow_policy_deny → fix rule; flow_action_reset → check Threat log; flow_ipfrag_recv_err → adjust MSS.

④ Session-end-reason — the field that solves 70% of "allow but doesn't work"

Step 7 of the ladder relies on the most-overlooked log field on the entire firewall. The session-end-reason column in Monitor → Logs → Traffic tells you why a session ended, even when action says allow. Treat it as a chain of evidence: an "allow" action with a non-clean end-reason is the firewall saying "I allowed it, but something else killed it."

| Value | What's really happening |
|---|---|
| aged-out | Idle timer expired (normal for UDP and short flows). Suspect on long TCP flows that had no FIN: possible silent hang downstream. |
| tcp-fin | Clean close: one or both hosts sent a TCP FIN. Healthy. |
| tcp-rst-from-client | Client sent RST. Endpoint problem. Chase the user's app, not the firewall. |
| tcp-rst-from-server | Server sent RST. Service unavailable, port closed, or backend rejected. |
| tcp-reuse | A new SYN reused the 5-tuple of an existing session, so the firewall closed the previous session. Often benign. |
| policy-deny | Security policy denied. Can occur on an "allow" rule when application-default service mismatches the actual port (top trick question on PCNSE). |
| threat | A Security Profile (AV / Anti-Spy / Vuln / URL / WildFire / DNS-Sec) reset the session. Open the Threat log, same session ID, to find which signature. |
| decrypt-error / decrypt-cert-validation / decrypt-unsupport-param | SSL decryption pipeline killed it. Add the destination to No-Decrypt or fix the cert chain. |
| resources-unavailable | Session table full, DIPP pool exhausted, or proxy memory exceeded. Capacity issue, not a config bug. |
| unknown / n/a | Session not torn down cleanly, or PAN-OS < 7.1 (legacy). |
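When you export Traffic logs, the table above turns into a lookup that flags every "allowed but killed" session and names the next place to look. The following Python sketch assumes you already have the action and session-end-reason values as strings; triage_end_reason and the NEXT_STEP mapping are illustrative and simply restate the table.

```python
NEXT_STEP = {
    "aged-out": "normal for UDP and short flows; on long TCP flows suspect a silent hang downstream",
    "tcp-rst-from-client": "chase the client application, not the firewall",
    "tcp-rst-from-server": "check the server: service unavailable, port closed, or backend rejected",
    "tcp-reuse": "often benign: the 5-tuple was reused by a new SYN",
    "policy-deny": "re-run test security-policy-match; compare application-default with the real port",
    "threat": "open the Threat log and match the session ID to find the signature",
    "decrypt-error": "add the destination to No-Decrypt or fix the cert chain",
    "decrypt-cert-validation": "add the destination to No-Decrypt or fix the cert chain",
    "decrypt-unsupport-param": "add the destination to No-Decrypt or fix the cert chain",
    "resources-unavailable": "capacity issue: check session table, DIPP pool, proxy memory",
}


def triage_end_reason(action: str, end_reason: str) -> str:
    """Turn one Traffic-log row into the next troubleshooting step."""
    if end_reason == "tcp-fin":
        return "healthy: clean close"
    step = NEXT_STEP.get(end_reason)
    if step is None:
        return f"no guidance in the lesson table for {end_reason!r}"
    if action == "allow":
        # The case this section is about: allowed, then ended by something else.
        return f"allowed, then ended by {end_reason}: {step}"
    return step


print(triage_end_reason("allow", "threat"))
print(triage_end_reason("allow", "tcp-fin"))
```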
PCNSE trap to memorise

"A Traffic log entry shows action=allow but session-end-reason=policy-deny. How is this possible?" Answer: the session was allowed at setup, before App-ID had identified the application, and the log keeps that initial allow. The rule that permits ssl used service application-default (TCP/443 by default), but the actual flow ran SSL on TCP/8443. Once App-ID identified the traffic as ssl, the firewall re-evaluated policy, the identified application on that port no longer matched an allow rule, and the session was denied. The result is a late policy-deny on what looked like an allow. Fix: change service to any or define a custom service. This pattern appears once per PCNSE cycle.

📝 Wrap-up — six more

You've already answered 4 inline. Six left. 70% (7 of 10) total marks the lesson complete on your profile. Tap Submit all answers at the end.

Q5 · Apply

A Traffic log shows action=allow but session-end-reason=threat. Which log do you open next?

Correct: b. session-end-reason=threat always means a Security Profile reset the flow. The Threat log carries the exact signature, profile name, and action — match on session-ID for the smoking gun. Then decide: tune the profile (allow override), upgrade content, or fix the actual threat. Don't blindly disable profiles to make traffic pass.
Q6 · Analyze

Priya at HCL gets a ticket: "after this morning's commit, the site-to-site VPN client subnet 192.168.50.0/24 can't reach internal apps". She runs test security-policy-match and the correct rule matches. test nat-policy-match shows "No matching NAT". Sessions show c2s bytes but no s2c. Where is she most likely stuck?

Correct: a. Classic VPN traffic-not-passing: policy matches, NAT is intentionally not present (VPN subnets stay un-NATed) — but the return half can't find its way back because internal servers don't know 192.168.50.0/24 is reachable via the firewall. Two fixes: (1) add static route on the servers (or default-gateway adjustment), (2) explicitly add a no-NAT (translation-type = none) rule on the firewall so traffic flow is symmetric without surprising the routing. Standard fix in B2B VPN setups.
Q7 · Analyze

Aditya at Wipro sees application=insufficient-data in ACC for the suspect flow. What does that mean — and where should he look?

Correct: a. insufficient-data ≠ incomplete, and that's the trap. incomplete = TCP handshake never finished. insufficient-data = handshake DID finish but payload was too small (or absent) for App-ID classification. Both point at the endpoint side, not the firewall. SSH the server, check the application logs, run ss -lntp to confirm the service is up.
Q8 · Analyze

Sneha runs show counter global filter severity drop delta yes and sees flow_action_reset incrementing rapidly. Traffic log says action=allow. What's the most likely sequence of events?

Correct: c. flow_action_reset is the dataplane saying "I generated a TCP RST." If your security action is "allow" but you see reset-counter incrementing, a Security Profile is doing it. Match the session-ID in Traffic log → Threat log to find the offending signature. Either tune the profile (set the signature to alert-only) or fix the actual issue if the signature is correctly flagging real malicious behavior.
Q9 · Evaluate

A user reports "the SaaS app works for 8 minutes then times out, every time." Sneha runs the ladder. Sessions exist, no drops in counters, no threat log entries, session-end-reason is aged-out. What's the most likely root cause?

Correct: a. aged-out on a long-lived TCP session that's expected to be active means the application went idle long enough that the firewall closed the session. Two paths: (1) tune the application's TCP idle timer on the firewall — Objects → Applications → <app> → Timeouts → TCP timeout (default 3600s) — or use a custom-timeout via Application Override; (2) the cleaner fix is to enable TCP keepalives at the application or load-balancer so traffic flows during idle periods.
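For the second path, enabling keepalives at the application is usually a few socket options. A minimal Python sketch follows, assuming you control the client or server code; the 300-second idle value is an illustrative choice (pick something shorter than the firewall's idle timeout), and the three TCP_KEEP* options are platform-specific, so the code sets them only where the socket module exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 300,
                     interval_s: int = 60, probes: int = 5) -> None:
    """Ask the OS to send TCP keepalive probes on an otherwise idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning exists on Linux; other platforms may lack some options.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the OS gives up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("keepalive on:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Each probe counts as traffic on the session, so the firewall's idle timer resets and the session no longer ends as aged-out during quiet periods.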
Q10 · Evaluate

You inherit a firewall where 23% of all sessions show session-end-reason=tcp-rst-from-server. The traffic is allowed. Is this a firewall problem?

Correct: b. The end-reason directly names the source of the RST. tcp-rst-from-server means the server's TCP stack generated the reset. Treat the firewall as a witness, not the culprit. Look at the server: is the service listening? Is there an application-level firewall? Is the load balancer aggressively closing connections? Some old apps RST-on-close instead of FIN — annoying but harmless. Tune the server, leave the firewall alone.

What's next?

Now that you can find where a packet died, the next blog teaches the session-table internals — how sessions are born, offloaded, aged, and predicted (FTP / SIP). When you understand the session lifecycle, troubleshooting becomes routine.