Most engineers think…
"Action: allow in the traffic log means the connection worked." That is how most students read Monitor — and exactly how engineers lose 40 minutes of a P1 bridge call blaming the application team while the real fault sits in routing.
Allow only means the firewall permitted the attempt. Whether a reply ever came back is written in two other columns — application and session end reason. By the end of this lesson you'll read allow + incomplete + aged-out the way a doctor reads an X-ray.
① Read the fire like a doctor — the 5-step ladder
Think of a good doctor at an OPD. A patient walks in with fever; the doctor doesn't guess a medicine. She orders tests in order — temperature, blood test, X-ray — and lets evidence pick the diagnosis. Palo Alto troubleshooting is the same OPD discipline. The user complaint is the symptom; the tests are the traffic log, the session table, the policy tester and the global counters. Engineers who fail interviews are the ones who guess the medicine.
Every scenario in this lesson is solved by the same 5-step ladder. Learn it once, reuse it forever:
- Traffic log first: Monitor → Logs → Traffic, filter on the source. Read three columns: action, application, session end reason.
- Session table second: show session all filter source x.x.x.x, then show session id N. Byte counts don't lie.
- Policy tester third: test security-policy-match proves which rule WOULD match, without sending a packet.
- Global counters fourth: show counter global … delta yes catches drops that never reach any log.
- Fix, then verify from the log, never from the config screen.
Step 3 is the one interviewers love, because it has a syntax trap. protocol takes the IP protocol number — TCP is 6, UDP is 17 — not the word "tcp":
> test security-policy-match from trust to untrust source 10.40.14.197 destination 203.0.113.50 destination-port 443 protocol 6
"Allow-Web-Out; index: 4" {
from trust;
source 10.40.14.0/24;
to untrust;
destination any;
application/service [ ssl web-browsing ];
action allow;
terminal yes;
}
And when the logs stay suspiciously empty, step 4 catches the silent drops:
> show counter global filter severity drop delta yes
Global counters:
Elapsed time since last sampling: 5.21 seconds

name              value  rate  severity  category  aspect   description
flow_policy_deny  1432   274   drop      flow      session  Session setup: denied by policy
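Counter output like the sample above is easy to triage in a script once it is pasted into a text file. The following is a minimal Python sketch, assuming the seven-column layout shown in this lesson (name, value, rate, severity, category, aspect, description). The names `Counter`, `parse_counters` and `SAMPLE` are illustrative, and real output may differ between PAN-OS versions.

```python
from typing import NamedTuple


class Counter(NamedTuple):
    name: str
    value: int
    rate: int
    severity: str
    category: str
    aspect: str
    description: str


def parse_counters(text: str) -> list[Counter]:
    """Parse data rows of 'show counter global' output (assumed column layout)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 6)  # the description may contain spaces
        # Data rows have 7 fields with numeric value and rate columns;
        # headers and the "Elapsed time" line fail this check and are skipped.
        if len(parts) == 7 and parts[1].isdigit() and parts[2].isdigit():
            rows.append(Counter(parts[0], int(parts[1]), int(parts[2]), *parts[3:]))
    return rows


# Sample copied from this lesson's output.
SAMPLE = """\
Global counters:
Elapsed time since last sampling: 5.21 seconds

name              value  rate  severity  category  aspect   description
flow_policy_deny  1432   274   drop      flow      session  Session setup: denied by policy
"""

drops = [c for c in parse_counters(SAMPLE) if c.severity == "drop"]
for c in drops:
    print(f"{c.name}: {c.value} ({c.rate}/s) {c.description}")
```

Sorting the parsed rows by rate is a quick way to see which drop reason is growing fastest between two samples.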
A brand-new PA-3410 "doesn't process traffic, the Monitor is clear!" — a real LIVEcommunity thread. Two silent defaults wreck juniors here: the interzone-default rule denies AND logs nothing until you override it (Policies → Security → interzone-default → Override → Actions → Log at Session End), and dataplane interfaces don't answer ping without an Interface Management Profile. An empty log is not proof of no traffic.
The four log words that tell you everything
The application and session end reason columns speak a tiny language. These four words answer "is it the firewall or not?" faster than any packet capture:
- incomplete: The 3-way handshake never finished, usually because the reply never came back. So what: stop blaming policy, start chasing routing.
- insufficient-data: Handshake done, but too little payload for App-ID to name the app. So what: this is what a telnet "port open" test looks like, and it proves nothing.
- not-applicable: Denied on port/service before App-ID even ran. So what: the rule's service column, not the application, killed it.
- aged-out (a session end reason): The session expired waiting; nobody said goodbye. So what: paired with allow + incomplete, it's the fingerprint of a missing return route.
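The four words can be turned into a small lookup for practice. This is a Python sketch of this lesson's rules of thumb only, not vendor logic; the function name `triage` and the wording of the returned hints are made up for illustration.

```python
def triage(action: str, application: str, end_reason: str) -> str:
    """Map three traffic-log columns to the hint this lesson attaches to them."""
    action, application, end_reason = (
        s.strip().lower() for s in (action, application, end_reason)
    )
    if action == "allow" and application == "incomplete":
        if end_reason == "aged-out":
            return "return path: handshake never finished and the session aged out; chase routing"
        return "handshake never finished; check whether the reply came back"
    if action == "allow" and application == "insufficient-data":
        return "handshake only: too little payload for App-ID; proves nothing about the real app"
    if application == "not-applicable":
        return "service mismatch: denied on port/service before App-ID ran; check the rule's Service column"
    return "no fingerprint from this lesson; continue down the ladder"


print(triage("allow", "incomplete", "aged-out"))
print(triage("deny", "not-applicable", "policy-deny"))
```

Anything the function does not recognize falls through to the ladder, which is the point: the log narrows the search, it does not replace the session table or the counters.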
▶ Live demo 1 — anatomy of a session that "worked for 5 seconds"
The demo follows a flow that is allowed, identified, then re-judged mid-session: first the healthy path, then the failure.
Rahul at TCS must prove to an auditor that the firewall would permit SSH from jump host 10.10.20.5 to server 172.16.5.10 — without sending live traffic. He types: test security-policy-match source 10.10.20.5 destination 172.16.5.10 destination-port 22 protocol ___. What completes the command?
Answer: 6. The protocol argument takes the IP protocol number, and TCP is 6. "tcp" as a keyword is Cisco muscle memory and errors out; 22 is the destination port, already supplied; 17 is UDP and would test the wrong rule set for an SSH flow.
② Policy says allow, the app still dies
This family of fires has one root: on a Palo Alto, the App-ID engine — not the port — decides what a flow is. Engineers arriving from ASA/FTD-land keep building port rules, and the firewall keeps politely refusing.
Rahul at TCS faces this
A new mail gateway must fetch IMAPS on TCP 993. His rule allows application ssl with service application-default. Telnet to 993 connects — but the mail sync fails, and the traffic log shows the session denied.
application-default means "this app on the ports Palo Alto defined for it" — for ssl that is 443 only. TCP 993 never matches the rule, no matter how valid the TLS is.
Check the rule's Service column, then confirm the app's official default ports.
Objects → Applications → ssl → Standard Ports: tcp/443
Keep the App-ID, add an explicit service object for tcp/993 on that rule (or a second rule). Never flip to service any: that is the least-secure way out.
Re-run the sync, then confirm in Monitor, not from the config screen: app ssl over tcp/993 (imap appears only if the flow is decrypted), action allow, session end tcp-fin.
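The IMAPS case comes down to one membership test. Here is a toy Python model of it, assuming only the two default-port entries this lesson mentions (ssl on tcp/443, web-browsing on tcp/80). `DEFAULT_PORTS` and `service_matches` are hypothetical names, and the real firewall evaluates far more than this.

```python
# Default ports as stated in this lesson; the real App-ID database is much larger.
DEFAULT_PORTS = {
    "ssl": {("tcp", 443)},
    "web-browsing": {("tcp", 80)},
}


def service_matches(app: str, proto: str, port: int, rule_service) -> bool:
    """rule_service is 'any', 'application-default', or a set of (proto, port) pairs."""
    if rule_service == "any":
        return True  # matches, but opens the app on every port
    if rule_service == "application-default":
        return (proto, port) in DEFAULT_PORTS.get(app, set())
    return (proto, port) in rule_service


# IMAPS on 993 against the original rule: no match, so the rule never applies.
print(service_matches("ssl", "tcp", 993, "application-default"))  # False
# Same flow after adding an explicit service object for tcp/993.
print(service_matches("ssl", "tcp", 993, {("tcp", 993)}))  # True
```

The explicit service set keeps the match as narrow as the requirement, which is why it is preferred over `any`.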
Pause & Predict
Before a go-live, an engineer telnets from the app subnet to the destination port, and it connects. The change ticket is closed as "firewall open". Does that telnet actually prove the real application will work?
No. The telnet session logs as insufficient-data and proves nothing. Once the real app sends payload and gets named, policy can still deny it.
That is the trap that catches even 20-year veterans: the telnet test. On an App-ID rule the firewall must let the 3-way handshake plus a few packets through; otherwise App-ID has nothing to read. So telnet connects, logs say insufficient-data, the change ticket gets closed as "port open", and the real application still dies the next morning.
"Port open in telnet" proves only that handshakes are allowed — which App-ID rules always permit. To pre-test an App-ID rule, use the real client once, or test security-policy-match with the application argument. That's the answer interviewers are listening for.
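To avoid the "tcp" versus 6 trap when preparing such a test, the command string can be generated. This is a small Python sketch: the helper name `build_policy_test` is hypothetical, the keywords are the ones used in this lesson's example plus the application argument, and the output is only a string to paste into the CLI. Nothing here talks to a firewall.

```python
# IP protocol numbers as used by the protocol argument.
PROTOCOL_NUMBERS = {"tcp": 6, "udp": 17}


def build_policy_test(src_zone, dst_zone, source, destination, port,
                      protocol, application=None):
    """Build a 'test security-policy-match' command from friendly inputs."""
    proto = PROTOCOL_NUMBERS[protocol.lower()]  # KeyError on unsupported names
    cmd = (
        f"test security-policy-match from {src_zone} to {dst_zone} "
        f"source {source} destination {destination} "
        f"destination-port {port} protocol {proto}"
    )
    if application:
        cmd += f" application {application}"
    return cmd


print(build_policy_test("trust", "untrust", "10.40.14.197",
                        "203.0.113.50", 443, "tcp"))
print(build_policy_test("trust", "untrust", "10.40.14.197",
                        "203.0.113.50", 443, "TCP", application="ssl"))
```

Passing the application makes the test exercise the App-ID part of the rule, which a telnet check never does.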
The third fire in this family is the App-ID shift you watched in Live demo 1. A session starts as ssl or web-browsing, then shifts to the real app (google-base, ms-office365) once more payload arrives — and every shift re-tests policy. If no rule allows the newly-named app, the session that "worked for 5 seconds" dies with session end policy-deny. The user swears it works, then breaks; both are true.
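The "worked for 5 seconds" behaviour can be mimicked with a toy loop. This Python sketch assumes a simplified model in which policy is re-checked each time the identified application changes; `run_session` is a made-up name, and returning tcp-fin for the healthy case is just a stand-in for a normal close.

```python
def run_session(app_sequence, allowed_apps):
    """Re-test policy on every App-ID shift; stop at the first app no rule allows."""
    for app in app_sequence:
        if app not in allowed_apps:
            return f"policy-deny at {app}"
    return "tcp-fin"


# The flow starts as ssl (allowed), then shifts to google-base (not allowed).
print(run_session(["ssl", "google-base"], {"ssl", "web-browsing"}))
# Same flow once a rule also allows the newly named app.
print(run_session(["ssl", "google-base"], {"ssl", "web-browsing", "google-base"}))
```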
Sneha at Wipro builds a rule for a market-data vendor: application ssl, service application-default, source 192.168.40.0/24, destination the vendor feed on TCP 563. The feed never connects; logs show the traffic denied. The right fix?
The right fix follows the IMAPS case: keep application ssl and add an explicit service object for tcp/563. Service any passes traffic but allows ssl on every port, which is the Cisco-habit hole. Predefined App-ID ports are vendor-maintained and not editable. tcp-reject-non-syn is unrelated: the deny is a policy/port mismatch, and disabling TCP sanity checks fixes nothing.
③ The return path is the killer
Order biryani on Swiggy and give the wrong callback number — the delivery boy reaches your gate, but the confirmation call goes nowhere and the order times out. Half of all "firewall is blocking us" tickets are exactly this: the request reaches the server, but the reply has no route back through the firewall. The firewall logged allow, did its job, and still gets blamed.
Two rules of the house before the scenario. First, the NAT golden rule: a NAT rule is written with pre-NAT zones and addresses, but the security rule that permits the flow uses the post-NAT zone with the pre-NAT destination IP. It is the single most-asked Palo Alto interview gotcha in India — usually dressed up as U-turn NAT. Second: byte counts don't lie.
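The NAT golden rule is easier to remember as two tuples built from the same flow. The Python sketch below uses a hypothetical inbound destination-NAT flow; the zones and addresses are invented for illustration and the `Flow` fields are not PAN-OS object names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_zone: str
    dst_ip_pre_nat: str     # address the client actually typed
    dst_zone_pre_nat: str   # zone that owns that address as the packet arrives
    dst_ip_post_nat: str    # real server address after translation
    dst_zone_post_nat: str  # zone where the real server lives


def nat_rule_fields(flow: Flow):
    """NAT rule: pre-NAT zones and pre-NAT address."""
    return (flow.src_zone, flow.dst_zone_pre_nat, flow.dst_ip_pre_nat)


def security_rule_fields(flow: Flow):
    """Security rule: post-NAT zone, but still the pre-NAT destination IP."""
    return (flow.src_zone, flow.dst_zone_post_nat, flow.dst_ip_pre_nat)


# Hypothetical example: a public IP in untrust translated to a DMZ server.
flow = Flow("untrust", "203.0.113.10", "untrust", "10.1.1.10", "dmz")
print("NAT rule     :", nat_rule_fields(flow))
print("Security rule:", security_rule_fields(flow))
```

The only field that differs between the two tuples is the destination zone, which is exactly the part candidates get wrong.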
Priya at Infosys faces this
A new monitoring VLAN (172.16.40.0/24) must poll an app server 10.20.8.40 through the firewall. Polls fail. Traffic log: allow · incomplete · aged-out. The server team insists "we see your SYNs arriving".
The server's reply follows its default route via the old core switch — bypassing the firewall. The SYN-ACK never comes back through the box that owns the session.
Open the session — if s2c bytes stay at 0, the return path is broken. Confirm with the asymmetry counters.
show session id 240752 · show counter global filter delta yes | match non_syn
Fix the routing (server side returns via the firewall), or source-NAT the monitoring VLAN behind a firewall-owned IP the server already routes to. PBF cannot help: it steers what the firewall sees, and the reply never reaches the firewall.
Re-poll, re-open the session: c2s AND s2c byte counts climbing, session ends tcp-fin.
> show session id 240752
Session 240752
c2s flow:
source: 172.16.40.21 [trust]
dst: 10.20.8.40
proto: 6
state: INIT type: FLOW
total byte count(c2s) : 74
total byte count(s2c) : 0 <-- the reply never came home
Half the internet's advice for asymmetry is "set tcp-reject-non-syn to no and asymmetric-path to bypass". Those defaults (yes / drop) are TCP sanity checks. Disabling them globally doesn't fix your routing; it blinds the firewall to out-of-state packets, permanently. If you must, do it per-zone via a Zone Protection profile, time-boxed, while you fix the actual path.
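The byte-count check on the session output above can be scripted when many sessions need the same look. This is a minimal Python sketch, assuming the "total byte count(c2s)" and "total byte count(s2c)" labels appear as in this lesson's sample; the function names are illustrative.

```python
import re

BYTE_LINE = re.compile(r"total byte count\((c2s|s2c)\)\s*:\s*(\d+)")


def byte_counts(session_text: str) -> dict:
    """Extract c2s and s2c byte counts from pasted 'show session id' output."""
    return {direction: int(value) for direction, value in BYTE_LINE.findall(session_text)}


def return_path_broken(session_text: str) -> bool:
    """True when the client sent bytes but nothing came back through the firewall."""
    counts = byte_counts(session_text)
    return counts.get("c2s", 0) > 0 and "s2c" in counts and counts["s2c"] == 0


# Sample copied from this lesson.
SESSION = """\
Session 240752
c2s flow:
source: 172.16.40.21 [trust]
dst: 10.20.8.40
proto: 6
state: INIT type: FLOW
total byte count(c2s) : 74
total byte count(s2c) : 0
"""

print(byte_counts(SESSION))
print("return path broken:", return_path_broken(SESSION))
```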
GlobalProtect — "connected, but nothing works"
Remote-access tickets are the same fires with a tunnel wrapped around them. The key fact almost nobody reads in the docs: GlobalProtect uses exactly two data ports — TCP 443 (portal, gateway, SSL tunnel) and UDP 4501 (IPSec as ESP-in-UDP, no IKE at all). The client always tries IPSec first and silently falls back to SSL.
▶ Live demo 2 — how a GlobalProtect connection actually comes up
Four stages, two ports. The healthy path ends on IPSec; the broken path is the silent fallback to SSL that users complain about.
show global-protect-gateway current-user → Tunnel Type: IPSec (full speed)
Pause & Predict
A connected GlobalProtect user can SSH to 10.10.7.15 by IP without issues, but every internal website fails by hostname. Is this a firewall problem, and what ONE test proves your answer?
Probably not: name resolution is the suspect, and an nslookup against the tunnel's assigned DNS server settles it in ten seconds. (A famous Linux variant: GlobalProtect wrote DNS into systemd-resolved but /etc/resolv.conf pointed at the wrong file, and users blamed the VPN for months.)
Aditya at Flipkart gets complaints that GlobalProtect users (pool 10.200.50.0/24) cannot open an internal app at 10.10.8.40. The traffic log shows action allow, application incomplete, session end aged-out for every attempt. What does this fingerprint tell him, and what fixes it?
④ When the platform itself fails — HA, change nights and CVE mornings
The hardest scenario questions aren't about traffic at all. They're about the firewall as a patient: the HA pair that betrays you, the content update that changes everything while "nothing changed", and the morning a 9.3 CVE drops before your patch exists.
Fire 1 — "the primary recovered… but stayed passive"
Karthik at HCL faces this
Last week the primary firewall lost power and the secondary took over — perfect. The primary has been healthy again for three days, yet it is still passive, and the manager wants to know why the "main" box isn't active.
Nothing is broken. Preemption is disabled by default — a recovered firewall does not take the active role back unless preemption is enabled on both peers.
Read the HA state and election settings — lower Device Priority number = higher priority.
show high-availability state · Device → High Availability → General → Election Settings
Either accept the current active (textbook answer: fewer failovers = fewer risks), or enable Preemptive on both peers; the recovered box waits out the preemption hold (default 1 min) and takes over.
System log filter ( subtype eq ha ) shows the preempt event; show high-availability state reports local: active.
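Karthik's question reduces to two conditions from the scenario: preemption on both peers, and a lower priority number on the recovered box. The Python sketch below encodes only those two conditions; the function name is hypothetical, and hold timers, link monitoring and path monitoring are deliberately ignored.

```python
def takes_active_back(recovered_priority: int, active_priority: int,
                      recovered_preemptive: bool, active_preemptive: bool) -> bool:
    """Will the recovered firewall reclaim the active role?

    Lower Device Priority number means higher priority, and preemption
    must be enabled on both peers.
    """
    both_preemptive = recovered_preemptive and active_preemptive
    return both_preemptive and recovered_priority < active_priority


# Default behaviour: preemption off, so the recovered primary stays passive.
print(takes_active_back(50, 100, False, False))  # False
# Preemptive enabled on both peers: the higher-priority box takes over.
print(takes_active_back(50, 100, True, True))    # True
```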
Pause & Predict
Monitoring suddenly shows BOTH firewalls of an HA pair claiming the active role at the same time. Which single HA link failing produces exactly this symptom? Answer: the HA1 control link. Losing it is the classic cause of split brain.
Config-sync green only proves the configuration replicated. In a real PA-7050 outage, failover "worked" — and the internet died anyway, because the passive box's links had never actually negotiated LACP and no path monitoring tested its forwarding. Senior habit: do a live failover test monthly, and configure link + path monitoring so the pair fails over on real-world brokenness, not just dead boxes.
Fire 2 — "nothing changed, everything broke"
A real P1 from the community: at 2 AM a scheduled content update activated a new App-ID called citrix-director. From that moment, traffic that had always classified as ms-sql matched the new app instead — and every rule that allowed only ms-sql silently dropped the bank's database traffic. Zero config change. The config audit proved nothing was touched — because the thing that changed was the App-ID database, not the config. The vendor shipped a corrected content release the same day; the immediate fix is request content downgrade install previous.
Same family: a commit that fails with 'tiktok' is not a valid reference on a box nobody touched — the config references an App-ID that the installed content version doesn't contain. Content state is config state. Review new App-IDs before each install, and stage content on a small firewall first.
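Both content-update failures are set differences, which makes the pre-install review easy to script. This is a Python sketch with hypothetical inputs: it assumes the list of App-IDs referenced by the config and the App-ID lists of two content versions have already been exported by some other means, and the sample names below are only illustrative.

```python
def missing_references(config_apps, installed_apps):
    """App-IDs the config references that the installed content lacks (commit will fail)."""
    return sorted(set(config_apps) - set(installed_apps))


def new_app_ids(old_content, new_content):
    """App-IDs introduced by the new content version (review these before install)."""
    return sorted(set(new_content) - set(old_content))


# Hypothetical exports for illustration.
config_apps = ["ms-sql", "ssl", "tiktok"]
installed = ["ms-sql", "ssl", "web-browsing"]
candidate = ["ms-sql", "ssl", "web-browsing", "citrix-director"]

print("invalid references:", missing_references(config_apps, installed))
print("new in candidate  :", new_app_ids(installed, candidate))
```

A new App-ID in the second list is the cue to check which existing rules its traffic matches today.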
Fire 3 — decryption breaks one app, and only one
After enabling SSL decryption, most traffic is fine but one app dies with Received fatal alert CertificateUnknown from client. The trap: this single log line has at least three distinct causes — an incomplete certificate chain (only an intermediate in the cert store), certificate pinning (the app needs a Decryption Exclusion, matched on SNI or CN), and in 2024 a genuine PAN-OS bug with Chromium's oversized post-quantum ClientHello. TLS 1.3 raises the stakes: certificate info is encrypted in-handshake, so the firewall can no longer auto-add exclusions the way it did on TLS 1.2. Check Monitor → Logs → Decryption with (err_index eq Certificate) before touching any certificate.
Fire 4 — the CVE drops before your patch exists
May 2026, real timeline: CVE-2026-0300 — a buffer overflow in the User-ID Authentication (Captive) Portal. CVSS 9.3, unauthenticated root RCE, exploited in the wild, listed in CISA KEV — and patches rolled out branch-by-branch over two weeks. If your branch's fix isn't out, "wait for the patch" is not an answer. The defensible move is layered mitigation tonight: restrict the portal to trusted zones, disable response pages on untrusted L3 interfaces, enable Threat ID 510019, audit exposure of TCP 6081/6082 — then patch the moment your build ships. Same month, CVE-2026-0257 (GlobalProtect auth-override cookie bypass) was re-scored 4.7 → 7.8 after live exploitation: severity is not static, and "we deprioritized it last week" is how breaches start.
After any CVE mitigation, prove exposure is gone from outside: scan your public IPs for the vulnerable ports (6081/6082 for the Auth Portal), confirm the Threat Prevention signature is firing in Monitor → Logs → Threat, and document the compensating controls. The patch removes the vulnerability — it does not remove an implant that arrived before it. CVE-2024-3400 taught everyone that "patched" and "safe" are different words.
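For the outside-in check, a plain TCP connect test is often enough as a first pass. The sketch below is a minimal Python version using only the standard library; the function names are illustrative, the default ports are the two named in this lesson, and it should be run only against addresses you own, from a vantage point outside the firewall. A dedicated scanner is still the better tool for a formal report.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable
        return False


def exposed_ports(host: str, ports=(6081, 6082), timeout: float = 2.0) -> list:
    """List which of the given ports accept connections on host."""
    return [p for p in ports if port_open(host, p, timeout)]


# Usage (hypothetical address; replace with a public IP you own):
# print(exposed_ports("198.51.100.7"))
```

An empty list after mitigation is evidence for the change record; a timeout and a refusal both count as not exposed in this sketch.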
Vikram at Airtel runs an active/passive pair. HA widget: green. Config sync: green for six months. During a planned failover test the passive takes over — but its aggregate links to the core never come up, and a 9-minute outage follows. Why did the green status mislead the team?
The exam connection — what changed in 2025-26
PCNSE retired on July 31, 2025. The flagship firewall cert is now the NGFW-Engineer (Specialist level): ~75 questions, 90 minutes, pass mark 860/1000, domains weighted 40% networking + 40% device settings + 20% integration & automation. The stems read exactly like this lesson — "an engineer deploys X but Y fails; which configuration should be verified first?" — and India L2 panels drill the same six themes: policy-but-blocked, U-turn NAT, decryption side-effects, HA failover, packet flow, and Panorama push order (pre-rules → local → post-rules). Prep once, pass both.
🧠 In your own words
Why can a session show action allow and still be a routing problem? Answer in one line.
📖 Glossary
- App-ID: Palo Alto's engine that identifies which application a flow is from its payload, not from its port number.
- application-default: Service setting meaning "this app only on the ports Palo Alto defined for it" (ssl = 443, web-browsing = 80).
- Session end reason: The log column that says why a session ended: aged-out, tcp-fin, tcp-rst-from-server, policy-deny.
- Session table: The firewall's live table of all current connections, with state, byte counts and the identified app.
- U-turn NAT: Letting internal users reach an internal/DMZ server via its public IP, using a NAT rule that hairpins traffic back inside.
- Preemption: HA setting that lets a recovered higher-priority firewall take the active role back. Off by default; must be enabled on both peers.
- Split brain: Both HA peers believing they are active at once, the classic result of losing the HA1 control link.
- ESP-in-UDP: How GlobalProtect carries IPSec: ESP wrapped in UDP 4501, with no IKE negotiation at all.
- SSL decryption: SSL Forward Proxy. The firewall terminates the client's TLS, inspects the traffic, then re-encrypts it with a certificate forged from its own CA.
- Certificate pinning: An app that only accepts its server's exact baked-in certificate; it can never be decrypted, only excluded.
- SNI: Server Name Indication, the hostname in the TLS Client Hello; decryption exclusions match on SNI or certificate CN.
- Content update: The Apps & Threats package that updates App-ID and threat signatures; it can reclassify traffic with zero config change.
- CISA KEV: CISA's Known Exploited Vulnerabilities catalog; a CVE listed here is being used in real attacks right now.
📚 Sources
- Palo Alto Networks Docs — Test Policy Rule Traffic Matches; Session Settings and Timeouts; HA Timers; Device Priority and Preemption; Ports Used for GlobalProtect; TLSv1.3 Decryption Support. docs.paloaltonetworks.com
- Palo Alto Networks Knowledge Base — Global counters with delta; asymmetric routing & TCP SYN checks; confirm GP tunnel IPSec vs SSL; debug swm revert; content-version install error. knowledgebase.paloaltonetworks.com
- Palo Alto LIVEcommunity threads — "insufficient-data but still allowed"; "application incomplete when using NNTPS"; "Received fatal alert CertificateUnknown"; "traffic cannot return"; "HA failover issue on PA-3420, both nodes active". live.paloaltonetworks.com
- r/paloaltonetworks — MS-SQL reclassified by content update 8656-7766; PA-3440 11.1.6-h4 HA failure; "protips you wish someone told you" (pre-NAT IP / post-NAT zone). reddit.com/r/paloaltonetworks
- Palo Alto Networks Security Advisories — CVE-2026-0300 (Auth Portal RCE, exploited in the wild) and CVE-2026-0257 (GP auth-override cookie bypass, re-scored 4.7→7.8). security.paloaltonetworks.com
- Palo Alto Networks Education — NGFW-Engineer exam datasheet (Nov 2025): domains 40/40/20, ~75 questions, 90 minutes; PCNSE retirement July 31, 2025. paloaltonetworks.com/services/education
- Hirist Tech & Network Kings — Top Palo Alto interview questions asked in Indian L2 panels (U-turn NAT, policy-but-blocked, SSL decryption slowness). hirist.tech · nwkings.com
What's next?
You can now read the four log words, climb the 5-step ladder and defend a CVE-night decision. Next, pressure-test it: the 3 AM playbook — what actually breaks on Palo Alto firewalls in production, failure by failure.