Check Point · Logging & Troubleshooting · CLI Playbook · Interactive · L2 / L3

Check Point Logging — SmartLog, cpview, and the 60-Second Drop-to-Root-Cause Playbook

A user opens a P1 ticket: "VPN intermittent, drops every 20 minutes". You have 60 seconds to either nail it or punt. The CLI toolkit — cpview, fw ctl zdebug + drop, fw monitor, cpinfo — turns 4-hour troubleshooting into 60-second triage. Pick a tool below, watch a drop become root cause, master the playbook in 12 minutes.

📅 2026-05-26 · ⏱ 12 min · 5 SVG infographics + 1 animated drop-to-RCA · 🏷 10-Q assessment + AI Tutor

Pick a tool — jump straight to it

  1. SmartLog: the GUI, and the first 30 seconds of every ticket.
  2. fw ctl zdebug: live drop tracing on the gateway. Real-time triage.
  3. cpview: live counters for CPU, conn-table, blade-by-blade load.
  4. SIEM forwarding: LEA / Syslog / CEF, getting CP logs into Splunk/Sentinel.

The interview question that filters L1 from L2

Interview: "User says VPN drops every 20 min. Where do you start?"
L1 answer: "Check the tunnel".
L2 answer:
  1. SmartLog: filter user IP + last 1 hour, look for the drop entries. What rule + layer matched?
  2. On the gateway, cpview → VPN → encrypt/decrypt errors counter.
  3. If counters show errors, vpn debug ikeon → reproduce → open ike.elg in IKEView.
  4. If counters are clean and SmartLog shows clean accepts, fw ctl zdebug + drop | grep <user-ip> to see kernel-level drops.
  5. cpinfo only if escalating to TAC.
The order matters; that's the playbook.

💡 The hospital triage analogy

An ER patient walks in. The triage nurse looks at vitals (SmartLog — surface-level event search). If something jumps out, the doctor orders blood work (cpview — live system counters). If still unclear, advanced imaging (fw ctl zdebug, fw monitor — kernel-level instrumentation). Only when stumped → call the consultant (cpinfo → TAC). Each level costs more time and CPU. Stop at the level that gave you the answer.

① SmartLog — the first 30 seconds

SmartLog (formerly SmartView Tracker) is the indexed log search. Free-text query bar, time picker, filter chips. Powered by an indexer that runs on the Log Server.

[Figure: Check Point logging architecture. Gateways (GW-Mumbai, GW-Bengaluru, GW-Delhi) buffer and send logs over SIC (TLS) to the Log Server ($FWDIR/log/fw.log, SmartLog SOLR index), which feeds SmartLog (GUI, indexed search), LEA / Syslog to a SIEM (Splunk/QRadar), and Log Exporter (CEF) to Azure Sentinel / cloud.]
Figure 1 — Logging architecture. Gateways buffer + send to Log Server over SIC. Log Server indexes (SmartLog) and exports (LEA/Syslog/CEF) to SIEMs.

SmartLog search syntax that earns L2 stripes

SmartLog query examples (top bar)
src:10.20.5.50 AND action:Drop AND time:last_1_hour
src:10.20.5.50 AND blade:"Application Control" AND action:Drop
"layer name":Application AND rule:"Block-Social-Media"
host:GW-Mumbai AND severity:critical AND time:last_24_hours

The filter chips at the top of the SmartLog screen can also be clicked to add the same conditions without typing.
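The queries above all follow one pattern: field:value pairs joined with AND, with double quotes around values that contain spaces. A minimal sketch of that pattern, assuming a hypothetical helper named smartlog_query (not a Check Point tool) that only builds the string for pasting into the query bar:

```shell
# Hypothetical helper: join field=value arguments into a SmartLog-style query.
# It only builds text; it does not talk to SmartLog.
smartlog_query() {
  sq_query=""
  for sq_pair in "$@"; do
    sq_field=${sq_pair%%=*}     # text before the first "="
    sq_value=${sq_pair#*=}      # text after the first "="
    case "$sq_value" in
      *" "*) sq_value="\"$sq_value\"" ;;   # quote values that contain spaces
    esac
    if [ -z "$sq_query" ]; then
      sq_query="$sq_field:$sq_value"
    else
      sq_query="$sq_query AND $sq_field:$sq_value"
    fi
  done
  printf '%s\n' "$sq_query"
}

# Usage:
#   smartlog_query src=10.20.5.50 "blade=Application Control" action=Drop
```

Keeping a handful of these as shell aliases is one way to have the common filters ready before a P1 starts.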

② fw ctl zdebug — the live kernel-drop tracer

SmartLog only shows what the log policy decided to LOG. Some kernel drops never make it to the log (anti-spoofing, malformed packet, state-table miss). For those you need fw ctl zdebug: it streams the kernel debug buffer in real time and prints each drop with its reason.

Live drop trace — gateway expert mode
# Watch all drops in real time
fw ctl zdebug + drop

# Filter for one source IP
fw ctl zdebug + drop | grep 10.20.5.50

# Watch a specific connection
fw ctl zdebug + drop | grep -E "10.20.5.50.*443"
Sample output
;[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=6 10.20.5.50:51844 -> 157.240.7.35:443
   dropped by fw_first_packet_state_checks Reason: Rule;
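When the drop stream is busy, it can help to reduce each line to the fields you care about. The sketch below assumes every drop arrives as a single line shaped like the sample above; real output differs between versions and drop types, so treat the pattern as a starting point. parse_drop is a hypothetical helper name.

```shell
# Hypothetical helper: extract protocol, source, destination, dropping function
# and reason from zdebug drop lines read on stdin.
# Assumption: one drop per line, in the same layout as the sample output.
parse_drop() {
  sed -n 's/.*Packet proto=\([0-9]*\) \([^ ]*\) -> \([^ ]*\) .*dropped by \([^ ]*\) Reason: \([^;]*\);.*/proto=\1 src=\2 dst=\3 by=\4 reason=\5/p'
}

# Usage (gateway expert mode):
#   fw ctl zdebug + drop | grep 10.20.5.50 | parse_drop
# Lines that do not match the assumed layout are silently skipped.
```

Piping the result through `sed 's/.*reason=//' | sort | uniq -c` gives a quick count of drops per reason.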
Common mistake — leaving zdebug on in production

fw ctl zdebug burns CPU. ALWAYS stop it with Ctrl+C and reset the kernel debug flags with fw ctl debug 0 after triage. On busy gateways, leaving it running can starve legitimate traffic of CPU and cause drops.
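One way to make "always stop it" hard to forget is to time-box the debug and run the reset step unconditionally. This is a sketch, not a Check Point utility: timeboxed_debug and DEBUG_RESET_CMD are hypothetical names, the default reset command assumes fw ctl debug 0 restores the default debug flags on your version, and it assumes timeout(1) is available on the gateway.

```shell
# Hypothetical wrapper: run a debug command for at most N seconds,
# then always run the reset command, whatever happened.
timeboxed_debug() {
  tb_seconds=$1
  shift
  tb_status=0
  # timeout(1) exits with 124 when it had to stop the command itself.
  timeout "$tb_seconds" "$@" || tb_status=$?
  # Reset step; override DEBUG_RESET_CMD if your procedure differs (assumption).
  ${DEBUG_RESET_CMD:-fw ctl debug 0}
  return "$tb_status"
}

# Usage (gateway expert mode):
#   timeboxed_debug 30 fw ctl zdebug + drop | grep 10.20.5.50
```

A 30-second window is usually enough to reproduce a drop while the user retries; extend it deliberately rather than leaving the debug open-ended.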

The 4 most useful zdebug modules

🚫 + drop
Most useful. Shows every kernel-level drop with reason: Rule / Anti-spoofing / State-mismatch / TCP-flag invalid / NAT-fail.

🔁 + conn
Shows connection-creation events. Useful when you see logs say "Accept" but app behaves like the connection is broken — check if conn entry was created.

🔐 + nat
Shows NAT decisions in real time. When "static NAT works inside but not outside", this tells you whether the NAT rule even fired.

🧠 + cluster
ClusterXL state changes. Shows when a member transitions Active→Standby (or vice versa) and why — interface monitor / pnote / process state.

▶ Watch a drop become root-cause in 60 seconds

Rahul reports "VPN works for some apps, not all". Real-time triage from ticket open to RCA.

① 14:30 — TICKET: "VPN gives access to my Salesforce but not to internal HR portal. Started this morning."
② 14:31 — SMARTLOG: Filter src:rahul-vpn-IP AND time:last_30_min AND action:Drop. Finds 3 drop entries. Matched rule = "Cleanup-Drop", Layer = "Application". App layer dropping.
③ 14:32 — HYPOTHESIS: HR portal traffic not in any explicit Application-layer Allow rule. Hits implicit cleanup. Either rule missing OR rule install didn't reach this gateway.
④ 14:33 — VERIFY: SmartConsole: confirm Application layer has rule "Allow VPN → HR-Portal". Install policy log — yes, installed at 02:00 last night. So rule exists.
⑤ 14:34 — ROOT CAUSE: Rule's Source = Access Role "VPN-Users". Rahul is in AD group "Contractors-VPN", which isn't in the role. Identity Awareness mis-mapping. Fix: add his AD group to the Access Role, publish, install. 5 min to fix.
Quick check · Q1 of 10

SmartLog shows zero drop entries for the affected source IP — but the user clearly can't reach the destination. Most likely cause?

Correct: c. SmartLog only shows logged drops — i.e., drops the policy was configured to log. Anti-spoofing + state-mismatch + TCP-flag drops often don't log by default. zdebug is the kernel-level x-ray.
[Figure: Escalation pyramid, start cheap and climb only if needed. From base (cheap, fast) to tip (expensive, last resort): SmartLog, cpview, fw ctl zdebug, fw monitor, cpinfo + TAC. Each layer 5× more diagnostic; stop at the level that answered the question.]
Figure 3 — Troubleshooting escalation pyramid. 80% of tickets close at SmartLog. 15% at cpview. 4% at zdebug + fw monitor. 1% at cpinfo + TAC.

③ cpview — live counters that tell you where the pain is

cpview is a real-time top-style dashboard. Press number keys to navigate categories: Overview, CPU, Memory, Network, NAT, VPN, Firewall, SecureXL, Software Blades.

cpview keys + history
cpview                # interactive, real-time
cpview -p            # print all counters once as text, scriptable
cpview -s            # static snapshot
cpview -t            # history mode (step back through recorded counters)
cpview category map: live counters, key-driven navigation.
  [1] Overview: CPU, Mem, throughput. First key you press.
  [2] CPU: per-core load, instances. CoreXL instance balance.
  [3] Network: conn table, ports, drops. Conn-table at capacity?
  [4] SecureXL: F2F/Slow/Accelerated %. Why is SXL bypassing?
  [5] NAT: NAT table, exhaustion. Port exhaustion?
  [6] VPN: encrypt/decrypt errors. Phase 1+2 errors.
  [7] Threat Prev: IPS/AB/AV hits. Blade-by-blade load.
  [8] Software Blades: per-blade CPU + mem. Which blade is the hog?
Press a number to jump, ESC to go back, history mode for time-travel.
Figure 2 — cpview category map. Each category is a number key. History mode gives 24h time-travel for "when did CPU spike" questions.

The 5 most useful CLI checks (in order)

L2 cheat-sheet (run from gateway expert mode)
# 1. What policy is installed + when?
fw stat -l

# 2. CPU + connection-table + blade health
cpview

# 3. Live kernel-level drops
fw ctl zdebug + drop | grep <source-IP>

# 4. Packet-level inspection with NAT visibility (4 stages)
fw monitor -e 'accept host(10.20.5.50);' -m iIoO -o /tmp/dbg.cap

# 5. Cluster member state
cphaprob stat
cphaprob list
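The "4 stages" in check 4 are the four capture points selected by -m iIoO. A small lookup sketch for reading the capture; fw_monitor_stage is a hypothetical helper name, not a Check Point command:

```shell
# Hypothetical helper: describe an fw monitor capture-point letter.
fw_monitor_stage() {
  case "$1" in
    i) echo "pre-inbound: before the inbound firewall chain" ;;
    I) echo "post-inbound: after the inbound firewall chain" ;;
    o) echo "pre-outbound: before the outbound firewall chain" ;;
    O) echo "post-outbound: after the outbound firewall chain" ;;
    *) echo "unknown capture point: $1" >&2; return 1 ;;
  esac
}

# Usage: print the legend before reading /tmp/dbg.cap
#   for p in i I o O; do fw_monitor_stage "$p"; done
```

Following one packet through the four points and noting where its addresses change is what gives the NAT visibility: a packet seen at i but never at I was stopped on the inbound side, and an address that differs between two points was translated in between.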

④ SIEM forwarding — LEA / Syslog / CEF

Three options to ship logs to your SIEM (Splunk / QRadar / Sentinel / Elastic):

SIEM forwarding option comparison

LEA (proprietary)
  ✓ Native CP, real-time
  ✓ Lossless
  ✓ Splunk Add-on uses this
  ✗ Requires LEA SDK
  Use for: Splunk Enterprise

Syslog
  ✓ Universal protocol
  ✓ Trivial setup
  ✗ Lossy under load
  ✗ Fewer fields
  Use for: Legacy / small fleets

Log Exporter
  ✓ R80+ canonical
  ✓ CEF / LEEF / Splunk fmt
  ✓ Cloud-SIEM ready
  ✓ TLS support
  Use for: Sentinel / QRadar / Elastic
Figure 4 — SIEM forwarding options. Log Exporter is the right default in 2026. LEA for Splunk with native add-on. Syslog only for legacy.
Configure Log Exporter — CEF to Azure Sentinel
cp_log_export add name SentinelExporter target-server 10.50.20.5 target-port 514 \
  protocol tcp format cef
cp_log_export restart name SentinelExporter
cp_log_export show
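Once the exporter is running, a quick check on the collector side is to confirm that incoming lines really are CEF. A CEF record starts with seven pipe-delimited header fields (version, vendor, product, device version, signature ID, name, severity) followed by key=value extensions. The sketch below splits that header; cef_header is a hypothetical helper, the sample line in the usage note is illustrative rather than captured output, and escaped pipes inside header fields are not handled.

```shell
# Hypothetical helper: print the seven CEF header fields of each CEF line on stdin.
# Any syslog prefix before "CEF:" is stripped. Assumes no escaped pipes (\|).
cef_header() {
  awk -F'|' '/CEF:/ {
    sub(/.*CEF:/, "", $1)
    printf "version=%s\nvendor=%s\nproduct=%s\ndevice_version=%s\nsignature=%s\nname=%s\nseverity=%s\n", $1, $2, $3, $4, $5, $6, $7
  }'
}

# Usage (illustrative sample line):
#   printf '%s\n' 'CEF:0|Check Point|VPN-1 & FireWall-1|Check Point|Log|Drop|Unknown|src=10.20.5.50 act=Drop' | cef_header
```

If nothing is printed, the target is probably receiving plain syslog or another format, which points back at the format setting on the exporter.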
Quick check · Q2 of 10

Priya needs to ship Check Point logs to Azure Sentinel. Which forwarder?

Correct: c. Log Exporter is the R80+ canonical way. CEF is Sentinel's preferred format. LEA is for Splunk's native add-on; Syslog is lossy.

cpinfo — only when escalating to TAC

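cpinfo collects a full-system archive: config + kernel state + logs + version info. It is large (50+ MB), and "send cpinfo" is TAC's standard first request. Note that cpinfo -y all only prints installed version and hotfix information; the full collection is a cpinfo run written to a file (for example cpinfo -z -o <file>). Don't run it casually; it's heavy on the system. Run it only when opening a TAC ticket.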

The 5 troubleshooting habits that mark a senior engineer

Habit 1 — Start at the cheapest tool, escalate

SmartLog → cpview → zdebug → fw monitor → cpinfo. Each level costs more. Stop at the level that gave you the answer.

Habit 2 — Always turn off zdebug

Ctrl+C, then fw ctl debug 0 after every session. Production gateways with zdebug left running have caused P1 incidents.

Habit 3 — Time-correlate everything

SmartLog timestamp + audit log timestamp + change-management ticket. "It broke" + "you installed policy at exactly that time" = root cause in 30 seconds.
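Time correlation is mostly arithmetic on timestamps. A minimal sketch, assuming GNU date (the -d option) and that both timestamps are in the same time zone; seconds_between is a hypothetical helper name:

```shell
# Hypothetical helper: seconds from the first timestamp to the second.
# A negative result means the second event came first.
seconds_between() {
  sb_start=$(date -u -d "$1" +%s) || return 1
  sb_end=$(date -u -d "$2" +%s) || return 1
  echo $((sb_end - sb_start))
}

# Usage: policy install time vs. first drop seen in SmartLog
#   seconds_between "2026-05-26 14:00:30" "2026-05-26 14:02:00"
```

A gap of a minute or two between a change and the first symptom is a strong lead; a gap of hours usually means the change is not the cause.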

Habit 4 — Save your SmartLog queries

Save the canonical 5 (per-user-drops, per-IP-VPN-events, per-blade-detections, per-cluster-member-states, install-policy-failures). Click instead of typing during P1s.

Habit 5 — Check cpview history mode before opening TAC

Shows the 24h trend that the snapshot misses. "CPU went 30%→95% at 14:02 — same time as the install" = the answer.

📝 Check your understanding — 10 questions, 70% to pass

Q1–Q2 above already count. Below are Q3 to Q10.

Q3 of 10 · Remember

Which CLI tool gives a live, key-driven dashboard of CPU, memory, connection table, per-blade load + 24h history mode?

Correct: b. cpview is the canonical live-counters tool. fw stat = policy status only. top = OS-level CPU. cpinfo = full tarball for TAC.
Q4 of 10 · Apply

SmartLog shows zero entries for user IP 10.20.5.50 in the last hour. User clearly can't reach Salesforce. Next step?

Correct: a. Zero SmartLog entries means the policy didn't log it — either it's matched but not logged, or kernel dropped it without reaching policy. zdebug surfaces both. (b) is over-escalating. (c/d) skip diagnosis.
Q5 of 10 · Apply

Aditya wants to forward Check Point logs to Azure Sentinel. Best forwarder + format?

Correct: b. Log Exporter + CEF is the R80+ canonical path to Sentinel. LEA is for Splunk's native add-on. Syslog is lossy.
Q6 of 10 · Analyze

Sneha sees a "static NAT works inside but external clients can't reach it" issue. SmartLog shows zero drops. What CLI sequence narrows the cause?

Correct: c. The classic Manual-NAT-without-Proxy-ARP scenario. The 3-step CLI sequence isolates it from layer 2 ARP up through the NAT engine.
Q7 of 10 · Analyze

Cluster member transitions Active→Standby unexpectedly at 14:02. Two minutes later it goes Active again. What's the diagnostic sequence?

Correct: a. The canonical 4-step. cphaprob is the cluster oracle. cpview history mode time-travels to the incident moment. zdebug+cluster catches a flapper in real time.
Q8 of 10 · Analyze

cpview shows 95% CPU. Drilling into [8] Software Blades reveals 70% of CPU on "HTTPS Inspection". What's the FIRST fix to try?

Correct: d. Bypass discipline + SXL on are the single biggest HTTPS-I performance levers. (a) loses protection. (b) capex without diagnosis. (c) doesn't help HTTPS-I.
Q9 of 10 · Evaluate

For a 5000-user enterprise SOC, which logging architecture is right?

Correct: b. Senior multi-layer architecture. Dedicated Log Server avoids mgmt-server overload. Log Exporter + SmartLog gives both SIEM and L1-quick-look. Compliance retention is non-negotiable in BFSI/healthcare.
Q10 of 10 · Evaluate

Post-CVE-2024-24919, what logging hygiene matters most?

Correct: a. Senior hygiene. Defense-in-depth: SIEM forwarding + admin-auth monitoring + crash alerts + compliance retention + KEV-aligned patching.

Next up — Check Point ClusterXL Deep-Dive

Now you can read the logs. Next: HA vs Load Sharing, CCP, MAC magic, sync internals.
