The interview question that filters L1 from L2
Interview: "User says VPN drops every 20 min. Where do you start?"
L1 answer: "Check the tunnel". L2 answer: "(1) SmartLog: filter user IP + last 1 hour, look for the drop entries — what rule + layer matched? (2) On the gateway, cpview → VPN → encrypt/decrypt errors counter. (3) If counters show errors, vpn debug ikeon → reproduce → open ike.elg in IKEView. (4) If counters are clean and SmartLog shows clean accepts, fw ctl zdebug + drop | grep <user-ip> to see kernel-level drops. (5) cpinfo only if escalating to TAC." The order matters; that's the playbook.
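The branching in steps (3) and (4) of the L2 answer can be sketched as a small decision function. This is only an illustration: the function name `next_step` and its two boolean inputs are hypothetical, and the returned strings simply restate the commands from the answer above.

```python
def next_step(counters_show_errors: bool, smartlog_shows_drops: bool) -> str:
    """Pick the next tool once SmartLog and the cpview VPN counters are checked.

    Hypothetical helper: it only encodes the ordering described in the text.
    """
    if counters_show_errors:
        # Step 3: encrypt/decrypt errors on the counters -> capture an IKE debug
        return "vpn debug ikeon, reproduce, open ike.elg in IKEView"
    if smartlog_shows_drops:
        # Step 1 already named the rule and layer that matched
        return "review the matched rule and layer in SmartLog"
    # Step 4: clean counters and clean accepts -> look for kernel-level drops
    return "fw ctl zdebug + drop | grep <user-ip>"


print(next_step(counters_show_errors=False, smartlog_shows_drops=False))
```

The point of writing it down is the ordering: the cheaper checks decide whether the expensive kernel debug is needed at all.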
💡 The hospital triage analogy
An ER patient walks in. The triage nurse looks at vitals (SmartLog — surface-level event search). If something jumps out, the doctor orders blood work (cpview — live system counters). If still unclear, advanced imaging (fw ctl zdebug, fw monitor — kernel-level instrumentation). Only when stumped → call the consultant (cpinfo → TAC). Each level costs more time and CPU. Stop at the level that gave you the answer.
① SmartLog — the first 30 seconds
SmartLog (formerly SmartView Tracker) is the indexed log search. Free-text query bar, time picker, filter chips. Powered by an indexer that runs on the Log Server.
SmartLog search syntax that earns L2 stripes
src:10.20.5.50 AND action:Drop AND time:last_1_hour
src:10.20.5.50 AND blade:"Application Control" AND action:Drop
"layer name":Application AND rule:"Block-Social-Media"
host:GW-Mumbai AND severity:critical AND time:last_24_hours
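If you keep such filters in a runbook, a tiny helper can assemble them consistently. The sketch below assumes only the `field:value AND field:value` shape shown in the queries above, with double quotes around anything containing a space; `build_query` is a hypothetical name, not part of any Check Point tooling.

```python
def build_query(conditions: dict[str, str]) -> str:
    """Join field:value pairs with AND, quoting tokens that contain a space."""

    def quote(token: str) -> str:
        return f'"{token}"' if " " in token else token

    return " AND ".join(
        f"{quote(field)}:{quote(value)}" for field, value in conditions.items()
    )


# Rebuilds the second example query from the list above
print(build_query({
    "src": "10.20.5.50",
    "blade": "Application Control",
    "action": "Drop",
}))
```

Paste the resulting string into the SmartLog query bar; the helper does no validation of field names.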
The filter chips at the top of the SmartLog screen can also be clicked to add the same condition without typing.
② fw ctl zdebug — the live kernel-drop tracer
SmartLog only shows what the log policy decided to LOG. Some kernel drops never make it to the log (anti-spoofing, malformed packet, state-table miss). For those you need fw ctl zdebug, which reads the kernel debug buffer in real time and prints each drop with its reason.
# Watch all drops in real time
fw ctl zdebug + drop
# Filter for one source IP
fw ctl zdebug + drop | grep 10.20.5.50
# Watch a specific connection
fw ctl zdebug + drop | grep -E "10.20.5.50.*443"
;[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=6 10.20.5.50:51844 -> 157.240.7.35:443 dropped by fw_first_packet_state_checks Reason: Rule;
fw ctl zdebug burns CPU. ALWAYS stop it after triage: press Ctrl+C, then run fw ctl debug 0 to reset the kernel debug flags. On busy gateways, leaving it on can drop legit traffic.
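When the drop output has been saved to a file, the fields can be pulled out with a regular expression. The sketch below is written against the single sample line shown above; the exact layout can differ between versions, so treat the pattern as an assumption to adjust. `parse_drop` and `DROP_LINE` are hypothetical names.

```python
import re
from typing import Optional

# Pattern modelled on the sample drop line in the text (an assumption,
# not an official format description).
DROP_LINE = re.compile(
    r"proto=(?P<proto>\d+)\s+"
    r"(?P<src>[\d.]+):(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>[\d.]+):(?P<dport>\d+)\s+"
    r"dropped by\s+(?P<function>\S+)\s+"
    r"Reason:\s*(?P<reason>[^;]+);"
)


def parse_drop(line: str) -> Optional[dict]:
    """Return the drop fields as strings, or None if the line does not match."""
    match = DROP_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["reason"] = fields["reason"].strip()
    return fields


sample = (
    ";[cpu_3];[fw4_0];fw_log_drop_ex: Packet proto=6 "
    "10.20.5.50:51844 -> 157.240.7.35:443 "
    "dropped by fw_first_packet_state_checks Reason: Rule;"
)
info = parse_drop(sample)
print(info["src"], info["dst"], info["dport"], info["reason"])
```

Grouping parsed lines by `reason` quickly shows whether the drops come from the rulebase or from something SmartLog would never record.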
The 4 most useful zdebug modules
Most useful. Shows every kernel-level drop with reason: Rule / Anti-spoofing / State-mismatch / TCP-flag invalid / NAT-fail.
Shows connection-creation events. Useful when the logs say "Accept" but the app behaves as if the connection is broken: check whether the connection entry was actually created.
Shows NAT decisions in real time. When "static NAT works inside but not outside", this tells you whether the NAT rule even fired.
ClusterXL state changes. Shows when a member transitions Active→Standby (or vice versa) and why — interface monitor / pnote / process state.
▶ Watch a drop become root-cause in 60 seconds
Rahul reports "VPN works for some apps, not all". Real-time triage from ticket open to RCA.
src:rahul-vpn-IP AND time:last_30_min AND action:Drop. Finds 3 drop entries. Matched rule = "Cleanup-Drop", Layer = "Application". App layer dropping.
SmartLog shows zero drop entries for the affected source IP, but the user clearly can't reach the destination. Most likely cause?
③ cpview — live counters that tell you where the pain is
cpview is a real-time top-style dashboard. Press number keys to navigate categories: Overview, CPU, Memory, Network, NAT, VPN, Firewall, SecureXL, Software Blades.
cpview            # interactive, real-time
cpview -t         # text-only, scriptable
cpview -s         # static snapshot
cpview --history  # historical mode (last 24h, drill in time)
The 5 most useful CLI checks (in order)
# 1. What policy is installed + when?
fw stat -l
# 2. CPU + connection-table + blade health
cpview
# 3. Live kernel-level drops
fw ctl zdebug + drop | grep <source-IP>
# 4. Packet-level inspection with NAT visibility (4 stages)
fw monitor -e 'accept host(10.20.5.50);' -m iIoO -o /tmp/dbg.cap
# 5. Cluster member state
cphaprob stat
cphaprob list
④ SIEM forwarding — LEA / Syslog / CEF
Three options to ship logs to your SIEM (Splunk / QRadar / Sentinel / Elastic):
- LEA (Log Export API): Check Point proprietary, native, real-time. Splunk's CP Add-on uses LEA. Lossless. Requires the LEA SDK on the SIEM side.
- Syslog: old-school. Less rich. Forwarded via cp_log_export. Simple, but fields can drop.
- Log Exporter (CEF / Splunk / LEEF / generic): Check Point's R80+ canonical exporter. Format selectable per destination. Cloud SIEM (Azure Sentinel) uses CEF. Best of both.
cp_log_export add name SentinelExporter target-server 10.50.20.5 target-port 514 \
    protocol tcp format cef
cp_log_export restart name SentinelExporter
cp_log_export show
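If several exporters have to be defined, the add command can be generated from a small description rather than typed by hand. The sketch below uses only the keywords that appear in the command above (name, target-server, target-port, protocol, format); the function name `exporter_add_args` is hypothetical, and the script only builds the argument list, it does not run anything.

```python
import shlex


def exporter_add_args(name: str, server: str, port: int,
                      protocol: str = "tcp", fmt: str = "cef") -> list[str]:
    """Build the argument list for one Log Exporter definition."""
    return [
        "cp_log_export", "add",
        "name", name,
        "target-server", server,
        "target-port", str(port),
        "protocol", protocol,
        "format", fmt,
    ]


args = exporter_add_args("SentinelExporter", "10.50.20.5", 514)
# shlex.join renders the list as a shell-safe single line for review
print(shlex.join(args))
```

Keeping the command as a list makes it straightforward to hand to a change ticket or a deployment script once someone has reviewed the printed line.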
Priya needs to ship Check Point logs to Azure Sentinel. Which forwarder?
cpinfo — only when escalating to TAC
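cpinfo -z -o <file> generates a compressed full-system archive: config, kernel state, logs and version info. (cpinfo -y all, by contrast, only lists installed versions and hotfixes.) The archive is large (50+ MB), but it is the standard first request from TAC: "send cpinfo". Don't run it casually; it's heavy on the system. Run it only when opening a TAC ticket.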
The 5 troubleshooting habits that mark a senior engineer
SmartLog → cpview → zdebug → fw monitor → cpinfo. Each level costs more. Stop at the level that gave you the answer.
Stop zdebug (Ctrl+C) and run fw ctl debug 0 after every session. Production gateways with zdebug left on have caused P1 incidents.
Correlate the SmartLog timestamp with the audit log timestamp and the change-management ticket. "It broke" + "you installed policy at exactly that time" = root cause in 30 seconds.
Save the canonical 5 SmartLog queries (per-user-drops, per-IP-VPN-events, per-blade-detections, per-cluster-member-states, install-policy-failures). Click instead of typing during P1s.
cpview history mode shows the 24h trend that the live snapshot misses. "CPU went 30%→95% at 14:02, the same time as the install" = the answer.
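The timestamp-correlation habit can be sketched in a few lines. Everything here is illustrative: the samples and the install time are made-up values that mirror the "30%→95% at 14:02" example, and `first_spike` and `near` are hypothetical helpers, not output of any Check Point tool.

```python
from datetime import datetime, timedelta
from typing import Optional


def first_spike(samples: list[tuple[datetime, float]],
                threshold: float) -> Optional[datetime]:
    """Return the time of the first sample at or above the threshold."""
    for timestamp, cpu_percent in samples:
        if cpu_percent >= threshold:
            return timestamp
    return None


def near(a: datetime, b: datetime,
         window: timedelta = timedelta(minutes=5)) -> bool:
    """True when two events fall within the given window of each other."""
    return abs(a - b) <= window


# Hypothetical CPU history (time, percent) and policy-install time
samples = [
    (datetime(2024, 1, 1, 13, 50), 30.0),
    (datetime(2024, 1, 1, 13, 56), 31.5),
    (datetime(2024, 1, 1, 14, 2), 95.0),
    (datetime(2024, 1, 1, 14, 8), 94.0),
]
policy_install = datetime(2024, 1, 1, 14, 2)

spike = first_spike(samples, threshold=90.0)
print(spike, near(spike, policy_install))
```

A match inside the window is a lead, not proof; the audit log and the change ticket still have to confirm what was installed.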
📝 Check your understanding — 10 questions, 70% to pass
Q1–Q2 above already count. Below are Q3 to Q10.
Which CLI tool gives a live, key-driven dashboard of CPU, memory, connection table, per-blade load + 24h history mode?
SmartLog shows zero entries for user IP 10.20.5.50 in the last hour. User clearly can't reach Salesforce. Next step?
Aditya wants to forward Check Point logs to Azure Sentinel. Best forwarder + format?
Sneha sees a "static NAT works inside but external clients can't reach it" issue. SmartLog shows zero drops. What CLI sequence narrows the cause?
Cluster member transitions Active→Standby unexpectedly at 14:02. Two minutes later it goes Active again. What's the diagnostic sequence?
cpview shows 95% CPU. Drilling into [8] Software Blades reveals 70% of CPU on "HTTPS Inspection". What's the FIRST fix to try?
For a 5000-user enterprise SOC, which logging architecture is right?
Post-CVE-2024-24919, what logging hygiene matters most?
Next up — Check Point ClusterXL Deep-Dive
Now you can read the logs. Next: HA vs Load Sharing, CCP, MAC magic, sync internals.