📝 Final round: seven more

Mukesh, a network engineer at a Hyderabad payment-gateway customer, is told the payments client cannot tolerate even a 2-second TCP drop during firmware upgrades. What's the single most important toggle to validate before kicking off the firmware push?

Correct: a. Session-pickup is the single switch that determines whether existing TCP / UDP sessions survive an HA failover. Without it, the upgrade-failover at the midpoint will drop existing flows. Option b changes HA mode (orthogonal). Option c configures monitored interfaces (also useful, but not the answer). Option d is what you run if checksums mismatch — not a pre-upgrade gate.

Mukesh wants FortiGate-A to be primary whenever it is healthy — even after A reboots and B has been running longer. Which exact combination does he configure?

Correct: c. Without override, uptime is compared before priority, so the longer-running unit stays primary no matter how the priorities are set. With override enabled on BOTH units + asymmetric priority values, priority becomes decisive and A will pre-empt B whenever A is healthy. Option a misses override. Option b is not a real CLI. Option d doesn't exist.

Mukesh's CFO asks for "twice the firewall throughput by moving to Active-Active." Mukesh's traffic mix is 80% large Veeam backup flows and 20% small DNS. Which response is correct?

Correct: b. The throughput myth — interviewers love this. Each TCP flow is owned by exactly one cluster member. Large flows do not span units. A-A is a session-count load-share, not a bandwidth doubler. With Mukesh's 80% large-flow workload, A-A buys little to nothing — the right answer is a bigger model or workload segmentation.

After a 2 AM rack maintenance, Mukesh sees TCP RSTs hitting the payment app. He SSHes into the cluster's mgmt IP and lands on FortiGate-A. get system ha status on A says "Number of Members: 1". What is the most likely diagnosis, and what's the safe next move?

Correct: a. "Number of Members: 1" + payment-side RSTs = strong split-brain signal. Confirm via the peer's reserved HA mgmt interface — that's literally why it exists. Options b, c, and d destroy evidence and can make the outage worse (b causes another failover; c removes the HA stack mid-incident; d wipes the unit). Always confirm before mutating production state.

Mukesh notices the cluster shows "in-sync" but the new firewall policy he added yesterday on the primary doesn't appear on the secondary. diag sys ha checksum shows the firewall.policy section's checksum differs between members. What does this mean, and what's the fix?

Correct: d. Checksum mismatch is the definitive signal of config drift, even when "in-sync" appears in get system ha status. Force-sync from primary is the first move; if it doesn't converge, the cause is usually HBdev hello drops or a stuck HA sync process (hasync) on the secondary. Option a is a hammer that may not help. Option b is wrong: checksums are the ground truth. Option c is destructive and unnecessary at this stage.

A new engineer proposes: "To save a rack-U, we can put HBdev over the production LAN VLAN instead of cross-cabling — same broadcast domain, same packets." Evaluate this proposal.

Correct: c. HBdev availability is the single point your cluster's correctness rests on. Sharing it with user traffic invites two failures: (1) hello drops under load → false failover or split-brain, and (2) lateral exploit surface (CVE-2024-23113 lesson). Best practice is direct cross-cable, or — when impractical — a dedicated VLAN that no user traffic can reach. Anything else trades a rack-U for a P0 outage waiting to happen.

Mukesh has a 2-node A-P cluster with session-pickup enable and uninterruptible-upgrade enable. He wants to push FortiOS 7.6.2 with the LEAST risk to payment traffic. Pick the safest sequence.

Correct: b. The uninterruptible-upgrade flow IS the safe path: FortiOS orchestrates secondary-first, mid-upgrade failover, then primary-upgrade automatically. The two essential pre-flight checks are checksum (no drift) + ha status (healthy). Option a creates an unnecessary outage. Option c destroys the cluster and adds risk. Option d leaves you with a split-version cluster and unpredictable behaviour.

FortiGate High Availability — A-P vs A-A, Split-Brain Recovery, FGCP Deep-Dive

Why this matters — the pilot and the co-pilot

Imagine a Boeing 777 doing the Mumbai–Singapore route. There's a pilot flying and a co-pilot on the right seat. The co-pilot isn't a passenger — both hands are on the yoke, both pairs of eyes on the instruments. If the pilot has a heart attack mid-flight, the co-pilot doesn't need to be "called from the back". Control transfers in seconds. The passengers in 23F never notice.

That's exactly what a FortiGate FGCP cluster does for the network. Two physical firewalls, one virtual identity. One actively pushing traffic ("primary"), the other quietly verifying config + watching the heartbeat ("secondary"). When the primary fails, the secondary doesn't reboot, doesn't relearn; it takes the controls in under 2 seconds. With session-pickup on, users keep their TCP sessions. Voice calls don't drop. The payments client doesn't see 5xx.

That single image — pilot and co-pilot, both hands on the yoke — is what every interviewer is testing for when they ask "explain HA to me." Get it right and the next 10 minutes are yours.

Scenario · Mukesh — network engineer at a Hyderabad payment-gateway customer

Mukesh (L2 network engineer, 14 months in) is deploying a 2-node FGCP cluster for a payments client. Two FortiGate 200F units, racked side-by-side. Internal subnet 10.30.10.0/24 behind the cluster's virtual IP 10.30.0.1. WAN side on 203.0.113.20. HA1 and HA2 ports cross-cabled between the two boxes for the heartbeat. The deployment plan says "A-P, primary on the left, RPO=0".

Mukesh's senior throws him three questions before signoff: "What happens if the heartbeat cable falls out?" · "If I add a second WAN circuit, should you change to A-A?" · "How do you upgrade firmware without dropping the payments traffic?" Three good questions. Three classic interview rounds. We answer all three in this blog.

The three building blocks every interviewer wants you to name

Before we dive in, here are the three terms you will be asked to define out loud — separated, in plain English. Get these crisp and you've already passed the L1 round.

HBdev (heartbeat interface): the cable (or VLAN) between the two FortiGates carrying the FGCP hellos. Both hands on the yoke.
Virtual MAC: the shared layer-2 address the cluster presents on each of its traffic interfaces, so failover doesn't break the upstream switch's MAC table.
FGCP election: the algorithm that picks the primary when the cluster forms or when something breaks.

Figure 1 legend: primary FortiGate unit · secondary / standby unit · HA heartbeat link · shared virtual MAC / virtual IP

Figure 1. A textbook 2-node FGCP cluster. HA1 + HA2 cross-cabled between the two physical units carry FGCP hellos and config-sync. The internal interface advertises one shared virtual MAC + virtual IP that both members own — but only the primary actively responds.

FGCP election in 30 seconds

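When the cluster forms (or reforms after a failure), FGCP compares the units on these criteria in strict order, and the first criterion on which they differ decides the primary:

Monitored interface health: the unit with more healthy monitored interfaces wins, so a unit with a failed monitored interface (e.g. wan1 down) loses to a fully healthy peer.
Uptime (HA age): the longer-running unit wins. With override disabled (the default), this is checked before priority; differences within ha-uptime-diff-margin (300 seconds by default) are ignored.
HA priority: higher wins (default 128 on every unit; set it explicitly in your config). With set override enable, priority is checked before uptime.
Serial number: the higher serial number wins as a last resort.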

Pro tip — pin a specific unit as primary

set ha-uptime override is NOT a thing — the right trick is set override enable on the unit you want to win, then set its priority higher than its peer's. This is the only reliable way to pin a specific unit as primary across reboots, because with override enabled, priority now beats uptime in the election. Without override, the unit that booted first keeps winning forever — which is fine, until you reboot it and the "wrong" unit takes over.
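The election order is easier to remember once you have seen it as code. The Python below is a simplified sketch, not FortiOS logic: Unit, elect_primary and the helper names are hypothetical, it compares exactly two units, it assumes uptime differences inside the 300-second ha-uptime-diff-margin default are ignored, and it leaves out the final serial-number tie-break. It models the documented behaviour that uptime is compared before priority when override is disabled, and priority before uptime when override is enabled.

```python
from dataclasses import dataclass
from typing import Optional

UPTIME_MARGIN_S = 300  # assumption: mirrors the ha-uptime-diff-margin default


@dataclass
class Unit:
    name: str
    healthy_monitored_ports: int
    priority: int
    uptime_s: int


def _cmp(x: int, y: int) -> int:
    """Return 1 if x wins, -1 if y wins, 0 on a tie."""
    return (x > y) - (x < y)


def _by_ports(a: Unit, b: Unit) -> int:
    return _cmp(a.healthy_monitored_ports, b.healthy_monitored_ports)


def _by_priority(a: Unit, b: Unit) -> int:
    return _cmp(a.priority, b.priority)


def _by_uptime(a: Unit, b: Unit) -> int:
    # Small uptime differences are treated as a tie.
    if abs(a.uptime_s - b.uptime_s) < UPTIME_MARGIN_S:
        return 0
    return _cmp(a.uptime_s, b.uptime_s)


def elect_primary(a: Unit, b: Unit, override: bool) -> Optional[Unit]:
    """Pick the primary; None means 'fall through to the serial-number tie-break'."""
    checks = [_by_ports]
    checks += [_by_priority, _by_uptime] if override else [_by_uptime, _by_priority]
    for check in checks:
        result = check(a, b)
        if result != 0:
            return a if result > 0 else b
    return None


# FortiGate-A has just rebooted; FortiGate-B has been up for months.
fgt_a = Unit("FortiGate-A", healthy_monitored_ports=2, priority=200, uptime_s=120)
fgt_b = Unit("FortiGate-B", healthy_monitored_ports=2, priority=100, uptime_s=8_000_000)

print(elect_primary(fgt_a, fgt_b, override=False).name)  # FortiGate-B
print(elect_primary(fgt_a, fgt_b, override=True).name)   # FortiGate-A
```

The only thing the override flag changes in this model is the order of two comparisons, which is why a higher priority alone does not bring A back after a reboot.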

The cluster from CLI — what you actually type

Here's the minimal HA config Mukesh runs on both units (they MUST match group-name and password — that's how they find each other on the HA link):

CLI — bring up FGCP (run on both members)

config system ha
    set group-name "payments-cluster"
    set mode a-p
    # type the plaintext secret; FortiOS stores and displays it as ENC <hash>
    set password SuperLongHaSecret_2026
    set hbdev "ha1" 200 "ha2" 100
    set session-pickup enable
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway 10.30.99.1
        next
    end
    set override enable
    # on FortiGate-B use: set priority 100
    set priority 200
    set monitor "wan1" "internal"
end

Expected output (after sync completes) — get system ha status

HA Health Status: OK
Model: FortiGate-200F
Mode: HA A-P
Group Name: payments-cluster
Number of Members: 2
Cluster Uptime: 0 days 0:14:12
Primary  : FortiGate-A FGT200F-0001 priority=200
Secondary: FortiGate-B FGT200F-0002 priority=100
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
   FortiGate-A(updated 5s ago): in-sync
   FortiGate-B(updated 5s ago): in-sync

Two things to read in that output. First, HBDEV stats — both heartbeat links up, zero lost hellos. Second, Configuration Status — both members in-sync. If either line is off, you have a problem; we'll walk through both later.
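If you collect this output from a monitoring script, the same two checks can be automated. The sketch below is a hedged example that assumes the text looks like the abridged sample shown above; real get system ha status output has more fields and varies by FortiOS version, so treat the regular expressions and the ha_problems name as illustrative assumptions.

```python
import re
from typing import List

# Abridged sample in the same shape as the output shown in this blog.
SAMPLE = """\
Number of Members: 2
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
   FortiGate-A(updated 5s ago): in-sync
   FortiGate-B(updated 5s ago): in-sync
"""


def ha_problems(status_text: str) -> List[str]:
    """Return a list of findings; an empty list means both checks passed."""
    problems = []

    # Check 1: every heartbeat link is up and no hellos were lost.
    hb = re.search(r"^HBDEV stats:(.*)$", status_text, re.MULTILINE)
    if hb is None:
        problems.append("HBDEV stats line not found")
    else:
        for link, state in re.findall(r"(\S+) (up|down)\b", hb.group(1)):
            if state != "up":
                problems.append(f"heartbeat link {link} is {state}")
        lost = re.search(r"(\d+) lost hello", hb.group(1))
        if lost is None or int(lost.group(1)) > 0:
            problems.append("lost hello packets reported or count missing")

    # Check 2: every member reports in-sync.
    states = re.findall(
        r"^\s*(\S+)\(updated [^)]*\):\s*(\S+)\s*$", status_text, re.MULTILINE
    )
    if not states:
        problems.append("no Configuration Status lines found")
    for unit, state in states:
        if state != "in-sync":
            problems.append(f"{unit} is {state}")
    return problems


print(ha_problems(SAMPLE))  # []
```

An empty list is the green light; anything else is the exact line to paste into the incident ticket.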

FGCP: FortiGate Clustering Protocol, Fortinet's proprietary HA protocol. Hellos every 200 ms by default. 5 missed hellos = peer declared dead → election starts.
Heartbeat (HBdev): Dedicated link carrying FGCP hellos + config-sync + (with session-pickup on) session table sync. Always cable 2 HBdevs; a single HBdev is a single point of failure.
Session-pickup: Off by default. When ON, primary streams live session table to secondary over HBdev → failover keeps existing TCP/UDP sessions alive. Critical for VoIP / payments.
Config sync: Always on. Primary pushes config to secondary over HBdev. Verify with diag sys ha checksum: every section must match between members.
Virtual MAC: Shared L2 address per cluster interface (format 00:09:0f:09:XX:YY). After failover, the new primary sends a gratuitous ARP → upstream switch updates its CAM table in <1s.
Override priority: set override enable + higher priority = this unit wins the election whenever it is healthy. Without override, longest-uptime wins → "wrong" unit may stay primary forever after a reboot.

Pause & Predict #1

Before you scroll — Mukesh configures set priority 200 on FortiGate-A but forgets set override enable. He boots A first, then B. Months later, A reboots. When A comes back, who is primary?

FortiGate-B remains primary. With override disabled, FGCP compares uptime before priority, so A's higher priority never gets the chance to pre-empt B, which has been running continuously. To make A win, Mukesh needed set override enable; only then does priority beat uptime in the election. Fix: enable override (override and priority are set per unit and are not synchronized, so use execute ha manage to reach the peer's CLI if you need to change it there), then confirm the new roles with get system ha status.

Quick check · Q1 of 10 · Remember

Mukesh, a network engineer at a Hyderabad payment-gateway customer, is asked: "What is the role of HBdev in a FortiGate FGCP cluster?"

a) It is the management interface used to push policy from FortiManager
b) The WAN-side failover link to a second ISP
c) The interface used for FortiAnalyzer log shipping
d) The dedicated link between cluster members that carries FGCP hellos, config sync, and (with session-pickup enabled) the session table; losing it triggers election and risks split-brain

Correct: d. HBdev is the FGCP backplane — hellos at 200 ms intervals, config sync, and session-sync when session-pickup is on. Losing it = peer declared dead = election fires = potential split-brain. Options a, b, c describe completely different interfaces (mgmt, WAN failover, log shipper). Best practice: always cable two HBdevs (ha1 + ha2) so a single cable cut doesn't trigger failover.

Failover in slow-mo — wan1 down, payments stay up

The most common interview ask after "explain HA" is "walk me through a failover, step by step". Most candidates say "primary fails, secondary takes over" — too vague. Here's the actual sequence Mukesh saw the day his client's left rack lost power.

Figure 2. Failover in 6 steps. Default total cut-over time is ~1-2 seconds. With session-pickup off (the default), existing TCP sessions drop but reconnect cleanly; with it on, your payments traffic doesn't even notice.

Failover, stage by stage

7 stages: the primary dies on the left, the secondary takes over on the right.

Stage 1: FortiGate-A (primary) is processing payment traffic. get system ha status → both members healthy.
Stage 2: Power-strip on the left rack pops. FortiGate-A goes dark. wan1 + internal interfaces go down at the same instant.
Stage 3: FortiGate-B stops receiving FGCP hellos on ha1 and ha2. It waits: 1 missed, 2 missed, 3 missed, 4 missed.
Stage 4: 5th hello missed (t = ~1.0 s). FortiGate-B declares peer dead. FGCP election runs; B's monitored interfaces are all UP, so B wins by default.
Stage 5: FortiGate-B promotes itself to primary. Sends gratuitous ARP on wan1 and internal: "the virtual MAC now lives on my port".
Stage 6: Upstream and downstream switches update their CAM tables. Next frame for the virtual MAC now physically leaves via the right rack's port.
Stage 7: Payment TCP sessions resume (session-pickup ON → no reconnect). Total wire-time blackout: ~1.5 s. Mukesh's pager doesn't ring.
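The timing in that sequence is simple arithmetic, and it helps to be able to reproduce it on a whiteboard. The sketch below uses the numbers quoted in this blog (200 ms hellos, 5 missed hellos, roughly 1.5 s total); the function names and the 0.5 s convergence allowance are assumptions for illustration. On a real cluster the timers come from the hb-interval and hb-lost-threshold settings under config system ha, so check the values on your build rather than trusting the defaults used here.

```python
def detection_time_s(hello_interval_ms: int = 200, missed_hellos: int = 5) -> float:
    """Time until the surviving unit declares its peer dead."""
    return hello_interval_ms * missed_hellos / 1000


def blackout_s(convergence_s: float = 0.5, **timers: int) -> float:
    """Detection time plus an allowance for promotion, gratuitous ARP and CAM updates.

    convergence_s is an assumed figure, chosen to match the ~1.5 s total quoted above.
    """
    return detection_time_s(**timers) + convergence_s


print(detection_time_s())  # 1.0
print(blackout_s())        # 1.5
# Tighter timers detect faster, but tolerate less jitter on the heartbeat link.
print(detection_time_s(hello_interval_ms=100, missed_hellos=3))  # 0.3
```

The trade-off is worth saying out loud in an interview: shorter timers shrink the blackout but make a congested heartbeat link more likely to cause a false failover.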

Quick check · Q2 of 10

Mukesh's payments client says "we cannot lose a single TCP session during failover." Which one setting must Mukesh enable on the cluster?

a) set ha-mgmt-status enable
b) set session-pickup enable: primary streams the live session table to the secondary over HBdev, so existing TCP/UDP sessions survive the cut-over
c) set override enable
d) set mode a-a

Correct: b. Session-pickup is OFF by default — toggle it on so the primary streams its live session table to the secondary over HBdev. After failover, the new primary has the session entries and existing TCP/UDP flows survive. Option a enables a reserved management interface (useful but unrelated). Option c controls election behaviour, not session survival. Option d changes cluster mode and does not by itself preserve sessions.

A-P vs A-A — and the throughput myth that costs candidates the job

The single most common Fortinet HA interview trap: "Should I move to Active-Active to double my firewall throughput?" Most candidates say yes. Most candidates are wrong.

A-P (Active-Passive) is the default. One unit pushes traffic; the other watches. Simple, predictable, easy to debug, and the mode the large majority of production FortiGate clusters run.

A-A (Active-Active) load-balances new sessions across both units, but each session is still owned by exactly ONE unit for its lifetime. By default FGCP A-A only distributes sessions that need proxy-based security-profile inspection; plain TCP sessions are load-balanced only when set load-balance-all enable is configured. Asymmetric flows (where the SYN takes one path and the SYN-ACK another) still get hair-pinned over HBdev so both halves go through the owning unit's flow engine. Net effect: A-A helps when you have lots of short-lived sessions (DNS, HTTP/1.1 connections, many small API calls); it does not double throughput for a single big flow.

Figure 3 (diagram text):
Active-Passive (A-P): set mode a-p.
Active-Active (A-A): set mode a-a. New sessions hash across both units, each owned by ONE. ✓ Helps short-lived, session-heavy workloads. ✓ UTM scan load-shares. ✗ Asymmetric routing = HBdev hairpin.
⚠ Don't enable A-A expecting 2× throughput (common interview trap). A single TCP flow is owned by ONE unit for its lifetime. A 10 Gbps file transfer stays on one box; the other unit doesn't help. A-A load-balances NEW sessions across both, so 100,000 small DNS lookups spread across boxes, but it's session-count load-share, NOT bandwidth doubling. If you actually need more throughput, buy a bigger model or move to FGSP / VRRP-style cluster.

Figure 3. A-P vs A-A decision tree. The red callout is the interview trap. A-A is not a free 2× — it is a session-count load-balancer with strict ownership. Use it for short-session-heavy workloads, not for "I need more Gbps."

The "A-A doubles throughput" myth — debunked

The hard truth recruiters want to hear: each session in A-A is still owned by exactly ONE unit (the one that received the SYN). A 5 Gbps Veeam backup stream rides one box. A 2 Gbps SQL replication stream rides one box. The other unit doesn't carry that traffic. A-A helps when you have lots of small, independent sessions — DNS, REST API calls, HTTP/1.1, UTM-heavy AV scanning. It does not help a single fat flow. Say this line in the interview and you've separated yourself from 90% of candidates.
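A toy model makes the ownership argument concrete. In the sketch below every flow is pinned to exactly one unit and each unit has a hard forwarding ceiling; the 8,000 Mbps ceiling, the flow sizes and the delivered_mbps name are all hypothetical, and the least-loaded assignment is a stand-in for whatever distribution the cluster actually uses.

```python
from typing import List


def delivered_mbps(flows_mbps: List[int], units: int, unit_capacity_mbps: int) -> int:
    """Total traffic forwarded when each flow is owned by exactly one unit."""
    loads = [0] * units
    for demand in sorted(flows_mbps, reverse=True):
        owner = loads.index(min(loads))  # the least-loaded unit owns the whole flow
        loads[owner] += demand
    # A unit cannot forward more than its own ceiling, however idle its peer is.
    return sum(min(load, unit_capacity_mbps) for load in loads)


CAPACITY = 8_000               # hypothetical per-unit ceiling, in Mbps
fat_flow = [10_000]            # one backup stream that wants 10 Gbps
small_flows = [16] * 1_000     # a thousand small sessions, 16 Gbps in total

print(delivered_mbps(fat_flow, units=1, unit_capacity_mbps=CAPACITY))     # 8000
print(delivered_mbps(fat_flow, units=2, unit_capacity_mbps=CAPACITY))     # 8000
print(delivered_mbps(small_flows, units=1, unit_capacity_mbps=CAPACITY))  # 8000
print(delivered_mbps(small_flows, units=2, unit_capacity_mbps=CAPACITY))  # 16000
```

The second unit adds nothing for the single fat flow and doubles delivery for the many small ones, which is the whole myth in four lines of output.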

Pause & Predict #2

Mukesh's senior asks: "We have 50,000 IoT devices behind the cluster, each doing a 200-byte MQTT keepalive every 30 seconds. A-P or A-A?"

A-A is the better fit here. 50,000 devices × frequent short MQTT sessions = high session-count, low per-session bandwidth — exactly A-A's sweet spot. New sessions hash across both units, UTM inspection load-shares, and a single fat flow isn't the bottleneck. Caveat: confirm there's no asymmetric routing (both directions of a session must reach the same unit, else HBdev hair-pinning eats your gains). For chat-y / IoT / web-API traffic, A-A wins. For VPN concentrator / big-file transfer / database replication, stay on A-P.

Split-brain — when both units believe they're primary

Split-brain is the textbook HA disaster. Both FortiGates simultaneously claim to be primary. Both respond to the cluster's virtual IP. Both send gratuitous ARPs for the virtual MAC. The upstream switch's MAC table flaps between ports. ARP caches on downstream hosts keep flipping between the two units. TCP sessions break in 2-3 seconds. The classic trigger is losing ALL heartbeat connectivity between the units while both stay up.

That's why best practice is "always cable two HBdev ports". The probability of one heartbeat cable failing is non-zero; the probability of both failing at once is many orders of magnitude lower. Mukesh's cluster has both ha1 and ha2 cross-cabled — exactly so a single cable bump can't trigger this.
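The "orders of magnitude" claim is just multiplication of probabilities. The sketch below assumes each heartbeat link fails independently with the same probability in a given window; the 1-in-1,000 figure and the p_all_links_fail name are hypothetical.

```python
from fractions import Fraction


def p_all_links_fail(p_single: Fraction, links: int) -> Fraction:
    """Probability that every heartbeat link is down, assuming independent failures."""
    return p_single ** links


p = Fraction(1, 1000)  # hypothetical chance that one cable fails in the window
print(p_all_links_fail(p, links=1))  # 1/1000
print(p_all_links_fail(p, links=2))  # 1/1000000
```

The independence assumption is the weak point: if both cables share a patch panel, or one technician reseats both SFPs in the same visit, the failures are correlated and the real risk sits much closer to the single-link figure.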

Figure 4. Left — normal state. Right — split-brain: both units self-promote when HBdev dies. Recovery requires restoring the heartbeat first; only then can FGCP election (using override priority) pick the legitimate winner and force the loser to demote.

Split-brain, stage by stage

7 stages: both heartbeats die, both units self-promote, ARP chaos hits the LAN, you restore the cable, the cluster re-elects.

Stage 1: Normal. FortiGate-A primary, FortiGate-B secondary. ha1 and ha2 both up. Hellos at 200 ms intervals.
Stage 2: Datacenter tech accidentally unplugs both ha1 and ha2 while reseating an SFP. HBdev fully dark.
Stage 3: Both units miss 5 hellos at the same time. Both run FGCP election in isolation. Both pass their own monitored-interface check. Both promote themselves.
Stage 4: Both units send gratuitous ARP for the same virtual MAC. Upstream switch CAM table flaps between port-3 and port-7. ARP storm on the LAN.
Stage 5: Payment app starts seeing intermittent TCP RSTs. Mukesh's monitoring pages him. He runs get system ha status on the mgmt port and sees "Number of Members: 1" on BOTH boxes.
Stage 6: Mukesh reseats the SFPs. ha1 + ha2 come back up. FGCP re-elects. Override is enabled + A has priority 200 → A wins.
Stage 7: B demotes itself, stops owning vMAC, withdraws ARP. Cluster is healthy in ~3 s. Mukesh runs diag sys ha checksum to confirm config-sync, then writes the RCA.

Why FortiGate alone can't prevent split-brain

If you cut ALL HBdev paths simultaneously, FortiGate has no out-of-band way to ask "is my peer still alive?", so it assumes the peer is dead. FGCP does NOT use the WAN or LAN interfaces as a tie-breaker by default. The defenses are physical: two HBdev cables (so a single cable cut doesn't kill the link), HBdev over a dedicated, isolated VLAN if direct cabling is impractical, and a reserved HA management interface (ha-mgmt-status enable) so even during split-brain you can still SSH to each unit individually while someone reseats the cable.

Quick check · Q3 of 10

Mukesh suspects split-brain. He SSHes into FortiGate-A and runs get system ha status. The output says Number of Members: 1. What is the next single command that confirms — or rules out — split-brain?

a) execute reboot on A
b) diag sniffer packet any 'host 10.30.0.1' 4
c) SSH into FortiGate-B via its reserved HA mgmt IP and run get system ha status: if it also reports "Number of Members: 1" with itself as primary, you have split-brain confirmed
d) diag firewall iprope lookup

Correct: c. Split-brain by definition means BOTH units claim primary in isolation. You confirm by checking the peer separately — over the reserved HA management interface (which is exactly why set ha-mgmt-status enable is best practice). Option a destroys evidence and may make recovery worse. Option b checks data-plane but doesn't tell you cluster role. Option d is a policy-match tool, unrelated.
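The confirm-before-you-touch rule can be written down as a tiny decision function. This is a sketch of the reasoning only: HaView and diagnose are hypothetical names, and the inputs are what you read off get system ha status on each unit by hand, not something parsed from a real device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HaView:
    unit: str
    members_seen: int
    role: str  # "primary" or "secondary", as that unit reports itself


def diagnose(local: HaView, peer: Optional[HaView]) -> str:
    """Classify the cluster from both units' own point of view."""
    if peer is None:
        # One view is not enough: the peer may simply be powered off.
        return "inconclusive: peer unreachable, it may simply be down"
    both_alone = local.members_seen == 1 and peer.members_seen == 1
    if both_alone and local.role == peer.role == "primary":
        return "split-brain"
    both_paired = local.members_seen == 2 and peer.members_seen == 2
    if both_paired and {local.role, peer.role} == {"primary", "secondary"}:
        return "healthy"
    return "inconclusive: views disagree, keep investigating"


a = HaView("FortiGate-A", members_seen=1, role="primary")
b = HaView("FortiGate-B", members_seen=1, role="primary")
print(diagnose(a, b))     # split-brain
print(diagnose(a, None))  # inconclusive: peer unreachable, it may simply be down
```

Note that "Number of Members: 1" on A alone never returns split-brain here; the function insists on the peer's view, which is exactly why the reserved HA management interface matters.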

HA-aware upgrades — patching without downtime

Mukesh's senior's third question: "how do you upgrade firmware without dropping the payments traffic?" The answer is the FortiOS uninterruptible-upgrade setting, which is on by default. Here's exactly what happens.

When you push a firmware image to the cluster's primary, FortiOS detects HA mode and follows this script automatically:

Push the image to secondary first. Secondary (B) reboots into the new firmware. The cluster runs on A alone for ~30 s while B is rebooting.
Once B is back up on the new firmware and rejoins the cluster, primary triggers an HA failover: the now-upgraded B becomes primary, A becomes secondary.
The former primary (A, now secondary and still on old firmware) upgrades itself, reboots into the new firmware, and rejoins as secondary.
If set override enable was set on A with higher priority, another failover flips A back to primary. Otherwise B stays primary.

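Total user-visible blackout: 1-2 seconds per failover, the same as a normal failover. That happens once when B takes over, and a second time only if override flips A back to primary. The cluster also spends ~30 seconds on a single unit during each reboot.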
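The sequence above can be walked through as a small simulation to count how many failovers the payment traffic actually sees. This is a sketch of the steps as described in this blog, not of FortiOS internals; simulate_upgrade and the log strings are hypothetical.

```python
from typing import List, Tuple


def simulate_upgrade(override_on_a: bool) -> Tuple[str, List[str]]:
    """Walk the secondary-first upgrade; return the final primary and an event log."""
    version = {"A": "old", "B": "old"}
    primary = "A"
    log = []

    # Step 1: the secondary is upgraded first while A keeps forwarding.
    version["B"] = "new"
    log.append("B upgraded, rejoined as secondary")

    # Step 2: roles flip so the upgraded unit carries traffic.
    primary = "B"
    log.append("failover: B is primary")

    # Step 3: the former primary upgrades itself and rejoins.
    version["A"] = "new"
    log.append("A upgraded, rejoined as secondary")

    # Step 4: only with override (and higher priority on A) does A take over again.
    if override_on_a:
        primary = "A"
        log.append("failover: A is primary")

    assert version == {"A": "new", "B": "new"}
    return primary, log


for override in (True, False):
    final_primary, events = simulate_upgrade(override)
    failovers = sum(event.startswith("failover") for event in events)
    print(override, final_primary, failovers)
# True A 2
# False B 1
```

With override the traffic sees two short cut-overs and ends on A; without it there is one cut-over and B stays primary until something else forces an election.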

CLI — HA-aware upgrade (run on primary)

config system ha
    set uninterruptible-upgrade enable
end

# upload + apply firmware
execute restore image tftp FGT_200F-v7.6.2.M-build0123-FORTINET.out 10.30.99.50

Expected output (real-time during upgrade)

Image checksum verified. Installing image to FortiGate-B (secondary)...
FortiGate-B: image installed, rebooting...
FortiGate-B: rejoined cluster on FortiOS 7.6.2
HA failover: FortiGate-B is now primary
FortiGate-A: receiving image...
FortiGate-A: image installed, rebooting...
FortiGate-A: rejoined cluster on FortiOS 7.6.2
HA failover (override): FortiGate-A is now primary
Cluster upgrade complete.

Pro tip — checksum-mismatch is the upgrade trap

If a partial config change happened on the primary but didn't sync to secondary before the upgrade started, you can end up with a checksum mismatch AFTER the upgrade — both units on the new firmware, but with diverging configs. Always run diag sys ha checksum BEFORE you push the firmware. If any section's checksum differs between members, run execute ha synchronize start from the primary and re-verify before proceeding.

Figure 5. Cheat-sheet — 9 commands that handle ~80% of HA operations and troubleshooting. Print this page or screenshot it for your lab bench.

Verify the cluster is healthy

After any HA change (config, upgrade, cable reseat), run this 2-line check:

get system ha status — confirm both members listed, both in-sync, HBDEV stats showing zero lost hellos.

diag sys ha checksum — confirm per-section checksums match across members. ANY mismatch = sync issue = fix before signoff.
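The checksum comparison is a per-section equality test, which is easy to script once you have the values from each member. The sketch below takes the checksums as plain dictionaries; it does not parse real diag sys ha checksum output, whose layout varies by version. The drifted_sections name and the system.interface values are hypothetical, while the firewall.policy values are the ones used in the Pause & Predict example in this blog.

```python
from typing import Dict, List


def drifted_sections(primary: Dict[str, str], secondary: Dict[str, str]) -> List[str]:
    """Return the config sections whose checksums differ or exist on one member only."""
    sections = sorted(set(primary) | set(secondary))
    return [name for name in sections if primary.get(name) != secondary.get(name)]


fgt_a = {
    "system.interface": "0x11111111",  # hypothetical value
    "firewall.policy": "0x4a2b1f08",
}
fgt_b = {
    "system.interface": "0x11111111",  # hypothetical value
    "firewall.policy": "0x7c14e0bb",
}

drift = drifted_sections(fgt_a, fgt_b)
print(drift)  # ['firewall.policy']
if drift:
    print("do not sign off: run execute ha synchronize start on the primary, then re-check")
```

Using the union of both key sets means a section that exists on only one member is reported as drift too, which is the safer default for a sign-off gate.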

Quick lab — try this in your home lab tonight

If you have two FortiGate VMs running on the same hypervisor (or two physical units), this 6-step lab walks you through the full lifecycle Mukesh just saw. Takes 25 minutes end-to-end.

Bring up two FortiGate VMs on FortiOS 7.4 / 7.6. Static-IP the mgmt port on each (e.g. 10.30.99.10 + 10.30.99.11).
Add a virtual switch / shared network between them for HBdev — route both ha1 and ha2 over it.
Paste the HA config block from this blog onto BOTH units (change priority to 100 on the secondary). Verify get system ha status on both — should see "Number of Members: 2".
Confirm set session-pickup enable is in place. From a client behind the cluster, start a long-running curl with --continue-at - against a public file, then trigger the failover in the next step and confirm the transfer survives.
Pull the cable from primary's wan1 (or shut the interface). Watch the 1-2 s failover. Verify get system ha status on B shows it as primary.
Restore wan1. If override was set on A, A re-takes primary. Run diag sys ha checksum + diag sys ha dump-by all-vdom to confirm cluster healthy.

2024 incident every Fortinet candidate must know — CVE-2024-23113 + FGCP exposure

In early 2024, Fortinet PSIRT disclosed CVE-2024-23113, a format-string vulnerability in the FortiOS fgfmd daemon. fgfmd is the FortiGate-to-FortiManager management daemon (TCP port 541), so the exposure is on whichever interfaces accept that management traffic. Defensive lesson for this blog: control-plane links such as HBdev and the reserved HA management interface must never traverse a routed / shared L3 path that's reachable from production VLANs. Cross-cable HBdev directly, or use a dedicated isolated VLAN that no user traffic can reach. CISA flagged this as actively exploited in 2024. Source: Fortinet PSIRT FG-IR-24-029.

Pause & Predict #3

Mukesh runs diag sys ha checksum on FortiGate-A and sees the firewall.policy checksum is 0x4a2b1f08, but on B it's 0x7c14e0bb. What's the single best next command — and what is he likely to find?

execute ha synchronize start from the primary, then re-run diag sys ha checksum. Most likely root cause: someone (or a partial config push) committed a change directly on the secondary while config-sync was queued — or there was a sync-failure during a recent upgrade. Force-sync from primary will overwrite the secondary's diverging section. If checksums still don't match after force-sync, run get system ha status to confirm HBdev is up and hellos aren't being dropped — a flapping HBdev silently breaks sync without flipping the cluster state. Top L3 muscle memory.

Self-explanation prompt

In 2-3 sentences, explain to a hypothetical batchmate: "Why doesn't Active-Active double FortiGate throughput, even though both units are forwarding packets?" Writing it out cements the concept faster than re-reading.

📖 Mini-glossary — terms used in this blog

FGCP: FortiGate Clustering Protocol — Fortinet's proprietary HA protocol carried over HBdev.
HBdev: The heartbeat interface — dedicated link carrying FGCP hellos, config sync, and session sync.
A-P (Active-Passive): Default HA mode. One unit active, the other watching. Used by the large majority of production deployments.
A-A (Active-Active): New sessions hash across both units; each session still owned by ONE unit. Not a throughput doubler.
Virtual MAC: Shared L2 address per cluster interface, owned by whichever unit is primary.
Session-pickup: Off by default. When on, primary streams live session table → secondary so TCP/UDP survive failover.
Config sync: Always-on. Primary pushes config to secondary; verify with diag sys ha checksum.
Override priority: When set override enable, priority beats uptime in election → pins specific unit as primary.
Monitor-interface: Listed in set monitor. If any monitored interface drops, that unit loses the election.
Virtual cluster: Multiple HA clusters running on the same physical pair (per-VDOM). Each VDOM elects independently.

Where this gets asked: Most production FortiGate deployments run HA, which means HA gets tested at almost every Fortinet interview round. Indian enterprises, IT services firms, payment-gateway companies, MSPs and security firms typically open with "explain HA" at the L1 screen, then drill into split-brain recovery and override priority at the L2 round, then into HA-aware firmware upgrades + CVE-2024-23113 awareness at the L3 round.

What's next?

Blog 4 opens up FortiGate VPNs — IPsec Phase 1 / Phase 2, SSL VPN, and how to harden them against the CVE-2024-21762 + CVE-2024-55591 exploitation wave still active across 2025.

📚 Sources

Fortinet Docs. FortiGate HA Administration Guide (FortiOS 7.4 / 7.6): FGCP cluster, session-pickup, override, uninterruptible-upgrade. docs.fortinet.com
UniNets. Top Fortinet Firewall Interview Q&A 2025: High Availability section. uninets.com/blog/fortinet-firewall-interview-questions-answers
MindMajix. Top Fortinet Interview Q&A 2025: HA, A-P vs A-A, split-brain. mindmajix.com/fortinet-interview-questions
Glassdoor. Fortinet Network Security Engineer interview reports (Indian enterprises, IT services firms, payment-gateway companies and MSPs, 2024-2025): frequency-ranked HA questions. glassdoor.com
NWKings. Top 20 Fortinet Firewall Interview Questions and Answers (2025): HA failover + checksum drift. nwkings.com/fortinet-firewall-interview-questions-and-answers
Fortinet Community. Troubleshooting Tip: FGCP cluster checksum mismatch + force synchronize. community.fortinet.com
Fortinet PSIRT. FG-IR-24-029 / CVE-2024-23113 advisory and fgfmd hardening guidance. fortinet.com/psirt
