
FortiGate High Availability — A-P vs A-A, Split-Brain Recovery, and FGCP in 11 Minutes

Every production FortiGate is a pair, not a box. Recruiters at enterprises, IT services firms, payment-gateway companies, MSPs and security firms in India don't want "I know HA exists". They want the heartbeat interface, why A-A doesn't double your throughput, what split-brain looks like on the wire, and the override-priority trick that pins a unit as primary. This blog gets you there.

📅 2026-05-26 · ⏱ 11 min · 5 SVG infographics · 2 animated visualizers · 🏷 10-Q Bloom-tiered assessment + AI Tutor

Pick your path — jump to your weak spot

  1. FGCP basics: heartbeat interface, virtual MAC, election. The cluster from scratch.
  2. A-P vs A-A: why A-A does NOT double throughput, and the session-ownership trap.
  3. Split-brain recovery: both units claiming primary, duplicate IPs, ARP chaos, and the fix.
  4. HA-aware upgrades: the uninterruptible-upgrade flag, checksum drift, and when override fights you.

Why this matters — the pilot and the co-pilot

Imagine a Boeing 777 on the Mumbai–Singapore route. There's a pilot flying and a co-pilot in the right seat. The co-pilot isn't a passenger: both hands are on the yoke, both eyes on the instruments. If the pilot has a heart attack mid-flight, the co-pilot doesn't need to be "called from the back". Control transfers in seconds. The passengers in 23F never notice.

That's exactly what a FortiGate FGCP cluster does for the network. Two physical firewalls, one virtual identity. One actively pushes traffic (the "primary"); the other quietly verifies config and watches the heartbeat (the "secondary"). When the primary fails, the secondary doesn't reboot and doesn't relearn: it takes the controls in under 2 seconds. With session-pickup enabled, users keep their TCP sessions, voice calls don't drop, and the payments client doesn't see 5xx.

That single image — pilot and co-pilot, both hands on the yoke — is what every interviewer is testing for when they ask "explain HA to me." Get it right and the next 10 minutes are yours.

Scenario · Mukesh — network engineer at a Hyderabad payment-gateway customer

Mukesh (L2 network engineer, 14 months in) is deploying a 2-node FGCP cluster for a payments client. Two FortiGate 200F units, racked side-by-side. Internal subnet 10.30.10.0/24 behind the cluster's virtual IP 10.30.0.1. WAN side on 203.0.113.20. HA1 and HA2 ports cross-cabled between the two boxes for the heartbeat. The deployment plan says "A-P, primary on the left, RPO=0".

Mukesh's senior throws him three questions before signoff: "What happens if the heartbeat cable falls out?" · "If I add a second WAN circuit, should you change to A-A?" · "How do you upgrade firmware without dropping the payments traffic?" Three good questions. Three classic interview rounds. We answer all three in this blog.

The three building blocks every interviewer wants you to name

Before we dive in, here are the three terms you will be asked to define out loud — separated, in plain English. Get these crisp and you've already passed the L1 round.

[Figure 1 diagram: FGCP cluster, 2 units, 1 virtual identity. FortiGate-A (PRIMARY, priority 200, SN FGT200F-0001) and FortiGate-B (SECONDARY, priority 100, SN FGT200F-0002) sit side by side, each with wan1, internal, ha1, ha2 and a reserved mgmt interface. The upstream switch connects wan1 to the Internet (203.0.113.20). HA1 + HA2 heartbeat links are cross-cabled between the units (169.254.0.x link-local). The downstream switch on the internal side sees virtual MAC 00:09:0f:09:00:01 and virtual IP 10.30.0.1; both units share them, but only the primary answers.]
Figure 1. A textbook 2-node FGCP cluster. HA1 + HA2 cross-cabled between the two physical units carry FGCP hellos and config-sync. The internal interface advertises one shared virtual MAC + virtual IP that both members own — but only the primary actively responds.

FGCP election in 30 seconds

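When the cluster forms (or reforms after a failure), FGCP compares the units on the criteria below, in order. The first criterion on which they differ decides the winner, and set override changes the order of steps 2 and 3:

  1. Monitored interface health: the unit with more connected monitored interfaces wins, so a unit whose monitored wan1 is down loses to a healthy peer.
  2. Uptime: with override disabled (the default), the longer-running unit wins, as long as the uptime gap exceeds ha-uptime-diff-margin (300 seconds by default).
  3. HA priority: higher wins (default 128 on every unit; set it explicitly in your config). With set override enable, priority is compared before uptime.
  4. Serial number: the higher serial number wins as a last resort.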
Pro tip — pin a specific unit as primary

set ha-uptime override is NOT a thing — the right trick is set override enable on the unit you want to win, then set its priority higher than its peer's. This is the only reliable way to pin a specific unit as primary across reboots, because with override enabled, priority now beats uptime in the election. Without override, the unit that booted first keeps winning forever — which is fine, until you reboot it and the "wrong" unit takes over.
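
To make the ordering concrete, here is a minimal Python sketch of the election as a simplified model. It is not Fortinet code. The Unit class and elect_primary function are hypothetical names, and the model assumes what Fortinet's HA guide describes: uptime is compared before priority when override is off, priority comes first when override is on, the uptime gap must exceed a 300-second margin to count, and the higher serial number is the final tie-break.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    serial: str
    priority: int = 128      # FortiOS default HA priority
    uptime_s: int = 0        # seconds since the unit last (re)joined the cluster
    monitored_up: int = 0    # how many monitored interfaces are currently up


def elect_primary(a: Unit, b: Unit, override: bool, uptime_margin_s: int = 300) -> Unit:
    """Return the unit this simplified model would elect as primary."""
    # 1. The unit with more healthy monitored interfaces wins outright.
    if a.monitored_up != b.monitored_up:
        return a if a.monitored_up > b.monitored_up else b

    def by_priority():
        if a.priority != b.priority:
            return a if a.priority > b.priority else b
        return None

    def by_uptime():
        # Uptime only counts when the gap exceeds the margin (assumed 300 s).
        if abs(a.uptime_s - b.uptime_s) > uptime_margin_s:
            return a if a.uptime_s > b.uptime_s else b
        return None

    # 2-3. override decides which comparison runs first.
    order = (by_priority, by_uptime) if override else (by_uptime, by_priority)
    for check in order:
        winner = check()
        if winner is not None:
            return winner

    # 4. Last resort: serial number (assumed: higher wins).
    return a if a.serial > b.serial else b


# A has just rebooted; B has been up for 90 days.
a = Unit("FortiGate-A", "FGT200F-0001", priority=200, uptime_s=60, monitored_up=2)
b = Unit("FortiGate-B", "FGT200F-0002", priority=100, uptime_s=90 * 86400, monitored_up=2)
print(elect_primary(a, b, override=False).name)  # FortiGate-B
print(elect_primary(a, b, override=True).name)   # FortiGate-A
```

The two print lines are the whole point of the pro tip: same hardware, same priorities, and the only thing that changes the winner is the override flag.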

The cluster from CLI — what you actually type

Here's the minimal HA config Mukesh runs on both units (they MUST match group-name and password — that's how they find each other on the HA link):

CLI — bring up FGCP (run on both members)
config system ha
    set group-name "payments-cluster"
    set mode a-p
    set password SuperLongHaSecret_2026
    set hbdev "ha1" 200 "ha2" 100
    set session-pickup enable
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway 10.30.99.1
        next
    end
    set override enable
    # priority: use 100 on FortiGate-B
    set priority 200
    set monitor "wan1" "internal"
end
Expected output (after sync completes) — get system ha status
HA Health Status: OK
Model: FortiGate-200F
Mode: HA A-P
Group Name: payments-cluster
Number of Members: 2
Cluster Uptime: 0 days 0:14:12
Primary  : FortiGate-A FGT200F-0001 priority=200
Secondary: FortiGate-B FGT200F-0002 priority=100
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
   FortiGate-A(updated 5s ago): in-sync
   FortiGate-B(updated 5s ago): in-sync

Two things to read in that output. First, HBDEV stats — both heartbeat links up, zero lost hellos. Second, Configuration Status — both members in-sync. If either line is off, you have a problem; we'll walk through both later.
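
If you want to automate that read, a small Python sketch like the one below can flag the two lines that matter. It assumes the output is shaped like the sample in this blog (real FortiOS output varies by version and is more verbose), and ha_problems is a hypothetical helper name, not a Fortinet tool.

```python
import re

# Trimmed copy of the sample output shown above (format is an assumption).
SAMPLE = """\
HA Health Status: OK
Mode: HA A-P
Number of Members: 2
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
   FortiGate-A(updated 5s ago): in-sync
   FortiGate-B(updated 5s ago): in-sync
"""


def ha_problems(text: str, expected_members: int = 2) -> list[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []

    members = re.search(r"Number of Members:\s*(\d+)", text)
    seen = int(members.group(1)) if members else 0
    if seen != expected_members:
        problems.append(f"expected {expected_members} members, saw {seen}")

    hbdev = re.search(r"HBDEV stats:(.*)", text)
    if hbdev is None:
        problems.append("no HBDEV stats line found")
    else:
        if " down" in hbdev.group(1):
            problems.append("a heartbeat link is down")
        lost = re.search(r"(\d+) lost hello", hbdev.group(1))
        if lost and int(lost.group(1)) > 0:
            problems.append(f"{lost.group(1)} lost hello packets")

    # One line per member under "Configuration Status:".
    for name, state in re.findall(r"^\s*(\S+)\(updated [^)]*\):\s*(\S+)", text, re.M):
        if state != "in-sync":
            problems.append(f"{name} is {state}")
    return problems


print(ha_problems(SAMPLE))  # []
```

An empty list means both checks passed. Anything else is worth a look before signoff.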

💓 FGCP
FortiGate Clustering Protocol. Fortinet's proprietary HA protocol. Hellos every 200 ms by default. 5 missed hellos = peer declared dead → election starts.

🔌 Heartbeat (HBdev)
Dedicated link carrying FGCP hellos + config-sync + (with session-pickup on) session table sync. Always cable 2 HBdevs, because a single heartbeat link is a single point of failure.

📦 Session-pickup
Off by default. When ON, primary streams live session table to secondary over HBdev → failover keeps existing TCP/UDP sessions alive. Critical for VoIP / payments.

🔄 Config sync
Always on. Primary pushes config to secondary over HBdev. Verify with diag sys ha checksum: every section must match between members.

🏷 Virtual MAC
Shared L2 address on each cluster interface (format 00:09:0f:09:XX:YY). After failover, the new primary sends a gratuitous ARP → upstream switch updates its CAM table in <1s.

📌 Override priority
set override enable + higher priority = this unit always wins the election. Without override, longest-uptime wins → "wrong" unit may stay primary forever after a reboot.

Pause & Predict #1

Before you scroll — Mukesh configures set priority 200 on FortiGate-A but forgets set override enable. He boots A first, then B. Months later, A reboots. When A comes back, who is primary?

FortiGate-B remains primary. Without override, FGCP compares uptime before priority, and B has been running continuously while A's uptime reset on reboot. A's higher priority doesn't pre-empt B. To make A win, Mukesh needed set override enable; only then does priority beat uptime in the election. Fix: enable override on both units (keeping A's priority higher), then confirm the roles with get system ha status.
Quick check · Q1 of 10 · Remember

Mukesh is asked: "What is the role of HBdev in a FortiGate FGCP cluster?"

Correct: d. HBdev is the FGCP backplane — hellos at 200 ms intervals, config sync, and session-sync when session-pickup is on. Losing it = peer declared dead = election fires = potential split-brain. Options a, b, c describe completely different interfaces (mgmt, WAN failover, log shipper). Best practice: always cable two HBdevs (ha1 + ha2) so a single cable cut doesn't trigger failover.

Failover in slow-mo — wan1 down, payments stay up

The most common interview ask after "explain HA" is "walk me through a failover, step by step". Most candidates say "primary fails, secondary takes over" — too vague. Here's the actual sequence Mukesh saw the day his client's left rack lost power.

[Figure 2 diagram: failover sequence, primary down and secondary taking over in ~1-2 s. (1) t=0: primary owns all sessions, heartbeat OK. (2) t=0.0 s: wan1 link DOWN on primary (PSU dies / cable cut). (3) t=1.0 s: 5 missed hellos on HBdev (200 ms × 5 = 1 s). (4) t=1.1 s: secondary wins election after the monitored-interface check. (5) t=1.2 s: gratuitous ARP for the virtual MAC, upstream switch updates. (6) t=1.5-2.0 s: traffic flows on the new primary. With session-pickup ENABLED, existing TCP sessions survive and SIP calls don't drop. With session-pickup OFF (default), existing TCP sessions drop, clients reconnect, and new sessions ride the new primary normally.]
Figure 2. Failover in 6 steps. Default total cut-over time is ~1-2 seconds. With session-pickup off (the default), existing TCP sessions drop but reconnect cleanly; with it on, your payments traffic doesn't even notice.

▶ Watch failover play out — 7 stages

Stages 1 to 7: the primary dies on the left, the secondary takes over on the right.

Stage 1 FortiGate-A (primary) is processing payment traffic. get system ha status → both members healthy.
Stage 2 Power-strip on the left rack pops. FortiGate-A goes dark. wan1 + internal interfaces go down at the same instant.
Stage 3 FortiGate-B stops receiving FGCP hellos on ha1 and ha2. It waits — 1 missed, 2 missed, 3 missed, 4 missed.
Stage 4 5th hello missed (t = ~1.0 s). FortiGate-B declares peer dead. FGCP election runs — B's monitored interfaces are all UP, so B wins by default.
Stage 5 FortiGate-B promotes itself to primary. Sends gratuitous ARP on wan1 and internal — "the virtual MAC now lives on my port".
Stage 6 Upstream and downstream switches update their CAM tables. Next frame for the virtual MAC now physically leaves via the right rack's port.
Stage 7 Payment TCP sessions resume (session-pickup ON → no reconnect). Total wire-time blackout: ~1.5 s. Mukesh's pager doesn't ring.
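
The timing in those stages is simple arithmetic, sketched here in Python. The hello interval and missed-hello count are the figures quoted in this blog; the election, gratuitous-ARP and switch-convergence values are illustrative assumptions read off Figure 2, not measured numbers, and failover_budget_ms is a hypothetical helper name.

```python
def failover_budget_ms(hello_interval_ms=200, lost_threshold=5,
                       election_ms=100, garp_ms=100, switch_convergence_ms=300):
    """Rough blackout estimate: detection time plus the steps that follow it."""
    detect_ms = hello_interval_ms * lost_threshold  # 200 ms x 5 = 1000 ms
    total_ms = detect_ms + election_ms + garp_ms + switch_convergence_ms
    return {"detect_ms": detect_ms, "total_ms": total_ms}


print(failover_budget_ms())                               # {'detect_ms': 1000, 'total_ms': 1500}
print(failover_budget_ms(lost_threshold=10)["total_ms"])  # 2500
```

The second call shows why you don't raise the missed-hello threshold casually: detection time dominates the blackout, so doubling the threshold adds a full second.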
Quick check · Q2 of 10

Mukesh's payments client says "we cannot lose a single TCP session during failover." Which one setting must Mukesh enable on the cluster?

Correct: b. Session-pickup is OFF by default — toggle it on so the primary streams its live session table to the secondary over HBdev. After failover, the new primary has the session entries and existing TCP/UDP flows survive. Option a enables a reserved management interface (useful but unrelated). Option c controls election behaviour, not session survival. Option d changes cluster mode and does not by itself preserve sessions.

A-P vs A-A — and the throughput myth that costs candidates the job

The single most common Fortinet HA interview trap: "Should I move to Active-Active to double my firewall throughput?" Most candidates say yes. Most candidates are wrong.

A-P (Active-Passive) is the default. One unit pushes traffic; the other watches. Simple, predictable, easy to debug. ~99% of production FortiGate deployments use A-P.

A-A (Active-Active) load-balances new sessions across both units, but each session is still owned by exactly ONE unit for its lifetime. Asymmetric flows (where the SYN takes one path and the SYN-ACK another) still get hair-pinned over HBdev so both halves go through the owning unit's flow engine. Net effect: A-A helps when you have lots of short-lived sessions (DNS, HTTP/1.1 connections, small API calls); it does not double throughput for a single big flow.

[Figure 3 diagram: A-P or A-A? Pick on workload, NOT on "throughput". Root question: is the workload many short-lived sessions (DNS, API, HTTP/1.1)? NO (default) leads to Active-Passive (set mode a-p): one unit active, simpler to debug, predictable session ownership. YES leads to Active-Active (set mode a-a): new sessions hash across both units, each owned by ONE; it helps short-lived session-heavy workloads and load-shares UTM scanning, but asymmetric routing means an HBdev hairpin. Red callout: don't enable A-A expecting 2× throughput. A single TCP flow is owned by ONE unit for its lifetime, so a 10 Gbps file transfer stays on one box. If you actually need more throughput, buy a bigger model or move to FGSP / VRRP-style cluster.]
Figure 3. A-P vs A-A decision tree. The red callout is the interview trap. A-A is not a free 2× — it is a session-count load-balancer with strict ownership. Use it for short-session-heavy workloads, not for "I need more Gbps."
The "A-A doubles throughput" myth — debunked

The hard truth recruiters want to hear: each session in A-A is still owned by exactly ONE unit (the one that received the SYN). A 5 Gbps Veeam backup stream rides one box. A 2 Gbps SQL replication stream rides one box. The other unit doesn't carry that traffic. A-A helps when you have lots of small, independent sessions — DNS, REST API calls, HTTP/1.1, UTM-heavy AV scanning. It does not help a single fat flow. Say this line in the interview and you've separated yourself from 90% of candidates.
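
A toy Python model makes the ownership point visible. It assumes new sessions are handed out round-robin across two units and never move afterwards; that is a deliberate simplification of whatever schedule the cluster actually uses, and per_unit_load plus the sample traffic mixes are made up for illustration.

```python
def per_unit_load(flows_gbps, units=2):
    """Assign each new flow to one unit (round-robin) and sum bandwidth per unit."""
    load = [0.0] * units
    for i, gbps in enumerate(flows_gbps):
        load[i % units] += gbps  # the flow stays on this unit for its lifetime
    return load


# One 5 Gbps backup stream plus 1,000 tiny 1 Mbps sessions.
backup_heavy = [5.0] + [0.001] * 1000
# 10,000 tiny 1 Mbps sessions and nothing else.
many_small = [0.001] * 10_000

print([round(x, 2) for x in per_unit_load(backup_heavy)])  # [5.5, 0.5]
print([round(x, 2) for x in per_unit_load(many_small)])    # [5.0, 5.0]
```

Same total traffic mix in session-count terms, very different result: the small sessions split evenly, while the fat flow pins one unit at 5.5 Gbps and leaves the other nearly idle.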

Pause & Predict #2

Mukesh's senior asks: "We have 50,000 IoT devices behind the cluster, each doing a 200-byte MQTT keepalive every 30 seconds. A-P or A-A?"

A-A is the better fit here. 50,000 devices × frequent short MQTT sessions = high session-count, low per-session bandwidth — exactly A-A's sweet spot. New sessions hash across both units, UTM inspection load-shares, and a single fat flow isn't the bottleneck. Caveat: confirm there's no asymmetric routing (both directions of a session must reach the same unit, else HBdev hair-pinning eats your gains). For chat-y / IoT / web-API traffic, A-A wins. For VPN concentrator / big-file transfer / database replication, stay on A-P.
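
The "high session-count, low bandwidth" claim is easy to check with back-of-envelope arithmetic. This Python sketch counts payload bytes only and ignores TCP/TLS overhead, which is an assumption; keepalive_load is a hypothetical helper.

```python
def keepalive_load(devices: int, payload_bytes: int, interval_s: int):
    """Return (messages per second, payload megabits per second)."""
    messages_per_s = devices / interval_s
    payload_mbps = messages_per_s * payload_bytes * 8 / 1_000_000
    return messages_per_s, payload_mbps


rate, mbps = keepalive_load(devices=50_000, payload_bytes=200, interval_s=30)
print(round(rate), round(mbps, 2))  # 1667 2.67
```

Roughly 1,667 messages per second for under 3 Mbps of payload: the firewall's work here is per-session bookkeeping and inspection, not raw bandwidth.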

Split-brain — when both units believe they're primary

Split-brain is the textbook HA disaster. Both FortiGates simultaneously claim to be primary. Both respond to the cluster's virtual IP. Both send gratuitous ARPs for the virtual MAC. The upstream switch's MAC table flaps between ports, and ARP caches on downstream hosts flip between the two units. TCP sessions break within 2-3 seconds. The only way to trigger split-brain is to lose ALL heartbeat connectivity between the units while both are still running.

That's why best practice is "always cable two HBdev ports". The probability of one heartbeat cable failing is non-zero; the probability of both failing at once is many orders of magnitude lower. Mukesh's cluster has both ha1 and ha2 cross-cabled — exactly so a single cable bump can't trigger this.
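
The "orders of magnitude" claim follows from multiplying probabilities, as this small Python sketch shows. The 1% per-link failure chance is a made-up illustrative number, and the model assumes the two links fail independently, which stops being true the moment one person unplugs both cables.

```python
def p_all_links_down(p_single: float, links: int) -> float:
    """Probability that every heartbeat link is down, assuming independent failures."""
    return p_single ** links


p = 0.01  # hypothetical chance that one heartbeat link is down in some window
print(p_all_links_down(p, 1))
print(p_all_links_down(p, 2))
```

With these numbers, one link gives a 1-in-100 exposure and two links give roughly 1-in-10,000. Common-mode events (same patch panel, same technician, same SFP batch) are what break the independence assumption, so route the two cables differently where you can.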

[Figure 4 diagram: split-brain, what it looks like and how to recover. Left panel, normal state: FortiGate-A PRIMARY owns the vMAC, FortiGate-B SECONDARY watches, HBdev OK with hellos every 200 ms, and the virtual MAC is announced ONCE by A. Right panel, split-brain: HBdev DOWN, both units think the peer is dead, both self-promote to PRIMARY and claim the vMAC, so the LAN sees a duplicate vMAC and ARP cache flapping. Recovery in 4 steps: (1) restore the HBdev cable / SFP and verify both links are up; (2) FGCP re-elects and override priority decides the winner; (3) the loser demotes and stops owning the vMAC; (4) config and session sync resume. CLI verify after recovery: get system ha status · diag sys ha checksum · diag sys ha dump-by all-vdom.]
Figure 4. Left — normal state. Right — split-brain: both units self-promote when HBdev dies. Recovery requires restoring the heartbeat first; only then can FGCP election (using override priority) pick the legitimate winner and force the loser to demote.

▶ Watch split-brain happen — and watch it heal

7 stages — both heartbeats die, both units self-promote, ARP chaos hits the LAN, you restore the cable, the cluster re-elects.

Stage 1 Normal. FortiGate-A primary, FortiGate-B secondary. ha1 and ha2 both up. Hellos at 200 ms intervals.
Stage 2 Datacenter tech accidentally unplugs both ha1 and ha2 while reseating an SFP. HBdev fully dark.
Stage 3 Both units miss 5 hellos at the same time. Both run FGCP election in isolation. Both pass their own monitored-interface check. Both promote themselves.
Stage 4 Both units send gratuitous ARP for the same virtual MAC. Upstream switch CAM table flaps between port-3 and port-7. ARP storm on the LAN.
Stage 5 Payment app starts seeing intermittent TCP RSTs. Mukesh's monitoring pages him. He runs get system ha status on the mgmt port — sees "Number of Members: 1" on BOTH boxes.
Stage 6 Mukesh reseats the SFPs. ha1 + ha2 come back up. FGCP re-elects. Override is enabled + A has priority 200 → A wins.
Stage 7 B demotes itself, stops owning vMAC, withdraws ARP. Cluster is healthy in ~3 s. Mukesh runs diag sys ha checksum to confirm config-sync, then writes the RCA.
Why FortiGate alone can't prevent split-brain

If you cut ALL HBdev paths simultaneously, FortiGate has no out-of-band way to ask "is my peer still alive?" — it assumes dead. FGCP does NOT use the WAN or LAN interfaces as a tie-breaker by default. The defenses are physical: two HBdev cables (so a single cable cut doesn't kill the link), routed HBdev over a dedicated VLAN if direct cabling is impractical, and reserved HA management interface (ha-mgmt-status enable) so even during split-brain you can still SSH each unit individually and reseat the cable.
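
The reserved management interface is what lets you compare both units' views. Here is a minimal Python sketch of that comparison, assuming you have already collected each unit's role and member count into a small dict (the dict shape and the diagnose function are hypothetical; None stands for a unit you could not reach).

```python
from typing import Optional


def diagnose(a: Optional[dict], b: Optional[dict]) -> str:
    """Classify the cluster from each unit's own view of its role and member count."""
    if a is None or b is None:
        # One unit really is unreachable: a plain failure, not split-brain.
        return "peer-unreachable"
    roles = (a["role"], b["role"])
    if roles == ("primary", "primary") and a["members"] == 1 and b["members"] == 1:
        return "split-brain"
    if sorted(roles) == ["primary", "secondary"] and a["members"] == 2 and b["members"] == 2:
        return "healthy"
    return "inconclusive"


print(diagnose({"role": "primary", "members": 1}, {"role": "primary", "members": 1}))    # split-brain
print(diagnose({"role": "primary", "members": 2}, {"role": "secondary", "members": 2}))  # healthy
print(diagnose({"role": "primary", "members": 1}, None))                                 # peer-unreachable
```

The third case is the one candidates forget: "Number of Members: 1" on a single unit proves nothing until you have looked at the peer.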

Quick check · Q3 of 10

Mukesh suspects split-brain. He SSHes into FortiGate-A and runs get system ha status. The output says Number of Members: 1. What is the next single command that confirms — or rules out — split-brain?

Correct: c. Split-brain by definition means BOTH units claim primary in isolation. You confirm by checking the peer separately — over the reserved HA management interface (which is exactly why set ha-mgmt-status enable is best practice). Option a destroys evidence and may make recovery worse. Option b checks data-plane but doesn't tell you cluster role. Option d is a policy-match tool, unrelated.

HA-aware upgrades — patching without downtime

Mukesh's senior's third question: "How do you upgrade firmware without dropping the payments traffic?" The answer is the FortiOS uninterruptible-upgrade flag, which is on by default. Here's exactly what happens.

When you push a firmware image to the cluster's primary, FortiOS detects HA mode and follows this script automatically:

  1. Push the image to secondary first. Secondary reboots into the new firmware. Cluster runs split-version for ~30 s while B is rebooting.
  2. Once B is back up on the new firmware and rejoins the cluster, primary triggers an HA failover: the now-upgraded B becomes primary, A becomes secondary.
  3. The former primary (A, now secondary and still on the old firmware) upgrades itself, reboots into the new firmware, and rejoins as secondary.
  4. If set override enable was set on A with higher priority, another failover flips A back to primary. Otherwise B stays primary.

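Total user-visible blackout: 1-2 seconds per failover. That happens once if B stays primary, or twice if override flips A back. The cluster also runs on a single unit for ~30 seconds during each reboot. Each failover behaves the same as a normal failover.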
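
The four steps above can be traced with a few lines of Python. This is a sketch of the sequence as this blog describes it, not of FortiOS internals; upgrade_events is a hypothetical helper and the unit names A and B follow Mukesh's cluster.

```python
def upgrade_events(override_pins_a: bool) -> list[str]:
    """Walk the secondary-first upgrade and record each upgrade and failover."""
    primary, secondary = "A", "B"
    events = [f"upgrade {secondary}"]          # step 1: secondary first
    primary, secondary = secondary, primary    # step 2: fail over to the upgraded unit
    events.append(f"failover to {primary}")
    events.append(f"upgrade {secondary}")      # step 3: former primary upgrades
    if override_pins_a and primary != "A":     # step 4: override flips A back
        primary, secondary = "A", primary
        events.append(f"failover to {primary}")
    return events


print(upgrade_events(override_pins_a=True))
# ['upgrade B', 'failover to B', 'upgrade A', 'failover to A']
print(upgrade_events(override_pins_a=False))
# ['upgrade B', 'failover to B', 'upgrade A']
```

Count the "failover" entries to get the number of 1-2 second blips: two with override pinning A (Mukesh's config), one without.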

CLI — HA-aware upgrade (run on primary)
config system ha
    set uninterruptible-upgrade enable
end

# upload + apply firmware
execute restore image tftp FGT_200F-v7.6.2.M-build0123-FORTINET.out 10.30.99.50
Expected output (real-time during upgrade)
Image checksum verified. Installing image to FortiGate-B (secondary)...
FortiGate-B: image installed, rebooting...
FortiGate-B: rejoined cluster on FortiOS 7.6.2
HA failover: FortiGate-B is now primary
FortiGate-A: receiving image...
FortiGate-A: image installed, rebooting...
FortiGate-A: rejoined cluster on FortiOS 7.6.2
HA failover (override): FortiGate-A is now primary
Cluster upgrade complete.
Pro tip — checksum-mismatch is the upgrade trap

If a partial config change happened on the primary but didn't sync to secondary before the upgrade started, you can end up with a checksum mismatch AFTER the upgrade — both units on the new firmware, but with diverging configs. Always run diag sys ha checksum BEFORE you push the firmware. If any section's checksum differs between members, run execute ha synchronize start from the primary and re-verify before proceeding.
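
As a pre-flight gate, the comparison itself is a one-liner once you have each member's checksums. This Python sketch assumes you have parsed them into section-to-checksum dicts; that dict shape, the drifted_sections helper and the system.interface values are hypothetical, while the firewall.policy values are the ones used in Pause & Predict #3.

```python
def drifted_sections(primary: dict[str, str], secondary: dict[str, str]) -> list[str]:
    """Return the config sections whose checksums differ (or exist on one side only)."""
    sections = primary.keys() | secondary.keys()
    return sorted(s for s in sections if primary.get(s) != secondary.get(s))


on_a = {"firewall.policy": "0x4a2b1f08", "system.interface": "0x1d3c9e77"}
on_b = {"firewall.policy": "0x7c14e0bb", "system.interface": "0x1d3c9e77"}
print(drifted_sections(on_a, on_b))  # ['firewall.policy']
```

An empty list is your green light to push firmware. Anything else means force-sync and re-check first.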

FortiGate HA: 9-command cheat sheet

  1. get system ha status (HA status): cluster role, members, uptime, sync state. Your first command.
  2. diag sys ha checksum show (checksum): per-section config hash; detects drift between members.
  3. diag sys ha dump-by all-vdom (dump-by VDOM): full per-VDOM HA state; deep inspection for multi-VDOM.
  4. execute ha manage 1 (manage peer): jump the CLI to the secondary unit; no separate SSH needed.
  5. execute ha synchronize start (force sync): force config sync from primary when checksums diverge.
  6. diag sys ha reset-uptime (reset uptime): reset the uptime counter to force re-election (no override needed).
  7. config system ha (configure HA): top-level HA config block for priority / mode / hbdev / monitor.
  8. set session-pickup enable (session-pickup): stream sessions to the secondary so TCP survives failover.
  9. set override enable + set priority 200 (override): pin a specific unit as primary; priority beats uptime in the election.

Tested on FortiOS 7.4 / 7.6. Output format identical on FortiGate VM, FortiGate 60F, 100F, 200F, 600F.
Figure 5. Cheat-sheet — 9 commands that handle ~80% of HA operations and troubleshooting. Print this page or screenshot it for your lab bench.
Verify the cluster is healthy

After any HA change (config, upgrade, cable reseat), run this 2-line check:

get system ha status — confirm both members listed, both in-sync, HBDEV stats showing zero lost hellos.

diag sys ha checksum — confirm per-section checksums match across members. ANY mismatch = sync issue = fix before signoff.

Quick lab — try this in your home lab tonight

If you have two FortiGate VMs running on the same hypervisor (or two physical units), this 6-step lab walks you through the full lifecycle Mukesh just saw. Takes 25 minutes end-to-end.

  1. Bring up two FortiGate VMs on FortiOS 7.4 / 7.6. Static-IP the mgmt port on each (e.g. 10.30.99.10 + 10.30.99.11).
  2. Add a virtual switch / shared network between them for HBdev — route both ha1 and ha2 over it.
  3. Paste the HA config block from this blog onto BOTH units (change priority to 100 on the secondary). Verify get system ha status on both — should see "Number of Members: 2".
  4. Run set session-pickup enable. From a client behind the cluster, start a long-running curl with --continue-at - against a public file and leave it running.
  5. Pull the cable from primary's wan1 (or shut the interface). Watch the 1-2 s failover. Confirm the download survives, and verify get system ha status on B shows it as primary.
  6. Restore wan1. If override was set on A, A re-takes primary. Run diag sys ha checksum + diag sys ha dump-by all-vdom to confirm cluster healthy.
2024 incident every Fortinet candidate must know: CVE-2024-23113 and management-plane exposure

In early 2024, Fortinet PSIRT disclosed CVE-2024-23113, a format-string vulnerability in the FortiOS fgfmd daemon. fgfmd handles FortiGate-to-FortiManager management traffic rather than FGCP itself, but the defensive lesson carries over to HA: control-plane links must never be reachable from production VLANs. That applies to HBdev and to the reserved HA management interfaces. Cross-cable HBdev directly, or use a dedicated isolated VLAN that no user traffic can reach. CISA flagged this CVE as actively exploited in 2024. Source: Fortinet PSIRT FG-IR-24-029.

Pause & Predict #3

Mukesh runs diag sys ha checksum on FortiGate-A and sees the firewall.policy checksum is 0x4a2b1f08, but on B it's 0x7c14e0bb. What's the single best next command — and what is he likely to find?

execute ha synchronize start from the primary, then re-run diag sys ha checksum. Most likely root cause: someone (or a partial config push) committed a change directly on the secondary while config-sync was queued — or there was a sync-failure during a recent upgrade. Force-sync from primary will overwrite the secondary's diverging section. If checksums still don't match after force-sync, run get system ha status to confirm HBdev is up and hellos aren't being dropped — a flapping HBdev silently breaks sync without flipping the cluster state. Top L3 muscle memory.


📝 Final round — seven more

You've already answered 3 inline; seven more follow. You need 70% (7 of 10) overall to mark this lesson complete.

Q4 · Apply

Mukesh is told the payments client cannot tolerate even a 2-second TCP drop during firmware upgrades. What's the single most important toggle to validate before kicking off the firmware push?

Correct: a. Session-pickup is the single switch that determines whether existing TCP / UDP sessions survive an HA failover. Without it, the upgrade-failover at the midpoint will drop existing flows. Option b changes HA mode (orthogonal). Option c configures monitored interfaces (also useful, but not the answer). Option d is what you run if checksums mismatch — not a pre-upgrade gate.
Q5 · Apply

Mukesh wants FortiGate-A to be primary whenever it is healthy — even after A reboots and B has been running longer. Which exact combination does he configure?

Correct: c. Without override, uptime is compared before priority, so the longer-running unit stays primary no matter how the priorities are set. With override enabled on BOTH units + asymmetric priority values, priority becomes decisive and A will pre-empt B when A is healthy. Option a misses override. Option b is not a real CLI. Option d doesn't exist.
Q6 · Apply

Mukesh's CFO asks for "twice the firewall throughput by moving to Active-Active." Mukesh's traffic mix is 80% large Veeam backup flows and 20% small DNS. Which response is correct?

Correct: b. The throughput myth — interviewers love this. Each TCP flow is owned by exactly one cluster member. Large flows do not span units. A-A is a session-count load-share, not a bandwidth doubler. With Mukesh's 80% large-flow workload, A-A buys little to nothing — the right answer is a bigger model or workload segmentation.
Q7 · Analyze

After a 2 AM rack maintenance, Mukesh sees TCP RSTs hitting the payment app. He SSHes the cluster's mgmt IP and lands on FortiGate-A. get system ha status on A says "Number of Members: 1". What is the most likely diagnosis, and what's the safe next move?

Correct: a. "Number of Members: 1" + payment-side RSTs = strong split-brain signal. Confirm via the peer's reserved HA mgmt interface — that's literally why it exists. Options b, c, and d destroy evidence and can make the outage worse (b causes another failover; c removes the HA stack mid-incident; d wipes the unit). Always confirm before mutating production state.
Q8 · Analyze

Mukesh notices the cluster shows "in-sync" but the new firewall policy he added yesterday on the primary doesn't appear on the secondary. diag sys ha checksum shows the firewall.policy section's checksum differs between members. What does this mean, and what's the fix?

Correct: d. Checksum mismatch is the definitive signal of config drift, even when "in-sync" appears in get system ha status. Force-sync from primary is the first move; if it doesn't converge, the cause is usually HBdev hello drops or a stuck HA sync process (hasync) on the secondary. Option a is a hammer that may not help. Option b is wrong: checksums are the ground truth. Option c is destructive and unnecessary at this stage.
Q9 · Evaluate

A new engineer proposes: "To save a rack-U, we can put HBdev over the production LAN VLAN instead of cross-cabling — same broadcast domain, same packets." Evaluate this proposal.

Correct: c. HBdev availability is the single point your cluster's correctness rests on. Sharing it with user traffic invites two failures: (1) hello drops under load → false failover or split-brain, and (2) lateral exploit surface (CVE-2024-23113 lesson). Best practice is direct cross-cable, or — when impractical — a dedicated VLAN that no user traffic can reach. Anything else trades a rack-U for a P0 outage waiting to happen.
Q10 · Evaluate

Mukesh has a 2-node A-P cluster with session-pickup enable and uninterruptible-upgrade enable. He wants to push FortiOS 7.6.2 with the LEAST risk to payment traffic. Pick the safest sequence.

Correct: b. The uninterruptible-upgrade flow IS the safe path — FortiOS orchestrates secondary-first, mid-upgrade failover, then primary-upgrade automatically. The two essential pre-flight checks are checksum (no drift) + ha status (healthy). Option a creates unnecessary outage. Option c destroys the cluster and adds risk. Option d leaves you with split-version cluster and unpredictable behaviour.

Self-explanation prompt

In 2-3 sentences, explain to a hypothetical batchmate: "Why doesn't Active-Active double FortiGate throughput, even though both units are forwarding packets?" Writing it out cements the concept faster than re-reading.


📖 Mini-glossary — terms used in this blog

FGCP
FortiGate Clustering Protocol — Fortinet's proprietary HA protocol carried over HBdev.
HBdev
The heartbeat interface — dedicated link carrying FGCP hellos, config sync, and session sync.
A-P (Active-Passive)
Default HA mode. One unit active, the other watching. ~99% of production deployments.
A-A (Active-Active)
New sessions hash across both units; each session still owned by ONE unit. Not a throughput doubler.
Virtual MAC
Shared L2 address on each cluster interface, owned by whichever unit is primary.
Session-pickup
Off by default. When on, primary streams live session table → secondary so TCP/UDP survive failover.
Config sync
Always-on. Primary pushes config to secondary; verify with diag sys ha checksum.
Override priority
With set override enable, priority beats uptime in the election, which pins a specific unit as primary.
Monitor-interface
Listed in set monitor. If any monitored interface drops, that unit loses the election.
Virtual cluster
Multiple HA clusters running on the same physical pair (per-VDOM). Each VDOM elects independently.
Where this gets asked: Every production FortiGate deployment is HA, which means HA gets tested at every Fortinet interview round. Enterprises, IT services firms, payment-gateway companies, MSPs and security firms in India all open with "explain HA" at the L1 screen, then drill into split-brain recovery and override priority at the L2 round, then into HA-aware firmware upgrades + CVE-2024-23113 awareness at the L3 round.

What's next?

Blog 4 opens up FortiGate VPNs — IPsec Phase 1 / Phase 2, SSL VPN, and how to harden them against the CVE-2024-21762 + CVE-2024-55591 exploitation wave still active across 2025.

📚 Sources

  1. Fortinet Docs — FortiGate HA Administration Guide (FortiOS 7.4 / 7.6) — FGCP cluster, session-pickup, override, uninterruptible-upgrade. docs.fortinet.com
  2. UniNets — Top Fortinet Firewall Interview Q&A 2025 — High Availability section. uninets.com/blog/fortinet-firewall-interview-questions-answers
  3. MindMajix — Top Fortinet Interview Q&A 2025 — HA, A-P vs A-A, split-brain. mindmajix.com/fortinet-interview-questions
  4. Glassdoor — Fortinet Network Security Engineer interview reports (an Indian enterprise / an Indian IT services firm / the payment-gateway customer / an Indian MSP, 2024-2025) — frequency-ranked HA questions. glassdoor.com
  5. NWKings — Top 20 Fortinet Firewall Interview Questions and Answers (2025) — HA failover + checksum drift. nwkings.com/fortinet-firewall-interview-questions-and-answers
  6. Fortinet Community — Troubleshooting Tip: FGCP cluster checksum mismatch + force synchronize. community.fortinet.com (LIVECommunity FGCP thread)
  7. Fortinet PSIRT — FG-IR-24-029 / CVE-2024-23113 advisory and fgfmd hardening guidance. fortinet.com/psirt