Why this matters — the pilot and the co-pilot
Imagine a Boeing 777 doing the Mumbai–Singapore route. There's a pilot flying and a co-pilot in the right seat. The co-pilot isn't a passenger: both hands are on the yoke, both pairs of eyes on the instruments. If the pilot has a heart attack mid-flight, the co-pilot doesn't need to be "called from the back". Control transfers in seconds. The passengers in 23F never notice.
That's exactly what a FortiGate FGCP cluster does for the network. Two physical firewalls, one virtual identity. One actively pushes traffic (the "primary"); the other quietly verifies config and watches the heartbeat (the "secondary"). When the primary fails, the secondary doesn't reboot and doesn't relearn; it takes the controls in under 2 seconds. With session-pickup enabled, users keep their TCP sessions. Voice calls don't drop. The payments client doesn't see 5xx.
That single image — pilot and co-pilot, both hands on the yoke — is what every interviewer is testing for when they ask "explain HA to me." Get it right and the next 10 minutes are yours.
Mukesh (L2 network engineer, 14 months in) is deploying a 2-node FGCP cluster for a payments client. Two FortiGate 200F units, racked side-by-side. Internal subnet 10.30.10.0/24 behind the cluster's virtual IP 10.30.0.1. WAN side on 203.0.113.20. HA1 and HA2 ports cross-cabled between the two boxes for the heartbeat. The deployment plan says "A-P, primary on the left, RPO=0".
Mukesh's senior throws him three questions before signoff: "What happens if the heartbeat cable falls out?" · "If I add a second WAN circuit, should you change to A-A?" · "How do you upgrade firmware without dropping the payments traffic?" Three good questions. Three classic interview rounds. We answer all three in this blog.
The three building blocks every interviewer wants you to name
Before we dive in, here are the three terms you will be asked to define out loud, separately and in plain English. Get these crisp and you've already passed the L1 round.
- HBdev (heartbeat interface) — the cable (or VLAN) between the two FortiGates carrying the FGCP hellos. Both hands on the yoke.
- Virtual MAC — the shared layer-2 address the cluster announces on each monitored interface, so failover doesn't break the upstream switch's MAC table.
- FGCP election — the algorithm that picks the primary when the cluster forms or when something breaks.
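To make the virtual MAC idea concrete, here is a minimal Python sketch that builds an address in the 00:09:0f:09:XX:YY shape. It assumes XX is the HA group-id in hex and YY is an interface index plus a per-virtual-cluster offset; that mapping, the function name, and the parameter names are illustrative assumptions, so check the HA guide for your FortiOS release before relying on it.

```python
def fgcp_virtual_mac(group_id: int, interface_index: int, vcluster_offset: int = 0) -> str:
    """Build a virtual MAC shaped like 00:09:0f:09:XX:YY.

    ASSUMPTION: XX = HA group-id (hex), YY = interface index + virtual-cluster
    offset. This sketch only covers group-id values that fit in one octet.
    """
    if not 0 <= group_id <= 255:
        raise ValueError("this sketch only covers group-id 0-255")
    last_octet = vcluster_offset + interface_index
    if not 0 <= last_octet <= 255:
        raise ValueError("last octet out of range")
    return "00:09:0f:09:{:02x}:{:02x}".format(group_id, last_octet)


# Hypothetical values: group-id 10, fifth interface in the unit's list.
print(fgcp_virtual_mac(group_id=10, interface_index=5))  # 00:09:0f:09:0a:05
```

The point to take away is that the address is derived from cluster settings, not from either box's burned-in MAC, which is why both units can present the same one.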
FGCP election in 30 seconds
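When the cluster forms (or reforms after a failure), FGCP compares the units in a strict order, and the first criterion that differs decides the winner. With set override disabled (the default), the order is:
- Monitored interface health: the unit with more healthy monitored interfaces wins, so a unit whose wan1 is down loses to a peer whose monitored interfaces are all up.
- Uptime: the longer-running unit wins, provided the difference exceeds a small margin (5 minutes by default).
- HA priority: higher wins (default 128 on every unit; you set it explicitly in your config).
- Serial number: higher wins, as a last resort.
With set override enable, priority and uptime swap places, so priority is checked before uptime.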
set ha-uptime override is NOT a thing — the right trick is set override enable on the unit you want to win, then set its priority higher than its peer's. This is the only reliable way to pin a specific unit as primary across reboots, because with override enabled, priority now beats uptime in the election. Without override, the unit that booted first keeps winning forever — which is fine, until you reboot it and the "wrong" unit takes over.
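The election can be sketched in a few lines of Python. This is a simplified model, not FortiOS code: it treats override as a single cluster-wide flag, assumes a 300-second uptime margin, and assumes the serial-number tie-break goes to the higher serial. The Unit class and every field name are hypothetical.

```python
from dataclasses import dataclass

UPTIME_MARGIN_S = 300  # ASSUMPTION: uptime differences below this are ignored


@dataclass
class Unit:
    name: str
    healthy_monitored: int  # count of monitored interfaces that are up
    priority: int
    uptime_s: int
    serial: str


def _by_uptime(a, b):
    if abs(a.uptime_s - b.uptime_s) > UPTIME_MARGIN_S:
        return a if a.uptime_s > b.uptime_s else b
    return None


def _by_priority(a, b):
    if a.priority != b.priority:
        return a if a.priority > b.priority else b
    return None


def elect(a: Unit, b: Unit, override: bool) -> Unit:
    """Return the unit this simplified model would pick as primary."""
    if a.healthy_monitored != b.healthy_monitored:
        return a if a.healthy_monitored > b.healthy_monitored else b
    # Override swaps the order of the priority and uptime checks.
    order = (_by_priority, _by_uptime) if override else (_by_uptime, _by_priority)
    for criterion in order:
        winner = criterion(a, b)
        if winner is not None:
            return winner
    return a if a.serial > b.serial else b  # ASSUMPTION: higher serial wins


# A has just rebooted (10 minutes up); B has been up for months.
a = Unit("FortiGate-A", 2, 200, 600, "FGT200F-0001")
b = Unit("FortiGate-B", 2, 100, 8_000_000, "FGT200F-0002")
print(elect(a, b, override=False).name)  # FortiGate-B
print(elect(a, b, override=True).name)   # FortiGate-A
```

Flipping the override flag is the only change between the two calls, which is the whole argument for pairing override with an explicit priority.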
The cluster from CLI — what you actually type
Here's the minimal HA config Mukesh runs on both units (they MUST match group-name and password — that's how they find each other on the HA link):
config system ha
    set group-name "payments-cluster"
    set mode a-p
    set password SuperLongHaSecret_2026
    set hbdev "ha1" 200 "ha2" 100
    set session-pickup enable
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway 10.30.99.1
        next
    end
    set override enable
    # priority 200 on FortiGate-A; use 100 on FortiGate-B
    set priority 200
    set monitor "wan1" "internal"
end
get system ha status
HA Health Status: OK
Model: FortiGate-200F
Mode: HA A-P
Group Name: payments-cluster
Number of Members: 2
Cluster Uptime: 0 days 0:14:12
Primary : FortiGate-A FGT200F-0001 priority=200
Secondary: FortiGate-B FGT200F-0002 priority=100
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
    FortiGate-A(updated 5s ago): in-sync
    FortiGate-B(updated 5s ago): in-sync
Two things to read in that output. First, HBDEV stats — both heartbeat links up, zero lost hellos. Second, Configuration Status — both members in-sync. If either line is off, you have a problem; we'll walk through both later.
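If you want to automate those two reads, a rough Python sketch follows. It parses the simplified sample output shown in this post, not the exact text a real FortiGate prints, so treat the regular expressions and the function name as assumptions to adapt.

```python
import re

# Simplified sample modeled on the output in this post (real output differs).
SAMPLE = """HA Health Status: OK
Mode: HA A-P
Number of Members: 2
HBDEV stats: ha1 up · ha2 up · 0 lost hello packets
Configuration Status:
FortiGate-A(updated 5s ago): in-sync
FortiGate-B(updated 5s ago): in-sync
"""


def ha_looks_healthy(status_text: str, expected_members: int = 2) -> bool:
    """True when member count, lost hellos, and sync state all look right."""
    members = re.search(r"Number of Members:\s*(\d+)", status_text)
    lost = re.search(r"(\d+) lost hello packets", status_text)
    in_sync = len(re.findall(r":\s*in-sync\b", status_text))
    return (
        members is not None
        and int(members.group(1)) == expected_members
        and lost is not None
        and int(lost.group(1)) == 0
        and in_sync == expected_members
    )


print(ha_looks_healthy(SAMPLE))  # True
```

A check like this is only a tripwire; when it returns False, read the full output yourself.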
- FGCP: FortiGate Clustering Protocol, Fortinet's proprietary HA protocol. Hellos go out every 200 ms by default. After a configurable number of consecutive missed hellos (hb-lost-threshold; the default differs between FortiOS releases, so check yours), the peer is declared dead and an election starts.
- HBdev: dedicated link carrying FGCP hellos + config-sync + (with session-pickup on) session table sync. Always cable 2 HBdevs; one alone is a single point of failure.
- Session-pickup: off by default. When ON, primary streams live session table to secondary over HBdev → failover keeps existing TCP/UDP sessions alive. Critical for VoIP / payments.
- Config sync: always on. Primary pushes config to secondary over HBdev. Verify with diag sys ha checksum; every section must match between members.
- Virtual MAC: shared L2 address per monitored interface (format 00:09:0f:09:XX:YY). After failover, the new primary sends a gratuitous ARP → upstream switch updates its CAM table in <1s.
- Override: set override enable + higher priority = this unit always wins the election. Without override, longest-uptime wins → "wrong" unit may stay primary forever after a reboot.
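The failure-detection time is simple arithmetic: hello interval times the number of missed hellos tolerated. The sketch below assumes hb-interval is expressed in 100 ms units and uses a threshold of 5 purely as an example; the function and parameter names are ours, and you should plug in whatever your unit actually reports.

```python
def detection_time_ms(hb_interval_units: int, lost_threshold: int) -> int:
    """Worst-case time to declare the peer dead, in milliseconds.

    ASSUMPTION: hb_interval_units counts 100 ms steps (2 means 200 ms hellos).
    """
    if hb_interval_units < 1 or lost_threshold < 1:
        raise ValueError("both values must be positive")
    return hb_interval_units * 100 * lost_threshold


# 200 ms hellos, 5 missed hellos tolerated (example values only).
print(detection_time_ms(hb_interval_units=2, lost_threshold=5))  # 1000
```

Lowering either value detects failures faster but makes the cluster more sensitive to a briefly congested heartbeat link, so tune them together.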
Before you scroll — Mukesh configures set priority 200 on FortiGate-A but forgets set override enable. He boots A first, then B. Months later, A reboots. When A comes back, who is primary?
Answer: B stays primary. B has the longer uptime, and A's higher priority only counts with set override enable; only then does priority beat uptime in the election. Fix: enable override on both units, then run execute ha manage 0 + execute ha synchronize start.
Mukesh — network engineer at a Hyderabad payment-gateway customer is asked: "What is the role of HBdev in a FortiGate FGCP cluster?"
Failover in slow-mo — wan1 down, payments stay up
The most common interview ask after "explain HA" is "walk me through a failover, step by step". Most candidates say "primary fails, secondary takes over" — too vague. Here's the actual sequence Mukesh saw the day his client's left rack lost power.
[Interactive animation: failover in 7 stages. The primary dies on the left and the secondary takes over on the right; the final stage runs get system ha status and shows both members healthy.]
Mukesh's payments client says "we cannot lose a single TCP session during failover." Which one setting must Mukesh enable on the cluster?
A-P vs A-A — and the throughput myth that costs candidates the job
The single most common Fortinet HA interview trap: "Should I move to Active-Active to double my firewall throughput?" Most candidates say yes. Most candidates are wrong.
A-P (Active-Passive) is the default. One unit pushes traffic; the other watches. Simple, predictable, easy to debug. It is by far the most common mode in production FortiGate deployments.
A-A (Active-Active) load-balances sessions across both units, but each session is still owned by exactly ONE unit for its lifetime. In FGCP the primary still receives the incoming traffic and hands selected sessions to the secondary; by default only sessions that need proxy-based security inspection are balanced, and plain TCP sessions join in only with set load-balance-all enable. Net effect: A-A helps when you have lots of short-lived, inspection-heavy sessions (DNS, HTTP/1.1 connections, small API calls); it does not double throughput for a single big flow.
The hard truth recruiters want to hear: a 5 Gbps Veeam backup stream rides one box. A 2 Gbps SQL replication stream rides one box. The other unit doesn't carry that traffic. A-A pays off for many small, independent sessions, especially UTM-heavy AV scanning. It does not help a single fat flow. Say this line in the interview and you've separated yourself from 90% of candidates.
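A toy model shows why. The Python sketch below assigns whole flows to the least-loaded unit and never splits a flow; it is an idealized balancer for illustration only, not the schedule FortiOS actually uses, and all names are hypothetical.

```python
def per_unit_load(flows_mbps, units=2):
    """Assign each flow, whole, to the least-loaded unit; return load per unit."""
    load = [0] * units
    for flow in sorted(flows_mbps, reverse=True):
        target = load.index(min(load))  # first unit with the lowest load
        load[target] += flow            # a session is never split across units
    return load


# One fat 5 Gbps backup stream: the second unit sits idle.
print(per_unit_load([5000]))       # [5000, 0]

# Ten thousand 1 Mbps sessions: the load splits evenly.
print(per_unit_load([1] * 10000))  # [5000, 5000]
```

Same total bandwidth in both cases, but only the many-small-sessions workload gets any benefit from the second unit.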
Mukesh's senior asks: "We have 50,000 IoT devices behind the cluster, each doing a 200-byte MQTT keepalive every 30 seconds. A-P or A-A?"
Split-brain — when both units believe they're primary
Split-brain is the textbook HA disaster. Both FortiGates simultaneously claim to be primary. Both respond to the cluster's virtual IP. Both send gratuitous ARPs for the virtual MAC. The upstream switch's MAC table flaps between ports. ARP caches on every downstream host flip back and forth. TCP sessions break within 2-3 seconds. The classic trigger is losing ALL heartbeat connectivity between the units while both stay up.
That's why best practice is "always cable two HBdev ports". The probability of one heartbeat cable failing is non-zero; the probability of both failing at once is many orders of magnitude lower. Mukesh's cluster has both ha1 and ha2 cross-cabled — exactly so a single cable bump can't trigger this.
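The arithmetic behind that claim, as a small Python sketch. It assumes the two heartbeat links fail independently, which is the optimistic case; a shared patch panel or shared power feed breaks that assumption. The failure probability used is a made-up example, not a measured figure.

```python
def p_all_links_fail(p_single: float, links: int) -> float:
    """Probability that every heartbeat link is down at once.

    ASSUMPTION: link failures are independent of each other.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if links < 1:
        raise ValueError("need at least one link")
    return p_single ** links


p = 0.001  # hypothetical chance that one cable is down at a given moment
print(p_all_links_fail(p, 1))
print(p_all_links_fail(p, 2))
```

With these example numbers the second cable buys roughly three orders of magnitude, and only if the cables take physically separate paths.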
[Interactive animation: split-brain in 7 stages. Both heartbeats die, both units self-promote, ARP chaos hits the LAN, the cable is restored, and the cluster re-elects. Along the way Mukesh runs get system ha status on the mgmt port and sees "Number of Members: 1" on BOTH boxes; he finishes with diag sys ha checksum to confirm config-sync, then writes the RCA.]
If you cut ALL HBdev paths simultaneously, FortiGate has no out-of-band way to ask "is my peer still alive?", so it assumes the peer is dead. FGCP does NOT use the WAN or LAN interfaces as a tie-breaker by default. The defenses are physical: two HBdev cables (so a single cable cut doesn't kill the link), HBdev carried over a dedicated, isolated VLAN if direct cabling is impractical, and a reserved HA management interface (ha-mgmt-status enable) so even during split-brain you can still SSH each unit individually and reseat the cable.
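Because each unit only knows its own view, confirming split-brain means comparing what both units report. A minimal Python sketch of that decision follows; the dictionaries stand in for values you would read off get system ha status on each unit's reserved management interface, and the key names are hypothetical.

```python
from typing import Optional


def diagnose(view_a: Optional[dict], view_b: Optional[dict]) -> str:
    """Compare both units' own views of the cluster.

    Each view is {"members": int, "role": "primary" or "secondary"}, or None
    when that unit could not be reached on its management interface.
    """
    if view_a is None or view_b is None:
        return "peer unreachable: cannot confirm split-brain"
    both_alone = view_a["members"] == 1 and view_b["members"] == 1
    both_primary = view_a["role"] == "primary" and view_b["role"] == "primary"
    if both_alone and both_primary:
        return "split-brain"
    if view_a["members"] == 2 and view_b["members"] == 2:
        return "healthy"
    return "inconclusive"


print(diagnose({"members": 1, "role": "primary"},
               {"members": 1, "role": "primary"}))  # split-brain
```

One unit saying "Number of Members: 1" is not proof on its own; the peer may simply be powered off.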
Mukesh suspects split-brain. He SSHes into FortiGate-A and runs get system ha status. The output says Number of Members: 1. What is the next single command that confirms — or rules out — split-brain?
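Answer: run get system ha status on FortiGate-B as well, over its reserved HA management interface (this is why set ha-mgmt-status enable is best practice). If B also reports Number of Members: 1 and also claims primary, it is split-brain. Option a destroys evidence and may make recovery worse. Option b checks data-plane but doesn't tell you cluster role. Option d is a policy-match tool, unrelated.
HA-aware upgrades — patching without downtime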
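Mukesh's senior's third question: "how do you upgrade firmware without dropping the payments traffic?" The answer is the FortiOS uninterruptible-upgrade setting, which is on by default (on recent releases the same behavior is selected with set upgrade-mode uninterruptible, so check the CLI reference for your version). Here's exactly what happens.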
When you push a firmware image to the cluster's primary, FortiOS detects HA mode and follows this script automatically:
- Push the image to the secondary (B) first. B reboots into the new firmware. Cluster runs split-version for ~30 s while B is rebooting.
- Once B is back up on the new firmware and rejoins the cluster, primary triggers an HA failover: the now-upgraded B becomes primary, A becomes secondary.
- A (now secondary, still on the old firmware) upgrades itself, reboots into the new firmware, and rejoins as secondary.
- If set override enable was set on A with higher priority, another failover flips A back to primary. Otherwise B stays primary.
Total user-visible blackout: 1-2 seconds per failover, once if B stays primary and twice if override flips A back. The cluster also runs on a single unit for ~30 seconds during each reboot. Each blip is the same as a normal failover.
config system ha
    set uninterruptible-upgrade enable
end
# upload + apply firmware
execute restore image tftp FGT_200F-v7.6.2.M-build0123-FORTINET.out 10.30.99.50
Image checksum verified.
Installing image to FortiGate-B (secondary)...
FortiGate-B: image installed, rebooting...
FortiGate-B: rejoined cluster on FortiOS 7.6.2
HA failover: FortiGate-B is now primary
FortiGate-A: receiving image...
FortiGate-A: image installed, rebooting...
FortiGate-A: rejoined cluster on FortiOS 7.6.2
HA failover (override): FortiGate-A is now primary
Cluster upgrade complete.
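The rolling upgrade described above can be written down as an ordered plan, which makes it easy to count how many failovers users will feel. This Python sketch is only a model of the sequence in this post; the function names and step strings are ours, not FortiOS output.

```python
from typing import List


def upgrade_plan(primary: str, secondary: str, override_pins_primary: bool) -> List[str]:
    """Ordered steps of the rolling upgrade as described in this post."""
    steps = [
        f"upgrade {secondary} (secondary) and wait for it to rejoin",
        f"failover: {secondary} becomes primary",
        f"upgrade {primary} (now secondary) and wait for it to rejoin",
    ]
    if override_pins_primary:
        # Override plus higher priority pulls the original primary back.
        steps.append(f"failover: {primary} takes primary back (override)")
    return steps


def failover_count(steps: List[str]) -> int:
    """Number of user-visible failovers in a plan."""
    return sum(step.startswith("failover") for step in steps)


plan = upgrade_plan("FortiGate-A", "FortiGate-B", override_pins_primary=True)
for step in plan:
    print(step)
print(failover_count(plan))  # 2
```

With override disabled the same plan has a single failover, which is one argument for leaving the upgraded secondary in charge until the next maintenance window.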
If a partial config change happened on the primary but didn't sync to secondary before the upgrade started, you can end up with a checksum mismatch AFTER the upgrade — both units on the new firmware, but with diverging configs. Always run diag sys ha checksum BEFORE you push the firmware. If any section's checksum differs between members, run execute ha synchronize start from the primary and re-verify before proceeding.
After any HA change (config, upgrade, cable reseat), run this 2-line check:
get system ha status — confirm both members listed, both in-sync, HBDEV stats showing zero lost hellos.
diag sys ha checksum — confirm per-section checksums match across members. ANY mismatch = sync issue = fix before signoff.
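The checksum comparison itself is a per-section equality test. Here is a minimal Python sketch, assuming you have already copied each member's section checksums into a dictionary; parsing the real diag sys ha checksum output is left out, and the section names and values are hypothetical examples.

```python
def mismatched_sections(primary: dict, secondary: dict) -> list:
    """Return the config sections whose checksums differ between members.

    A section present on only one member also counts as a mismatch.
    """
    sections = sorted(set(primary) | set(secondary))
    return [s for s in sections if primary.get(s) != secondary.get(s)]


# Hypothetical per-section checksums copied from each member.
unit_a = {"firewall.policy": "0x4a2b1f08", "system.interface": "0x11aa22bb"}
unit_b = {"firewall.policy": "0x7c14e0bb", "system.interface": "0x11aa22bb"}

print(mismatched_sections(unit_a, unit_b))  # ['firewall.policy']
```

An empty list is the only acceptable result before signoff; anything else names the section to investigate first.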
Quick lab — try this in your home lab tonight
If you have two FortiGate VMs running on the same hypervisor (or two physical units), this 6-step lab walks you through the full lifecycle Mukesh just saw. Takes 25 minutes end-to-end.
- Bring up two FortiGate VMs on FortiOS 7.4 / 7.6. Static-IP the mgmt port on each (e.g. 10.30.99.10 + 10.30.99.11).
- Add a virtual switch / shared network between them for HBdev, and attach both ha1 and ha2 to it.
- Paste the HA config block from this blog onto BOTH units (change priority to 100 on the secondary). Verify get system ha status on both; each should show "Number of Members: 2".
- Enable set session-pickup enable. From a client behind the cluster, start a long-running curl with --continue-at - against a public file. Keep it running through the next step and confirm it survives.
- Pull the cable from primary's wan1 (or shut the interface). Watch the 1-2 s failover. Verify get system ha status on B shows it as primary.
- Restore wan1. If override was set on A, A re-takes primary. Run diag sys ha checksum + diag sys ha dump-by all-vdom to confirm cluster healthy.
In early 2024, Fortinet PSIRT disclosed CVE-2024-23113, a format-string vulnerability in the FortiOS fgfmd daemon, which speaks the FortiGate-to-FortiManager (FGFM) management protocol. fgfmd is not part of FGCP, but the defensive lesson carries over to this blog: control-plane links should not be reachable from networks that have no business talking to them. The HBdev link must never traverse a routed / shared L3 path that's reachable from production VLANs. Cross-cable HBdev directly, or use a dedicated isolated VLAN that no user traffic can reach, and restrict who can reach the reserved HA management interfaces. CISA flagged this CVE as actively exploited in 2024. Source: Fortinet PSIRT FG-IR-24-029.
Mukesh runs diag sys ha checksum on FortiGate-A and sees the firewall.policy checksum is 0x4a2b1f08, but on B it's 0x7c14e0bb. What's the single best next command — and what is he likely to find?
execute ha synchronize start from the primary, then re-run diag sys ha checksum. Most likely root cause: someone (or a partial config push) committed a change directly on the secondary while config-sync was queued — or there was a sync-failure during a recent upgrade. Force-sync from primary will overwrite the secondary's diverging section. If checksums still don't match after force-sync, run get system ha status to confirm HBdev is up and hellos aren't being dropped — a flapping HBdev silently breaks sync without flipping the cluster state. Top L3 muscle memory.
Self-explanation prompt
In 2-3 sentences, explain to a hypothetical batchmate: "Why doesn't Active-Active double FortiGate throughput, even though both units are forwarding packets?" Writing it out cements the concept faster than re-reading.
📖 Mini-glossary — terms used in this blog
- FGCP
- FortiGate Clustering Protocol — Fortinet's proprietary HA protocol carried over HBdev.
- HBdev
- The heartbeat interface — dedicated link carrying FGCP hellos, config sync, and session sync.
- A-P (Active-Passive)
- Default HA mode. One unit active, the other watching. By far the most common mode in production deployments.
- A-A (Active-Active)
- New sessions hash across both units; each session still owned by ONE unit. Not a throughput doubler.
- Virtual MAC
- Shared L2 address per monitored interface, owned by whichever unit is primary.
- Session-pickup
- Off by default. When on, primary streams live session table → secondary so TCP/UDP survive failover.
- Config sync
- Always-on. Primary pushes config to secondary; verify with diag sys ha checksum.
- Override priority
- When set override enable is configured, priority beats uptime in election → pins specific unit as primary.
- Monitor-interface
- Listed in set monitor. If any monitored interface drops, that unit loses the election.
- Virtual cluster
- Multiple HA clusters running on the same physical pair (per-VDOM). Each VDOM elects independently.
What's next?
Blog 4 opens up FortiGate VPNs — IPsec Phase 1 / Phase 2, SSL VPN, and how to harden them against the CVE-2024-21762 + CVE-2024-55591 exploitation wave still active across 2025.
📚 Sources
- Fortinet Docs — FortiGate HA Administration Guide (FortiOS 7.4 / 7.6) — FGCP cluster, session-pickup, override, uninterruptible-upgrade. docs.fortinet.com
- UniNets — Top Fortinet Firewall Interview Q&A 2025 — High Availability section. uninets.com/blog/fortinet-firewall-interview-questions-answers
- MindMajix — Top Fortinet Interview Q&A 2025 — HA, A-P vs A-A, split-brain. mindmajix.com/fortinet-interview-questions
- Glassdoor — Fortinet Network Security Engineer interview reports (an Indian enterprise / an Indian IT services firm / the payment-gateway customer / an Indian MSP, 2024-2025) — frequency-ranked HA questions. glassdoor.com
- NWKings — Top 20 Fortinet Firewall Interview Questions and Answers (2025) — HA failover + checksum drift. nwkings.com/fortinet-firewall-interview-questions-and-answers
- Fortinet Community — Troubleshooting Tip: FGCP cluster checksum mismatch + force synchronize. community.fortinet.com
- Fortinet PSIRT — FG-IR-24-029 / CVE-2024-23113 advisory and fgfmd hardening guidance. fortinet.com/psirt