Palo Alto · NGFW HA + Resiliency · A/P + A/A · Interactive · L2 / L3

Palo Alto HA — Watch a Failover Happen in 12 Minutes

Active/Passive. Active/Active. HA1, HA2, HA3 links. Preempt, path-monitoring, split-brain, floating IPs. Skip the wall of text — pick a path, press Play on the failover animator, and watch what really happens when a heartbeat dies.

📅 2026-05-25 · ⏱ 12 min · 2 live failover demos · 🏷 10-Q assessment + AI Tutor inline

Pick your path — jump straight in

1

Active/Passive Basics

One firewall runs traffic, the other waits warm. The default deployment for most enterprises.

2

HA1, HA2, HA3 Links

Heartbeats. Session sync. A/A packet forwarding. Which port does what.

3

Election & Failover

Priority, preempt, path-monitor, link-monitor — the rulebook the firewalls use to swap roles.

4

A/A + Floating IPs

Both firewalls forward traffic. Session owner, ARP owner, HA3 — and the split-brain trap.

① Active/Passive — the default everyone starts with

Two firewalls, identical hardware and PAN-OS. One sits active and forwards every byte of production traffic. The other sits passive — fully configured, fully booted, mirroring state in real time, but its dataplane interfaces are down. The world only ever sees the active firewall's MAC addresses on the LAN.

When the active firewall stops looking healthy (missed heartbeats, link down, monitored path unreachable), the passive node brings up its interfaces, sends a flurry of gratuitous ARPs to flip every neighbour's ARP cache to its own MAC, and inherits every session that was being mirrored over HA2. Done in roughly a second on dedicated HA hardware. End users see a short pause and the connection survives — that's the whole point.

🟢
Active

Dataplane interfaces are up. Owns every layer-2 MAC on the LAN. All production sessions live here. Sends heartbeats to the passive peer over HA1.

🟡
Passive

Dataplane interfaces are down. Fully booted, full config synced, session table mirrored over HA2. Watches for HA1 heartbeats — silent until needed.

🔁
Session Sync

The active firewall pushes every new session, NAT mapping, and decryption state to the passive peer over HA2 — Layer-2 link, ether-type 0x7261, unidirectional.

📣
G-ARP storm

On failover the new-active firewall sends gratuitous ARPs for every interface IP it owns. Neighbours immediately flip their ARP cache to the new MAC — that's how traffic moves in <1 second.
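
To make the G-ARP step concrete, here is a minimal Python sketch that builds a single gratuitous ARP frame by hand. The function name `build_gratuitous_arp`, the MAC address and the IP are illustrative assumptions. The request-style opcode is one common form of gratuitous ARP; the lesson does not say which form PAN-OS emits.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet + ARP frame announcing that `ip` lives at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType 0x0806 (ARP).
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    # ARP payload: Ethernet/IPv4, opcode 1 (request).
    # What makes it gratuitous: sender IP and target IP are the same address.
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac_bytes,    # sender MAC (the new-active firewall)
        ip_bytes,     # sender IP
        b"\x00" * 6,  # target MAC: unknown / ignored
        ip_bytes,     # target IP equals sender IP
    )
    return eth_header + arp_payload


# Hypothetical values for FW-B taking over a gateway IP.
frame = build_gratuitous_arp("00:1b:17:00:1a:08", "10.0.0.1")
print(len(frame), frame.hex())
```

Every neighbour that receives this broadcast overwrites its cache entry for the IP with the sender MAC, which is why no routing or DNS change is needed.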

Watch a real A/P failover

▶ Active/Passive failover animator

Press Play. A monitored dataplane interface on the active firewall goes down at stage 2. Watch the passive node take over.

① STEADY FW-A: active → forwarding 14 Gbps · FW-B: passive → silent, mirroring sessions
Sneha at Infosys: traffic flowing normally. HA1 heartbeats every 1000 ms (default).
② TRIGGER Active firewall's monitored interface ethernet1/2 goes down (cable yanked / upstream switch dies)
Link Monitoring fires — interface is in the LinkMon group with "any" failure condition.
③ DETECT FW-A signals the failure to FW-B over HA1, so FW-B reacts within ~1 sec instead of waiting out the heartbeat timeout (3 missed heartbeats by default).
Without Heartbeat Backup over MGT, even an HA1 cable cut here would risk split-brain.
④ ROLE SWAP FW-A moves to non-functional · FW-B promotes itself to active · brings its dataplane up
⑤ G-ARP STORM FW-B sends gratuitous ARPs for every interface IP it now owns → upstream switches and clients flip their ARP cache to FW-B's MAC
⑥ SESSIONS RESUME FW-B continues every session inherited via HA2 sync — TCP doesn't even reset
Total user-visible blip: ~800ms - 2s. Excellent for HTTP, fine for SSH, may stutter on live VoIP.
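
The six stages above can be sketched as a toy state model. This is a simplified illustration, not PAN-OS behaviour in code: the `Firewall` class, the function names, the session strings and the IP are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    name: str
    state: str  # "active", "passive" or "non-functional"
    sessions: set = field(default_factory=set)


def sync_sessions(active: Firewall, passive: Firewall) -> None:
    """Model HA2: mirror the active session table onto the passive peer."""
    passive.sessions = set(active.sessions)


def fail_over(failed: Firewall, peer: Firewall, interface_ips: list) -> list:
    """Model stages 4 and 5: swap roles, then list the G-ARPs the new active sends."""
    failed.state = "non-functional"
    peer.state = "active"
    return [f"G-ARP {ip} is-at {peer.name}" for ip in interface_ips]


fw_a = Firewall("FW-A", "active", {"tcp:10.0.0.5:443", "tcp:10.0.0.9:22"})
fw_b = Firewall("FW-B", "passive")

sync_sessions(fw_a, fw_b)                    # stage 1: steady-state mirroring
garps = fail_over(fw_a, fw_b, ["10.0.0.1"])  # stages 2-5: link monitor fired on FW-A
print(fw_b.state, sorted(fw_b.sessions), garps)
```

Because the session table was copied before the swap, FW-B already holds every flow when it becomes active, which is the property stage 6 depends on.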
Quick check · Q1 of 10

Sneha at Infosys watches a planned failover and sees traffic resume on the new-active firewall in about 1.2 seconds. Which mechanism is mostly responsible for upstream switches and clients sending traffic to the NEW active firewall almost immediately?

Correct: b. The new-active firewall blasts out gratuitous ARPs (G-ARPs) for every interface IP it now owns. Neighbours' ARP caches flip from FW-A's MAC to FW-B's MAC in milliseconds — that's why the L3 IP follows the new firewall instantly without DHCP, DNS or routing reconvergence playing a role.

② HA1, HA2, HA3 Links

Palo Alto HA uses up to four links (physical ports or sub-interfaces) to keep two firewalls in sync. Most A/P deployments use just HA1 + HA2; A/A adds HA3 plus an optional HA4 on PA-7000/5400 platforms. Memorise what each link carries: every PCNSE cycle has a "which link carries X" question.

💓
HA1 — Control

Layer-3 link, needs IPs. Carries heartbeats (ICMP-like), Hello, state-sync, config-sync, mgmt-plane mirror. TCP 28769/28260 in clear text; TCP port 28 (SSH) when encryption is enabled. Cut HA1 → split-brain risk.

📦
HA2 — Data

Layer-2 link, ether-type 0x7261. Carries the live session table, NAT mappings, decryption state. Traffic is unidirectional active→passive (except HA2 keep-alive). Default keep-alive: log-only.

🔄
HA3 — Packet Forward

Active/Active only. Layer-2 link using MAC-in-MAC encapsulation. Used to forward a packet to its session owner when the non-owner firewall receives it (asymmetric ingress). No L3 addressing, no encryption.

🛟
HA1 Backup

Optional but strongly recommended. Backup heartbeat path. On PA-220 / VM-series without dedicated HA ports, use MGT as HA1 and dataplane as HA1-backup. Prevents split-brain when the primary HA1 link dies.
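
One way to drill the "which link carries X" question is to keep the cards above as a small lookup table. The dictionary layout and the `link_carrying` helper are illustrative assumptions; the entries only restate what the cards say.

```python
# Each HA link, the layer it runs at, the modes that use it, and what it carries.
HA_LINKS = {
    "HA1": {
        "layer": 3,
        "modes": ("A/P", "A/A"),
        "carries": ("heartbeats", "hello", "config sync"),
    },
    "HA2": {
        "layer": 2,
        "modes": ("A/P", "A/A"),
        "carries": ("session sync", "NAT mappings", "decryption state"),
    },
    "HA3": {
        "layer": 2,
        "modes": ("A/A",),
        "carries": ("packet forwarding",),
    },
}


def link_carrying(item: str) -> str:
    """Return the name of the HA link that carries `item` (case-insensitive)."""
    wanted = item.lower()
    for name, info in HA_LINKS.items():
        if wanted in (carried.lower() for carried in info["carries"]):
            return name
    raise KeyError(item)


print(link_carrying("session sync"))
print(link_carrying("packet forwarding"))
```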

The "Heartbeat Backup" toggle is non-negotiable in prod

HA1 carries the heartbeat. If only one HA1 link exists and it dies, both firewalls believe the other is dead → both go active → split-brain → duplicate IPs on the LAN → outage worse than no HA at all. Fix: tick Heartbeat Backup in Device → High Availability → General. It piggy-backs heartbeats over the management interface as a secondary path. Free insurance.
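
The reasoning above reduces to one rule: a peer should be declared dead only when every heartbeat path is silent. A minimal sketch of that rule follows; the function names and path labels are hypothetical.

```python
def peer_declared_dead(heartbeat_paths: dict) -> bool:
    """True only when no heartbeat path is still delivering heartbeats."""
    return not any(heartbeat_paths.values())


def is_split_brain(view_from_a: dict, view_from_b: dict) -> bool:
    """Both nodes are healthy here; split-brain means each declares the other dead."""
    return peer_declared_dead(view_from_a) and peer_declared_dead(view_from_b)


# Single HA1 cable unplugged: neither side hears the other.
single_path = {"ha1": False}
print(is_split_brain(single_path, single_path))

# Same cable unplugged, but heartbeats still arrive over the management port.
with_backup = {"ha1": False, "mgt-heartbeat-backup": True}
print(is_split_brain(with_backup, with_backup))
```

With only one path, a cable fault is indistinguishable from a dead peer. The second path is what lets each node tell the two cases apart.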

Quick check · Q2 of 10

Rahul at TCS configures HA on a PA-3220 pair using a single dedicated HA1 cable between the firewalls. During a planned maintenance the HA1 cable is accidentally unplugged. Both firewalls immediately go active — duplicate-IP storm hits the LAN. What single change would have prevented this?

Correct: d. Heartbeat Backup is the antidote to split-brain caused by a single HA1 cable failure. It uses the management port as a redundant heartbeat path. Preempt has nothing to do with it; A/A doesn't fix HA1-loss split-brain; shorter hellos just speed up the wrong outcome. Always enable Heartbeat Backup in production.

③ Election & Failover — the rulebook

When two firewalls boot, they elect roles. When something breaks, they re-elect. The election logic is deterministic — memorise the precedence and you'll never lose a PCNSE election question.

The three election factors, in order, plus the timers that gate them

Device Priority

Lower number wins. Default is 100. Configure the preferred-active firewall with priority 90, the other with 100. Always set this explicitly — never rely on defaults in prod.

Tiebreaker — MAC

If priorities are equal, the firewall with the lower MAC address on the HA1 control link wins. Convenient default, terrible for predictability. Use explicit priorities.

Preempt

If enabled on both firewalls, the higher-priority node (the one with the lower numeric priority value) reclaims the active role after recovery (Preempt Hold Time = 1 min default). Off by default, and most ops teams leave it off to avoid double failovers.

Hold-Down Timers

Monitor Fail Hold (path/link), Promotion Hold, Preempt Hold. Defaults are conservative — tune carefully. Aggressive timers cause flap; slow timers cause user-visible outage.
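
The initial election described in the cards above (lower priority number wins, lower HA1 MAC breaks a tie) fits in a few lines. The `Node` class, the `elect_active` function and the full MAC addresses are illustrative assumptions; preempt and hold timers are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    priority: int  # lower number wins
    ha1_mac: str   # tiebreaker: lower MAC wins


def mac_key(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to raw bytes so MACs compare numerically."""
    return bytes.fromhex(mac.replace(":", ""))


def elect_active(a: Node, b: Node) -> Node:
    """Pick the active node: compare priority first, then the HA1 MAC address."""
    return min((a, b), key=lambda node: (node.priority, mac_key(node.ha1_mac)))


# Equal priorities: the lower MAC decides.
fw_a = Node("FW-A", 100, "00:1b:17:00:1a:22")
fw_b = Node("FW-B", 100, "00:1b:17:00:1a:08")
print(elect_active(fw_a, fw_b).name)  # FW-B

# Explicit priorities: the MAC no longer matters.
fw_a_preferred = Node("FW-A", 90, "00:1b:17:00:1a:22")
print(elect_active(fw_a_preferred, fw_b).name)  # FW-A
```

The second call shows why explicit priorities are the safer habit: the outcome no longer depends on which chassis happens to carry the lower MAC.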

Failover triggers — what actually causes a swap

Four things cause an HA swap, in rough order of frequency in production:

The most common HA flap in production

Path Monitoring is configured against a single ISP gateway IP, and the ISP filters or rate-limits ICMP. The active firewall sees 3 missed pings and gives up the active role, the peer takes over, the peer's pings also fail because the rate limit affects both, and the roles swap back. HA flaps every few minutes. Fix: monitor 2–3 destination IPs (e.g. 8.8.8.8 + 1.1.1.1 + your upstream router) with condition = "all", so single-IP filtering can't trigger a flap. Also widen the failure threshold to 5 consecutive misses on busy links.
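
The difference between the "any" and "all" failure conditions is easy to see in a small model. The function name `group_failed` and the target labels are hypothetical. The sketch only evaluates one round of ping results and ignores the consecutive-miss threshold.

```python
def group_failed(ping_ok: dict, condition: str = "any") -> bool:
    """Decide whether a path-monitor group counts as failed.

    ping_ok maps each monitored destination to True (reachable) or False.
    "any": one unreachable destination fails the group.
    "all": every destination must be unreachable to fail the group.
    """
    failures = [not ok for ok in ping_ok.values()]
    if not failures:
        return False  # nothing monitored, nothing can fail
    if condition == "any":
        return any(failures)
    if condition == "all":
        return all(failures)
    raise ValueError(f"unknown condition: {condition!r}")


# The ISP gateway drops ICMP, but the other two targets still answer.
results = {"isp-gateway": False, "8.8.8.8": True, "1.1.1.1": True}
print(group_failed(results, "any"))  # True: one filtered target triggers failover
print(group_failed(results, "all"))  # False: the path is still considered healthy
```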

CLI — confirm current HA state in one shot
show high-availability state
show high-availability all
Expected output (snippet)
Enabled: yes
Group ID: 1
Local Info:
    State:           active
    State Duration:  4 days 22 hours
    Priority:        90
    Preemptive:      no
    Mode:            Active-Passive
Peer Info:
    Connection HA1: up
    Connection HA2: up
    State:           passive
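
If you capture this output in a script, a small parser can turn it into a dictionary for automated checks. This is a sketch written against the snippet shape shown above only; real output has more fields and may differ between PAN-OS versions, and `parse_ha_state` is a hypothetical helper, not a Palo Alto tool.

```python
SAMPLE = """\
Enabled: yes
Group ID: 1
Local Info:
    State:           active
    State Duration:  4 days 22 hours
    Priority:        90
    Preemptive:      no
    Mode:            Active-Passive
Peer Info:
    Connection HA1: up
    Connection HA2: up
    State:           passive
"""


def parse_ha_state(text: str) -> dict:
    """Parse 'Key: value' lines; a key with no value opens an indented section."""
    result = {}
    section = None
    for raw in text.splitlines():
        if ":" not in raw:
            continue
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # Section header such as "Local Info:".
            section = key
            result[section] = {}
        elif section is not None and raw.startswith((" ", "\t")):
            # Indented line: belongs to the current section.
            result[section][key] = value
        else:
            # Unindented key/value pair at the top level.
            section = None
            result[key] = value
    return result


state = parse_ha_state(SAMPLE)
healthy = (
    state["Local Info"]["State"] == "active"
    and state["Peer Info"]["State"] == "passive"
    and state["Peer Info"]["Connection HA1"] == "up"
    and state["Peer Info"]["Connection HA2"] == "up"
)
print(healthy)
```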
Quick check · Q3 of 10

Priya at HCL deploys a new A/P pair. FW-A has device priority 100, FW-B has device priority 100, preempt is disabled on both. FW-A's HA1 MAC ends in :1A:22; FW-B's ends in :1A:08. They power on at the same time. Which firewall becomes active and why?

Correct: c. With equal device priorities, PAN-OS uses the lower MAC on the HA1 control link as the tiebreaker. FW-B's :08 < FW-A's :22 → FW-B becomes active. This is exactly why you should always assign explicit non-equal priorities in production — MAC-based selection is correct but not memorable, and a hardware swap can flip which firewall becomes active. Preempt being off prevents reclaim, not initial election.

④ Active/Active — both firewalls work, with rules

A/A keeps both firewalls forwarding traffic at the same time. Asymmetric ingress is the rule, not the exception, so PAN-OS uses HA3 (packet-forwarding link) to ship a packet to whichever firewall owns the session it belongs to.

Two A/A roles to learn: Session Owner (the firewall that processes the full security stack for this flow) and Session Setup (the firewall that does the initial route + NAT + policy match and creates the session entry). The Session Owner is usually set to the firewall that receives the first packet ("First Packet" — recommended) so HA3 forwarding stays minimal.

▶ A/A session-owner + HA3 forwarding

An asymmetric return packet hits FW-B first. The session is owned by FW-A. Watch HA3 forward it.

① OUT Karthik at Flipkart office sends SYN via FW-A · session created on FW-A, owner = FW-A
Session Owner = "First Packet" (default), Session Setup = "Primary Device".
② SESSION SYNC FW-A pushes the new session entry to FW-B over HA2
Both firewalls now know about this session; only FW-A actually scans it.
③ ASYMMETRIC RETURN Return SYN-ACK from server gets routed back via FW-B (upstream router used FW-B as nexthop)
④ LOOK-UP FW-B does session lookup — finds the session, sees owner = FW-A
⑤ HA3 FORWARD FW-B wraps the packet in MAC-in-MAC and ships it over HA3 to FW-A for App-ID / Threat-Prevention scanning
⑥ FW-A PROCESSES FW-A inspects the return packet, then forwards it back to FW-B (which sends it out the original egress interface to Karthik)
A/A keeps stateful inspection intact even with asymmetric routing. The cost: extra HA3 hops on every asymmetric flow.
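
The walkthrough above boils down to one decision per packet: is the receiving firewall the session owner or not? A minimal sketch of that decision follows. The `handle_packet` function and the flow label are hypothetical, and a single shared dictionary stands in for the HA2-synced session table.

```python
def handle_packet(receiver: str, flow: str, owners: dict) -> list:
    """Return the processing steps for one packet arriving at `receiver`.

    owners maps each flow to its session owner and models the table that
    both firewalls share through HA2 session sync.
    """
    owner = owners.get(flow)
    if owner is None:
        # "First Packet" policy: whoever sees the first packet owns the session.
        owners[flow] = receiver
        return [f"{receiver}: create session (owner={receiver})",
                f"{receiver}: inspect and forward"]
    if owner == receiver:
        return [f"{receiver}: inspect and forward"]
    # Asymmetric arrival: hand the packet to the owner over HA3 and back.
    return [
        f"{receiver}: forward to {owner} over HA3",
        f"{owner}: inspect",
        f"{owner}: return to {receiver} over HA3",
        f"{receiver}: egress",
    ]


table = {}
for step in handle_packet("FW-A", "karthik->server", table):  # outbound SYN
    print(step)
for step in handle_packet("FW-B", "karthik->server", table):  # asymmetric SYN-ACK
    print(step)
```

The asymmetric path takes four steps where the symmetric one takes one, which is the HA3 cost the lesson describes.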

Floating IPs — the A/A LAN gateway pattern

Hosts on a LAN can use only one default-gateway IP. In A/A you give clients a Floating IP per VLAN — owned by one firewall at a time, but it migrates to the surviving firewall if its owner dies. Two patterns dominate:

The A/A floating-IP gotcha that costs hours

Traffic destined for a Floating IP via the non-owner firewall is not designed to traverse HA3 to the owner — it relies on neighbours having the right ARP. With asymmetric routing or BGP upstream, packets can land on the non-owner, where the destination IP doesn't belong to any of its interfaces, and get black-holed. Workaround: pin the upstream route so traffic to the Floating IP always lands on the owner, or use ARP Load-Sharing (PA-7000 / large platforms) to give both firewalls a piece of the floating IP. For most deployments: stick with single-Floating per subnet unless you genuinely need 2x throughput.

Quick check · Q4 of 10

Aditya at Wipro deploys A/A with Session Owner = "First Packet" and Session Setup = "Primary Device". Traffic patterns are highly asymmetric. After a week he notices that the HA3 link utilisation is at 60%. What's the most accurate explanation?

Correct: a. HA3 is the packet-forwarding link used precisely when a non-owner firewall receives a packet for an existing session. Asymmetric routing → lots of HA3 forwarding. The fix is design-time: size HA3 bandwidth to your peak asymmetric volume, prefer Session Owner = "First Packet" (so the ingress firewall usually is the owner), and use symmetric routing upstream where you can. HA3 doesn't carry session sync (that's HA2) or heartbeats (that's HA1).

⑤ Three commands you'll actually run during an incident

You're on a 2 AM call and the HA pair just flapped twice. Don't reach for the GUI. These three checks settle 80% of HA debates:

① State + peer status
show high-availability all
② What triggered the last failover
show high-availability transitions
show log system subtype equal ha direction equal backward
③ Live HA link health
show high-availability link-monitoring
show high-availability path-monitoring
Smoke-test a real failover safely

Before you trust HA in production, force a controlled swap during a maintenance window: request high-availability state suspend on the active firewall. Confirm the passive takes over cleanly, sessions persist, application probes succeed. Then request high-availability state functional on the suspended node. Repeat from the other side. Schedule this quarterly — HA you've never tested is HA you don't actually have.
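
The drill above can be written down as an ordered checklist so nobody improvises during the maintenance window. The sketch below only generates the (firewall, command) sequence; it does not connect to any device. The `failover_drill` helper is hypothetical, and the swap between rounds assumes preempt is off, so roles stay swapped after the first round.

```python
SUSPEND = "request high-availability state suspend"
RESUME = "request high-availability state functional"
CHECK = "show high-availability state"


def failover_drill(active: str, passive: str) -> list:
    """Return the ordered (firewall, command) steps for a two-sided failover test."""
    steps = []
    for _ in range(2):
        steps += [
            (active, SUSPEND),  # force the swap
            (passive, CHECK),   # confirm the peer is now active
            (active, RESUME),   # bring the suspended node back
            (active, CHECK),    # confirm it rejoined the pair
        ]
        # Assumption: preempt is off, so the former passive stays active.
        active, passive = passive, active
    return steps


for firewall, command in failover_drill("FW-A", "FW-B"):
    print(f"{firewall}> {command}")
```

Session persistence and application probes still have to be checked by hand between the suspend and resume steps.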


📝 Wrap-up — six more

You've already answered 4 inline. Six left. 70% (7 of 10) total marks the lesson complete on your profile. Tap Submit all answers at the end.

Q5 · Apply

A team is replacing a single PA-3220 with an HA pair. Constraint: zero spare dedicated HA ports — they need to wire HA1, HA1-backup and HA2 using only dataplane and management interfaces. What's the recommended layout?

Correct: c. The documented PAN-OS best practice for platforms without dedicated HA ports (for example PA-220, PA-440, VM-series) is: MGT = HA1, dataplane port = HA1-backup, separate dataplane port = HA2. Never share MGT for both HA1 and HA2 (b): congestion on MGT will tear down everything. Skipping HA1-backup (a) is the split-brain trap. Crossing HA1 over a switch (d) adds a SPOF.
Q6 · Analyze

An HA pair fails over every ~20 minutes. The show high-availability transitions output shows the cause as path-monitor-failed → recovered. The configuration pings a single ISP gateway IP. What's the most likely root cause and fix?

Correct: b. Single-IP path monitoring with ICMP filtering / rate-limiting is the most common HA-flap cause in real deployments. The cure: monitor multiple destination IPs (8.8.8.8 + 1.1.1.1 + your upstream router) with the "all" condition so a flap requires every target to fail. Also widen the failure-threshold to 5 consecutive misses. Preempt-induced flap looks different in the transition log (you'd see "preempt-triggered", not "path-monitor-failed").
Q7 · Analyze

An A/A pair has both firewalls active and forwarding traffic. The HA3 link is unplugged. What happens?

Correct: a. HA3 is mandatory in A/A precisely because it carries the cross-firewall packet forwarding for asymmetric flows. Lose HA3, asymmetric flows can no longer reach their session owner → the firewalls detect the impaired state, transition to "tentative", and one side eventually self-suspends to preserve session integrity. HA2 (session sync) cannot substitute for HA3 (packet forwarding) — different ether-types, different encapsulation, different purpose.
Q8 · Analyze

A team enables Preempt with the default Preempt Hold Time of 1 minute. The higher-priority firewall recovers from a reboot but its dataplane interfaces take ~90 seconds longer than the HA1 link to come fully UP. What happens?

Correct: d. Preempt is one of the most common self-inflicted outages. The firewall reclaims active role on a timer, not on a per-interface readiness check. If dataplane ports lag, you get a black-hole window. Either widen Preempt Hold Time enough to outlast worst-case interface-up, or — much more common in real ops — leave preempt OFF and only switch active manually during planned maintenance. Predictable beats clever.
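
The size of the black-hole window in this scenario is simple arithmetic. The sketch below assumes the Preempt Hold timer starts when the HA1 link comes up, which is a simplification; `black_hole_seconds` is a hypothetical helper.

```python
def black_hole_seconds(preempt_hold_s: int, dataplane_lag_s: int) -> int:
    """Seconds the reclaimed-active node forwards nothing.

    preempt_hold_s:  Preempt Hold Time, counted from HA1 coming up (assumption).
    dataplane_lag_s: how much later than HA1 the dataplane ports are ready.
    """
    return max(0, dataplane_lag_s - preempt_hold_s)


print(black_hole_seconds(60, 90))   # 30: default hold time, the scenario in Q8
print(black_hole_seconds(120, 90))  # 0: hold time outlasts the interface lag
```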
Q9 · Evaluate

A site needs sub-second failover for VoIP. Currently A/P with default 1000 ms HA1 hello interval and 3-miss threshold. Operations proposes dropping hello to 200 ms and missed-hello threshold to 2 to achieve ~400 ms detection. Is this safe?

Correct: c. Sub-second HA detection is supported but has guardrails: use dedicated HA hardware ports (not shared dataplane), enable Heartbeat Backup, raise Promotion-Hold and Monitor-Fail-Hold so a brief micro-blip doesn't trigger a swap, and soak-test for at least a few weeks before promotion. A/A is not the right tool for VoIP availability — it doesn't make any single session "always-on", it just lets two firewalls forward in parallel.
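
The detection figures in the question come from multiplying the interval by the miss threshold. A minimal sketch, treating that product as the worst-case detection time and ignoring promotion and hold timers (a simplification; `detection_ms` is a hypothetical helper):

```python
def detection_ms(interval_ms: int, missed_threshold: int) -> int:
    """Worst-case time to notice a silent peer: every allowed miss must elapse."""
    return interval_ms * missed_threshold


print(detection_ms(1000, 3))  # 3000: the defaults quoted in the question
print(detection_ms(200, 2))   # 400: the proposed aggressive setting
```

The aggressive setting leaves room for only two lost heartbeats, which is why the guardrails in the answer matter more as the numbers shrink.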
Q10 · Evaluate

A team runs an A/P pair through a routine PAN-OS 11.1 → 11.2 upgrade. They upgrade FW-A first, FW-B second. After both upgrades, sessions reset and the GUI shows "version mismatch" warnings for 90 minutes between the two upgrades. How should the upgrade have been done?

Correct: b. Canonical HA-pair upgrade sequence: (1) suspend HA on the passive firewall, (2) upgrade it, (3) bring it back functional and let it become passive again, (4) suspend HA on the active firewall — failover occurs to the newly-upgraded passive, (5) upgrade the now-passive (formerly-active) firewall, (6) bring it back. PAN-OS only tolerates a small version delta on HA sync; longer windows cause exactly the symptoms in the question. Simultaneous upgrade (a, c) defeats the purpose of HA; preempt during upgrade (d) causes extra disruption.
