A team is replacing a single PA-3220 with an HA pair. Constraint: zero spare dedicated HA ports — they need to wire HA1, HA1-backup and HA2 using only dataplane and management interfaces. What's the recommended layout?

Correct: c. The documented PAN-OS best practice for platforms without dedicated HA ports (PA-220, PA-440, PA-3200, VM-series) is: dataplane port = HA1, MGT = HA1-backup, separate dataplane port = HA2. Never share MGT for both HA1 and HA2 (b) — congestion on MGT will tear down everything. Skipping HA1-backup (a) is the split-brain trap. Crossing HA1 over a switch (d) adds a SPOF.

An HA pair fails over every ~20 minutes. The show high-availability transitions output shows the cause as path-monitor-failed → recovered. The configuration pings a single ISP gateway IP. What's the most likely root cause and fix?

Correct: b. Single-IP path monitoring with ICMP filtering / rate-limiting is the most common HA-flap cause in real deployments. The cure: monitor multiple destination IPs (8.8.8.8 + 1.1.1.1 + your upstream router) with the "all" condition so a flap requires every target to fail. Also widen the failure-threshold to 5 consecutive misses. Preempt-induced flap looks different in the transition log (you'd see "preempt-triggered", not "path-monitor-failed").

An A/A pair has both firewalls active and forwarding traffic. The HA3 link is unplugged. What happens?

Correct: a. HA3 is mandatory in A/A precisely because it carries the cross-firewall packet forwarding for asymmetric flows. Lose HA3, asymmetric flows can no longer reach their session owner → the firewalls detect the impaired state, transition to "tentative", and one side eventually self-suspends to preserve session integrity. HA2 (session sync) cannot substitute for HA3 (packet forwarding) — different ether-types, different encapsulation, different purpose.

A team enables Preempt with the default Preempt Hold Time of 1 minute. The higher-priority firewall recovers from a reboot but its dataplane interfaces take ~90 seconds longer than the HA1 link to come fully UP. What happens?

Correct: d. Preempt is one of the most common self-inflicted outages. The firewall reclaims active role on a timer, not on a per-interface readiness check. If dataplane ports lag, you get a black-hole window. Either widen Preempt Hold Time enough to outlast worst-case interface-up, or — much more common in real ops — leave preempt OFF and only switch active manually during planned maintenance. Predictable beats clever.
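
The timing problem can be put in numbers. This is a minimal sketch under a simplified model, which is an assumption rather than documented PAN-OS behaviour: the preempt hold timer starts when HA1 comes up, and the black-hole window is the dataplane lag minus the hold time. The function name is hypothetical.

```python
def black_hole_window_s(preempt_hold_s: float, dataplane_lag_s: float) -> float:
    """Seconds during which the firewall is active but its dataplane is not ready.

    Simplified model (an assumption): the hold timer starts when HA1 is up,
    and dataplane_lag_s is how much later the dataplane ports come up.
    """
    return max(0.0, float(dataplane_lag_s) - preempt_hold_s)


# Scenario from the question: 1-minute hold time, ports ~90 s behind HA1.
print(black_hole_window_s(60, 90))   # 30.0
# A hold time that outlasts the worst-case lag closes the window.
print(black_hole_window_s(120, 90))  # 0.0
```

Under this model, any hold time at or above the worst-case interface lag gives a window of zero, which is the "widen Preempt Hold Time" option in the answer.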

A site needs sub-second failover for VoIP. Currently A/P with default 1000 ms HA1 hello interval and 3-miss threshold. Operations proposes dropping hello to 200 ms and missed-hello threshold to 2 to achieve ~400 ms detection. Is this safe?

Correct: c. Sub-second HA detection is supported but has guardrails: use dedicated HA hardware ports (not shared dataplane), enable Heartbeat Backup, raise Promotion-Hold and Monitor-Fail-Hold so a brief micro-blip doesn't trigger a swap, and soak-test for at least a few weeks before promotion. A/A is not the right tool for VoIP availability — it doesn't make any single session "always-on", it just lets two firewalls forward in parallel.

A team runs an A/P pair through a routine PAN-OS 11.1 → 11.2 upgrade. They upgrade FW-A first, FW-B second. Sessions reset, and the GUI shows "version mismatch" warnings during the 90 minutes between the two upgrades. How should the upgrade have been done?

Correct: b. Canonical HA-pair upgrade sequence: (1) suspend HA on the passive firewall, (2) upgrade it, (3) bring it back functional and let it become passive again, (4) suspend HA on the active firewall — failover occurs to the newly-upgraded passive, (5) upgrade the now-passive (formerly-active) firewall, (6) bring it back. PAN-OS only tolerates a small version delta on HA sync; longer windows cause exactly the symptoms in the question. Simultaneous upgrade (a, c) defeats the purpose of HA; preempt during upgrade (d) causes extra disruption.
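
The six steps can be written down as an ordered checklist. A minimal sketch: `upgrade_plan` and the firewall names are hypothetical, and the code only prints the order. It does not talk to any firewall.

```python
def upgrade_plan(active: str, passive: str) -> list[str]:
    """Return the rolling-upgrade steps for an A/P pair, in order."""
    return [
        f"suspend HA on {passive} (currently passive)",
        f"upgrade {passive}",
        f"make {passive} functional again and wait until it is passive",
        f"suspend HA on {active} (failover to {passive})",
        f"upgrade {active}",
        f"make {active} functional again",
    ]


for number, step in enumerate(upgrade_plan("FW-A", "FW-B"), start=1):
    print(f"({number}) {step}")
```

Keeping the plan as data makes it easy to paste into a change ticket and tick off one step at a time.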

Palo Alto HA Active/Passive vs Active/Active — Visual, Interactive, AI-Era

① Active/Passive — the default everyone starts with

Two firewalls, identical hardware and PAN-OS. One sits active and forwards every byte of production traffic. The other sits passive — fully configured, fully booted, mirroring state in real time, but its dataplane interfaces are down. The world only ever sees the active firewall's MAC addresses on the LAN.

When the active firewall stops looking healthy (missed heartbeats, link down, monitored path unreachable), the passive node brings up its interfaces, sends a flurry of gratuitous ARPs to flip every neighbour's ARP cache to its own MAC, and inherits every session that was being mirrored over HA2. Done in roughly a second on dedicated HA hardware. End users see a short pause and the connection survives — that's the whole point.

Active: Dataplane interfaces are up. Owns every layer-2 MAC on the LAN. All production sessions live here. Sends heartbeats to the passive peer over HA1.
Passive: Dataplane interfaces are down. Fully booted, full config synced, session table mirrored over HA2. Watches for HA1 heartbeats and stays silent until needed.
Session Sync: The active firewall pushes every new session, NAT mapping, and decryption state to the passive peer over HA2 (Layer-2 link, ether-type 0x7261, unidirectional).
G-ARP storm: On failover the new-active firewall sends gratuitous ARPs for every interface IP it owns. Neighbours immediately flip their ARP cache to the new MAC, which is how traffic moves in under 1 second.

Watch a real A/P failover

▶ Active/Passive failover animator

A monitored dataplane interface on the active firewall goes down at stage 2. Watch the passive node take over.

① STEADY FW-A: active → forwarding 14 Gbps · FW-B: passive → silent, mirroring sessions
Sneha at Infosys: traffic flowing normally. HA1 heartbeats every 1000 ms (default).

② TRIGGER Active firewall's monitored interface ethernet1/2 goes down (cable yanked / upstream switch dies)
Link Monitoring fires: the interface is in the LinkMon group with the "any" failure condition.

③ DETECT FW-A signals failure on HA1. FW-B confirms within ~1 sec.
Pure heartbeat loss would take longer: 3 missed heartbeats at the default 1000 ms interval is about 3 seconds. Without Heartbeat Backup over MGT, an HA1 cable cut at this point would risk split-brain.

④ ROLE SWAP FW-A moves to non-functional · FW-B promotes itself to active · brings its dataplane up

⑤ G-ARP STORM FW-B sends gratuitous ARPs for every interface IP it now owns → upstream switches and clients flip their ARP cache to FW-B's MAC

⑥ SESSIONS RESUME FW-B continues every session inherited via HA2 sync, so TCP doesn't even reset
Total user-visible blip: ~800 ms to 2 s. Excellent for HTTP, fine for SSH, may stutter on live VoIP.

Quick check · Q1 of 10

Sneha at Infosys watches a planned failover and sees traffic resume on the new-active firewall in about 1.2 seconds. Which mechanism is mostly responsible for upstream switches and clients sending traffic to the NEW active firewall almost immediately?

a) DNS cache invalidation — clients re-resolve the gateway hostname b) Gratuitous ARPs flooded by the new-active firewall force every neighbour to flip its ARP cache to the new MAC c) STP recalculates and reconverges around the failed switch port d) OSPF reconverges in <1 sec on the new-active firewall

Correct: b. The new-active firewall blasts out gratuitous ARPs (G-ARPs) for every interface IP it now owns. Neighbours' ARP caches flip from FW-A's MAC to FW-B's MAC in milliseconds — that's why the L3 IP follows the new firewall instantly without DHCP, DNS or routing reconvergence playing a role.
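
For a feel of what a gratuitous ARP looks like on the wire, here is a minimal sketch that builds one as raw bytes. It follows the standard Ethernet/ARP layout, not anything PAN-OS specific. The MAC and IP values are made-up examples, and the frame is only constructed, never sent.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing that `ip` is at `mac`."""
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    sender_ip = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + sender_mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Gratuitous: sender IP and target IP are the same address.
    arp += sender_mac + sender_ip + b"\x00" * 6 + sender_ip
    return ethernet + arp


frame = build_garp("00:1b:17:00:1a:08", "10.0.0.1")  # hypothetical values
print(len(frame))  # 42
```

Because the sender IP and target IP match, every neighbour that already has an ARP entry for that IP overwrites the MAC with the sender's, which is the cache flip described above.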

② HA1, HA2, HA3 — three links, three jobs

Palo Alto HA uses up to four links to keep firewalls in sync. Most A/P deployments use just HA1 + HA2, and A/A adds HA3. An optional HA4 link exists on larger platforms such as PA-7000/5400, where it carries session sync between HA cluster members. Memorise which link carries what: every PCNSE cycle has a "which link carries X" question.

HA1 (Control): Layer-3 link, needs IPs. Carries heartbeats (ICMP-like), hellos, state sync, config sync, and the management-plane mirror. Uses TCP 28769 and 28260 in clear text; with encryption enabled, TCP 28 only. Cut HA1 → split-brain risk.
HA2 (Data): Layer-2 link, ether-type 0x7261. Carries the live session table, NAT mappings, decryption state. Traffic is unidirectional active→passive (except HA2 keep-alive). Default keep-alive action: log-only.
HA3 (Packet Forward): Active/Active only. Layer-2 link using MAC-in-MAC encapsulation. Used to forward a packet to its session owner when the non-owner firewall receives it (asymmetric ingress). No L3 addressing, no encryption.
HA1 Backup: Optional but strongly recommended. Backup heartbeat path. On PA-220 / VM-series without dedicated HA ports, use MGT as HA1 and a dataplane port as HA1-backup. Prevents split-brain when the primary HA1 link dies.

The "Heartbeat Backup" toggle is non-negotiable in prod

HA1 carries the heartbeat. If only one HA1 link exists and it dies, both firewalls believe the other is dead → both go active → split-brain → duplicate IPs on the LAN → outage worse than no HA at all. Fix: tick Heartbeat Backup in Device → High Availability → General. It piggy-backs heartbeats over the management interface as a secondary path. Free insurance.
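
The reasoning can be reduced to one rule: a peer should be declared dead only when every configured heartbeat path is silent. A minimal sketch of that rule follows. The function and parameter names are hypothetical, and it is a simplified decision model, not PAN-OS source logic.

```python
def peer_declared_dead(ha1_up: bool, backup_up: bool, backup_enabled: bool) -> bool:
    """A peer is declared dead only when every configured heartbeat path is silent."""
    paths = [ha1_up]
    if backup_enabled:
        paths.append(backup_up)
    return not any(paths)


# HA1 cable pulled, peer still healthy and reachable over MGT.
print(peer_declared_dead(ha1_up=False, backup_up=True, backup_enabled=False))  # True  -> split-brain
print(peer_declared_dead(ha1_up=False, backup_up=True, backup_enabled=True))   # False -> roles unchanged
```

With the backup path disabled, a single cable pull is indistinguishable from a dead peer. With it enabled, both paths have to fail before either firewall promotes itself.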

Quick check · Q2 of 10

Rahul at TCS configures HA on a PA-3220 pair using a single dedicated HA1 cable between the firewalls. During a planned maintenance the HA1 cable is accidentally unplugged. Both firewalls immediately go active — duplicate-IP storm hits the LAN. What single change would have prevented this?

a) Disable preempt b) Move to Active/Active mode c) Configure shorter HA1 hello-interval d) Enable Heartbeat Backup so HA hellos also flow over the management interface — when the dedicated HA1 link dies, the backup path keeps each firewall aware that its peer is still alive

Correct: d. Heartbeat Backup is the antidote to split-brain caused by a single HA1 cable failure. It uses the management port as a redundant heartbeat path. Preempt has nothing to do with it; A/A doesn't fix HA1-loss split-brain; shorter hellos just speed up the wrong outcome. Always enable Heartbeat Backup in production.

③ Election & Failover — the rulebook

When two firewalls boot, they elect roles. When something breaks, they re-elect. The election logic is deterministic — memorise the precedence and you'll never lose a PCNSE election question.

The three election factors, in order

① Device Priority: Lower number wins. Default is 100. Configure the preferred-active firewall with priority 90, the other with 100. Always set this explicitly and never rely on defaults in prod.
② Tiebreaker (MAC): If priorities are equal, the firewall with the lower MAC address on the HA1 control link wins. Convenient default, terrible for predictability. Use explicit priorities.
③ Preempt: If enabled on both firewalls, the higher-priority node reclaims active role after recovery (Preempt Hold Time = 1 min default). Off by default, and most ops teams leave it off to avoid double failovers.
Hold-Down Timers: Monitor Fail Hold (path/link), Promotion Hold, Preempt Hold. Defaults are conservative, so tune carefully. Aggressive timers cause flap; slow timers cause user-visible outage.

Failover triggers — what actually causes a swap

Four things cause an HA swap, in rough order of frequency in production:

Heartbeat loss — 3 consecutive missed HA1 hellos at the default 1000 ms interval. Tighter intervals (200 ms × 3 misses = 600 ms) are possible on PA-5400 dedicated HA hardware.
Link Monitoring — one or more monitored dataplane interfaces go down. Condition can be "any" (any single link failure triggers) or "all" (every monitored link must drop). Default is "any".
Path Monitoring — the active firewall sends ICMP pings to one or more destination IPs (path group). 3 consecutive failures mark the path down. Use when you need to fail over because a downstream path is broken even though the local interface is up.
Manual — request high-availability state suspend on the active firewall. Clean controlled swap for upgrades / maintenance.
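
The heartbeat figures above are plain multiplication. A minimal sketch of the arithmetic, assuming detection time is simply interval × consecutive misses and ignoring hold timers; the function name is hypothetical.

```python
def detection_window_ms(interval_ms: int, missed_threshold: int) -> int:
    """Worst-case time to declare the peer lost: interval x consecutive misses."""
    return interval_ms * missed_threshold


for interval, misses in [(1000, 3), (200, 3), (200, 2)]:
    print(f"{interval} ms x {misses} = {detection_window_ms(interval, misses)} ms")
```

The three rows cover the default (3000 ms), the tighter 200 ms interval (600 ms), and the 200 ms / 2-miss proposal from the VoIP question (400 ms).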

The most common HA flap in production

Path Monitoring configured against a single ISP gateway IP. ISP filters or rate-limits ICMP. Firewall sees 3 missed pings, flips to passive, peer takes over, peer's pings also fail because the rate-limit affects both, peer flips back. HA flaps every few minutes. Fix: monitor 2–3 destination IPs (e.g. 8.8.8.8 + 1.1.1.1 + your upstream router) with condition = "all", so single-IP filtering can't trigger a flap. Also widen the failure threshold to 5 consecutive misses on busy links.
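
The difference between "any" and "all" is easiest to see as a truth test over the monitored targets. This is a minimal sketch of that evaluation, not PAN-OS code. `path_group_failed` is a hypothetical name, and 198.51.100.1 is a placeholder for the ISP gateway.

```python
def path_group_failed(misses: dict[str, int], threshold: int, condition: str) -> bool:
    """Decide whether a path group is down.

    misses maps each monitored IP to its count of consecutive missed pings.
    condition is "any" (one dead target is enough) or "all" (every target must be dead).
    """
    dead = [count >= threshold for count in misses.values()]
    if condition == "any":
        return any(dead)
    if condition == "all":
        return all(dead)
    raise ValueError(f"unknown condition: {condition}")


# The ISP gateway rate-limits ICMP; the other targets still answer.
misses = {"198.51.100.1": 3, "8.8.8.8": 0, "1.1.1.1": 0}
print(path_group_failed({"198.51.100.1": 3}, threshold=3, condition="any"))  # True
print(path_group_failed(misses, threshold=3, condition="all"))               # False
```

With a single target, one rate-limited gateway is enough to trigger a failover. With three targets and "all", the same three missed pings change nothing.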

CLI — confirm current HA state in one shot

show high-availability state
show high-availability all

Expected output (snippet)

Enabled: yes
Group ID: 1
Local Info:
    State:           active
    State Duration:  4 days 22 hours
    Priority:        90
    Preemptive:      no
    Mode:            Active-Passive
Peer Info:
    Connection HA1: up
    Connection HA2: up
    State:           passive
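
If you capture this output in a script, a few lines of parsing turn it into something you can assert on. A minimal sketch, assuming the output keeps the "Key: value" shape of the snippet above; real output has more fields and may differ between PAN-OS versions. `parse_ha_state` is a hypothetical helper.

```python
def parse_ha_state(text: str) -> dict[str, dict[str, str]]:
    """Parse 'Key: value' lines into sections such as 'Local Info' and 'Peer Info'."""
    sections: dict[str, dict[str, str]] = {"Global": {}}
    current = "Global"
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        if not value.strip():          # a bare "Local Info:" line opens a new section
            current = key
            sections[current] = {}
        else:
            sections[current][key.strip()] = value.strip()
    return sections


sample = """\
Enabled: yes
Group ID: 1
Local Info:
    State:           active
    Priority:        90
Peer Info:
    Connection HA1: up
    Connection HA2: up
    State:           passive
"""

ha = parse_ha_state(sample)
print(ha["Local Info"]["State"], ha["Peer Info"]["State"])  # active passive
```

A health check can then flag anything other than one active and one passive node with both HA connections up.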

Quick check · Q3 of 10

Priya at HCL deploys a new A/P pair. FW-A has device priority 100, FW-B has device priority 100, preempt is disabled on both. FW-A's HA1 MAC ends in :1A:22; FW-B's ends in :1A:08. They power on at the same time. Which firewall becomes active and why?

a) FW-A — first one to send the heartbeat wins b) Neither — they'll split-brain because preempt is off c) FW-B — equal priorities means the tiebreaker is the lower MAC on the HA1 link; FW-B's :08 is lower than FW-A's :22 d) FW-A — alphabetical hostname tiebreaker

Correct: c. With equal device priorities, PAN-OS uses the lower MAC on the HA1 control link as the tiebreaker. FW-B's :08 < FW-A's :22 → FW-B becomes active. This is exactly why you should always assign explicit non-equal priorities in production — MAC-based selection is correct but not memorable, and a hardware swap can flip which firewall becomes active. Preempt being off prevents reclaim, not initial election.
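
The election rule fits in a sort key: priority first, HA1 MAC second. A minimal sketch of the two factors described in this section. The full MAC addresses are hypothetical (only the last two octets come from the question), and `elect_active` is an illustrative name.

```python
def elect_active(firewalls: list[dict]) -> str:
    """Pick the active firewall: lowest priority number, then lowest HA1 MAC."""
    def sort_key(fw: dict) -> tuple[int, int]:
        mac_as_int = int(fw["ha1_mac"].replace(":", ""), 16)
        return (fw["priority"], mac_as_int)

    return min(firewalls, key=sort_key)["name"]


# Full MACs are hypothetical; only the last two octets come from the question.
pair = [
    {"name": "FW-A", "priority": 100, "ha1_mac": "00:1b:17:00:1a:22"},
    {"name": "FW-B", "priority": 100, "ha1_mac": "00:1b:17:00:1a:08"},
]
print(elect_active(pair))  # FW-B

pair[0]["priority"] = 90   # explicit priority beats the MAC tiebreaker
print(elect_active(pair))  # FW-A
```

The second call shows why explicit priorities are the safer habit: the outcome no longer depends on which hardware happens to carry the lower MAC.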

④ Active/Active — both firewalls work, with rules

A/A keeps both firewalls forwarding traffic at the same time. Asymmetric ingress is the rule, not the exception, so PAN-OS uses HA3 (packet-forwarding link) to ship a packet to whichever firewall owns the session it belongs to.

Two A/A roles to learn: Session Owner (the firewall that processes the full security stack for this flow) and Session Setup (the firewall that does the initial route + NAT + policy match and creates the session entry). The Session Owner is usually set to the firewall that receives the first packet ("First Packet" — recommended) so HA3 forwarding stays minimal.

▶ A/A session-owner + HA3 forwarding

An asymmetric return packet hits FW-B first. The session is owned by FW-A. Watch HA3 forward it.

① OUT Karthik at Flipkart office sends SYN via FW-A · session created on FW-A, owner = FW-A
Session Owner = "First Packet" (default), Session Setup = "Primary Device".

② SESSION SYNC FW-A pushes the new session entry to FW-B over HA2
Both firewalls now know about this session; only FW-A actually scans it.

③ ASYMMETRIC RETURN Return SYN-ACK from server gets routed back via FW-B (upstream router used FW-B as nexthop)

④ LOOK-UP FW-B does session lookup, finds the session, sees owner = FW-A

⑤ HA3 FORWARD FW-B wraps the packet in MAC-in-MAC and ships it over HA3 to FW-A for App-ID / Threat-Prevention scanning

⑥ FW-A PROCESSES FW-A inspects the return packet, then forwards it back to FW-B (which sends it out the original egress interface to Karthik)

A/A keeps stateful inspection intact even with asymmetric routing. The cost: extra HA3 hops on every asymmetric flow.

Floating IPs — the A/A LAN gateway pattern

Hosts on a LAN can use only one default-gateway IP. In A/A you give clients a Floating IP per VLAN — owned by one firewall at a time, but it migrates to the surviving firewall if its owner dies. Two patterns dominate:

Single Floating IP per subnet — simple HSRP-like gateway. One firewall owns the IP and responds to ARP for it. If that firewall dies, the other claims the IP (G-ARP) and traffic continues. Only one firewall ever forwards client→internet traffic for that subnet, so you don't double your throughput — but you do get HA.
Two Floating IPs per subnet — split clients via DHCP scopes so half use Floating-IP-1 (owned by FW-A) and half use Floating-IP-2 (owned by FW-B). Now both firewalls actively forward, doubling effective throughput. On failure of one firewall, the surviving peer claims both IPs.
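
Both patterns follow the same ownership rule on failure: the survivor claims whatever the failed firewall owned. A minimal sketch of that bookkeeping, with illustrative addresses and a hypothetical function name; it models ownership only, not the G-ARP that announces it.

```python
def fail_over_floating_ips(owners: dict[str, str], failed: str, survivor: str) -> dict[str, str]:
    """Return a new ownership map in which the survivor claims the failed firewall's IPs."""
    return {ip: (survivor if owner == failed else owner) for ip, owner in owners.items()}


# Two floating IPs on one subnet (addresses are illustrative).
owners = {"10.1.1.1": "FW-A", "10.1.1.2": "FW-B"}
print(fail_over_floating_ips(owners, failed="FW-A", survivor="FW-B"))
# {'10.1.1.1': 'FW-B', '10.1.1.2': 'FW-B'}
```

After the failover both gateways answer from FW-B, so the doubled throughput of the two-IP pattern drops back to a single firewall's capacity until FW-A returns.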

The A/A floating-IP gotcha that costs hours

Traffic destined for a Floating IP via the non-owner firewall is not designed to traverse HA3 to the owner — it relies on neighbours having the right ARP. With asymmetric routing or BGP upstream, packets can land on the non-owner, where the destination IP doesn't belong to any of its interfaces, and get black-holed. Workaround: pin the upstream route so traffic to the Floating IP always lands on the owner, or use ARP Load-Sharing (PA-7000 / large platforms) to give both firewalls a piece of the floating IP. For most deployments: stick with single-Floating per subnet unless you genuinely need 2x throughput.

Quick check · Q4 of 10

Aditya at Wipro deploys A/A with Session Owner = "First Packet" and Session Setup = "Primary Device". Traffic patterns are highly asymmetric. After a week he notices that the HA3 link utilisation is at 60%. What's the most accurate explanation?

a) HA3 forwards every packet whose ingress firewall is not the session owner; with heavily asymmetric flows, large fractions of traffic ride HA3 — that's expected and not a bug, but plan HA3 bandwidth ≥ peak asymmetric volume b) HA3 carries HA1 heartbeats — upgrade PAN-OS to reduce c) HA3 carries session sync — switch to HA4 to offload d) HA3 is broken — open a TAC case

Correct: a. HA3 is the packet-forwarding link used precisely when a non-owner firewall receives a packet for an existing session. Asymmetric routing → lots of HA3 forwarding. The fix is design-time: size HA3 bandwidth to your peak asymmetric volume, prefer Session Owner = "First Packet" (so the ingress firewall usually is the owner), and use symmetric routing upstream where you can. HA3 doesn't carry session sync (that's HA2) or heartbeats (that's HA1).
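
To estimate how much traffic will ride HA3, count the packets whose ingress firewall is not the session owner. A minimal sketch with made-up sample data; `ha3_share` is a hypothetical helper, and it counts packets rather than bytes.

```python
def ha3_share(packets: list[tuple[str, str]], owners: dict[str, str]) -> float:
    """Fraction of packets that must cross HA3.

    packets is a list of (session_id, ingress_firewall) pairs.
    owners maps each session_id to its session-owner firewall.
    """
    if not packets:
        return 0.0
    forwarded = sum(1 for session, ingress in packets if owners[session] != ingress)
    return forwarded / len(packets)


owners = {"s1": "FW-A", "s2": "FW-B"}
packets = [("s1", "FW-A"), ("s1", "FW-B"), ("s1", "FW-B"), ("s2", "FW-B"), ("s2", "FW-A")]
print(ha3_share(packets, owners))  # 0.6
```

Three of the five sample packets arrive on the non-owner, so 60% of them cross HA3. For real sizing, feed the same calculation with byte counts at peak load.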

⑤ Three commands you'll actually run during an incident

You're on a 2 AM call and the HA pair just flapped twice. Don't reach for the GUI. These three commands settle 80% of HA debates:

① State + peer status

show high-availability all

② What triggered the last failover

show high-availability transitions
debug log-receiver show | match ha-monitor

③ Live HA link health

show high-availability link-monitoring
show high-availability path-monitoring

Smoke-test a real failover safely

Before you trust HA in production, force a controlled swap during a maintenance window: request high-availability state suspend on the active firewall. Confirm the passive takes over cleanly, sessions persist, application probes succeed. Then request high-availability state functional on the suspended node. Repeat from the other side. Schedule this quarterly — HA you've never tested is HA you don't actually have.

What's next?

Up next: PBF & Multi-VR — how Policy-Based Forwarding overrides the FIB, when to use Symmetric Return, and how multi-VR + next-VR routes power the SD-WAN-style designs every dual-ISP enterprise eventually needs.
