① Active/Passive — the default everyone starts with
Two firewalls, identical hardware and PAN-OS. One sits active and forwards every byte of production traffic. The other sits passive — fully configured, fully booted, mirroring state in real time, but its dataplane interfaces are down. The world only ever sees the active firewall's MAC addresses on the LAN.
When the active firewall stops looking healthy (missed heartbeats, link down, monitored path unreachable), the passive node brings up its interfaces, sends a flurry of gratuitous ARPs to flip every neighbour's ARP cache to its own MAC, and inherits every session that was being mirrored over HA2. Done in roughly a second on dedicated HA hardware. End users see a short pause and the connection survives — that's the whole point.
Active firewall: Dataplane interfaces are up. Owns every layer-2 MAC on the LAN. All production sessions live here. Sends heartbeats to the passive peer over HA1.
Passive firewall: Dataplane interfaces are down. Fully booted, full config synced, session table mirrored over HA2. Watches for HA1 heartbeats and stays silent until needed.
The active firewall pushes every new session, NAT mapping, and decryption state to the passive peer over HA2 — Layer-2 link, ether-type 0x7261, unidirectional.
On failover the new-active firewall sends gratuitous ARPs for every interface IP it owns. Neighbours immediately flip their ARP cache to the new MAC — that's how traffic moves in <1 second.
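To make the gratuitous ARP step concrete, here is a minimal sketch of what such a frame looks like on the wire. It is a generic Ethernet II + ARP layout, not PAN-OS code; the MAC and IP values are hypothetical placeholders, and `build_gratuitous_arp` is a name invented for this illustration.

```python
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing `ip` at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = bytes(int(octet) for octet in ip.split("."))
    broadcast = b"\xff" * 6

    # Ethernet II header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to sender IP.
    # Sender IP == target IP is what makes the ARP "gratuitous".
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return ethernet + arp


# Hypothetical interface MAC and IP, for illustration only
frame = build_gratuitous_arp("00:1b:17:00:01:10", "10.0.0.1")
print(len(frame), frame.hex())
```

Every neighbour that receives this broadcast refreshes what it knows about 10.0.0.1, which is why one frame per interface IP is enough to redirect traffic.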
Watch a real A/P failover
[Interactive animation: Active/Passive failover. Stage sequence: ethernet1/2 on the active firewall goes down (cable pulled or upstream switch failure); the active firewall goes non-functional; FW-B promotes itself to active and brings its dataplane up.]
Sneha at Infosys watches a planned failover and sees traffic resume on the new-active firewall in about 1.2 seconds. Which mechanism is mostly responsible for upstream switches and clients sending traffic to the NEW active firewall almost immediately?
② HA1, HA2, HA3 — three links, three jobs
Palo Alto HA uses up to four physical (or sub-interface) links to keep firewalls in sync. Most A/P deployments use just HA1 + HA2; A/A adds HA3. HA4 is a separate case: it carries session synchronisation between members of an HA cluster on platforms that support clustering, and is not part of a plain A/P or A/A pair. Memorise the table: every PCNSE cycle has a "which link carries X" question.
HA1 (control link): Layer-3 link, needs IPs. Carries heartbeats (ICMP-like), Hello, state-sync, config-sync, mgmt-plane mirror. TCP 28769/28260 for clear-text communication; TCP 28 (SSH) if encryption is enabled. Cut HA1 → split-brain risk.
HA2 (data link): Layer-2 link, ether-type 0x7261. Carries the live session table, NAT mappings, decryption state. Traffic is unidirectional active→passive (except HA2 keep-alive). Default keep-alive action: log-only.
HA3 (packet-forwarding link): Active/Active only. Layer-2 link using MAC-in-MAC encapsulation. Used to forward a packet to its session owner when the non-owner firewall receives it (asymmetric ingress). No L3 addressing, no encryption.
HA1 Backup: Optional but strongly recommended. Backup heartbeat path. On PA-220 / VM-series without dedicated HA ports, use MGT as HA1 and dataplane as HA1-backup. Prevents split-brain when the primary HA1 link dies.
HA1 carries the heartbeat. If only one HA1 link exists and it dies, both firewalls believe the other is dead → both go active → split-brain → duplicate IPs on the LAN → outage worse than no HA at all. Fix: tick Heartbeat Backup in Device → High Availability → General. It piggy-backs heartbeats over the management interface as a secondary path. Free insurance.
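The split-brain reasoning above reduces to one rule: a peer should be declared dead only when every heartbeat path is silent. The sketch below models that rule in plain Python. It is a toy model, not PAN-OS logic; the path names and the `peer_considered_dead` helper are assumptions made for illustration.

```python
def peer_considered_dead(path_alive: dict[str, bool]) -> bool:
    """Declare the peer dead only if no heartbeat path is still delivering."""
    return not any(path_alive.values())


# Single HA1 link, cable unplugged: the only path is silent
single_path = {"ha1": False}

# Same failure, but Heartbeat Backup over the management interface still works
with_backup = {"ha1": False, "mgmt-heartbeat-backup": True}

print(peer_considered_dead(single_path))   # both nodes would go active
print(peer_considered_dead(with_backup))   # passive node stays passive
```

With a single path, a cable fault is indistinguishable from a dead peer. Adding a second, independent path is what lets each firewall tell the two cases apart.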
Rahul at TCS configures HA on a PA-3220 pair using a single dedicated HA1 cable between the firewalls. During a planned maintenance the HA1 cable is accidentally unplugged. Both firewalls immediately go active — duplicate-IP storm hits the LAN. What single change would have prevented this?
③ Election & Failover — the rulebook
When two firewalls boot, they elect roles. When something breaks, they re-elect. The election logic is deterministic — memorise the precedence and you'll never lose a PCNSE election question.
The three election factors, in order
1. Device Priority: Lower number wins. Default is 100. Configure the preferred-active firewall with priority 90, the other with 100. Always set this explicitly and never rely on defaults in prod.
2. Lowest MAC address (tiebreak): If priorities are equal, the firewall with the lower MAC address on the HA1 control link wins. Convenient default, terrible for predictability. Use explicit priorities.
3. Preemption: If enabled on both firewalls, the node with the better (numerically lower) priority reclaims the active role after recovery (Preempt Hold Time = 1 min default). Off by default, and most ops teams leave it off to avoid double failovers.
Hold timers: Monitor Fail Hold (path/link), Promotion Hold, Preempt Hold. Defaults are conservative, so tune carefully. Aggressive timers cause flap; slow timers cause user-visible outage.
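The precedence described above (priority first, HA1 MAC as tiebreak) can be sketched as a two-part sort key. This is a simplified model under stated assumptions: it covers only the initial election, ignores preemption and hold timers, and uses hypothetical names (`Peer`, `elect_active`) and made-up MAC addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    priority: int   # device priority, lower number wins
    ha1_mac: str    # MAC address of the HA1 control-link interface


def mac_key(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to raw bytes so MACs compare numerically."""
    return bytes.fromhex(mac.replace(":", ""))


def elect_active(a: Peer, b: Peer) -> Peer:
    """Pick the active peer: lowest priority value, then lowest HA1 MAC."""
    return min((a, b), key=lambda p: (p.priority, mac_key(p.ha1_mac)))


# Explicit priorities: the outcome does not depend on hardware
fw_a = Peer("FW-A", 90, "00:1b:17:aa:1a:22")
fw_b = Peer("FW-B", 100, "00:1b:17:aa:1a:08")
print(elect_active(fw_a, fw_b).name)

# Equal priorities: the MAC decides, so a hardware swap can change the result
tie_a = Peer("FW-A", 100, "00:1b:17:aa:1a:22")
tie_b = Peer("FW-B", 100, "00:1b:17:aa:1a:08")
print(elect_active(tie_a, tie_b).name)
```

Sorting on a tuple keeps the precedence explicit: the MAC is consulted only when the priorities are identical.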
Failover triggers — what actually causes a swap
Four things cause an HA swap, in rough order of frequency in production:
- Heartbeat loss — 3 consecutive missed HA1 heartbeats at the default 1000 ms heartbeat interval. Tighter intervals (200 ms / 3 miss = 600 ms) on PA-5400 dedicated HA hardware.
- Link Monitoring — one or more monitored dataplane interfaces go down. Condition can be "any" (any single link failure triggers) or "all" (every monitored link must drop). Default is "any".
- Path Monitoring — the active firewall sends ICMP pings to one or more destination IPs (path group). 3 consecutive failures mark the path down. Use when you need to fail over because a downstream path is broken even though the local interface is up.
- Manual — request high-availability state suspend on the active firewall. Clean controlled swap for upgrades / maintenance.
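Two of the triggers above are simple enough to express as arithmetic and boolean logic. The following sketch, with hypothetical function names and interface names, computes heartbeat detection time from interval and miss count, and evaluates a link-monitoring group under the "any" and "all" conditions. It models the lesson's description, not the firewall's internal implementation.

```python
def heartbeat_detection_ms(interval_ms: int, missed: int = 3) -> int:
    """Time to declare heartbeat loss: interval multiplied by allowed misses."""
    return interval_ms * missed


def link_group_failed(link_up: dict[str, bool], condition: str = "any") -> bool:
    """Evaluate a link-monitoring group.

    "any": one monitored link down is enough to trigger.
    "all": every monitored link must be down to trigger.
    """
    down = [not up for up in link_up.values()]
    if condition == "any":
        return any(down)
    if condition == "all":
        return bool(down) and all(down)
    raise ValueError(f"unknown condition: {condition!r}")


print(heartbeat_detection_ms(1000))  # default interval
print(heartbeat_detection_ms(200))   # tightened interval

links = {"ethernet1/1": True, "ethernet1/2": False}
print(link_group_failed(links, "any"))
print(link_group_failed(links, "all"))
```

The same failure (one link down out of two) triggers failover under "any" and is ignored under "all", which is why the choice of condition matters as much as the timers.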
Path Monitoring configured against a single ISP gateway IP. ISP filters or rate-limits ICMP. Firewall sees 3 missed pings, flips to passive, peer takes over, peer's pings also fail because the rate-limit affects both, peer flips back. HA flaps every few minutes. Fix: monitor 2–3 destination IPs (e.g. 8.8.8.8 + 1.1.1.1 + your upstream router) with condition = "all", so single-IP filtering can't trigger a flap. Also widen the failure threshold to 5 consecutive misses on busy links.
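The flap scenario and its fix can be reproduced with a small simulation. This is a sketch under assumptions: ping results are supplied as lists of booleans, the helper names (`path_down`, `path_group_failed`) are invented here, and the upstream router address is a placeholder. It only illustrates why multiple destinations with condition "all" and a wider threshold are more tolerant of one filtered target.

```python
def path_down(replies: list[bool], threshold: int = 3) -> bool:
    """True if the series ever contains `threshold` consecutive missed pings."""
    streak = 0
    for got_reply in replies:
        streak = 0 if got_reply else streak + 1
        if streak >= threshold:
            return True
    return False


def path_group_failed(per_destination: dict[str, list[bool]],
                      condition: str = "all", threshold: int = 3) -> bool:
    """Evaluate a path group: "any" destination down, or "all" of them down."""
    states = [path_down(r, threshold) for r in per_destination.values()]
    if condition == "any":
        return any(states)
    return bool(states) and all(states)


# One destination is rate-limited by the ISP; the others answer normally
results = {
    "8.8.8.8": [True, False, False, False, True],
    "1.1.1.1": [True, True, True, True, True],
    "192.0.2.1": [True, True, True, True, True],  # placeholder upstream router
}

print(path_group_failed(results, condition="any"))  # spurious failover
print(path_group_failed(results, condition="all"))  # no failover
print(path_down(results["8.8.8.8"], threshold=5))   # wider threshold absorbs it
```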
show high-availability state
show high-availability all
Enabled: yes
Group ID: 1
Local Info:
State: active
State Duration: 4 days 22 hours
Priority: 90
Preemptive: no
Mode: Active-Passive
Peer Info:
Connection HA1: up
Connection HA2: up
State: passive
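During an incident it can help to turn that output into something scriptable. The sketch below parses the abbreviated sample shown above into nested dictionaries. It assumes the simple "key: value" layout of this lesson's sample; real CLI output contains more fields and may be laid out differently, and `parse_ha_state` is a hypothetical helper, not a Palo Alto tool.

```python
SAMPLE = """\
Enabled: yes
Group ID: 1
Local Info:
State: active
State Duration: 4 days 22 hours
Priority: 90
Preemptive: no
Mode: Active-Passive
Peer Info:
Connection HA1: up
Connection HA2: up
State: passive
"""


def parse_ha_state(text: str) -> dict[str, dict[str, str]]:
    """Group 'key: value' lines under the most recent 'Section:' header."""
    result: dict[str, dict[str, str]] = {}
    section = "Group"  # bucket for lines that appear before any header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:              # a line like "Local Info:" opens a section
            section = key
            result.setdefault(section, {})
            continue
        result.setdefault(section, {})[key] = value
    return result


state = parse_ha_state(SAMPLE)
healthy = (
    state["Local Info"]["State"] == "active"
    and state["Peer Info"]["State"] == "passive"
    and state["Peer Info"]["Connection HA1"] == "up"
    and state["Peer Info"]["Connection HA2"] == "up"
)
print(state["Local Info"]["Priority"], healthy)
```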
Priya at HCL deploys a new A/P pair. FW-A has device priority 100, FW-B has device priority 100, preempt is disabled on both. FW-A's HA1 MAC ends in :1A:22; FW-B's ends in :1A:08. They power on at the same time. Which firewall becomes active and why?
FW-B's :08 < FW-A's :22 → FW-B becomes active. This is exactly why you should always assign explicit non-equal priorities in production. MAC-based selection is correct but not memorable, and a hardware swap can flip which firewall becomes active. Preempt being off prevents reclaim, not initial election.
④ Active/Active — both firewalls work, with rules
A/A keeps both firewalls forwarding traffic at the same time. Asymmetric ingress is the rule, not the exception, so PAN-OS uses HA3 (packet-forwarding link) to ship a packet to whichever firewall owns the session it belongs to.
Two A/A roles to learn: Session Owner (the firewall that processes the full security stack for this flow) and Session Setup (the firewall that does the initial route + NAT + policy match and creates the session entry). The Session Owner is usually set to the firewall that receives the first packet ("First Packet" — recommended) so HA3 forwarding stays minimal.
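The forwarding decision can be sketched as a lookup: the first firewall to see a flow becomes its owner, and any later packet that lands on the other firewall is handed across HA3. This is a deliberately simplified model. It implements only the "First Packet" owner policy, ignores the separate Session Setup role, and uses hypothetical names and addresses.

```python
class ActiveActivePair:
    """Toy model of session ownership in an A/A pair (First Packet policy)."""

    def __init__(self) -> None:
        self.session_owner: dict[frozenset, str] = {}

    def handle(self, src: str, dst: str, receiver: str) -> str:
        # Use an unordered pair so both directions map to the same session
        flow = frozenset((src, dst))
        # First packet of a flow: the receiving firewall becomes the owner
        owner = self.session_owner.setdefault(flow, receiver)
        if owner == receiver:
            return f"{receiver} processes locally"
        return f"{receiver} forwards over HA3 to {owner}"


pair = ActiveActivePair()
client, server = "10.1.1.50:40000", "203.0.113.10:443"

print(pair.handle(client, server, receiver="FW-A"))  # SYN arrives on FW-A
print(pair.handle(server, client, receiver="FW-B"))  # SYN-ACK returns via FW-B
```

The more often return traffic lands on the non-owner, the more packets cross HA3, so HA3 utilisation is a direct measure of how asymmetric the routing is.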
[Interactive animation: A/A session owner and HA3 forwarding. Stage 1: the SYN arrives via FW-A; the session is created on FW-A, owner = FW-A. Stage 2: the SYN-ACK from the server is routed back via FW-B (the upstream router used FW-B as next hop). Stage 3: FW-B does not own the session, so it forwards the packet over HA3 to FW-A.]
Floating IPs — the A/A LAN gateway pattern
Hosts on a LAN can use only one default-gateway IP. In A/A you give clients a Floating IP per VLAN — owned by one firewall at a time, but it migrates to the surviving firewall if its owner dies. Two patterns dominate:
- Single Floating IP per subnet — simple HSRP-like gateway. One firewall owns the IP and responds to ARP for it. If that firewall dies, the other claims the IP (G-ARP) and traffic continues. Only one firewall ever forwards client→internet traffic for that subnet, so you don't double your throughput — but you do get HA.
- Two Floating IPs per subnet — split clients via DHCP scopes so half use Floating-IP-1 (owned by FW-A) and half use Floating-IP-2 (owned by FW-B). Now both firewalls actively forward, doubling effective throughput. On failure of one firewall, the surviving peer claims both IPs.
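Both patterns follow the same ownership rule on failure: every Floating IP held by the failed firewall moves to the survivor. A minimal sketch of that rule, using hypothetical gateway addresses and an invented helper name:

```python
def reassign_floating_ips(owners: dict[str, str], failed: str,
                          survivor: str) -> dict[str, str]:
    """Return a new ownership map with the failed peer's IPs moved to the survivor."""
    return {
        ip: (survivor if owner == failed else owner)
        for ip, owner in owners.items()
    }


# Two Floating IPs per subnet: clients are split across both gateways
owners = {"10.1.1.1": "FW-A", "10.1.1.2": "FW-B"}

after = reassign_floating_ips(owners, failed="FW-A", survivor="FW-B")
print(after)
```

After the failure the survivor answers for both gateways, so it must be sized to carry the combined load of both client halves on its own.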
Traffic destined for a Floating IP via the non-owner firewall is not designed to traverse HA3 to the owner — it relies on neighbours having the right ARP. With asymmetric routing or BGP upstream, packets can land on the non-owner, where the destination IP doesn't belong to any of its interfaces, and get black-holed. Workaround: pin the upstream route so traffic to the Floating IP always lands on the owner, or use ARP Load-Sharing (PA-7000 / large platforms) to give both firewalls a piece of the floating IP. For most deployments: stick with single-Floating per subnet unless you genuinely need 2x throughput.
Aditya at Wipro deploys A/A with Session Owner = "First Packet" and Session Setup = "Primary Device". Traffic patterns are highly asymmetric. After a week he notices that the HA3 link utilisation is at 60%. What's the most accurate explanation?
⑤ Three commands you'll actually run during an incident
You're on a 2 AM call and the HA pair just flapped twice. Don't reach for the GUI. These three commands settle 80% of HA debates:
show high-availability all
show high-availability transitions
debug log-receiver show | match ha-monitor
show high-availability link-monitoring
show high-availability path-monitoring
Before you trust HA in production, force a controlled swap during a maintenance window: request high-availability state suspend on the active firewall. Confirm the passive takes over cleanly, sessions persist, application probes succeed. Then request high-availability state functional on the suspended node. Repeat from the other side. Schedule this quarterly — HA you've never tested is HA you don't actually have.
What's next?
Up next: PBF & Multi-VR — how Policy-Based Forwarding overrides the FIB, when to use Symmetric Return, and how multi-VR + next-VR routes power the SD-WAN-style designs every dual-ISP enterprise eventually needs.