Juniper SRX Chassis Cluster HA — node0/node1, Redundancy Groups & Failover

Two SRX firewalls can act as one. Chassis clustering pairs node0 and node1 over a control link and fabric links, groups interfaces into redundancy groups, and presents unified reth interfaces to the network — so when one node fails, traffic flips in seconds without a topology change. This lesson maps every piece and shows exactly how a failover decision is made.

📅 2026-06-20 · ⏱ 17 min · 4 infographics · live failover demo · 🏷 10-Q assessment + AI Tutor inline

⚡ Quick Answer

Master Juniper SRX chassis cluster HA in 2026: node0/node1 roles, control and fabric links, redundancy groups RG0/RG1, reth interfaces, failover triggers, preempt, and dual control links explained with lab-ready examples.

Pick where you want to start

1

Cluster basics

node0/node1, control link, fabric link roles.

2

Redundancy groups

RG0 vs RG1+, priority, preempt, weights.

3

reth interfaces

How reth binds both nodes into one interface.

4

Failover & hardening

Triggers, timing, dual control links, verify.

🧠 Warm-up — 3 questions, no score

Just notice which ones make you pause. We answer all three inside the lesson.

1. In an SRX chassis cluster, what does the control link carry?

Answered in Cluster basics.

2. Which redundancy group controls the routing engine (RE) primacy?

Answered in Redundancy groups.

3. What is a reth interface?

Answered in reth interfaces.

Most engineers think…

Most people picture SRX HA as 'an active/standby pair where the backup box just sits there'. That mental model fails both in the interview chair and when you're troubleshooting at 2 am.

SRX chassis clustering is a distributed single logical device: node0 and node1 share one configuration, one session table and one set of policies. The control link carries heartbeats and configuration synchronisation. The fabric links synchronise session state and forward data-plane traffic from the standby node to the active node. Interfaces are bundled into redundancy groups and presented to the outside world as reth interfaces, so the network never sees a topology change during a failover. What separates a confident answer from a guess is understanding two things: the split between RG0 (routing engine control) and RG1+ (data-plane groups), and how a reth inherits its active node from its redundancy group.

① Cluster basics — node0, node1, control link and fabric links

An SRX chassis cluster joins exactly two SRX devices. One is assigned node0 and the other node1 via a bootstrap command (set chassis cluster cluster-id <id> node <0|1> reboot). After reboot both nodes load the same Junos configuration and behave as a single logical firewall to the rest of the network.
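Here is a minimal sketch of that bootstrap step. The hypothetical helper below builds the command for each device from one shared cluster ID. It only checks that the ID is a positive integer, because cluster ID 0 disables clustering and the upper bound depends on platform and release.

```python
def bootstrap_commands(cluster_id: int) -> dict[int, str]:
    """Build the operational-mode bootstrap command for node0 and node1.

    Both devices must use the same cluster ID; only the node number differs.
    Hypothetical helper: it only checks that cluster_id is a positive integer.
    """
    if not isinstance(cluster_id, int) or cluster_id < 1:
        raise ValueError("cluster_id must be a positive integer")
    return {
        node: f"set chassis cluster cluster-id {cluster_id} node {node} reboot"
        for node in (0, 1)
    }


# Print one command per device, prefixed with the node it belongs to.
for node, command in bootstrap_commands(1).items():
    print(f"node{node}> {command}")
```

Run each printed line on the console of the matching device. Both devices must use the same cluster ID.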

Two types of inter-node links connect the nodes. The control link is a dedicated port (historically fxp1 on branch models). It carries heartbeat pulses and configuration synchronisation between the two routing engines. If heartbeats stop arriving, a node cannot tell a dead peer from a dead cable, which is why control-link loss is handled carefully (see section ④). The fabric links are ge/xe interfaces bound as fab0 on node0 and fab1 on node1, and they carry the data plane. That means real-time session-state synchronisation, plus any packet that arrives on the node that is not primary for it. Rather than dropping such a packet, that node forwards it across the fabric to the active node.
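One way to keep the two roles straight is to write them down as data. The sketch below does that and also renders the statements that bind a physical port into fab0 and fab1. The helper name is hypothetical. The port names are placeholders: this lesson numbers node1's ports ge-5/0/x, but the offset depends on the platform.

```python
# Which inter-node link carries what, as described in this lesson.
LINK_ROLES = {
    "control": ("heartbeats", "configuration sync"),
    "fabric": ("session-state sync", "data-plane forwarding between nodes"),
}


def fabric_config(node0_port: str, node1_port: str) -> list[str]:
    """Render the Junos statements that bind one physical port per node
    into fab0 (node0) and fab1 (node1)."""
    return [
        f"set interfaces fab0 fabric-options member-interfaces {node0_port}",
        f"set interfaces fab1 fabric-options member-interfaces {node1_port}",
    ]


for link, jobs in LINK_ROLES.items():
    print(f"{link:8} -> {', '.join(jobs)}")
for line in fabric_config("ge-0/0/2", "ge-5/0/2"):  # placeholder ports
    print(line)
```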

Because both nodes share one configuration, you manage the pair from a single point. Running show chassis cluster status shows which node currently holds each redundancy group.
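If you collect that output in a script, a minimal parser can report the primary node for each RG. The sample text is abridged and illustrative; it was not captured from a device. The columns of show chassis cluster status differ between Junos releases, so treat the regular expressions as assumptions to adapt. The function name is hypothetical.

```python
import re

# Abridged, illustrative sample of "show chassis cluster status" output.
SAMPLE = """\
Cluster ID: 1
Node   Priority   Status      Preempt  Manual

Redundancy group: 0 , Failover count: 1
    node0   200   primary     no       no
    node1   100   secondary   no       no

Redundancy group: 1 , Failover count: 3
    node0   100   secondary   no       no
    node1   200   primary     no       no
"""

RG_RE = re.compile(r"Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^\s*(node[01])\s+(\d+)\s+(\S+)")


def primaries(status_text: str) -> dict[int, str]:
    """Map each redundancy group number to the node whose status is 'primary'."""
    result: dict[int, str] = {}
    current_rg = None
    for line in status_text.splitlines():
        if (m := RG_RE.search(line)):
            current_rg = int(m.group(1))
        elif current_rg is not None and (m := NODE_RE.match(line)):
            node, _priority, status = m.groups()
            if status == "primary":
                result[current_rg] = node
    return result


print(primaries(SAMPLE))
```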

Figure 1: Chassis cluster data path (normal operation)
A client packet arrives at the reth on node0 → node0, the active node, processes it → the reply leaves via the reth to the client. Alongside this path, the control link carries heartbeats and the fabric link provides the data forwarding path between the nodes.
Packets arrive at the active reth child on node0; the fabric link carries session state so node1 can take over quickly.
Name both links and their jobs

In an interview, always separate the control link (heartbeat + sync) from the fabric links (data-plane forwarding). Mixing them up is the most common mistake — and interviewers test it directly.

Quick check · Q1 of 10 · Understand

What does the fabric link carry in an SRX chassis cluster?

Correct: b. Fabric links carry data-plane forwarding traffic. When the standby node receives a packet it cannot process locally (the session is active on the other node), it sends it across the fabric link to the active node. The fabric also carries real-time session-state sync. The control link handles heartbeats and configuration sync.
👉 So far: Two nodes, two link types. The control link carries heartbeats and configuration sync. The fabric links carry data-plane forwarding and session sync. Both must be up for the cluster to work correctly.

② Redundancy groups — RG0 controls the RE, RG1+ control data

A redundancy group (RG) is the unit of failover. Every resource — interfaces, session tables — belongs to a redundancy group that is primary on exactly one node at a time. RG0 is special: it controls routing-engine (control-plane) primacy. The node that holds RG0 primary is the one whose routing protocols, management plane and jsrpd process are authoritative. You cannot enable preempt on RG0 — if you want to move RG0, you do a manual failover.

RG1 through RG127 are data-plane groups. Each one has a priority value configured per node (higher wins). When priorities tie, the lower node-id wins. Preempt, when enabled on a data-plane RG, lets the higher-priority node take back primary automatically after recovery — with an optional delay timer (introduced in Junos 17.4R1) to prevent flapping.
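The election rules in this paragraph can be sketched as a small model. This is a hypothetical illustration, not how jsrpd is implemented. It covers priority, the lower-node-ID tie-break and preempt. It assumes preempt requires a strictly higher priority, and it ignores timers, interface monitoring and manual failover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    node_id: int          # 0 or 1
    priority: int         # per-node priority for this RG; higher is preferred
    healthy: bool = True  # False models a failed node


def elect(nodes: list[NodeState], current_primary: Optional[int] = None,
          preempt: bool = False) -> Optional[int]:
    """Toy model: which node should be primary for one data-plane RG."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    # Higher priority wins; on a tie, the lower node ID wins.
    best = max(candidates, key=lambda n: (n.priority, -n.node_id))
    by_id = {n.node_id: n for n in candidates}
    if current_primary in by_id:
        incumbent = by_id[current_primary]
        # Without preempt a healthy primary keeps the RG; with preempt,
        # only a strictly higher priority takes it back (assumption).
        if preempt and best.priority > incumbent.priority:
            return best.node_id
        return current_primary
    return best.node_id


n0, n1 = NodeState(0, 200), NodeState(1, 100)
print(elect([n0, n1]))                                   # fresh election
print(elect([n0, n1], current_primary=1))                # no preempt: node1 stays
print(elect([n0, n1], current_primary=1, preempt=True))  # preempt: node0 reclaims
```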

Interface monitoring

Each RG can monitor physical interfaces, each with an assigned weight (0–255). Every RG starts with a threshold of 255. When a monitored interface goes down, its weight is subtracted from that threshold, and when the threshold reaches zero the RG fails over to the other node. Juniper advises not to apply interface monitoring to RG0, because interface flaps would then trigger control-plane switchovers, which are disruptive.
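The threshold arithmetic is easy to check by hand or in a few lines of code. This sketch assumes the documented starting threshold of 255 per RG. The interface names and weights are hypothetical. On a device, each weight would come from a statement such as set chassis cluster redundancy-group 1 interface-monitor ge-0/0/3 weight 128.

```python
RG_THRESHOLD = 255  # starting threshold of every redundancy group


def remaining_threshold(weights: dict[str, int], down: set[str]) -> int:
    """Subtract the weight of each failed monitored interface from 255."""
    for name, weight in weights.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {name} must be 0-255")
    lost = sum(weights[name] for name in down if name in weights)
    return max(RG_THRESHOLD - lost, 0)


def should_fail_over(weights: dict[str, int], down: set[str]) -> bool:
    """The RG fails over once its threshold has reached zero."""
    return remaining_threshold(weights, down) == 0


monitored = {"ge-0/0/1": 255, "ge-0/0/3": 128, "ge-0/0/4": 128}  # hypothetical
print(remaining_threshold(monitored, {"ge-0/0/3"}))
print(should_fail_over(monitored, {"ge-0/0/3", "ge-0/0/4"}))
print(should_fail_over(monitored, {"ge-0/0/1"}))
```

With these weights, losing ge-0/0/1 alone is enough to fail over. The two 128-weight ports only trigger a failover together, because 256 exceeds 255.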

Figure 2: Redundancy group hierarchy
RG0 (control plane): routing-engine primacy, no preempt allowed. RG1 (data-plane group): reth1/reth2, preempt optional. RG2+ (more data-plane groups): load-balance across nodes.
RG0 sits above data-plane groups; each RG is primary on exactly one node and owns a set of reth interfaces.
🔗
Control link

Carries heartbeat pulses and configuration sync between node0 and node1. Loss of heartbeats triggers failover arbitration.

🌐
Fabric link

Data-plane path between nodes. It carries session-state sync. Packets received on the standby node are forwarded across it to the active node for processing.

🔁
Redundancy group

The unit of failover — primary on one node at a time. RG0 owns the routing engine; RG1–127 own data-plane groups and reth interfaces.

🔌
reth interface

A logical pseudo-interface that spans both nodes, inheriting active/standby state from its RG. The network sees one stable IP and MAC regardless of which node is active.

Never enable interface monitoring on RG0

Interface monitoring on RG0 means an interface flap triggers a control-plane switchover — which is far more disruptive than a data-plane failover. Use interface monitoring only on RG1+ data-plane groups.

Quick check · Q2 of 10 · Remember

Which redundancy group controls routing-engine (control-plane) primacy in an SRX cluster?

Correct: c. RG0 is the dedicated control-plane redundancy group — it determines which node's routing engine and management plane are active. RG1 through RG127 are data-plane groups. Preempt cannot be enabled on RG0.
👉 So far: RG0 = control-plane primacy (no preempt). RG1–127 = data-plane groups with priority, optional preempt, and interface-weight monitoring.

③ reth interfaces — one logical interface across both nodes

A reth (redundant ethernet) interface is a pseudo-interface that binds at least one physical child interface from each node. You configure the reth at the logical level (IP, zone, family) once, and the cluster decides which node's physical child is the active forwarding path based on the redundancy group the reth belongs to.

When the redundancy group fails over, the reth's active child switches from node0's physical port to node1's physical port. The IP address, MAC address and zone membership stay identical. The new primary node sends gratuitous ARPs so the upstream switch relearns the reth MAC on its new port. Beyond that, nothing changes for the network: the router's ARP entry stays valid, there is no topology update and no route recalculation. That is the key interview line: reth makes the failover invisible to the network.

A reth interface inherits the active/standby state from its redundancy group. You bind a reth to an RG with set interfaces reth<X> redundant-ether-options redundancy-group <N>. The reth number and the RG number do not have to match. You also set redundant-ether-options minimum-links 1 to keep the reth up as long as one child is alive. Several reth interfaces can belong to the same RG. Different RGs can be primary on different nodes, which lets you load-balance data-plane groups across the two nodes.
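Pulling the reth pieces together, the hypothetical helper below renders set statements for one reth. It assumes ge- child ports, which Juniper's examples bind with gigether-options redundant-parent; check the child statement for your platform and interface type. The address uses the 203.0.113.0/24 documentation range.

```python
def reth_config(reth: str, rg: int, children: list[str], address: str,
                zone: str, minimum_links: int = 1) -> list[str]:
    """Render set statements for one reth with child ports on both nodes."""
    if len(children) < 2:
        raise ValueError("a reth needs at least one child on each node")
    # Each physical child points at its logical parent reth.
    lines = [
        f"set interfaces {child} gigether-options redundant-parent {reth}"
        for child in children
    ]
    lines += [
        f"set interfaces {reth} redundant-ether-options redundancy-group {rg}",
        f"set interfaces {reth} redundant-ether-options minimum-links {minimum_links}",
        f"set interfaces {reth} unit 0 family inet address {address}",
        f"set security zones security-zone {zone} interfaces {reth}.0",
    ]
    return lines


for line in reth_config("reth0", 1, ["ge-0/0/1", "ge-5/0/1"],
                        "203.0.113.1/24", "untrust"):
    print(line)
```

The cluster also needs set chassis cluster reth-count <n> so that the reth interfaces exist, plus node priorities for the RG they belong to.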

Figure 3: reth interface, one IP and two physical children
reth0 (one IP, one MAC, zone untrust, minimum-links 1, primacy owned by RG1) has two children: ge-0/0/1 on node0 and ge-5/0/1 on node1.
The reth presents one IP and one MAC to the network; the active child switches silently during failover.

▶ Watch an SRX cluster failover in real time

The four steps below trace a healthy active/standby data path, followed by a manual redundancy-group failover.

① Client SYN: A client sends a SYN to the cluster's reth0 IP. node0 is primary for RG1, so its physical child ge-0/0/1 receives the packet.
② Session sync: node0 processes the SYN, creates a session entry, and synchronises it to node1 over the fabric link so node1 has a full copy.
③ Fabric forward: A later packet arrives at node1's physical child ge-5/0/1. Since RG1 is primary on node0, node1 encapsulates the packet and forwards it across the fabric link to node0.
④ RG failover: A maintenance event triggers a manual failover, and RG1 moves to node1. The reth0 IP and MAC remain the same, the upstream switch sees no topology change, and traffic resumes within seconds.
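The four steps can be replayed as a toy model. Everything here is hypothetical: a Reth dataclass whose active child follows the RG primary, and a path function that shows where a packet is processed. The MAC is a placeholder from the documentation range; on a real cluster, the reth MAC is assigned by the cluster itself.

```python
from dataclasses import dataclass


@dataclass
class Reth:
    name: str
    ip: str
    mac: str
    children: dict[int, str]  # node_id -> physical child port
    primary: int = 0          # node that is primary for the reth's RG

    def active_child(self) -> str:
        return self.children[self.primary]

    def fail_over(self) -> None:
        # Only the primary node (and so the active child) changes;
        # IP and MAC stay the same.
        self.primary = 1 - self.primary


def path(reth: Reth, ingress_node: int) -> list[str]:
    """Where a packet for this reth's RG gets processed."""
    if ingress_node == reth.primary:
        return [f"node{ingress_node} processes the packet"]
    return [f"node{ingress_node} forwards it across the fabric",
            f"node{reth.primary} processes the packet"]


r = Reth("reth0", "203.0.113.1", "00:00:5e:00:53:01",
         {0: "ge-0/0/1", 1: "ge-5/0/1"})
print(path(r, 0))   # step 1: arrives on the primary node
print(path(r, 1))   # step 3: arrives on node1, crosses the fabric
r.fail_over()       # step 4: manual failover of RG1
print(r.active_child(), r.ip, r.mac)
```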
Quick check · Q3 of 10 · Apply

A reth interface fails over from node0 to node1. What does the upstream router need to do?

Correct: b. The reth interface presents one stable IP and MAC to the network regardless of which node's physical child is active. Failover is transparent — no ARP flush, no route change, no topology update needed.
👉 So far: reth = one IP, one MAC, physical children from both nodes. The network sees no change during failover — reth makes it invisible.

④ Failover triggers, timing and hardening with dual control links

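A redundancy group fails over for one of three reasons:
- heartbeats stop arriving over the control link;
- the RG's interface-monitoring threshold reaches zero;
- an operator triggers a manual failover (request chassis cluster failover redundancy-group <N> node <N>).

Control-link loss is handled conservatively to avoid split-brain. According to Juniper's documentation, the secondary node first becomes ineligible for a 180-second countdown. If the fabric link is still up when the countdown ends, the secondary is disabled rather than promoted; it takes primary only if the fabric link fails as well. Because session state is synchronised over the fabric, established TCP/UDP sessions survive a data-plane RG failover with minimal interruption.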
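For a manual failover test, the order of commands matters as much as the commands themselves. This hypothetical helper lists them. The final reset step clears the manual-failover flag; Juniper requires it before you can manually fail the same RG over again.

```python
def failover_test_runbook(rg: int, target_node: int) -> list[str]:
    """Ordered CLI steps for a manual failover test in a maintenance window."""
    if target_node not in (0, 1):
        raise ValueError("target_node must be 0 or 1")
    return [
        "show chassis cluster status",          # baseline RG placement
        f"request chassis cluster failover redundancy-group {rg} node {target_node}",
        "show chassis cluster status",          # RG should now be on target_node
        "show security flow session summary",   # sessions still present?
        f"request chassis cluster failover reset redundancy-group {rg}",  # clear manual flag
    ]


for step in failover_test_runbook(1, 1):
    print(step)
```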

Dual control links

On high-end SRX platforms (SRX5600, SRX5800) you can configure dual control links. The jsrpd process sends and receives heartbeats on both links simultaneously. If one control link fails, the other keeps the cluster alive — preventing a spurious split-brain failover caused by a cable or port fault. This removes the control link as a single point of failure and is strongly recommended for carrier or data-centre deployments.
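A simple model shows why the second link helps. It assumes the commonly documented defaults of heartbeat-interval 1000 ms and heartbeat-threshold 3 (both set under chassis cluster; confirm them for your release). It treats a link as down after that many consecutive missed heartbeats. The function names are hypothetical.

```python
HEARTBEAT_INTERVAL_MS = 1000  # assumed default heartbeat-interval
HEARTBEAT_THRESHOLD = 3       # assumed default heartbeat-threshold


def link_alive(missed_in_a_row: int, threshold: int = HEARTBEAT_THRESHOLD) -> bool:
    """A control link is declared down after `threshold` consecutive misses."""
    return missed_in_a_row < threshold


def peer_reachable(missed_per_link: list[int]) -> bool:
    """With dual control links, heartbeats on either link keep the peer alive."""
    return any(link_alive(missed) for missed in missed_per_link)


print(HEARTBEAT_INTERVAL_MS * HEARTBEAT_THRESHOLD)  # ms before a silent link is declared down
print(peer_reachable([3]))     # single link, cable pulled
print(peer_reachable([3, 0]))  # dual links, one cable pulled
```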

Preempt delay (Junos 17.4R1+) adds a configurable wait (1–21,600 seconds) before a recovering high-priority node reclaims primary. This gives routing and interfaces time to converge before the RG moves again. Always test failover in a maintenance window: run show chassis cluster status before, trigger the failover, then verify RG primacy and traffic flow afterwards.
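To see how a delay dampens flapping, here is a toy timeline model. It is a hypothetical illustration and does not capture Junos behaviour in every detail. In the model, node0 reclaims the RG delay seconds after recovering, unless it fails again first. delay=0 stands for preempt without a delay timer.

```python
def preempt_moves(outages: list[tuple[float, float]], delay: float) -> int:
    """Count how many times node0 reclaims the RG after recovering.

    outages: sorted, non-overlapping (down_at, up_at) pairs for node0, in seconds.
    Toy rule: node0 reclaims `delay` seconds after it comes back, unless it
    goes down again before then.
    """
    moves = 0
    for i, (_down_at, up_at) in enumerate(outages):
        next_down = outages[i + 1][0] if i + 1 < len(outages) else float("inf")
        if up_at + delay < next_down:
            moves += 1
    return moves


flapping = [(0, 10), (40, 50), (80, 90)]  # hypothetical node0 flaps
print(preempt_moves(flapping, delay=0))   # immediate preempt
print(preempt_moves(flapping, delay=90))  # 90-second preempt delay
```

With node0 bouncing every 30 seconds, immediate preempt moves RG1 back three times. A 90-second delay (the value used in the troubleshooting case below) waits until node0 stays up, so the RG moves back only once.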

Figure 4: Single control link vs dual control links
Single control link: one cable failure can cause split-brain (both nodes may go primary); supported on all SRX models; simpler to cable.
Dual control links: jsrpd sends heartbeats on both links, so both links must fail before heartbeats are lost; SRX5600/SRX5800 only; recommended for carrier and data-centre deployments.
Dual control links eliminate the single point of failure that can cause a false split-brain failover.

Rohan at a Mumbai financial services firm faces this

After a scheduled maintenance window, the SRX cluster keeps oscillating — RG1 flips between node0 and node1 every few minutes, causing brief traffic interruptions.

Likely cause

Preempt is enabled on RG1 but no preempt delay is set; node0 (higher priority) keeps recovering and immediately reclaiming primary before convergence stabilises.

Diagnosis

show chassis cluster status shows rapid RG1 primary changes; show log messages reveals repeated jsrpd preempt events triggered by node0 coming back online.

show chassis cluster status ▸ show log messages ▸ chassis cluster configuration ▸ redundancy-group 1
Fix

Add a preempt delay of 60–120 seconds: set chassis cluster redundancy-group 1 preempt delay 90. This gives node0 time to fully converge before it reclaims RG1 primary.

Verify

Commit, trigger a test failover, confirm RG1 waits the configured delay before moving back to node0; traffic interruptions drop to a single brief event per failover.

Always verify failover in a maintenance window

Run 'show chassis cluster status' before and after a manual failover. Confirm RG primacy moved, traffic resumed, and 'show security flow session' shows sessions re-synced. Never declare HA working without a live test.
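Those checks can be turned into a repeatable pass/fail test. The hypothetical function below compares RG placement before and after a failover, for example using maps produced by parsing show chassis cluster status. It also flags a zero session count, for example as read from show security flow session summary. Treat it as a minimum check; it does not replace testing real traffic.

```python
def verify_failover(before: dict[int, str], after: dict[int, str], rg: int,
                    expected_primary: str, sessions_after: int) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if after.get(rg) != expected_primary:
        problems.append(f"RG{rg} is on {after.get(rg)}, expected {expected_primary}")
    # Every other RG should still be where it was before the test.
    for other, node in before.items():
        if other != rg and after.get(other) != node:
            problems.append(f"RG{other} moved from {node} to {after.get(other)}")
    if sessions_after == 0:
        problems.append("no sessions after failover")
    return problems


before = {0: "node0", 1: "node0"}
print(verify_failover(before, {0: "node0", 1: "node1"}, 1, "node1", 1200))
print(verify_failover(before, {0: "node1", 1: "node1"}, 1, "node1", 1200))
```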

Quick check · Q4 of 10 · Analyze

An SRX cluster keeps flip-flopping — RG1 moves back and forth between nodes every few minutes. Which feature would you enable to dampen this?

Correct: d. A preempt delay timer (set chassis cluster redundancy-group 1 preempt delay <seconds>, available from Junos 17.4R1) makes the recovering high-priority node wait before reclaiming primary, preventing rapid RG flapping. Dual fabric links address data-plane capacity, not preempt flapping.
👉 So far: Failover triggers: heartbeat loss, interface weight → 0, or manual command. Dual control links remove the single point of failure; preempt delay prevents flapping.

📝 Wrap-up assessment — six more

You've answered four questions inline; six remain.

Q5 · Remember

Which command assigns a device as node0 in an SRX chassis cluster?

Correct: a. The bootstrap command 'set chassis cluster cluster-id <id> node 0 reboot' assigns the node ID and cluster ID, then reboots the device into cluster mode. The other commands are for interface binding, manual failover, and NSSU — not initial cluster formation.
Q6 · Understand

Why is interface monitoring not recommended for RG0?

Correct: b. RG0 controls the routing engine. An interface flap that triggers an RG0 failover resets routing adjacencies and the management plane — far more disruptive than a data-plane RG failover. Interface monitoring should be applied only to data-plane RGs (RG1+).
Q7 · Apply

You want reth1 to be active on node1 during normal operation, and node0 to take over only on failure. How do you configure this?

Correct: a. Redundancy group primary is determined by priority — higher priority wins. To keep reth1 primary on node1, set node1's priority higher for that RG. Node0 will only take over if node1 fails.
Q8 · Analyze

Both SRX nodes are powered on after a simultaneous reboot. node0 priority is 200, node1 priority is 100 for RG1. Preempt is enabled. Which node holds RG1 primary?

Correct: a. With preempt enabled and node0 having the higher priority (200 vs 100), node0 will reclaim RG1 primary even if node1 was temporarily primary during boot. Preempt makes the higher-priority node authoritative.
Q9 · Evaluate

A carrier customer asks how to eliminate the control link as a single point of failure. Best answer?

Correct: b. Dual control links are the supported solution on SRX5600/5800: jsrpd sends heartbeats on both links simultaneously. As long as one is alive the cluster stays up. Preempt is not allowed on RG0; LACP is not the supported control-link mechanism.
Q10 · Evaluate

After a test failover, how do you confirm the cluster is healthy and sessions survived?

Correct: c. 'show chassis cluster status' confirms which node holds each RG. 'show security flow session' confirms sessions are still active and synced. A second failover test validates bidirectionality. Ping alone cannot confirm session sync or correct RG placement.

🧠 In your own words

Type one line: what is the difference between a control link and a fabric link in an SRX chassis cluster? Then compare with the expert version.

Expert version: The control link is the cluster's management and signalling channel. It carries heartbeat pulses that prove the other node is alive, plus configuration synchronisation so both nodes run the same policy. The fabric links are the data-plane path. They carry session-state sync so established sessions can survive a failover. When a packet lands on the standby node's physical interface, the standby node does not drop it; it encapsulates the packet and sends it across the fabric to the active node for processing. Think of the control link as the nervous system and the fabric links as the circulatory system of the cluster.

📖 Glossary

Chassis cluster
An HA configuration pairing two SRX devices (node0 and node1) to act as a single logical firewall with shared configuration, session tables and policies.
Control link
The dedicated inter-node link carrying cluster heartbeats and configuration synchronisation.
Fabric link
The data-plane inter-node link. It carries session-state sync and forwards traffic from the standby node to the active node when a packet arrives on the wrong node.
Redundancy group (RG)
The unit of failover — primary on exactly one node at a time. RG0 owns control-plane primacy; RG1–127 own data-plane resources and reth interfaces.
reth interface
A redundant ethernet pseudo-interface spanning both nodes, presenting one stable IP and MAC to the network. Active/standby state is inherited from its redundancy group.
Preempt
A redundancy-group setting that allows the higher-priority node to automatically reclaim primary after recovery. Not available on RG0.
Dual control links
Two physical control links between nodes (SRX5600/SRX5800 only); jsrpd heartbeats run on both, so one cable failure cannot cause split-brain.
Interface monitoring
A per-RG mechanism that assigns weights to monitored physical interfaces. Each interface failure subtracts its weight from the RG's threshold of 255, and when the threshold reaches zero the RG fails over. Recommended for RG1+ only, not RG0.
jsrpd
The Juniper Services Redundancy Protocol process — manages heartbeats, session sync, and redundancy group state across cluster nodes.
Split-brain
A failure mode where both cluster nodes believe the other is dead and both attempt to hold primary for the same RG, potentially causing traffic black holes.
