Which command assigns a device as node0 in an SRX chassis cluster?

Correct: a. The bootstrap command 'set chassis cluster cluster-id <id> node 0 reboot' assigns the cluster ID and node ID, then reboots the device into cluster mode. The other commands are for interface binding, manual failover, and NSSU, not initial cluster formation.

Why is interface monitoring not recommended for RG0?

Correct: b. RG0 controls the routing engine. An interface flap that triggers an RG0 failover resets routing adjacencies and the management plane — far more disruptive than a data-plane RG failover. Interface monitoring should be applied only to data-plane RGs (RG1+).

You want reth1 to be active on node1 during normal operation, and node0 to take over only on failure. How do you configure this?

Correct: a. Redundancy group primary is determined by priority — higher priority wins. To keep reth1 primary on node1, set node1's priority higher for that RG. Node0 will only take over if node1 fails.
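
A minimal sketch of that priority arrangement, assuming reth1 is bound to redundancy group 2. The group number and the priority values 100 and 200 are illustrative and do not come from the question. Lines starting with # are annotations.

```junos
# Assumed: reth1 belongs to RG2, and node1 gets the higher priority
set chassis cluster redundancy-group 2 node 0 priority 100
set chassis cluster redundancy-group 2 node 1 priority 200
set interfaces reth1 redundant-ether-options redundancy-group 2
```

With these values node1 holds RG2, and therefore reth1, while it is healthy. node0 takes over only when node1 fails or the group is forced to fail over.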

Both SRX nodes are powered on after a simultaneous reboot. node0 priority is 200, node1 priority is 100 for RG1. Preempt is enabled. Which node holds RG1 primary?

Correct: a. With preempt enabled and node0 having the higher priority (200 vs 100), node0 will reclaim RG1 primary even if node1 was temporarily primary during boot. Preempt makes the higher-priority node authoritative.

A carrier customer asks how to eliminate the control link as a single point of failure. Best answer?

Correct: b. Dual control links are the supported solution on SRX5600/5800: jsrpd sends heartbeats on both links simultaneously. As long as one is alive the cluster stays up. Preempt is not allowed on RG0; LACP is not the supported control-link mechanism.

After a test failover, how do you confirm the cluster is healthy and sessions survived?

Correct: c. 'show chassis cluster status' confirms which node holds each RG. 'show security flow session' confirms sessions are still active and synced. A second failover test validates bidirectionality. Ping alone cannot confirm session sync or correct RG placement.

Juniper SRX Chassis Cluster HA — node0/node1, reth & Failover Guide

Most engineers think…

Most people picture SRX HA as 'an active/standby pair where the backup box just sits there'. That mental model fails both in the interview chair and when you're troubleshooting at 2 am.

SRX chassis clustering is a distributed single logical device: node0 and node1 share one configuration, one session table and one set of policies. The control link synchronises state; fabric links forward data-plane traffic from the standby node to the active node. Interfaces are bundled into redundancy groups and presented to the outside world as reth interfaces — so the network never sees a topology change during a failover. Understanding the split between RG0 (routing engine control) and RG1+ (data-plane groups), and how reth inherits its active node from its redundancy group, is what separates a confident answer from a guess.

① Cluster basics — node0, node1, control link and fabric links

An SRX chassis cluster joins exactly two SRX devices. One is assigned node0 and the other node1 via a bootstrap command (set chassis cluster cluster-id <id> node <0|1> reboot). After reboot both nodes load the same Junos configuration and behave as a single logical firewall to the rest of the network.
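
As a rough sketch, the bootstrap is run from operational mode on each device. The cluster ID of 1 is an assumed value; both nodes must use the same ID. Lines starting with # are annotations.

```junos
# On the device that will become node0 (operational mode)
set chassis cluster cluster-id 1 node 0 reboot

# On the device that will become node1 (operational mode)
set chassis cluster cluster-id 1 node 1 reboot
```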

Two types of inter-node links connect them. The control link (a dedicated port, historically fxp1 on branch models) carries heartbeats, configuration synchronisation and kernel state between the two routing engines. If the control link goes silent, each node assumes the other has failed. The fabric links (ge/xe interfaces bound as fab0 on node0 and fab1 on node1) carry the data plane: session state is synchronised across them, and when the standby node receives a packet for a session that is active on the other node, it forwards the packet across the fabric rather than dropping it.
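
A minimal sketch of the fabric binding in configuration mode. The member ports ge-0/0/2 and ge-5/0/2 are assumptions; node1's interface numbering depends on the platform's slot offset.

```junos
# fab0 is node0's end of the fabric, fab1 is node1's end
set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-5/0/2
```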

Because both nodes share one configuration, you manage the pair from a single point. A show chassis cluster status tells you which node holds each redundancy group right now.

Figure 1 — Chassis cluster data path — normal operation

Packets arrive at the active reth child on node0; the fabric link forwards session state so node1 can take over instantly.

Name both links and their jobs

In an interview, always separate the control link (heartbeat + sync) from the fabric links (data-plane forwarding). Mixing them up is the most common mistake — and interviewers test it directly.

Quick check · Q1 of 10 · Understand

What does the fabric link carry in an SRX chassis cluster?

a) Heartbeat and configuration synchronisation
b) Data-plane traffic forwarded from the standby node to the active node
c) Management SSH sessions to the cluster
d) Routing protocol updates only

Correct: b. Fabric links carry data-plane forwarding traffic. When the standby node receives a packet it cannot process locally (the session is active on the other node), it sends it across the fabric link to the active node. The control link handles heartbeat and sync.

👉 So far: Two nodes, two link types: control link (heartbeat + sync) and fabric links (data-plane forwarding). Both must be up for the cluster to work correctly.

② Redundancy groups — RG0 controls the RE, RG1+ control data

A redundancy group (RG) is the unit of failover. Every resource — interfaces, session tables — belongs to a redundancy group that is primary on exactly one node at a time. RG0 is special: it controls routing-engine (control-plane) primacy. The node that holds RG0 primary is the one whose routing protocols, management plane and jsrpd process are authoritative. You cannot enable preempt on RG0 — if you want to move RG0, you do a manual failover.

RG1 through RG127 are data-plane groups. Each one has a priority value configured per node (higher wins). When priorities tie, the lower node-id wins. Preempt, when enabled on a data-plane RG, lets the higher-priority node take back primary automatically after recovery — with an optional delay timer (introduced in Junos 17.4R1) to prevent flapping.
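
A hedged sketch of per-node priorities with preempt on a data-plane group. The values 200 and 100 are assumptions chosen so that node0 is preferred for both groups.

```junos
# RG0: control plane, priorities only
set chassis cluster redundancy-group 0 node 0 priority 200
set chassis cluster redundancy-group 0 node 1 priority 100

# RG1: data plane, higher priority wins and may reclaim primary after recovery
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
set chassis cluster redundancy-group 1 preempt
```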

Interface monitoring

Each RG can monitor physical interfaces, each with an assigned weight (0–255). The RG starts with a failover threshold of 255; when a monitored interface goes down, its weight is subtracted from that threshold, and when the threshold reaches zero the RG fails over to the other node. Juniper advises against applying interface monitoring to RG0, because an interface flap would then trigger a control-plane switchover, which is a disruptive event.
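
To make the weight arithmetic concrete, here is a sketch that monitors two interfaces for RG1. The interface names and weights are assumptions chosen for illustration.

```junos
# The RG threshold starts at 255; a failed interface's weight is subtracted from it
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/3 weight 128
```

With these values, losing ge-0/0/1 alone takes the threshold from 255 to 0 and RG1 fails over. Losing ge-0/0/3 alone leaves 255 − 128 = 127, so RG1 stays where it is.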

Figure 2 — Redundancy group hierarchy

RG0 sits above data-plane groups; each RG is primary on exactly one node and owns a set of reth interfaces.

Control link: Carries heartbeats and configuration sync between node0 and node1. Loss triggers a failover arbitration.

Fabric link: Data-plane path between nodes. It synchronises session state, and packets received on the standby node are forwarded across it to the active node for processing.

Redundancy group: The unit of failover, primary on one node at a time. RG0 owns the routing engine; RG1–127 own data-plane groups and reth interfaces.

reth interface: A logical pseudo-interface that spans both nodes, inheriting active/standby state from its RG. The network sees one stable IP and MAC regardless of which node is active.

Never enable interface monitoring on RG0

Interface monitoring on RG0 means an interface flap triggers a control-plane switchover — which is far more disruptive than a data-plane failover. Use interface monitoring only on RG1+ data-plane groups.

Quick check · Q2 of 10 · Remember

Which redundancy group controls routing-engine (control-plane) primacy in an SRX cluster?

a) RG1
b) RG127
c) RG0
d) Any RG can be designated as the control-plane group

Correct: c. RG0 is the dedicated control-plane redundancy group — it determines which node's routing engine and management plane are active. RG1 through RG127 are data-plane groups. Preempt cannot be enabled on RG0.

👉 So far: RG0 = control-plane primacy (no preempt). RG1–127 = data-plane groups with priority, optional preempt, and interface-weight monitoring.

③ reth interfaces — one logical interface across both nodes

A reth (redundant ethernet) interface is a pseudo-interface that binds at least one physical child interface from each node. You configure the reth at the logical level (IP, zone, family) once, and the cluster decides which node's physical child is the active forwarding path based on the redundancy group the reth belongs to.

When the redundancy group fails over, the reth's active child switches from node0's physical port to node1's physical port — but the IP address, MAC address and zone membership stay identical. To the upstream switch or router nothing changes: no ARP flush, no topology update, no route recalculation. That is the key interview line: reth makes the failover invisible to the network.

A reth interface inherits the active/standby state from its redundancy group. You bind a reth to an RG with set interfaces reth<N> redundant-ether-options redundancy-group <N>. You also set redundant-ether-options minimum-links 1 to keep the reth up as long as one child is alive. Multiple reth interfaces can belong to the same RG, and that RG can be primary on different nodes than other RGs — so you can load-balance data-plane groups across the two nodes.
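
Put together, a reth definition might look like the following sketch. The child ports ge-0/0/1 and ge-5/0/1 match the walkthrough in this lesson; the reth count, the 192.0.2.1/24 address and the zone name trust are assumptions.

```junos
# Tell the cluster how many reth interfaces to create
set chassis cluster reth-count 2

# One physical child from each node
set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-5/0/1 gigether-options redundant-parent reth0

# Bind the reth to RG1, then configure it once at the logical level
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.0.2.1/24
set security zones security-zone trust interfaces reth0.0
```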

Figure 3 — reth interface — one IP, two physical children

The reth presents one IP and one MAC to the network; the active child switches silently during failover.

▶ Watch an SRX cluster failover in real time

The four steps below follow a healthy active/standby data path and then a planned failover.

① Client SYN: A client sends a SYN to the cluster's reth0 IP. node0 is primary for RG1, so its physical child ge-0/0/1 receives the packet.

② Session sync: node0 processes the SYN, creates a session entry, and synchronises it to node1 over the fabric link so node1 has a full copy.

③ Fabric forward: A later packet arrives at node1's physical child ge-5/0/1. Since RG1 is primary on node0, node1 encapsulates the packet and forwards it across the fabric link to node0.

④ RG failover: A maintenance event triggers a manual failover. RG1 moves to node1. The reth0 IP and MAC remain the same, so the upstream switch sees no change and traffic resumes within seconds.

Quick check · Q3 of 10 · Apply

A reth interface fails over from node0 to node1. What does the upstream router need to do?

a) Flush its ARP cache and update its routing table
b) Nothing: the reth keeps the same IP and MAC address
c) Re-establish all TCP sessions from scratch
d) Send a gratuitous ARP to reclaim the IP

Correct: b. The reth interface presents one stable IP and MAC to the network regardless of which node's physical child is active. Failover is transparent — no ARP flush, no route change, no topology update needed.

👉 So far: reth = one IP, one MAC, physical children from both nodes. The network sees no change during failover — reth makes it invisible.

④ Failover triggers, timing and hardening with dual control links

A redundancy group failover happens for three reasons: the control link goes down (heartbeat loss — each node assumes split-brain and the secondary takes primary if it wins arbitration), interface monitoring weight reaches zero (configured thresholds on the RG), or a manual failover is triggered (request chassis cluster failover redundancy-group <N> node <N>). Session sync over the fabric means established TCP/UDP sessions survive a data-plane RG failover with minimal interruption.
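
A short sketch of the manual trigger, run from operational mode. Moving RG1 to node 1 is an assumed scenario. A manual failover stays in force until it is reset, so the second command is normally run once testing is complete.

```junos
# Move RG1 to node1
request chassis cluster failover redundancy-group 1 node 1

# Clear the manual-failover state afterwards
request chassis cluster failover reset redundancy-group 1
```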

Dual control links

On high-end SRX platforms (SRX5600, SRX5800) you can configure dual control links. The jsrpd process sends and receives heartbeats on both links simultaneously. If one control link fails, the other keeps the cluster alive — preventing a spurious split-brain failover caused by a cable or port fault. This removes the control link as a single point of failure and is strongly recommended for carrier or data-centre deployments.
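
A sketch of what the control-port definition can look like on an SRX5800. The FPC slot numbers are assumptions: they depend on where the cards are installed, and node1's slots are offset on this platform. The second control link also has hardware prerequisites, so check Juniper's dual control link documentation before using this.

```junos
# First control link: port 0 on each node (slots are assumed)
set chassis cluster control-ports fpc 2 port 0
set chassis cluster control-ports fpc 14 port 0

# Second control link: port 1 on each node (slots are assumed)
set chassis cluster control-ports fpc 2 port 1
set chassis cluster control-ports fpc 14 port 1
```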

Preempt delay (Junos 17.4R1+) adds a configurable wait (1–21 600 seconds) before a recovering high-priority node reclaims primary, giving convergence time to settle before another RG move. Always test failover in a maintenance window: show chassis cluster status before, trigger, then verify RG primacy and traffic flow after.

Figure 4 — Single control link vs dual control links

Dual control links eliminate the single point of failure that can cause a false split-brain failover.

Rohan at a Mumbai financial services firm faces this

After a scheduled maintenance window, the SRX cluster keeps oscillating — RG1 flips between node0 and node1 every few minutes, causing brief traffic interruptions.

Likely cause

Preempt is enabled on RG1 but no preempt delay is set; node0 (higher priority) keeps recovering and immediately reclaiming primary before convergence stabilises.

Diagnosis

show chassis cluster status shows rapid RG1 primary changes; show log messages reveals repeated jsrpd preempt events triggered by node0 coming back online.

show chassis cluster status ▸ show log messages ▸ chassis cluster configuration ▸ redundancy-group 1

Fix

Add a preempt delay of 60–120 seconds: set chassis cluster redundancy-group 1 preempt delay 90. This gives node0 time to fully converge before it reclaims RG1 primary.

Verify

Commit, trigger a test failover, confirm RG1 waits the configured delay before moving back to node0; traffic interruptions drop to a single brief event per failover.

Always verify failover in a maintenance window

Run 'show chassis cluster status' before and after a manual failover. Confirm RG primacy moved, traffic resumed, and 'show security flow session' shows sessions re-synced. Never declare HA working without a live test.
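
One possible verification sequence, run from operational mode before and after the failover. The choice and order of commands is a suggestion, not a Juniper-mandated procedure.

```junos
# Which node is primary for each RG, and is manual failover still set?
show chassis cluster status

# Control link, fabric link, reth and interface-monitor state
show chassis cluster interfaces

# Heartbeat and session-sync counters
show chassis cluster statistics

# Session counts on the data plane
show security flow session summary
```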

Quick check · Q4 of 10 · Analyze

An SRX cluster keeps flip-flopping — RG1 moves back and forth between nodes every few minutes. Which feature would you enable to dampen this?

a) Disable preempt on RG0
b) Enable dual fabric links
c) Configure a preempt delay timer on RG1 (Junos 17.4R1+)
d) Add more weight to the interface monitor

Correct: c. A preempt delay timer (set chassis cluster redundancy-group 1 preempt delay <seconds>, available from Junos 17.4R1) makes the recovering high-priority node wait before reclaiming primary, preventing rapid RG flapping. Dual fabric links address data-plane capacity, not preempt flapping.

👉 So far: Failover triggers: heartbeat loss, interface weight → 0, or manual command. Dual control links remove the single point of failure; preempt delay prevents flapping.

🧠 In your own words

Type one line: what is the difference between a control link and a fabric link in an SRX chassis cluster? Then compare with the expert version.

Expert version: The control link is the cluster's control-plane signalling channel. It carries the heartbeats that prove the other node is alive and the configuration synchronisation that keeps both nodes running the same policy. The fabric links are the data-plane path. They synchronise session state so established sessions can survive a failover, and when a packet lands on the standby node's physical interface, the standby node encapsulates it and sends it across the fabric to the active node for processing rather than dropping it. You can think of the control link as the nervous system and the fabric links as the circulatory system of the cluster.

📖 Glossary

Chassis cluster: An HA configuration pairing two SRX devices (node0 and node1) to act as a single logical firewall with shared configuration, session tables and policies.
Control link: The dedicated inter-node link carrying cluster heartbeats, configuration synchronisation and kernel state between the routing engines.
Fabric link: The data-plane inter-node link that synchronises session state and forwards traffic from the standby node to the active node when a packet arrives on the wrong node.
Redundancy group (RG): The unit of failover — primary on exactly one node at a time. RG0 owns control-plane primacy; RG1–127 own data-plane resources and reth interfaces.
reth interface: A redundant ethernet pseudo-interface spanning both nodes, presenting one stable IP and MAC to the network. Active/standby state is inherited from its redundancy group.
Preempt: A redundancy-group setting that allows the higher-priority node to automatically reclaim primary after recovery. Not available on RG0.
Dual control links: Two physical control links between nodes (SRX5600/SRX5800 only); jsrpd heartbeats run on both, so one cable failure cannot cause split-brain.
Interface monitoring: A per-RG mechanism that assigns weights to physical interfaces; when cumulative weight falls to zero, the RG fails over. Recommended for RG1+ only, not RG0.
jsrpd: The Juniper Services Redundancy Protocol process — manages heartbeats, session sync, and redundancy group state across cluster nodes.
Split-brain: A failure mode where both cluster nodes believe the other is dead and both attempt to hold primary for the same RG, potentially causing traffic black holes.

📚 Sources

Juniper Networks — Chassis Cluster Security Devices: Overview and Configuration. juniper.net/documentation/us/en/software/junos/chassis-cluster-security-devices/
Juniper Networks — Chassis Cluster Redundancy Groups. juniper.net/documentation/en_US/junos/topics/topic-map/security-chassis-cluster-redundancy-groups.html
Juniper Networks — Chassis Cluster Redundant Ethernet Interfaces (reth). juniper.net/documentation/us/en/software/junos/chassis-cluster-security-devices/topics/topic-map/security-chassis-cluster-redundant-ethernet-interfaces.html
Juniper Networks — Configuring Cluster Failover Parameters (preempt, delay timer, interface monitoring). juniper.net/documentation/us/en/software/junos/chassis-cluster-security-devices/topics/topic-map/security-chassis-cluster-failover-parameters.html
Juniper Networks — Chassis Cluster Dual Control Links (SRX5600/SRX5800). juniper.net/documentation/us/en/software/junos/chassis-cluster-security-devices/topics/topic-map/security-chassis-cluster-dual-control-links.html
Juniper Networks — Troubleshoot Redundancy Group Failover in a Chassis Cluster. juniper.net/documentation/us/en/software/junos/chassis-cluster-security-devices/topics/task/troubleshoot-srx-chassis-cluster-redundancy-group-not-failing-over.html
