
BeyondTrust Deployment, HA & DR: Appliances, Clusters & Surviving Failures

Your vault now guards every other system's keys — so what guards the vault? This lesson walks the deployment shapes (HA pairs, cold spares, cloud brokers, Atlas), the failover you must rehearse, backups that live off the box, and the sealed-envelope plan for the day PAM itself is down.

📅 2026-06-10 · ⏱ 13 min · 3 live demos · 4 infographics · 🏷 10-Q assessment + AI Tutor inline


Pick where you want to start

1. Deployment shapes: HA pairs, cold spares, cloud brokers, Atlas. Pick your failure domain.

2. HA mechanics: Heartbeat, replication, and testing failover without breaking production.

3. Backups & upgrades: Off-appliance backups, SUPI-first order, sane maintenance windows.

4. DR & break-glass: Tier-0 thinking, sealed-envelope escrow, RTO/RPO management signs off.

🧠 Warm-up — 3 questions, no score

Just notice which ones make you pause. We answer all three inside the lesson.

1. One appliance dies at 2 AM. With an HA pair in place, what happens to the credentials database?

Answered in Deployment shapes.

2. Where should the break-glass admin password live?

Answered in DR & break-glass.

3. What does RPO mean for a password vault, in practice?

Answered in DR & break-glass.

Most engineers think…

Most engineers think an HA pair means we have DR — if one appliance dies the other takes over, so backups and break-glass can wait for next year's budget.

Wrong — HA protects against a dead box, nothing else. A botched upgrade, a deleted Smart Rule storm, ransomware or an admin mistake replicates to the secondary just as faithfully as good data. And if the whole vault is gone, every admin password is rotated, vaulted and unknown to humans — only an offline break-glass escrow gets you in. HA, backups and DR answer three different failures; a tier-0 vault needs all three.

① Deployment shapes — vault pairs, cold spares, brokers and Atlas

Aditya at HCL has just been handed the deployment decision: the new Password Safe estate will hold every domain admin, every root password, every firewall enable secret in the company. The uncomfortable maths: the vault is now the single most attractive failure point in the building. The deployment shape you pick decides what happens the night it dies — so let's walk the shapes BeyondTrust actually ships, not the marketing diagram.

For Password Safe on-prem, the unit is the U-Series appliance. Resilience comes in two flavours. An HA pair is two appliances: the primary serves everything, internal database replication copies data across, and a heartbeat from the primary tells the secondary when to take over. A cold spare is the budget cousin: a standby appliance you restore from backup — recovery measured in hours and your data loss equals the age of the last backup. Honest numbers, lower bill.

Third shape: Password Safe Cloud (your tenant at hcl.ps.beyondtrustcloud.com). BeyondTrust runs and patches the vault; you run Resource Brokers inside your network, grouped into resource zones. Each broker installs nine services (rotation, discovery, directory auth, session monitoring and friends) and makes outbound-443-only connections to the cloud — no inbound firewall holes. Cloud moves the vault's uptime onto BeyondTrust's shoulders, but your zone is still yours: docs recommend 2+ brokers per zone, each with 16 GB RAM minimum and a 64 GB session-cache disk.

Figure 1 — Three deployment shapes, three failure domains
Three panels. Left: an on-prem U-Series HA pair (primary and secondary appliances with database replication and a heartbeat between them). Middle: Password Safe Cloud with on-prem Resource Brokers dialing out on port 443 only. Right: a PRA Atlas cluster with one primary node holding configuration and two traffic nodes serving regions, with TCP 443 open both ways between nodes.
Read left to right: the HA pair removes single-box death (but replicates bad data too), cloud + brokers removes vault box ops (but your zone can still die), and PRA Atlas removes the one-site bottleneck for global estates.
👉 So far: Password Safe = HA pair, cold spare, or cloud + Resource Brokers. Next: what PRA does for the same problem — pairs and Atlas clusters.

PRA (the B Series appliance, physical or virtual, usually in the DMZ) follows the same logic: a failover pair of appliances for resilience, configured under /login > Management > Failover. When the estate goes global, PRA scales out with Atlas clustering: one primary node is the configuration point, and traffic nodes in each region carry session load. TCP 443 must be open bidirectionally between all cluster appliances, and each node can publish a separate Public Address (for users) and Internal Address (for appliance-to-appliance sync). The Atlas capacity tier is real scale: 300–3,000 users and 50,000–250,000 endpoints.

Sizing is the part interviews love because it is money. Sessions: a Jumpoint host with 8–12 cores and 32–64 GB RAM handles roughly 20–25 concurrent RDP sessions (or ~200 SSH/Telnet) — recording RDP with your own tools costs about 4 cores per 5 sessions. Recordings: session logs and recordings stay on the PRA appliance for up to 90 days, then you export via the Integration Client (recordings as .flv, logs as .xml) to SQL Server or a file share. Undersize the broker's 64 GB session-cache disk and session monitoring dies first — a classic week-one surprise.
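
The session figures above reduce to simple arithmetic. Here is a rough sketch, assuming the conservative end of the 20 to 25 concurrent RDP sessions per Jumpoint host quoted above. The function name, the peak of 90 sessions and the one-spare-host default are illustrative assumptions, not BeyondTrust guidance.

```python
import math


def jumpoint_hosts_needed(peak_rdp_sessions: int,
                          sessions_per_host: int = 20,
                          spare_hosts: int = 1) -> int:
    """Hosts needed to carry the peak RDP load, plus spares for a host failure.

    sessions_per_host defaults to the low end of the 20-25 range in the lesson;
    spare_hosts is an assumption (N+1 by default).
    """
    if peak_rdp_sessions < 0 or sessions_per_host <= 0 or spare_hosts < 0:
        raise ValueError("sessions and spares must be >= 0, sessions_per_host > 0")
    return math.ceil(peak_rdp_sessions / sessions_per_host) + spare_hosts


# Hypothetical estate: 90 concurrent RDP sessions at peak.
print(jumpoint_hosts_needed(90))  # 5 hosts for the load + 1 spare = 6
```

Sizing from the low end of the range and adding a spare keeps the estimate honest when one host is down for patching.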

🖥️ This is the screen you'll use for cloud resilience — BeyondInsight → Configuration → Privileged Access Management Agents → Resource Zones. Note the built-in Default zone and the broker count per zone. (Recreated for clarity — your console matches this.)
hcl.ps.beyondtrustcloud.com · Configuration
1. Zone: Default (built-in, cannot be edited)
2. Resource Brokers online: 2 of 2 · work round-robins across the zone
   Zone: Mumbai-DC (brokers: 2)
3. Connectivity: Outbound 443 → hcl.ps.beyondtrustcloud.com
Button: Create New Resource Zone

Four shapes you will quote in design meetings

These four words decide your 2 AM experience.

🛡️ HA pair: Two U-Series appliances, DB replication, heartbeat. Box dies, twin takes over in minutes. So: covers hardware death, not bad data.

🧊 Cold spare: Standby appliance restored from backup. RTO in hours, RPO = last backup. So: the honest budget option; write the numbers down.

📡 Resource zone: Brokers dial OUT on 443 to PS Cloud. 2+ per zone or local rotation and proxy stop. So: cloud moves the vault, not your duty.

🌐 Atlas cluster: Primary node owns config; traffic nodes serve regions; 443 both ways. So: global PRA scale with one control point.

Quick check · Q1 of 10

Aditya at HCL moves Password Safe to the cloud (hcl.ps.beyondtrustcloud.com). Which piece still runs inside HCL's datacenter?

Correct: b. PS Cloud still needs hands inside your network: Resource Brokers do local AD/LDAP auth, discovery, rotation and session proxying — all over outbound 443 only. No U-Series pair or SQL replica is required on-prem, and 'zero footprint' is the trap answer: kill your broker zone and local rotation stops even though the cloud portal is up.

Pause & Predict

Predict: if BeyondTrust runs the vault in the cloud, what still breaks when YOUR datacenter loses power tonight? Type your guess.

Answer: Everything the brokers do: rotation against local systems, discovery scans, AD/LDAP authentication and the session proxy path into your network. The cloud portal stays up — but it cannot reach your targets. That is exactly why docs say 2+ brokers per zone, and why serious shops put them in separate racks or sites.

② HA mechanics — heartbeat, replication and the failover you rehearse

The U-Series HA model is active/passive. The primary serves users, agents and APIs; the secondary replicates databases and otherwise stays quiet. The trigger is beautifully simple: the primary sends a heartbeat, and when that heartbeat stops arriving, the secondary takes over. Silence is the signal — no human presses a failover button at 2 AM.

Now the gotcha that fails real drills: HA replicates only the databases of features that were enabled when HA was configured. Enable Password Safe (or Secrets Safe) six months after pairing, and its database quietly never joins replication — the HA dashboard still says Healthy, because the pair itself is healthy. The rule to tattoo: enable features first, pair last — or re-establish HA after turning anything new on.
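
The pairing gap is a set difference: anything enabled today that was not enabled at pairing time is not replicating. Here is a minimal sketch, assuming you read both feature lists off the appliance by hand. No BeyondTrust API is assumed, and the function name is hypothetical.

```python
def unreplicated_features(enabled_now: set[str], paired_at_setup: set[str]) -> set[str]:
    """Features enabled today whose databases never joined HA replication."""
    return enabled_now - paired_at_setup


# Hand-collected lists (illustrative values only).
enabled = {"BeyondInsight", "Password Safe", "Secrets Safe"}
paired = {"BeyondInsight", "Password Safe"}

gap = unreplicated_features(enabled, paired)
print(sorted(gap))  # ['Secrets Safe']
```

A non-empty result is the cue to re-establish HA before the next drill, whatever the dashboard says.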

Figure 2 — The failover minute — machine lane vs human lane
A two-lane timeline. The top green lane shows what the appliance does automatically: the heartbeat stops, the secondary waits its threshold, then promotes itself and brings the replicated feature databases online. The bottom amber lane shows what the team must still do by runbook: swing DNS or confirm the service address, watch agents and brokers reconnect, run smoke tests like SignAppIn and a test checkout, and inform approvers and stakeholders.
Top lane is automatic: heartbeat loss, threshold wait, takeover. Bottom lane never happens by itself: DNS/address swing, reconnect checks, smoke tests, comms. If the bottom lane is not written down, you have hope, not HA.
👉 So far: heartbeat decides takeover, replication carries only the features paired in. Next: watching one failover second by second — then breaking it.

▶ One failover, second by second

Power off the primary in a drill window and watch what is automatic, and what is not.

① Healthy: primary 10.20.5.11 serves · replication → secondary 10.20.5.12 · heartbeat ✓
② Heartbeat lost: primary down → heartbeat missed · secondary waits its threshold
③ Takeover: secondary promotes itself · replicated feature DBs come online
④ Verify (runbook): DNS/address swing · SignAppIn 200 · checkout + rotate test cred

Meera at ICICI faces this

Quarterly DR drill: the team powers off the primary U-Series appliance. The secondary takes over, BeyondInsight loads — but the Password Safe menu is empty. No managed systems, no managed accounts. The HA dashboard said Healthy all year.

Likely cause

Password Safe was licensed and enabled months AFTER the HA pair was configured. U-Series HA replicates only the databases of features enabled at pairing time — the Password Safe database never joined replication, and nothing alarms about it.

Diagnosis

On the U-Series appliance management software, compare the enabled-features list against the date HA was configured; confirm the Password Safe database exists on the primary but is absent on the secondary.

U-Series Appliance Management > High Availability (paired features) · BeyondInsight > Configuration
Fix

Re-establish the HA pairing now that all features are enabled, let the initial replication complete fully, then schedule a fresh drill window.

Verify

Power off the primary again inside the window: the secondary must show managed systems and accounts, one test checkout must succeed, and a lab credential must rotate cleanly.

VERIFY — the quarterly failover drill (steal this checklist)

Announce the window → power off (do not gracefully migrate) the primary → confirm takeover → run smoke tests: POST Auth/SignAppIn returns 200, GET Configuration/Version answers, one test-account checkout succeeds, one RDP session rides the proxy on 4489, one lab credential rotates → fail back → write the actual timings next to the RTO you promised management. A failover you have never rehearsed is a diagram, not a capability.

Password Safe REST API — post-failover smoke test (any REST client)
POST https://ps.hcl-lab.in/BeyondTrust/api/public/v3/Auth/SignAppIn
Authorization: PS-Auth key=c0ffee9a...e1; runas=HCL\svc-drcheck;

GET https://ps.hcl-lab.in/BeyondTrust/api/public/v3/Configuration/Version
Expected output
HTTP/1.1 200 OK            <- SignAppIn: session established on the NEW active node
HTTP/1.1 200 OK
{ "Version": "25.2.0.1234" }
-> vault is answering after takeover; now run one checkout + one rotation test
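
The same two calls can be scripted so the drill produces a repeatable result. Here is a minimal sketch using only the Python standard library. The base URL and header shape are copied from the request above. The function names are hypothetical, and carrying the session in a cookie jar is an assumption to check against the API guide for your version.

```python
import http.cookiejar
import json
import urllib.request

BASE = "https://ps.hcl-lab.in/BeyondTrust/api/public/v3"  # lab hostname from the lesson


def auth_header(api_key: str, run_as: str) -> str:
    """Build the PS-Auth header in the same shape as the request above."""
    return f"PS-Auth key={api_key}; runas={run_as};"


def smoke_test(api_key: str, run_as: str, base: str = BASE) -> str:
    """Sign in, then read the version. Raises on any non-2xx response."""
    # Assumption: the session is carried by a cookie, so keep a cookie jar.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    headers = {"Authorization": auth_header(api_key, run_as)}

    sign_in = urllib.request.Request(f"{base}/Auth/SignAppIn", data=b"",
                                     headers=headers, method="POST")
    with opener.open(sign_in, timeout=10):
        pass  # urllib raises HTTPError on 4xx/5xx, so reaching here means success

    version = urllib.request.Request(f"{base}/Configuration/Version", headers=headers)
    with opener.open(version, timeout=10) as resp:
        return json.loads(resp.read())["Version"]
```

Run from the drill jump host, smoke_test(key, r"HCL\svc-drcheck") should return the version string when both calls succeed. Log the timestamp next to it for the RTO record.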
Quick check · Q2 of 10

A U-Series HA pair replicates…

Correct: c. Replication covers the feature databases that existed in the pairing at setup time. Enable a new feature later and its DB silently stays out until you re-establish HA — the root cause of Meera's empty Password Safe menu. 'Every byte' and 'restore at failover' both misread the model; recordings are not the special case.

Pause & Predict

Predict: you enabled Secrets Safe last month; the HA pair was configured last year. The primary dies tonight. What exactly is missing tomorrow? Type your guess.

Answer: Secrets Safe data on the secondary — its database never joined replication because it was enabled after pairing. Everything paired in (BeyondInsight assets, Password Safe if it predates the pairing) fails over fine, which makes the gap easy to miss. Fix before the disaster: re-establish HA after enabling any new feature.

③ Backups & upgrades — copies you can restore, windows you survive

Two backup species, two jobs. A config backup captures settings, policies and wiring — small, fast, take it before every change window. A full backup carries the databases: managed systems, accounts, the credential store, audit history, recordings metadata — large, scheduled, and the thing you rebuild a vault from. The U-Series Business Continuity guidance and every grown-up DR standard agree on placement: backups do not live on the appliance. Encrypted copies go to separate storage, ideally a second site — because whatever kills the appliance (fire, ransomware, a disk controller with opinions) kills anything stored on it in the same instant.

COMMON MISTAKE — the backup that dies with the patient

Symptom: after an appliance failure, the team goes to restore and discovers every backup lived on the appliance's own disk — same blast radius, all gone. Second symptom, quieter: backups exist off-box but nobody has ever restored one, and the first restore attempt happens during the outage. Fix: off-appliance encrypted storage at a second site, plus a calendar entry that restores one backup into a lab every quarter. A backup you never restored is a rumour.

Upgrades are the other planned emergency. The field-tested order for an on-prem U-Series estate: (1) suspend HA failover first — otherwise a mid-upgrade reboot looks exactly like a dead primary and the pair flips underneath you; (2) update SUPI (the update installer itself) before the appliance software; (3) upgrade the appliance/BeyondInsight + Password Safe; (4) resume HA and verify replication; (5) then the outer ring — Resource Brokers, agents and Jump Clients (PRA pushes client auto-upgrades in bandwidth-throttled waves), and desktop consoles last. Servers before agents, agents before consoles.
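
The order above can be checked mechanically before a change window is approved. Here is a small sketch, assuming the runbook is written as a list of step names. The step identifiers and the function name are made up for illustration and do not come from BeyondTrust tooling.

```python
from typing import Optional

# The five-step order from the paragraph above, with consoles split out as the final step.
UPGRADE_ORDER = [
    "suspend_ha_failover",
    "update_supi",
    "upgrade_appliance_software",
    "resume_ha_and_verify_replication",
    "upgrade_brokers_agents_jump_clients",
    "upgrade_desktop_consoles",
]


def first_misplaced_step(plan: list[str]) -> Optional[str]:
    """Return the first step that runs after a step it should have preceded."""
    rank = {step: i for i, step in enumerate(UPGRADE_ORDER)}
    highest_seen = -1
    for step in plan:
        if step not in rank:
            raise ValueError(f"unknown step: {step}")
        if rank[step] < highest_seen:
            return step  # this step belongs earlier in the plan
        highest_seen = rank[step]
    return None


bad_plan = ["update_supi", "suspend_ha_failover", "upgrade_appliance_software"]
print(first_misplaced_step(bad_plan))  # suspend_ha_failover
```

A plan that returns None follows the documented sequence. Anything else names the step to move earlier.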

Version policy is part of the plan, not trivia: direct upgrades to BI/PS 25.2 are supported from 23.2 or later — older estates hop through an intermediate version first — and the platform wants SQL Server 2016 SP2+. Watch deprecations in the release notes too (mTLS is being phased out; client certificates are being retired as an API auth method). And remember the patching split from the CVE lesson: cloud tenants were auto-patched on 2024-12-16 during the CVE-2024-12356 emergency, while on-prem owners applied the fix themselves via the /appliance interface. Self-hosted means you own the patch SLA.
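
The direct-upgrade rule is a version comparison, and it is easy to get wrong with string comparison ("23.10" sorts before "23.2" as text). Here is a minimal sketch, assuming version strings shaped like the "25.2.0.1234" shown in this lesson. The helper names are hypothetical.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Turn '23.2.1.456' into (23, 2); a missing minor part counts as 0."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def can_upgrade_directly(current: str, minimum: str = "23.2") -> bool:
    """True if the current version meets the 23.2+ floor for a direct hop to 25.2."""
    return parse_major_minor(current) >= parse_major_minor(minimum)


print(can_upgrade_directly("22.4.0.100"))  # False: hop through an intermediate version
print(can_upgrade_directly("23.2.1.456"))  # True
```

Comparing integer tuples instead of strings keeps two-digit minor versions in the right order.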

🖥️ On-prem patches land here, not in /login — B Series Appliance → /appliance → Updates. Subscribe to the auto-update service so emergency fixes arrive like cloud's did. (Recreated for clarity — your console matches this.)
pra.hcl-lab.in/appliance · Updates
1. Current version: PRA 24.3.4
2. Update available: PRA 25.1.1 (security update)
3. Auto-update (btupdate.com): Enabled, outbound 443
Button: Install This Update

Rahul at Infosys faces this

The morning after the PRA appliance upgrade (24.1.4 → 24.2.3), 40 of 100 Jump Clients show Active [Offline]. On the affected machines the old client is uninstalled and the new one never appeared.

Likely cause

The EDR (SentinelOne, in the original field report) blocked the upgrade's stop-service → uninstall → install → start sequence mid-flight, leaving those endpoints with no client at all.

Diagnosis

Windows Event Viewer on an affected endpoint shows the service stop and then nothing; the EDR console shows blocked installer executions timestamped to the upgrade wave.

/login > Status (Jump Client list, sort by last-seen) · endpoint Event Viewer + EDR console
Fix

Whitelist the new installer hash in the EDR BEFORE the next upgrade; redeploy the offline installer from the appliance Update tab to the 40 orphans. One admin reported 100% clean upgrades after also logging off all console users before starting.

Verify

All 100 clients report online on the new version; duplicate entries cleaned by sorting on last-seen; the next upgrade gets a 5-endpoint pilot ring before the full wave.

TIP — certificates count as maintenance too

Replacing the appliance SSL certificate looks harmless until hundreds of Jump Clients drop offline: docs say allow 24–48 hours for clients to pick up a changed certificate. Never combine a cert swap with a hostname change or an upgrade in the same window — when three things change at once, you cannot tell which one broke the fleet.

PowerShell — proxy-port reachability after failover or upgrade (run from an admin jump host)
Test-NetConnection ps-secondary.hcl-lab.in -Port 4489   # RDP proxy
Test-NetConnection ps-secondary.hcl-lab.in -Port 4422   # SSH proxy
Expected output
ComputerName     : ps-secondary.hcl-lab.in
RemoteAddress    : 10.20.5.12
RemotePort       : 4489
TcpTestSucceeded : True
(repeat shows 4422 True — both session-proxy listeners answering on the new active node)
👉 So far: backups live off the box and get restore-tested; upgrades go SUPI-first with HA failover suspended. Next: the day none of this is enough.
Quick check · Q3 of 10

Sneha at Wipro opens a 4-hour window to upgrade an on-prem U-Series HA pair to 25.2. Her first TWO moves?

Correct: a. Suspend HA failover or a mid-upgrade reboot looks like a dead primary and the pair flips underneath you; then SUPI updates before the appliance software (documented U-Series sequencing). Consoles and agents come AFTER the server side, and disabling backups before risky change is exactly backwards.

Pause & Predict

Predict: a critical RS/PRA CVE drops tonight. Whose estate is patched by morning — the cloud tenant's or the on-prem team's — and why? Type your guess.

Answer: The cloud tenant's. In the December 2024 incident (CVE-2024-12356), BeyondTrust auto-patched all SaaS instances on 2024-12-16 — cloud customers woke up fixed. On-prem owners had to apply the patch themselves via the /appliance interface. Self-hosting trades control for owning the patch SLA — say that sentence in interviews.

④ DR thinking — the vault is tier-0, plan for the day it is gone

Here is the trap your own success builds: once PAM is rolled out properly, no human knows any privileged password — they are vaulted, rotated, injected. Brilliant on a normal Tuesday. Catastrophic logic on disaster day: if PAM is down, nobody can log in to fix anything — including the systems PAM runs on. AD recovery needs credentials that live in the vault; the vault may need AD to authenticate its admins. That circular dependency is why the vault is tier-0 infrastructure, same shelf as your domain controllers.

Figure 3 — HA ≠ backup ≠ DR — three failures, three answers
Three columns compare HA pair, backup with cold spare, and DR with break-glass. For each: what it saves you from, typical recovery time objective, recovery point objective, and what it does not save you from. HA covers a dead box in minutes but replicates corruption. Backups cover bad data with hours of recovery and last-backup data loss. DR plus break-glass covers a dead site and a dead vault, letting humans in when everything else is gone.
Column by column: HA fixes a dead box in minutes but faithfully replicates corruption; backups roll you back at the cost of hours and rotated-after-backup passwords; DR + break-glass is the only layer that still works when the vault itself is gone.

The answer is the oldest control in banking, done digitally: break-glass. Keep a tiny set of emergency accounts (a domain admin, a hypervisor root, the vault's own local admin) whose credentials live in an offline escrow that does not depend on the vault. The classic form is sealed envelopes in a physical safe — the modern form is the same ceremony, digital: an encrypted file or offline password store whose unlock is split between two custodians, exactly like an SBI bank locker needing your key and the manager's key. Three non-negotiables: alarmed (any use pages the SOC — the fire-alarm glass box rings when smashed), rotated after every use (the envelope is single-use), and tested on a schedule — a sealed envelope with last year's password inside is theatre.
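
One way to picture the two-custodian split is an XOR split: each custodian holds a share that is useless alone, and both are needed to rebuild the secret. This is only an illustration of dual control, not a mechanism the lesson or BeyondTrust prescribes. The function names and the sample secret are hypothetical, and a real escrow would add encryption at rest and tamper-evident storage.

```python
import secrets


def split_secret(secret: bytes) -> tuple[bytes, bytes]:
    """Split a secret into two shares; either share alone reveals nothing."""
    share_a = secrets.token_bytes(len(secret))               # random pad for custodian A
    share_b = bytes(x ^ y for x, y in zip(secret, share_a))  # secret XOR pad for custodian B
    return share_a, share_b


def join_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Rebuild the secret when both custodians present their shares."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(x ^ y for x, y in zip(share_a, share_b))


a, b = split_secret(b"emergency-admin-password")
assert join_shares(a, b) == b"emergency-admin-password"
```

After any use, rotate the account and generate fresh shares, matching the single-use envelope rule above.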

COMMON MISTAKE — restore ≠ done: the stale-password trap

Symptom: vault restored from last night's backup, dashboards green — then rotation jobs and checkouts start failing with wrong-password errors. Cause: every account that rotated AFTER the backup point now has a different real password than the vault remembers. That gap is your RPO, made concrete. Fix: enable Check Password / scheduled password tests to detect mismatches, then reconcile — re-rotate via the functional account, or Reset on Mismatch where configured. Budget reconciliation time into the RTO you promise.

Password Safe REST API — find accounts rotated after the backup point (RPO check)
GET https://ps.hcl-lab.in/BeyondTrust/api/public/v3/ManagedAccounts
# filter client-side: LastChangeDate newer than the restored backup's timestamp
Expected output
HTTP/1.1 200 OK
[ { "AccountName": "adm-db01",  "LastChangeDate": "2026-06-10T02:14:00Z" },
  { "AccountName": "root-web04", "LastChangeDate": "2026-06-10T03:02:00Z" } ]
-> both rotated AFTER last night's 01:00 backup: vault now holds stale secrets
-> queue these for password test + re-rotation before declaring recovery done
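
The client-side filter mentioned above could look like the following. It is a minimal sketch, assuming the response fields shown in the sample output (AccountName, LastChangeDate) and a backup taken at 01:00 UTC. The third account and the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO timestamp; older Pythons reject a trailing 'Z', so normalise it."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def rotated_after(accounts: list[dict], backup_time: datetime) -> list[str]:
    """Names of accounts whose last rotation is newer than the restored backup."""
    return [a["AccountName"] for a in accounts
            if parse_utc(a["LastChangeDate"]) > backup_time]


accounts = [
    {"AccountName": "adm-db01", "LastChangeDate": "2026-06-10T02:14:00Z"},
    {"AccountName": "root-web04", "LastChangeDate": "2026-06-10T03:02:00Z"},
    {"AccountName": "svc-old", "LastChangeDate": "2026-06-09T22:30:00Z"},  # hypothetical
]
backup = datetime(2026, 6, 10, 1, 0, tzinfo=timezone.utc)

print(rotated_after(accounts, backup))  # ['adm-db01', 'root-web04']
```

The length of that list is a concrete measure of the reconciliation work hidden inside the RPO.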

Now the management conversation, because DR is a business decision wearing a technical costume. RTO: how many hours can the company run with no checkouts, no brokered sessions, no rotations? Frozen change windows, stalled vendor access, engineers locked out — that is money per hour. RPO: how many hours of rotations and audit trail can you lose — knowing each lost rotation is a reconciliation job? The shapes map cleanly: HA pair buys minutes of RTO and near-zero RPO; cold spare buys hours and last-backup; backup-only means a day or more. Present those three rows with prices and let management pick — then get the chosen numbers signed. An RTO that lives only in your head is yours to be blamed for; one signed in writing is a budget.

Quick check · Q4 of 10

2 AM: the vault is fully down and a domain controller needs an urgent fix. What gets Karthik in?

Correct: d. Only an escrow OUTSIDE the vault works when the vault is gone. The API rides the same dead appliance as the GUI; support never holds your secrets; and the functional account is the vault's internal rotation worker — its password is managed by (and inside) the very system that is down.

Pause & Predict

Predict: you restore the vault from last night's 01:00 backup. This morning 60 accounts rotated on schedule before the crash. What is silently broken, and what is the fix? Type your guess.

Answer: The vault now holds yesterday's passwords for those 60 accounts — the targets moved on at rotation time, so checkouts and rotation jobs will fail with mismatches. Detect with Check Password / the Password Test Agent, then re-rotate via the functional account (or Reset on Mismatch where configured). That reconciliation window is your RPO in real life — quote it to management.
Figure 4 — HA/DR planning card
A six-box quick reference card to pin above your desk.

HA PAIR FACTS
• 2 U-Series appliances, active/passive
• heartbeat stops → secondary takes over
• replicates ONLY features enabled at pairing time, so pair LAST
• PRA: /login > Management > Failover · Cluster (Atlas)
• Atlas: 443 BOTH ways between nodes, sync rides the Internal Address

UPGRADE ORDER (on-prem)
1. read release notes + supported path (25.2 needs 23.2+ · SQL 2016 SP2+)
2. SUSPEND HA failover first
3. update SUPI, then appliance software
4. resume HA, verify replication
5. brokers / agents / Jump Clients (throttled waves) → consoles last
EDR: whitelist installers BEFORE

BACKUPS
• config backup = small, frequent
• full backup = DBs + recordings
• NEVER on the appliance: encrypted, off-box, second site
• a restore you never tested = a rumour
• on-prem patches: /appliance interface
• PRA recordings: 90 days on box → Integration Client (.flv/.xml) export

QUARTERLY DRILL
• announce window · power off primary
• SignAppIn → 200 · Configuration/Version
• 1 checkout · RDP 4489 · SSH 4422
• rotate ONE lab credential
• fail back · write the timings down
• never drilled = does not work

BREAK-GLASS RULES
• lives OUTSIDE the vault (offline escrow / sealed envelope, digital ok)
• dual control: two custodians, like a bank locker's two keys
• alarmed on use · rotated after use
• tested twice a year, logged

RTO / RPO, IN WRITING
RTO: how long can checkouts + sessions stay down? ____ hours
RPO: how many rotations/audit entries can we lose? ____ hours
signed by: management, not assumed by the PAM admin

HA saves a box · backups save the data · break-glass saves the humans
The one-card summary: HA facts, upgrade order, backup rules, the quarterly drill, break-glass non-negotiables, and the RTO/RPO blanks management must fill in and sign.
TIP — say it like an SRE in the interview

One closing sentence that lands: 'I treat the vault as tier-0 — HA pair for box failure, off-appliance restore-tested backups for data failure, offline dual-control break-glass for vault failure, and RTO/RPO numbers that management signed, not numbers I assumed.' That is the whole lesson in 35 seconds.


📝 Wrap-up assessment — six more

You've answered 4 inline. Six left. 70% (7 of 10) marks the lesson complete on your profile. Tap Submit all answers at the end.

Q5 · Remember

In a U-Series HA pair, what tells the secondary appliance to take over?

Correct: a. The primary sends a heartbeat; when it stops, the secondary takes over — silence is the trigger. DNS probes, Jump Client broadcasts and SQL alarms play no role in the U-Series HA decision (Jump Clients belong to PRA anyway).
Q6 · Apply

Karthik at Wipro has a 4-hour window to take an on-prem U-Series HA pair to 25.2. Which sequence is right?

Correct: d. Suspend failover first (a mid-upgrade reboot looks like a dead primary and the pair flips), SUPI before appliance software per U-Series docs, server side before the outer ring of brokers/agents/consoles. Options a and c invert the order; b invites a mid-upgrade failover.
Q7 · Apply

Priya at Flipkart must choose where nightly Password Safe backups live. Which option survives BOTH an appliance-room disaster and a compromise of the appliance?

Correct: b. Anything ON the appliance shares its blast radius (options a and c die with the box); the same-rack NAS survives a software failure but not the room, and unencrypted vault backups are a breach of their own. Off-box, off-site, encrypted, restore-tested is the only answer that covers both failure modes.
Q8 · Analyze

During Meera's failover drill at ICICI the secondary comes up, BeyondInsight loads, but Password Safe shows zero managed systems. The HA dashboard says Healthy. Most likely root cause?

Correct: c. U-Series HA replicates only the databases of features enabled at pairing time — a later-enabled feature stays out while the pair itself reports Healthy. Licensing does not strip data (a), a long heartbeat delays takeover but the takeover happened (b), and brokers are a PS Cloud concept irrelevant to an on-prem pair (d).
Q9 · Analyze

Sneha designs break-glass for TCS: two emergency domain-admin accounts, passwords stored as secrets inside Password Safe behind a strict approver policy. What is the design flaw?

Correct: b. Break-glass exists precisely for the day the vault is down — storing it inside the vault is a circular dependency that fails exactly when needed. Two accounts is fine (redundancy), approver policies apply to any managed account, and domain admins are routinely vaulted; the flaw is location, not count or policy.
Q10 · Evaluate

Management gives Aditya budget for ONE of: a second appliance (HA pair) OR a mature backup + break-glass program. The estate has frequent config change and a strict audit. Which reasoning is strongest?

Correct: d. HA without backups leaves the worst failures (logical corruption — which replicates) unrecoverable, and no break-glass means a vault outage locks every admin out. Backups + escrow cover box death too, just with a slower RTO; add HA next cycle. Cloud (b) moves patching, not RTO/RPO accountability, and improvised snapshots (c) are untested restores of a hardened appliance.
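The reasoning can be checked as a coverage matrix: which failure modes does each control address? This is a sketch of the lesson's argument; the labels are shorthand invented here, and the mapping reflects only the claims made in the answer above.

```python
# Failure modes each control addresses, per the reasoning in the answer.
COVERS = {
    "ha_pair": {"box_death"},
    "backups": {"box_death", "logical_corruption"},  # box death at a slower RTO
    "break_glass": {"vault_outage_lockout"},
}
FAILURES = {"box_death", "logical_corruption", "vault_outage_lockout"}


def uncovered(controls):
    """Return the failure modes that the chosen controls leave unaddressed."""
    covered = set().union(*(COVERS[c] for c in controls))
    return sorted(FAILURES - covered)
```

With only `"ha_pair"` the function returns the two worst failures; with `"backups"` and `"break_glass"` together it returns an empty list, which is why option d wins on a single budget line.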

🧠 In your own words

Type your answer. Your CISO asks: "If the Password Safe appliance room floods tonight, exactly how do admins log in tomorrow morning?" Answer in three steps, then compare with the expert version.

Expert version:
Step 1: open the break-glass escrow under dual control (two custodians, offline copy, SOC alerted by design, every access logged).
Step 2: use those emergency credentials to stabilise tier-0 (AD, hypervisor) and stand the vault back up, on the HA secondary if it survived, otherwise by restoring last night's off-site backup onto the cold spare via the appliance interface.
Step 3: reconcile before declaring victory. Run Check Password to find accounts rotated after the backup point, re-rotate them through the functional account, rotate the break-glass credentials themselves, and write up the measured timeline against the signed RTO/RPO for the audit.
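The final part of Step 3, measuring the timeline against the signed RTO/RPO, is simple date arithmetic. A minimal sketch for a restore from backup: the function and its parameter names are assumptions, and the timestamps would come from your own incident log.

```python
from datetime import datetime, timedelta


def measure_recovery(last_backup, outage_start, service_restored, rto, rpo):
    """Compare a restore-from-backup timeline with the signed objectives."""
    actual_rto = service_restored - outage_start  # how long the vault was down
    actual_rpo = outage_start - last_backup       # window of lost rotations and audit data
    return {
        "actual_rto": actual_rto,
        "actual_rpo": actual_rpo,
        "rto_met": actual_rto <= rto,
        "rpo_met": actual_rpo <= rpo,
    }
```

For example, with a backup at 23:00, an outage at 02:00 and service back at 08:00, a 4-hour RTO is missed (6 hours down) while a 24-hour RPO is met (3 hours of data at risk).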

🗣 Teach a friend

Best way to lock it in: explain it in one line to a teammate.

📖 Glossary

U-Series appliance
BeyondTrust's hardened appliance (physical or virtual) running BeyondInsight + Password Safe on-prem.
HA pair
Two appliances in active/passive with database replication; the secondary takes over on heartbeat loss.
Heartbeat
The periodic signal from the primary appliance — its absence is what triggers failover.
Cold spare
A standby appliance restored from backup when the primary dies — hours of RTO, last-backup RPO.
Resource Broker
On-prem worker for Password Safe Cloud (auth, discovery, rotation, session proxy) — dials out on 443 only.
Resource zone
A group of brokers serving one network segment; Default zone is built-in; 2+ brokers per zone recommended.
Atlas cluster
PRA scale-out: one primary node owns configuration, traffic nodes carry regional sessions, 443 open both ways.
SUPI
The U-Series software update installer package — update it first, before the appliance software.
/appliance
The B Series appliance/OS management interface where on-prem patches are applied — separate from /login.
RTO
Recovery Time Objective — how long the service may stay down before the business hurts.
RPO
Recovery Point Objective — how much data (rotations, audit trail) you can afford to lose.
Break-glass account
Emergency credential kept OUTSIDE the vault — dual-controlled, alarmed on use, rotated after every use.

📚 Sources

  1. BeyondTrust U-Series Deployment & Failover Guide — appliance roles, HA pair, replication + heartbeat. docs.beyondtrust.com/bips/docs/u-series-deployment-and-failover-guide
  2. BeyondTrust U-Series Business Continuity Guide — backup, restore and continuity planning. docs.beyondtrust.com/bips/docs/u-series-business-continuity
  3. BeyondTrust PRA Atlas Cluster Guide — primary + traffic nodes, bidirectional 443, public vs internal addresses. docs.beyondtrust.com/pra/docs/atlas
  4. Password Safe Cloud Resource Broker Installation — zones, the 9 broker services, outbound-443 model, sizing. docs.beyondtrust.com/bips/docs/ps-cloud-resource-broker-install
  5. BeyondInsight and Password Safe 25.2 Release Notes — direct upgrade from 23.2+, SQL Server 2016 SP2+, deprecations. docs.beyondtrust.com/bips/changelog/beyondinsight-and-password-safe-25-2-release-notes
  6. BeyondTrust advisory BT24-10 (CVE-2024-12356) + CISA KEV — cloud auto-patched 2024-12-16; on-prem patches via /appliance. beyondtrust.com/trust-center/security-advisories/bt24-10
  7. BeyondTrust PRA SSL Certificates guide — allow 24–48 h for Jump Clients to pick up a changed certificate. docs.beyondtrust.com/pra/docs/on-prem-ssl-certificates
  8. PeerSpot — BeyondTrust Password Safe reviews: lengthy upgrades, suspend-HA-before-upgrade field practice. peerspot.com/products/beyondtrust-password-safe-pros-and-cons
  9. BeyondTrust Beekeepers community — Jump Clients offline after appliance upgrade (EDR blocked reinstall). beekeepers.beyondtrust.com/general-51/jump-clients-offline-5503

What's next?

The vault survived the disaster drill — now for the everyday fires: rotations failing, sessions dropping, brokers sulking. Next lesson is the troubleshooting playbook — symptom to root cause to fix, the way interviews want it.