Q: In an ansible-lockdown CIS role, what does setting run_audit: true with audit_only: true do?

Answer: run_audit produces the goss compliance report and audit_only makes the run check-only, so the role scores the host and writes a report without remediating anything. It doesn't rebuild hosts, it doesn't enforce anything (audit_only is the opposite of enforcing), and it doesn't disable SSH.

Q: You want to harden only the SSH controls on a single canary host first, changing nothing else. Which command is correct?

Answer: run the playbook with --limit canary --tags ssh. --limit canary scopes the run to one host and --tags ssh runs only the SSH-tagged tasks. --skip-tags ssh does the opposite; --tags level2-server runs the stricter Level 2 set (not just SSH); and running with no flags against the whole inventory is the blind, fleet-wide run that causes outages.

Q: A CIS control disables the SFTP subsystem in sshd. After you enforce it, your next ansible-playbook run can't copy files to the host. What's the right fix?

Answer: Ansible copies files over SFTP by default, so disabling the SFTP subsystem breaks file transfer. Setting scp_if_ssh = True in ansible.cfg makes Ansible use SCP instead (or keep SFTP and except that control). Abandoning Ansible or disabling SSH defeats the purpose, and rebooting doesn't change the sshd config.

Q: Mid-enforce over SSH, your live session freezes and you can't reconnect. All SSH connections to the box, including Ansible's, are gone. Which CIS change is the most likely culprit, and what's the safe recovery path?

Answer: SSH cipher/MAC hardening. It can remove the very algorithm your live session is using, cutting you off. The safe recovery is an out-of-band console (cloud serial console, iLO or iDRAC), not SSH, which is down; then loosen or except the cipher control. The other options either can't cut SSH like this or assume an SSH path that no longer works.

Q: Two ways to roll out CIS to 100 prod servers: (A) run the full role with no tags, all Level 1 + Level 2 at once, to 'just get compliant fast'; (B) audit-only first, review the diff, except the app-breakers with tickets, enforce Level 1 on a canary and then the fleet in a window, re-run to changed=0, and schedule it in AWX. Which is stronger and why?

Answer: B is the safe, provable, auditable path. Audit-then-review catches breakers before they hit, exceptions are documented, a canary plus a window limits the blast radius, changed=0 is real compliance evidence, and AWX scheduling stops drift. A is exactly how teams lock themselves out and break apps; Level 2 controls absolutely can break things; and the two paths do NOT end the same way, because A risks an outage and leaves no evidence trail.

Ansible for CIS Hardening: Securing 100 Servers

Q: You enforce a CIS role; the first run reports changed=40. You re-run the identical play and it reports changed=11, not 0. What is the most likely explanation?

Answer: A non-zero second run means the end state isn't stable. Either a task re-applies on every run (it isn't idempotent), or another process or cron job reverts a setting between runs. That's your signal to find the specific 'changed' tasks and fix them. It has nothing to do with the control count, the run isn't fully compliant yet, and Ansible's changed count is deterministic, not random.

Most engineers think…

Most engineers think CIS hardening with Ansible is a one-time, one-button job: "point the role at the servers, hit enforce, done — they're compliant forever."

Wrong — and that mindset is how teams lock themselves out of SSH and break production apps on a Friday evening. Real CIS automation is a loop, not a button: audit-only first to score where you stand, review which of the hundreds of controls will change, exclude the few that break your app (with a written exception), enforce in a maintenance window, then re-run to prove idempotence — a clean second run with zero changes is your evidence of compliance. And because servers drift, you schedule the re-run (e.g. in AWX) so the baseline self-heals.

① Why automate hardening — the pain of doing CIS by hand

Meet Sneha, an L2 engineer at Infosys. Her audit team hands her a line that sounds simple: "bring all 100 Linux servers up to the CIS Benchmark." Then she opens the benchmark PDF. The CIS Ubuntu 22.04 Benchmark v2.0.0 has 244 individual controls, and the RHEL 9 v2.0.0 and Ubuntu 24.04 benchmarks each have hundreds of their own. Each control is a small edit: a line in sshd_config, a password-policy value, a file permission, a kernel parameter, an auditd rule, a service to disable.

Doing that by hand on 100 boxes is four problems stacked on top of each other. Slow: even five minutes per control per server adds up to roughly 2,000 hours of clicking and SSHing (244 controls × 100 servers × 5 minutes). Inconsistent: Sneha sets MaxAuthTries 4 on 97 servers and fat-fingers 14 on three. Unprovable: when the auditor asks "prove server-42 is compliant," she has no evidence except "I think I did it." And then there's the quiet killer, drift: a teammate edits sshd_config by hand next week to debug something and forgets to revert, and now the server silently falls out of compliance with nobody watching.
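To spot the 'Inconsistent' problem on a live fleet, a quick spot check is enough. The sketch below assumes you have already collected each host's effective setting with an ad-hoc command such as ansible all -i prod_inventory -b -m shell -a "sshd -T | grep -i '^maxauthtries'". The inventory name is hypothetical. By default, that command prints a "host | CHANGED | rc=0 >>" header above each host's output. The helper weak_maxauthtries is a made-up name, not part of Ansible or any CIS role; it only parses that text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag hosts whose effective MaxAuthTries exceeds a limit.
# Input (stdin): ad-hoc output where each host's block starts with a line like
#   "web-mum-01 | CHANGED | rc=0 >>"  followed by  "maxauthtries 4"  (from sshd -T).
weak_maxauthtries() {
  local limit="${1:-4}"   # CIS expects 4 or less
  awk -v limit="$limit" '
    # Remember which host the following output lines belong to.
    / [|] (CHANGED|SUCCESS) [|] rc=/ { host = $1 }
    # sshd -T prints lowercase keys, e.g. "maxauthtries 14".
    tolower($1) == "maxauthtries" && ($2 + 0) > (limit + 0) { print host " " $2 }
  '
}
```

When you pipe the ad-hoc output into weak_maxauthtries, it prints one "host value" line per non-compliant box, which would surface the three fat-fingered servers in Sneha's story. It is only a point-in-time check. The role-based approach that follows is what makes the result repeatable.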

👉 So far: a CIS Benchmark is hundreds of small controls, and hand-applying them across a fleet is slow, inconsistent, unprovable and drifts. Next: what Ansible changes about all four.

Here is the shift. Compliance-as-code with Ansible expresses the entire CIS baseline as code once. You run it against all 100 servers in parallel, so 'slow' becomes minutes. Because it's the same code everywhere, 'inconsistent' disappears — every box gets MaxAuthTries 4, full stop. And the move that makes Sneha's auditor happy is idempotence: a second run that reports zero changes is proof the fleet already matches the baseline. Drift is caught and corrected on the next scheduled run.

Figure 1 — By hand vs Ansible — the same CIS baseline across a fleet

Left (red) = each server hardened by hand, so values drift and there's no proof. Right (green) = one role applied to all, identical values, and an idempotent re-run that proves compliance.

The four pains of hand-hardening

These are exactly the problems an auditor (and a CIS interview) starts from.

🐌 Slow: Hundreds of controls × dozens of servers, done by hand. So: it's never finished, and re-doing it after a rebuild hurts.

🎲 Inconsistent: One typo and 'MaxAuthTries 4' becomes '14' on three boxes. So: 'compliant' servers quietly aren't.

🕵️ Unprovable: No report, just 'trust me'. So: when the auditor asks for evidence, you have none.

💨 Drift: A later manual edit slips the box out of baseline. So: compliance rots silently until the next audit.

Daily-life analogy — the society gate-pass register vs a printed master list

Hardening by hand is like each gate's guard writing the visitor rules from memory in their own notebook: slightly different in each tower, and impossible to audit. Ansible is the printed master gate-pass policy that the society office prints once and posts at every gate: identical rules everywhere, and you can walk to any gate and check that the printout matches. Re-printing it next month (the re-run) instantly catches any guard who quietly changed a rule.

Quick check · Q1 of 10

Rahul at TCS hardened 80 servers by hand last quarter. The auditor now asks: "prove server-57 still matches the CIS SSH baseline today." Why is an idempotent Ansible re-run the strongest answer?

a) A re-run that reports 0 changed proves the host already matches the coded baseline, and self-heals it if it drifted
b) Because Ansible deletes the server and rebuilds it from scratch each time
c) Because hand edits are always more reliable than code
d) Because the auditor can't read Ansible output

Correct: a. Idempotence is the whole point: re-running the role and getting 'changed=0' is machine-checkable evidence that the host matches the baseline, and any drift would show as 'changed' and be corrected. Ansible doesn't rebuild the host, hand edits are exactly what causes drift, and the report is meant to be read by the auditor.

Pause & Predict

Predict: you harden 100 servers with a role today and they all pass. Six weeks later, with nobody touching Ansible, name ONE reason a few servers could fall out of compliance, and the one habit that catches it.

Answer: Drift. Someone SSHes in to debug an app and edits sshd_config, a package update resets a default, or a new service writes a world-readable file. Nothing in Ansible changed, but the box no longer matches. The habit that catches it: schedule the role to re-run on a cadence (e.g. nightly/weekly in AWX). The next run reports the drifted controls as 'changed' and corrects them — so the baseline self-heals instead of rotting until the next audit.
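A scheduled dry run is one way to see drift before anything gets corrected. The sketch below is hypothetical glue, not an AWX feature. It reads the PLAY RECAP of an ansible-playbook ... --check run and prints every host whose changed= count is above zero, which means every host that no longer matches the baseline. The playbook, inventory and tag names are the ones used elsewhere in this lesson; the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list hosts that drifted, given recap lines such as
#   "web-mum-01  : ok=66  changed=3  unreachable=0  failed=0  skipped=120"
drifted_hosts() {
  awk '
    / : ok=/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^changed=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) print $1   # first field is the host name
        }
      }
    }
  '
}

# Example nightly check (from cron, or better, an AWX schedule):
#   ansible-playbook site.yml -i prod_inventory --check --tags level1-server | drifted_hosts
```

An empty result means every host still matches. Any name it prints is a host that the next enforcing run would report as 'changed' and correct.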

② The building blocks — a maintained CIS role, toggles & tags

You could hand-write 244 tasks yourself — and you'll learn a lot doing it once — but for a real fleet most teams start from a maintained role. The best-known is the open-source ansible-lockdown project — roles like UBUNTU22-CIS, UBUNTU24-CIS, RHEL8-CIS and RHEL9-CIS. Each role already maps every CIS control to a task, so you spend your judgement on which controls to apply and which to skip, not on re-writing the benchmark.

The first building block is a toggle per control. Every CIS rule has its own boolean in defaults/main.yml, named after the rule number: on the Ubuntu 22 role it's ubtu22cis_rule_X_X_X, on RHEL 9 it's rhel9cis_rule_5_2_1 and so on. Set a toggle to false and that single control is skipped — this is exactly how you carve out a control that would break your app, with a comment recording why. There are also master switches: run_audit turns on the built-in compliance check, and system_is_audit / audit_only make the run check-only instead of changing anything.
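A false toggle without a recorded reason looks exactly like a mistake, so it helps to lint your inventory for them. This is a minimal sketch that makes two assumptions. First, your overrides live in .yml or .yaml files under the inventory directory (group_vars/, host_vars/ or a vars file). Second, you record the ticket either in an inline comment or on the comment line directly above the toggle. The function name and the 'ticket' convention are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical linter: print file:line for every *cis_rule_*: false toggle
# that has no "ticket" comment inline or on the line directly above it.
# $1 = inventory directory to scan.
find_undocumented_exceptions() {
  find "$1" -type f \( -name '*.yml' -o -name '*.yaml' \) -exec awk '
    FNR == 1 { prev = "" }                       # reset at the start of each file
    /^[ \t]*[a-z0-9]+cis_rule_[0-9_]+:[ \t]*false/ {
      here = tolower($0); before = tolower(prev)
      if (here !~ /#.*ticket/ && before !~ /^[ \t]*#.*ticket/)
        print FILENAME ":" FNR ": " $0
    }
    { prev = $0 }
  ' {} +
}
```

You could run it in CI against the inventory repo. Any line it prints is a disabled CIS control with no ticket next to it.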

👉 So far: a maintained role gives you one toggle per control plus master audit switches. Next: how tags let you run only Level 1, only Level 2, or just the SSH section.

The second building block is tags, organised by CIS structure. Every task in the role carries several: the level (level1-server, level2-server, level1-workstation…), the section/component (ssh, services, firewall, auditd), whether it's a change or a check (patch vs audit), and the rule number (rule_5_2_1). That lets you scope a run precisely. Level 1 is the safe baseline; Level 2 is stricter and more likely to break something — so you usually roll out Level 1 fleet-wide first and Level 2 only where you've tested it.

ansible-playbook — scope a CIS run with tags (Level 1 only, then just the SSH section)

# Level 1 server controls only (the safe baseline, fleet-wide)
ansible-playbook site.yml -i prod_inventory --tags level1-server

# Just the SSH section, and only a single rule, for a careful first test
ansible-playbook site.yml -i prod_inventory --tags ssh
ansible-playbook site.yml -i prod_inventory --tags rule_5_2_1

# Apply everything EXCEPT the stricter Level 2 controls
ansible-playbook site.yml -i prod_inventory --skip-tags level2-server

Expected output

PLAY [Apply CIS Benchmark - Ubuntu 22.04] **************************************
TASK [UBUNTU22-CIS : 5.2.1 | Ensure permissions on /etc/ssh/sshd_config] *******
ok: [web-mum-01]
TASK [UBUNTU22-CIS : 5.2.5 | Ensure SSH MaxAuthTries is set to 4 or less] *******
changed: [web-mum-01]
PLAY RECAP *********************************************************************
web-mum-01  : ok=63  changed=9  unreachable=0  failed=0  skipped=171

Figure 2 — Inside a CIS role — toggles + tags carve the run

Read it top-down: the benchmark splits into sections; each task has a toggle (on/off) AND tags (level, section, patch/audit, rule). --tags and --skip-tags slice the run; a false toggle is your documented exception.

Common mistake — "I ran the whole role on prod and it changed 200 things at once"

Symptom: you run ansible-playbook site.yml with no tags on a production fleet and it remediates every Level 1 AND Level 2 control in one go — including ones that break your app — and now you're firefighting. Cause: no scoping. Fix: always scope your first runs. Start with --check --tags level1-server (or audit_only: true) to see what would change, roll out Level 1 before Level 2, and use --skip-tags level2-server until you've tested the stricter controls in staging.
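One way to make 'always scope your first runs' stick is a tiny wrapper that refuses an unscoped run. This is a hypothetical helper, not part of ansible-playbook or the role. It passes its arguments straight through to ansible-playbook, but only if they include --tags/-t, --skip-tags or --check/-C. The RUNNER variable lets you swap in a stub for testing.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse a CIS run that has no tag scoping and no --check,
# i.e. the "whole role on prod in one go" run described above.
# RUNNER defaults to ansible-playbook (override it to test with a stub).
cis_run() {
  local runner="${RUNNER:-ansible-playbook}" arg scoped=no
  for arg in "$@"; do
    case "$arg" in
      -t|--tags|--tags=*|--skip-tags|--skip-tags=*|-C|--check) scoped=yes ;;
    esac
  done
  if [ "$scoped" = no ]; then
    echo "refusing unscoped CIS run: add --tags/--skip-tags or --check" >&2
    return 2
  fi
  "$runner" "$@"
}

# Example:
#   cis_run site.yml -i prod_inventory --check --tags level1-server
```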

Quick check · Q2 of 10

Priya at Wipro wants to harden only the SSH controls on a single test box first, without touching password policy, auditd or firewall. Which flag does that cleanly?

a) --skip-tags ssh (skips SSH, runs everything else)
b) --tags ssh (runs only the SSH-tagged tasks)
c) --check (changes nothing at all, ever)
d) delete the role and write SSH by hand

Correct: b. --tags ssh runs ONLY the tasks tagged ssh, which is exactly a scoped SSH-only run. --skip-tags ssh does the opposite (everything except SSH). --check is a dry run that never changes anything — useful, but it wouldn't apply SSH either. Re-writing by hand throws away the maintained role's value.

Pause & Predict

Predict: an app on one server genuinely needs a setting that a CIS Level 2 control forbids. You don't want to disable that control on the whole fleet. What's the clean way to make ONE host an exception, and what must you not forget?

Answer: Set that rule's toggle to false for just that host (e.g. a host_vars/ entry or a group for that app: ubtu22cis_rule_4_1_1: false). The rest of the fleet still enforces it. What you must not forget: a written, dated exception — a comment in the vars file and a ticket/record saying which control, which host, why it's excepted, and who approved it. An undocumented 'false' is indistinguishable from a mistake at the next audit.
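You can write the exception from that answer so the reason travels with the toggle. The sketch below appends a commented override to host_vars/<host>.yml in an inventory directory. The host name, the reason text and the helper itself are hypothetical; the rule variable and ticket follow this lesson's examples. Ansible reads host_vars/ next to the inventory (or the playbook), so the override applies to that one host only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record a one-host CIS exception with its audit trail.
# $1 = inventory dir, $2 = host, $3 = rule toggle, $4 = ticket, $5 = reason
write_exception() {
  local dir="$1/host_vars" host="$2" rule="$3" ticket="$4" reason="$5"
  mkdir -p "$dir"
  # Append (not overwrite) so one host can carry several documented exceptions.
  cat >> "$dir/$host.yml" <<EOF
# CIS exception: $rule disabled on $host only
# Ticket: $ticket | Date: $(date +%F) | Reason: $reason
$rule: false
EOF
}

# Example (host and reason are made up):
#   write_exception ./inventory legacy-app-01 ubtu22cis_rule_4_1_1 OPS-4821 "app conflicts with auditd rule"
```

Calling it twice for the same rule would write a duplicate key, so check the file first when you re-record an exception.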

③ Running it safely — audit, review, exclude, enforce, prove

Hardening is the one job where 'move fast' gets you locked out of production. The safe sequence is a ladder, and you climb it in order: audit → review → exclude → enforce → prove → schedule. Skipping a rung is how the Friday-evening outage happens.

Step 1 — audit-only, to score where you stand. Run the role in a mode that only checks and changes nothing. With the ansible-lockdown role you set audit_only: true (or system_is_audit: true) with run_audit: true; under the hood it runs goss against the same controls and writes a report. Ansible's own --check mode is the lighter built-in version of the same idea: ansible-playbook site.yml --check --diff shows every line that would change, host by host.

👉 So far: step 1 is audit-only / --check to score without changing anything. Next: review the diff, exclude the breakers, then enforce in a window.

Step 2: review what would change. Read the audit report or the --diff output, and for each big change ask: will this break a running app? The usual suspects are SSH controls (they could lock you out), cipher/MAC restrictions (they could break old clients or SFTP), firewall defaults (they could drop a port your app uses) and disabling an 'unused' service that isn't actually unused.

Step 3: exclude the breakers with documented exceptions. For each control that would break something, set its toggle to false with a dated comment and a ticket. That's your audit trail.

Step 4: enforce in a maintenance window, ideally Level 1 first, on a canary host, then the fleet.

Step 5: re-run to prove idempotence. A clean second run with changed=0 is your compliance evidence.

Step 6: schedule it in AWX / Automation Controller so drift gets corrected automatically.
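You can chain steps 4 and 5 so the fleet run only starts after the canary has both converged and proven idempotent. This is a minimal sketch that assumes the site.yml playbook and the Level 1 tag used in this lesson. The names canary_then_fleet and RUNNER are hypothetical; RUNNER is handy for a dry test with a stub. The sketch treats any non-zero changed=, failed= or unreachable= in the canary's second recap as a stop signal.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: enforce on a canary, prove it with a second run,
# then (and only then) enforce on the whole inventory.
# $1 = inventory, $2 = canary host. RUNNER defaults to ansible-playbook.
canary_then_fleet() {
  local inv="$1" canary="$2" runner="${RUNNER:-ansible-playbook}" recap

  # Step 4: enforce Level 1 on the canary only.
  "$runner" site.yml -i "$inv" --limit "$canary" --tags level1-server || return 1

  # Step 5: re-run the identical play and keep its output for inspection.
  recap=$("$runner" site.yml -i "$inv" --limit "$canary" --tags level1-server) || {
    echo "proving run failed on $canary; fleet run skipped" >&2
    return 1
  }
  printf '%s\n' "$recap"

  # Any non-zero changed=, failed= or unreachable= means the canary is not proven.
  if printf '%s\n' "$recap" | grep -Eq '(changed|failed|unreachable)=[1-9]'; then
    echo "canary $canary is not idempotent yet; fleet run skipped" >&2
    return 1
  fi

  # Canary proven: enforce on the whole inventory (still inside the window).
  "$runner" site.yml -i "$inv" --tags level1-server
}
```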

Figure 3 — The audit-then-enforce flow — one control's journey

Follow the arrows: every control flows audit → does it break the app? If yes, branch to a documented exception (amber); if no, enforce, then the re-run proves it. The whole loop ends in 'scheduled' so drift comes back to audit.

Walk the audit-then-enforce ladder on one host

Follow a single Ubuntu host, web-mum-01, through the safe sequence and watch the run change from 'check only' to 'enforce' to 'prove'.

① Audit: audit_only: true → goss checks 244 controls; the report says 62 fail, 182 pass.
② Review + exclude: read the --diff; rule_4_1_1 (auditd) would break the app → set its toggle to false and open a ticket.
③ Enforce: audit_only: false, --tags level1-server in the window → changed=58.
④ Prove: re-run the same play → changed=0, so the host is idempotent and compliant; schedule it in AWX.

ansible-playbook — the safe ladder (dry-run audit, then a scoped enforce, then prove)

# 1) Audit / dry-run: score the host, change NOTHING
ansible-playbook site.yml -i prod_inventory --check --diff --tags level1-server

# 2) Enforce Level 1 in the window, on the canary first
ansible-playbook site.yml -i prod_inventory --limit web-mum-01 --tags level1-server

# 3) Prove idempotence — a clean re-run = compliance evidence
ansible-playbook site.yml -i prod_inventory --limit web-mum-01 --tags level1-server

Expected output

# run 2 (enforce):
PLAY RECAP ********************************************************************
web-mum-01  : ok=66  changed=58  unreachable=0  failed=0  skipped=120

# run 3 (prove): zero changes = idempotent + compliant
PLAY RECAP ********************************************************************
web-mum-01  : ok=66  changed=0   unreachable=0  failed=0  skipped=120

🖥️ This is the audit report you read in step 2 — the goss results the role writes to /opt on the host: audit_<hostname>-CIS-UBUNTU22_<epoch>.json. (Recreated for clarity — your output matches this shape.)

sneha@control:~/UBUNTU22-CIS (goss audit summary)

Audit content dir: /opt (AUDIT_CONTENT_LOCATION)
Report file: audit_web-mum-01-CIS-UBUNTU22_1749600000.json

rule_5_2_5 SSH MaxAuthTries: FAIL → expected 4, found 14
rule_5_2_22 SSH PermitRootLogin: FAIL → expected no, found yes
rule_4_1_1 auditd enabled: FAIL (excepted, ticket OPS-4821)

Summary: Count: 244, Failed: 62, Passed: 182

Prove it's really compliant, not just 'it ran'

A green play recap only means the tasks executed. To prove compliance, do two things: (1) re-run and confirm changed=0 — idempotence is your evidence the state matches the baseline; and (2) read the post-enforce goss audit report (the role writes pre_audit_outfile and post_audit_outfile) and confirm the previously-failing rules now pass, except the ones you deliberately excepted. 'The playbook finished' is not the same as 'the host is compliant.'
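You can script the two checks in that callout so that 'compliant' becomes a pass/fail result rather than a judgement call. This is a minimal sketch with two assumptions. First, you save the proving run's output to a file. Second, you have already extracted the failing rule IDs from the post-enforce goss report (post_audit_outfile) into a one-per-line text file, for example with jq. The extraction step is left out because the report's exact JSON layout depends on the goss version and output format. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Check 1: every host in the PLAY RECAP shows changed=0, failed=0, unreachable=0.
# Prints the offending hosts and returns 1 if any host is not clean.
recap_is_clean() {
  awk '
    / : ok=/ {
      changed = 0; failed = 0; unreach = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "changed")     changed = kv[2] + 0
        if (kv[1] == "failed")      failed  = kv[2] + 0
        if (kv[1] == "unreachable") unreach = kv[2] + 0
      }
      if (changed + failed + unreach > 0) {
        print "NOT CLEAN: " $1 " changed=" changed " failed=" failed " unreachable=" unreach
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Check 2: every rule still failing after enforcement is a documented exception.
# $1 = file of failing rule IDs, $2 = file of excepted rule IDs (one per line each).
# Prints the failures that are NOT excepted.
unexcepted_failures() {
  grep -Fvx -f "$2" "$1" | sort -u
}

# Usage (file names hypothetical):
#   recap_is_clean < run3.log && [ -z "$(unexcepted_failures failing.txt excepted.txt)" ] && echo compliant
```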

Karthik at HCL faces this

Karthik enforces the full CIS role on a staging box over SSH. Halfway through, his SSH session freezes and he can't reconnect — and the next Ansible run to that host fails to copy files with an SFTP error.

Likely cause

Two SSH-hardening controls bit him at once. A cipher/MAC restriction dropped the algorithm his live session was using, and another control disabled the SFTP subsystem — but Ansible copies files to the host over SFTP by default, so subsequent runs can't transfer files.

Diagnosis

He recognises a classic CIS SSH gotcha: his interactive login and Ansible's own transport both ride on sshd, so SSH controls can cut the very connection that is doing the hardening. He checks which sshd controls changed in the --diff he should have read first.

Console/iLO out-of-band login → /etc/ssh/sshd_config + journalctl -u ssh; on the control node set scp_if_ssh in ansible.cfg

Fix

Get back in via the out-of-band console (iLO/iDRAC/cloud serial), revert or loosen the cipher control to keep a working algorithm, and either keep the SFTP subsystem or set scp_if_ssh = True in ansible.cfg so Ansible uses SCP. Record both as documented exceptions if the app needs them.
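You can apply the ansible.cfg half of that fix from the shell. This is a minimal sketch in which the helper name and project directory are hypothetical; scp_if_ssh is the key this lesson's sources use. Recent ansible-core documentation describes a transfer_method key in the same [ssh_connection] section (accepting scp) as the preferred setting, so check the ssh connection plugin documentation for your version before relying on scp_if_ssh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make Ansible transfer files with SCP for a project whose
# hosts have the sshd SFTP subsystem disabled. $1 = project dir holding ansible.cfg.
use_scp_transfer() {
  local cfg="$1/ansible.cfg"
  # Already configured: do nothing (keeps the helper idempotent).
  grep -q '^scp_if_ssh' "$cfg" 2>/dev/null && return 0
  if grep -q '^\[ssh_connection\]' "$cfg" 2>/dev/null; then
    # Section exists: add the key right under its header, because a second
    # [ssh_connection] header may be rejected by Ansible's INI parser.
    awk '{ print } /^[[]ssh_connection[]]/ { print "scp_if_ssh = True" }' "$cfg" > "$cfg.tmp" &&
      mv "$cfg.tmp" "$cfg"
  else
    printf '\n[ssh_connection]\nscp_if_ssh = True\n' >> "$cfg"
  fi
}
```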

Verify

Re-run with --check --diff first (no freeze), confirm the SSH session stays up, and confirm a normal ansible-playbook run can copy files to the host again; then enforce for real in a window.
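Before enforcing for real, you can check whether the cipher your own session negotiates survives the proposed cipher list. This is a minimal sketch with two assumptions. First, the proposed list is the comma-separated value you plan to set for Ciphers; on the host, sshd -T | grep '^ciphers' shows the current effective value. Second, the negotiated cipher comes from the "debug1: kex: server->client cipher: ..." line that ssh -v prints. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the negotiated cipher from `ssh -v` output on stdin,
# e.g. from:  ssh -v web-mum-01 true 2>&1 | negotiated_cipher
negotiated_cipher() {
  awk '/server->client cipher:/ {
         for (i = 1; i < NF; i++) if ($i == "cipher:") { print $(i + 1); exit }
       }'
}

# Hypothetical helper: succeed if cipher $2 is in the comma-separated list $1.
cipher_still_allowed() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example:
#   cipher_still_allowed "$proposed" "$(ssh -v web-mum-01 true 2>&1 | negotiated_cipher)" \
#     || echo "this client's current cipher would be dropped"
```

A failed check doesn't always mean lockout: a reconnect can still succeed if the client supports another cipher in the list (compare against ssh -Q cipher). Old clients and agents often don't, though. The same membership test works for the MAC list.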

Quick check · Q3 of 10

Aditya is about to enforce a CIS role on 50 production servers he reaches over SSH. Which single habit most reduces the chance of locking himself (or Ansible) out?

a) Enforce all Level 1 and Level 2 controls at once to get it over with
b) Disable SSH entirely before running the role
c) Run --check --diff first, review the sshd changes, and keep an out-of-band console open before enforcing
d) Run the role twice as fast by removing --check

Correct: c. Dry-running with --check --diff lets you see the sshd/cipher/SFTP changes before they hit, and an out-of-band console is your way back in if SSH does break. Enforcing everything blind is exactly the cause of the outage; disabling SSH locks you out immediately; and skipping --check removes your safety net, it doesn't help.

Pause & Predict

Predict: you enforce a CIS role and the FIRST run reports changed=40. You re-run the identical play and it reports changed=12, not 0. What does a non-zero second run most likely mean, and is the role 'broken'?

Answer: It usually means a control isn't truly idempotent yet, or something on the host is fighting it: a non-idempotent task that 'changes' every run (e.g. re-templating a file that has a timestamp), or a service/cron/another tool that reverts a setting between runs so the role keeps re-applying it. The role isn't necessarily broken — but changed=12 on the second run is a red flag to investigate: find the specific tasks reported as 'changed', and fix the task or remove the thing reverting the state. True compliance evidence is changed=0.
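To find the specific tasks behind a changed=12 second run, you can filter the play output for changed: results and pair each one with the TASK header above it. This is a minimal sketch that assumes the default output format shown in this lesson ("TASK [role : name] ****" followed by "changed: [host]"). The function name is hypothetical. If you also run the second play with --diff, each of those tasks shows what it keeps rewriting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list "task -> host" for every changed result in
# ansible-playbook output read from stdin (default stdout callback format).
changed_tasks() {
  awk '
    /^TASK [[]/ {
      task = $0
      sub(/^TASK [[]/, "", task)       # drop the "TASK [" prefix
      sub(/[]] [*]*$/, "", task)       # drop the "] *****" banner suffix
    }
    /^changed: [[]/ {
      host = $0
      sub(/^changed: [[]/, "", host)
      sub(/[]].*$/, "", host)          # also strips "=> (item=...)" on loop tasks
      print task " -> " host
    }
  '
}

# Example: second run, with --diff so each changed task also shows what it rewrote
#   ansible-playbook site.yml -i prod_inventory --diff --tags level1-server | tee run2.log | changed_tasks
```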

④ A real pass — harden a fleet, get a report, handle a conflict

Time to put it together on a real group. Sneha's target is app_servers — a mix of Ubuntu 22.04 and RHEL 9 hosts at Flipkart. Her plan covers the controls that matter most on day one: SSH hardening (disable root login, MaxAuthTries 4, strong ciphers), password + sudo policy (faillock lockout, password quality, NOPASSWD audit), auditd (capture logins and privileged commands), and the host firewall (ufw on Ubuntu, firewalld/nftables on RHEL, default-deny inbound).

She points the right role at the right hosts — UBUNTU22-CIS for the Ubuntu group, RHEL9-CIS for the RHEL group — using the rule toggles and tags from section 2. The SSH controls live in section 5.2 (e.g. rule_5_2_5 MaxAuthTries, root-login disable); auditd is section 4; the firewall and network controls in their own sections. She runs audit-only first, reads the goss report, and only then enforces.

site.yml — assign the maintained CIS role per OS group, audit_only first

- name: Harden Ubuntu app servers to CIS
  hosts: app_servers_ubuntu
  become: true
  vars:
    run_audit: true          # produce the goss compliance report
    audit_only: true         # STEP 1: check only, change nothing
    ubtu22cis_rule_5_2_5: true     # SSH MaxAuthTries <= 4
    ubtu22cis_rule_4_1_1: false    # auditd start — excepted (ticket OPS-4821)
  roles:
    - UBUNTU22-CIS

- name: Harden RHEL app servers to CIS
  hosts: app_servers_rhel
  become: true
  vars:
    run_audit: true
    audit_only: true
    rhel9cis_rule_5_2_1: true      # sshd_config permissions
  roles:
    - RHEL9-CIS

Expected output

PLAY RECAP ********************************************************************
web-mum-01 (ubuntu)   : ok=182 changed=0 failed=0   # audit_only: nothing changed
db-blr-04  (rhel9)    : ok=171 changed=0 failed=0
# goss report written to /opt/audit_web-mum-01-CIS-UBUNTU22_1749600000.json

The audit report flags 62 failing controls on web-mum-01. Reviewing the diff, one control would break a legacy monitoring agent that needs an older SSH cipher. This is the conflict: enforce the strict cipher list and the agent loses its connection. Sneha doesn't disable the control fleet-wide — she scopes the exception to just the hosts running that agent, with a dated comment and a ticket, then plans to fix the agent so she can remove the exception later. Then she flips audit_only to false and enforces Level 1 in the window.

🖥️ The enforce run she watches in the window, recreated for clarity from ansible-playbook site.yml --tags level1-server. PermitRootLogin and MaxAuthTries flip to 'changed', and the excepted control is skipped.

sneha@control:~/cis $ ansible-playbook site.yml -i prod --tags level1-server

TASK 5.2.22 PermitRootLogin no
changed: [web-mum-01]
TASK 5.2.5 MaxAuthTries 4
changed: [web-mum-01]
TASK 4.1.1 auditd enabled
skipping: [web-mum-01] (toggle false)
TASK firewall default-deny (ufw)
changed: [web-mum-01]

PLAY RECAP web-mum-01: ok=66 changed=58 failed=0

Figure 4 — The audit-then-enforce ladder — cheat-sheet

Your one-card map of this lesson: the safe ladder, the CIS sections covered, the key role variables, the ports/commands, and the lockout traps. Keep it open during your first real hardening job.

Common mistake — over-enforcing a 'disable unused service' control

Symptom: after a clean CIS run, an app that worked yesterday can't resolve names or send mail, and the change log shows a service was stopped/masked. Cause: a CIS control disabled a 'recommended-off' service (e.g. rpcbind, an MTA, avahi) that your stack actually depends on — 'unused' is the benchmark's assumption, not your reality. Fix: in the audit/--check review, list every service the role would disable, check it against what your apps need, and except the ones in use (toggle false + ticket). The benchmark is a starting point you tailor, not gospel you apply blind.
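The cross-check in that fix is a set intersection, so it is easy to script. This is a minimal sketch that assumes you have written two plain lists, by hand or from your review. The first is the services the role would stop or mask, taken from the --check --diff output. The second is the services your app owners say they need; systemctl list-units --type=service --state=running on a host is one place to start it. The file names and the function are hypothetical. Use the same naming form in both lists, either with or without the .service suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print services the CIS role would disable that an app still needs.
# $1 = services the role would stop/mask (one per line)
# $2 = services the applications depend on (one per line)
service_conflicts() {
  # -F fixed strings, -x whole-line match, -f read the patterns from file $2
  grep -Fx -f "$2" "$1" | sort -u
}
```

Every name it prints is a candidate for a toggle-false exception with a ticket.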

One more caution: hardening roles touch the most sensitive files on the box, so treat the role itself as production code. Pull a pinned, reviewed version of the maintained role (don't git clone main straight onto 100 prod servers), test every upgrade of the role in staging, and keep your ansible-vault secrets and SSH keys locked down. A compromised control node that can harden every server can also misconfigure every server. For RHEL teams, recent CIS coverage is strong: the RHEL 9 v2.0.0 profile is now ~99% automatable via the SCAP Security Guide, so the gap between 'audit says fail' and 'Ansible can fix it' is smaller than ever.

👉 So far: a real two-OS pass — SSH/auditd/firewall, a goss report, and a documented exception for the conflicting control. Next: the exam + career angle, then the final recap.

For certification, this capstone sits at the intersection of two tracks. The RHCE EX294 is a 4-hour performance exam: you write and debug real playbooks, use system roles, roles from Galaxy, Vault and Jinja2 — exactly the muscles you used to apply a role with toggles and templated config. The CIS Benchmark side gives you the security vocabulary employers want: Level 1 vs Level 2, scored controls, audit vs remediate, exceptions and evidence. Put together — 'I can take a 244-control benchmark and apply, exclude and prove it across a mixed fleet with Ansible' — that is a job-ready sentence on an Indian infra/security résumé.

Prove you own the capstone

Cold, in 30 seconds: name the six rungs of the ladder (audit → review → exclude → enforce → prove → schedule); say how you'd run only Level 1 and only SSH (--tags level1-server / --tags ssh); say what makes a control an exception (toggle false + dated ticket); and say what proves compliance (a re-run with changed=0 plus a clean goss report). If you can do that without notes, you've finished the Ansible series ready for the job and the exam.

Quick check · Q4 of 10

An interviewer asks Meera: "What single piece of evidence would you show me to prove a server is CIS-compliant right now?" Best answer?

a) A screenshot of the server's uptime
b) An email saying 'I hardened it last month'
c) The fact that the playbook ran without crashing
d) An idempotent re-run of the CIS role showing changed=0 plus the goss audit report with all controls passing (bar documented exceptions)

Correct: d. Compliance evidence is two things together: a re-run with changed=0 (the state matches the coded baseline) and a goss audit report showing each control passing, with any failures being documented exceptions. Uptime is irrelevant, an email is not evidence, and 'the playbook ran' only proves execution — not that the host's state actually matches the benchmark.

🧠 In your own words

In one line: what single piece of output proves a server is CIS-compliant right now, and why is it stronger than 'I ran the playbook'? Then compare your answer to the expert version.

Expert version: An idempotent re-run reporting changed=0 (plus a clean goss audit report) — because it shows the host's actual state already matches the coded baseline and would self-correct any drift, whereas 'the playbook ran' only proves the tasks executed, not that the end state is compliant.

📖 Glossary

CIS Benchmark: A consensus hardening standard for an OS, split into Level 1 (safe baseline) and Level 2 (stricter, can break apps). Hundreds of controls.
Compliance-as-code: Expressing the hardening baseline as Ansible code so it's version-controlled, reviewable, re-appliable and auditable instead of done by hand.
ansible-lockdown: Community Ansible roles (UBUNTU22-CIS, RHEL9-CIS, etc.) that remediate and audit a host against the CIS Benchmark.
Idempotence: Running the role twice gives the same end state; a second run with changed=0 is your proof the host already matches the baseline.
goss: A small (~12 MB) Go binary that runs YAML checks to verify each control is actually in effect; ansible-lockdown uses it for the audit step.
audit_only / run_audit: Role switches: run_audit produces the compliance report; audit_only (or system_is_audit) makes the run check-only, changing nothing.
Rule toggle: A per-control boolean in defaults/main.yml (e.g. ubtu22cis_rule_5_2_5). Set false to skip one control — your way to make a documented exception.
Tags (level1-server / ssh): Labels on tasks so --tags / --skip-tags run a subset: only Level 1, only SSH, or a single rule, instead of the whole role.
--check / --diff: Ansible dry-run flags: --check reports what would change without changing it; --diff shows the exact lines. Your pre-enforce review.
auditd: The Linux audit daemon — records logins, file changes and privileged commands for forensics and CIS logging controls (section 4).
Drift: When a host silently falls out of the baseline after a manual edit, update or new service; caught by a scheduled re-run.
AWX / Automation Controller: Web UI + scheduler for Ansible — runs the role on a cadence with logging and RBAC, so the baseline self-heals against drift.
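Several glossary terms fit together as one ladder. audit_only comes first, then --check/--diff, then rule toggles for exceptions, then --limit and --tags to enforce Level 1 on a canary before the fleet, and finally a re-run. The Python sketch below only builds and prints the ansible-playbook command for each rung; it runs nothing. The playbook name site.yml, the inventory groups canary and webservers, and the helper names are hypothetical. The toggle naming (ubtu22cis_rule_5_2_5) and the run_audit/audit_only switches follow this lesson's description of ansible-lockdown, so check your role version's defaults/main.yml before relying on them.

```python
import json
import shlex


def rule_var(prefix: str, control_id: str) -> str:
    """Map a CIS control ID to an ansible-lockdown-style toggle name (hypothetical helper)."""
    return f"{prefix}_rule_{control_id.replace('.', '_')}"


def build_command(playbook, limit, tags=(), check=False, extra_vars=None):
    """Return an ansible-playbook argv list for one rung of the ladder."""
    cmd = ["ansible-playbook", playbook, "--limit", limit]
    if tags:
        cmd += ["--tags", ",".join(tags)]
    if check:
        # dry run: --check reports what would change, --diff shows the exact lines
        cmd += ["--check", "--diff"]
    if extra_vars:
        # JSON keeps true/false as real booleans; key=value extra vars arrive as strings
        cmd += ["-e", json.dumps(extra_vars)]
    return cmd


def ladder(playbook, canary, fleet, prefix, excepted_controls):
    """Yield (step, argv): audit -> review -> enforce canary -> enforce fleet -> prove."""
    rules_off = {rule_var(prefix, cid): False for cid in excepted_controls}
    level1 = ("level1-server",)
    yield "audit", build_command(playbook, canary, extra_vars={"run_audit": True, "audit_only": True})
    yield "review", build_command(playbook, canary, level1, check=True, extra_vars=rules_off)
    yield "enforce-canary", build_command(playbook, canary, level1, extra_vars=rules_off)
    yield "enforce-fleet", build_command(playbook, fleet, level1, extra_vars=rules_off)
    # the proof is the same command again: every host should now report changed=0
    yield "prove", build_command(playbook, fleet, level1, extra_vars=rules_off)


if __name__ == "__main__":
    for step, argv in ladder("site.yml", "canary", "webservers", "ubtu22cis", ["5.2.5"]):
        print(f"{step:<15}{shlex.join(argv)}")
```

Passing exceptions with -e keeps the sketch self-contained. In a real rollout they would more likely live in version-controlled group_vars next to their tickets, so AWX runs the same excepted baseline on every schedule.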

📚 Sources

ansible-lockdown UBUNTU22-CIS role — README + defaults/main.yml (per-rule toggles like ubtu22cis_rule_X_X_X; tags level1-server/level2-server/ssh/patch/audit/rule_X_X_X; run --tags level1-server / --skip-tags level2-server). github.com/ansible-lockdown/UBUNTU22-CIS
ansible-lockdown RHEL9-CIS role — control variable naming rhel9cis_rule_5_2_1, section layout (1 Initial Setup, 3 Network, 4 auditd, 5 Access/SSH 5.2.x), excluding app-breaking controls by setting the rule var to false. github.com/ansible-lockdown/RHEL9-CIS
Ansible Lockdown docs — Audit (getting started): goss binary at /usr/local/bin/goss, audit content in /opt (AUDIT_CONTENT_LOCATION), report file audit_{hostname}-{BENCHMARK}-{OS}_{epoch}.{format}, audit_only / pre_audit_outfile / post_audit_outfile, and Known Issues (GRUB-password lockout). ansible-lockdown.readthedocs.io/en/latest/audit/getting-started-audit.html
dev-sec ansible-ssh-hardening role — community/forum lockout gotchas: user-account lockout on the ec2 ubuntu account, SFTP deactivated by default (set scp_if_ssh = True in ansible.cfg), crypto cipher/MAC incompatibilities breaking older clients. github.com/dev-sec/ansible-ssh-hardening · danivovich.com/blog/2017/08/31/ansible-ssh-hardening-lockout/
Canonical / Ubuntu Security — CIS Benchmark versions and counts (Ubuntu 22.04 v2.0.0 = 244 controls; Ubuntu 24.04 v1.0.0 = 232; Level 2 extends Level 1), USG hardening tooling. ubuntu.com/security/certifications/docs/usg/cis · cisecurity.org/benchmark/ubuntu_linux
Red Hat — 'High automation coverage for CIS in RHEL 9' (RHEL 9 v2.0.0 profile ~99% automatable via SCAP Security Guide; covers access control, logging, network, system hardening). redhat.com/en/blog/high-automation-coverage-cis-rhel-9 · access.redhat.com/compliance/cis-benchmarks
Red Hat RHCE EX294 exam objectives — 4-hour performance exam: roles, roles from Galaxy, system roles, Ansible Vault, Jinja2 templates. redhat.com/en/services/training/ex294-red-hat-certified-engineer-rhce-exam-red-hat-enterprise-linux

What's next?

That's the full Ansible automation track — from your first ad-hoc command to applying, excepting and proving an entire CIS Benchmark across a fleet. Step back up a level now: how all of this network and host security stitches into one cloud-delivered edge with SASE.

