Most engineers think "config backup" is a solved, boring problem — just copy run start on each box, or let the device save to a TFTP server, and you are done. So they treat compliance as a once-a-year spreadsheet the auditor emails around.
Wrong — and that gap is where outages and audit findings live. A local copy run start gives you no history, no diff, and no central record; when the Pune router breaks you cannot see what changed and when. The real move is a nightly Ansible job that pulls every config, versions it in Git (so every change is a dated, attributable diff), and then asserts the running state against a written policy so the play fails the host that drifted. Backup answers "what changed"; compliance answers "is it still allowed" — and both run unattended every night, not once a year.
① The drift + backup problem — the midnight change nobody recorded
Picture Rahul, an L1 network engineer at Infosys, on a Monday morning. A core switch at the Pune campus stopped passing a VLAN over the weekend. Someone — nobody is owning up — logged in around 2 a.m. on Saturday and "fixed" something by hand on the CLI. There is no ticket, no email, no record of what they typed. Rahul opens the switch and sees the current config, but he has nothing to compare it against. He cannot answer the only question that matters: what changed, and what did it look like before?
This is configuration drift, and it is the quiet killer of network ops. Every undocumented hand-edit — a quick no shutdown, a temporary ACL line "just for testing", an SNMP community string added for a vendor — pushes the live running-config a little further from what your design documents say it should be. Each one is harmless alone. Together they mean that on the day of an outage, nobody can roll back to a known-good state, because nobody recorded one.
The second half of the same problem is compliance. Your security policy says: no telnet, SNMPv3 only (no clear-text SNMPv1/v2c community strings), an approved NTP server set, and AAA configured for login. But who checks that on all 200 devices, every night? In most shops the honest answer is "nobody, until the auditor asks" — and by then the drift has been live for months.
Here is the shift. Ansible can log into every device on a schedule, pull the full running-config, and save it to a timestamped file — that is the backup. Commit each night's pull into Git and you suddenly have a full, dated change history with diffs. Then, in the same nightly run, Ansible reads the running state and asserts it against your written policy — and the play fails the host that drifted. Backup answers "what changed"; compliance answers "is it still allowed". Both, unattended, every night.
The four faces of the drift problem
These are the four pains every "we need config backup and compliance" project starts from.
A hand-edit with no ticket means the only record is the live config. So: you can never prove what changed or when.
Without a saved known-good config, an outage has nothing to revert to. So: recovery becomes guesswork at 2 a.m.
Telnet, SNMPv2 community strings and weak logins sneak in over time. So: your hardened baseline quietly rots.
Compliance checked once a year, by hand, on a sample. So: most drift is live for months before anyone notices.
A backup playbook is your apartment society's gate-pass register. Every visitor who enters is written down with a timestamp and who let them in. Months later, when something goes missing, the secretary opens the register and sees exactly who came, when, and on whose authority. A device with no config history is a society with no register — anyone walked in, nobody wrote it down, and now you are arguing about what happened with zero evidence. Ansible + Git is the register, written automatically every night.
Sneha at TCS says: "We run copy run start on every router after changes, so we already have backups." Why is that NOT enough for drift detection and rollback?
Pause & Predict
Predict: if you only run config backups but never run a compliance check, what class of problem stays completely invisible to you? Type your guess.
② Backup playbooks — ios_config backup, timestamped files, committed to Git
The fastest backup uses the cisco.ios.ios_config module with one switch: backup: true. When set, Ansible logs into each device, grabs the running-config, and writes it to a file on the control node. By default that file lands in a backup/ folder next to your playbook, named <hostname>_config.<date>@<time>. That default naming is already timestamped — but in real life you control it with backup_options.
backup_options takes two sub-options: dir_path (where the file goes) and filename (what it is called). This is how you organise backups by hostname and date so the folder stays sane across 200 devices. A common pattern: one directory per device, file named with a date stamp — so backups/BR-Mumbai-rtr01/2026-06-11.cfg sits next to yesterday's file and git diff between them is meaningful.
---
- name: Nightly config backup
  hosts: ios_devices
  gather_facts: false
  vars:
    backup_root: "/home/netauto/backups"
  tasks:
    - name: Pull running-config and save a timestamped copy
      cisco.ios.ios_config:
        backup: true
        backup_options:
          dir_path: "{{ backup_root }}/{{ inventory_hostname }}"
          filename: "{{ lookup('pipe','date +%Y-%m-%d') }}.cfg"
      register: backup_result

    - name: Show where each backup landed
      ansible.builtin.debug:
        var: backup_result.backup_path

PLAY [Nightly config backup] ***************************************
TASK [Pull running-config and save a timestamped copy] *************
ok: [BR-Mumbai-rtr01]
ok: [BR-Pune-rtr01]
TASK [Show where each backup landed] *******************************
ok: [BR-Mumbai-rtr01] => { "backup_result.backup_path": "/home/netauto/backups/BR-Mumbai-rtr01/2026-06-11.cfg" }
PLAY RECAP *********************************************************
BR-Mumbai-rtr01 : ok=2 changed=0 unreachable=0 failed=0

Notice changed=0 in the recap: pulling a backup does not change the device, so a read-only backup run is safe to schedule. Depending on the collection version, the task may instead be reported as changed when a new backup file is written on the control node; either way, nothing is pushed to the device. (If you do not use a Cisco IOS device, the same idea works with the platform's *_command module: run show running-config, then write the captured output to a file with ansible.builtin.copy. The *_config backup option is just the tidy shortcut where it exists.)
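For platforms without a *_config backup option, the command-then-copy idea could look roughly like the sketch below. It assumes the platform-agnostic ansible.netcommon.cli_command module, a hypothetical inventory group called network_devices, and that show running-config is the right command on your platform; none of these names come from the lesson.

```yaml
---
# Sketch only: the group name, paths and show command are assumptions.
- name: Generic backup with a command module and copy
  hosts: network_devices              # hypothetical inventory group
  gather_facts: false
  vars:
    backup_root: "/home/netauto/backups"
    stamp: "{{ lookup('pipe', 'date +%Y-%m-%d') }}"
  tasks:
    - name: Capture the running configuration as raw text
      ansible.netcommon.cli_command:
        command: show running-config  # adjust per platform
      register: raw

    - name: Make sure the per-device folder exists on the control node
      ansible.builtin.file:
        path: "{{ backup_root }}/{{ inventory_hostname }}"
        state: directory
        mode: "0700"
      delegate_to: localhost

    - name: Write the captured text to a dated file
      ansible.builtin.copy:
        content: "{{ raw.stdout }}\n"
        dest: "{{ backup_root }}/{{ inventory_hostname }}/{{ stamp }}.cfg"
        mode: "0600"
      delegate_to: localhost
```

Both file tasks are delegated to localhost because the backup lives on the control node, not on the device.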
A folder of dated files is useful; a Git repository of them is powerful. After the backup task, run a couple of command tasks on the control node (or a small shell script) to git add, git commit with a dated message, and push. Now every night's configs are a commit. When the Pune router breaks, you run git log on its backup folder, find the first nightly file committed after Saturday's 2 a.m. change, diff it against the previous night's file, and see the exact line someone added. The commit timestamp narrows the change to a one-night window; the commit author is the automation account, so who typed the line still has to come from AAA or syslog records. That is the difference between "something changed" and "this line changed, between these two backups".
cd /home/netauto/backups
git add -A
git commit -m "Nightly config backup 2026-06-11" || echo "No changes to commit"
git push origin main

[main 7c4e1a9] Nightly config backup 2026-06-11
 2 files changed, 11 insertions(+), 3 deletions(-)
 rewrite BR-Pune-rtr01/2026-06-11.cfg (78%)
To github.com:infosys-netauto/network-backups.git
   3b21f0c..7c4e1a9  main -> main
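The same add, commit and push sequence can also live inside a play instead of a separate script. This is a minimal sketch, assuming the backup folder is already a Git clone with a remote named origin and a branch named main, as in the commands above; the play name and the stamp variable are illustrative.

```yaml
---
# Sketch only: assumes /home/netauto/backups is an existing Git clone.
- name: Commit tonight's backups
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    backup_root: "/home/netauto/backups"
    stamp: "{{ lookup('pipe', 'date +%Y-%m-%d') }}"
  tasks:
    - name: Stage every new or changed backup file
      ansible.builtin.command:
        cmd: git add -A
        chdir: "{{ backup_root }}"
      changed_when: false

    - name: Check whether anything is staged (rc 1 means yes, rc 0 means no)
      ansible.builtin.command:
        cmd: git diff --cached --quiet
        chdir: "{{ backup_root }}"
      register: staged
      changed_when: false
      failed_when: staged.rc not in [0, 1]

    - name: Commit with a dated message
      ansible.builtin.command:
        cmd: git commit -m "Nightly config backup {{ stamp }}"
        chdir: "{{ backup_root }}"
      when: staged.rc == 1

    - name: Push the new commit
      ansible.builtin.command:
        cmd: git push origin main
        chdir: "{{ backup_root }}"
      when: staged.rc == 1
```

Checking the staged state first plays the same role as the `|| echo "No changes to commit"` guard: a quiet night is skipped rather than reported as a failure.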
Symptom: git log shows a fresh commit every single morning for a device nobody touched, and your "what changed" diffs are full of noise. Cause: the running-config contains a line that changes on its own — a timestamp, an uptime counter, an ntp clock-period value, or certificate/cron metadata. Each backup looks "different", so Git commits it. Fix: strip the volatile lines before diffing/committing (filter out timestamp/clock lines), or back up with a normalised view. This is the #1 false-drift source in real pipelines — a stray line makes everything look changed.
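One way to apply that fix is to delete the self-changing lines from the saved copy before it is committed. The sketch below reuses the backup task from this lesson and adds an ansible.builtin.lineinfile cleanup; the three patterns are assumptions about typical IOS output and will need adjusting for your platform and software version.

```yaml
---
# Sketch only: the volatile-line patterns are assumptions, not a complete list.
- name: Nightly backup with volatile lines stripped
  hosts: ios_devices
  gather_facts: false
  vars:
    backup_root: "/home/netauto/backups"
    volatile_patterns:                 # hypothetical variable name
      - '^ntp clock-period '
      - '^! NVRAM config last updated at '
      - '^Current configuration : '
  tasks:
    - name: Pull running-config and save a timestamped copy
      cisco.ios.ios_config:
        backup: true
        backup_options:
          dir_path: "{{ backup_root }}/{{ inventory_hostname }}"
          filename: "{{ lookup('pipe', 'date +%Y-%m-%d') }}.cfg"
      register: backup_result

    - name: Remove every line that matches a volatile pattern
      ansible.builtin.lineinfile:
        path: "{{ backup_result.backup_path }}"
        regexp: "{{ item }}"
        state: absent
      loop: "{{ volatile_patterns }}"
      delegate_to: localhost
```

Only strip lines you are sure carry no evidence. A "Last configuration change at ... by ..." comment, for example, is worth keeping because it records when the config was last touched.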
Pause & Predict
Predict: you back up with backup: true but your saved .cfg files contain SNMP community strings and TACACS keys in clear text, and you push them to a shared Git repo. What new risk did you just create? Type your guess.
Aditya wants each device's backups in their own folder, named by date, so git diff between days is clean. Which ios_config setting does that?
③ Compliance checking — read the running state, assert the policy, fail the drifter
Backup is the camera. Compliance is the auditor. The pattern is always the same three moves: (1) read the running state, (2) assert it against a written policy, (3) fail the host that violates it. To read state you use either ios_facts (structured facts under ansible_net_* keys) or ios_command (raw show output you grep with filters).
The enforcer is the assert module. You hand it a list of conditions under that: and a fail_msg. If every condition is true, the host passes; if any is false, that host fails the play and your message names the violation. So your policy — no telnet, SNMPv3 only, NTP set, AAA configured — becomes a list of assert conditions, and the play recap turns red for exactly the devices that drifted.
---
- name: Network security compliance check
  hosts: ios_devices
  gather_facts: false
  tasks:
    - name: Grab the full running-config
      cisco.ios.ios_command:
        commands: ["show running-config"]
      register: rc

    - name: Assert the device meets policy
      ansible.builtin.assert:
        that:
          - "'transport input telnet' not in rc.stdout[0]"   # no telnet
          - "'snmp-server community' not in rc.stdout[0]"    # SNMPv3 only, no v1/v2c
          - "'ntp server 10.10.0.10' in rc.stdout[0]"        # approved NTP set
          - "'aaa new-model' in rc.stdout[0]"                # AAA configured
        fail_msg: "NON-COMPLIANT: telnet/SNMPv2/NTP/AAA policy violated"
        success_msg: "Compliant"

TASK [Assert the device meets policy] ******************************
ok: [BR-Mumbai-rtr01] => { "msg": "Compliant" }
fatal: [BR-Pune-rtr01]: FAILED! => {"assertion": "'transport input telnet' not in rc.stdout[0]",
  "evaluated_to": false, "msg": "NON-COMPLIANT: telnet/SNMPv2/NTP/AAA policy violated"}
PLAY RECAP *********************************************************
BR-Mumbai-rtr01 : ok=2 changed=0 failed=0
BR-Pune-rtr01 : ok=1 changed=0 failed=1

Read the recap: Mumbai passed, Pune shows failed=1, and the failing assertion text tells you which condition broke. That is the whole point: a single play run, across every device, produces a precise red/green list. No human eyeballing 200 configs.
Now the design decision that trips up every beginner: report-or-remediate. A report-only play asserts and fails but touches nothing — safe to run nightly, unattended, against production. A remediate play goes further: when it finds telnet, it removes it with ios_config. Remediation is powerful but it changes production, so you gate it behind a change window, --check dry-runs first, and human approval. The classic newbie error is wiring auto-remediation into the nightly cron and discovering, at 3 a.m., that the play "fixed" a temporary change an engineer needed.
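One way to keep the two modes apart is a single switch that defaults to report-only. The sketch below assumes a hypothetical variable named remediate; it is a local convention invented for this example, not an Ansible feature.

```yaml
---
# Sketch only: "remediate" is a made-up switch that defaults to report-only.
- name: Report by default, remediate only on request
  hosts: ios_devices
  gather_facts: false
  vars:
    remediate: false        # override with: -e remediate=true
  tasks:
    - name: Read the live running-config
      cisco.ios.ios_command:
        commands: ["show running-config"]
      register: rc

    - name: Flag hosts that still allow telnet
      ansible.builtin.debug:
        msg: "DRIFT: telnet is enabled on the VTY lines"
      when: "'transport input telnet' in rc.stdout[0]"

    - name: Fix it only when the operator opted in
      cisco.ios.ios_config:
        lines:
          - transport input ssh
        parents: line vty 0 15
      when:
        - remediate | bool
        - "'transport input telnet' in rc.stdout[0]"
```

The nightly cron entry never passes the switch, so it can only report. In a change window an engineer would add -e remediate=true together with --limit and a --check --diff dry run before applying.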
A compliance play that always passes is worse than none — it gives false confidence. Before trusting it, test it against a known-bad device: deliberately add transport input telnet to a lab router and run the play. If the host does not turn red, your assert condition is wrong (loose match, wrong index, or testing the saved file instead of the live state). A good check must fail when it should — verify the failure path, not just the happy path.
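The failure path can also be rehearsed without touching a device by feeding the condition known-bad and known-good text. The sketch below is an assumption-laden example: the sample snippets and the telnet_pattern variable are invented for illustration. It swaps the plain substring test for Ansible's search test in multiline mode, because a line such as "transport input ssh telnet" would slip past a check for the literal text 'transport input telnet'.

```yaml
---
# Sketch only: sample text and the pattern are illustrative assumptions.
- name: Self-test the no-telnet condition
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    telnet_pattern: '^ *transport input .*(telnet|all)'
    bad_sample: |
      line vty 0 4
       transport input ssh telnet
    good_sample: |
      line vty 0 4
       transport input ssh
  tasks:
    - name: The pattern must flag the bad sample and clear the good one
      ansible.builtin.assert:
        that:
          - bad_sample is search(telnet_pattern, multiline=True)
          - good_sample is not search(telnet_pattern, multiline=True)
        fail_msg: "The telnet check does not behave as intended"
        success_msg: "The telnet check catches the bad sample"
```

If this passes, the same expression can be tried in the compliance play as `rc.stdout[0] is not search(telnet_pattern, multiline=True)`. A lab router with telnet deliberately enabled remains the final proof.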
Priya at ICICI faces this
Priya, an L1 analyst, gets the morning compliance report: BR-Pune-rtr01 is flagged NON-COMPLIANT, assertion "'transport input telnet' not in rc.stdout[0]" evaluated to false.
Someone re-enabled telnet on the Pune router's VTY lines during a weekend troubleshooting session and never backed it out. The running-config now has "transport input telnet ssh", so the no-telnet assert fails for that host only.
She confirms it is real drift (not a false match) by reading the actual line, and cross-checks the Git backup to see when it appeared.
git log -p backups/BR-Pune-rtr01/ → finds the line added in the 2026-06-08 commit; ansible-playbook compliance.yml --limit BR-Pune-rtr01 to reproduce.
In a scheduled change window she runs the remediate play (or hand-edits): ios_config sets "transport input ssh" on the VTY lines, removing telnet; she does a --check dry-run first.
Re-run compliance.yml --limit BR-Pune-rtr01 → the host now reports "Compliant" with failed=0; the next nightly Git backup shows the telnet line removed.
Karthik at HCL writes a nightly play that detects drift AND auto-pushes ios_config fixes, then schedules it in cron. What is the dangerous flaw?
Pause & Predict
Predict: ios_facts can return parsed facts under ansible_net_* keys. Why might you still prefer reading raw text with ios_command for a "no telnet" check? Type your guess.
ios_facts exposes a fixed set of parsed facts (version, interfaces, neighbors, and resource-module sections); it may not surface every arbitrary line you care about, like the exact VTY transport input statement. ios_command with show running-config returns the raw config text, so you can assert on any literal line. Facts are cleaner and structured when the data you need is in them; raw command output is the catch-all when your policy targets a specific config line that facts do not model.
④ A real pipeline — backup → commit → assert → report, end to end
Now we stitch the pieces into one nightly pipeline. The shape every shop converges on: (1) backup every device → (2) git commit the pull → (3) compliance assert against policy → (4) report the red/green result to the team (email, Slack, or an artifact). Steps 1–2 give you history; steps 3–4 give you the audit. Run it from cron tonight, and tomorrow you have both — for every device, automatically.
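Stage 4 is the only stage without a playbook in this lesson, so here is one possible shape for it. This sketch assumes a hypothetical report path whose folder already exists, trims the policy to two of the four conditions for brevity, and uses ignore_errors so that every host is evaluated before anything fails.

```yaml
---
# Sketch only: report_path is hypothetical and its folder must already exist.
- name: Compliance check that also writes a red/green report
  hosts: ios_devices
  gather_facts: false
  vars:
    report_path: "/home/netauto/reports/compliance-latest.txt"
  tasks:
    - name: Grab the full running-config
      cisco.ios.ios_command:
        commands: ["show running-config"]
      register: rc

    - name: Evaluate policy without stopping the host yet
      ansible.builtin.assert:
        that:
          - "'snmp-server community' not in rc.stdout[0]"
          - "'aaa new-model' in rc.stdout[0]"
        quiet: true
      register: policy
      ignore_errors: true

    - name: Write one line per targeted host on the control node
      ansible.builtin.copy:
        dest: "{{ report_path }}"
        mode: "0600"
        content: |
          {% for host in ansible_play_hosts_all %}
          {% set r = hostvars[host]['policy'] | default(none) %}
          {{ host }}: {{ 'no result' if r is none else ('NON-COMPLIANT' if r is failed else 'compliant') }}
          {% endfor %}
      delegate_to: localhost
      run_once: true

    - name: Turn the recap red for the drifted hosts
      ansible.builtin.fail:
        msg: "NON-COMPLIANT, see {{ report_path }}"
      when: policy is failed
```

Hosts that never produced a result, for example because they were unreachable, are listed as "no result" rather than being counted as compliant. The file can then be mailed or posted by whatever tool the team already uses.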
A worked example — "detect a device still running telnet and remediate it". The nightly report flags BR-Pune-rtr01 (telnet found). That is the detect. The next morning, in the approved change window, an engineer runs the remediate play with a --check dry-run, eyeballs the diff, then applies it. The play sets transport input ssh, telnet is gone, and that evening's backup + compliance run both come back green. Detect at night, remediate by day, prove it with the next backup.
---
- name: Remediate telnet on flagged hosts
  hosts: "{{ target | default('ios_devices') }}"
  gather_facts: false
  tasks:
    - name: Force SSH-only on VTY lines (no telnet)
      cisco.ios.ios_config:
        lines:
          - transport input ssh
        parents: line vty 0 15
      register: fix

    - name: Report what changed
      ansible.builtin.debug:
        msg: "{{ 'remediated' if fix.changed else 'already compliant' }}"

# dry run first: ansible-playbook remediate-telnet.yml -e target=BR-Pune-rtr01 --check --diff
--- before
+++ after
@@ line vty 0 15 @@
- transport input telnet ssh
+ transport input ssh
changed: [BR-Pune-rtr01]
TASK [Report what changed] => { "msg": "remediated" }

There is a slicker, declarative way to detect drift too: ios_config can take diff_against: intended with an intended_config baseline, and when the play is run with --diff, Ansible reports the difference between the device and your golden config. Its sibling choices are diff_against: running (before/after of your change) and diff_against: startup (running vs startup). For a "does this match my approved baseline" check, intended is the one that maps to drift detection.
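A baseline comparison with diff_against: intended might look like the sketch below. It assumes a hypothetical folder of golden configs named after each inventory host; the ignore pattern is one example of a self-changing line.

```yaml
---
# Sketch only: golden_dir and its file layout are assumptions.
- name: Drift check against a golden config
  hosts: ios_devices
  gather_facts: false
  vars:
    golden_dir: "/home/netauto/golden"
  tasks:
    - name: Compare the running-config with the approved baseline
      cisco.ios.ios_config:
        diff_against: intended
        intended_config: "{{ lookup('file', golden_dir ~ '/' ~ inventory_hostname ~ '.cfg') }}"
        diff_ignore_lines:
          - ntp clock-period .*
      register: drift
```

Run it as `ansible-playbook drift.yml --diff` (the file name is illustrative). The task defines no lines or src, so it pushes nothing to the device. As far as I know the comparison only happens when --diff is passed, and a mismatch is then reported as changed along with the diff.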
Three gotchas that bite real pipelines. One — secrets in saved configs. Running-configs carry SNMP communities and TACACS keys; if your Vault password or those configs leak you have a breach. This is not theoretical: CVE-2024-8775 showed Ansible Vault secrets exposed in plaintext in playbook output, and CVE-2024-0690 showed no_log not being respected in some loop scenarios — so put no_log: true on any task touching secrets and keep backup repos private. Two — huge diffs from timestamps. Strip volatile clock/timestamp lines or every backup looks changed. Three — false drift from loose matches. Assert on the real config line, not a substring, or harmless text triggers a red.
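For the first gotcha, one option is to redact secrets in the saved copy before it reaches Git. The sketch below masks only SNMP community strings; the placeholder text and the single pattern are assumptions, and TACACS keys or enable secrets would each need their own pattern.

```yaml
---
# Sketch only: covers SNMP community strings and nothing else.
- name: Nightly backup with SNMP communities redacted
  hosts: ios_devices
  gather_facts: false
  vars:
    backup_root: "/home/netauto/backups"
  tasks:
    - name: Pull running-config and save a timestamped copy
      cisco.ios.ios_config:
        backup: true
        backup_options:
          dir_path: "{{ backup_root }}/{{ inventory_hostname }}"
          filename: "{{ lookup('pipe', 'date +%Y-%m-%d') }}.cfg"
      register: backup_result

    - name: Mask the community string in the saved file
      ansible.builtin.replace:
        path: "{{ backup_result.backup_path }}"
        regexp: '^(snmp-server community) \S+'
        replace: '\1 <redacted>'
      delegate_to: localhost
      no_log: true          # keeps the old value out of task output and diffs

    - name: Restrict the file to its owner
      ansible.builtin.file:
        path: "{{ backup_result.backup_path }}"
        mode: "0600"
      delegate_to: localhost
```

The trade-off is that a redacted file cannot be loaded back onto a device as is, and a changed community string no longer shows up in the diff. Whether that is acceptable depends on whether the repo serves as a restore source or only as an audit trail.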
ram@netauto:~/playbooks$ ansible-playbook -i inventory.ini backup.yml && \
    bash commit-backups.sh && \
    ansible-playbook -i inventory.ini compliance.yml

PLAY RECAP (backup.yml) ***
BR-Mumbai-rtr01: ok=2 changed=0
BR-Pune-rtr01: ok=2 changed=0
[main 7c4e1a9] Nightly config backup 2026-06-11 | 2 files changed, 11 insertions(+), 3 deletions(-)
TASK [Assert the device meets policy] ***
ok: [BR-Mumbai-rtr01] => "Compliant"
fatal: [BR-Pune-rtr01]: FAILED! => "NON-COMPLIANT: telnet/SNMPv2/NTP/AAA policy violated"
PLAY RECAP (compliance.yml) ***
BR-Mumbai-rtr01: failed=0
BR-Pune-rtr01: failed=1
On the RHCE EX294 blueprint, this lesson lives where it counts: writing playbooks with tasks, variables and conditionals, using modules correctly, and — directly relevant here — protecting credentials with Ansible Vault. The 2026 exam runs on ansible-navigator with execution environments. The backup/assert pattern you just learned is the same task-and-module muscle the exam tests; the Vault + no_log discipline is the security objective. Career-wise, "we automate config backup and nightly compliance" is one of the most common first real jobs handed to a junior on a network automation desk.
The whole pipeline is two familiar habits stacked. The Git backup is your bank passbook: every entry is dated and you can flip back to any day and see the exact balance (config) and what moved (the diff). The nightly compliance assert is the dabbawala's end-of-day tally: every tiffin (device) is checked against the manifest, and the one that does not match is flagged loudly — not all 200, just the drifter. Passbook for history, daily tally for "is everything still right". Together they are config backup + compliance.
Cold, in 30 seconds: name the four pipeline stages (backup → git commit → assert → report); say which ios_config option pulls the config (backup: true + backup_options) and which detects drift against a baseline (diff_against: intended); state the four assert policy lines (no telnet, SNMPv3 only, NTP set, AAA on); and explain why the nightly job is report-only while remediation needs a change window. If you can do that without notes, you are ready for AWX and for the Vault objectives on EX294.
An interviewer asks Meera: "Walk me through your nightly network-config pipeline and the single biggest safety rule in it." Best answer?
🧠 In your own words
In one line, what is the difference between what a backup playbook proves and what a compliance playbook proves?
📖 Glossary
- Configuration drift: When a device's live config slowly diverges from the approved baseline because of untracked manual changes.
- running-config: The config a device is running right now, in memory; on IOS you read it with show running-config.
- Backup (config): A point-in-time copy of a device's config saved off-box so you can compare, audit and restore it.
- ios_config backup: cisco.ios.ios_config option (backup: true) that pulls the running-config to a file; backup_options sets dir_path and filename.
- diff_against: ios_config option to compare the device config against running, startup, or intended (a golden baseline); intended is the one used for drift detection.
- intended_config: The golden/approved baseline config you supply so diff_against: intended can show how the device differs from it.
- ios_facts: Module that collects structured facts (version, interfaces, parsed sections) from an IOS device under ansible_net_* keys.
- ios_command: Module that runs show commands and returns the raw text output for you to search/assert on.
- assert module: ansible.builtin.assert checks a list of conditions; if any is false the host fails the play, with your fail_msg.
- Git: Version-control system; each commit is a dated, attributable snapshot, so git diff shows exactly what changed between backups.
- Report vs remediate: Report-only detects and flags drift but changes nothing (safe nightly); remediate pushes the fix (change-window only).
- Ansible Vault / no_log: Vault encrypts secrets at rest; no_log: true hides them from task output at run time. Use both for credentials.
📚 Sources
- Ansible Community Documentation — cisco.ios.ios_config module (backup: true default false; backup_options.dir_path / filename with default <hostname>_config.<date>@<time>).
- Ansible Community Documentation — cisco.ios.ios_facts and cisco.ios.ios_command modules (ansible_net_* fact keys; raw show-command output for compliance assertions). docs.ansible.com/ansible/latest/collections/cisco/ios/ios_facts_module.html · docs.ansible.com/ansible/latest/collections/cisco/ios/ios_command_module.html
- CellStream — "Two Ansible Network Compliance Examples" + PacketCoders — "Automating Network Config Backups with Ansible and Git" (real ios_command + assert compliance pattern; git commit of timestamped backups; reporting). cellstream.com/2025/06/19/two-ansible-network-compliance-examples · packetcoders.io/automating-network-config-backups-with-ansible-and-git
- jwkenney — "Using Ansible to audit configuration drift in a brownfield environment" + ansiblebyexample — "Managing Compliance Drift with Ansible" (false-drift from timestamp/volatile lines, md5sum noise, detect-vs-remediate modes). jwkenney.github.io/auditing-configuration-drift · ansiblebyexample.com/articles/managing-compliance-drift-with-ansible
- Red Hat / NIST — Ansible Vault + no_log guidance and CVE-2024-8775 (vaulted secrets exposed in plaintext in playbook output) + CVE-2024-0690 (ANSIBLE_NO_LOG not respected in some loop scenarios). docs.ansible.com/ansible/latest/vault_guide/index.html · nvd.nist.gov/vuln/detail/CVE-2024-8775 · nvd.nist.gov/vuln/detail/CVE-2024-0690
- Red Hat EX294 — Red Hat Certified Engineer (RHCE) exam objectives: create/use playbooks with tasks, variables, conditionals and modules; protect sensitive data with Ansible Vault; 2026 toolset ansible-navigator + execution environments. redhat.com/en/services/training/ex294-red-hat-certified-engineer-rhce-exam-red-hat-enterprise-linux
What's next?
You can pull and audit configs from the CLI now — but who runs this nightly, stores the Vault password safely, and shows the whole team a red/green dashboard? That is the control room. Next we move the pipeline into AWX / Automation Controller.