Most engineers think…
Most engineers first hear "Ansible for networks" and picture installing some software on every router, then writing a giant script that runs commands top-to-bottom like a bash file.
Wrong on both counts — and it will cost you in interviews. Ansible is agentless (nothing installed on the device — it logs in over SSH/API), declarative (you describe the desired state, not the keystrokes), and idempotent (re-running a playbook that already matches makes zero changes). For routers and switches it does not even use the normal SSH module — it loads a network_cli connection that knows how to drive a Cisco/Arista/Juniper CLI. The mental model is a checklist of desired states, not a script of commands.
① Why Ansible for networks — drift, typos and 2 a.m. callbacks
Meet Rahul, an L1 engineer at Infosys looking after 500 branch switches. The standard rollout — say, a new login banner and an NTP server on every box — means he opens PuTTY, SSHes in, pastes four lines, saves, logs out, and repeats. Five hundred times. By switch 347 he fat-fingers an IP, by switch 480 he is copy-pasting yesterday's banner, and a week later no two configs are identical. That slow divergence is called configuration drift, and it is the quiet disease of every hand-run network.
There is also no record. When something breaks at 2 a.m., nobody can say who changed what on which box. Rahul is not lazy or careless — he is human, and humans do not scale to 500 identical tasks. This is exactly the gap Ansible was built to fill.
Three properties define Ansible, and you must be able to say each one in an interview. Agentless: you install no agent on the device — Ansible logs in over SSH (or an API) the same way you do, runs its work, and disconnects. Nothing to install, patch or break on 500 switches. Declarative: you describe the desired state ("this interface should have this description"), not the keystrokes to get there. Idempotent: run the same playbook ten times and it makes a change only the first time — after that it sees the state already matches and does nothing.
One more thing that trips up beginners: for routers and switches, Ansible does not use its normal SSH/shell module (the one it uses for Linux servers). Network gear has a CLI, not a shell, so Ansible loads a special network_cli connection. You point it at the right OS with ansible_network_os, and it knows how to handle the Cisco prompt, paging, and enable mode. We will set those variables in the next section.
The three words, one line each
These are the answers to "what is Ansible?" that a hiring manager wants to hear, plus the one network-specific detail that goes with them.
Agentless: no software on the device. Ansible logs in over SSH/API and leaves. So: nothing extra to install or patch on 500 boxes.
Declarative: you declare the end state, not the keystrokes. So: the playbook reads like a checklist, and the tool works out the diff.
Idempotent: re-running a matching playbook changes nothing. So: it is safe to run on a schedule and to prove a box is compliant.
network_cli: for routers/switches Ansible uses a CLI connection, not the Linux SSH module. So: it handles the prompt + enable mode for you.
Hand-config is like every flat owner phoning the guard with their own instructions — nobody agrees, and there is no record. Ansible is the society gate-pass register: one written list of who is allowed in (the desired state), the guard checks each visitor against it, and if a name is already on the list he does nothing new. Re-reading the same register tomorrow changes nothing — that is idempotency. And the guard (Ansible) carries the list to every gate; he does not install a copy at each flat (agentless).
Sneha at TCS says: "We run our compliance playbook every night. Won't that re-apply the banner and NTP config 365 times a year and risk breaking something?" What is the correct reassurance?
Pause & Predict
Predict: Ansible is "agentless" and just uses SSH. Name ONE thing that gets easier because there is no agent, and ONE new thing you now depend on instead. Type your guess.
② The pieces — control node, inventory, modules, collections, plays
Ansible has a small number of moving parts. Learn the names once and the rest of the series clicks. The control node is the only box with Ansible on it — your laptop or a jump host. Everything else is a managed device with nothing installed.
The inventory lists your devices and sorts them into groups — [routers], [switches], [mumbai]. Groups matter because you attach shared settings to a group once via group_vars, instead of repeating them per host. A task calls a module; a group of tasks aimed at a set of hosts is a play; one or more plays in a YAML file is a playbook.
Modules do not all ship in the box anymore. They live in collections you install from Ansible Galaxy. For Cisco IOS you install cisco.ios; for Arista, arista.eos; for Juniper, junipernetworks.junos; for VyOS, vyos.vyos. The Cisco collection depends on the shared ansible.netcommon collection, so installing cisco.ios pulls it in.
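One way to keep the collections named above reproducible is a requirements file. The sketch below assumes it is saved as requirements.yml (a conventional name, not one this lesson prescribes) and installed with ansible-galaxy collection install -r requirements.yml.

```yaml
# requirements.yml (assumed filename): collections named in this lesson
---
collections:
  - name: cisco.ios                # pulls in ansible.netcommon as a dependency
  - name: arista.eos
  - name: junipernetworks.junos
  - name: vyos.vyos
```

Keeping this file in the repo means a new control node can be brought to the same state with one command instead of a remembered list.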
Now the network-specific part. By default Ansible would try to SSH in and run a shell, which is useless on a router. So in your inventory or group_vars you set three things. ansible_connection: ansible.netcommon.network_cli loads the CLI connection. ansible_network_os: cisco.ios.ios tells it the platform. And to get into enable mode, you add ansible_become: yes with ansible_become_method: enable and an ansible_become_password (the enable secret). Inside a playbook the equivalent keyword is become: yes; in inventory or group_vars the variable form ansible_become is the one to use.
# inventory.ini
[routers]
rtr-mum-01 ansible_host=10.10.1.1
rtr-pun-01 ansible_host=10.10.2.1
# group_vars/routers.yml
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
ansible_user: netauto
ansible_become: yes
ansible_become_method: enable
ansible_become_password: "{{ vault_enable_secret }}"

# group_vars apply to every host in [routers],
# so you set the connection ONCE, not per device.
# The enable secret is pulled from an Ansible Vault
# variable, never hard-coded in the file.
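The same inventory can also be written in YAML, which some teams prefer because it matches the rest of their Ansible files. This is a minimal sketch, assuming the file is saved as inventory.yml (a hypothetical name) next to the existing group_vars directory so that group_vars/routers.yml still applies.

```yaml
# inventory.yml (assumed filename): same two routers as inventory.ini
---
all:
  children:
    routers:
      hosts:
        rtr-mum-01:
          ansible_host: 10.10.1.1
        rtr-pun-01:
          ansible_host: 10.10.2.1
```

Running ansible-inventory -i inventory.yml --graph should show both hosts under the routers group, which is a quick way to confirm the indentation is right.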
Symptom: you wrote a clean playbook, but the run hangs and ends in a timeout or returns garbage. Cause is almost always a wrong or missing ansible_network_os, or forgetting ansible_connection: ansible.netcommon.network_cli — so Ansible tries to open a Linux shell on a device that only has a Cisco CLI. Fix: set both in group_vars and confirm the platform string exactly (cisco.ios.ios, not ios or cisco_ios). A second classic: the very first connection prompts to accept the SSH host key and the run stalls — set host_key_checking = False in ansible.cfg for a lab.
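A cheap guard against the first symptom is to check the variables before any device is contacted. The sketch below is one possible approach, not part of the original lesson: the filename preflight.yml is hypothetical, and it runs on the control node only, reading each router's variables through hostvars.

```yaml
# preflight.yml (hypothetical filename): validate connection variables, open no device connection
---
- name: Verify network connection variables for every router
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Fail early if the connection or platform string is missing or misspelled
      ansible.builtin.assert:
        that:
          - hostvars[item].ansible_connection | default('') == 'ansible.netcommon.network_cli'
          - hostvars[item].ansible_network_os | default('') == 'cisco.ios.ios'
        fail_msg: "{{ item }}: check ansible_connection and ansible_network_os in group_vars/routers.yml"
      loop: "{{ groups['routers'] }}"
```

Run it with ansible-playbook -i inventory.ini preflight.yml. A failure here points at the inventory data rather than at the playbook or the device.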
A word on YAML, because it causes more first-day pain than any module. YAML uses spaces, never tabs, and indentation defines structure — two spaces in is "inside" the line above. A list item starts with - (dash-space). A key: value needs the space after the colon. Mis-indent one line and you get a cryptic "could not find expected ':'" error. When in doubt, run ansible-playbook --syntax-check site.yml before anything else.
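The rules above fit in one small file. The values here are illustrative only (the second NTP address is made up for the example), and the file is not used by any playbook in this lesson.

```yaml
# yaml_basics.yml (illustrative only): the structure rules in one place
---
site: mumbai                     # key: value, with a space after the colon
ntp_servers:                     # a key whose value is a list
  - 10.20.0.10                   # each list item starts with dash-space
  - 10.20.0.11                   # hypothetical second server
uplink:                          # a key whose value is a nested mapping
  name: GigabitEthernet0/1       # two spaces in means "inside uplink"
  description: WAN-uplink-to-Pune
```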
Aditya at Wipro writes a playbook for a Cisco router but leaves the connection at the default. The run hangs and times out. Which single change is the fix?
Pause & Predict
Predict: you put ansible_connection and ansible_network_os in group_vars/routers.yml instead of repeating them on every host line in the inventory. Why is that the better habit? Type your guess.
③ Your first playbook — gather facts, push one line, run with --check
Time to build something real. A first playbook should do two safe things: read state and write one line. We will use ios_facts to gather facts, then ios_config to set an interface description. Both come from the cisco.ios collection.
---
- name: First IOS playbook
  hosts: routers
  gather_facts: false
  tasks:
    - name: Collect device facts
      cisco.ios.ios_facts:

    - name: Show the IOS version we found
      ansible.builtin.debug:
        msg: "Running IOS {{ ansible_net_version }} on {{ ansible_net_model }}"

    - name: Set the WAN interface description
      cisco.ios.ios_config:
        parents: interface GigabitEthernet0/1
        lines:
          - description WAN-uplink-to-Pune

PLAY [First IOS playbook] ******************************
TASK [Collect device facts] ***************************
ok: [rtr-mum-01]
TASK [Set the WAN interface description] *************
changed: [rtr-mum-01]
PLAY RECAP ********************************************
rtr-mum-01 : ok=3 changed=1 unreachable=0 failed=0
Notice gather_facts: false at the play level — the normal Linux fact-gathering does not work on a router, so we disable it and use the network-specific ios_facts task instead. The facts land in variables like ansible_net_version and ansible_net_model, which you can print or reuse. In ios_config, parents is the section header (interface GigabitEthernet0/1) and lines are the commands inside it.
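The same ios_facts task exposes more than the version and model. As a small sketch, assuming the routers group and connection variables from earlier, the play below asks for the minimal fact subset and prints two other ansible_net_ facts the module documents, the hostname and serial number.

```yaml
---
- name: Read a few more facts (sketch)
  hosts: routers
  gather_facts: false
  tasks:
    - name: Collect only the minimal fact subset
      cisco.ios.ios_facts:
        gather_subset: min

    - name: Print hostname and serial number
      ansible.builtin.debug:
        msg: "{{ ansible_net_hostname }} serial {{ ansible_net_serialnum }}"
```

Asking for min keeps the task quick on a large fleet, since the heavier interface and config subsets are skipped.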
You never run a new playbook straight at production. The flags are your seatbelt. --check is a dry run — it reports what would change and touches nothing. --diff shows the precise lines. --limit scopes the run to one box for a first test. And -v (up to -vvvv) turns up verbosity when something misbehaves.
# 1) dry run on ONE box, show the diff; changes nothing
ansible-playbook -i inventory.ini site.yml --limit rtr-mum-01 --check --diff
# 2) apply for real
ansible-playbook -i inventory.ini site.yml --limit rtr-mum-01
# 3) run again: proof it is idempotent
ansible-playbook -i inventory.ini site.yml --limit rtr-mum-01

# run 2 -> changed: [rtr-mum-01] ok=3 changed=1 failed=0
# run 3 -> ok: [rtr-mum-01] ok=3 changed=0 failed=0
# changed=0 the second time = the box already matches
# desired state. That is idempotency, proven.
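The dry-run behaviour can also be pinned inside a playbook, so an audit play cannot write even if someone forgets the flags. This is a sketch under that assumption: check_mode and diff are standard play keywords, while the audit framing and the task names are illustrative.

```yaml
---
- name: Compliance audit that never writes (sketch)
  hosts: routers
  gather_facts: false
  check_mode: true     # behaves like passing --check for this play
  diff: true           # behaves like passing --diff for this play
  tasks:
    - name: Report whether the WAN description matches
      cisco.ios.ios_config:
        parents: interface GigabitEthernet0/1
        lines:
          - description WAN-uplink-to-Pune
      register: wan_desc

    - name: Flag drift without fixing it
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} has drifted from the desired description"
      when: wan_desc is changed
```

In check mode, changed means "would change", so a run with changed=0 across the fleet is evidence that every box already matches.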
Priya at ICICI faces this
Priya, an L1 analyst, runs her first playbook and it stalls, then fails with a host-key error / prompt to accept the SSH key on the new lab routers.
On the very first SSH to a device, the control node has never seen its host key, and Ansible (like ssh) refuses or prompts. The playbook is fine — the connection trust is not set up.
She separates "did the task fail?" from "did the connection even open?". The error is a connection/host-key error, before any module runs, so it is an ansible.cfg / SSH trust issue, not a YAML or module bug.
control node → ansible.cfg → [defaults] host_key_checking (or env ANSIBLE_HOST_KEY_CHECKING=False)
For a lab, set host_key_checking = False in ansible.cfg [defaults]. For production, pre-seed the known_hosts file with the real device keys instead of disabling the check.
Re-run ansible-playbook site.yml --limit rtr-mum-01 --check -> the connection opens, ios_facts returns ok, and the recap shows unreachable=0.
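For the production option, pre-seeding can itself be automated. The sketch below is one possible approach and rests on stated assumptions: a hypothetical hostkeys/ directory holds one verified key file per router, obtained out of band, and each file contains a standard known_hosts line that begins with the router's address.

```yaml
# seed_known_hosts.yml (hypothetical filename): runs on the control node only
---
- name: Pre-seed known_hosts with verified router keys (sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Add each router's host key to the current user's known_hosts
      ansible.builtin.known_hosts:
        name: "{{ hostvars[item].ansible_host }}"
        # assumed file content, e.g. "10.10.1.1 ssh-rsa AAAA..."
        key: "{{ lookup('ansible.builtin.file', 'hostkeys/' ~ item ~ '.pub') }}"
        state: present
      loop: "{{ groups['routers'] }}"
```

The design point is that the keys come from a trusted source rather than from whatever answers on first contact, which is the check that host_key_checking = False gives up.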
Karthik at HCL is about to run a new config playbook against 80 production routers for the first time. Which single command should he run FIRST?
Pause & Predict
Predict: you run the playbook once and get changed=1. You run the exact same playbook again with no edits. What does the PLAY RECAP show the second time, and what does that prove? Type your guess.
④ From toy to real — variables, idempotency, ad-hoc & where to grow
Hard-coding one description in one playbook is a demo. Real automation drives data. You move per-device values into group_vars and host_vars, and your tasks reference variables instead of literals. Now the same playbook configures 500 different switches, each with its own NTP server or VLAN, just by changing the data.
# group_vars/routers.yml (added)
ntp_server: 10.20.0.10
banner_text: "Authorised access only - Infosys NetOps"

# task in site.yml
- name: Enforce NTP + banner from group_vars
  cisco.ios.ios_config:
    lines:
      - ntp server {{ ntp_server }}
      - banner motd ^{{ banner_text }}^

changed: [rtr-mum-01] (first run, lines added)
ok: [rtr-mum-01] (second run, already present)
# change ntp_server in group_vars -> next run
# re-converges every router to the new value.
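Per-device differences go in host_vars, which take precedence over group_vars for that one host. A minimal sketch, with a made-up address for illustration:

```yaml
# host_vars/rtr-pun-01.yml (illustrative): overrides the group value for one router
---
ntp_server: 10.20.0.20    # hypothetical Pune NTP server; wins over group_vars/routers.yml
```

The task itself does not change. rtr-pun-01 gets its own NTP server and every other router keeps the group default.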
The reason re-running is safe bears repeating because it is the heart of the job: idempotency means "changed: false" on the second run. ios_config reads the running-config, compares your lines, and only sends what is missing. So a playbook is not a one-shot script — it is a statement of how the box should look that you can apply as often as you like.
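One caveat worth verifying in a lab: IOS generally stores a banner in running-config with its own delimiter characters, so a banner line pushed through ios_config may not match on the next comparison and could report changed on every run. The cisco.ios collection ships a dedicated ios_banner module for this case. The sketch below assumes the banner_text variable from group_vars.

```yaml
- name: Enforce the MOTD banner with the dedicated module
  cisco.ios.ios_banner:
    banner: motd
    text: "{{ banner_text }}"
    state: present
```

If the second run of the banner task shows changed when nothing was edited, swapping to this module is the usual first thing to try.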
Not everything needs a playbook. For a quick one-off — "show me the version on every router right now" — use an ad-hoc command. It is the ansible command (not ansible-playbook) with -m for the module and -a for arguments.
# gather facts from every router, one line
ansible routers -i inventory.ini -m cisco.ios.ios_facts
# run a show command on one box
ansible rtr-mum-01 -i inventory.ini \
  -m cisco.ios.ios_command -a "commands='show ip int brief'"

rtr-mum-01 | SUCCESS => {
    "ansible_facts": { "ansible_net_version": "17.9.4a",
                       "ansible_net_model": "ISR4331", ... },
    "changed": false }

Symptom: the playbook reports changed=1, the config is live, but after the device reloads it is gone. Cause: ios_config changes the running-config; by default it does not save to startup-config (save_when defaults to never). Fix: add save_when: modified (or changed) to the task so Ansible copies running-config to startup-config when it makes a change. Two more first-week gotchas: a wrong ansible_network_os causes hangs/garbage, and forgetting ansible_become_password (the enable secret) means config tasks fail with an authorization error even though login worked.
Where do you grow next? Three steps. Roles package your tasks into reusable bundles. Templates (Jinja2) build a whole per-device config from variables — one template, every switch. And Ansible Vault encrypts your enable passwords and tokens so the repo is safe to commit. Those are the next lessons.
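As a preview of the Vault step, one common layout splits a group's variables into a readable file and an encrypted one. This sketch assumes you move group_vars/routers.yml into a group_vars/routers/ directory; the filenames vars.yml and vault.yml are a convention, and the secret shown is a placeholder.

```yaml
# group_vars/routers/vars.yml (plain text, safe to read in a diff)
---
ansible_become_password: "{{ vault_enable_secret }}"

# group_vars/routers/vault.yml (shown BEFORE encryption)
---
vault_enable_secret: "CHANGE-ME"    # placeholder, not a real secret
```

Encrypt the second file with ansible-vault encrypt group_vars/routers/vault.yml and run playbooks with --ask-vault-pass. The readable file still shows which variable is secret without revealing its value.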
For certification, this lesson maps onto the RHCE EX294 and the broader Ansible Network track. EX294 is a hands-on exam: you write real playbooks, use inventories and group_vars, run with the right modules, and rely on idempotency — exactly what you just did. The same building blocks (inventory, connection vars, modules, check mode) are what every "Ansible Network Automation" blueprint tests, and what a netauto interviewer probes in the first ten minutes.
Cold, in 30 seconds: define agentless / declarative / idempotent; name the connection vars a Cisco router needs (ansible_connection=ansible.netcommon.network_cli, ansible_network_os=cisco.ios.ios, become for enable); and say what --check --diff does and why changed=0 on a second run is good news. If you can do that without notes, you are ready for the Cisco IOS at-scale lesson and for the EX294 Ansible basics.
An interviewer asks Meera: "Give me the single biggest reason a team trusts running the same Ansible playbook against production on a nightly schedule." Best answer?
🧠 In your own words
In one line, why is it safe to run the same Ansible config playbook against a production router every single night?
📖 Glossary
- Ansible: Red Hat's agentless automation tool. It configures servers and network devices from a central control node over SSH/API.
- Agentless: No software is installed on the managed device; Ansible connects over SSH/API, does its work, and disconnects.
- Declarative: You describe the desired end state, not the step-by-step commands; the module computes and applies the difference.
- Idempotent: Running the same playbook again makes no change if the state already matches. The second run reports changed=0.
- Control node: The single machine with Ansible installed, from which you run playbooks against everything else.
- Inventory: A file listing managed devices, organised into groups like [routers] and [switches]; hosts can carry per-host variables.
- group_vars / host_vars: YAML files of variables that apply to a whole group (group_vars/routers.yml) or a single host (host_vars/rtr-mum-01.yml).
- Module: The code that does one unit of work, e.g. cisco.ios.ios_config (write config) or cisco.ios.ios_facts (read facts).
- Collection: A packaged bundle of modules/plugins/roles for a platform, installed from Ansible Galaxy, e.g. cisco.ios, arista.eos.
- network_cli: The ansible.netcommon connection that drives a vendor CLI over SSH (prompt, paging, enable mode). It is used instead of the Linux SSH module.
- ansible_network_os: Variable that tells network_cli which platform it is talking to, e.g. cisco.ios.ios, arista.eos.eos, junipernetworks.junos.junos.
- become (enable): Privilege escalation; with ansible_become_method=enable Ansible enters Cisco privileged EXEC (#) before making config changes.
- --check / --diff: Run flags. --check is a no-change dry run; --diff prints the exact config lines that would be added or removed.
- Ansible Vault: ansible-vault encrypts secrets (enable passwords, tokens) so they can live safely in a version-controlled repo.
📚 Sources
- Ansible Community Documentation — "Network Getting Started: Run Your First Command and Playbook" (ansible_connection=ansible.netcommon.network_cli, ansible_network_os, gather_facts:false, the ansible-playbook example and the PLAY RECAP ok=5 changed=1). docs.ansible.com/projects/ansible/latest/network/getting_started/first_playbook.html
- cisco.ios.ios_config module — Ansible Community Documentation (parameters lines/parents/match/replace/backup/save_when with defaults match=line, replace=line, save_when=never; interface description example; network_cli requirement). docs.ansible.com/projects/ansible/latest/collections/cisco/ios/ios_config_module.html
- cisco.ios.ios_facts module + cisco.ios collection index — Ansible Community Documentation (facts prefixed ansible_net_; ansible-galaxy collection install cisco.ios pulls ansible.netcommon; become with become_method=enable). docs.ansible.com/ansible/latest/collections/cisco/ios/ios_facts_module.html · docs.ansible.com/projects/ansible/latest/collections/cisco/ios/index.html
- Ansible Network Best Practices — Ansible Community Documentation (use resource modules where possible, ios_config for the rest; test with --check --diff to review the running-config diff before applying). docs.ansible.com/projects/ansible/latest/network/user_guide/network_best_practices_2.5.html
- cisco.ios collection Releases — GitHub ansible-collections/cisco.ios (2025-2026: ios_config gains a content parameter; minimum ansible.netcommon >=8.5.2; test matrix adds stable-2.20, drops stable-2.16 libssh; ios_ action-plugin renames). github.com/ansible-collections/cisco.ios/releases · galaxy.ansible.com/cisco/ios
- Red Hat EX294 — Red Hat Certified Engineer (RHCE) exam objectives (hands-on: install/configure a control node, build inventories, use variables and group_vars, write playbooks with modules, idempotency; dynamic inventory, Vault, handlers, loops, roles). redhat.com/en/services/training/ex294-red-hat-certified-engineer-rhce-exam-red-hat-enterprise-linux
What's next?
You can stand up one playbook against one router. Next we scale it: real interface/VLAN/ACL config across a whole fleet with resource modules, loops over group_vars, and handlers that only save when something actually changed.