In an OSPF interview, structure beats memorisation: when a question stretches you, reason out loud from fundamentals instead of guessing. Use the visual cheat-sheets below to lock in the diagrams interviewers love, and note that every answer ends with a 👉 Interview tip giving the exact line to say.
Visual cheat-sheets — the whiteboard answers
OSPF Fundamentals & Theory (10)
[L1] 1. What is OSPF and what category of routing protocol does it belong to? Why is it classified as a link-state protocol?
OSPF (Open Shortest Path First) is an open-standard, interior gateway protocol (IGP) used to route inside a single organization or autonomous system. It is a link-state protocol, in contrast to distance-vector protocols like RIP.
It earns the link-state name because every router describes the state of its own directly connected links (neighbor, cost, network type) inside Link-State Advertisements (LSAs) and floods them to all routers in the area. Each router then assembles an identical Link-State Database (LSDB) — a full map of the topology.
Think of it like every router sharing its piece of a jigsaw puzzle; once everyone has all pieces, each builds the same complete picture and computes its own shortest paths independently.
👉 Interview tip: Say "IGP, link-state, uses Dijkstra/SPF on a shared LSDB" — that one line shows you understand the category and the mechanism.
[L1] 2. What is the administrative distance of OSPF, and how does it compare to EIGRP, RIP, and external BGP? Why does AD matter when the same prefix is learned from two protocols?
Administrative distance (AD) is a Cisco trustworthiness ranking — lower is more trusted. Defaults:
- OSPF = 110
- EIGRP (internal) = 90 (more trusted than OSPF)
- RIP = 120 (less trusted)
- eBGP = 20 (external BGP, more trusted than all the IGPs)
- iBGP = 200
AD only matters when the same prefix is learned from two different routing protocols. The router installs the route from the protocol with the lower AD into the routing table; the other becomes a backup. Within a single protocol, the metric (cost) decides, not AD.
Analogy: AD is which advisor you trust first; metric is how that advisor ranks the routes.
👉 Interview tip: Remember the ladder eBGP 20 < EIGRP 90 < OSPF 110 < RIP 120 < iBGP 200 — a classic quick-fire question.
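The two-stage decision above (AD first, metric only within one protocol) can be sketched in a few lines of Python. This is an illustration only: `DEFAULT_AD` and `best_route` are hypothetical names, and the table holds just the default values quoted in the answer.

```python
# Default administrative distances quoted in the answer above.
DEFAULT_AD = {"ebgp": 20, "eigrp": 90, "ospf": 110, "rip": 120, "ibgp": 200}


def best_route(candidates):
    """Pick the route to install for one prefix.

    candidates: list of (protocol, metric) tuples for the same prefix.
    AD is compared first; the metric only decides between routes whose
    AD is equal, i.e. routes from the same protocol.
    """
    return min(candidates, key=lambda c: (DEFAULT_AD[c[0]], c[1]))


# OSPF (cost 20) beats RIP (2 hops) on AD, even though RIP's metric
# is numerically smaller: metrics of different protocols are never compared.
print(best_route([("ospf", 20), ("rip", 2)]))
```

The tuple key mirrors the rule in the text: the second element is only consulted when the first one ties.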
[L1] 3. Which algorithm does OSPF use to calculate the best path, and at a high level what does each router feed into it? What is the role of the LSDB?
OSPF uses Dijkstra's Shortest Path First (SPF) algorithm. Each router runs SPF with itself as the root and builds a shortest-path tree to every destination in the area.
The input fed into SPF is the Link-State Database (LSDB). The LSDB is the collection of all LSAs flooded in the area, so it is effectively a complete graph: which routers exist, how they connect, and the cost of each link. Because flooding makes every router's LSDB identical, all routers in an area share the same topology view, which prevents inconsistent routing.
Analogy: the LSDB is the road map everyone shares; Dijkstra is each driver computing the cheapest route from their own house.
👉 Interview tip: Stress that SPF runs per area and is rooted at the local router — a common follow-up is "does every router compute the same tree?" (No — same LSDB, but each tree is rooted differently.)
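To make "same LSDB, different root" concrete, here is a minimal Dijkstra sketch in Python. The `LSDB` dict is a toy stand-in for a real link-state database (router names and costs are invented for illustration), and `spf` is a hypothetical helper, not any vendor's implementation.

```python
import heapq


def spf(lsdb, root):
    """Return {router: (total_cost, parent)} for the tree rooted at `root`.

    lsdb: {router: {neighbor: link_cost}}, identical on every router.
    """
    tree = {}
    heap = [(0, root, None)]  # (cost so far, router, parent in the tree)
    while heap:
        cost, node, parent = heapq.heappop(heap)
        if node in tree:
            continue  # already reached by a cheaper path
        tree[node] = (cost, parent)
        for neighbor, link_cost in lsdb[node].items():
            if neighbor not in tree:
                heapq.heappush(heap, (cost + link_cost, neighbor, node))
    return tree


# Toy three-router area: R1-R2 is expensive, the detour via R3 is cheap.
LSDB = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R3": 1},
    "R3": {"R1": 1, "R2": 1},
}

print(spf(LSDB, "R1"))  # tree as seen from R1
print(spf(LSDB, "R2"))  # same LSDB, different root, different tree
```

Both calls read the same `LSDB`, yet the resulting trees differ because each starts from its own root.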
[L1] 4. What multicast addresses and IP protocol number does OSPF use? Distinguish 224.0.0.5 (AllSPFRouters) from 224.0.0.6 (AllDRouters).
OSPF rides directly on IP as IP protocol number 89 (not TCP or UDP — no port number). For IPv6, OSPFv3 uses the equivalent Next Header 89.
It uses two reserved link-local multicast groups:
- 224.0.0.5 (AllSPFRouters): every OSPF-speaking router listens here. Hellos and most flooding go to this address. (IPv6: FF02::5.)
- 224.0.0.6 (AllDRouters): only the DR and BDR listen. On a multi-access (broadcast) segment, non-DR routers send their updates to 224.0.0.6 so only the DR/BDR process them; the DR then re-floods to 224.0.0.5. (IPv6: FF02::6.)
👉 Interview tip: The clean way to say it: "DROthers talk to the DR on .6, the DR talks back to everyone on .5."
[L2] 5. Why does link-state (OSPF) avoid routing loops and converge faster than a distance-vector protocol like RIP? Explain in terms of each router having the full topology versus 'routing by rumor'.
In a distance-vector protocol like RIP, a router only knows what its neighbors tell it — "network X is 3 hops that way." It never sees the real topology. This is routing by rumor: a router trusts second-hand information and can re-advertise a route back toward its source, creating loops and counting-to-infinity. RIP relies on slow safeguards (split horizon, route poisoning, hold-down timers, max hop 15) and periodic 30-second updates, so it converges slowly.
In OSPF, every router floods LSAs and builds an identical LSDB — the full topology map. Each router then runs SPF on first-hand data, so it can never be fooled into a loop by a neighbor's stale summary. When a link fails, the change is flooded immediately and SPF recomputes — convergence is event-driven, not timer-driven.
Analogy: RIP is following "turn left, someone said it's faster"; OSPF is reading the whole map yourself.
👉 Interview tip: Tie loop-freedom to "full topology = no second-hand trust," and fast convergence to "event-driven flooding + incremental SPF."
[L2] 6. How is OSPF cost/metric calculated on an interface? What is the default reference bandwidth, and why do all links of 100 Mbps or faster end up with a cost of 1 by default?
OSPF cost is the metric SPF sums along a path; the lowest total cost wins. Per interface the formula is:
cost = reference-bandwidth / interface-bandwidth
The default reference bandwidth is 100 Mbps (100,000,000 bps), equal to Fast Ethernet. The result is truncated to a whole number, and anything below 1 is raised to the minimum cost of 1.
So a 10 Mbps link = 100/10 = 10; a 100 Mbps link = 100/100 = 1. But for anything faster than 100 Mbps — 1G, 10G, 40G, 100G — the formula gives a value below 1, which OSPF floors to 1. That means a 1G link and a 100G link both score a cost of 1 by default, so SPF cannot tell them apart.
Analogy: a speedometer that maxes out at 100 km/h can't distinguish a car from a jet.
👉 Interview tip: Call this out as the classic reason to raise auto-cost reference-bandwidth in modern fabrics.
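The arithmetic is easy to check with a short Python sketch. `ospf_cost` is a hypothetical helper that applies the formula from this answer (integer division, minimum 1); bandwidths are in Mbps, and 100000 is used as an example of a raised reference.

```python
def ospf_cost(interface_mbps, reference_mbps=100):
    """cost = reference bandwidth / interface bandwidth, truncated, minimum 1."""
    return max(1, reference_mbps // interface_mbps)


# Columns: link speed (Mbps), cost with the default 100 Mbps reference,
# cost with an example reference of 100000 Mbps (100 Gbps).
for mbps in (10, 100, 1000, 10000, 40000, 100000):
    print(mbps, ospf_cost(mbps), ospf_cost(mbps, reference_mbps=100000))
```

With the default reference every link from 100 Mbps upward collapses to cost 1; with the larger reference the fast links separate again.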
[L2] 7. Why and how would you change the auto-cost reference-bandwidth in a network with 1G/10G/40G links, and what must you do consistently across the domain to avoid suboptimal routing?
With the default 100 Mbps reference bandwidth, every link of 1G and above floors to cost 1, so SPF treats a 1G and a 40G link as equal — leading to suboptimal path selection. You fix this by raising the reference so fast links get distinct, meaningful costs.
On Cisco IOS: router ospf 1 then auto-cost reference-bandwidth 100000 (the value is in Mbps, so 100000 = 100 Gbps). With that reference, costs become: 1G = 100, 10G = 10, 40G = 2 (100000/40000 = 2.5, rounded down), 100G = 1 — so SPF can finally differentiate.
The critical rule: set the same reference bandwidth on every router in the OSPF domain. Cost is locally computed but compared end-to-end, so a mismatch makes one router score a link as 1 and another as 100 — producing asymmetric or suboptimal routing. IOS even warns you to keep it consistent.
Alternatively, set ip ospf cost manually per interface to override the formula entirely.
👉 Interview tip: The grading point is "set it domain-wide and identical" — not just "change it."
[L2] 8. Explain the difference between E1 and E2 external metrics. Which is the default, and when would you prefer E1 over E2 in your design?
When routes are redistributed into OSPF (e.g., from BGP, static, or another protocol) by an ASBR, they appear as external (Type 5 or Type 7) LSAs with one of two metric types:
- E2 (default): the metric is only the external cost assigned at the ASBR. The internal OSPF cost to reach the ASBR is not added, so the route's cost looks the same everywhere in the domain.
- E1: the metric = external cost plus the accumulated internal OSPF cost to reach the ASBR. It grows as you move farther from the ASBR.
Use E1 when you have multiple ASBRs advertising the same external prefix and want each router to pick the nearest exit — E1 reflects true end-to-end cost. Use E2 (default) when there's a single exit or you don't want internal cost to influence the choice. In path selection, OSPF always prefers E1 over E2 for the same prefix. (As a tie-break, if two routers advertise the same prefix as E2 with equal external costs, OSPF then compares the internal cost to each ASBR — the forward metric.)
👉 Interview tip: One-liner — "E2 = fixed external cost; E1 = external + internal; E1 for nearest-exit with multiple ASBRs."
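A small Python sketch can make the E1/E2 comparison tangible. The route dictionaries and the names `external_metric` and `pick_exit` are hypothetical; the ordering follows the rules stated in this answer (E1 preferred over E2, then the advertised metric, then the internal cost to the ASBR as the tie-break).

```python
def external_metric(route):
    """Metric a router computes for an external route."""
    if route["type"] == "E1":
        return route["external_cost"] + route["cost_to_asbr"]
    if route["type"] == "E2":
        return route["external_cost"]  # internal cost is not added
    raise ValueError("type must be 'E1' or 'E2'")


def pick_exit(routes):
    """Return the ASBR chosen for one external prefix."""
    def key(route):
        type_rank = 0 if route["type"] == "E1" else 1   # E1 beats E2
        return (type_rank, external_metric(route), route["cost_to_asbr"])
    return min(routes, key=key)["asbr"]


# Two ASBRs redistribute the same prefix with external cost 20.
near = {"asbr": "ASBR-near", "type": "E1", "external_cost": 20, "cost_to_asbr": 5}
far = {"asbr": "ASBR-far", "type": "E1", "external_cost": 20, "cost_to_asbr": 30}
print(external_metric(near), external_metric(far), pick_exit([near, far]))
```

Switching both routes to E2 makes the two metrics equal, and only the forward-metric tie-break still separates the exits.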
[L3] 9. Compare OSPF and IS-IS for a large service-provider core. Discuss area model, transport (IP vs CLNS), scalability, and why an SP might pick one over the other.
Both are link-state IGPs running Dijkstra, but they differ in ways that matter at SP scale:
- Area model: OSPF puts the link/interface in an area and forces a strict hub-and-spoke around area 0 (backbone), with ABRs straddling boundaries. IS-IS puts the whole router in a level (L1 intra-area, L2 backbone); an L1/L2 router joins levels. IS-IS's design is generally seen as more flexible for flat or large single-area cores.
- Transport: OSPFv2 runs over IP (protocol 89). IS-IS runs directly over Layer 2 (CLNS/CLNP), so it isn't tied to IP and carries IPv4 and IPv6 in one instance via TLVs — and a problem in the routed protocol can't break the IGP control plane.
- Scalability: IS-IS uses extensible TLVs and tends to flood/scale more gracefully in very large single areas, which is why many tier-1 SPs run it.
SPs often pick IS-IS for the core (scale, dual-stack, L2 transport, MPLS/Segment-Routing maturity); enterprises often pick OSPF (familiarity, broad vendor support). In practice both are first-class SR/MPLS underlays today, so the choice is usually driven by scale and operational preference rather than capability.
👉 Interview tip: Lead with "router-in-area + TLVs + CLNS transport" as the three IS-IS advantages SPs cite.
[L3] 10. Walk me through OSPFv2 versus OSPFv3 differences beyond 'IPv6 support': per-link LSAs, link-local adjacency addressing, address-family support carrying both IPv4 and IPv6 in one process, instance IDs, and the move of authentication to IPsec.
OSPFv3 (RFC 5340) is a rewrite, not just IPv6 bolted on:
- Per-link operation: OSPFv3 runs per link, not per IP subnet. Multiple subnets can share a link, and routers form adjacencies even without a common subnet.
- Link-local addressing: adjacencies and next-hops use the interface's IPv6 link-local address (FE80::/10), keeping the protocol independent of the global addressing plan.
- Topology vs prefix split: OSPFv3 removes addresses from Router/Network LSAs and carries prefixes in new Link LSA (Type 8) and Intra-Area-Prefix LSA (Type 9), so renumbering doesn't force a full SPF run.
- Address families (RFC 5838): using Instance IDs, one OSPFv3 process can carry both IPv4 and IPv6 AFs over the same adjacencies — no separate OSPFv2 needed.
- Authentication: OSPFv3 originally dropped the built-in auth field and relied on IPv6 IPsec (AH/ESP); RFC 7166 later added a native authentication trailer (HMAC-SHA), which is now the commonly deployed option.
👉 Interview tip: The headline distinction is "per-link, link-local, topology/prefix separation, multi-AF via Instance IDs, auth via IPsec or the RFC 7166 trailer."
Advanced, Modern & Design Topics (10)
[L2] 11. What is BFD and why is it now treated as a default pairing with OSPF rather than an advanced feature? What does it give you that lowering Hello/Dead timers does not?
BFD (Bidirectional Forwarding Detection, RFC 5880) is a lightweight, protocol-independent "is this link still alive?" check. It sends tiny control packets at very short intervals (commonly tens to a few hundred milliseconds, lower where hardware offload allows) and, once a configured number of consecutive packets is missed, tells OSPF the neighbor is down, driving sub-second failure detection.
It pairs with OSPF (and BGP, IS-IS, EIGRP, static) because OSPF's own dead timer is slow (default 40s on broadcast/point-to-point). You enable it per interface with ip ospf bfd (after defining BFD intervals on the interface).
Why not just lower Hello/Dead timers? Two reasons: (1) OSPF Hellos are processed in the relatively heavy routing-protocol process / control plane, so making them millisecond-fast burns CPU and risks false drops under load; BFD runs in a tiny, often hardware/dataplane-offloaded path built only for liveness. (2) BFD is shared by many protocols at once, so all of them converge fast from one mechanism.
Analogy: BFD is a dedicated heartbeat monitor; cranking Hello timers is making the busy doctor take your pulse every second.
👉 Interview tip: Stress "lightweight, dataplane-offloadable, multi-protocol" — that's why it's now standard, not exotic.
[L2] 12. Explain SPF and LSA throttling timers and LSA pacing. Why do you tune them, and what is the tradeoff between fast convergence and stability during a flap storm?
These timers control how aggressively OSPF reacts to change so it converges fast without melting the CPU during instability.
- SPF throttling (timers throttle spf start hold max): after a topology change, the first SPF runs quickly (start, e.g. 50ms), but if changes keep coming the wait increases (doubling the hold time up to a max ceiling). So a single event is handled fast; a storm is damped to avoid back-to-back SPF runs.
- LSA throttling (timers throttle lsa all): rate-limits how often a router regenerates the same LSA, using the same start/hold/max backoff.
- LSA pacing/group pacing: bundles flooding and refresh of many LSAs into timed groups instead of sending each individually, smoothing CPU and bandwidth spikes.
The tradeoff: small initial timers give fast convergence for a clean failure; the exponential backoff and pacing protect stability when a link is flapping, preventing an SPF/flood meltdown.
Analogy: react instantly to the first alarm, but stop re-running the fire drill every second once alarms keep blaring.
👉 Interview tip: Name the start/hold/max exponential-backoff model — that's what interviewers want to hear.
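The start/hold/max backoff can be sketched as a simple schedule in Python. This is a simplified model under one loud assumption: every new topology change arrives before the current wait expires, so the backoff never resets (real implementations fall back to the start value after a quiet period). `spf_wait_schedule` and the sample timer values are illustrative, not vendor defaults.

```python
def spf_wait_schedule(start_ms, hold_ms, max_ms, runs):
    """Wait applied before each of `runs` consecutive SPF calculations.

    Assumes changes keep arriving inside every wait window (a flap storm),
    so the quiet-period reset is deliberately not modelled.
    """
    waits = []
    wait = start_ms
    for i in range(runs):
        waits.append(min(wait, max_ms))
        # After the first run the wait becomes `hold`, then doubles each time.
        wait = hold_ms if i == 0 else wait * 2
    return waits


# Example values: start 50 ms, hold 200 ms, max 5000 ms.
print(spf_wait_schedule(50, 200, 5000, 8))
```

The first entry shows the fast reaction to a single event; the later entries show the storm being damped until the ceiling is reached.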
[L2] 13. What does OSPFv3 add structurally: the Link LSA (Type 8) and Intra-Area-Prefix LSA (Type 9), link-local addressing for adjacencies, and the separation of topology from prefix information? Why does that separation matter?
OSPFv3 deliberately splits topology from addressing. In OSPFv2, Router and Network LSAs carried both "who connects to whom" and the IP prefixes. OSPFv3 strips prefixes out of those and adds two new LSAs:
- Link LSA (Type 8): link-local scope (never leaves the link). A router uses it to tell neighbors on that link its IPv6 link-local address and the list of IPv6 prefixes configured on the link.
- Intra-Area-Prefix LSA (Type 9): carries IPv6 prefixes associated with a router or a transit network into the area, decoupled from the topology LSAs.
Adjacencies and next-hops use link-local addresses, so OSPFv3 forms neighbors regardless of the global subnet plan.
Why it matters: Router/Network (topology) LSAs no longer change when you only renumber or add/remove a prefix — so an address change refreshes a prefix LSA but does not force a full SPF run (a prefix-only change triggers a cheaper partial route calculation, not a topology SPF), improving scalability and stability. It also cleanly enables multiple IPv6 prefixes per link.
👉 Interview tip: Say "Type 8 = link-local + on-link prefixes; Type 9 = prefixes into the area; prefix-only changes don't trigger a full SPF."
[L3] 14. How does OSPF act as the IGP underlay feeding Segment Routing? Describe how Prefix-SID, Adjacency-SID, and the SRGB are advertised (Extended Prefix/Link Opaque Type 10 LSAs for SR-MPLS), and how SRv6 extends this in OSPFv3 (RFC 9513 Locator LSA, SID = Locator+Function+Argument).
In Segment Routing, OSPF is the IGP underlay: it already floods the full topology, so it's the natural place to also distribute SR segment identifiers — no separate label-distribution protocol (no LDP) needed.
SR-MPLS (OSPFv2, RFC 8665): OSPF uses new area-scoped Opaque Type 10 LSAs:
- Prefix-SID — typically a globally significant index, advertised as a sub-TLV in the Extended Prefix LSA, identifying a node/loopback; combined with the SRGB it yields the actual MPLS label.
- Adjacency-SID — locally significant, advertised as a sub-TLV in the Extended Link LSA, pointing at one specific link/next-hop.
- SRGB (Segment Routing Global Block) — the label range each router reserves; advertised via the Router Information Opaque LSA so a Prefix-SID index maps to a consistent label domain-wide.
SRv6 (OSPFv3, RFC 9513): SIDs are IPv6 addresses, not labels, carried in a new SRv6 Locator LSA. Each SID = Locator + Function (+ optional Argument) (a routable prefix plus an instruction).
👉 Interview tip: "Prefix-SID = global index + SRGB → label; Adj-SID = local, per-link; SRv6 = SID is an IPv6 Locator+Function+Argument."
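The "index + SRGB gives the label" step is plain addition, shown here as a minimal Python sketch. `prefix_sid_label` is a hypothetical helper, and the SRGB of 16000 with 8000 labels is only an example range chosen for illustration.

```python
def prefix_sid_label(srgb_base, srgb_size, sid_index):
    """Map a Prefix-SID index to an MPLS label inside a router's SRGB."""
    if not 0 <= sid_index < srgb_size:
        raise ValueError("SID index falls outside the SRGB")
    return srgb_base + sid_index


# Example: SRGB starts at 16000 and holds 8000 labels; a loopback has index 11.
print(prefix_sid_label(16000, 8000, 11))  # 16011
```

If every router uses the same SRGB, the same index yields the same label everywhere, which is why a consistent SRGB is usually recommended.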
[L3] 15. What is TI-LFA and why has it largely superseded classic LFA and remote-LFA for fast reroute? How do SR and OSPF combine to deliver sub-50ms, topology-independent protection?
TI-LFA (Topology-Independent Loop-Free Alternate) is an IP/SR fast-reroute mechanism that pre-computes a backup path for every destination so traffic is rerouted in under ~50ms on a local failure — before SPF even reconverges.
Classic LFA only works if a directly connected neighbor happens to offer a loop-free alternate; in many topologies (rings, square topologies) no such neighbor exists, so coverage is partial. Remote-LFA (rLFA) extends reach using a targeted tunnel to a remote "PQ" node, but still has coverage gaps and (in its original form) relies on a targeted LDP session.
TI-LFA's breakthrough: it uses Segment Routing. OSPF floods the topology and SIDs, so the router can build a backup that follows the exact post-convergence path by stacking SR labels/SIDs — no LDP, no targeted tunnels, and it guarantees a loop-free backup for any topology (hence "topology-independent"), giving near-100% coverage. Because the backup matches where traffic will land after convergence, it also avoids the transient micro-loops that simpler FRR schemes can cause.
Analogy: instead of hoping a neighbor knows a detour, you pre-program the exact final route as a label stack.
👉 Interview tip: Emphasize "post-convergence path via SR label stack → 100% coverage, no LDP" — that's the differentiator over LFA/rLFA.
[L3] 16. Discuss OSPF versus BGP as a data-center underlay for EVPN-VXLAN. When would you run OSPF unnumbered point-to-point on a leaf-spine fabric, and what are the arguments each way?
In an EVPN-VXLAN fabric, the underlay just needs to provide loopback-to-loopback (VTEP) reachability so the BGP-EVPN overlay can build tunnels.
OSPF underlay: simple, fast to converge, auto-discovers neighbors, and engineers know it well. With OSPF unnumbered point-to-point links (using ip ospf network point-to-point on /31s, or true unnumbered borrowing the loopback / using IPv6 link-local), you skip DR/BDR election, avoid per-link subnet planning, and get clean p2p adjacencies — ideal for a regular leaf-spine. Good for small/medium fabrics or teams standardizing on OSPF.
BGP underlay (eBGP per RFC 7938): the hyperscaler-favored choice. It scales to very large fabrics, gives explicit per-device policy/ASN control, and you already run BGP for the EVPN overlay — so it's one protocol end to end, easier troubleshooting and consistent operations.
The argument: OSPF = simplicity, near zero-touch neighbor discovery, fast convergence; BGP = massive scale, policy control, single-protocol stack. Large/cloud-scale fabrics lean BGP; many enterprise DCs are happy with OSPF unnumbered.
👉 Interview tip: Frame it as "underlay only carries loopbacks — pick the protocol your team operates best at that scale."
[L3] 17. Explain graceful restart / NSF and incremental SPF (iSPF). In what failure and maintenance scenarios do they preserve forwarding or reduce CPU, and what are their dependencies?
Graceful Restart (GR) / NSF (Nonstop Forwarding): on routers with separate control and data planes, the forwarding (data) plane keeps moving traffic while the OSPF control plane restarts (e.g., a supervisor/RP switchover or software restart). The restarting router signals "don't tear down our adjacency" via a Grace-LSA (RFC 3623); its neighbors act as helpers and keep advertising the link as up during the grace period, so no SPF churn ripples through the network. Scenario: planned RP failover or ISSU/control-plane upgrade — forwarding is preserved, convergence event avoided. Dependencies: hardware that separates control/forwarding planes, neighbors that support helper mode, and a stable topology during the restart (a topology change while restarting aborts GR and forces normal convergence).
Incremental SPF (iSPF): when a change affects only part of the topology, iSPF recomputes only the affected branch of the shortest-path tree instead of rebuilding the whole tree — cutting CPU on large LSDBs and speeding convergence. Dependency: it helps most for changes far from the root (leaf changes); a near-root change can still force a near-full recompute.
👉 Interview tip: GR = preserve forwarding during a control-plane restart (needs helper neighbors + stable topology); iSPF = reduce CPU by recomputing only the changed subtree.
[L2] 18. What is max-metric router-lsa / stub-router (RFC 6987) and what real operational problem does it solve when you reload or do maintenance on a transit router?
Stub-router / max-metric router-LSA (RFC 6987) lets a router advertise its transit links with the maximum metric (0xFFFF, 65535) instead of the real cost. The router is still reachable for traffic destined to it, but OSPF SPF will avoid using it as a transit (pass-through) path as long as any alternative exists.
The operational problem it solves: when a transit router reboots or comes back online, OSPF often converges and starts forwarding through it before the rest of the box is ready, for example because BGP hasn't finished loading the full table or the line cards/FIB aren't fully programmed. Traffic gets attracted to a router that then blackholes it.
You configure max-metric router-lsa on-startup <seconds> (or on-startup wait-for-bgp to hold until BGP converges) so the router stays a "last resort, don't transit me" node during the startup window, then lowers its cost to normal once it's truly ready. It's also used for graceful maintenance drain — push traffic off before a reload.
Analogy: a new toll booth opens with a "lane closed unless emergency" sign until it's fully staffed.
👉 Interview tip: Tie it to "avoid transit blackhole on reload, especially waiting for BGP to converge."
[L3] 19. OSPF is an attack surface. Discuss rogue-LSA / route-injection attacks and the defenses: HMAC-SHA authentication, passive-interface on user/edge segments, and TTL/control-plane protections. Why is passive-interface a security control and not just a tidiness setting?
Because OSPF trusts any router that forms an adjacency, an attacker on a reachable segment can inject rogue LSAs — advertising false routes to blackhole, redirect, or man-in-the-middle traffic, or flood the LSDB to exhaust CPU/memory.
Layered defenses:
- HMAC-SHA authentication (OSPFv2 RFC 5709 / OSPFv3 RFC 7166): cryptographically signs OSPF packets so an attacker without the key cannot form an adjacency or forge LSAs. Use SHA over the deprecated plaintext or the legacy MD5 (Cisco supports SHA via key chains).
- Passive-interface on user/edge/LAN segments: stops the router from sending or processing Hellos there, so no adjacency can form, yet the subnet is still advertised. This is a real security control, not tidiness, because it shrinks the attack surface: an attacker plugged into a passive segment simply cannot peer with your routers and inject LSAs. (Best practice: default all interfaces passive with passive-interface default, then explicitly activate only true router-to-router links.)
- TTL / control-plane protections: GTSM-style TTL checks and CoPP / control-plane policing / ACLs rate-limit and filter OSPF toward the CPU, blunting flooding/DoS attacks.
👉 Interview tip: Say passive-interface removes the ability to form an adjacency on untrusted segments — that's why it's defense-in-depth, not cosmetics.
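The idea behind HMAC-SHA authentication, that only a holder of the shared key can produce a digest the receiver accepts, can be illustrated with Python's standard library. This is a conceptual sketch only: it does not reproduce the RFC 5709 or RFC 7166 packet formats, and `sign`, `verify`, the key, and the packet bytes are all made up for the example.

```python
import hashlib
import hmac


def sign(key: bytes, packet: bytes) -> bytes:
    """Compute a keyed HMAC-SHA-256 digest over the packet bytes."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def verify(key: bytes, packet: bytes, digest: bytes) -> bool:
    """Accept the packet only if the digest matches (constant-time compare)."""
    return hmac.compare_digest(sign(key, packet), digest)


shared_key = b"example-ospf-key"         # illustrative key, not a real format
hello = b"pretend-this-is-a-hello"       # stand-in for OSPF packet bytes
digest = sign(shared_key, hello)

print(verify(shared_key, hello, digest))          # legitimate neighbor
print(verify(b"attacker-guess", hello, digest))   # wrong key is rejected
```

A forged or modified packet fails the same check, which is what stops an attacker without the key from forming an adjacency.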
[L3] 20. How would you validate and monitor OSPF state in 2026 without CLI screen-scraping, using NETCONF/YANG/gNMI streaming telemetry and Ansible/Python (Nornir, Netmiko) for config push with pre/post checks? Give an example of a pre/post check you would automate around an OSPF change.
The 2026 approach is model-driven, not screen-scraping. Instead of parsing show ip ospf neighbor text, you query structured data:
- NETCONF + YANG: pull OSPF state as structured XML/JSON against the IETF or OpenConfig OSPF YANG models — neighbor states, areas, LSDB counts — reliably parseable, no regex fragility.
- gNMI streaming telemetry: subscribe to OSPF counters/adjacency state so the device pushes changes in near-real-time to a collector (e.g., into Prometheus/Grafana via a pipeline like Telegraf or gnmic), instead of you polling.
- Automation: Ansible (cisco.ios / netconf modules) or Python with Nornir + Netmiko/NAPALM to push config at scale with idempotency and rollback.
Pre/post check example: before changing an OSPF cost or adding a link, the playbook snapshots the baseline — full neighbor list (all in FULL state), neighbor count, and the routing-table prefix count / specific routes. It applies the change, then re-collects the same data and diffs: assert every prior neighbor is still FULL, the count is unchanged, and no expected prefixes disappeared. If the diff fails, auto-rollback.
👉 Interview tip: Lead with "structured YANG/gNMI + pre/post snapshot-diff with auto-rollback" — that signals real modern NetDevOps maturity.
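The snapshot-and-diff logic itself is independent of how the data is collected, so it can be sketched in plain Python. The snapshot shape (a dict with `neighbors` and `prefixes`) and the function name `check_ospf_change` are assumptions for illustration; in practice the snapshots would be filled from NETCONF/gNMI or similar structured sources.

```python
def check_ospf_change(pre, post):
    """Compare two state snapshots; return a list of problems (empty = pass).

    Assumed snapshot shape:
      {"neighbors": {router_id: state}, "prefixes": set of prefix strings}
    """
    problems = []
    for rid, state in sorted(pre["neighbors"].items()):
        after = post["neighbors"].get(rid)
        if after is None:
            problems.append(f"neighbor {rid} disappeared")
        elif state == "FULL" and after != "FULL":
            problems.append(f"neighbor {rid} dropped from FULL to {after}")
    # Prefixes present before the change must still be present afterwards.
    for prefix in sorted(pre["prefixes"] - post["prefixes"]):
        problems.append(f"prefix {prefix} missing from routing table")
    return problems


baseline = {
    "neighbors": {"1.1.1.1": "FULL", "2.2.2.2": "FULL"},
    "prefixes": {"10.0.0.0/24", "10.0.1.0/24"},
}
after_change = {
    "neighbors": {"1.1.1.1": "EXSTART"},
    "prefixes": {"10.0.0.0/24"},
}
for problem in check_ospf_change(baseline, after_change):
    print(problem)
```

An automation workflow would treat a non-empty result as the trigger for rollback.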
DR/BDR Election & Network Types (10)
[L1] 21. What is the purpose of a DR and BDR on a multi-access segment? How does electing a DR reduce the number of adjacencies from roughly n-squared to 2n?
On a multi-access segment (like an Ethernet LAN) many OSPF routers share one wire. Without control, every router would form a full adjacency with every other router and they'd all flood LSAs to each other, wasting CPU and bandwidth. The DR (Designated Router) solves this: every router forms a full adjacency only with the DR and the BDR (Backup Designated Router, a hot standby). The DR becomes the central point that collects LSAs and re-floods them to everyone on the segment.
Think of it like a class WhatsApp group with one admin: instead of every student messaging every other student privately, everyone talks to the admin who relays it.
Math: a full mesh needs n(n-1)/2 adjacencies (which grows roughly with the square of n). With a DR+BDR, each router adjoins only those two, so the count grows linearly — on the order of 2n adjacencies instead.
👉 Interview tip: Two routers that are both DROTHERs stay in the 2-WAY state with each other and never go to FULL; they exchange Hellos but do not form a full adjacency.
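The adjacency counts are quick to verify in Python. The exact DR/BDR figure used here, 2(n-2)+1, counts each DROTHER's two adjacencies plus the one between DR and BDR; it is consistent with the "on the order of 2n" wording above. Both function names are hypothetical.

```python
def full_mesh_adjacencies(n):
    """Adjacencies if every router peers fully with every other router."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n):
    """Full adjacencies on a segment with a DR and a BDR.

    Each of the n-2 DROTHERs adjoins the DR and the BDR, and the DR and
    BDR adjoin each other: 2 * (n - 2) + 1.
    """
    if n < 2:
        return 0
    return 2 * (n - 2) + 1


for n in (3, 5, 10, 50):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
```

The gap between the two columns widens quickly as the segment grows, which is the whole point of electing a DR.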
[L1] 22. Explain the DR/BDR election rules: what is the default interface priority, what does a priority of 0 mean, and what is the tiebreaker when priorities are equal?
Election compares two values per router on the segment. First, OSPF interface priority (ip ospf priority), default 1, range 0–255. Higher priority wins. A priority of 0 means the router is ineligible — it will never become DR or BDR (useful to keep a weak router out of the role).
If priorities tie, the tiebreaker is the highest OSPF Router ID (RID). The RID is chosen as the manually configured router-id, else the highest loopback IP, else the highest active physical interface IP.
The highest priority/RID becomes DR, the second-highest becomes BDR; everyone else is a DROTHER.
Analogy: priority is the candidate's rank; if two candidates rank equal, the one with the higher ID number wins the seat.
👉 Interview tip: Election checks priority FIRST and uses the RID only as a tiebreaker, a common trick question.
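The ranking rule (priority first, Router ID as tiebreaker, priority 0 excluded) can be sketched in Python. This is a simplified model of a cold start where all routers come up together; it ignores non-preemption and the step-by-step procedure a real implementation follows. `elect` and the sample Router IDs are hypothetical.

```python
import ipaddress


def elect(routers):
    """Return (dr, bdr) Router IDs from a list of (router_id, priority)."""
    eligible = [r for r in routers if r[1] > 0]  # priority 0 never wins
    ranked = sorted(
        eligible,
        # Compare priority first, then the RID as a 32-bit number.
        key=lambda r: (r[1], int(ipaddress.IPv4Address(r[0]))),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr


segment = [
    ("10.0.0.1", 1),
    ("10.0.0.9", 1),
    ("10.0.0.200", 0),   # highest RID here, but ineligible
    ("9.9.9.9", 100),    # lowest RID, but highest priority
]
print(elect(segment))
```

Converting the Router ID to an integer matters: comparing the dotted strings directly would rank "10.0.0.9" above "10.0.0.10".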
[L2] 23. OSPF election is described as non-preemptive. What does that mean in practice when a higher-priority router boots up after a DR has already been elected, and how do you force a re-election if you must?
Non-preemptive means once a DR/BDR is elected, a newly-arriving router with a higher priority (or higher RID) will not kick out the existing DR. It waits as a DROTHER until the current DR fails. This is by design — it avoids needless topology churn and flooding every time a 'better' router reboots.
In practice: if your core router was down during election and an access switch became DR, the core stays a DROTHER even after it comes back. To fix it, you must force a re-election. Options: bounce the segment's OSPF by clearing the process with clear ip ospf process on the affected routers, or shut/no-shut the interfaces. There's no graceful 'preempt now' command in standard OSPF.
Analogy: the elected class monitor stays monitor for the term even if a stronger candidate joins late — you'd have to hold a fresh election.
👉 Interview tip: Best practice is to pre-set priorities BEFORE adjacencies form, so re-elections are never needed.
[L1] 24. Name the OSPF network types (broadcast, point-to-point, NBMA, point-to-multipoint, loopback). For each, state whether a DR is elected and the default Hello timer.
- Broadcast (Ethernet): DR/BDR elected. Hello 10s (Dead 40s). Neighbors found automatically via multicast.
- Point-to-point (e.g. serial, GRE tunnel): No DR, only two routers. Hello 10s (Dead 40s). Neighbors found automatically.
- NBMA (Non-Broadcast Multi-Access, e.g. Frame Relay): DR/BDR elected, but neighbors must be defined manually (no Layer-2 multicast). Hello 30s (Dead 120s).
- Point-to-multipoint: No DR; treated as a collection of point-to-point links. Hello 30s (Dead 120s). Neighbors auto-discovered (advertises /32 host routes).
- Loopback: No DR; advertised as a /32 stub host route. No Hellos and no adjacencies form on a loopback.
👉 Interview tip: Memorize the split: broadcast and NBMA elect a DR; point-to-point, point-to-multipoint, and loopback do not. The multi-access types that elect a DR are exactly broadcast and NBMA.
[L2] 25. Why does a point-to-point link not elect a DR, while a broadcast segment does? On which network types does election happen at all?
A DR exists to cut down the adjacency mesh and flooding load on a segment where many routers share one medium. On a point-to-point link there are only ever two routers — there's nothing to optimize. They simply form a direct full adjacency and flood to each other. Adding a DR would add overhead with zero benefit, so OSPF skips election entirely.
A broadcast segment can hold many routers on one wire, so a DR is needed to act as the relay and avoid an n-squared adjacency mesh.
Election happens only on the multi-access types: broadcast and NBMA. It does NOT happen on point-to-point, point-to-multipoint, or loopback.
Analogy: a one-on-one phone call needs no moderator; a 30-person conference call does.
👉 Interview tip: Say it crisply: 'DR election only on multi-access (broadcast + NBMA); point-to-point and point-to-multipoint never elect one.'
[L2] 26. Two routers connected back-to-back will not become Full because one side is broadcast and the other is point-to-multipoint. Why does a mismatched network type break adjacency, and how would you diagnose and fix it?
Network type controls two adjacency-affecting behaviors: whether a DR is elected, and the Hello/Dead timers. Broadcast uses Hello 10s/Dead 40s and elects a DR; point-to-multipoint uses Hello 30s/Dead 120s and elects none. Because the Hello and Dead values are carried inside the Hello packet and must match for neighbors to accept each other, this specific pairing has mismatched timers, so each side drops the other's Hellos on receipt. The neighbor normally never shows up in the neighbor table at all, and the adjacency never reaches FULL.
Diagnose: show ip ospf neighbor shows a missing or stuck neighbor; show ip ospf interface on each side reveals different Network Type and Hello/Dead values; debug ip ospf hello shows the parameter mismatch.
Fix: set the same type on both ends, e.g. ip ospf network point-to-point (or matching broadcast / point-to-multipoint) under each interface.
Interview tip: Mismatched Hello/Dead timers, Area ID, subnet/mask, MTU, and authentication are the classic 'stuck adjacency' culprits — name them. Note the gotcha: a broadcast-to-point-to-point mismatch can still form an adjacency because both use 10s/40s timers — it is the timer values, not the type label, that block the neighbor.
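The timer logic behind this answer can be sketched in a few lines of Python. The table holds the default Hello/Dead values quoted above; the dictionary and function names are made up for illustration and are not part of any router API.

```python
# Default OSPF behaviour per network type (defaults as quoted in the text).
# NETWORK_TYPES and hello_timers_match are illustrative names only.
NETWORK_TYPES = {
    "broadcast":           {"elects_dr": True,  "hello": 10, "dead": 40},
    "non-broadcast":       {"elects_dr": True,  "hello": 30, "dead": 120},
    "point-to-point":      {"elects_dr": False, "hello": 10, "dead": 40},
    "point-to-multipoint": {"elects_dr": False, "hello": 30, "dead": 120},
}


def hello_timers_match(type_a: str, type_b: str) -> bool:
    """Return True if the default Hello/Dead timers of both types agree."""
    a, b = NETWORK_TYPES[type_a], NETWORK_TYPES[type_b]
    return (a["hello"], a["dead"]) == (b["hello"], b["dead"])


print(hello_timers_match("broadcast", "point-to-multipoint"))  # False
print(hello_timers_match("broadcast", "point-to-point"))       # True
```

The second call is the gotcha from the tip: the type labels differ, but the timers agree, so the Hellos are not rejected on that basis.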
L227. How do you control which router becomes the DR on a segment, and on what kind of device (e.g., a core/aggregation router versus an access switch) would you want the DR role to land? Why?
You control the DR by setting OSPF interface priority: ip ospf priority 255 on the router you want as DR, a lower value (e.g. 100) for the intended BDR, and ip ospf priority 0 on routers that should never hold the role. Priority is checked before RID, so this deterministically picks the winner. Configure it before adjacencies form (election is non-preemptive).
You want the DR to land on a stable, well-resourced core/aggregation router, not an access switch. The DR does extra work — collecting and re-flooding LSAs, and originating the Type 2 Network LSA — so it needs spare CPU/memory and high uptime. A core box is more stable and centrally placed; an access switch is lower-spec, reboots more often, and a flap there would trigger costly re-elections.
Analogy: you make your most reliable senior the team lead, not a part-time intern.
Interview tip: Set priority 0 on access devices to guarantee the DR stays on the core.
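A simplified model of the priority-then-RID rule, assuming all routers come up at the same moment. It deliberately ignores the non-preemptive behaviour and the BDR-first ordering of the real election, and the function name and sample addresses are hypothetical.

```python
from ipaddress import IPv4Address


def elect_dr_bdr(routers):
    """Simplified election among routers that start together.

    routers: list of (router_id, priority) tuples.
    Returns (dr, bdr) router IDs; either may be None.
    """
    # Priority 0 means "never DR or BDR".
    eligible = [r for r in routers if r[1] > 0]
    # Highest priority wins; the numerically highest Router ID breaks ties.
    ranked = sorted(
        eligible,
        key=lambda r: (r[1], int(IPv4Address(r[0]))),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr


segment = [("10.0.0.1", 255), ("10.0.0.2", 100), ("10.0.0.9", 0)]
print(elect_dr_bdr(segment))  # ('10.0.0.1', '10.0.0.2')
```

The access device with priority 0 never appears in the result, whatever its Router ID.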
L328. In an NBMA environment (e.g., Frame Relay hub-and-spoke), why does the default broadcast-style election fail, and what configuration approaches (manual neighbors, ip ospf network point-to-multipoint, priority tuning) do you use to make it work?
NBMA (Frame Relay) is multi-access but has no Layer-2 broadcast/multicast, so OSPF can't auto-discover neighbors via multicast Hellos. Worse, in hub-and-spoke the spokes usually have no direct PVC to each other, so a spoke could be elected DR yet be unable to reach the other spokes — election fails or breaks flooding.
Fixes:
- NBMA + manual neighbors: keep the default NBMA type, define neighbors with neighbor statements, and force the DR onto the hub (which has full reachability to all spokes) using priority. Set spokes to ip ospf priority 0 so they can never become DR.
- Point-to-multipoint: ip ospf network point-to-multipoint treats the cloud as a set of point-to-point links, so no DR is needed and neighbors are auto-discovered. This is the simplest, most robust approach for partial-mesh hub-and-spoke.
Interview tip: Recommend point-to-multipoint for partial-mesh — no DR worries, no manual neighbor statements, and it advertises /32 host routes for full reachability.
L329. On a modern leaf-spine routed fabric, why is OSPF point-to-point (often unnumbered) preferred over broadcast on the inter-switch links? What operational and scaling benefits does that give?
In a leaf-spine fabric every inter-switch link is a dedicated link between exactly two switches — effectively point-to-point even though the port is Ethernet. Leaving it at the Ethernet default of broadcast is wasteful: it elects a DR/BDR per link (pointless for two nodes), adds election overhead, and originates an extra Type 2 Network LSA per segment, bloating the LSDB.
Setting ip ospf network point-to-point gives: no DR election (faster, simpler convergence), fewer LSAs (only Router LSAs, no Type 2), and lower CPU/memory at scale across hundreds of links.
Unnumbered (ip unnumbered, borrowing a loopback IP) adds more: no per-link subnet to plan or burn, no /30 or /31 churn, simpler renumbering, and a smaller routing table since fabric links don't each advertise a transit subnet.
Analogy: pre-numbering every hallway is needless when each just connects two rooms.
Interview tip: Say 'P2P unnumbered = no DR, fewer LSAs, no per-link IP-address planning, faster convergence' — exactly what spine-leaf wants. (In greenfield 2026 fabrics BGP is often chosen instead, but where OSPF is used, P2P-unnumbered links are the norm.)
L230. Who originates the Type 2 Network LSA, and what is its relationship to the DR? What happens to the Type 2 LSA if the DR fails?
The Type 2 Network LSA is originated only by the DR of a multi-access (broadcast or NBMA) segment; creating it is one of the DR's defining jobs. The Type 2 LSA describes the segment as a single 'pseudonode': it lists all routers currently fully adjacent on that network and is flooded within the area, letting other routers compute the SPF tree across the shared segment correctly. Its Link-State ID is the DR's interface IP address on that segment.
If the DR fails, the BDR is promoted to DR immediately (that's why a BDR exists — fast cutover with no full re-sync). The new DR re-originates a fresh Type 2 LSA under its own interface IP, and the old DR's Type 2 LSA is flushed/aged out of the LSDB.
Analogy: the DR is the segment's spokesperson; if it leaves, the deputy steps up and re-issues the roster under its own name.
Interview tip: Type 2 LSA = DR-only, flooded within the local area, Link-State ID = DR's interface IP; a new DR means a new Type 2 LSA.
LSA Types, Areas & Hierarchy (10)
L131. Why does OSPF use areas, and what is special about area 0 (the backbone)? State the rule about how all other areas must connect to area 0.
OSPF uses areas to break one big network into smaller pieces. Inside an area every router runs the full SPF (Dijkstra) calculation on the same link-state database (LSDB). If you put everything in one area, a single link flap forces every router to recalculate, the database grows huge, and CPU/memory suffer. Areas contain flooding and SPF to a local zone, so a change in one area doesn't shake the whole network.
Area 0 is the backbone — the central transit area. The golden rule: every non-backbone area must connect directly to area 0, and all inter-area traffic flows through it. Think of area 0 as the highway and other areas as towns — every town connects to the highway, not town-to-town.
- Same area = same Link-State Database
- Two non-zero areas never exchange routes directly; traffic always transits area 0
Interview tip: Say "areas limit SPF scope and LSA flooding, and area 0 is mandatory transit."
L132. Define the router roles in OSPF: internal router, backbone router, ABR, and ASBR. What makes a router an ABR versus an ASBR?
A router's role comes from which areas its interfaces touch and where its routes come from:
- Internal router — all interfaces are in one single area. It only knows that area in detail.
- Backbone router — has at least one interface in area 0.
- ABR (Area Border Router) — has interfaces in two or more areas, one of which is area 0. It sits on the border and generates Type 3 summary LSAs between areas.
- ASBR (Autonomous System Boundary Router): injects external routes into OSPF via redistribute (from BGP, EIGRP, static, etc.). It generates Type 5 external LSAs.
So: ABR sits between areas; ASBR sits between OSPF and another routing source. A router can be both at once.
Interview tip: ABR is about area boundaries; ASBR is about redistribution. Don't confuse them.
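The role definitions above reduce to a couple of set checks. A minimal sketch, assuming areas are given as plain integers and that "redistributes externals" is a simple flag; the function name is hypothetical.

```python
def ospf_roles(interface_areas, redistributes_externals=False):
    """Classify a router from the areas its interfaces sit in."""
    areas = set(interface_areas)
    roles = set()
    if len(areas) == 1:
        roles.add("internal")   # every interface in one area
    if 0 in areas:
        roles.add("backbone")   # at least one interface in area 0
    if len(areas) >= 2 and 0 in areas:
        roles.add("ABR")        # borders area 0 and another area
    if redistributes_externals:
        roles.add("ASBR")       # injects routes from outside OSPF
    return roles


print(sorted(ospf_roles([0, 1])))                                # ['ABR', 'backbone']
print(sorted(ospf_roles([0, 2], redistributes_externals=True)))  # ['ABR', 'ASBR', 'backbone']
```

The second call shows a router that is ABR and ASBR at once, because the two roles are decided by independent conditions.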
L133. Name the basic LSA types you would expect in a simple multi-area design (Type 1, 2, 3, 5) and say in one line who originates each.
In a normal multi-area OSPF design you'll see four core LSA types:
- Type 1 — Router LSA: originated by every router; lists its links and costs within an area (stays inside the area).
- Type 2 — Network LSA: originated by the DR (Designated Router) on a multi-access segment; lists routers attached to that segment.
- Type 3 — Summary LSA: originated by the ABR; advertises networks from one area into another (inter-area routes).
- Type 5 — External LSA: originated by the ASBR; advertises routes redistributed from outside OSPF, flooded throughout the domain.
Memory hook: 1 = me, 2 = my segment, 3 = other areas, 5 = outside world.
Interview tip: Quote "Type 1 every router, Type 2 the DR, Type 3 the ABR, Type 5 the ASBR" — clean and memorable.
L234. Walk through LSA Types 1 through 7: who originates each, what it carries, and what its flooding scope is. Be specific about Type 3 vs Type 4 vs Type 5 vs Type 7.
- Type 1 Router LSA — every router; its links/costs in an area; floods within the area only.
- Type 2 Network LSA — the DR; routers on a multi-access segment; within the area only.
- Type 3 Summary LSA — ABR; advertises inter-area networks/prefixes; floods into the adjacent area.
- Type 4 ASBR Summary LSA — ABR; advertises how to reach the ASBR (a router, not a network) to remote areas.
- Type 5 External LSA — ASBR; redistributed external routes; flooded domain-wide (except stub/NSSA areas).
- Type 7 NSSA External LSA — ASBR inside an NSSA; carries externals where Type 5 is banned; floods within the NSSA only.
Key split: Type 3 = a network in another area, Type 4 = reachability to the ASBR itself, Type 5 = an external network, Type 7 = an external born inside an NSSA.
Interview tip: Type 4 advertises a router, not a prefix — that trips people up.
L235. What is the difference between a Type 4 ASBR Summary LSA and a Type 5 External LSA, and why does an internal router in a remote area need the Type 4 to use the Type 5?
A Type 5 External LSA advertises an external network (for example, a route redistributed from BGP) and floods across the whole OSPF domain. Crucially, the Type 5 keeps its original ASBR's Router ID as the advertising router — it does not tell you how to reach that ASBR.
A Type 4 ASBR Summary LSA is generated by the ABR and advertises how to get to the ASBR itself (the ASBR as a destination router) into other areas.
Here's the catch: a router in a remote area receives the Type 5 and sees "reach this external via ASBR X." But it has no Type 1/2 for ASBR X (different area). Without the Type 4 telling it the cost/path to ASBR X, OSPF cannot compute a valid route and the external is unusable. The Type 4 supplies that missing piece.
Analogy: Type 5 is the parcel's destination; Type 4 is the address of the post office (ASBR) that sent it.
Interview tip: "Type 5 says what; Type 4 says how to reach the originator."
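The dependency between the two LSAs can be modelled as a lookup: the external is usable only if the router can place the ASBR somewhere. This is a rough sketch that ignores forwarding addresses and metrics; the function and field names are assumptions for illustration.

```python
def external_route_usable(type5, intra_area_routers, type4_asbrs):
    """Return True if the ASBR named in a Type 5 LSA can be reached.

    type5: dict with 'prefix' and 'adv_router' (the ASBR's Router ID).
    intra_area_routers: Router IDs known from Type 1 LSAs in the local area.
    type4_asbrs: ASBR Router IDs advertised by Type 4 LSAs from the ABR.
    """
    asbr = type5["adv_router"]
    return asbr in intra_area_routers or asbr in type4_asbrs


lsa = {"prefix": "203.0.113.0/24", "adv_router": "9.9.9.9"}
print(external_route_usable(lsa, {"1.1.1.1", "2.2.2.2"}, set()))        # False
print(external_route_usable(lsa, {"1.1.1.1", "2.2.2.2"}, {"9.9.9.9"}))  # True
```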
L236. How does a Type 7 NSSA External LSA become a Type 5, and which router performs the translation? Why does NSSA exist at all if Type 5 already does externals?
An NSSA (Not-So-Stubby Area) blocks Type 5 externals but still needs to originate externals if it has its own ASBR. So that ASBR injects them as Type 7 LSAs, which flood only inside the NSSA. To reach the rest of the domain, the NSSA ABR translates Type 7 into Type 5 at the area border, and from there it floods normally.
- If multiple ABRs exist, the one with the highest Router ID does the translation (it becomes the translator).
- Only a Type 7 with the P-bit (propagate) set is translated into a Type 5; the P-bit is normally set automatically on externals that need to leave the NSSA.
Why NSSA exists: a plain stub area gives you the benefit of no Type 5 flooding, but it also forbids any ASBR in the area. If you have a remote site that must redistribute a few local externals (a static route, a small partner link) yet still want stub-like LSA reduction, NSSA is the compromise — stub savings plus a local ASBR.
Interview tip: "Stub forbids externals; NSSA is the exception that allows a local ASBR."
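Both rules in the bullets above (highest Router ID translates, only P-bit LSAs are translated) fit in a short sketch. The dictionaries stand in for LSAs and the names are hypothetical; note the Router IDs are compared numerically, not as strings.

```python
from ipaddress import IPv4Address


def nssa_translator(abr_router_ids):
    """Pick the translating ABR: the numerically highest Router ID."""
    return max(abr_router_ids, key=lambda rid: int(IPv4Address(rid)))


def translate_type7(type7_lsas):
    """Turn Type 7 LSAs into Type 5, skipping any without the P-bit."""
    return [
        {"type": 5, "prefix": lsa["prefix"]}
        for lsa in type7_lsas
        if lsa["p_bit"]
    ]


print(nssa_translator(["10.0.0.2", "10.0.0.11"]))  # 10.0.0.11
print(translate_type7([
    {"prefix": "192.0.2.0/24", "p_bit": True},
    {"prefix": "198.51.100.0/24", "p_bit": False},
]))  # [{'type': 5, 'prefix': '192.0.2.0/24'}]
```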
L237. Compare stub, totally stubby, NSSA, and totally NSSA areas. For each, state which LSA types are blocked and what gets injected (e.g., default route). What are the rules about what an area can be (no area 0 as stub, no ASBR in a stub, no virtual link through a stub)?
- Stub: blocks Type 5 (and Type 4). Allows Type 3. The ABR injects a default route as a Type 3 so internal routers can still reach externals.
- Totally Stubby: blocks Type 3, 4, and 5. Only intra-area routes plus a single default route. Smallest database (Cisco feature, area X stub no-summary on the ABR).
- NSSA: blocks Type 5 (and Type 4) but allows local externals as Type 7; allows Type 3. A default route is optional, not automatic (configure area X nssa default-information-originate on the ABR if needed).
- Totally NSSA: blocks Type 3, 4, 5, allows Type 7, and the ABR injects a default route (Cisco area X nssa no-summary).
Rules: area 0 can never be a stub/NSSA; a normal stub may not contain an ASBR (no redistribution — that's exactly why NSSA exists); and a virtual link cannot transit a stub/NSSA (the transit area must carry Type 5, which stub areas block).
Interview tip: "Totally = also kills Type 3; NSSA = stub that allows a local ASBR via Type 7."
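The comparison is easier to recall as a lookup table. A sketch in Python, where the table keys and the function name are illustrative; the automatically injected default route in the "totally" variants is the one Type 3 that still gets through, which the table records as a flag instead of an LSA type.

```python
# Which LSA types the ABR keeps out of each area type.
# Type 4 is blocked wherever Type 5 is, since it only serves Type 5.
AREA_TYPES = {
    "normal":         {"blocked": set(),     "type7": False, "auto_default": False},
    "stub":           {"blocked": {4, 5},    "type7": False, "auto_default": True},
    "totally-stubby": {"blocked": {3, 4, 5}, "type7": False, "auto_default": True},
    "nssa":           {"blocked": {4, 5},    "type7": True,  "auto_default": False},
    "totally-nssa":   {"blocked": {3, 4, 5}, "type7": True,  "auto_default": True},
}


def allowed_lsa_types(area_type):
    """LSA types that may appear in the area (ignoring the default route)."""
    spec = AREA_TYPES[area_type]
    allowed = {1, 2, 3, 4, 5} - spec["blocked"]
    if spec["type7"]:
        allowed.add(7)
    return allowed


for name in AREA_TYPES:
    print(name, sorted(allowed_lsa_types(name)))
```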
L238. Where do you configure inter-area summarization versus external summarization, and why are they done on different routers (ABR 'area range' vs ASBR 'summary-address')? What discard route gets created and why?
The two summarizations happen at different boundaries because they act on different LSAs:
- Inter-area summarization is configured on the ABR with area <id> range. It collapses intra-area (Type 1/2) routes into one Type 3 summary as they cross into another area.
- External summarization is configured on the ASBR with summary-address. It collapses redistributed externals into one Type 5 (or Type 7) before they enter OSPF.
The logic: only the ABR sees the inter-area boundary where Type 3 is built, and only the ASBR sees externals as they're redistributed — so each summarizes at the point where it owns that LSA.
When you summarize, the router installs a discard route (a route to Null0) for the summary block. This prevents routing loops: if a packet matches the summary but no specific subnet exists, it is dropped locally instead of following a less-specific route (such as a default) back toward the source.
Interview tip: "ABR summarizes between areas (Type 3), ASBR summarizes externals (Type 5) — and both add a Null0 discard route."
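To see what the summary and its discard route do, here is a small sketch using Python's standard ipaddress module. The 10.10.x.0/24 subnets and the lookup function are invented for illustration; a real router does longest-prefix matching in its forwarding table.

```python
import ipaddress

# Four contiguous /24s inside one area collapse into a single /22 summary.
subnets = [ipaddress.ip_network(f"10.10.{i}.0/24") for i in range(4)]
summary = list(ipaddress.collapse_addresses(subnets))
print(summary)  # [IPv4Network('10.10.0.0/22')]


def lookup(dst, specifics, summary_net):
    """Toy lookup on the summarizing router: specifics first, then Null0."""
    addr = ipaddress.ip_address(dst)
    for net in specifics:
        if addr in net:
            return f"forward via {net}"
    if addr in summary_net:
        return "drop (Null0 discard route)"
    return "no match"


# Pretend 10.10.3.0/24 has gone down: only three specifics remain.
alive = subnets[:3]
print(lookup("10.10.1.7", alive, summary[0]))  # forward via 10.10.1.0/24
print(lookup("10.10.3.5", alive, summary[0]))  # drop (Null0 discard route)
```

The last call is the case the discard route exists for: the address falls inside the advertised summary, but no live subnet covers it.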
L339. Design question: you have a discontiguous backbone or an area with no physical path to area 0. What does a virtual link do, why is it considered a design band-aid, and what would you do instead in a greenfield design?
OSPF requires every area to touch area 0. If an area is physically orphaned from the backbone, or area 0 itself is split into two pieces, a virtual link repairs it. A virtual link tunnels an OSPF backbone adjacency across a non-backbone transit area between two ABRs — logically extending area 0 over that transit area so backbone connectivity is restored.
It's seen as a band-aid because: it adds hidden complexity, the transit area can't be a stub/NSSA (it must carry Type 5), failures are hard to troubleshoot, it relies on the transit area's stability, and it often signals a flawed topology that grew organically.
In a greenfield design you avoid it entirely:
- Plan a physically contiguous area 0 from day one (redundant backbone links).
- Ensure every area connects directly to area 0 with dual paths for redundancy.
- If geography forces a gap, use a real transit link (or an MPLS/GRE underlay) — not a virtual link.
Interview tip: "Virtual link = temporary repair to keep area 0 contiguous; the right fix is topology, not a tunnel."
L340. How would you design area boundaries and a summarization strategy in a large network to limit SPF scope and LSA flooding? Discuss fault isolation and what a single flapping link does with and without summarization.
The goal is to keep each area's link-state database small and stop local instability from rippling out. Core design principles:
- Align areas to topology and IP addressing: assign each area a contiguous block (for example, one site = 10.10.0.0/16) so it summarizes cleanly into one Type 3.
- Summarize at every ABR with area range, and at ASBRs with summary-address, so other areas see one prefix, not hundreds.
- Use stub / totally stubby / NSSA areas at the edges to block Type 3/4/5 and reduce database size.
- Keep area 0 lean and well-connected; cap the number of routers/links per area.
Fault isolation — the flapping link: Without summarization, a flapping link in Area 1 changes its Type 1 LSA. The ABR re-floods that change, the more-specific prefix flaps in/out of every area, and routers domain-wide must rerun SPF (or at least a partial route recalculation) — CPU churn everywhere. With summarization, the specific subnet is hidden behind the summary, so as long as some path to the block survives, the summary Type 3 never changes. The flap is contained to Area 1; the rest of the network never reruns SPF for that event.
Interview tip: "Summarization is what makes OSPF scale — it converts a network-wide SPF event into a local one."
Packets, Neighbors & Adjacencies (10)
L141. Name the five OSPF packet types and state the purpose of each (Hello, DBD/DDP, LSR, LSU, LSAck).
OSPF runs entirely on five packet types, all carried directly in IP protocol 89:
- Hello (Type 1): discovers neighbors, negotiates parameters, elects the DR/BDR, and acts as a keepalive.
- DBD / DDP (Type 2): Database Description — a summary (LSA headers only) of the LSAs each router holds, exchanged so both sides learn what the other knows.
- LSR (Type 3): Link-State Request — "send me the full copy of these LSAs I'm missing or that look newer."
- LSU (Type 4): Link-State Update — carries the actual full LSAs (the real routing data).
- LSAck (Type 5): Link-State Acknowledgement — confirms receipt so OSPF's flooding stays reliable.
Think of it like swapping book collections: Hello = meeting, DBD = comparing catalogs, LSR = requesting titles, LSU = handing over books, LSAck = signing the receipt.
Interview tip: remember the order they appear: Hello, then DBD, then LSR, then LSU, then LSAck.
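If it helps to memorize the numbering, the five types map onto a small enum. This is only a memory aid written in Python, with names of my own choosing, not a packet parser.

```python
from enum import IntEnum

OSPF_IP_PROTOCOL = 89  # OSPF rides directly on IP, not on TCP or UDP


class OspfPacketType(IntEnum):
    HELLO = 1   # discover neighbors, keepalive, DR/BDR election
    DBD = 2     # Database Description: LSA headers only
    LSR = 3     # Link-State Request: ask for full LSAs
    LSU = 4     # Link-State Update: carries the full LSAs
    LSACK = 5   # Link-State Acknowledgement


for ptype in OspfPacketType:
    print(ptype.value, ptype.name)
```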
L142. What are the default Hello and Dead intervals on a broadcast network versus a point-to-point link? What happens if two routers have mismatched timers?
The defaults depend on the OSPF network type, not the physical media:
- Broadcast (Ethernet) and point-to-point: Hello = 10s, Dead = 40s (Dead = 4 x Hello).
- Non-broadcast (NBMA) and point-to-multipoint: Hello = 30s, Dead = 120s.
Both routers must agree on Hello and Dead intervals, because they are carried inside the Hello packet and verified before any adjacency forms. If they mismatch, the routers see each other's Hellos but refuse to become neighbors — the adjacency simply never comes up. The neighbor will not appear (or will not stay) in show ip ospf neighbor.
Analogy: two people agreeing to text "still here" every 10 seconds — if one expects it every 30, they'll assume the other vanished.
Interview tip: mismatched timers are a classic adjacency-failure cause; mention that lowering Hello speeds convergence but costs CPU/bandwidth (and that BFD is the modern way to get sub-second failure detection without ultra-low Hellos).
L243. What is the difference between an OSPF neighbor and an adjacency? Why does every adjacency start as a neighbor but not every neighbor become an adjacency?
A neighbor is any router on the same segment with which you've successfully exchanged Hellos and agreed on the key parameters (area, timers, subnet, auth). Reaching the 2-Way state means you are neighbors — you simply know each other exists.
An adjacency is deeper: two routers that go all the way to the Full state, exchange their complete link-state databases, and synchronize. Only adjacent routers exchange routing information.
Every adjacency begins as a neighbor because you must first discover and agree (2-Way) before you can synchronize databases. But not every neighbor becomes adjacent: on a broadcast/multi-access segment, to avoid an N-squared mesh of full exchanges, routers only become Full with the DR and BDR. Two DROther routers stay neighbors at 2-Way and never go Full with each other — that's by design.
Interview tip: "All adjacencies are neighbors; not all neighbors are adjacencies" — and the reason is the DR/BDR optimization on broadcast networks.
L244. On a broadcast segment, why do two DROther routers stay stuck in the 2-Way state with each other instead of going Full, and is that a problem?
This is normal and intentional, not a fault. On a broadcast (multi-access) segment, OSPF elects a DR and BDR. Every router forms a full adjacency only with the DR and BDR; all other routers are DROther. Two DROthers deliberately stay at 2-Way with each other — they recognize one another but never synchronize databases directly.
The reason is scaling. Without a DR, n routers on one segment would build n(n-1)/2 full adjacencies and flood LSAs everywhere — an explosion of traffic and CPU. The DR acts as a central relay: DROthers send updates to the DR and BDR (via 224.0.0.6, AllDRouters), and the DR re-floods to everyone (via 224.0.0.5, AllSPFRouters).
Analogy: in a class, students don't all brief each other one-to-one; they report to the teacher, who tells the whole class.
Interview tip: if you see DROther–DROther stuck at 2-Way, say "that's expected" — don't try to "fix" it.
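The scaling argument is easy to check with arithmetic. With a DR and BDR, each of the other n-2 routers forms two full adjacencies, plus one between the DR and BDR themselves. A quick sketch (function names are mine):

```python
def full_mesh_adjacencies(n: int) -> int:
    """Full adjacencies if every router synced with every other router."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n: int) -> int:
    """Full adjacencies when routers sync only with the DR and BDR."""
    if n < 2:
        return 0
    # Each DROther pairs with the DR and the BDR, plus the DR-BDR pair.
    return 2 * (n - 2) + 1


for n in (4, 10, 30):
    print(n, full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
# 4 6 5
# 10 45 17
# 30 435 57
```

The gap grows quickly: the full mesh is quadratic in n, while the DR/BDR design stays linear.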
L245. List the parameters carried in the Hello packet that MUST match for two routers to form an adjacency (area ID, hello/dead timers, subnet/mask, authentication, stub flags). Which check happens later at the DBD stage instead of in the Hello?
For two routers to become neighbors, these Hello fields must agree:
- Area ID — both interfaces must be in the same area.
- Hello and Dead intervals — must be identical.
- Subnet / mask — interfaces must be on the same IP subnet (so masks match; exception: point-to-point links, which don't check the mask).
- Authentication — type and key must match.
- Stub area flags (E-bit / N-bit / options) — both must agree on the area type (stub, NSSA, etc.).
The MTU check is the famous exception: it is not in the Hello. It's verified during the DBD exchange in ExStart/Exchange. If MTUs differ, Hellos succeed and you become neighbors, but the adjacency gets stuck in ExStart/Exchange and never reaches Full.
Interview tip: the gotcha they want is "MTU is checked at DBD, not Hello" — explaining the classic stuck-in-Exchange symptom.
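The Hello checks can be written as a simple field-by-field comparison. A minimal sketch, assuming a made-up HelloParams record; MTU is intentionally absent because it is not a Hello field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HelloParams:
    area_id: str
    hello: int
    dead: int
    network: str      # subnet with mask, e.g. "10.1.1.0/24"
    auth: tuple       # (type, key)
    stub_flag: bool


def hello_mismatches(a, b, point_to_point=False):
    """List the Hello parameters that stop a and b becoming neighbors."""
    problems = []
    if a.area_id != b.area_id:
        problems.append("area ID")
    if (a.hello, a.dead) != (b.hello, b.dead):
        problems.append("hello/dead timers")
    # The subnet/mask check is skipped on point-to-point links.
    if not point_to_point and a.network != b.network:
        problems.append("subnet/mask")
    if a.auth != b.auth:
        problems.append("authentication")
    if a.stub_flag != b.stub_flag:
        problems.append("stub flag")
    return problems


a = HelloParams("0.0.0.0", 10, 40, "10.1.1.0/24", ("md5", "k1"), False)
b = replace(a, hello=30, dead=120)
print(hello_mismatches(a, b))  # ['hello/dead timers']
```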
L246. Walk through the full OSPF neighbor state machine from Down to Full and explain what happens at each state, especially ExStart, Exchange, and Loading.
OSPF builds an adjacency through eight states:
- Down: no Hellos seen yet.
- Attempt: NBMA only — unicast Hellos sent to a manually configured neighbor.
- Init: a Hello received, but my own Router ID isn't yet listed in it (one-way).
- 2-Way: I see my Router ID in the neighbor's Hello (bidirectional). DR/BDR election happens here. DROther pairs stop here.
- ExStart: routers negotiate who is master/slave and the initial DD sequence number; MTU is also compared here.
- Exchange: they swap DBD packets describing their LSA headers.
- Loading: each sends LSRs for LSAs it lacks; the other replies with LSUs, acknowledged by LSAck.
- Full: databases are synchronized and the adjacency is complete.
Interview tip: stuck in ExStart/Exchange = MTU mismatch; stuck in Init = one-way Hellos (ACL/auth/multicast filter).
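The eight states and the usual stuck-state hints fit in a small lookup. This is a study aid, not router code: the numeric values are just an ordering I chose, and the hint strings paraphrase the answers above.

```python
from enum import IntEnum


class NeighborState(IntEnum):
    DOWN = 0
    ATTEMPT = 1
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


STUCK_HINTS = {
    NeighborState.INIT: "one-way Hellos (ACL, authentication, multicast filter)",
    NeighborState.TWO_WAY: "normal between two DROthers on a broadcast segment",
    NeighborState.EXSTART: "MTU mismatch",
    NeighborState.EXCHANGE: "MTU mismatch",
}


def triage(state: NeighborState) -> str:
    """Map a neighbor state to the first thing worth checking."""
    if state is NeighborState.FULL:
        return "adjacency complete"
    return STUCK_HINTS.get(state, "no specific hint; check Hello parameters")


print(triage(NeighborState.EXSTART))  # MTU mismatch
print(triage(NeighborState.INIT))     # one-way Hellos (ACL, authentication, multicast filter)
```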
L247. During ExStart, how do two routers decide who is master and who is slave, and what role do the MTU value and DD sequence number play at this stage?
In ExStart, both routers send empty DBD packets with the I (Init), M (More), MS (Master) bits set and a proposed DD sequence number. The router with the higher Router ID becomes the master; the other concedes to slave. Master matters because it controls the DD sequence numbering — it sets and increments the sequence, and the slave must acknowledge by echoing the master's sequence number. This makes the database exchange a reliable, lock-step conversation rather than two routers talking over each other.
The MTU is also carried in the DBD header here. If the receiving router's interface MTU is smaller than the value advertised, it silently rejects the DBD, so the pair never leaves ExStart/Exchange. Analogy: master/slave is deciding who calls out page numbers so both read the same book in sync.
Interview tip: higher Router ID = master (a tiebreak only for sequencing — it does not mean DR), and MTU mismatch is the classic ExStart hang.
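Both ExStart decisions are one-line comparisons. A sketch with hypothetical function names, where Router IDs are compared as 32-bit numbers and the MTU rule follows the description above (reject a DBD that advertises more than the local interface MTU):

```python
from ipaddress import IPv4Address


def pick_master(rid_a: str, rid_b: str) -> str:
    """The numerically higher Router ID becomes master."""
    return max(rid_a, rid_b, key=lambda rid: int(IPv4Address(rid)))


def dbd_accepted(advertised_mtu: int, local_mtu: int) -> bool:
    """A DBD is dropped if it advertises an MTU larger than our own."""
    return advertised_mtu <= local_mtu


print(pick_master("1.1.1.1", "2.2.2.2"))  # 2.2.2.2
print(dbd_accepted(9000, 1500))           # False: the 1500-byte side rejects it
print(dbd_accepted(1500, 9000))           # True
```

The asymmetry in the last two calls is worth noticing: only the side with the smaller MTU rejects the DBD, yet that is enough to stall the exchange for both.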
L148. How does OSPF select its Router ID, in priority order? Once chosen, what does it take to change the Router ID, and why is manually configuring it a best practice?
OSPF picks its 32-bit Router ID in this order:
- A manually configured router-id under the OSPF process (always wins).
- The highest IP address on an up loopback interface.
- The highest IP address on an up physical interface.
Once OSPF is running, the Router ID is sticky: even if you add a higher loopback later, the RID does not change on its own. To apply a new RID you must clear ip ospf process (or reload), which tears down all adjacencies briefly.
Manually configuring it is best practice because it's stable and predictable — it won't shift when an interface flaps or its address changes, it makes the LSDB and troubleshooting readable, and it avoids surprise re-elections. Loopbacks help too (they never go down), but an explicit router-id is the cleanest.
Interview tip: stress that the RID does not auto-update — you must reset the process, so it's a maintenance-window change.
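The selection order translates directly into a short function. This sketch models only the choice made when the process starts (it does not model stickiness), and the function name and arguments are assumptions for illustration.

```python
from ipaddress import IPv4Address


def select_router_id(manual=None, loopbacks=(), physical=()):
    """Pick a Router ID: manual, else highest loopback, else highest physical.

    loopbacks and physical are the IPv4 addresses of interfaces that are up.
    """
    if manual:
        return manual
    for pool in (loopbacks, physical):
        if pool:
            return max(pool, key=lambda ip: int(IPv4Address(ip)))
    return None  # no candidate address: the process cannot pick a Router ID


print(select_router_id(loopbacks=["1.1.1.1"], physical=["192.168.1.1"]))  # 1.1.1.1
print(select_router_id(manual="9.9.9.9", loopbacks=["1.1.1.1"]))          # 9.9.9.9
```

The first call shows that a loopback wins over a physical interface even when the physical address is numerically higher.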
L249. Two routers will not form an adjacency even though Hellos are arriving. What systematic checklist of mismatches do you run through to find the cause?
Hellos arriving means Layer 1–3 are fine, so it's a parameter or state mismatch. I work a checklist:
- Area ID — both interfaces in the same area?
- Subnet/mask — same subnet, matching mask (except point-to-point)?
- Hello/Dead timers — identical?
- Authentication — same type and key?
- Network type — broadcast vs point-to-point vs NBMA must be compatible.
- Stub/area flags — both agree on stub/NSSA?
- MTU — if neighbors form but stick in ExStart/Exchange, suspect MTU.
- DR priority / one-way Hello — priority 0 on both, or an ACL/multicast filter causing Init.
Tools: show ip ospf interface (timers, area, type, auth), show ip ospf neighbor (state), and debug ip ospf adj if needed.
Interview tip: frame it top-down by stuck state — Init = one-way; ExStart/Exchange = MTU; never forms = area/timers/subnet/auth.
L350. Explain OSPF authentication options from legacy (null/plaintext/MD5) to modern HMAC-SHA key chains. How does an authentication mismatch present itself, and why is authentication a real security control rather than just a checkbox?
OSPFv2 supports three legacy auth types: Type 0 (null) — none; Type 1 (plaintext) — password sent in clear, trivially sniffed; Type 2 (MD5) — a keyed cryptographic hash with a key ID and sequence number to resist replay. MD5 is now considered weak. Modern devices use HMAC-SHA with key chains (RFC 5709 cryptographic authentication, e.g. key chain + cryptographic-algorithm hmac-sha-256), giving stronger hashing and easy key rollover via overlapping keys with lifetimes. For OSPFv3, the modern path is RFC 7166 (built-in authentication trailer) rather than relying on IPsec.
A mismatch is easy to spot: Hellos are dropped, the adjacency never forms (or flaps), and you'll see debug ip ospf adj log a message such as "mismatch authentication type" or "bad authentication".
It's a genuine control because OSPF auth means a rogue device on your LAN can't inject false LSAs to blackhole or reroute traffic — it stops route-injection and some DoS attacks. The keyed hash also detects tampering in transit.
Interview tip: say auth verifies integrity and origin, not confidentiality — the packets aren't encrypted, just authenticated.
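The "integrity and origin, not confidentiality" point can be demonstrated with Python's standard hmac module. This is a conceptual sketch only: it does not reproduce the OSPF packet layout or the exact digest construction of RFC 5709, and the packet bytes and keys are made up.

```python
import hashlib
import hmac


def sign(packet: bytes, key: bytes) -> bytes:
    """Compute a keyed HMAC-SHA-256 digest over the packet bytes."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def verify(packet: bytes, digest: bytes, key: bytes) -> bool:
    """Recompute the digest with our key and compare in constant time."""
    return hmac.compare_digest(sign(packet, key), digest)


packet = b"hello from 1.1.1.1 area 0"   # sent in the clear, not encrypted
digest = sign(packet, b"shared-key")

print(verify(packet, digest, b"shared-key"))         # True: key matches
print(verify(packet, digest, b"wrong-key"))          # False: rogue or misconfigured peer
print(verify(packet + b"!", digest, b"shared-key"))  # False: tampered in transit
```

The packet itself stays readable to anyone on the wire; only the digest proves it came from a holder of the key and was not altered.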
Troubleshooting & Real Scenarios (10)
L251. An OSPF adjacency is stuck in ExStart/Exchange and never reaches Full. What is the single most common cause, how does an MTU mismatch produce this exact symptom, and what are the two fixes (match MTU vs ip ospf mtu-ignore) — and why is matching MTU the better one?
The classic cause is an MTU mismatch. MTU isn't in the Hello, so routers happily become neighbors and reach ExStart. But during the DBD exchange the MTU is advertised in the packet header. If a router receives a DBD claiming an MTU larger than its own interface MTU, it silently discards it. Neither side completes the database exchange, so the pair bounces between ExStart and Exchange forever — the master keeps resending DBDs that the slave drops.
Two fixes:
- Match the MTU on both interfaces (the correct fix).
- Configure ip ospf mtu-ignore on the interface to skip the MTU check.
Matching MTU is better because mtu-ignore only hides the symptom — the underlying mismatch can still cause fragmentation or black-holing of large packets/LSUs, leading to subtle, intermittent failures. Fixing the MTU removes the real fault.
Interview tip: always name MTU first for ExStart/Exchange hangs, and call mtu-ignore a band-aid.
L252. On a multi-access segment, all routers are stuck in the 2-Way state and no one is forwarding inter-router traffic correctly. What does this point to about DR election, and how do you fix it?
If every router on a broadcast segment sits at 2-Way and there's no Full adjacency anywhere, it means no DR or BDR was elected — and since routers only go Full with the DR/BDR, nobody synchronizes databases. The usual cause is that every interface has OSPF priority set to 0 (ip ospf priority 0), which makes a router ineligible to become DR/BDR. With no eligible candidate, the election produces no DR, so all routers remain DROther stuck at 2-Way.
The fix: give at least one router (ideally two, for DR and BDR) a non-zero priority — ip ospf priority 100 on the preferred router — then clear ip ospf process (or bounce the interface) to re-run the election. Verify with show ip ospf neighbor: you should now see one DR, one BDR, and the rest as DROther all going Full.
Interview tip: 2-Way everywhere on broadcast = no DR; check for priority 0 first.
L153. An adjacency is stuck in the Init state — Router A sees Router B as a neighbor but B does not see A. What does one-way Hello tell you, and what are the likely culprits (ACL/firewall, authentication, multicast filtering)?
Init means a router has received Hellos from a neighbor but hasn't yet seen its own Router ID listed inside those Hellos. So Router A hears B (A lists B in Init), but B never hears A: communication is one-way. To leave Init and reach 2-Way, each router must see itself in the other's Hello; if A's Hellos never reach B, that never happens.
Likely culprits — anything blocking A's Hellos from reaching B:
- ACL / firewall on B's interface dropping inbound OSPF (IP protocol 89) or the multicast 224.0.0.5.
- Authentication mismatch: B silently discards A's unauthenticated/wrong-key Hellos.
- Multicast filtering: switch IGMP snooping/storm-control or a unidirectional link dropping 224.0.0.5.
Analogy: A can hear B on the radio, but A's transmitter isn't reaching B.
Interview tip: Init = one-way Hellos — troubleshoot the direction that's NOT being heard.
L254. Routes are flapping and you suspect a duplicate Router ID in the domain. What symptoms would you see, which show/log output confirms it, and how do you remediate without a full reload if possible?
A duplicate Router ID confuses OSPF because the RID uniquely identifies each router in the LSDB. Symptoms: flapping adjacencies, routes appearing and disappearing, intermittent reachability, and LSAs that seem to fight or get overwritten. Adjacencies may bounce or never stabilize on the segment shared by the two RID twins.
Confirm it via logs — Cisco logs %OSPF-4-DUP_RTRID1: Detected router with duplicate router ID — and by inspecting show ip ospf neighbor / show ip ospf database, where you'll see the same Router ID associated with conflicting LSAs or two devices.
Remediate by giving one router a unique RID: under the process, router-id x.x.x.x. Because the RID is sticky, apply it with clear ip ospf process on that one router — this avoids a full device reload and only briefly resets that router's OSPF, not the whole box.
👉 Interview tip: mention the explicit DUP_RTRID syslog as the smoking gun, and that clear ip ospf process is enough, with no reboot needed.
L155. Which show commands do you reach for first to triage OSPF — and what does each tell you: show ip ospf neighbor, show ip ospf interface, and show ip ospf database? When do you escalate to debug ip ospf adj / events / hello, and why be careful with debug in production?
My triage order, least invasive first:
- show ip ospf neighbor: the quickest health check. Who are my neighbors, what state (Full, 2-Way, Init, ExStart), and DR/BDR roles. Stuck states point straight at the problem class.
- show ip ospf interface: per-interface area, network type, Hello/Dead timers, cost, priority, and authentication. Best for finding parameter mismatches.
- show ip ospf database: the LSDB. Which LSAs (Types 1–5/7) exist, their ages, and sequence numbers. Used for routing/flooding and stub/NSSA issues.
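The first triage step lends itself to a quick script when there are many neighbors to scan. The sketch below assumes the usual six-column layout of show ip ospf neighbor; the `SAMPLE` text is illustrative rather than captured from a real device, and `suspect_neighbors` is a hypothetical helper name.

```python
# Illustrative output only; column layout is an assumption about the CLI format.
SAMPLE = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   FULL/DR         00:00:35    10.0.12.2       GigabitEthernet0/0
3.3.3.3           1   EXSTART/BDR     00:00:31    10.0.13.3       GigabitEthernet0/1
4.4.4.4           0   2WAY/DROTHER    00:00:38    10.0.12.4       GigabitEthernet0/0
"""


def suspect_neighbors(output):
    """Return (neighbor_id, state) pairs that are neither Full nor a normal 2-Way."""
    suspects = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        neighbor_id, state = fields[0], fields[2]
        # 2WAY/DROTHER is the expected steady state between two non-DR routers.
        if state.startswith("FULL") or state == "2WAY/DROTHER":
            continue
        suspects.append((neighbor_id, state))
    return suspects


print(suspect_neighbors(SAMPLE))
```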
I escalate to debug ip ospf adj (adjacency formation), events, or hello only when the show commands can't explain it — e.g. a silent Hello drop.
Be careful: debugs are CPU-intensive and verbose; on a busy router they can spike CPU, flood the console, and worsen an outage. Use conditional/targeted debugs, log to buffer, and turn them off (undebug all) promptly.
👉 Interview tip: show first, debug last, and never blanket-debug a production core router.
L256. A remote area is not receiving any external (Type 5) routes, but inter-area routes are fine. Walk me through how you'd determine whether this is intentional (stub/NSSA configuration) versus a fault, using the LSDB.
Inter-area routes working but no external (Type 5) routes is the classic signature of a stub area — and that may be entirely by design, so I confirm intent before "fixing" anything.
- Check the area type on the routers: show ip ospf tells me if the area is configured as stub / totally stubby / NSSA. Stub and totally-stubby areas block Type 5 LSAs on purpose and rely on a default route from the ABR instead.
- Inspect the LSDB: show ip ospf database external should be empty in a stub area; that's expected, not broken. Then confirm a default route (0.0.0.0) is present (the ABR injects it as a Type 3 summary LSA).
- Verify consistency: every router in the area must agree on the stub flag (E-bit), or adjacencies won't even form.
- NSSA case: externals appear as Type 7 and get translated to Type 5 at the ABR; check show ip ospf database nssa-external.
If it's not meant to be stub and the default is missing too, then it's a fault (ABR/redistribution/filtering).
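The intent-versus-fault reasoning above boils down to a small decision table. A minimal sketch, assuming you have already read the area type and LSA counts off the show commands; the function name, argument names, and verdict strings are all hypothetical.

```python
def classify_missing_externals(area_type, type5_count, has_type3_default):
    """Decide whether an absence of Type 5 LSAs looks intentional or broken.

    area_type: "normal", "stub", "totally-stubby", or "nssa" (as configured).
    type5_count: number of Type 5 LSAs seen in the area's LSDB view.
    has_type3_default: True if a 0.0.0.0 Type 3 summary LSA from the ABR is present.
    """
    if type5_count > 0:
        return "externals present"
    if area_type in ("stub", "totally-stubby"):
        if has_type3_default:
            return "by design: stub area with ABR default"
        return "fault: stub area but no default from the ABR"
    if area_type == "nssa":
        return "by design: NSSA blocks Type 5, check Type 7 LSAs"
    return "fault: normal area should carry Type 5 LSAs"


print(classify_missing_externals("stub", 0, True))
print(classify_missing_externals("normal", 0, False))
```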
👉 Interview tip: no Type 5 + working inter-area + a default route present = stub by design, not a bug.
L257. Two neighbors form fine on one VLAN but refuse to form on an adjacent VLAN that looks identically configured. Describe your structured root-cause process covering area ID, mask/subnet, timers, authentication, network type, and stub-flag mismatches.
"Looks identical" is the trap — something subtle differs on the second VLAN. I compare the two interfaces side by side with show ip ospf interface vlan X on both routers and walk a structured list:
- Area ID — is the SVI in the same area on both ends? A copy-paste often lands the wrong area.
- Subnet/mask — same subnet and matching mask? A /24 vs /25 typo breaks it.
- Hello/Dead timers — identical on both SVIs?
- Authentication — per-interface key applied on the working VLAN but missing/mismatched on this one?
- Network type — one side set to point-to-point, the other broadcast?
- Stub-flag / area type — both agree the area is stub/NSSA?
- MTU: if the adjacency gets past 2-Way but then sticks in ExStart/Exchange, suspect an MTU mismatch on the SVI/trunk.
I also confirm the VLAN is actually up end-to-end (trunk allowed-list, STP forwarding) and that Hellos arrive (no ACL).
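The side-by-side comparison can be made mechanical. The sketch below assumes you have transcribed the relevant fields from show ip ospf interface on each router into a dictionary; the field names and values are hypothetical, not parsed from real output.

```python
# Fields from the checklist above that both ends are expected to agree on.
MUST_MATCH = ("area", "subnet", "hello", "dead", "auth", "network_type", "stub_flag", "mtu")


def diff_ospf_interfaces(local, remote, must_match=MUST_MATCH):
    """Return {field: (local_value, remote_value)} for every field that differs."""
    return {
        field: (local.get(field), remote.get(field))
        for field in must_match
        if local.get(field) != remote.get(field)
    }


# Hypothetical values for the failing VLAN: only the mask differs.
router_a = {"area": "0", "subnet": "10.20.0.0/24", "hello": 10, "dead": 40,
            "auth": "md5", "network_type": "broadcast", "stub_flag": False, "mtu": 1500}
router_b = dict(router_a, subnet="10.20.0.0/25")

print(diff_ospf_interfaces(router_a, router_b))
```

Running the same function against the working VLAN should return an empty dictionary, which is a quick way to confirm the comparison itself is sound.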
👉 Interview tip: diff the two show ip ospf interface outputs literally line by line; the mismatch is almost always one field someone assumed was the same.
L258. Traffic is taking a clearly suboptimal path even though a faster link exists. How do you confirm this is a cost/reference-bandwidth issue, and how do you correct it safely across the domain without creating a transient loop or asymmetry?
OSPF picks the path with the lowest total cost, where cost = reference-bandwidth / interface-bandwidth. The default reference is 100 Mbps, so every link at or above 100 Mbps (1G, 10G, 100G) computes a cost of 1 — OSPF can't tell a 1G link from a 100G link, so it may prefer a slower-but-shorter path.
Confirm it: show ip route x.x.x.x for the chosen path, then show ip ospf interface to read each link's cost. If your fast links all show cost 1, it's a reference-bandwidth problem.
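Fix it with auto-cost reference-bandwidth 100000 (the value is in Mbps, so this is 100 Gbps) so high-speed links get distinct costs. Do it safely and consistently: apply the same reference bandwidth on every OSPF router in the domain, ideally in a change window. Each router computes the cost of its own interfaces, so a mismatch means same-speed links are advertised with different costs, which causes asymmetric and suboptimal routing, and a router-by-router rollout can produce transient micro-loops while the domain reconverges. Alternatively, set explicit per-link ip ospf cost for surgical fixes.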
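The cost arithmetic is easy to check by hand. The sketch below applies the formula from this answer; the integer truncation, the floor of 1, and the 65535 ceiling are assumptions about how the result is clamped, and `ospf_cost` is a hypothetical helper.

```python
def ospf_cost(interface_mbps, reference_mbps=100):
    """Cost = reference bandwidth / interface bandwidth, truncated and clamped to 1..65535."""
    return max(1, min(65535, reference_mbps // interface_mbps))


for speed in (100, 1_000, 10_000, 100_000):
    default_cost = ospf_cost(speed)            # default 100 Mbps reference
    tuned_cost = ospf_cost(speed, 100_000)     # reference raised to 100 Gbps
    print(f"{speed:>7} Mbps  default={default_cost:<4} tuned={tuned_cost}")
```

With the default reference every speed in the loop collapses to cost 1; with the reference raised to 100000 the same links come out as 1000, 100, 10, and 1, which is what lets SPF prefer the faster path.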
👉 Interview tip: the gotcha is consistency. Reference bandwidth must match everywhere, or you trade one problem for asymmetry and transient loops.
L359. After a maintenance reload, a transit router started black-holing traffic the moment OSPF came up but before BGP/forwarding was ready. What feature would have prevented this, and how do you operationalize it for future reloads?
This is the textbook case for OSPF Stub Router Advertisement (RFC 6987, which obsoleted RFC 3137 — the "max-metric router-lsa" feature). On reload OSPF converged and advertised itself as a usable transit path with normal metrics, but BGP hadn't finished learning its routes and the FIB wasn't fully programmed — so packets handed to this router were dropped (black-holed).
The fix is to make the router advertise its Router-LSA with the maximum metric (0xFFFF on transit links) on startup, so OSPF still forms adjacencies but neighbors treat it as a last-resort transit and route around it until it's truly ready. Configure: max-metric router-lsa on-startup wait-for-bgp (or max-metric router-lsa on-startup with a seconds value). wait-for-bgp holds the high metric until BGP signals convergence, then drops to normal cost.
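To see why neighbors route around a max-metric router, it helps to run SPF on a toy topology. This is a minimal sketch with a hypothetical four-router graph (A, B, C, and the reloading router R) and made-up costs; each router's dictionary stands for the link costs in its own Router-LSA.

```python
import heapq

MAX_LINK_METRIC = 0xFFFF  # 65535, the value advertised on transit links


def shortest_path(links, src, dst):
    """Dijkstra over directed link costs given as {router: {neighbor: cost}}."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


def with_max_metric(links, router):
    """Copy of the topology where `router` advertises every link at the maximum metric."""
    updated = {r: dict(nbrs) for r, nbrs in links.items()}
    updated[router] = {nbr: MAX_LINK_METRIC for nbr in links[router]}
    return updated


links = {
    "A": {"R": 10, "B": 20},
    "B": {"A": 20, "C": 20},
    "R": {"A": 10, "C": 10},
    "C": {"R": 10, "B": 20},
}

print(shortest_path(links, "A", "C"))                        # normal metrics
print(shortest_path(with_max_metric(links, "R"), "A", "C"))  # R in max-metric mode
```

With normal metrics the best path from A to C runs through R at cost 20. Once R advertises the maximum metric, the path through B at cost 40 wins, yet R remains reachable and would still be used as a last resort if B were unavailable.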
Operationalize it: bake max-metric router-lsa on-startup wait-for-bgp into the standard config of every transit/edge router so every reload is safe automatically — no manual step.
👉 Interview tip: name it as the OSPF stub-router / max-metric on-startup feature (RFC 6987, formerly RFC 3137), and stress wait-for-bgp for routers running both protocols.
L360. You are TAC on a large SP/enterprise network seeing intermittent full SPF runs spiking CPU and brief convergence storms. Lay out your investigation: identifying the churning LSA/area, applying SPF/LSA throttle and pacing, summarization, and using telemetry to find the flapping source — explain the reasoning, not just commands.
Repeated full SPF runs mean a topology (Type 1/2) LSA keeps changing, forcing the Dijkstra recompute — usually a flapping link or interface somewhere in an area. My reasoning, in order:
- Locate the churn: show ip ospf statistics reveals SPF frequency and what triggered each run; show ip ospf database with rising sequence numbers / low LSA ages points to the unstable LSA, and its advertising Router ID points to the culprit device/area.
- Dampen the impact: apply SPF throttling (timers throttle spf) and LSA throttling/pacing (timers throttle lsa, timers pacing flood), which use exponential back-off so a flap can't trigger back-to-back SPFs and storms. This buys stability; it doesn't fix the root cause.
- Contain the blast radius: use area design + route summarization at ABRs so a flap inside one area doesn't ripple as full SPFs across the whole domain. A summarized prefix hides the internal churn. (Incremental/partial SPF helps with leaf-route churn, but a flapping transit link still forces a full SPF, so design is the real lever.)
- Find the flapping source: use streaming telemetry / model-driven stats and syslog plus interface error counters to pin the physical link (dirty fiber, errored interface) and fix or shut it.
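The exponential back-off behind SPF throttling is worth being able to sketch on a whiteboard. The model below is simplified: it assumes every new trigger arrives while the previous hold window is still open, and the start/hold/max values are hypothetical, not recommended settings.

```python
def spf_wait_schedule(start_ms, hold_ms, max_ms, triggers):
    """Wait applied before each of `triggers` consecutive SPF runs during a flap.

    The first run waits start_ms; each later run waits the current hold
    interval, which doubles after every run until it is capped at max_ms.
    """
    waits = []
    hold = hold_ms
    for run in range(triggers):
        if run == 0:
            waits.append(start_ms)
        else:
            waits.append(min(hold, max_ms))
            hold *= 2
    return waits


# Hypothetical timers: 50 ms start, 200 ms initial hold, 5000 ms ceiling.
print(spf_wait_schedule(50, 200, 5000, 7))
```

The schedule shows the trade-off: a single clean event still converges quickly, while a persistent flap is rate-limited to one SPF per max-wait interval instead of one per flap.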
👉 Interview tip: throttling/pacing and summarization are mitigation; the real fix is killing the flap. Say both.
20-minute drill: Pick one question from each section, set a 90-second timer, and answer out loud. If you can sketch the key OSPF diagram from memory and land each 👉 Interview tip, you’re interview-ready.