In a BGP interview, structure beats memorisation — when a question stretches you, reason out loud from fundamentals instead of guessing. Use the visual cheat-sheets below to lock in the diagrams interviewers love, and note that every answer ends with a 👉 Interview tip giving the exact line to say.
Visual cheat-sheets — the whiteboard answers
Path Attributes & Best-Path Selection (9)
[L1] 1. Name the three well-known mandatory BGP path attributes and what each one carries (ORIGIN, AS_PATH, NEXT_HOP).
BGP describes every route using path attributes. Three of them are well-known mandatory — every BGP router must recognise them, and they must be present in every UPDATE that carries reachable prefixes. Think of them as the route's required passport stamps.
- ORIGIN — tells you how the prefix first entered BGP: i (IGP, e.g. injected by a network statement), e (EGP, a legacy protocol you will essentially never see today), or ? (incomplete, usually redistributed from another protocol).
- AS_PATH — the ordered list of AS numbers the route crossed to reach you. It serves two jobs: loop prevention (a router rejects a route whose AS_PATH already contains its own ASN) and a path-length tie-breaker in best-path selection.
- NEXT_HOP — the IP address you forward traffic to in order to reach the prefix.
Interview tip: Do not just list the names — add that they are mandatory in every UPDATE and recognised by all routers. That shows you understand the attribute categories, not just the labels.
[L2] 2. Classify BGP attributes into the four categories — well-known mandatory, well-known discretionary, optional transitive, optional non-transitive — and give one example of each.
BGP attributes split on two questions: must every router recognise it (well-known vs optional), and may a router that does not recognise it still pass it on (transitive vs non-transitive)? That gives four buckets:
- Well-known mandatory — every router must support it, and it must be in every UPDATE. Example: AS_PATH (also ORIGIN and NEXT_HOP).
- Well-known discretionary — every router understands it, but including it is optional. Example: LOCAL_PREF (also ATOMIC_AGGREGATE).
- Optional transitive — a router may not know it; if it does not, it forwards the attribute unchanged but sets the partial bit. Example: COMMUNITY (also AGGREGATOR).
- Optional non-transitive — if a router does not recognise it, it silently drops it rather than forwarding. Example: MED (MULTI_EXIT_DISC).
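The two questions above map onto two bits in the attribute flag byte defined in RFC 4271. The following is a minimal sketch of that mapping; the function name `classify` and the `MANDATORY` set are illustrative assumptions, not part of any BGP library.

```python
# Flag bits of the path-attribute header (RFC 4271).
OPTIONAL = 0x80    # 0 = well-known, 1 = optional
TRANSITIVE = 0x40  # 1 = transitive (always 1 for well-known attributes)
PARTIAL = 0x20     # set when an unrecognised optional transitive attribute was passed on

# Well-known attributes required in every UPDATE that carries reachable prefixes.
MANDATORY = {"ORIGIN", "AS_PATH", "NEXT_HOP"}


def classify(name: str, flags: int) -> str:
    """Return one of the four categories for an attribute name and flag byte."""
    if flags & OPTIONAL:
        if flags & TRANSITIVE:
            return "optional transitive"
        return "optional non-transitive"
    if name in MANDATORY:
        return "well-known mandatory"
    return "well-known discretionary"


for attr, flag_byte in [("AS_PATH", 0x40), ("LOCAL_PREF", 0x40),
                        ("COMMUNITY", 0xC0), ("MED", 0x80)]:
    print(attr, "->", classify(attr, flag_byte))
```

Note that the flag byte alone cannot separate mandatory from discretionary; that distinction comes from the specification, which is why the sketch needs the explicit `MANDATORY` set.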
Interview tip: MED is the classic optional non-transitive example and LOCAL_PREF is the classic well-known discretionary one — those two are asked most often, so anchor your answer on them.
[L2] 3. Recite the BGP best-path selection order from the top attribute down. Why does Weight come before Local Preference?
When BGP has multiple valid routes to the same prefix, it walks down a tie-breaker list and stops at the first attribute that differs. The standard Cisco order:
- Highest Weight (Cisco-only, local to the router, never advertised)
- Highest Local Preference (AS-wide)
- Locally originated (via network, redistribute, or aggregate-address)
- Shortest AS_PATH
- Lowest ORIGIN (i < e < ?)
- Lowest MED
- eBGP over iBGP
- Lowest IGP metric to the BGP NEXT_HOP
- Oldest (longest-established) eBGP route
- Lowest BGP router-ID, then lowest cluster-list length, then lowest neighbor IP address
Why is Weight first? Weight is purely local to one router and is never advertised to any neighbor, so it is the most specific, administrator-direct override — it lets one router make a decision without affecting the rest of the AS. A handy mnemonic for the top of the list: We Love Oranges AS Oranges Mean Pure Refreshment.
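One way to internalise the list is to treat it as a sort key in which earlier fields dominate later ones. The sketch below is a simplified model under stated assumptions: the `Path` class and its field names are hypothetical, MED is compared across all paths (as if bgp always-compare-med were set), and the route-age, cluster-list and neighbor-address steps are left out.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # lower is preferred


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "i"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def sort_key(p: Path):
    """Build a tuple where a smaller value means a more preferred path."""
    return (
        -p.weight,                 # 1. highest Weight
        -p.local_pref,             # 2. highest Local Preference
        not p.locally_originated,  # 3. locally originated first
        len(p.as_path),            # 4. shortest AS_PATH
        ORIGIN_RANK[p.origin],     # 5. lowest ORIGIN
        p.med,                     # 6. lowest MED (simplified: always compared)
        not p.ebgp,                # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest IGP metric to NEXT_HOP
        tuple(int(x) for x in p.router_id.split(".")),  # lowest router-ID
    )


def best_path(paths):
    return min(paths, key=sort_key)
```

For example, a path with Local Preference 200 and a three-AS path beats a default-preference path with a one-AS path, and a path with Weight 100 beats both regardless of its Local Preference.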
Interview tip: Always state that Weight is Cisco-proprietary and that Local Preference is the cross-vendor, AS-wide equivalent.
[L2] 4. What is the difference between Weight and Local Preference in terms of scope, direction of influence, and whether they propagate to other routers?
Both push BGP toward a preferred outbound exit, but they operate at different scales:
- Weight — Cisco-proprietary; scope is a single router only. Higher wins (default 0 for learned routes, 32768 for routes the router originates itself). It is never advertised to any neighbor, not even iBGP peers. Use it to make one router prefer a link without affecting anyone else.
- Local Preference — standard BGP; scope is the entire AS. Higher wins (default 100). It is propagated to iBGP peers (but never to eBGP peers), so the whole AS converges on the same exit point.
Analogy: Weight is one teller's personal sticky-note; Local Preference is a company-wide memo every branch reads. Both steer traffic leaving your AS, not traffic coming in.
Interview tip: The one-liner to memorise — 'Weight = one router, not advertised; Local-Pref = whole AS, shared over iBGP. Both control outbound.'
[L2] 5. How do you influence INBOUND traffic into your AS versus OUTBOUND traffic out of it? Map the right attribute to each direction (Local-Pref/Weight outbound; AS-path prepend/MED inbound).
The golden rule: attributes you set locally and keep inside your AS control OUTBOUND traffic; attributes you advertise to neighbors only influence INBOUND traffic. You fully control your own exits, but you can only hint at how others reach you.
- OUTBOUND (traffic leaving your AS) — use Weight (one router) or Local Preference (AS-wide). They are evaluated at the top of best-path, so they reliably pick your exit.
- INBOUND (traffic entering your AS) — use AS-path prepending (make a path look longer and therefore worse) or MED (suggest which of your links a single neighbor AS should prefer). You can also tag routes with BGP communities the upstream agrees to honor.
Memory hook: Local-Pref for Leaving; Prepend for People coming in.
Interview tip: Stress that inbound control is only influence — the remote AS can override your prepend or MED with its own higher Local Preference.
[L2] 6. What is MED, when is it compared between routes, and why is MED comparison only done by default for routes from the same neighboring AS (bgp always-compare-med)?
MED (Multi-Exit Discriminator) is an optional non-transitive attribute used when your AS connects to a neighbor AS over two or more links. It is a hint to that neighbor about which entry point into your AS to prefer. Lower MED wins, and it is evaluated at step 6 of best-path (after ORIGIN, before the eBGP-over-iBGP step).
By default MED is only compared between routes whose first (closest) AS in the AS_PATH is the same neighboring AS. Why? MED is a relative, internal metric that each AS sets by its own logic — AS 100 might use 50 to mean 'best', while AS 200 uses 50 to mean 'worst'. Comparing across different ASes would be apples-to-oranges and could route traffic badly.
bgp always-compare-med forces comparison even across different neighboring ASes — only safe when every neighbor agrees on what a given MED value means.
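The default rule and the always-compare override can be sketched as a small guard in front of the MED comparison. This is an illustration only: paths are plain dictionaries with hypothetical keys `as_path` (non-empty, neighbor AS first) and `med`.

```python
def med_is_compared(path_a, path_b, always_compare_med=False):
    """MED applies only when both paths come from the same neighboring AS,
    unless the always-compare override is switched on."""
    same_neighbor_as = path_a["as_path"][0] == path_b["as_path"][0]
    return always_compare_med or same_neighbor_as


def prefer_by_med(path_a, path_b, always_compare_med=False):
    """Return the lower-MED path, or None when the MED step is skipped or tied."""
    if not med_is_compared(path_a, path_b, always_compare_med):
        return None
    if path_a["med"] == path_b["med"]:
        return None
    return min((path_a, path_b), key=lambda p: p["med"])


via_as100 = {"as_path": [100, 300], "med": 50}
via_as200 = {"as_path": [200, 300], "med": 10}
print(prefer_by_med(via_as100, via_as200))                           # step skipped
print(prefer_by_med(via_as100, via_as200, always_compare_med=True))  # MED decides
```

Returning None models "fall through to the next tie-breaker" rather than a failure.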
Interview tip: Remember MED is non-transitive — your neighbor honors it but does not propagate it deeper into the Internet.
[L2] 7. Explain how AS-path prepending influences inbound path selection and why it is considered a 'hint' rather than a guarantee across the Internet.
AS-path prepending means adding your own AS number to the AS_PATH extra times when you advertise a prefix out a link you want used less. Since shortest AS_PATH is a best-path tie-breaker (step 4), the longer path looks worse to remote ASes, so they tend to choose the other entry point — steering inbound traffic toward your preferred link.
Example: advertise the backup link with AS_PATH 65001 65001 65001 (prepended twice) and the primary link with just 65001.
It is only a hint because AS_PATH length is only step 4 in best-path. Any remote AS that sets a higher Local Preference (step 2) on your 'backup' link will override the prepend entirely — and you cannot control another AS's policy. Analogy: you can make one road look bumpier, but drivers with their own rules may still take it.
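The override can be shown with a toy model of the remote AS's decision, assuming it looks only at Local Preference and then AS_PATH length. The helper names `prepend` and `remote_choice` are made up for this illustration.

```python
def prepend(as_path, asn, times):
    """Return a new AS_PATH with asn added `times` extra times at the front."""
    return [asn] * times + list(as_path)


def remote_choice(routes):
    """Pick the entry link a remote AS would use.

    routes is a list of (label, local_pref, as_path) tuples. Higher Local
    Preference wins first; AS_PATH length only breaks ties after that.
    """
    return max(routes, key=lambda r: (r[1], -len(r[2])))[0]


primary = [65001]
backup = prepend([65001], 65001, 2)  # 65001 65001 65001

# Default policy on the remote side: the shorter path (primary) wins.
print(remote_choice([("primary", 100, primary), ("backup", 100, backup)]))

# The remote AS raises Local Preference on the backup link: prepend is ignored.
print(remote_choice([("primary", 100, primary), ("backup", 200, backup)]))
```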
Interview tip: Mention that Local Preference beats prepending, so prepending alone is unreliable for traffic engineering.
[L3] 8. Two equal eBGP paths exist with identical AS-path length, origin, and MED. Which tie-breakers does BGP apply next, down to lowest router-ID and lowest neighbor address? Why is the 'oldest route' rule sometimes disabled?
With Weight, Local-Pref, AS-path, ORIGIN and MED all tied, BGP continues down the list:
- Prefer eBGP over iBGP — here both are eBGP, so still tied.
- Lowest IGP metric to the BGP NEXT_HOP.
- Oldest (longest-established) eBGP route.
- Lowest BGP router-ID of the advertising peer.
- If router-IDs tie (typical with route reflectors), lowest cluster-list length.
- Lowest neighbor IP address.
The oldest-route rule prefers the path that has been up longest, which reduces churn by not switching away from a stable path. But 'oldest wins' is non-deterministic — the winner depends on the accidental order in which routes arrived, so two otherwise-identical routers can disagree, making troubleshooting harder. Operators enable bgp bestpath compare-routerid to skip the age check and fall straight through to the deterministic lowest-router-ID tie-break.
Interview tip: Frame it as 'oldest = stable but non-deterministic; router-ID = deterministic but may re-select on reconvergence.'
[L1] 9. What does the ORIGIN attribute represent (i/e/?), how is it set, and where does it sit in the best-path order relative to MED?
The ORIGIN attribute records how a prefix first entered BGP — its 'birth certificate'. It has three values, and in best-path lower is preferred:
- i (IGP) — most preferred. Set when the prefix is injected with a network statement or an aggregate-address.
- e (EGP) — middle. Set by the legacy EGP protocol, essentially never seen today.
- ? (incomplete) — least preferred. Set when the route is redistributed into BGP from another protocol (OSPF, static, connected, etc.), so its true origin is unknown.
Preference order: i < e < ?. In the best-path list, ORIGIN is step 5 — checked after shortest AS_PATH (step 4) and before MED (step 6).
Interview tip: Remember network yields i while redistribute yields ? — a classic 'why is my route losing best-path?' troubleshooting clue.
Next-Hop, iBGP Scaling & Policy Control (9)
[L2] 10. When an eBGP route is learned and passed to iBGP peers, what happens to the NEXT_HOP attribute by default, and why can this cause the route to be unusable inside the AS?
By default, when a router learns a prefix over eBGP and advertises it to its iBGP peers, it does not change the NEXT_HOP attribute. The next-hop stays as the external peer's IP address (the address on the eBGP link), not the router's own loopback or interior address.
Inside the AS, that external next-hop usually lives on a subnet your IGP (OSPF, IS-IS, or EIGRP) does not advertise. So an iBGP peer receives the route but has no IGP path to reach the next-hop. BGP marks it inaccessible and the prefix becomes unusable — the best-path algorithm rejects any path whose next-hop is unresolvable, so it never installs in the routing table.
Think of it like getting a delivery address in a building you have no key to: the address is valid, but you cannot get there.
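The resolvability check can be approximated as "does any IGP prefix cover the next-hop address". The sketch below uses Python's standard ipaddress module; the function name and the sample prefixes are assumptions for illustration, and real routers apply extra rules (for example around default routes) that are not modelled here.

```python
import ipaddress


def next_hop_resolvable(next_hop, igp_prefixes):
    """Return True if some prefix known to the IGP contains the BGP next-hop."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(prefix) for prefix in igp_prefixes)


# Hypothetical IGP table: an internal link and the border router's loopback.
igp_table = ["10.0.0.0/24", "10.255.0.1/32"]

print(next_hop_resolvable("192.0.2.1", igp_table))   # external eBGP-link address
print(next_hop_resolvable("10.255.0.1", igp_table))  # border router loopback
```

The first lookup models the unchanged external next-hop and fails; the second models a next-hop that has been rewritten to an address the IGP carries.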
Interview tip: Say the rule crisply — eBGP changes next-hop, iBGP does not — then mention the next-hop must be resolvable via the IGP (recursive lookup) for the path to be a best-path candidate.
[L2] 11. What does next-hop-self do, where do you configure it, and what real failure does it prevent on iBGP-learned external routes?
next-hop-self tells a router to rewrite the NEXT_HOP attribute to its own address (typically its loopback) before sending eBGP-learned routes to iBGP peers — instead of leaving the external peer's IP.
You configure it on the border router that holds the eBGP session, applied toward its iBGP neighbors, for example neighbor 10.0.0.2 next-hop-self. Now the iBGP peers see a next-hop they can reach through the IGP (the border router's loopback is carried in OSPF or IS-IS).
The real failure it prevents: external routes stuck as inaccessible because the original eBGP-link next-hop is not advertised inside the AS. Without it you would otherwise have to redistribute the eBGP link subnets into your IGP — messy and discouraged.
Interview tip: Stress that it is configured per-neighbor on the iBGP-facing side of the border router, and that it is the cleaner alternative to leaking eBGP link subnets into the IGP.
[L2] 12. Why does iBGP require a full mesh of sessions, and what is the n(n-1)/2 scaling problem this creates in a large AS?
iBGP has a strict loop-prevention rule: a router will not re-advertise a route learned from one iBGP peer to another iBGP peer. Unlike eBGP, iBGP does not prepend the AS to the AS_PATH for internal hops, so there is no path-vector way to detect internal loops. This iBGP split-horizon rule is the substitute.
The consequence: every iBGP speaker must peer directly with every other iBGP speaker so all of them hear each route firsthand — a full mesh.
That does not scale. For n routers you need n(n-1)/2 sessions. With 10 routers that is 45 sessions; with 100 routers it is 4,950. Each router also burns CPU and memory holding all those peers — like everyone in a 100-person team having to phone everyone else individually instead of using a group chat.
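The arithmetic is easy to verify with a few lines of Python; the function name is illustrative.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions needed so that every one of n routers
    peers with every other router: n(n-1)/2."""
    if n < 0:
        raise ValueError("router count cannot be negative")
    return n * (n - 1) // 2


for routers in (10, 100):
    print(routers, "routers ->", full_mesh_sessions(routers), "sessions")
```

Each router also terminates n - 1 of those sessions itself, which is where the per-router CPU and memory cost comes from.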
Interview tip: Quote the formula and a concrete number (100 routers gives 4,950 sessions), then segue to Route Reflectors and Confederations as the fix.
[L2] 13. What are the two standard solutions to the iBGP full-mesh problem, and at a high level how do they each break the requirement?
The two standard solutions are Route Reflectors (RR) and BGP Confederations.
Route Reflectors (RFC 4456): You designate one or a few routers as reflectors. The split-horizon rule is relaxed for the RR only — it is allowed to reflect iBGP routes between its clients. Clients peer only with the RR (or RRs), not with each other, so the mesh collapses into a hub-and-spoke. Loop safety is restored with the ORIGINATOR_ID and CLUSTER_LIST attributes.
Confederations (RFC 5065): You split the one large AS into smaller sub-ASes. Inside each sub-AS you still run iBGP (full mesh or an RR), but between sub-ASes you run a special intra-confederation eBGP. Because that behaves like eBGP, the full-mesh requirement no longer spans the whole AS — only each smaller sub-AS.
Interview tip: One sentence each — RR relaxes the re-advertise rule; confederations carve the AS into eBGP-speaking sub-ASes. Note that in practice Route Reflectors are far more common than confederations.
[L3] 14. Explain how a Route Reflector works: what are clients vs non-clients, and what are the RR re-advertisement rules for routes learned from each?
A Route Reflector (RR) is an iBGP router allowed to break the normal rule that an iBGP-learned route is not re-advertised to another iBGP peer. Its iBGP peers are split into two groups:
- Clients — peers explicitly configured as route-reflector-client. The RR plus its clients form a cluster.
- Non-clients — ordinary iBGP peers (other RRs or routers) not marked as clients. The RR must stay full-mesh with non-clients.
Reflection rules, by where the best route was learned:
- From a client — reflect to all other clients and non-clients (and advertise to eBGP peers).
- From a non-client — reflect to clients only (not to other non-clients, since they hear it directly via the non-client mesh).
- From an eBGP peer — send to all clients and non-clients.
The RR is like a librarian: clients just ask it, and it decides who else gets a copy.
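The three reflection rules reduce to a small lookup. The sketch below models only which iBGP peer groups receive the route (advertisement to eBGP peers is left out), and the string labels are assumptions chosen for readability.

```python
def reflect_to(source: str) -> set:
    """Return the iBGP peer groups a route reflector sends a best route to,
    given where that route was learned: 'client', 'non-client' or 'ebgp'."""
    if source == "client":
        return {"clients", "non-clients"}
    if source == "non-client":
        # Other non-clients already hear it through the non-client full mesh.
        return {"clients"}
    if source == "ebgp":
        return {"clients", "non-clients"}
    raise ValueError(f"unknown source: {source!r}")


for learned_from in ("client", "non-client", "ebgp"):
    print(learned_from, "->", sorted(reflect_to(learned_from)))
```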
Interview tip: Memorize client-learned goes to everyone; non-client-learned goes to clients only.
[L3] 15. How do ORIGINATOR_ID and CLUSTER_LIST prevent routing loops in a route-reflector design, and what is the purpose of the cluster-id?
Because RRs re-advertise iBGP routes, you lose the natural split-horizon loop protection, so two optional non-transitive attributes restore it:
- ORIGINATOR_ID — the RR stamps the route with the Router-ID of the iBGP router that first injected it into the AS. If a route comes back to that originator, it sees its own Router-ID and silently discards it, breaking the loop.
- CLUSTER_LIST — each RR prepends its cluster-id as the route passes through. If an RR receives a route whose CLUSTER_LIST already contains its own cluster-id, it knows the route already looped through its cluster and drops it. It is like a passport stamped at each cluster — a repeat stamp means you have been here, so reject it.
The cluster-id identifies a cluster (an RR plus its clients). By default it is the RR's Router-ID, but you set a shared cluster-id on redundant RRs serving the same clients so their reflections are recognized as the same cluster and deduplicated.
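Both checks can be sketched with routes held as dictionaries. The function names and keys below are hypothetical, and the model ignores everything about a route except the two loop-prevention attributes.

```python
def reflect(route, rr_cluster_id, learned_from_router_id):
    """Return the copy of a route that a reflector would send on."""
    out = dict(route)
    # ORIGINATOR_ID is stamped once, by the first reflector, and then kept.
    out.setdefault("originator_id", learned_from_router_id)
    # Each reflector prepends its own cluster-id.
    out["cluster_list"] = [rr_cluster_id] + list(route.get("cluster_list", []))
    return out


def accept(route, my_router_id, my_cluster_id=None):
    """Apply the two loop checks on receipt.

    my_cluster_id is only meaningful on a route reflector; leave it as None
    on an ordinary client.
    """
    if route.get("originator_id") == my_router_id:
        return False  # our own route came back to us
    if my_cluster_id is not None and my_cluster_id in route.get("cluster_list", []):
        return False  # route already passed through our cluster
    return True


r = reflect({"prefix": "203.0.113.0/24"}, "10.0.0.1", "10.0.0.11")
print(accept(r, my_router_id="10.0.0.11"))                           # originator
print(accept(r, my_router_id="10.0.0.2", my_cluster_id="10.0.0.1"))  # same cluster
print(accept(r, my_router_id="10.0.0.3", my_cluster_id="10.0.0.3"))  # other cluster
```

The second call models a redundant reflector configured with the same cluster-id as the first, which is the deduplication case described above.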
Interview tip: ORIGINATOR_ID is the per-originator loop check; CLUSTER_LIST is the per-cluster loop check.
[L3] 16. Compare Route Reflectors versus BGP Confederations: what is a sub-AS, how does AS_CONFED_SEQUENCE work, and when would you pick one over the other?
Both solve the iBGP full-mesh problem, but differently.
Confederations split one public AS into several private sub-ASes (for example 65001 and 65002, drawn from a private ASN range such as 64512 to 65534, or 4-byte private ASNs). Inside a sub-AS you run iBGP; between sub-ASes you run intra-confederation eBGP. As a route crosses sub-AS boundaries, BGP records the sub-ASes in the AS_CONFED_SEQUENCE — a special AS_PATH segment used only inside the confederation for loop detection. When the route exits to the real internet, that segment is stripped, so outsiders see only the single public AS number.
- Route Reflectors: easiest to add to an existing network — just configure clients, with minimal redesign. The most common choice in real ISPs.
- Confederations: better when you want distinct policy or administrative domains (for example after a merger) and finer control, but they are more complex to design and troubleshoot.
Think RR as one big office with a central mailroom, versus a confederation as separate departments that courier mail between them.
Interview tip: Say Route Reflectors for simplicity and incremental deployment; confederations for policy separation, and note they can be combined (run an RR inside a sub-AS).
[L2] 17. How do prefix-lists, AS-path access-lists (regex), and route-maps differ as BGP filtering tools, and when would you reach for each?
These are three layers of BGP policy, from simple to powerful:
- Prefix-lists match on the destination prefix and its length, for example ip prefix-list IN permit 10.0.0.0/8 le 24. Use them to permit or deny specific networks, or to block bogons. Fast and exact — the go-to for which networks.
- AS-path access-lists match the AS_PATH using regular expressions; for example, ip as-path access-list 1 permit _65001$ matches routes originated by AS 65001. Use them to filter by who the route came from or passed through — for example accept only customer-originated prefixes.
- Route-maps are the Swiss-army knife: ordered if-then policy that can match on prefix-lists, AS-paths, communities, and more, and then set attributes (local-preference, MED, communities, weight). Use a route-map when you must combine matching with changing attributes, or apply traffic-engineering policy.
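To see why _65001$ means "originated by AS 65001", it helps to emulate the pattern with Python's re module. The translation below is an approximation under an assumption: the Cisco underscore is treated as matching the start of the string, the end of the string, or a delimiter such as a space, comma, brace or parenthesis. The helper names are invented for this sketch.

```python
import re

# Approximate meaning of "_" in a Cisco AS-path regular expression.
UNDERSCORE = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern: str) -> str:
    """Translate a Cisco-style AS-path regex into a Python regex."""
    return pattern.replace("_", UNDERSCORE)


def as_path_matches(pattern: str, as_path: str) -> bool:
    """as_path is the space-separated AS_PATH, nearest AS first."""
    return re.search(cisco_to_python(pattern), as_path) is not None


print(as_path_matches("_65001$", "65010 65001"))   # 65001 is the origin AS
print(as_path_matches("_65001$", "65001 65010"))   # 65001 is only a transit AS
print(as_path_matches("_65001$", "65010 165001"))  # a different ASN ending in 65001
```

The origin AS is the rightmost entry in the AS_PATH, which is why the pattern anchors on the end of the string.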
Interview tip: Prefix-list answers which network; AS-path ACL answers which path or origin; route-map matches and modifies (and can call the other two as match clauses).
[L3] 18. What are BGP communities? Explain the behavior of NO_EXPORT and NO_ADVERTISE, and give a real use case such as a customer-tagged blackhole, plus why large communities (RFC 8092) became necessary with 4-byte ASNs.
BGP communities are optional transitive tags attached to a prefix — a 32-bit value usually written as ASN:value (for example 65001:100) — used to group routes and signal policy between routers and ASes without listing prefixes individually. Think of them as colored stickers that say treat me this way.
Two well-known communities control propagation:
- NO_EXPORT (65535:65281) — the route may circulate inside the AS (and across confederation sub-ASes) but is not advertised to true external eBGP peers.
- NO_ADVERTISE (65535:65282) — the route is not advertised to any peer at all, internal or external; it stays on the receiving router.
Use case — RTBH (Remotely Triggered Black Hole): A customer under DDoS tags the victim /32 with a provider-defined blackhole community (for example 65000:666). The provider's inbound route-map matches it and sets the next-hop to a discard route, dropping attack traffic at the edge — automated, with no phone call needed.
Why large communities (RFC 8092): Classic communities pack a 16-bit ASN plus a 16-bit value. With 4-byte (32-bit) ASNs, the 4-byte ASN no longer fits the 16-bit field, so you could not encode do action X for AS 4200000000. Large communities use three 32-bit fields (GlobalAdmin:LocalData1:LocalData2, conventionally ASN:function:parameter), so 4-byte ASNs can express policy cleanly.
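The 16-bit limit is easy to see by packing and unpacking a classic community by hand. This is a minimal sketch with illustrative function names; the two well-known values are the standard ones quoted above.

```python
NO_EXPORT = 0xFFFFFF01     # 65535:65281
NO_ADVERTISE = 0xFFFFFF02  # 65535:65282


def encode_community(asn: int, value: int) -> int:
    """Pack ASN:value into the 32-bit classic community format."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("classic communities only hold two 16-bit fields")
    return (asn << 16) | value


def decode_community(raw: int):
    """Split a 32-bit community into its (ASN, value) halves."""
    return raw >> 16, raw & 0xFFFF


print(decode_community(NO_EXPORT))
print(decode_community(encode_community(65000, 666)))

try:
    encode_community(4200000000, 1)  # a 4-byte ASN does not fit
except ValueError as err:
    print("rejected:", err)
```

The rejected call is the situation large communities address: with three 32-bit fields, the 4-byte ASN fits in the first field without truncation.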
Interview tip: NO_EXPORT means do not leave the AS; NO_ADVERTISE means do not tell anyone; then cite RTBH plus the 4-byte-ASN reason for large communities.
eBGP, iBGP & Neighbor FSM (9)
[L1] 19. What is the difference between eBGP and iBGP in terms of where peers sit and how the AS_PATH is handled across each?
eBGP is BGP between two different Autonomous Systems (different remote-as) — think two separate companies shaking hands at a border. iBGP is BGP between routers inside the same AS (same remote-as) — coworkers in one company.
- AS_PATH on eBGP: every time a route crosses an AS boundary, the sending router prepends its own ASN. This growing list is how BGP detects loops and measures path length.
- AS_PATH on iBGP: the ASN is not prepended when a route is passed between iBGP peers, because the route has not left the AS yet.
So AS_PATH grows only at external hops, accurately counting how many ASes a route traversed.
Interview tip: Say it crisply — "eBGP = different ASN, prepends AS_PATH; iBGP = same ASN, does not prepend."
[L1] 20. What are the administrative distances of eBGP and iBGP routes on Cisco (20 vs 200), and why is eBGP preferred over iBGP by default?
On Cisco, eBGP has an administrative distance (AD) of 20 and iBGP has an AD of 200. AD is a router's trust ranking — lower wins when two protocols offer the same prefix.
- eBGP's low 20 beats OSPF (110), IS-IS (115), and RIP (120) — so externally learned Internet routes win.
- iBGP's high 200 deliberately loses to almost every IGP, so your internal IGP keeps control of internal paths.
Why prefer eBGP? An eBGP route comes directly from a neighboring AS — fresher and closer to the source. An iBGP route was relayed across your own AS, so it is effectively second-hand. Think of eBGP as news straight from the source, iBGP as news passed along by a colleague.
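The "lower AD wins" rule can be sketched as a lookup over the Cisco defaults. The protocol values for eBGP, iBGP, OSPF, IS-IS and RIP are the ones quoted above; connected (0) and static (1) are added here as the usual Cisco defaults, and the dictionary keys are just labels chosen for this example.

```python
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "ibgp": 200,
}


def rib_winner(sources):
    """Given the protocols offering the same prefix, return the one whose
    route is installed in the routing table (lowest administrative distance)."""
    return min(sources, key=ADMIN_DISTANCE.__getitem__)


print(rib_winner(["ospf", "ebgp"]))  # external route beats the IGP
print(rib_winner(["ospf", "ibgp"]))  # IGP beats the internally relayed route
```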
Interview tip: Memorize 20 / 200 — a very common one-liner question.
[L1] 21. List the six BGP finite-state-machine states in order, from session start to data exchange (Idle, Connect, Active, OpenSent, OpenConfirm, Established).
The BGP neighbor FSM moves through six states, in order:
- Idle — start; router refuses inbound connections and waits to initiate.
- Connect — TCP three-way handshake to port 179 is in progress.
- Active — TCP failed; router actively retries the connection.
- OpenSent — TCP is up; our OPEN message (ASN, hold-time, capabilities) is sent, awaiting theirs.
- OpenConfirm — both OPENs exchanged and accepted; waiting for the first KEEPALIVE.
- Established — session fully up; UPDATE messages (prefixes) now flow.
Think of it like a phone call: dial (Connect), redial if busy (Active), exchange names (OpenSent/OpenConfirm), then talk (Established).
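The six states can be rehearsed with a deliberately simplified transition table. This sketch follows the summary above rather than the full RFC 4271 state machine: the event names are invented, timers are ignored, and any event not listed sends the session back to Idle.

```python
from enum import Enum


class State(Enum):
    IDLE = 1
    CONNECT = 2
    ACTIVE = 3
    OPENSENT = 4
    OPENCONFIRM = 5
    ESTABLISHED = 6


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_ok"): State.OPENSENT,
    (State.CONNECT, "tcp_fail"): State.ACTIVE,
    (State.ACTIVE, "tcp_ok"): State.OPENSENT,
    (State.OPENSENT, "open_received"): State.OPENCONFIRM,
    (State.OPENCONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
}


def step(state, event):
    """Advance one event; anything unexpected resets the session to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state=State.IDLE):
    for event in events:
        state = step(state, event)
    return state


print(run(["start", "tcp_ok", "open_received", "keepalive_received"]))
print(run(["start", "tcp_fail"]))
```

A neighbor that keeps seeing tcp_fail bounces between Active and Idle in this model, which mirrors the flapping symptom operators see when TCP/179 is unreachable.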
Interview tip: A neighbor flapping between Active and Idle almost always means a TCP/179 reachability problem, not a BGP-policy problem.
[L2] 22. Explain the iBGP split-horizon rule: why does a router NOT re-advertise an iBGP-learned route to another iBGP peer, and what problem does this create?
The iBGP split-horizon rule says: a route learned from one iBGP peer must not be re-advertised to another iBGP peer. Why? Inside an AS the AS_PATH is not prepended, so BGP loses its normal loop-detection tool. Blocking iBGP-to-iBGP re-advertisement is the safety net that stops routes looping internally.
The catch: every iBGP speaker must hear each prefix directly from a router that learned it externally. With n routers that means a full mesh of n(n-1)/2 sessions — 10 routers = 45 sessions. That scales poorly, and there are two standard fixes:
- Route Reflectors (RR) — a designated router is allowed to reflect routes between clients.
- Confederations — split the AS into sub-ASes that use eBGP-like rules between them.
Picture a rule that no one repeats office gossip — accurate, but everyone must hear news straight from HR.
Interview tip: Name the fix immediately: route reflectors or confederations.
[L2] 23. A neighbor is stuck in the Active state. Walk me through the likely causes and how you would troubleshoot it.
Active means BGP tried the TCP connection to port 179 and it failed, so it is retrying. It is almost always a Layer-3/Layer-4 problem, not a BGP-policy problem.
Likely causes and checks:
- No IP reachability — ping the neighbor; if peering on loopbacks, confirm the IGP advertises them and check update-source.
- Wrong neighbor IP — verify the address in neighbor x.x.x.x remote-as matches the peer's real source.
- ASN mismatch — wrong remote-as; compare both sides.
- TCP/179 blocked — an ACL or firewall dropping port 179. Test reachability and review ACLs.
- Source-address mismatch — peer expects your loopback but sees your interface IP; fix update-source.
- eBGP multihop missing — loopback or multi-hop eBGP peering needs ebgp-multihop.
Use show ip bgp neighbors, show ip bgp summary, and debug ip bgp to confirm.
Interview tip: Lead with "Active = TCP failed" — that framing alone shows seniority.
[L2] 24. By default eBGP peers expect a directly connected neighbor (TTL=1). What is ebgp-multihop, when do you need it, and how does GTSM/ttl-security differ from it as a security control?
By default, eBGP sets the IP TTL=1 on its packets, assuming the peer is one hop away (directly connected). If the neighbor is more than one hop — for example you peer between loopbacks or across an intermediate device — the TTL hits zero in transit and the session never forms.
ebgp-multihop n raises the outgoing TTL so packets survive up to n hops. Use it for loopback eBGP peering, multihomed setups, or a device between the two routers.
GTSM (ttl-security hops n, RFC 5082) looks similar but has the opposite intent — it is a security control. Instead of setting TTL low, it sends TTL=255 and requires inbound packets to arrive with TTL of at least (255 minus n). A spoofing attacker many hops away cannot forge a high enough TTL, so off-path attacks are dropped. Like checking a courier came from next door, not across the city.
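The difference between the two TTL behaviours can be sketched numerically. The function names are illustrative, and the model assumes each transit router decrements the TTL by one while the receiving router does not.

```python
def ttl_on_arrival(sent_ttl: int, routers_crossed: int) -> int:
    """TTL left when the packet reaches the peer (0 means it died in transit)."""
    return max(sent_ttl - routers_crossed, 0)


def gtsm_accepts(received_ttl: int, max_hops: int) -> bool:
    """GTSM check: the sender used TTL 255, so a legitimate peer within
    max_hops must arrive with at least 255 - max_hops."""
    return received_ttl >= 255 - max_hops


# Default eBGP (TTL=1) through one intermediate router: never arrives.
print(ttl_on_arrival(1, 1))

# GTSM with a one-hop budget: a directly connected peer passes the check.
print(gtsm_accepts(ttl_on_arrival(255, 0), max_hops=1))

# An attacker ten routers away cannot arrive with a high enough TTL.
print(gtsm_accepts(ttl_on_arrival(255, 10), max_hops=1))
```

The design point is that a remote sender can set any starting TTL it likes, but it cannot stop transit routers from decrementing it, so a high arrival TTL is evidence of proximity.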
Interview tip: Do not enable both — ttl-security and ebgp-multihop are mutually exclusive on a neighbor.
[L2] 25. What is AS-path loop prevention in eBGP, and how is loop prevention handled differently inside iBGP where the AS_PATH is not prepended?
eBGP loop prevention uses the AS_PATH. Each AS prepends its own ASN when advertising a route externally. If a router receives a route whose AS_PATH already contains its own ASN, it knows the route looped back home, so it rejects it. Simple and reliable — like a parcel refusing a return-to-sender address.
Inside iBGP, the ASN is not prepended, so AS_PATH cannot catch internal loops. Two mechanisms cover this:
- iBGP split-horizon — a route learned from one iBGP peer is never re-advertised to another iBGP peer (hence the full-mesh requirement).
- Route reflectors add the ORIGINATOR_ID and CLUSTER_LIST attributes. A reflector drops a route whose CLUSTER_LIST already holds its own cluster-ID, and a router ignores a route carrying its own ORIGINATOR_ID — internal loop prevention without touching AS_PATH.
Interview tip: Pair the two cleanly — "AS_PATH for eBGP; split-horizon plus ORIGINATOR_ID/CLUSTER_LIST for iBGP."
[L2] 26. Why is it common to peer iBGP sessions using loopback addresses with update-source, and what dependency does that introduce on the IGP?
A loopback interface is virtual — it never goes down as long as the router is alive, even if a physical link fails. Peering iBGP between loopbacks means the TCP session survives any single-link outage as long as some path still reaches that loopback. This is valuable in a meshed core with multiple equal paths.
- Because BGP normally sources packets from the outgoing physical interface, you must tell it to use the loopback with neighbor x.x.x.x update-source Loopback0, so the TCP source IP matches what the peer expects.
The dependency: the loopback IPs are not directly connected, so BGP cannot reach them on its own. Your IGP (OSPF/IS-IS) must advertise every loopback so each router can route to its peers' loopbacks. No IGP route to the loopback = no TCP session = no iBGP. Think of the loopback as a permanent office extension and the IGP as the internal phone directory that connects calls.
Interview tip: Always state "iBGP rides on top of the IGP" — that layered dependency is the point.
[L3] 27. Two routers reach OpenSent but the session never establishes and flaps back to Idle/Active. The peering uses loopbacks across a link with a smaller MTU. What MTU/MSS issue could cause this and how do you confirm it?
Reaching OpenSent proves the TCP handshake and small OPEN messages got through. Flapping back from there points to large packets failing — a classic MTU/PMTUD black hole. BGP's first UPDATE (or a big OPEN with many capabilities) fills a full-size segment; that large packet hits the small-MTU link, cannot fragment because the DF bit is set, and the ICMP "fragmentation needed" message gets filtered. The big packet silently drops, TCP retransmits, the hold-timer expires, and the session resets.
How to confirm:
- Sweep ping with DF set: ping x.x.x.x df-bit size 1500 — large sizes fail, small sizes succeed, which points to an MTU mismatch.
- Check interface MTU on both ends and any transit link with show interface.
- Inspect the negotiated TCP MSS in show ip bgp neighbors.
Fixes: align MTU end-to-end, or lower the BGP TCP MSS (for example with ip tcp mss) so segments fit. It is like mailing a parcel too big for one slot in the chain — it never arrives, and no one tells you.
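The size arithmetic behind the black hole can be checked with a short sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options, and the function names are made up for this example.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def survives(tcp_payload: int, path_mtu: int, df_bit: bool = True) -> bool:
    """True if the packet crosses a link with the given MTU.

    With DF set an oversized packet is dropped; without DF it could be
    fragmented, so it still gets through.
    """
    packet_size = tcp_payload + IPV4_HEADER + TCP_HEADER
    return packet_size <= path_mtu or not df_bit


# MSS negotiated from the 1500-byte interfaces, but one transit link is 1400.
full_segment = mss_for_mtu(1500)
print(full_segment, survives(full_segment, path_mtu=1400))

# Lowering the MSS to match the smallest link lets full segments through.
print(mss_for_mtu(1400), survives(mss_for_mtu(1400), path_mtu=1400))
```

Small packets such as the TCP handshake fit under either MTU, which is why the session gets started before the first full-size segment is lost.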
Interview tip: The phrase "PMTUD black hole — small packets pass, big ones do not" instantly signals senior-level troubleshooting.
Troubleshooting & Real Scenarios (10)
[L1] 28. A BGP neighbor relationship will not come up at all and shows Idle. What are the first things you check (TCP/179 reachability, correct neighbor IP and remote-as, ACLs)?
Idle means BGP has not even started the connection — it is the very first state. Work bottom-up through the layers:
- Layer-3 reachability — ping the neighbor IP. No reply means fix routing or the interface first.
- Correct neighbor IP — confirm neighbor x.x.x.x remote-as uses the peer's real source address (loopback vs interface).
- Correct remote-as — an ASN typo means the OPEN is rejected.
- TCP/179 open — make sure no ACL or firewall blocks port 179 in either direction.
- Neighbor configured both sides — BGP needs both ends to define each other; one side missing keeps it Idle.
- Admin state — check the neighbor is not manually shut with neighbor x.x.x.x shutdown.
Verify with show ip bgp summary and show ip bgp neighbors.
Interview tip: Always troubleshoot bottom-up — physical, IP, TCP, then BGP config.
[L2] 29. You see a prefix in 'show ip bgp' but it is not installed in the routing table. List the common reasons and how you would confirm each.
Appearing in show ip bgp only means BGP knows the prefix — it still has to win best-path and pass installation checks. Common reasons it is not in the RIB:
- Not the best path — another path won; only the best goes to the RIB. Check for the > marker in show ip bgp x.x.x.x.
- Next-hop unreachable — the NEXT_HOP has no IGP route, so the path is invalid. Confirm with show ip route for that next-hop address.
- Better AD elsewhere — the same prefix from OSPF or a static route (lower AD) wins the RIB. Check show ip route x.x.x.x.
- Synchronization (legacy) — old IOS required an IGP match before installing iBGP routes; disabled by default on modern code.
- RIB failure / overlap — shown as r RIB-Failure in the BGP table.
Like having an address in your contacts (BGP table) but no road to drive there (no valid next-hop).
Interview tip: The number-one real-world cause is next-hop unreachable — mention it first.
L230. An iBGP router learns a prefix from a route reflector but the route is marked inaccessible. How do you diagnose a next-hop-unreachable problem end to end?
The clue is the BGP next-hop rule: an eBGP-learned route keeps the original eBGP peer's next-hop even after a route reflector passes it around internally (RRs do not change next-hop by default). If your iBGP routers cannot reach that external next-hop, the path is invalid and marked inaccessible.
Diagnose end to end:
- show ip bgp x.x.x.x: read the NEXT_HOP and note the "(inaccessible)" flag.
- show ip route for that next-hop: confirm there is no route to it (that is the root cause).
- Check whether the IGP advertises the eBGP link subnet, or whether it should.
Two clean fixes:
- next-hop-self on the ASBR's iBGP sessions: it rewrites the next-hop to its own loopback (already in the IGP). Cleanest and most common.
- Advertise the external peering subnet into the IGP.
Like a treasure map pointing to a road that does not exist on your map — fix the map.
Interview tip: Say next-hop-self — it is the answer interviewers wait for.
L331. A customer complains that one of your prefixes is reachable from some parts of the Internet but not others. How do you approach this using looking glasses and AS-path inspection?
Partial reachability means the prefix is propagating to some regions of the Internet but not others — usually a routing or policy issue, not a dead server. Approach it from the outside in.
- Looking glasses and route servers — query public looking glasses (RIPE RIS, your upstreams', major IXPs) from different regions. Where the prefix is missing or has a worse AS_PATH, you have localized the blind spot.
- AS-path inspection — compare the AS_PATH seen at working vs failing vantage points. A truncated, prepended, or unexpectedly long path reveals where propagation stops or who is de-preferring you.
- Check your own announcements — confirm you advertise the prefix to all upstreams; a missing announcement or an outbound filter on one transit causes regional loss.
- RPKI / ROA validity — an invalid or missing ROA gets you dropped by RPKI-validating networks only, so reachability looks partial. In 2026, RPKI Route Origin Validation is widely deployed by tier-1s and large content networks, so this is a common cause.
- De-aggregation / filters — a more-specific being filtered, or a peer's prefix-list or max-length cap, can block some regions.
Like a shop findable from one highway but not another because a sign is missing on that route.
Interview tip: Mention RPKI validity — partial reachability from "only some networks" is a classic ROA-invalid symptom in 2026.
L232. A prefix learned via eBGP is in your BGP table but is NOT being advertised to your iBGP peers. What rule explains this and how do you fix it?
First, check whether the path is actually best and valid — BGP only advertises its best path, and a best path needs a reachable next-hop. If the eBGP route's next-hop is unreachable internally, the path is invalid, so it is not best and is never sent onward. That is the most common real cause.
The relevant rule once it is valid: BGP only advertises its best path for a prefix, and an eBGP-learned best path is normally re-advertised to iBGP peers (the split-horizon rule blocks iBGP-learned routes to other iBGP peers, not eBGP-learned ones).
So check, in order:
- show ip bgp x.x.x.x: is it marked best (>)? Is the next-hop accessible?
- Any outbound route-map, prefix-list, or distribute-list on the iBGP neighbor filtering it out?
- Is next-hop-self needed so peers can use it?
Fix: make the next-hop reachable (often with next-hop-self) and remove the offending outbound filter.
Interview tip: Distinguish clearly — split-horizon blocks iBGP-learned routes; an eBGP-learned route should propagate unless it is not best or is filtered.
L233. A neighbor is flapping repeatedly. How do you identify the flapping session, decide between soft-reset and hard-reset, and what does clear ip bgp * soft actually do?
Identify it: watch show ip bgp summary for an Up/Down time that keeps resetting, scan show logging for repeated %BGP-5-ADJCHANGE messages, and check show ip bgp neighbors for reset reasons (hold-timer expired, notification received). Root causes are usually flapping links, MTU issues, or hold-timer expiry — fix the underlying transport, not just the symptom.
Soft vs hard reset:
- Hard reset (clear ip bgp x.x.x.x) tears down the TCP session and rebuilds it: disruptive, drops all prefixes, only for genuinely stuck sessions.
- Soft reset re-applies policy without dropping the session: non-disruptive.
What clear ip bgp * soft does: it refreshes routing policy on all neighbors without resetting them. Outbound, it re-runs your outbound policy and re-advertises. Inbound, if the peer supports the route-refresh capability, it asks the peer to resend its full table so your new inbound policy applies; otherwise it uses a locally stored soft-reconfiguration inbound copy.
It is like re-checking everyone's ID at the door without making them leave the building.
Interview tip: Stress that an inbound soft-reset needs the route-refresh capability (or pre-configured soft-reconfiguration) — that detail earns points.
L234. After applying an inbound route-map change, the routes do not update. How do you re-apply policy without tearing down the session, and what capability must the peer support?
Inbound policy is applied to routes as they arrive. After you change an inbound route-map, the routes already in your table were filtered under the old policy, so nothing changes until the peer sends them again. You do not want to bounce the session — a hard reset drops all prefixes and disrupts traffic.
The clean way: trigger an inbound soft reset:
clear ip bgp x.x.x.x soft in (or the newer clear ip bgp x.x.x.x in).
This asks the peer to resend its full set of prefixes so your new inbound policy re-evaluates them — no session teardown.
Capability required: the peer must support the BGP Route Refresh capability (negotiated in the OPEN message; check show ip bgp neighbors). If it does not, you need neighbor x.x.x.x soft-reconfiguration inbound configured ahead of time, which keeps a local unmodified copy of received routes at a memory cost.
Think of it as asking the supplier to resend the catalog so you can re-sort it under new rules — instead of closing the shop.
Interview tip: Name it exactly — "Route Refresh capability" — and note modern routers negotiate it automatically.
L335. Traffic is leaving your dual-homed AS over the wrong upstream even though both are up. Walk through how you would diagnose and correct outbound path selection.
Outbound path = which exit your AS chooses, governed by the BGP best-path algorithm. Walk it from the top:
- Find what is actually chosen: run show ip bgp x.x.x.x for a few destination prefixes and note the best (>) path and which neighbor it points to.
- Walk the best-path order: Weight (Cisco-local), then LOCAL_PREF, then locally originated, then shortest AS_PATH, then lowest origin, then lowest MED, then eBGP over iBGP, then lowest IGP cost to next-hop, then the oldest-route / lowest router-ID tiebreakers.
- Spot the cause: usually one upstream is winning on AS_PATH length or default LOCAL_PREF.
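The decision order can be modelled as a sort key, which makes it easy to see why a LOCAL_PREF change overrides a shorter AS_PATH. This is a simplified sketch, not a vendor implementation: the Path class and its field names are hypothetical, every path is assumed to have a reachable next-hop, MED is compared across all paths (real routers compare it only between paths from the same neighboring AS by default), and the oldest-route tiebreaker is omitted.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP < EGP < incomplete


@dataclass
class Path:
    neighbor: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "i"
    med: int = 0
    ebgp: bool = True
    igp_cost: int = 0
    router_id: str = "0.0.0.0"


def preference_key(p: Path):
    """Smaller tuple = more preferred; fields follow the best-path order."""
    return (
        -p.weight,                      # highest weight
        -p.local_pref,                  # highest LOCAL_PREF
        not p.locally_originated,       # locally originated first
        len(p.as_path),                 # shortest AS_PATH
        ORIGIN_RANK[p.origin],          # lowest origin code
        p.med,                          # lowest MED (simplified comparison)
        not p.ebgp,                     # eBGP over iBGP
        p.igp_cost,                     # lowest IGP cost to next-hop
        int(IPv4Address(p.router_id)),  # lowest router-ID
    )


def best_path(paths):
    return min(paths, key=preference_key)


isp_a = Path("ISP-A", as_path=(64500, 64510, 64520), router_id="192.0.2.1")
isp_b = Path("ISP-B", as_path=(64501, 64520), router_id="192.0.2.2")
print(best_path([isp_a, isp_b]).neighbor)  # ISP-B: shorter AS_PATH
isp_a.local_pref = 200
print(best_path([isp_a, isp_b]).neighbor)  # ISP-A: LOCAL_PREF is checked earlier
```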
Correct it — outbound is controlled locally, the cleanest lever being LOCAL_PREF:
- Apply an inbound route-map on the preferred upstream setting a higher local-preference (default 100, e.g. raise to 200); it is propagated AS-wide via iBGP.
- Soft-reset inbound to apply the change, then confirm with show ip bgp that the best path flipped.
Like choosing which highway your delivery vans take out of the city — you set the company rule.
Interview tip: LOCAL_PREF controls OUTBOUND traffic; do not confuse it with MED / AS-prepend, which influence INBOUND.
L336. Design and troubleshoot a dual-homed multihoming setup with redundancy and traffic engineering — tie together LOCAL_PREF, MED, AS-path prepending, communities, and convergence so that one link is primary inbound and outbound with clean failover.
Goal: ISP-A primary for both directions, ISP-B hot standby, automatic failover. Remember the rule — LOCAL_PREF controls outbound (you choose your exit); AS-prepend, MED, and communities influence inbound (you hint to others how to reach you).
- Outbound primary: an inbound route-map on ISP-A sets a higher local-preference (e.g. 200 vs 100), propagated via iBGP, so all internal traffic exits ISP-A and falls to ISP-B automatically if A drops.
- Inbound primary: on the ISP-B advertisement, AS-path prepend your own ASN two or three times so the world sees a longer path via B and prefers A. MED only sways a single neighbor that has multiple links to you, so prepending is more reliable across the wider Internet.
- Communities: tag routes with your upstreams' documented BGP communities to ask them to set local-pref or prepend on their side, for finer inbound control.
- Convergence: use BFD plus tuned BGP timers for fast failure detection, and next-hop-self with a clean IGP so the standby path is ready instantly.
Like a main road and a backup road: you drive the main out (LOCAL_PREF), and you put up signs telling visitors the main road in is shorter (prepend).
Interview tip: The crisp summary line — "LOCAL_PREF for outbound, AS-prepend/communities for inbound, BFD for fast failover" — nails this question.
L237. Your edge router suddenly receives far more prefixes than usual and CPU spikes. How would you confirm a route leak from a peer, contain it immediately, and prevent recurrence?
A sudden prefix flood with a CPU spike usually means a peer leaked routes — re-advertising prefixes it should not, often a full table or its own customer/transit cone, into your edge.
Confirm:
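- show ip bgp summary: see which neighbor's received-prefix count jumped.
- show ip bgp neighbors x.x.x.x received-routes: inspect what they are sending. On Cisco IOS this view needs soft-reconfiguration inbound; without it, show ip bgp neighbors x.x.x.x routes lists what was accepted after inbound policy. Look for unexpected AS_PATHs (for example, your own prefixes coming back, or a customer leaking the full DFZ).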
Contain immediately:
- Apply a maximum-prefix limit (neighbor x.x.x.x maximum-prefix with the expected count) to auto-shut the session if it overflows.
- Or hard shutdown the offending neighbor, or apply a tight inbound prefix-list now.
Prevent recurrence:
- Strict inbound prefix-lists and AS-path filters per peer (accept only what they should send).
- RPKI ROV to drop invalids, with maximum-prefix always set.
- Deploy RFC 9234 BGP Roles / Only-To-Customer (OTC) to automatically detect and prevent route leaks; pair with peer-locking and ASPA validation where supported.
Like a delivery dock suddenly buried in parcels meant for someone else — cap intake, then fix who is allowed to ship to you.
Interview tip: Lead with maximum-prefix for containment and mention RPKI + RFC 9234 BGP Roles for prevention — that is the modern 2026 answer.
BGP Fundamentals & Transport (9)
L138. What is BGP and why is it called a path-vector protocol? How does it differ from an IGP like OSPF?
BGP (Border Gateway Protocol) is the routing protocol that glues the Internet together. It exchanges reachability between Autonomous Systems (separately administered networks such as ISPs, large enterprises, and cloud providers). It is the only EGP (exterior gateway protocol) in real use today.
It is called a path-vector protocol because every route carries the full list of ASes it has traversed in the AS_PATH attribute, a 'vector' of AS hops. Think of it like a parcel collecting a stamp from every country it passes through. This path is also how BGP detects loops: an AS that sees its own number in the AS_PATH drops the route.
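The loop check described here fits in a few lines. This is a minimal sketch; the function names and the documentation-range ASNs (64500 to 64502) are illustrative only.

```python
def advertise_to_ebgp(as_path, local_asn):
    """Prepend our own ASN when passing a route to an eBGP neighbor."""
    return [local_asn, *as_path]


def accept_route(as_path, local_asn):
    """Reject the route if our own ASN is already in the AS_PATH (a loop)."""
    return local_asn not in as_path


# AS 64500 originates a prefix; AS 64501 passes it on.
path = advertise_to_ebgp(advertise_to_ebgp([], 64500), 64501)
print(path)                       # [64501, 64500]
print(accept_route(path, 64502))  # True: AS 64502 is not in the path
print(accept_route(path, 64500))  # False: the route has looped back to its origin
```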
Versus an IGP like OSPF: OSPF is link-state, builds a full topology map, and picks paths by cost/metric for fast convergence inside one AS. BGP cares about policy (who you peer with, business relationships) and scales to the full Internet table (roughly 1,000,000 IPv4 prefixes in 2026), not raw speed.
Interview tip: Say 'IGP = reachability inside an AS by metric; BGP = policy-based reachability between ASes via AS_PATH.'
L139. What transport protocol and port does BGP use, and why was TCP chosen instead of BGP running directly over IP like OSPF or EIGRP?
BGP runs over TCP on port 179. The peer that initiates the connection uses a random high source port; the destination port is 179.
BGP chose TCP because it gets reliability for free: guaranteed in-order delivery, acknowledgements, retransmission, flow control, and sequencing. A full Internet table is roughly 1,000,000 prefixes, and losing or reordering updates mid-transfer would corrupt the routing table. By riding on TCP, BGP does not implement its own reliability, so the protocol itself stays simpler.
By contrast, OSPF runs directly on IP protocol 89 and EIGRP on IP protocol 88; they build their own reliability (OSPF acknowledges LSAs, EIGRP uses RTP). They can afford this because IGP neighbors are usually one hop away on a shared segment. BGP peers are often multiple hops apart, so leaning on TCP's session handling makes far more sense.
Interview tip: Remember the trio: BGP = TCP/179, OSPF = IP/89, EIGRP = IP/88.
L140. What is an Autonomous System (AS) and an ASN? Explain the difference between a 2-byte and a 4-byte ASN.
An Autonomous System (AS) is a network (or group of networks) under a single administrative and routing policy, for example one ISP, a large enterprise, or a cloud provider. It is identified globally by an ASN (Autonomous System Number), allocated by IANA through the RIRs (ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC). Think of an ASN like a country code for routing on the Internet.
2-byte ASN: the original 16-bit format, range 0 to 65535 (about 64K numbers). This pool was effectively exhausted, which forced the move to 4-byte.
4-byte ASN (RFC 6793): 32-bit, range 0 to 4294967295, billions of numbers. Often written in asdot notation such as 65000.1 or in plain decimal such as 4259840001. (asdot is essentially deprecated in favor of asplain in modern operations, but you will still see it.)
For backward compatibility, a router that only understands 2-byte ASNs sees a 4-byte ASN as the reserved placeholder AS23456 (AS_TRANS), and the real value travels in the AS4_PATH attribute (with AS4_AGGREGATOR carrying the aggregator's real ASN).
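The two notations are the same 32-bit number written differently: asdot is high-16-bits.low-16-bits. A small sketch of the conversion and of the AS_TRANS substitution follows; the function names are hypothetical.

```python
AS_TRANS = 23456  # reserved placeholder shown to 2-byte-only speakers


def asdot_to_asplain(text: str) -> int:
    """Convert '65000.1' (asdot) to its plain decimal value."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each asdot half must fit in 16 bits")
    return high * 65536 + low


def asplain_to_asdot(asn: int) -> str:
    """Convert a plain ASN to asdot; 2-byte values stay as plain numbers."""
    if asn <= 0xFFFF:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


def asn_seen_by_2byte_speaker(asn: int) -> int:
    """A 2-byte-only router sees AS_TRANS in place of any 4-byte ASN."""
    return asn if asn <= 0xFFFF else AS_TRANS


print(asdot_to_asplain("65000.1"))            # 4259840001
print(asplain_to_asdot(4259840001))           # 65000.1
print(asn_seen_by_2byte_speaker(4259840001))  # 23456
```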
Interview tip: If asked 'why 4-byte?', answer simply: 'the 2-byte space (64K) was exhausted.'
L141. Name the five BGP message types and state the purpose of each (OPEN, UPDATE, KEEPALIVE, NOTIFICATION, ROUTE-REFRESH).
BGP has five message types, all sharing a common 19-byte header:
- OPEN is the first message after the TCP handshake. It negotiates BGP version, ASN, hold time, BGP Identifier (router-ID), and capabilities (4-byte ASN, MP-BGP, route-refresh, graceful restart). Both peers must agree to proceed.
- UPDATE is the workhorse. It advertises new or changed prefixes (NLRI) with their path attributes, and lists withdrawn routes being removed. This is how the table is actually built.
- KEEPALIVE is a tiny header-only message (19 bytes, no payload) sent periodically to keep the session alive and stop the hold timer from expiring.
- NOTIFICATION is an error message; sending it immediately tears down the session (for example hold-timer expired, malformed attribute). It always carries an error code and subcode.
- ROUTE-REFRESH asks a peer to re-send its full advertisements so you can re-apply changed inbound policy without bouncing the session.
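To make the 19-byte header concrete, the sketch below builds and parses a KEEPALIVE with Python's struct module. The layout (16-byte all-ones marker, 2-byte length, 1-byte type) and the type codes follow the BGP-4 specification; the helper names are hypothetical.

```python
import struct

MESSAGE_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}

# Network byte order: 16-byte marker, 2-byte total length, 1-byte type.
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def build_keepalive() -> bytes:
    """A KEEPALIVE is the header alone, so its length field equals 19."""
    return HEADER.pack(MARKER, HEADER.size, 4)


def parse_header(data: bytes):
    """Return (message type name, total length) from the first 19 bytes."""
    marker, length, msg_type = HEADER.unpack(data[:HEADER.size])
    if marker != MARKER:
        raise ValueError("marker must be all ones")
    return MESSAGE_TYPES[msg_type], length


message = build_keepalive()
print(len(message), parse_header(message))
```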
Interview tip: Memorize as O-U-K-N-R: 'Open negotiates, Update carries routes, Keepalive sustains, Notification kills, Route-Refresh re-asks.'
L242. What information is exchanged in a BGP OPEN message, and what must match between two peers for the session to proceed past OpenSent?
After the TCP handshake, each peer sends an OPEN message carrying: BGP version (4), My Autonomous System (the sender's ASN), the proposed Hold Time, the BGP Identifier (a 32-bit router-ID, formatted like an IPv4 address), and an Optional Parameters field holding capabilities (4-byte ASN support, MP-BGP AFI/SAFI, route-refresh, graceful restart, ADD-PATH).
To move from OpenSent to OpenConfirm, these must agree:
- ASN must match what the local router expects in its neighbor statement (remote-as).
- BGP version: both must speak version 4.
- Hold time must be 0 or at least 3 seconds; the lower of the two proposed values is then used by both peers.
- BGP Identifier must be unique; a peer cannot share your router-ID.
A mismatch triggers a NOTIFICATION and the session resets. Capabilities that one side lacks are simply not used; if a capability is treated as required and is missing, the session resets.
Interview tip: If a session is stuck in OpenSent, suspect an ASN mismatch (a wrong remote-as); if it is stuck in Active, suspect TCP/179 reachability first.
L143. What is the difference between public and private ASN ranges, and what happens to a private ASN when routes are advertised toward the public Internet (remove-private-as)?
Like IP addresses, ASNs split into public and private:
- Public ASNs are globally unique, assigned by the RIRs, and used to peer on the real Internet.
- Private ASNs are reusable internally (like RFC 1918 IPs): 2-byte 64512 to 65534 and 4-byte 4200000000 to 4294967294. They are for internal or customer use and must never appear in the global table.
The problem: if a private ASN leaks into the AS_PATH on the Internet, multiple unrelated networks could be reusing the same number, breaking loop detection and routing. So when an ISP advertises a customer's routes upstream, it applies remove-private-as on the eBGP session toward the Internet. This strips private ASNs from the AS_PATH before advertising, leaving only public ASNs.
Caveat: the default behaviour is limited and platform-dependent. On classic Cisco IOS, for example, the plain command acts only when the AS_PATH consists entirely of private ASNs; if public and private ASNs are mixed in the path, you may need remove-private-as all (or a replace-as variant), depending on platform.
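A quick way to audit an AS_PATH for private ASNs, using the two ranges quoted above. The helper names are hypothetical, and the sketch only classifies ASNs; it does not model any vendor's remove-private-as behaviour.

```python
# Private-use ranges: 64512-65534 (2-byte) and 4200000000-4294967294 (4-byte).
PRIVATE_RANGES = (
    range(64512, 65534 + 1),
    range(4200000000, 4294967294 + 1),
)


def is_private_asn(asn: int) -> bool:
    return any(asn in r for r in PRIVATE_RANGES)


def private_asns_in(as_path):
    """Return the private ASNs found in a path, in order of appearance."""
    return [asn for asn in as_path if is_private_asn(asn)]


print(private_asns_in([64500, 65001, 4200000123]))  # [65001, 4200000123]
```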
Interview tip: Analogy: a private ASN is like a phone extension; you must replace it with a public number before dialing out.
L244. How does the BGP KEEPALIVE mechanism relate to the hold timer, and what is the default keepalive/hold-time ratio on Cisco?
BGP uses two timers to detect a dead peer. The Hold Timer is the maximum time a router will wait without hearing anything (a KEEPALIVE or an UPDATE) before declaring the neighbor dead, sending a NOTIFICATION, and tearing the session down. The KEEPALIVE interval is how often the router sends those small heartbeat messages to reset the peer's hold timer.
The relationship: the hold timer is reset every time any BGP message arrives. As long as you receive a KEEPALIVE (or an UPDATE) before the hold timer expires, the session stays up. Because an UPDATE also resets the timer, a busy session may send fewer pure KEEPALIVEs.
On Cisco IOS the defaults are keepalive = 60 seconds, hold time = 180 seconds, a 1:3 ratio. During OPEN the two peers negotiate down to the lower proposed hold time, and the effective keepalive becomes one-third of that negotiated hold time.
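The negotiation reduces to "take the lower hold time, then divide by three". The sketch below is a simplification under that assumption (it ignores any separately configured keepalive value), and the function name is hypothetical.

```python
def negotiate_timers(local_hold: int, peer_hold: int):
    """Return (hold_time, keepalive) in seconds for the session.

    A hold time of 0 disables keepalives; 1 and 2 are not allowed.
    """
    for value in (local_hold, peer_hold):
        if value < 0 or value in (1, 2):
            raise ValueError("hold time must be 0 or at least 3 seconds")
    hold = min(local_hold, peer_hold)
    keepalive = hold // 3  # one-third of the negotiated hold time
    return hold, keepalive


print(negotiate_timers(180, 180))  # (180, 60): the Cisco defaults
print(negotiate_timers(180, 90))   # (90, 30): the lower proposal wins
print(negotiate_timers(9, 180))    # (9, 3): aggressive tuning
```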
Interview tip: Tuning timers down (for example 3/9) speeds failure detection, but BFD is the modern, far faster way to do it.
L245. What is the role of ROUTE-REFRESH (RFC 2918) and how does it improve on the older soft-reconfiguration-inbound approach to re-applying inbound policy?
When you change an inbound policy (route-map, prefix-list, filter), the routes you already received from a peer were filtered at arrival, so you need the original, unfiltered advertisements again to re-evaluate them. Old BGP could only do a hard reset (bounce the session), which is disruptive because it drops all routes and forces a full re-convergence.
Soft-reconfiguration-inbound was the first fix: the router stores a local copy of every route received before filtering. On a policy change you run clear ip bgp x.x.x.x soft in and it re-applies policy from that local store, with no session bounce. The downside is heavy memory cost, because it duplicates the entire pre-policy Adj-RIB-In for that peer.
ROUTE-REFRESH (RFC 2918) is the modern, capability-negotiated replacement. Instead of caching locally, the router simply asks the peer to re-advertise everything. There is no extra memory cost and no session reset. It is enabled by default on modern code.
Interview tip: One line: 'Route-refresh re-asks the peer; soft-reconfig caches locally and wastes RAM.'
L246. Why does BGP need an explicit 'network' statement or redistribution to originate a prefix, unlike OSPF which advertises interfaces directly? What must be true for a 'network' statement to take effect?
OSPF is built to advertise the networks of interfaces it runs on; you enable OSPF on an interface and its subnet floods automatically. BGP is different by design: it is a policy protocol for choosing which of potentially huge numbers of prefixes you want the world to see. So BGP never auto-originates; you must explicitly tell it what to advertise, via a network statement or redistribution.
Crucially, the BGP network x.x.x.x mask y.y.y.y statement does not mean 'advertise this connected interface.' It means: 'IF an exactly matching route (same prefix and mask) already exists in the routing table (RIB), then inject it into BGP.' So for it to take effect:
- The route must already be present in the routing table (from a connected route, a static route, or an IGP).
- The prefix and mask must match exactly: network 10.0.0.0 mask 255.255.255.0 will not advertise a /24 if the table only has the covering /16.
A common fix is an ip route 10.0.0.0 255.255.255.0 Null0 pull-up route so the exact prefix exists in the RIB.
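The exact-match rule behaves like a set lookup rather than a longest-prefix match. In this sketch a Python set of networks stands in for the RIB, and the function name is hypothetical.

```python
from ipaddress import ip_network


def network_statement_takes_effect(rib, prefix: str, mask: str) -> bool:
    """True only if the RIB holds a route with exactly this prefix and mask."""
    wanted = ip_network(f"{prefix}/{mask}")
    return wanted in rib


rib = {ip_network("10.0.0.0/16")}  # only the covering /16 is present
print(network_statement_takes_effect(rib, "10.0.0.0", "255.255.255.0"))  # False

rib.add(ip_network("10.0.0.0/24"))  # e.g. the static pull-up route to Null0
print(network_statement_takes_effect(rib, "10.0.0.0", "255.255.255.0"))  # True
```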
Interview tip: Stress 'exact match in the RIB'; that single phrase wins the question.
Convergence, MP-BGP & BGP Security (10)
L347. What is the modern fast-convergence stack for BGP, and how do BFD, BGP PIC, and ADD-PATH each contribute to sub-second failover instead of 'just tuning the timers'?
Lowering hold timers helps detection but caps out around a few seconds and adds CPU churn. The modern stack solves detection, switchover, and path availability as three separate problems:
- BFD (Bidirectional Forwarding Detection) solves fast detection. It is a lightweight hello protocol (sub-50ms timers) that detects a link or path failure in milliseconds and signals BGP instantly, far faster than BGP keepalives. It is the trigger.
- BGP PIC (Prefix Independent Convergence) solves fast switchover. The FIB pre-installs a backup next-hop, so on failure the dataplane swaps to it in a single operation regardless of how many prefixes use that next-hop. Without PIC, the router must re-walk thousands of prefixes; with PIC, convergence time is independent of table size (PIC Core handles the IGP next-hop, PIC Edge handles the backup BGP path).
- ADD-PATH (RFC 7911) solves path availability. Normally BGP advertises only its single best path, so a backup may not even be known to a router. ADD-PATH lets a peer advertise multiple paths for a prefix, so a backup is already in the RIB, ready for PIC to use.
Analogy: BFD is the smoke detector, ADD-PATH keeps a spare key ready, and PIC is the pre-rehearsed evacuation that is instant no matter how many people are in the building.
Interview tip: Frame it as detect (BFD) + switch (PIC) + have-a-backup (ADD-PATH).
L248. What is route dampening? Define penalty, suppress-limit, reuse-limit, and half-life, and explain why aggressive dampening fell out of favor on the Internet.
Route dampening (flap damping) suppresses a prefix that keeps going up and down ('flapping') so the instability does not ripple across the Internet and burn CPU everywhere.
The mechanics:
- Penalty: points added each time a route flaps (a withdrawal typically adds about 1000). The penalty accumulates with each flap and then decays over time.
- Suppress-limit: when the penalty climbs above this threshold, the route is suppressed (not used or advertised). Cisco default 2000.
- Reuse-limit: as the penalty decays below this value, the route is un-suppressed and usable again. Cisco default 750.
- Half-life: how long it takes the penalty to decay to half its value while the route is stable. Cisco default 15 minutes.
- Max-suppress-time caps the total suppression time. Cisco default 60 minutes.
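The decay is exponential, so the numbers above can be checked with a short calculation. This sketch plugs in the Cisco defaults quoted in the list; the function names are hypothetical and per-platform rounding is ignored.

```python
import math

PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15
MAX_SUPPRESS_MIN = 60


def decayed(penalty: float, minutes: float, half_life: float = HALF_LIFE_MIN) -> float:
    """Penalty remaining after the route has been stable for `minutes`."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty: float, reuse: float = REUSE_LIMIT,
                        half_life: float = HALF_LIFE_MIN) -> float:
    """Stable time needed for the penalty to fall to the reuse-limit."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)


# Penalty that would take exactly max-suppress-time to decay to the reuse-limit.
penalty_ceiling = REUSE_LIMIT * 2 ** (MAX_SUPPRESS_MIN / HALF_LIFE_MIN)

print(decayed(PENALTY_PER_FLAP, 15))                  # 500.0 after one half-life
print(round(minutes_until_reuse(SUPPRESS_LIMIT), 1))  # about 21.2 minutes
print(penalty_ceiling)                                # 12000.0
```

Even a route sitting right at the suppress-limit therefore stays unusable for roughly 21 minutes of perfect stability, which helps explain the over-punishment concern.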
Why it fell out of favor: RIPE (RIPE-378, then the revised RIPE-580) found that with the old default values a single real flap could suppress a perfectly healthy prefix for up to an hour, because penalties accumulate along the path and over-punish. It hurt availability more than it helped. Modern guidance is to disable it or use the far more lenient RIPE-580 values.
Interview tip: Mention RIPE-580; it signals you know current operational practice.
L249. What does maximum-paths enable, and what conditions must equal-cost eBGP (and separately iBGP) paths meet to be installed as multipath?
By default BGP installs only one best path per prefix. The maximum-paths N command enables BGP multipath, load-sharing traffic across multiple equal paths to the same destination, improving bandwidth use and resilience.
For paths to be eligible as multipath, BGP first runs the best-path algorithm; the candidate paths must then match the chosen best path on the early decision attributes. The key matching conditions:
- Same weight and same local-preference.
- Same AS_PATH length (and, by default, the same AS_PATH content, unless bgp bestpath as-path multipath-relax is configured to allow equal-length-but-different paths).
- Same origin code and same MED.
For eBGP multipath: paths must be learned from the same neighboring AS (unless multipath-relax is enabled). For iBGP multipath (maximum-paths ibgp N): the paths must additionally have the same IGP metric to the BGP next-hop, because internal paths are compared by how far the next-hop is in the IGP.
Interview tip: The phrase 'as-path multipath-relax' is gold for multi-homed or DC fabric questions.
L350. Explain MP-BGP and the role of AFI/SAFI. How does the same BGP session carry IPv4, IPv6, VPNv4, and L2VPN information?
Original BGP only carried IPv4 unicast. MP-BGP (Multiprotocol BGP, RFC 4760) extends it so a single BGP session can carry routing for many address families. It does this with two new path attributes, MP_REACH_NLRI (reachable prefixes) and MP_UNREACH_NLRI (withdrawals), which wrap address-family-tagged prefixes inside ordinary UPDATE messages.
The tagging uses two fields:
- AFI (Address Family Identifier), the protocol family: 1 = IPv4, 2 = IPv6, 25 = L2VPN.
- SAFI (Subsequent AFI), the sub-type: 1 = unicast, 2 = multicast, 128 = MPLS VPN (VPNv4/VPNv6), 70 = EVPN.
So 'IPv4 unicast' = AFI 1 / SAFI 1, 'IPv6 unicast' = AFI 2 / SAFI 1, 'VPNv4' = AFI 1 / SAFI 128, and 'EVPN' = AFI 25 / SAFI 70. Peers negotiate which AFI/SAFI pairs they support as a capability in the OPEN message, then exchange all of those families over the same TCP/179 session.
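The AFI/SAFI pairs also show up on the wire in the Multiprotocol Extensions capability sent in OPEN: capability code 1, length 4, then a 2-byte AFI, a reserved byte, and a 1-byte SAFI. The sketch below encodes that; the dictionary keys and function name are hypothetical labels.

```python
import struct

# (AFI, SAFI) pairs from the text above.
ADDRESS_FAMILIES = {
    "ipv4-unicast": (1, 1),
    "ipv6-unicast": (2, 1),
    "vpnv4": (1, 128),
    "evpn": (25, 70),
}

MP_CAPABILITY_CODE = 1  # Multiprotocol Extensions capability


def mp_capability(family: str) -> bytes:
    """Encode one capability: code, length, AFI (2 bytes), reserved, SAFI."""
    afi, safi = ADDRESS_FAMILIES[family]
    return struct.pack("!BBHBB", MP_CAPABILITY_CODE, 4, afi, 0, safi)


for name in ADDRESS_FAMILIES:
    print(f"{name:13s} {mp_capability(name).hex(' ')}")
```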
Analogy: one delivery truck (the BGP session) carries differently-labeled parcels (AFI/SAFI), each sorted to the right warehouse (address-family table).
Interview tip: Be ready to recite AFI 1 = IPv4, 2 = IPv6, 25 = L2VPN and SAFI 1 = unicast, 128 = MPLS-VPN, 70 = EVPN.
L351. In an MPLS L3VPN, what is the difference between a Route Distinguisher (RD) and a Route Target (RT), and which one actually controls VRF import/export?
In MPLS L3VPN, multiple customers can use the same overlapping IP ranges (everyone uses 10.0.0.0/8). Two BGP constructs keep them straight, and they do different jobs:
- Route Distinguisher (RD) is a 64-bit value prepended to the IPv4 prefix to form a globally unique 96-bit VPNv4 route (RD:prefix). Its only job is uniqueness: it lets two customers' identical 10.1.1.0/24 routes coexist in the same BGP table without colliding. The RD does not decide who receives the route.
- Route Target (RT) is an extended community attached to the route that controls VRF import/export policy. A PE exports routes tagged with one or more RTs, and imports routes whose RT matches its import list. This is what actually builds the VPN topology (full-mesh, hub-and-spoke, extranet).
So the answer to 'which controls import/export?' is the Route Target. The RD is purely for uniqueness; the RT is the membership and policy knob.
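The split of duties can be shown with bytes and sets. This sketch assumes a Type-0 RD (2-byte type, 2-byte ASN, 4-byte assigned number) and represents Route Targets as plain strings; all names and values are illustrative.

```python
import struct
from ipaddress import IPv4Network


def rd_type0(asn: int, number: int) -> bytes:
    """8-byte (64-bit) Route Distinguisher: type 0, 2-byte ASN, 4-byte number."""
    return struct.pack("!HHI", 0, asn, number)


def vpnv4_key(rd: bytes, prefix: str):
    """RD + IPv4 address = 12 bytes (96 bits), paired with the prefix length."""
    net = IPv4Network(prefix)
    return rd + net.network_address.packed, net.prefixlen


def vrf_imports(import_rts, route_rts) -> bool:
    """A VRF imports a route when any attached RT is in its import list."""
    return bool(set(import_rts) & set(route_rts))


cust_a = vpnv4_key(rd_type0(64500, 1), "10.1.1.0/24")
cust_b = vpnv4_key(rd_type0(64500, 2), "10.1.1.0/24")
print(len(cust_a[0]) * 8)  # 96 bits
print(cust_a != cust_b)    # True: same prefix, distinct VPNv4 routes
print(vrf_imports({"64500:100"}, {"64500:100"}))  # True: RT matches
print(vrf_imports({"64500:200"}, {"64500:100"}))  # False: the RD plays no part
```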
Analogy: the RD is the unique parcel barcode; the RT is the mailing label that decides which mailboxes it is delivered to.
Interview tip: The classic trap is candidates saying 'RD controls VPN membership.' It does not; the RT does.
L352. Describe EVPN as a BGP control plane. What are the five route types, and when would you use Type-2 (MAC/IP) versus Type-5 (IP-prefix)?
EVPN (Ethernet VPN, RFC 7432, with IP-prefix Type-5 defined in RFC 9136) uses MP-BGP (AFI 25 / SAFI 70) as a control plane to distribute MAC and IP reachability, replacing the old 'flood-and-learn' behaviour of traditional L2VPN. It underpins modern data-center VXLAN fabrics: MAC and IP learning happens in BGP, not by flooding.
The five common route types:
- Type-1: Ethernet Auto-Discovery (per-ES and per-EVI), used for multihoming, fast withdrawal, and aliasing.
- Type-2: MAC/IP Advertisement, which advertises a specific host's MAC (and optionally its IP), enabling host reachability and ARP/ND suppression.
- Type-3: Inclusive Multicast Ethernet Tag, which builds the BUM (broadcast, unknown-unicast, and multicast) flooding tree per VNI.
- Type-4: Ethernet Segment route, used for multihoming and designated-forwarder (DF) election.
- Type-5: IP Prefix route, which advertises an IP subnet or prefix rather than a single host, for routing between subnets or to external networks.
Type-2 vs Type-5: use Type-2 for host-level reachability inside a stretched L2 domain (a specific host MAC plus /32 IP). Use Type-5 for prefix or subnet-level routing where you do not need per-host detail, for example advertising a summarized subnet, reaching silent hosts, or connecting to an external/legacy network.
Interview tip: One-liner: 'Type-2 = host MAC/IP; Type-5 = IP prefix/subnet.'
L353. What is BGP hijacking? Distinguish an origin hijack from a path-manipulation hijack, and explain why origin validation alone cannot stop the latter.
BGP hijacking is when an AS advertises prefixes it is not authorized to originate (or a more-attractive path to them), so traffic is misrouted to it, for interception, blackholing, or spam. Because BGP trusts what it is told, a malicious or compromised advertisement can divert global traffic.
- Origin hijack: the attacker AS originates a prefix it does not own (it claims to be the source). Example: announcing someone else's /24, or a more-specific /25 to win on longest-prefix-match. The forged route shows the wrong originating AS.
- Path-manipulation hijack: the attacker leaves the legitimate origin AS intact at the end of the AS_PATH but inserts or forges intermediate hops (or fakes a peering relationship) so traffic flows through it. The origin still looks correct.
Origin validation (RPKI/ROV) only checks that the origin AS (and prefix length) match the authorization for that prefix. A path-manipulation attack keeps the real origin at the end of the path, so ROV sees it as Valid and lets it through. ROV cannot verify that the intermediate AS hops are real; that requires path validation (ASPA, or the heavier BGPsec).
Interview tip: Crisp line: 'ROV validates the origin AS, not the path; path forgery needs ASPA or BGPsec.'
L354. Explain RPKI Route Origin Validation: what do the valid, invalid, and not-found states mean, and why do operators still accept not-found routes today (roughly 40% of the table)?
RPKI Route Origin Validation (ROV) uses signed ROAs (Route Origin Authorizations), cryptographic records in the RPKI that state 'AS X is authorized to originate prefix P up to a maximum length L.' A router checks each received route's origin AS and prefix length against its validated ROA cache and assigns one of three states:
- Valid: a covering ROA exists and the route's origin AS and prefix length match it. Trustworthy; prefer it.
- Invalid: a covering ROA exists for the prefix, but either the origin AS does not match or the announced prefix is longer than the ROA's maxLength allows (a likely hijack or misconfiguration). Best practice: drop/reject.
- NotFound (also called Unknown): no covering ROA exists for that prefix, so RPKI cannot say anything either way.
Operators still accept NotFound because ROA coverage is incomplete. RPKI adoption is past the halfway mark (ROA coverage crossed roughly 50% of the global table in 2025 and is still climbing in 2026), but a large share of routes, on the order of 40%, still has no ROA. Dropping NotFound today would cut off reachability to a big chunk of the Internet. So the pragmatic policy is: reject Invalid, accept NotFound, prefer Valid. As ROA coverage grows, the NotFound share keeps shrinking.
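The three states and the resulting policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a router implementation: the ROAS list stands in for a validated ROA cache, and the prefixes, ASNs, and function names are hypothetical.

```python
from ipaddress import ip_network

# Hypothetical validated ROA cache: (prefix, maxLength, authorized origin ASN).
ROAS = [
    (ip_network("192.0.2.0/24"), 24, 64500),
]

def rov_state(prefix, origin_asn, roas=ROAS):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        # A ROA covers the route if the route equals or sits inside the ROA prefix.
        if route.version != roa_prefix.version or not route.subnet_of(roa_prefix):
            continue
        covered = True
        if origin_asn == roa_asn and route.prefixlen <= max_len:
            return "valid"
    # Covered but no ROA matched -> invalid; no covering ROA at all -> not-found.
    return "invalid" if covered else "not-found"

def policy(state):
    """The pragmatic policy: reject Invalid, accept NotFound, prefer Valid."""
    return {"valid": "accept and prefer", "not-found": "accept", "invalid": "reject"}[state]

print(rov_state("192.0.2.0/24", 64500))    # valid
print(rov_state("192.0.2.0/24", 64511))    # invalid (wrong origin AS)
print(rov_state("192.0.2.0/25", 64500))    # invalid (longer than maxLength)
print(rov_state("203.0.113.0/24", 64500))  # not-found (no covering ROA)
```

Note that a route is only Invalid when some ROA covers it and none matches; a prefix with no covering ROA falls through to NotFound, which is the state the policy keeps accepting.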
Interview tip: Say the policy in one breath: 'drop Invalid, keep NotFound for now, prefer Valid.'
L355. Beyond RPKI/ROV, what additional BGP security controls would you deploy — max-prefix limits, prefix filtering, TCP-AO/MD5, GTSM — and where does ASPA fit for path validation?
RPKI/ROV only validates the origin, so a layered, defense-in-depth approach is needed:
- Prefix filtering: explicitly permit only the prefixes a peer or customer is allowed to send (built from IRR and RPKI data). This blocks leaks and unauthorized announcements and is a MANRS baseline.
- Max-prefix limits: cap how many prefixes a peer can send; if the cap is exceeded, warn or tear down the session. This stops accidental full-table leaks from a misconfigured customer.
- AS_PATH filtering: restrict acceptable AS_PATH patterns (for example, a route learned from a customer should not contain a tier-1 ASN in its AS_PATH, because a customer does not provide transit for a tier-1).
- TCP-AO (RFC 5925) / MD5 (TCP-MD5, RFC 2385): authenticate the BGP TCP session itself so an attacker cannot inject into or reset it. TCP-AO is the modern, stronger replacement for the legacy MD5 option.
- GTSM (RFC 5082, the TTL-security mechanism): for directly-connected eBGP, require TTL = 255 on arrival; spoofed multi-hop packets arrive with a lower TTL and are dropped. This defends against off-path TCP attacks.
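The GTSM idea from the last bullet can be sketched in Python. This is a toy model, assuming every forwarding router decrements the TTL by one and the receiver insists on exactly 255; the function names are hypothetical.

```python
MAX_TTL = 255  # highest value the 8-bit IP TTL field can carry

def arrival_ttl(sent_ttl, routers_crossed):
    """Each router that forwards the packet decrements the TTL by one."""
    return max(sent_ttl - routers_crossed, 0)

def gtsm_accept(ttl_on_arrival):
    """Directly-connected eBGP: only accept packets that still carry TTL 255."""
    return ttl_on_arrival == MAX_TTL

# A directly-connected peer sends with TTL 255 and crosses no routers.
print(gtsm_accept(arrival_ttl(255, 0)))  # True

# An off-path attacker three routers away cannot start above 255,
# so the packet arrives with 252 and is dropped.
print(gtsm_accept(arrival_ttl(255, 3)))  # False
```

The defense works because the sender cannot set a TTL higher than 255, so no remote host can make a packet arrive looking as if it came from the directly attached link.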
ASPA (Autonomous System Provider Authorization) fits at the path-validation layer that ROV cannot cover. Signed ASPA records declare each AS's legitimate upstream providers, so a router can detect path forgeries and route leaks (a path that violates the valley-free provider/customer relationship). ASPA is still an IETF draft in the SIDROPS working group rather than a finalized RFC, but it has moved into production: RIPE NCC and ARIN added ASPA support to their RPKI services in late 2025 and early 2026, with other RIRs following. It complements ROV: ROV checks the origin, ASPA checks that the path is plausible.
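To make the "path is plausible" check concrete, here is a heavily simplified Python sketch. It is not the verification algorithm from the ASPA draft: it assumes the route was received from a customer (so every hop should be a customer-to-provider step), ignores the downstream case and AS_SETs, and uses a hypothetical ASPA table and ASNs.

```python
# Hypothetical ASPA data: customer ASN -> set of its attested provider ASNs.
ASPA = {
    64500: {64496},
    64496: {64497},
}

def verify_upstream(as_path, aspa=ASPA):
    """Return 'valid', 'unknown', or 'invalid' for a customer-learned path."""
    result = "valid"
    # In each adjacent pair, the right-hand AS is nearer the origin,
    # so the left-hand AS is claiming to be its provider.
    for provider, customer in zip(as_path, as_path[1:]):
        attested = aspa.get(customer)
        if attested is None:
            result = "unknown"   # customer published no ASPA; cannot judge this hop
        elif provider not in attested:
            return "invalid"     # customer says this AS is not one of its providers
    return result

print(verify_upstream([64497, 64496, 64500]))  # valid
print(verify_upstream([64511, 64500]))         # invalid (64511 is not a provider of 64500)
print(verify_upstream([64497, 64499]))         # unknown (64499 has no ASPA record)
```

The second path keeps the legitimate origin 64500, so an origin-only check would pass it; the provider attestation is what exposes the forged adjacency.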
Interview tip: Group it as origin (ROV) + path (ASPA) + session (TCP-AO/GTSM) + sanity (max-prefix/filters).
L256. What is a maximum-prefix limit on a peer, what happens when it is exceeded, and why is it a basic but critical defense against accidental full-table leaks?
A maximum-prefix limit (in Cisco IOS-style syntax, neighbor x.x.x.x maximum-prefix N) sets a ceiling on how many prefixes you will accept from a specific BGP peer. You size it a bit above what that peer should legitimately send; a customer that owns a few /24s might get a limit of, say, 100, not 1,000,000.
When the received count exceeds the limit, the default behavior is to send a NOTIFICATION and tear the session down. The session then typically stays down until an operator clears it, unless a restart timer is configured to bring it back automatically after a delay. You can soften this with a warning-only mode, or set a threshold percentage (for example, log a warning at 75% before hitting the hard cap).
Why it is critical: the classic disaster is a small customer or stub AS that accidentally re-advertises the full Internet table (roughly 1,000,000 prefixes) back to you, or leaks it upstream (a route leak, often via a misconfigured 'route optimizer'). Without a cap, your router tries to install all of it: memory and FIB are exhausted, CPU spikes, and the leak propagates onward. Several large outages happened exactly this way. A max-prefix limit contains the blast radius automatically by dropping the offending session before it floods you.
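The limit logic described above can be modeled in a few lines of Python. This is a sketch of the decision only, assuming a 75% warning threshold and a warning-only switch as in the text; the function and parameter names are hypothetical and do not correspond to any vendor API.

```python
def check_max_prefix(received, limit, threshold_pct=75, warning_only=False):
    """Return the action a router would take for this peer's prefix count."""
    if received > limit:
        # Hard cap exceeded: tear the session down unless warning-only is set.
        return "warn" if warning_only else "teardown"
    if received * 100 >= limit * threshold_pct:
        # Approaching the cap: log a warning but keep the session up.
        return "warn"
    return "ok"

# A customer with a few /24s and a limit of 100.
print(check_max_prefix(12, 100))         # ok
print(check_max_prefix(75, 100))         # warn
print(check_max_prefix(1_000_000, 100))  # teardown (accidental full-table leak)
print(check_max_prefix(1_000_000, 100, warning_only=True))  # warn
```

The last call shows the trade-off of warning-only mode: the session survives, but so does the leak, which is why the hard teardown is the safer default for customer sessions.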
Interview tip: Call it 'the cheapest, highest-value BGP safety knob'; every eBGP session should have one.
20-minute drill: Pick one question from each section, set a 90-second timer, and answer out loud. If you can sketch the key BGP diagram from memory and land each 👉 Interview tip, you’re interview-ready.