Most engineers think "if BFD says the tunnel is up, the link is fine — and if I want voice on the good link, I just set the preferred colour." So they ship a one-line policy and assume voice is protected.
Wrong — and the helpdesk tickets will tell you. "Up" only proves the pipe exists; it says nothing about loss, latency or jitter. An MPLS circuit at 4% loss is up and useless for voice. Real AAR is an SLA class (the thresholds), a match (which traffic), a preferred colour (the first choice) AND a deliberate fall-through action (what to do when nothing meets the SLA — ECMP, a named backup colour, or strict drop). Skip any one of those and the policy quietly does the wrong thing.
① The problem AAR solves — "up" is not "good enough"
Here is the trap that catches every new SD-WAN engineer. Your branch in Pune has two transports — an MPLS circuit and a cheap biz-internet link. BFD says both tunnels are up. A user complains that Microsoft Teams audio is robotic. You check — the tunnel is up. So what is wrong? The MPLS circuit is dropping 4% of packets during a carrier brownout. It is alive but useless for voice.
That gap — between "the link works" and "the link is good enough for THIS application" — is exactly what App-Aware Routing closes. AAR doesn't just check liveness; it continuously measures loss, latency and jitter on every tunnel and forwards each app on a path that meets that app's SLA. Voice gets a clean path; a backup file copy can tolerate the lossy one.
The analogy every Indian engineer gets instantly: dual-SIM failover on your phone. Both SIMs show full signal bars (both "up"). But you start a UPI payment and SIM 1's data is crawling — packets are there, but slow and lossy. A smart phone would notice the payment is timing out and push it over SIM 2. AAR is that smart phone for your branch: it doesn't care that the bars are full, it cares whether this transaction is actually getting through cleanly. The second analogy: a toll expressway vs the service road running beside it. The service road is "open" (up), but it's potholed and slow. For an ambulance (voice) you want the smooth expressway even if it costs a toll; for a goods truck (a backup job) the bumpy free road is fine.
How does AAR know a tunnel is lossy? BFD does double duty. Besides detecting liveness, the BFD hello packets (default interval 1000 ms) carry timestamps, so the edge measures round-trip latency, jitter and packet loss on every tunnel. At each poll interval (default 10 minutes), the edge averages those numbers — by default over the last 6 poll intervals (the multiplier) — and re-evaluates which tunnels meet each SLA class. Those defaults are slow on purpose (stability over twitchiness), and they are the #1 thing you tune for voice.
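To see why those defaults are slow, here is a minimal Python sketch of the averaging. It is a simplified model, not Cisco's implementation: it assumes each poll interval yields one mean loss value, that the decision uses the plain mean of the last `multiplier` values, and that a tunnel fails the SLA once that mean exceeds the class's loss threshold. `minutes_to_detect` is a hypothetical helper.

```python
from collections import deque
from typing import Optional


def minutes_to_detect(healthy_loss: float, brownout_loss: float, sla_max_loss: float,
                      multiplier: int = 6, poll_interval_min: float = 10) -> Optional[float]:
    """Minutes from the start of a brownout until the windowed mean loss breaches the SLA.

    Simplified model (assumption): one mean-loss sample per poll interval, and the
    SLA decision uses the plain average of the last `multiplier` samples.
    Returns None if even a window full of brownout samples stays within the SLA.
    """
    # Before the brownout, the sliding window holds only healthy samples.
    window = deque([healthy_loss] * multiplier, maxlen=multiplier)
    for intervals in range(1, multiplier + 1):
        window.append(brownout_loss)            # one more poll interval of brownout
        if sum(window) / multiplier > sla_max_loss:
            return intervals * poll_interval_min
    return None


# Defaults (10-minute poll, multiplier 6): a 4% brownout against a 2% VOICE threshold.
print(minutes_to_detect(0.0, 4.0, 2.0))                                     # 40
# Aggressive tuning (hypothetical values): 1-minute poll, multiplier 1.
print(minutes_to_detect(0.0, 4.0, 2.0, multiplier=1, poll_interval_min=1))  # 1
```

Under these assumptions the defaults need four bad poll intervals (about 40 minutes) before the 60-minute average crosses 2%, which is why voice can sit on a browned-out tunnel long after users notice. Shrinking both knobs makes the reaction near-immediate, but also makes every short blip count.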
Four ideas that anchor the whole lesson
These are the vocabulary hooks you'll reuse in every later section.
- Alive is not usable: BFD liveness only proves the pipe exists. Loss, latency and jitter decide whether it's usable. AAR routes on usable, not on alive.
- SLA class: a named bundle of max loss, latency and jitter, e.g. VOICE = 2% loss / 150 ms / 30 ms. The yardstick a tunnel must pass.
- App-route policy: match the app (app list / DSCP / prefix), map it to an SLA class, set a preferred colour and a fall-through. The whole policy in one breath.
- The dual-SIM picture: AAR is your phone silently moving a stuck UPI payment to the other SIM; it watches the transaction, not the signal bars. So: route on quality.
Rahul at Infosys says: "Both my tunnels show BFD up, but voice is choppy on the MPLS path. AAR is configured. Why might voice still be stuck on the bad link for several minutes?"
Pause & Predict
Predict: if you crank the poll interval all the way down to catch brownouts instantly, what new problem are you likely to create?
② SLA classes + app-route policy — thresholds, match, colours
AAR is built from three pieces: an SLA class list (the thresholds), an app-route policy (match traffic → map to the class → set colours), and the action for when reality doesn't cooperate. You build all of it inside the centralized policy on the Controller, then push it. The vManage path to define the class is worth memorising: Configuration > Policies > Centralized Policy > Custom Options > Lists > SLA Class > New SLA Class List.
An SLA class is just three numbers with a name. A sensible VOICE class might be loss ≤ 2%, latency ≤ 150 ms, jitter ≤ 30 ms. A BUSINESS-DATA class is looser — loss ≤ 5%, latency ≤ 300 ms. A tunnel "meets" the class only if it satisfies all the thresholds you set (leave a value blank and it isn't checked). Cisco also ships predefined classes you'll recognise from QoS: realtime, full-mesh, low-loss-low-latency.
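The "meets every configured threshold, blanks are not checked" rule is easy to pin down in code. A minimal Python sketch, where `SlaClass`, `TunnelStats` and `meets` are hypothetical names (not a Cisco API), reusing the sample tunnel figures that appear later in this lesson:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaClass:
    # None means "not configured": that metric is not checked.
    name: str
    loss_pct: Optional[float] = None
    latency_ms: Optional[float] = None
    jitter_ms: Optional[float] = None


@dataclass(frozen=True)
class TunnelStats:
    color: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def meets(sla: SlaClass, t: TunnelStats) -> bool:
    """True only if the tunnel is within every threshold the class actually sets."""
    checks = [
        (sla.loss_pct, t.loss_pct),
        (sla.latency_ms, t.latency_ms),
        (sla.jitter_ms, t.jitter_ms),
    ]
    return all(limit is None or measured <= limit for limit, measured in checks)


VOICE = SlaClass("VOICE", loss_pct=2, latency_ms=150, jitter_ms=30)
BUSINESS_DATA = SlaClass("BUSINESS-DATA", loss_pct=5, latency_ms=300)  # jitter not checked

biz = TunnelStats("biz-internet", loss_pct=0.2, latency_ms=22, jitter_ms=4)
mpls = TunnelStats("mpls", loss_pct=4.1, latency_ms=19, jitter_ms=6)

for t in (biz, mpls):
    print(t.color, "VOICE:", meets(VOICE, t), "BUSINESS-DATA:", meets(BUSINESS_DATA, t))
# biz-internet passes both classes; mpls fails VOICE (4.1% > 2% loss) but passes BUSINESS-DATA.
```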
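The same VOICE class in controller CLI, plus an app-route policy that uses it:

```
policy
 sla-class VOICE_SLA
  latency 150
  loss 2
  jitter 30
 !
 app-route-policy STEER_VOICE
  vpn-list CORP_VPNS
   sequence 10
    match
     app-list MS_TEAMS
    !
    action
     sla-class VOICE_SLA preferred-color biz-internet
     backup-sla-preferred-color mpls
    !
   !
  !
 !
!
```

What this does: VOICE_SLA is defined as loss 2% / latency 150 ms / jitter 30 ms. Sequence 10 of STEER_VOICE matches app-list MS_TEAMS. Teams prefers biz-internet while biz-internet meets the SLA, moves to any other tunnel that meets it (here, mpls) if biz-internet doesn't, and is pinned to mpls as the backup colour only when no tunnel meets the SLA at all. The referenced lists (CORP_VPNS, MS_TEAMS) must already exist, and nothing happens until the policy is applied to sites, e.g. apply-policy site-list BRANCHES app-route-policy STEER_VOICE.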
Notice the match options in that sequence. You don't have to match by application name — you can match by DSCP (e.g. EF for voice), by source/destination prefix, by DSCP plus app — whatever identifies the traffic cleanly. App lists ride on NBAR2 deep-packet recognition, so "MS_TEAMS" really does mean Teams, not just a port number.
Now the part exam writers love: the four actions that decide what happens when tunnels do (or don't) meet the SLA. They form a fall-through ladder:
- sla-class alone: ECMP-forward across every tunnel that meets the class.
- preferred-color: try the named colour first if it meets the SLA, otherwise use any tunnel that does.
- backup-sla-preferred-color: a named colour used only when no tunnel meets the SLA.
- strict: if no tunnel meets the SLA, drop the packet.
The trap: by default (no strict), when nothing meets the SLA, traffic is ECMP'd across all tunnels anyway, degraded but delivered.
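The ladder reads naturally as a function. The sketch below is a simplified Python model, not Cisco code: it assumes one tunnel per colour, takes a precomputed "meets the SLA?" flag per colour, and (an assumption the text doesn't cover) falls back to plain ECMP if the backup colour has no tunnel up. `pick_colours` is a hypothetical name.

```python
from typing import Dict, List, Optional


def pick_colours(sla_met: Dict[str, bool],
                 preferred: Optional[List[str]] = None,
                 backup: Optional[str] = None,
                 strict: bool = False) -> List[str]:
    """Colours AAR would spread a flow across; an empty list means drop.

    sla_met maps each up tunnel's colour to whether it currently meets the
    flow's SLA class (simplified: one tunnel per colour).
    """
    if strict and backup:
        raise ValueError("strict and backup-sla-preferred-color cannot be combined")
    good = [colour for colour, ok in sla_met.items() if ok]
    if good:
        prefs = preferred or []
        preferred_good = [colour for colour in good if colour in prefs]
        # preferred-color first; otherwise ECMP over every SLA-met tunnel
        return preferred_good or good
    # No tunnel meets the SLA: the fall-through action decides.
    if strict:
        return []                      # strict: drop
    if backup and backup in sla_met:
        return [backup]                # backup-sla-preferred-color
    return list(sla_met)               # default: ECMP over all up tunnels, degraded but delivered


brownout = {"mpls": False, "biz-internet": False}
print(pick_colours(brownout))                  # ['mpls', 'biz-internet']
print(pick_colours(brownout, backup="mpls"))   # ['mpls']
print(pick_colours(brownout, strict=True))     # []
print(pick_colours({"mpls": True, "biz-internet": False}, preferred=["biz-internet"]))  # ['mpls']
```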
Symptom: during a brief brownout, voice goes completely silent — not choppy, silent — and recovers by itself. Cause: someone set the action to strict, so when no tunnel met the SLA for a poll interval, AAR dropped the packets instead of using a degraded path. Fix: use strict only when a degraded path is genuinely worse than no path (rare). For voice, prefer backup-sla-preferred-color so the call survives on a fallback colour. Remember you can't set both strict and backup in the same action.
Pause & Predict
Predict: you match Salesforce by DSCP EF and bind it to a VOICE_SLA class, but the rule never seems to act on Salesforce. What's the most likely reason the match fails?
Sneha at TCS needs Webex to ride MPLS when MPLS is clean, but to keep flowing on biz-internet even when NEITHER link meets the SLA (a degraded call beats a dropped one). Which action does she set?
③ Data policy vs AAR — engineering the path by hand
AAR picks the best of the tunnels you already have. Centralized data policy is the other tool: it engineers the path by hand, regardless of SLA. Think of AAR as "send voice on whichever lane is cleanest" and data policy as "force ALL guest traffic out the local ISP, no matter what."
The headline data-policy use cases an L2 engineer meets daily:
- Direct Internet Access (DIA): send branch internet/guest traffic straight out the local ISP instead of hairpinning it to a data centre (the dabbawala dropping a tiffin at the nearest station instead of routing every box through Churchgate).
- NAT: translate that DIA traffic to the WAN edge's public IP.
- Service chaining / service insertion: steer flows through a firewall or IPS sitting in a service VPN before they leave.
- set tloc / set local-tloc-list (colour): pin a flow to a specific transport, overriding the natural path.
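A minimal DIA data policy for the guest subnet:

```
policy
 data-policy BRANCH_DIA
  vpn-list GUEST_VPN
   sequence 20
    match
     source-ip 10.20.4.0/24
    !
    action accept
     nat use-vpn 0
    !
   !
   default-action accept
  !
 !
!
apply-policy
 site-list BRANCHES
  data-policy BRANCH_DIA from-service
 !
!
```

What this does: sequence 20 of BRANCH_DIA matches the guest subnet 10.20.4.0/24 and accepts it with nat use-vpn 0, so that traffic exits locally through transport VPN 0 (DIA), translated to the edge's WAN address; anything else in GUEST_VPN hits default-action accept and follows normal routing. Applied from-service (traffic arriving from the LAN side) to site-list BRANCHES, guest traffic now egresses the branch ISP (203.0.113.10 in this example) instead of the DC.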
So when AAR and data policy both touch a flow, which one wins? The order of operations on a WAN Edge is: local ingress policy → app-aware routing → centralized data policy. AAR is evaluated first, but a data policy that matches the same flow can overwrite AAR's path decision. The nuance Cisco documents: when AAR's preferred colours don't meet the SLA but some of the data-policy colours do, those data-policy colours are used; and if no transport colour meets the SLA, the data-policy colours always take precedence. In plain words: design the two policies as a team, not as rivals.
Aditya at HCL faces this
Aditya, an L2 engineer, set a beautiful AAR policy to keep Salesforce on the low-latency biz-internet colour. But a separate data policy he forgot about is sending ALL of that VPN's traffic out the MPLS TLOC, and Salesforce keeps using MPLS no matter what AAR says.
Cause: order of operations. AAR is evaluated first, but the centralized data policy matches the same flow and OVERWRITES the path decision with its set-TLOC / colour action. Two correct policies, applied to the same traffic, fighting each other.
Approach: he checks what the edge actually received from the Controller and which policy is acting on the flow, rather than assuming AAR is the only thing touching it.
Where to look: vManage → Monitor → Devices → (branch cEdge) → Real Time → 'Policy From vSmart' and 'App Route Statistics'; CLI: show sdwan policy from-vsmart and show sdwan policy service-path vpn 10 interface ... source-ip ... dest-ip ...
Fix: scope the data policy so it doesn't match the Salesforce flow (tighten its match, or add an earlier sequence that accepts Salesforce with no TLOC override, ahead of the catch-all), letting AAR's SLA decision stand for that app.
Verify: show sdwan policy service-path now returns the biz-internet TLOC for the Salesforce 5-tuple, and App Route Statistics shows Salesforce on the SLA-met colour; guest and other traffic still follow the intended data policy.
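The order of operations, and why Aditya's fix works, can be sketched in a few lines of Python. This is a deliberately simplified model with hypothetical names, not Cisco code: AAR's choice is taken as given, data-policy sequences are checked in order with the first match winning, and a matching sequence with a colour override replaces AAR's choice. It ignores the SLA-based colour-precedence nuance described earlier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Flow = dict


@dataclass
class DataPolicySeq:
    """One hypothetical data-policy sequence: a match test and an optional colour override."""
    matches: Callable[[Flow], bool]
    set_colour: Optional[str] = None   # None = accept without overriding the path


def egress_colour(flow: Flow, aar_choice: str, data_policy: List[DataPolicySeq]) -> str:
    """AAR decides first; the first matching data-policy sequence may overwrite that decision."""
    for seq in data_policy:
        if seq.matches(flow):
            return seq.set_colour or aar_choice
    return aar_choice                  # no sequence matched: default-action accept


salesforce: Flow = {"app": "salesforce", "vpn": 10}

# Before: a catch-all sequence pins the whole VPN to MPLS.
before = [DataPolicySeq(lambda f: True, set_colour="mpls")]
print(egress_colour(salesforce, "biz-internet", before))   # mpls

# After: an earlier sequence accepts Salesforce with no override.
after = [DataPolicySeq(lambda f: f["app"] == "salesforce"),
         DataPolicySeq(lambda f: True, set_colour="mpls")]
print(egress_colour(salesforce, "biz-internet", after))    # biz-internet
```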
Pause & Predict
Predict: an architect says "just put everything in data policy, skip AAR, it's simpler." What capability do they lose?
Meera at Airtel sees a guest flow that AAR was supposed to leave alone, but a centralized data policy is NATting it out the local ISP. Both policies match the flow. Which statement about the order of operations is correct?
④ QoS on the edge + Cloud OnRamp
AAR chooses which tunnel; QoS decides what happens inside that tunnel when it's congested. They're partners: AAR avoids the lossy link, QoS protects voice on the link you're actually using. Edge QoS is four moves in order — classify (map traffic to a forwarding class), queue (drop it in the right hardware queue), schedule (decide who gets served first), shape (cap the rate per transport so you don't overrun the circuit).
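Shaping is usually explained with a token bucket. The Python below is a generic token-bucket shaper, offered to illustrate the idea rather than to describe how the WAN Edge implements it; the class name and parameters are hypothetical. Tokens (bits) refill at the shaped rate up to a burst size, and a packet waits until enough tokens have built up, so traffic leaves no faster than the configured rate.

```python
class TokenBucketShaper:
    """Generic token-bucket shaper: delays packets so output never exceeds rate_bps."""

    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate = rate_bps
        self.burst = burst_bits
        self.tokens = burst_bits       # bucket starts full
        self.last = 0.0                # time (s) the token count was last updated

    def departure(self, arrival: float, size_bits: float) -> float:
        """Return the time (s) a packet arriving at `arrival` is allowed to leave."""
        if size_bits > self.burst:
            raise ValueError("packet larger than the burst size can never conform")
        t = max(arrival, self.last)    # packets leave in order (FIFO)
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        if self.tokens < size_bits:    # not enough tokens yet: wait for the deficit
            t += (size_bits - self.tokens) / self.rate
            self.tokens = size_bits
        self.tokens -= size_bits
        self.last = t
        return t


# Toy numbers: a 1000 bit/s shaper with a 1000-bit burst, four 500-bit packets arriving together.
shaper = TokenBucketShaper(rate_bps=1000, burst_bits=1000)
print([shaper.departure(0.0, 500) for _ in range(4)])   # [0.0, 0.0, 0.5, 1.0]
```

The first two packets ride the burst; after that, packets leave every 0.5 s, i.e. exactly 1000 bit/s. Set rate_bps to 50_000_000 and the same logic keeps a 50 Mbps broadband link from being overrun.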
The hardware facts to memorise: every interface has 8 hardware queues, numbered 0–7. Queue 0 is always the Low-Latency Queue (LLQ) — that's where voice goes, and it's serviced first (priority). Queues 1–7 carry user-defined classes and are scheduled by default with WRR. You build this with localized data policy (the QoS map + class maps), then bind it to the interface, and you can shape per transport so a 50 Mbps broadband link is never overrun. On a hub, per-tunnel QoS shapes each spoke separately so one busy branch can't starve the rest.
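Strict priority for Queue 0 plus WRR for the rest can be simulated in a few lines. A toy Python model, with assumptions worth stating: one packet leaves per time slot, WRR weights are counted in packets per turn (real schedulers are typically configured as bandwidth shares), and `simulate` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Arrivals = Dict[int, List[Tuple[int, str]]]   # slot -> [(queue, packet), ...]


def simulate(arrivals: Arrivals, weights: Dict[int, int], slots: int) -> List[Optional[str]]:
    """Transmit one packet per slot: Q0 (LLQ) first, then WRR across Q1-Q7."""
    if any(w < 1 for w in weights.values()):
        raise ValueError("WRR weights must be >= 1")
    queues = {q: deque() for q in range(8)}
    ptr, left = 1, weights.get(1, 1)          # WRR pointer and packets left in its turn
    sent: List[Optional[str]] = []
    for slot in range(slots):
        for q, pkt in arrivals.get(slot, []):
            queues[q].append(pkt)
        if queues[0]:                          # LLQ is serviced before anything else
            sent.append(queues[0].popleft())
            continue
        if not any(queues[q] for q in range(1, 8)):
            sent.append(None)                  # nothing to send this slot
            continue
        while not queues[ptr] or left == 0:    # move to the next data queue with work
            ptr = ptr % 7 + 1
            left = weights.get(ptr, 1)
        sent.append(queues[ptr].popleft())
        left -= 1
    return sent


# A backup job fills Q5; one voice packet arrives in Q0 at slot 2 and jumps the line.
print(simulate({0: [(5, f"B{i}") for i in range(1, 6)], 2: [(0, "V1")]}, {}, 5))
# ['B1', 'B2', 'V1', 'B3', 'B4']
# Q1 weight 2, Q2 weight 1: Q1 gets two packets per turn for each one from Q2.
print(simulate({0: [(1, "A1"), (1, "A2"), (1, "A3"), (1, "A4"),
                    (2, "C1"), (2, "C2")]}, {1: 2, 2: 1}, 6))
# ['A1', 'A2', 'C1', 'A3', 'A4', 'C2']
```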
To confirm what AAR, BFD and QoS are doing on the branch edge:

```
show sdwan app-route stats remote-system-ip 10.0.0.12
show sdwan bfd sessions
show policy-map interface GigabitEthernet0/0/1
```

What they tell you, summarised (the real output is more verbose):

```
app-route stats:
  tunnel biz-internet  loss 0.2%  latency 22ms  jitter 4ms  -> VOICE_SLA: met
  tunnel mpls          loss 4.1%  latency 19ms  jitter 6ms  -> VOICE_SLA: NOT met
bfd sessions:
  10.0.0.12  biz-internet  up
  10.0.0.12  mpls          up    (both UP; only the SLA result differs)
policy-map GigabitEthernet0/0/1:
  class VOICE (Q0/LLQ)  0 drops
  class BULK  (Q5/WRR)  318 drops (shaped)
```

Last piece: Cloud OnRamp. Cloud OnRamp for SaaS continuously probes the path to apps like Microsoft 365 and Webex from each exit (DIA at the branch vs via the data centre/gateway) and steers each app out the lowest-loss, lowest-latency exit: AAR's quality idea, extended to the cloud's front door. Cloud OnRamp for Multicloud automates SD-WAN gateways into AWS, Azure and GCP so branches reach cloud-hosted apps over the fabric instead of the open internet. You enable both from Configuration > Cloud OnRamp in vManage.
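The SaaS steering idea can be sketched as "measure every exit per app, pick the best". A toy Python illustration only: it assumes probes report (loss %, latency ms) per exit and ranks exits by loss, then latency; the real feature computes its own quality score, and the exit names and numbers here are hypothetical.

```python
from typing import Dict, Tuple

Probe = Tuple[float, float]   # (loss %, latency ms) measured to the SaaS app via one exit


def best_exit(probes: Dict[str, Probe]) -> str:
    """Pick the exit with the lowest loss, breaking ties on latency (tuple comparison)."""
    return min(probes, key=lambda exit_name: probes[exit_name])


measurements: Dict[str, Dict[str, Probe]] = {
    "m365":  {"branch-DIA": (0.1, 35.0), "via-DC": (0.1, 80.0)},
    "webex": {"branch-DIA": (2.5, 30.0), "via-DC": (0.3, 60.0)},
}
for app, probes in measurements.items():
    print(app, "->", best_exit(probes))
# m365 -> branch-DIA   (same loss, lower latency)
# webex -> via-DC      (the local ISP is lossy right now)
```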
Worth knowing for a 2025–2026 deployment: classic AAR reacts on the slow BFD poll cycle (minutes). Enhanced Application-Aware Routing (EAAR), introduced with the 20.12 release train, measures performance inline on real user data packets rather than relying only on BFD probes, adds a small metadata header (so watch MTU), and includes SLA dampening to stop flapping, giving sub-minute reroutes for voice. You can spot EAAR-enabled sessions with show sdwan bfd sessions alt. This is the modern answer to "AAR is too slow for voice": instead of fighting the poll timers, you let inline measurement do it.
For any app-routing task ask three questions: (1) What's the SLA? loss/latency/jitter the app tolerates → that's your SLA class. (2) How do I match it? app-list / DSCP / prefix. (3) What if no path qualifies? ECMP (default), a named backup colour, or strict-drop — pick on purpose. Then separately: AAR chooses the tunnel, QoS protects traffic inside it, data policy engineers exits (DIA/NAT/service-chain), and Cloud OnRamp extends quality to SaaS. Almost every config maps onto that grid.
Symptom: voice is marked EF at the LAN but lands in a WRR data queue (one of Q1–Q7) on the WAN instead of Q0/LLQ, and gets dropped under load. Cause: the tunnel/transport re-marked or didn't trust the DSCP, so your QoS class-map never matched EF. Fix: confirm the edge trusts/preserves the inner DSCP (and that the SLA/QoS match uses the value you actually see on the WAN side). No queue can prioritise traffic it never classified.
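A small Python sketch of the classification step makes the failure obvious. It assumes a class map that sends only DSCP EF (46) to Queue 0 and everything else to a default data queue; the default queue number is a hypothetical placeholder, not a Cisco default. DSCP is the top six bits of the IP ToS byte, so a capture showing ToS 0xB8 is EF.

```python
EF = 46                      # DSCP Expedited Forwarding, the usual voice marking
LLQ = 0                      # Queue 0 = Low-Latency Queue
DEFAULT_DATA_QUEUE = 1       # hypothetical: whatever your QoS map uses as the default


def dscp_from_tos(tos_byte: int) -> int:
    """DSCP is the upper 6 bits of the 8-bit ToS / Traffic Class byte."""
    return tos_byte >> 2


def queue_for(tos_byte: int, class_map: dict) -> int:
    """Classify on the DSCP actually present on the packet; unmatched traffic gets the default queue."""
    return class_map.get(dscp_from_tos(tos_byte), DEFAULT_DATA_QUEUE)


class_map = {EF: LLQ}
print(queue_for(0xB8, class_map))   # 0: EF preserved, voice lands in the LLQ
print(queue_for(0x00, class_map))   # 1: re-marked to 0, voice misses the LLQ
```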
Take a real ask: "keep Teams clean across both transports and don't let backups starve voice." Name the SLA class (VOICE: 2%/150ms/30ms), the match (app-list MS_TEAMS or DSCP EF), the action (preferred-color + backup-sla, not strict), the QoS (voice in Q0/LLQ, backups shaped in a WRR queue), and where you'd verify (show sdwan app-route stats, show policy-map interface). If you can do that cold, you're ready for the 300-415 (ENSDWI) exam and for the branch.
Karthik at Flipkart must guarantee voice is served first even when a 100 Mbps branch link is saturated by a backup job. Which QoS placement is correct?
🧠 In your own words
In one line: why can a tunnel be "up" yet still need AAR to move your voice traffic off it?
📖 Glossary
- App-Aware Routing (AAR): centralized-policy feature that forwards each app onto a tunnel currently meeting that app's SLA (loss/latency/jitter), not just any 'up' tunnel.
- SLA class: a named set of max loss, latency and jitter a tunnel must satisfy to carry a class of traffic; leave a value blank and it isn't checked.
- App-route policy: the centralized policy that matches traffic (app-list/DSCP/prefix), maps it to an SLA class, and sets preferred colours and a fall-through action.
- Preferred colour: the first-choice transport an app uses IF it meets the SLA; otherwise AAR falls through to any SLA-met tunnel.
- backup-sla-preferred-color: a named transport used ONLY when no tunnel meets the SLA; pins traffic to a fallback instead of ECMPing everywhere.
- strict (AAR action): drops the packet when no tunnel meets the SLA; the only AAR action that drops. Cannot be combined with backup-sla-preferred-color in one action.
- BFD: Bidirectional Forwarding Detection, a per-tunnel hello probe (default 1000 ms) that detects liveness AND feeds AAR's loss/latency/jitter measurements.
- Poll interval / multiplier: AAR averages BFD measurements per poll interval (default 10 min) across the last 6 intervals (the multiplier) before re-deciding; tune down for voice.
- Enhanced AAR (EAAR): from ~20.12, measures from real data-plane packets (not just BFD), adds a metadata header (watch MTU) and SLA dampening for sub-minute reroutes.
- Centralized data policy: Controller-pushed policy that engineers the path by hand (DIA, NAT, service chaining, set-TLOC, drop/count/mirror); can overwrite AAR's choice.
- DIA (Direct Internet Access): send branch internet/guest traffic straight out the local ISP (often with NAT) instead of hairpinning to a data centre.
- QoS queues / LLQ: each interface has 8 hardware queues (0–7); Queue 0 is the Low-Latency Queue, serviced first (voice); Q1–7 use WRR by default.
- Cloud OnRamp: the SaaS variant probes the best exit to apps like M365/Webex; the Multicloud variant automates SD-WAN gateways into AWS/Azure/GCP.
📚 Sources
- Cisco Catalyst SD-WAN Policies Configuration Guide, IOS XE 17.x — "Application-Aware Routing" (SLA class = max loss/latency/jitter; BFD probes; poll interval default 10 min averaged over 6 intervals; actions sla-class / preferred-color / backup-sla-preferred-color / strict; strict and backup cannot be combined). cisco.com/c/en/us/td/docs/routers/sdwan/configuration/policies/ios-xe-17/policies-book-xe/application-aware-routing.html
- Cisco Catalyst SD-WAN Policies Configuration Guide — "Enhanced Application-Aware Routing" (EAAR from ~20.12: inline data-plane measurement, metadata header/MTU, SLA dampening; show sdwan bfd sessions alt). cisco.com/c/en/us/td/docs/routers/sdwan/configuration/policies/ios-xe-17/policies-book-xe/m-enhanced-application-aware-routing.html
- Cisco Catalyst SD-WAN Forwarding and QoS Configuration Guide, IOS XE 17.x — "Forwarding and QoS" + "Per-Tunnel QoS" (8 hardware queues 0–7, Queue 0 = LLQ, default WRR, localized policy classify/schedule/shape, per-tunnel QoS hub on IOS XE). cisco.com/c/en/us/td/docs/routers/sdwan/configuration/qos/ios-xe-17/qos-book-xe/forwarding-qos.html
- Cisco Community — "Understand BFD Protocol Relationship with App-Aware Routing" (BFD detects link failure and gathers loss/latency/jitter; poll-interval vs BFD hello sampling trade-off; tuning the poll interval for voice). community.cisco.com / cisco.com/c/en/us/support/docs/routers/sd-wan/221604-understand-bfd-protocol-relationship-wit.html
- NetworkAcademy.IO (CCIE Enterprise / SD-WAN) — "Configuring Application-Aware Routing Policies" & "AAR alongside Data Policy" (action semantics; order local-ingress → AAR → data policy; data policy can overwrite AAR; when no colour meets SLA, data-policy colours take precedence). networkacademy.io/ccie-enterprise/sdwan/aar-alongside-data-policy
- Daniels Networking Blog (lostintransit.se) — "Catalyst SD-WAN Enhanced Application Aware Routing" (EAAR 20.12, ~10–60s aggressive reaction, dampening, 12-byte metadata header). lostintransit.se/2024/02/19/catalyst-sd-wan-enhanced-application-aware-routing/
- Cisco ENSDWI 300-415 exam blueprint — Domain: Policies (configure app-aware routing, SLA classes, centralized data policy, QoS, Cloud OnRamp). learningnetwork.cisco.com/s/ensdwi-exam-topics
What's next?
You can now steer apps by SLA and shape them with QoS. The final lesson ties it together with branch security, Direct Internet Access hardening, and a real troubleshooting playbook for when control connections, OMP or AAR misbehave.