Your CISO asks: "if a user uploads a file to a personal OneDrive account from a corporate laptop, which ZIA control can tell personal OneDrive apart from corporate OneDrive and block the personal one?"

Correct: (b). Tenant awareness is the defining feature of CASB Inline. URL Filtering sees only the domain (onedrive.com — same for both). File Type Control filters by MIME. SSL Inspection bypass would actually remove the ability to see inside the request. Tenant restriction in CASB Inline lets you say "allow only tenant=corp-MS-tenant-id, block all other Microsoft 365 logins on this device".

A regulator asks for evidence of every file in your Microsoft 365 tenant that contains PCI data and has a public share link, plus proof you revoked the share. Which ZIA capability gives you this?

Correct: (c). "Data at rest" + "share link metadata" + "retroactive revoke" all point to the SaaS Security API (OOB CASB). Inline DLP only sees data crossing the tunnel right now — files uploaded last year never passed through it, so it cannot enumerate them. URL Filtering and File Type Control don't operate on resting SaaS objects.

Your SOC complains DLP is firing 50,000 alerts/day on a "Credit Card Number" dictionary rule, drowning real incidents. Most fire on developer log files. What's the right fix?

Correct: (b). Composite dictionaries with proximity are the textbook false-positive fix. Add EDM if you need "only OUR customers" matching. (a) creates a compliance gap. (c) is too broad — dev environments do touch real data sometimes. (d) doesn't address the dictionary pattern at all.

You want DLP to match only YOUR 1.2 million customer card numbers, not random 16-digit numbers. Which engine?

Correct: (a). EDM is the structured-row-match engine — exactly the "match only OUR data" use case. The hash-on-upload approach means the PSE never sees plaintext customer data. (b) matches any Luhn-valid 16-digit number, including test cards. (c) regex still matches the pattern, not the values. (d) IDM is for full documents (M&A drafts, board decks), not row-based data.

A user pastes three paragraphs of last quarter's confidential board deck into ChatGPT. The board deck itself was never uploaded — just an excerpt. Which engine has any chance of catching this?

Correct: (b). IDM's shingle / rolling-hash model is purpose-built for partial-document leaks — a paragraph or two is enough if the threshold is set low (e.g. 30% for trade secrets). EDM is for structured rows, not free text. PCI is the wrong content class. File Type Control can't read semantics. Lower IDM thresholds for high-secrecy docs; raise them for legal/templated content.

Your CASB OOB connector for Microsoft 365 silently stopped scanning three weeks ago. Compliance only noticed when an external audit asked for last month's scan report. Root cause?

Correct: (d). OAuth token expiry is the #1 silent failure mode of OOB CASB. The fix is two-part: re-authorise immediately and add monitoring on connector health so it never silently dies again. (a) would have impacted Inline DLP as well, not just OOB. (b)/(c) only affect the inline path; OOB talks to SaaS directly and is unaffected by tunnel state.

You uploaded the customer database to EDM six weeks ago. New customers have onboarded daily since. A new customer's card number is pasted into Gmail and the rule does NOT fire. Why?

Correct: (c). EDM is a snapshot — fresh customer data needs a fresh upload. Operationalise this with a cron-driven pipeline + an alert on staleness. (a)/(b)/(d) are possible in other scenarios but the symptom "new customer specifically slips through" is the classic EDM-staleness signature.

You want to allow source-code pushes from your engineering team to GitHub Enterprise (corporate) but block them to public github.com. Which configuration is correct?

Correct: (a). The same content / different destination / different action pattern. CASB Inline's tenant awareness is what lets ZIA tell GH Enterprise apart from public GH on the same parent domain. Order matters because ZIA uses first-match. (b) is hostile. (c) is too broad. (d) creates a giant exfil hole.

A new DLP rule is going live next week to block PCI uploads to all personal webmail. You want minimum disruption. What's the recommended rollout sequence?

Correct: (d). The "Confirm before Block" pattern is the textbook safe rollout — it gives SecOps the false-positive data, gives users the training, and gives compliance the audit trail. (a) generates Monday-morning chaos and a flood of tickets. (c)/(b) don't address the underlying tuning need.

Insights shows a "Blocked" DLP event. The match preview in the log displays the full credit card number in plaintext. Compliance is concerned. Correct posture?

Correct: (c). A DLP alert that leaks the very data it caught is a textbook compliance failure (PCI DSS specifically calls out logging-of-PAN). Always configure rules to log redacted previews, and verify across the entire forwarding chain — ZIA Insights, NSS feed, SIEM, ticketing. (a)/(b)/(d) all miss the structural issue.

Data Protection - DLP, EDM/IDM and CASB | Batch 11



Why this lesson matters

DLP is the control that makes Zscaler legally and contractually defensible. URL Filtering keeps users off bad sites. Threat Protection keeps malware out. But the question Legal and Compliance will eventually ask is: "if an employee tries to upload our customer database to a personal Gmail, do we know? Do we stop it?" That is DLP's job, and it's what goes into your SOC 2, ISO 27001, DPDP / GDPR and cyber-insurance evidence packs. Get it wrong and you don't just have a security gap, you have a contract violation.

CASB is the SaaS-era hole URL Filtering and Cloud App Control alone cannot close. URL Filtering sees onedrive.com; it cannot tell which OneDrive tenant the user logged into, what file they uploaded, or whether a share link from six months ago is now public on the internet. CASB Inline answers the first two. CASB Out-of-Band (SaaS Security API) answers the third. Together with DLP they form ZIA's data-protection triangle.

The shape of ZIA Data Protection

Three engines, each with its own position relative to the data path:

Inline DLP — sits in the ZIA proxy. Inspects body + attachments in motion. Actions: Allow, Confirm (user-justification banner before allow), Block, plus per-incident notify + ICAP forward to external forensics. Latency: tens of ms per inspected request.
CASB Inline — same proxy path, with SaaS-tenancy awareness. Knows corporate vs personal Microsoft / Google / Box accounts on the same domain. Inspects the JSON / multipart body of SaaS API calls. Can block upload to personal OneDrive while allowing the same file to corporate OneDrive on the same session.
CASB Out-of-Band / SaaS Security API — outside the data path. Connects to the SaaS provider over its admin API (OAuth, signed by the tenant admin). Scans data at rest — every file already in Box / O365 / GDrive / Salesforce / GitHub / ServiceNow — for sensitive content, public share links, externally-shared folders, malware. Acts after the fact: revoke share, change owner, quarantine, notify manager.

Inline catches data leaving now. Out-of-Band finds what slipped through last quarter and is sitting in SharePoint with a public link. Most tenants run all three, each tuned to a different risk class.


SVG #1 — ZIA Data Protection planes (Inline DLP + CASB Inline + CASB OOB)

Two planes, one DLP engine reused on both. Inline DLP + CASB Inline live in the proxy data path. CASB Out-of-Band lives outside the data path entirely, scanning SaaS via OAuth-signed admin APIs. The same dictionaries, EDM templates and IDM templates power both planes.

Quick check · The three planes

A file was uploaded to SharePoint last quarter and never crossed the ZIA tunnel. It contains PCI data and has a public share link. Which plane can find it and revoke the share?

a) Inline DLP: it inspects every upload in the proxy data path.
b) CASB Inline: it has SaaS-tenancy awareness on the same domain.
c) CASB Out-of-Band (SaaS Security API): it scans data at rest via the OAuth admin API and acts after the fact (revoke share, quarantine).
d) SSL Inspection: it decrypts the file in motion.

Correct: c. Inline DLP and CASB Inline only see data crossing the tunnel now — a file uploaded last quarter never passed through them. CASB Out-of-Band lives outside the data path, scanning SaaS data at rest via OAuth admin APIs, and can revoke shares retroactively.

DLP Dictionaries — the matching primitives

A dictionary is a named pattern definition. DLP rules reference dictionaries. Dictionaries can contain word lists, phrases, regex patterns, or EDM/IDM template references. So you CAN match regex — but always via a dictionary, never directly in the rule. Build dictionaries once, reuse them across rules. ZIA ships three flavours.

Predefined dictionaries (40+ out of the box)

Curated by Zscaler, kept up to date with regulatory format changes. The high-value ones in production:

PCI — Credit Card Number (Visa, MasterCard, Amex, Discover, JCB; each with its own prefix and length). Default uses Luhn validation, so roughly nine in ten random digit strings of the right length fail the check.
PII (per region) — US SSN, US Driver's License, UK NI Number, India Aadhaar (with Verhoeff check), India PAN, EU IBAN, ABA Routing Number, Brazil CPF.
PHI — Medical Record Numbers, ICD-10 / ICD-11 codes, NPI (US National Provider Identifier).
Source Code — language-aware: C/C++, Java, Python, Go, JavaScript, SQL DDL. Detects keyword density, not just file extension.
OFAC / Sanctions — name + alias matching against the US Treasury sanctions list.
AWS / Azure / GCP keys — recognise the signed prefix structure of cloud access keys (e.g. AKIA for AWS).
Crypto wallet addresses — BTC, ETH, etc.
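The Luhn validation mentioned for the PCI dictionary can be sketched in a few lines. This is a generic implementation of the public Luhn algorithm, not Zscaler's code; the function name `luhn_valid` is made up for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left; double every second digit (the check digit is not doubled).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111-1111-1111-1111"))  # well-known test card number
    print(luhn_valid("4111-1111-1111-1112"))  # last digit changed
```

Because a valid check digit is one value out of ten, Luhn alone still lets through test cards and a share of random numbers, which is why the composite and EDM techniques later in the lesson matter.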

Custom dictionaries — regex + score + threshold

Three matching modes per dictionary:

Mode	What it does	Good for
Word	Case-insensitive exact word match (whole-token). "Confidential" matches; "Confidentially" does not.	Project codenames, classification labels (e.g. "INTERNAL ONLY")
Phrase	Multi-word ordered match. "Patient ID" matches when those two tokens appear in order with whitespace.	Standard form labels, header phrases
Regex	Full PCRE2 regex. Slowest; use only when word/phrase can't express the pattern.	Internal customer ID format (e.g. `CUST-\d{8}`)

Every match contributes a score. A dictionary has a threshold — minimum cumulative score before it counts as "triggered". This lets you say: "trigger only when at least 4 different credit-card numbers appear in the same request, not on a single one". Threshold scoring is what turns DLP from a false-positive generator into a usable production control.
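A minimal sketch of threshold scoring, assuming each distinct match contributes a fixed weight of 1. The regex, the weight, and the function name are illustrative assumptions; the lesson does not specify how ZIA weights individual matches.

```python
import re

# Illustrative pattern: 16 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def dictionary_triggered(text: str, weight_per_match: int = 1, threshold: int = 4):
    """Score distinct card-like strings and compare the total to the threshold.

    Returns a (triggered, score) tuple.
    """
    # Strip separators so "4111-1111-..." and "41111111..." count as one value.
    distinct = {re.sub(r"\D", "", m.group()) for m in CARD_CANDIDATE.finditer(text)}
    score = weight_per_match * len(distinct)
    return score >= threshold, score
```

With `threshold=4`, a log line that repeats one test card many times scores 1 and stays quiet, while a paste of four different numbers trips the dictionary.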

Composite dictionaries — AND / OR / proximity

A composite combines atomic dictionaries with logical operators and a proximity window (chars or words) — the biggest false-positive killer in production. Example: PCI-CC-Strict = CreditCardNumber AND (CardholderName OR ExpiryDate OR CVV) within 50 words. A log file with one test card no longer fires; a real cardholder record does.
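The PCI-CC-Strict idea can be approximated as follows. This is a simplified sketch: it assumes whitespace tokenisation, card numbers written without internal spaces, and a small hypothetical keyword set standing in for the CardholderName / ExpiryDate / CVV dictionaries.

```python
import re

CARD_TOKEN = re.compile(r"\d{4}-?\d{4}-?\d{4}-?\d{4}")
# Hypothetical stand-ins for the secondary dictionaries.
CONTEXT_WORDS = {"cardholder", "expiry", "cvv"}


def composite_match(text: str, window_words: int = 50) -> bool:
    """True if a card-like token has a context keyword within `window_words` words."""
    tokens = [tok.strip(".,;:").lower() for tok in text.split()]
    card_positions = [i for i, tok in enumerate(tokens) if CARD_TOKEN.fullmatch(tok)]
    context_positions = [i for i, tok in enumerate(tokens) if tok in CONTEXT_WORDS]
    return any(
        abs(card - ctx) <= window_words
        for card in card_positions
        for ctx in context_positions
    )
```

A debug log containing only a bare test card returns False, while a record that carries the card next to a cardholder or expiry label returns True.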

Quick check · Dictionaries

A bare "Credit Card Number" dictionary rule is firing 50,000 alerts/day, mostly on dev log files. What is the textbook fix?

a) Disable PCI DLP entirely.
b) Replace it with a composite, CreditCardNumber AND (CardholderName OR ExpiryDate OR CVV) within a 50-word proximity window, so a lone test card stops firing but a real cardholder record still does.
c) Switch the dictionary from Word mode to Phrase mode.
d) Raise the DLP file-size cap above 16 MB.

Correct: b. Composite dictionaries with a proximity window are the biggest false-positive killer in production. A log file with one lone test card no longer fires; a real cardholder record (CC plus name/expiry/CVV nearby) still does. Add EDM if you also need "only OUR customers".

EDM — Exact Data Match (structured data)

EDM answers: "don't match any 16-digit number; match only one of our 1.2M customer card numbers". You export the sensitive table (CSV: card_number, customer_name, dob, email); the EDM tool salts and hashes each cell and uploads the hash index. The PSE never sees plaintext, only hashes. At inspection time, ZIA hashes candidate tokens and looks them up in that index.

Primary + secondary field matching

EDM templates designate one field as primary (strong identifier — card number, SSN) and others as secondary. A rule typically demands "primary present AND ≥N secondary fields from the same row" — catches a full-record leak, not a single-field accidental paste. Pair EDM with dictionary rules at different thresholds for layered control.

Production constraints (the gotchas)

Source CSV size limit — multi-million-row sources may need chunking + per-cell normalisation. Plan for the export pipeline, not just the upload.
Refresh cadence — the hash index is a snapshot. New customers added after the last upload are invisible until the next refresh. Production tenants rebuild + re-upload weekly, cron-driven from the system of record.
Normalisation — if source stores 4111-1111-1111-1111 but the user pastes 4111111111111111, hashes differ unless both sides strip identically. Use per-column normalisers (strip spaces, lowercase, ASCII-fold) at upload time.
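The normalise, hash, and primary-plus-secondary lookup described above can be sketched like this. It is an illustration under stated assumptions, not the actual EDM tool: the salt, the HMAC-SHA-256 choice, the normaliser, and all function names are hypothetical.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # hypothetical; a real deployment would use a managed secret


def normalise(value: str) -> str:
    """Lowercase and keep only letters and digits, so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def cell_hash(value: str) -> str:
    """Salted hash of a normalised cell (HMAC-SHA-256 is an assumption)."""
    return hmac.new(SALT, normalise(value).encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(rows, primary, secondary):
    """Map each primary-field hash to the set of secondary-field hashes of the same row."""
    return {
        cell_hash(row[primary]): {cell_hash(row[field]) for field in secondary}
        for row in rows
    }


def edm_match(index, candidate_tokens, min_secondary=2):
    """True if a primary hash is present along with >= min_secondary hashes from its row."""
    hashed = {cell_hash(token) for token in candidate_tokens}
    return any(
        primary in hashed and len(secondary & hashed) >= min_secondary
        for primary, secondary in index.items()
    )
```

Because the same `normalise` runs on both the upload side and the inspection side, `4111-1111-1111-1111` in the source and `4111111111111111` in the paste hash to the same value.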

Quick check · EDM vs IDM

A user pastes three paragraphs of last quarter's confidential board deck into ChatGPT — the deck itself was never uploaded, just an excerpt. Which engine has any chance of catching it?

a) EDM: it matches your exact structured customer rows.
b) IDM: its rolling-hash shingle fingerprints fire on partial-document leaks; even a couple of pasted paragraphs hash to entries in the index if the threshold is low.
c) The predefined PCI dictionary.
d) File Type Control.

Correct: b. IDM is purpose-built for unstructured, partial-document leaks. EDM is for structured rows, not free text; PCI is the wrong content class; File Type Control can't read semantics. Set IDM thresholds low (≈30%) for trade secrets, higher (≈80%) for templated legal content.

IDM — Indexed Document Match (full documents)

IDM is for unstructured leaks: board decks, M&A drafts, source-code ZIPs, legal contracts, design docs. Upload the protected documents; ZIA computes a rolling-hash fingerprint of overlapping shingles (small text windows). At inspection time, candidate content is compared against the fingerprint index with a partial-match threshold (30–80%). Higher (80%) for legal contracts where templated reuse is OK; lower (30%) for trade secrets where any meaningful overlap is alarming. IDM is the anchor for "the board deck PDF leaked to ChatGPT" — a few pasted paragraphs still fire the rule because their shingles hash to entries in the index.
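A rough sketch of shingle fingerprinting and partial-match scoring. Two simplifications are assumptions of this example: each shingle is hashed independently with SHA-1 rather than with a true rolling hash, and the match ratio is taken as the fraction of the candidate's shingles found in the index, which may differ from how ZIA defines its threshold.

```python
import hashlib


def shingle_hashes(text: str, size: int = 5) -> set:
    """Hash every overlapping window of `size` words."""
    words = text.lower().split()
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - size + 1, 0))
    }


def overlap_ratio(candidate: str, index: set, size: int = 5) -> float:
    """Fraction of the candidate's shingles that appear in the protected index."""
    candidate_hashes = shingle_hashes(candidate, size)
    if not candidate_hashes:
        return 0.0
    return len(candidate_hashes & index) / len(candidate_hashes)
```

An excerpt copied straight from a protected document scores close to 1.0, and an excerpt mixed with unrelated text scores somewhere in between, which is where the 30% versus 80% threshold choice decides whether the rule fires.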

🔤 EDM tokenizer + case-folding

EDM tokenizer: Splits on whitespace + punctuation. Case Folding (checkbox in EDM template config) controls case-sensitive matching — if unchecked, 'John Smith' and 'john smith' do NOT match. Most common EDM false-negative cause in production.

EDM pipeline: source CSV → per-column normaliser (strip/lowercase/ASCII-fold) → tokenizer (whitespace + punctuation) → optional case-fold → salt+hash → upload to PSE hash index.

OCR — what gets scanned, what doesn't

OCR scope: Runs on image attachments in JPG/PNG/TIFF and on PDFs with embedded images. Supports a defined set of languages. Default file-size cap is 10 MB. Adds 200–800 ms per scanned image, so scope it to high-risk destinations only or you'll get Webex/Teams performance tickets the same day.


SVG #2 — Inline DLP flow: a single Gmail attachment upload

A single attachment upload exercises the entire engine: SSL Inspection terminates the TLS, the multipart body is unpacked, the CSV is parsed and scanned against both the PCI dictionary and the EDM customer-DB hash index, the threshold trips, the action fires, the user is notified, and the Insights log captures a redacted preview so SecOps can investigate without re-exposing the data.

▶ Watch one Gmail attachment hit the DLP engine

A user attaches customers.csv to a Gmail compose and hits Send. The healthy block path runs in six steps; the classic way to break it is the file-size-cap bypass covered under the DLP file-size cap below.

① Upload: The user attaches customers.csv to a Gmail compose window and hits Send, producing a multipart/form-data POST to mail.google.com.
② SSL Inspect: The Z-Tunnel forwards to the PSE; SSL Inspection terminates the TLS so the multipart body and attachment are readable in clear.
③ DLP scan: The DLP engine scans body + attachment against the dictionaries and EDM index: PCI-CC-Strict ✓ and EDM Customer-DB ✓ both match real cardholder rows.
④ Verdict: The composite threshold is crossed → Action = BLOCK, reason = PCI-CC-Strict, severity High.
⑤ User: The user sees the notification banner "Upload blocked by corporate DLP policy". The file never reaches Gmail.
⑥ Log: In parallel, SecOps gets an Insights + NSS/SIEM entry with a redacted preview (4111-XXXX-XXXX-1234 · Alice Wang · 04/27), so the log isn't a secondary leak.

Quick check · The inspection flow

In the redacted Insights entry above, why does the match preview show 4111-XXXX-XXXX-1234 instead of the full card number?

a) Gmail truncated it before sending.
b) The preview is redacted on purpose so the alert log itself doesn't become a secondary data leak; a full PAN in the log is a PCI DSS compliance failure.
c) SSL Inspection only decrypts part of the body.
d) EDM stores only hashes, so the log can never show digits.

Correct: b. A DLP alert that leaks the very data it caught is a textbook compliance failure — PCI DSS specifically calls out logging of the PAN. Configure rules to log a redacted preview, and verify the whole forwarding chain (Insights, NSS, SIEM, ticketing) strips it too.
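If you also mask previews in your own forwarding pipeline (SIEM enrichment, ticketing), the first-four / last-four format shown in the lesson can be reproduced with a small helper. The regex and the function name `redact_preview` are illustrative assumptions, not a ZIA feature.

```python
import re

# 16-digit card-like strings with optional single space or hyphen separators.
PAN = re.compile(r"\b(\d{4})[ -]?\d{4}[ -]?\d{4}[ -]?(\d{4})\b")


def redact_preview(text: str) -> str:
    """Keep the first and last four digits; replace the middle eight with X."""
    return PAN.sub(lambda m: f"{m.group(1)}-XXXX-XXXX-{m.group(2)}", text)


if __name__ == "__main__":
    print(redact_preview("card 4111111111111234 exp 04/27"))
```

Running the same helper at every hop is one way to check that no stage in the chain reintroduces the full number.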

DLP rule configuration — GUI walkthrough

The path in the Admin Portal:

ZIA · Inline DLP rule creation

Policy → Data Loss Prevention → DLP Policy → + Add Rule

  Order:           20
  Rule Name:       Block-PCI-to-Non-Corp-Webmail
  Status:          Enabled
  DLP Engines:     PCI-CC-Strict  (composite: CC + name/expiry/CVV within 50 words)
                   + EDM-Customer-DB  (require primary + ≥2 secondary fields)
  Min Match Count: 1  (engine threshold counts, not raw matches)
  File Types:      All  (DLP scans body AND attachments)
  URL Categories:  Webmail
  Cloud Apps:      Gmail (personal), Yahoo Mail, Outlook.com personal
  Users / Groups:  All EXCEPT Group=Customer-Support-Approved-Senders
  Locations:       All
  Action:          Block
  Notification:    User notification template "PCI-block-banner-v3"
  Auditor:         secops-dlp@corp.com  (gets per-incident email)
  ICAP:            Forward redacted preview to Forensics-ICAP
  Severity:        High

A paired, gentler rule shows the same pattern for a sanctioned corporate destination, here with source code rather than PCI data:

ZIA · Inline DLP rule — Alert + Allow for sanctioned destinations

Policy → Data Loss Prevention → DLP Policy → + Add Rule

  Order:           10  (HIGHER priority — fires before the Block rule)
  Rule Name:       Allow-SourceCode-to-GH-Enterprise-Alert-Only
  DLP Engines:     SourceCode-Composite  (any of: Java, Python, C, Go)
  URL Categories:  (none)
  Cloud Apps:      GitHub Enterprise (tenant-aware via CASB Inline)
  Action:          Allow
  Notification:    None (silent — engineering workflow)
  Auditor:         dev-dlp@corp.com (weekly digest, not per-incident)

Then a paired Block rule at Order 30:
  Cloud Apps:      GitHub.com (public)
  Action:          Block
  Notification:    "Push source code to public GitHub blocked — use GH Enterprise"

The pattern: same content, different destinations, different actions. Without CASB Inline's tenant awareness, ZIA would only see "github.com" — both rules collapse to one and engineering is either silently leaking or completely blocked.
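The first-match behaviour behind this pattern can be modelled in a few lines. The `Rule` fields, the default-allow fallback, and the rule name `Block-SourceCode-to-Public-GitHub` are assumptions for illustration; they are not ZIA's policy schema.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    order: int
    name: str
    cloud_app: str
    engine: str
    action: str


def evaluate(rules, cloud_app, engines_hit):
    """Return (action, rule_name) for the lowest-order rule that matches."""
    for rule in sorted(rules, key=lambda r: r.order):
        if rule.cloud_app == cloud_app and rule.engine in engines_hit:
            return rule.action, rule.name
    # Assumed default when no DLP rule matches.
    return "Allow", None


RULES = [
    Rule(30, "Block-SourceCode-to-Public-GitHub", "GitHub.com (public)",
         "SourceCode-Composite", "Block"),
    Rule(10, "Allow-SourceCode-to-GH-Enterprise-Alert-Only", "GitHub Enterprise",
         "SourceCode-Composite", "Allow"),
]
```

If both rules keyed on a single undifferentiated "github.com" app, whichever had the lower order would win for every push, which is the collapse the paragraph above describes.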

🔌 ICAP integration

ICAP: Configured tenant-wide under Administration → DLP Incident Receiver, not per-rule. Sends DLP incidents to an external incident management platform.

🎚 DLP Severity Levels

ZIA DLP supports 5 severity levels: Info / Low / Medium / High / Critical. Map them deliberately to your SIEM noise budget — Critical and High page on-call, Medium goes to a daily digest, Low to a weekly review, Info is silent telemetry. Without severity mapping, all DLP looks the same to SecOps.
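On the SIEM side, the mapping above amounts to a lookup table. The channel names here are hypothetical placeholders for whatever your SIEM or paging tool calls them.

```python
# Severity levels from the lesson mapped to hypothetical SecOps channels.
SEVERITY_ROUTES = {
    "Critical": "page-on-call",
    "High": "page-on-call",
    "Medium": "daily-digest",
    "Low": "weekly-review",
    "Info": "telemetry-only",
}


def route_incident(severity: str) -> str:
    """Return the channel for a DLP incident, rejecting unknown severities."""
    try:
        return SEVERITY_ROUTES[severity]
    except KeyError:
        raise ValueError(f"unknown DLP severity: {severity!r}") from None
```

Rejecting unknown values, rather than defaulting them to a quiet channel, keeps a mislabelled rule from disappearing into telemetry.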

📏 DLP file-size cap (the silent gap)

DLP file-size cap: Default DLP_inspect_max_bytes = 16 MB. Files larger than this skip inspection entirely. Surface this in design discussions — large CAD files, design assets, video clips all bypass DLP by default. Consider raising the cap for high-risk groups (legal, design, M&A) or layering File Type Control to block specific large-file types outbound.

🔒 Encrypted / password-protected files

Encrypted/password-protected files: DLP cannot inspect. Combine with File Type Control: block password-protected ZIP/7z outbound, or quarantine for review. Otherwise this is the single easiest DLP bypass in the wild.

🔑 Key terms

Composite dictionary: Combines atomic dictionaries with AND/OR and a proximity window (e.g. CC AND name/expiry/CVV within 50 words). The biggest false-positive killer in production.
EDM: Exact Data Match. Salts + hashes your structured rows (e.g. the customer table) and uploads the hash index. Matches only YOUR data. Primary + secondary fields; refresh weekly (snapshot).
IDM: Indexed Document Match. Rolling-hash fingerprints of overlapping shingles from full documents (board decks, M&A, legal). Partial-match threshold catches excerpt leaks.
CASB Out-of-Band: SaaS Security API. Sits outside the data path, connects via OAuth admin API, scans data at rest, and acts retroactively (revoke share, quarantine, change owner). OAuth token expiry is its #1 silent failure.

CASB Inline vs Out-of-Band — when to use which

Dimension	CASB Inline	CASB Out-of-Band (SaaS Security API)
Where it sits	In the proxy data path (same as ZIA)	Outside the data path; OAuth-signed admin API to the SaaS provider
Detection latency	Real-time (tens of ms)	Scheduled scan (minutes to hours) plus event-driven webhooks where supported
What it can do	Block upload / download / share in motion · enforce tenant restrictions (only corp Microsoft 365) · redact a message in-flight	Find files already at rest · revoke public share links · change file owner · quarantine to admin-only folder · notify uploader's manager · scan for malware in stored files
What it cannot do	See data that already exists in the SaaS (didn't pass through ZIA) · catch user accessing SaaS from an unmanaged device that bypasses the tunnel	Block a leak in real time — only finds it after the fact
Coverage gap when user is on personal device off-tunnel	Blind	Still works — connector talks to SaaS directly, not the user
Typical SaaS	Microsoft 365, Google Workspace, Box, Dropbox, Slack, ServiceNow, Salesforce, GitHub Enterprise (any SaaS reachable through the tunnel)	Microsoft 365, Google Workspace, Box, Dropbox, Salesforce, ServiceNow, GitHub, Workday, Slack (subset supports OAuth admin scope)
Auth dependency	Z-Tunnel + ZIA identity	OAuth token signed by tenant admin (must be refreshed before expiry — top failure mode)
Best use case	"Block this file from being uploaded to personal OneDrive right now."	"Find every file in our O365 tenant with a public share link AND containing PCI data, revoke share, notify owner."

✓ Verify — confirm Data Protection is actually working

After enabling DLP + CASB, validate on a controlled test account before relaxing:

Insights → DLP dashboard. Trigger a deliberate test (paste 4 Luhn-valid test cards into a Gmail compose window from a corp laptop). Confirm a "Blocked" entry appears within 30s with the rule name and a redacted match preview.
Insights → Web → filter destination=mail.google.com. Confirm the request shows "DLP=hit", policy name, score, and action.
NSS / SIEM — verify the DLP event also arrived in your SIEM with the same redacted preview. If only ZIA sees it, your forwarding pipeline is broken (run a sample NSS feed query).
CASB API connector status — Admin → SaaS Security API → Connectors. Each connector should show "Connected · last scan <1h ago · OAuth expires in >30d". A red status on a connector means a blind spot on that SaaS.
EDM index health — Admin → DLP → EDM Templates → check "Last upload" and "Row count". If row count jumped down or "Last upload" is more than 7 days old on a weekly cadence, the export pipeline broke.
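The last two checks are good candidates for automation. The sketch below assumes you can already obtain "last successful scan" and "last upload" timestamps from somewhere (status page export, SIEM, job logs); it does not call any Zscaler API, and the names and limits are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from the checklist above: scans under an hour old,
# weekly EDM uploads under seven days old.
MAX_SCAN_AGE = timedelta(hours=1)
MAX_EDM_AGE = timedelta(days=7)


def stale_items(last_scan_by_connector, last_upload_by_template, now=None):
    """Return a sorted list of connectors and EDM templates that are overdue."""
    now = now or datetime.now(timezone.utc)
    stale = [
        f"connector:{name}"
        for name, ts in last_scan_by_connector.items()
        if now - ts > MAX_SCAN_AGE
    ]
    stale += [
        f"edm:{name}"
        for name, ts in last_upload_by_template.items()
        if now - ts > MAX_EDM_AGE
    ]
    return sorted(stale)
```

Feeding the returned list into an alert turns the silent failures described in this lesson (expired OAuth token, broken export job) into a page instead of an audit finding.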

⚠ Common Mistakes — DLP and CASB

Single-dictionary CC rule = false-positive flood. A bare "Credit Card Number" dictionary fires on every Luhn-valid 16-digit string — including dev test cards in log files. Wrap CC detection in a composite with proximity (CC + name OR expiry OR CVV within 50 words). Day-1 naïve PCI DLP = 50,000 alerts and SecOps stops looking at the queue.
EDM hash file not refreshed. New customers walk out unmatched. Automate weekly export → hash → upload; alert if the job hasn't completed in 9 days.
CASB OAuth token silently expired. The connector goes red on the status page, OOB scanning stops, and no one notices until a forensic request reveals six months of unscanned SharePoint. OAuth refresh-token lifetime is provider-specific: Microsoft Graph uses a 90-day sliding window (refreshes on use), Box uses 60 days, and Google Workspace service-account credentials do not expire. Don't assume 90 days everywhere. Monitor the 'Last Successful Scan' age field and re-consent before the tenant-specific expiry.
Watermarking turned on for everything. Visible per-user PDF watermarks are great for sensitive previews but kill performance and confuse users when applied to every document. Scope to board decks, legal contracts, M&A drafts — not "all PDFs".
Source-code block rule with no engineering exception. Universal block on source-code uploads to public destinations breaks the 3 AM open-source release pipeline. Add Group=Engineering + Destination=approved-repos exception above the universal block.
OCR not enabled for image attachments. Screenshot of a credit card in a Slack DM bypasses text-only DLP. Turn on OCR for inline DLP (scope to high-risk destinations to manage latency).
CASB Inline without tenant restrictions. Without "Allow only Microsoft tenant ID X, block all other Microsoft 365 logins", a user can sign into personal OneDrive on the same browser as corporate OneDrive and CASB Inline only sees "OneDrive". Tenant restriction is the most valuable single CASB Inline setting; configure per SaaS, per tenant.

💡 Pro Tips

Always run DLP in "Confirm" mode for two weeks before "Block". Confirm allows the action but pops a user-justification banner ("type a reason"). You get the false-positive list before users get angry, you get end-user training data, and you get a clean audit trail showing the org informed users before enforcing.
Pair Inline DLP with CASB OOB for the same SaaS. Inline catches new uploads; OOB sweeps existing data. On day 1 of a new CASB connector, the OOB scan usually finds thousands of pre-existing public share links with sensitive content — that's your first quarter of cleanup work.
Use Severity correctly. ZIA DLP supports 5 levels — Info / Low / Medium / High / Critical per rule. Map them deliberately to your SIEM noise budget — Critical and High page on-call, Medium goes to a daily digest, Low to a weekly review, Info is silent telemetry. Without severity mapping, all DLP looks the same to SecOps.

Real-world scenario — Gmail outbound DLP with inline redaction

Scenario: Gmail outbound DLP with inline redaction — User pastes a 16-digit card number into a Gmail compose window. ZIA CASB Inline sees the POST to Gmail, runs the body through the Credit Card dictionary, and rewrites the affected digits to XXXX before the request leaves ZIA. The user sees the redacted version in their Sent folder. Same flow works for outbound webmail (Outlook Web, Yahoo Mail). For Slack and most SaaS chat: redaction is NOT supported — the action must be Block instead.

Rules already in place

Composite dictionary PCI-CC-Strict = CreditCardNumber AND (CardholderName OR ExpiryDate OR CVV) within 50 words. Threshold = 1 composite match.
EDM template Customer-DB-v2026-05-20 — 1.2M rows, primary = card_number, secondary = name / dob / email. Suppresses a duplicate dictionary fire when the same record is an EDM hit.
CASB Inline rule: Cloud App = Gmail / Outlook Web / Yahoo Mail (webmail family), Action = "Redact and replace with notification".

What the user sees

User hits Send in the Gmail compose window. CASB Inline intercepts the POST to mail.google.com, inspects the form body, trips PCI-CC-Strict, and rewrites the affected digits in-flight: card number digits → XXXX-XXXX-XXXX-XXXX, name → [REDACTED — PII], with a banner appended. Gmail receives the redacted version — that's what gets sent and what appears in the user's Sent folder. The user's UI shows a notification: "DLP redacted PCI data — ask the recipient to use the secure-pay link instead."

What SecOps sees

Insights → DLP → today: "Redacted" entry within 5s. User=engineer@corp, App=Gmail, Engine=PCI-CC-Strict, Action=Redact, Severity=High.
Incident detail: redacted preview (4111-XXXX-XXXX-1234 · J*** B**** · 04/27), URL path, recipient address (hashed).
NSS feed: parallel event in the SIEM, queryable; pre-built "DLP severity=High by hour" widget updates.
Auditor email: secops-dlp@corp gets a structured email in ~30s with redacted preview + Jump-to-Insights link.
SOC ticket (low-priority): confirm the user used the correct workaround; log it for the quarterly compliance report. No customer data was exposed — the point of the exercise.

What CASB OOB confirms later

The next hourly OOB scan on the Gmail (Google Workspace) connector re-confirms the stored Sent message body is the redacted version, not the original. PCI scan: zero hits in that mailbox for the day. The compliance evidence pack writes itself. That's "DLP done right" — silent to the recipient, helpful to the user, defensible to the auditor, no PII in the alert log.

Important caveat: Inline body-modification (redaction) is currently GA only for webmail-family SaaS (Gmail / Outlook Web / Yahoo Mail) and a small set of HTTP-form-style apps. For Slack and most SaaS chat platforms ZIA's CASB Inline does NOT modify the message body — the only supported action is Block (the message is rejected before reaching the SaaS, and the user is notified). Plan rules accordingly; don't promise "redaction" for a SaaS where only Block is supported.


📌 Quick reference (memorise — this is the data-protection arc)

Three engines, three placements. Inline DLP and CASB Inline live in the proxy data path; CASB Out-of-Band lives outside it (OAuth admin API to SaaS).
Inline catches now; OOB catches what slipped through. Run both for the same SaaS — Inline blocks new leaks, OOB sweeps the back-catalog.
Three dictionary types. Predefined (40+, regulator-current), Custom (word / phrase / regex with score and threshold), Composite (AND/OR + proximity window — the false-positive killer).
EDM = hash-upload of structured exact data (your actual customer rows). Primary + secondary field model. Plan the refresh pipeline; weekly cadence with alert-on-staleness.
IDM = rolling-hash fingerprints of full documents (board decks, M&A, legal). Partial-match threshold — set high for templated content, low for trade secrets.
Rule order matters. Same first-match logic as other ZIA policies — Allow / sanctioned-destination exceptions ABOVE generic Block.
CASB Inline tenant restriction is the most valuable single setting — without it, you cannot tell corporate OneDrive from personal OneDrive.
CASB OOB OAuth tokens expire silently. Monitor "Last Scan" age in your SIEM — the connector goes red, scanning stops, no one notices.
Run Confirm before Block for at least two weeks per new rule — false-positive triage plus end-user training plus audit trail.
Verify path. Insights → DLP for triggers · Insights → Web with DLP filter for per-request view · Connector status page for OOB health · NSS feed in SIEM for the structured evidence pack.


▶ QUICK LAB · ~15 MIN

Build + test a DLP rule end-to-end:

Create a Credit Card dictionary using the built-in pattern. Set confidence threshold to Medium.
Create a DLP rule: outbound + Webmail destination + Credit Card dict + Action = Block + Notify User.
From a test laptop, attempt to paste a Luhn-valid 16-digit card into Gmail compose — verify block page.
Now upload a CSV with 100 cards as a file attachment — DLP should match. If it doesn't, check the 16 MB file-size cap.
Check CASB → Last Scan age for your O365 tenant — re-consent if > 80 days.

What's next — Lesson 9

Lesson 9 switches tracks completely, from ZIA (internet-bound traffic) to ZPA (private app access). Same Z-App, totally different architecture, totally different problem space.

Lesson 9: ZPA Architecture Deep Dive
