In Netskope DLP, which object do you attach to a Real-time Protection policy?

Correct answer: a DLP profile. The profile (the collection of rules + classifiers + fingerprints) is what attaches to a policy. A lone identifier or regex must first live inside a rule, and the rule inside a profile. A severity threshold is a setting on a rule, not an attachable object.

You have your company’s exact list of 80,000 employee bank-account numbers and must stop only those real numbers leaking, with almost no false positives. What do you build?

Correct answer: EDM (Exact Data Match). It fingerprints the real records so only genuine account numbers (matched as rows) fire, and the Severity Threshold Record count tunes how many records it takes before an alert. A dictionary catches the word, not the numbers; a 16-digit regex flags any 16 digits (a flood of false positives); an ML screenshot classifier targets the wrong data type.

Priya needs a rule that fires only when an account ID sits within 1000 characters of a customer name, in any order, to cut noise. Which operator?

Correct answer: NEAR. It enforces proximity (≤1000 characters) regardless of order, which is precisely "these two, close together". OR fires on either term alone; a file-wide AND is less precise (the two could be pages apart); NOT excludes rather than requires.

A team reports: "Our Critical DLP alerts never trigger — every credit-card hit lands as Low, even big leaks." Steering and SSL inspection look fine. Most likely root cause?

Correct answer: two severity tiers share the same threshold value. Netskope’s documented tie-break is that when two tiers share a value, classification defaults to Low and Critical never fires. An outdated Netskope Client wouldn’t cause "all Low"; an oversized EDM file errors on upload; an unattached profile would fire nothing at all, not "everything Low".

A locally installed Postman client uploads a customer database to an external API. Skope IT shows the event, but no DLP incident is raised. Why, and what’s the fix?

Correct answer: the payload was never inspected inline. An event in Skope IT without a DLP hit means the content bypassed inline inspection: an API/protocol blind spot. The fix is to steer that traffic through inline DLP and confirm the app isn’t on a steering bypass. Severity tuning, detector choice and "just wait" don’t address an uninspected flow.

Two designs to protect a confidential design document plus an exact list of 50,000 customer records: (A) one big regex profile for both; (B) IDM for the document + EDM for the records, each with distinct severity tiers. Which is stronger and why?

Correct answer: design B. A document and a structured record list are different data shapes: IDM fingerprints the document (catching renames and excerpts) and EDM fingerprints the records (whole-row matching, near-zero false positives). Distinct severity tiers avoid the default-to-Low trap. A single regex floods the SOC and misses renamed copies and excerpts of the document entirely.

Netskope DLP Deep-Dive: Profiles, Rules, EDM/IDM and ML



Most engineers think…

Most engineers think DLP is "write a regex for credit cards and SSNs, switch it on, done." So they ship a raw pattern and call it a data-protection programme.

Wrong, and the helpdesk will hate you. A naked regex matches any 9–16 digit string, so order numbers, ticket IDs and timestamps all trip it. Real Netskope DLP means choosing the right detector (EDM for exact records, IDM for documents, ML for file types) and tuning severity thresholds, proximity (NEAR) and column groups so that the alerts you get are the leaks you care about, not noise.

① Rule vs profile vs identifier — the three words people mix up

Three Netskope words get used interchangeably and shouldn’t be. A data identifier (Netskope also calls it a DLP entity) is the raw pattern — "a credit-card number", "an India PAN". A DLP rule is one detector that combines identifiers with regex, dictionaries and exact-match logic. A DLP profile is the container of several rules plus classifiers and fingerprints — and the profile is what you actually attach to a policy.

The analogy that sticks: a rule is an ingredient, a profile is the recipe. One ingredient (a regex, a dictionary) does nothing on the plate alone. The recipe — several rules plus ML classifiers plus document fingerprints — is what you "cook" inside a Real-time Protection policy. You never attach a bare rule to a policy; you attach the profile.
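To make the containment concrete, here is a minimal Python sketch of the hierarchy. The class names (DataIdentifier, DlpRule, DlpProfile, RealTimePolicy) are hypothetical illustrations, not a Netskope API, and a rule is simplified to an AND of its identifiers. What it encodes is the point above: a policy accepts a profile, and a bare rule is rejected.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DataIdentifier:
    """The raw pattern (hypothetical model, not a Netskope API)."""
    name: str
    matches: Callable[[str], bool]


@dataclass
class DlpRule:
    """One detector; simplified here to an AND of its identifiers."""
    name: str
    identifiers: List[DataIdentifier]

    def fires(self, text: str) -> bool:
        return all(ident.matches(text) for ident in self.identifiers)


@dataclass
class DlpProfile:
    """The recipe: a container of rules (classifiers and fingerprints omitted)."""
    name: str
    rules: List[DlpRule] = field(default_factory=list)

    def violations(self, text: str) -> List[str]:
        return [rule.name for rule in self.rules if rule.fires(text)]


@dataclass
class RealTimePolicy:
    """A policy attaches a profile; passing a bare rule is rejected."""
    name: str
    profile: DlpProfile

    def __post_init__(self) -> None:
        if not isinstance(self.profile, DlpProfile):
            raise TypeError("attach a DLP profile, not a bare rule")


# Illustrative India PAN identifier: 5 letters, 4 digits, 1 letter.
pan = DataIdentifier(
    "India PAN", lambda t: re.search(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b", t) is not None
)
pan_rule = DlpRule("PAN rule", [pan])
pii_india = DlpProfile("PII-India", [pan_rule])
policy = RealTimePolicy("Block PII uploads", pii_india)

print(policy.profile.violations("PAN ABCDE1234F on file"))  # ['PAN rule']
```

Calling `RealTimePolicy("bad", pan_rule)` raises `TypeError`, which is the same mistake as attaching a bare rule in the console: wrap it in a profile first.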

👉 So far: identifier = the pattern, rule = one detector, profile = the recipe you attach to a policy. Next: see how they stack into one enforcement chain.

Figure 1 — Identifiers → Rule → Profile → Policy → channels

Follow the arrows left to right: detectors feed a rule, rules feed a profile, the profile attaches to ONE policy, and that policy is enforced by SWG, CASB inline and API protection alike.

Build it once and it works everywhere. The same DLP profile that blocks an upload over the SWG also fires on a Salesforce attachment via CASB, and on a file already sitting in OneDrive via API protection. Netskope ships predefined profiles with rule sets mapped to well-known compliance regulations — PCI (payment-card), PHI (health) and PII — so you rarely start from a blank page.

Four words you’ll use every day

These are the vocabulary anchors for the whole lesson.

🧩 Data identifier: The raw pattern a rule looks for (a card number, a PAN). Predefined by Netskope or custom-built. The smallest Lego brick of DLP.

🔎 DLP rule: A single detector combining identifiers, regex, dictionaries and exact-match logic with operators. The ingredient, not the dish.

📋 DLP profile: A collection of rules + classifiers + fingerprint rules. This is the object you attach to a policy. The recipe.

🌐 One profile, many channels: The same profile enforces across SWG, CASB inline and API. Write detection once and reuse it everywhere: build one profile, not three copies.

Quick check · Q1 of 10

Karthik at Wipro asks: "I built a perfect regex rule for India PAN numbers — why won’t my Real-time Protection policy let me select it?" What did he miss?

a) Policies attach DLP profiles, not bare rules; wrap the rule in a profile first
b) Regex isn’t allowed in custom rules
c) PAN numbers need a separate licence
d) He must reboot the tenant

Correct: a. A policy attaches a DLP profile, never a lone rule. The rule (ingredient) must sit inside a profile (recipe) before a Real-time Protection policy can use it. Regex in custom rules is fully supported, and no special licence or reboot is involved.

Pause & Predict

Predict: if the SAME profile fires on a web upload, a Salesforce attachment AND a file already in OneDrive, what does that tell you about WHERE the DLP engine runs?

Answer: the profile is defined once and evaluated centrally by the same DLP engine in the single-pass architecture, not by three separate copies. Each channel hands its content to that engine: SWG and CASB inline pass decrypted traffic, while API protection passes files it fetches from the SaaS app. That’s why you build the profile once and reuse it across all enforcement points.

② Custom rules: regex, dictionaries, NEAR & severity

Predefined identifiers cover the common stuff. The day you need something company-specific — an internal project codename, a customer-account format — you build a custom rule. The console path is worth memorising: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention, then New Rule. The wizard walks you through entity → advanced options → scan options → severity threshold → name.

A custom rule has four kinds of detector you can mix: predefined identifiers, your own RegEx expressions, a keyword dictionary, and exact-match criteria. You glue them together with operators on the Advanced Options screen: AND, OR, NOT, and the proximity operator NEAR.

NEAR is the one that separates a beginner rule from a tuned one. It fires only when two identifiers appear within a set distance (max 1000 characters, measured between the outermost characters), and order doesn’t matter. "Account number NEAR a name" is far more precise than "any 16-digit string" because real leaks have context around them.
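For intuition, NEAR semantics fit in a short Python sketch: it returns True when some match of each pattern sits within the limit, in either order. Two assumptions are labelled in the code: distance is measured across the outermost characters of the pair (our reading of the description above), and Python’s `re` stands in for Netskope’s regex engine. The name list is a hypothetical stand-in for a dictionary.

```python
import re


def near(text: str, pattern_a: str, pattern_b: str, max_distance: int = 1000) -> bool:
    """True if a match of pattern_a and a match of pattern_b fall within
    max_distance characters of each other, in either order.

    Assumption: distance spans from the first character of the earlier match
    to the last character of the later match ("outermost characters").
    """
    spans_a = [m.span() for m in re.finditer(pattern_a, text)]
    spans_b = [m.span() for m in re.finditer(pattern_b, text)]
    for start_a, end_a in spans_a:
        for start_b, end_b in spans_b:
            # min/max make the check independent of which term comes first.
            if max(end_a, end_b) - min(start_a, start_b) <= max_distance:
                return True
    return False


ACCOUNT = r"\bAC[0-9]{4}\b"     # account format used in the rule mock-up
NAME = r"\b(?:Vikram|Priya)\b"  # hypothetical stand-in for a name dictionary

print(near("Vikram AC2207", ACCOUNT, NAME))                   # True: name first
print(near("AC2207 belongs to Vikram", ACCOUNT, NAME))        # True: account first
print(near("AC2207" + " " * 2000 + "Vikram", ACCOUNT, NAME))  # False: too far apart
```

The nested loop is quadratic, which is fine for a sketch; a real scanner would walk sorted match positions instead.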

Figure 2 — Pick the detector by the data

Read each column top to bottom: what it detects, its strength (green +) and its weakness (red −). Regex is fast but noisy; EDM and IDM are precise but need indexing first; ML needs no pattern at all.

After detectors comes the Severity Threshold screen, and this is where careers are made or tickets are born. You choose a counting mode — Record (how many hits) or Aggregate Score (a weighted total) — and set occurrence counts for Low / Medium / High / Critical. There’s also a Count only unique record toggle so 100 copies of the same SSN don’t inflate one file to Critical.

Figure 3 — Severity threshold decision tree

The trap to avoid: if two tiers share the same count, the platform silently classifies everything as Low and Critical never fires. Give each tier a distinct, ascending count.
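The threshold logic can be modelled in a few lines of Python. This is a hypothetical sketch of Record mode only (Aggregate Score is not modelled), and the equal-tier behaviour is coded exactly as this lesson describes it, not taken from Netskope’s implementation. The `unique_only` flag stands in for Count only unique record.

```python
from typing import Dict, List, Optional

TIERS = ["Low", "Medium", "High", "Critical"]


def classify(records: List[str], thresholds: Dict[str, int],
             unique_only: bool = False) -> Optional[str]:
    """Hypothetical Record-mode severity: return the highest tier whose count is met.

    - thresholds maps each tier to its minimum record count.
    - unique_only mimics "Count only unique record": duplicates count once.
    - Per the lesson, if any two tiers share a value, every hit lands as Low.
    """
    count = len(set(records)) if unique_only else len(records)
    if count < thresholds["Low"]:
        return None  # below the lowest tier: no violation
    values = [thresholds[tier] for tier in TIERS]
    if len(set(values)) < len(values):
        return "Low"  # the tie-break trap: Critical can never fire
    met = [tier for tier in TIERS if count >= thresholds[tier]]
    return met[-1]


DISTINCT = {"Low": 1, "Medium": 5, "High": 20, "Critical": 50}
SHARED = {"Low": 1, "Medium": 5, "High": 50, "Critical": 50}
same_ssn_100x = ["559-30-2210"] * 100

print(classify(same_ssn_100x, DISTINCT))                    # Critical
print(classify(same_ssn_100x, DISTINCT, unique_only=True))  # Low
print(classify(same_ssn_100x, SHARED))                      # Low
```

The second call shows why the unique toggle matters: 100 copies of one SSN are one record, not a Critical leak. The third shows the trap, where a single shared value silently caps everything at Low.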

🖥️ This is the screen you’ll build the rule on: Netskope tenant → Policies → Profiles → DLP → Edit Rules → Data Loss Prevention → New Rule. It shows the real wizard fields, recreated for clarity; labels in your tenant may differ slightly.

tenant.goskope.com · Policies → Profiles → DLP → Edit Rules
Name the DLP Rule: Custom — Account# NEAR Name
DLP Entity: Postal Addresses (US) + custom regex
Advanced Expression: RegEx AC[0-9]{4} NEAR keyword dict
Global Data Identifier: On (match every CSV column)
Counting mode: Aggregate Score
Severity Threshold: Low 1 · Med 5 · High 20 · Crit 50
Update Rule → Apply Changes

Common mistake — the underscore bypass

Symptom: a user appends characters (Confidential_x) and your keyword rule never fires. Cause: the dictionary/regex requires a word boundary after the keyword, and because the underscore is itself a word character, "Confidential_x" has no boundary after "Confidential". Fix: harden the regex to allow trailing word characters, or pair the keyword identifier with a NEAR context identifier so the match doesn’t hinge on the bare word.
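You can reproduce the underscore bypass with Python’s `re` module, assuming (as the symptom suggests) that the keyword match is bounded by `\b`. Netskope’s own regex engine may differ in edge cases, so treat this as a way to reason about the pattern, not as a test of your tenant.

```python
import re

# Naive keyword pattern: requires a word boundary after "Confidential".
naive = re.compile(r"\bConfidential\b", re.IGNORECASE)
# Hardened pattern: also accepts trailing word characters such as "_x".
hardened = re.compile(r"\bConfidential\w*", re.IGNORECASE)

for sample in ["Confidential", "Confidential_x", "Confidential-draft", "Confidentiality"]:
    naive_hit = naive.search(sample) is not None
    hardened_hit = hardened.search(sample) is not None
    print(f"{sample:20} naive={naive_hit!s:5} hardened={hardened_hit}")
```

The naive pattern still catches "Confidential-draft" (a hyphen is not a word character) but misses "Confidential_x". The hardened pattern catches both, at the cost of also matching "Confidentiality", which is one reason the NEAR-context fix is often the cleaner option.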

One more flag matters for spreadsheets: Global Data Identifier (GDI). Without it, an identifier matches once and stops; with it on, the identifier re-matches every column of a CSV or Excel sheet. Forget GDI on a spreadsheet rule and you’ll catch the first leaked row and miss the other 9,999.

Quick check · Q2 of 10

Priya at ICICI needs a rule that fires only when an account number appears within 1000 characters of a customer name, in either order. Which operator does the job?

a) AND
b) OR
c) NEAR
d) NOT

Correct: c. NEAR is the proximity operator: it matches two identifiers within a set distance (≤1000 characters) regardless of order. AND would fire if both appear anywhere in a huge file (less precise); OR fires on either alone; NOT excludes. NEAR is the precision tool for "these two, close together".

Pause & Predict

Predict: your raw 9-digit SSN regex is flooding the SOC with false positives from order numbers and timestamps. Name TWO ways to cut the noise without turning the rule off.

Answer: One: clone the predefined rule and add AND NOT a benign-context dictionary (e.g. exclude lines containing "order", "ticket"). Two: convert to EDM seeded from the real customer table and raise the Severity Threshold Record count so only files with enough genuine records fire. Bonus: add NEAR a contextual keyword so the digits only count when near words like "SSN" or "account".
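The first and bonus ideas (AND NOT a benign dictionary, NEAR a context keyword) can be prototyped offline before you touch the tenant. A minimal Python sketch; the benign words, the context keywords and the 50-character window are all illustrative assumptions, not Netskope defaults.

```python
import re

SSN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")  # raw 9-digit pattern, dashes optional
CONTEXT = re.compile(r"\b(?:SSN|social security)\b", re.IGNORECASE)
BENIGN = {"order", "ticket"}  # hypothetical benign-context dictionary


def flags(line: str, max_distance: int = 50) -> bool:
    """SSN AND NOT benign word AND (SSN NEAR context keyword); illustrative only."""
    words = set(re.findall(r"[a-z]+", line.lower()))
    if words & BENIGN:
        return False  # AND NOT: benign context suppresses the hit
    for ssn in SSN.finditer(line):
        for ctx in CONTEXT.finditer(line):
            # NEAR: the digits count only when a context word sits close by.
            if max(ssn.end(), ctx.end()) - min(ssn.start(), ctx.start()) <= max_distance:
                return True
    return False


lines = [
    "Order 123456789 shipped",
    "Ticket 987-65-4321 closed",
    "Log timestamp 202504081",
    "Employee SSN: 559-30-2210",
]
print([flags(line) for line in lines])  # [False, False, False, True]
```

All four lines trip the raw pattern; only the last survives both filters.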

③ EDM, IDM & ML — precision detection

Regex is a sledgehammer. For the data that actually gets you fined, Netskope has three precision tools. EDM fingerprints your real structured records. IDM fingerprints whole documents. ML classifiers recognise a file’s type without any pattern at all.

The wedding-gate analogy nails the difference. EDM is the exact printed guest list at the gate — only names that match the real list (your DB of SSNs and account numbers) get flagged, no lookalikes. The ML classifier is the bouncer who recognises "this looks like a passport" even for a document he’s never seen. IDM is the librarian’s fingerprint of every book — even a photocopied chapter (a partial or derivative match) gets recognised.

EDM is the false-positive killer. You upload a UTF-8-encoded CSV or TXT with a header row (8 MB via the UI, up to 160 GB with multipart upload), set each column’s normalization to string or number, then build column groups — and columns inside a group are AND-ed during the exact match. The console path: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention > Exact Match. Because EDM matches whole rows, a lone 9-digit number can’t fake a record — that’s the magic. You don’t set a separate "minimum matches" field; you control how many records before it alerts on the Severity Threshold screen (the Record count).
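To see why whole-row matching kills false positives, here is a minimal Python sketch of the idea. It is not Netskope’s implementation: the normalization rules, the SHA-256 hashing of indexed values and the whitespace tokenizer are all assumptions, and proximity is ignored. What it does capture is the column-group AND: a record counts only when every grouped value from the same indexed row appears in the content.

```python
import csv
import hashlib
import io
import re
from typing import Dict, List, Set, Tuple

CUSTOMERS_CSV = """name,ssn,account
Vikram,559-30-2210,AC2207
Asha,123-45-6789,AC1001
"""
KINDS = {"name": "string", "ssn": "number", "account": "string"}  # per-column normalization
GROUP = ["name", "ssn", "account"]  # columns AND-ed together


def normalize(value: str, kind: str) -> str:
    """Assumed normalization: numbers keep digits only; strings are trimmed and lower-cased."""
    if kind == "number":
        return re.sub(r"\D", "", value)
    return value.strip().lower()


def digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_index(csv_text: str, kinds: Dict[str, str], group: List[str]) -> Set[Tuple[str, ...]]:
    """One tuple of hashed, normalized values per row, in column-group order."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {tuple(digest(normalize(row[col], kinds[col])) for col in group) for row in rows}


def count_matching_records(text: str, index: Set[Tuple[str, ...]],
                           kinds: Dict[str, str], group: List[str]) -> int:
    """Count indexed rows whose every grouped value appears among the content's tokens."""
    tokens = text.split()
    candidates = []
    for col in group:
        normalized = (normalize(tok, kinds[col]) for tok in tokens)
        candidates.append({digest(v) for v in normalized if v})
    return sum(1 for row in index if all(h in seen for h, seen in zip(row, candidates)))


INDEX = build_index(CUSTOMERS_CSV, KINDS, GROUP)
print(count_matching_records("Order ref 559302210", INDEX, KINDS, GROUP))        # 0: lone number
print(count_matching_records("Vikram 559-30-2210 AC2207", INDEX, KINDS, GROUP))  # 1: full row
print(count_matching_records("Vikram 123-45-6789 AC2207", INDEX, KINDS, GROUP))  # 0: mixed rows
```

The third case is the interesting one: every value is individually "real", but they come from different rows, so nothing fires. The returned count is what you would compare against the Severity Threshold Record count.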

Figure 4 — EDM matches real rows, not lookalikes

A lone number (file A) doesn’t match because it has no row-mates. A name + SSN + account from the SAME indexed row, in proximity (file B), satisfies the column group and fires Critical. Tune sensitivity with the Severity Threshold Record count.

▶ Follow one upload through EDM

Watch a leaked customer file get scanned against the EDM index, step by step.

① Index: customers.csv is uploaded, the header row is read, and each column is set to string or number.
② Group: build a column group name AND ssn AND account (the columns are AND-ed).
③ Scan: the outbound file "Vikram 559-30-2210 AC2207" hits the same indexed row.
④ Verdict: matched records ≥ the Severity Threshold Record count → Critical → action Block, logged to Incidents.

IDM covers the document you can’t reduce to a pattern, such as a confidential design doc or a board deck. It’s implemented as a fingerprint rule (now moving to Fingerprint Groups): you index the file and set a similarity threshold (default 85%) so that IDM recognises copies, renamed versions, and even excerpts (partial or derivative matches).

ML classifiers go further and need no pattern at all. Netskope ships ~28 prebuilt models: document classifiers such as tax form, resume, patent, NDA, bank statement, offer letter and source code, and image classifiers such as passport, driver’s licence, payment card, social security card, photo ID and screenshot. They live under Policies > Profiles > DLP > File Classifiers. With TYOC (Train Your Own Classifier) you train a model on your own data type, supplying at least 20 positive sample files.

Across the platform, Netskope offers 3,000+ predefined data identifiers over 1,500+ file types, and in April 2025 it released DLP as an API service: DLP On Demand.
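Netskope doesn’t publish how IDM computes similarity, so the Python below is only an analogy under stated assumptions: it fingerprints a document as a set of overlapping 5-word shingles and scores a candidate by the share of its shingles found in that fingerprint (containment). The 0.85 threshold mirrors the 85% default described above, but whether Netskope’s percentage means the same thing is an assumption.

```python
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 5) -> Set[Shingle]:
    """Overlapping k-word windows of the content; the file name plays no part."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(candidate: str, fingerprint: Set[Shingle], k: int = 5) -> float:
    """Share of the candidate's shingles that also occur in the indexed document."""
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & fingerprint) / len(cand)


DESIGN_DOC = (
    "The core network uses two redundant firewalls in active passive mode. "
    "All branch offices connect over encrypted tunnels to the regional hub. "
    "Management traffic is isolated on a dedicated out of band VLAN."
)
FINGERPRINT = shingles(DESIGN_DOC)  # the "index" step
THRESHOLD = 0.85  # assumed to mirror the 85% default

excerpt = "All branch offices connect over encrypted tunnels to the regional hub."
unrelated = "Lunch menu for Friday includes pasta salad and fresh fruit for everyone."

for text in (DESIGN_DOC, excerpt, unrelated):
    score = containment(text, FINGERPRINT)
    print(f"{score:.2f} match={score >= THRESHOLD}")
```

Containment is used instead of a symmetric measure such as Jaccard because a short excerpt would score low against a long document under Jaccard. The flip side is that an excerpt buried in a long unrelated email scores lower, which a real engine has to handle differently.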

Sneha at Infosys faces this

Sneha, an L1 analyst, is told: "Someone is emailing copies and excerpts of our confidential network-design document to a personal Gmail. Stop it." She reaches for a keyword rule on the doc’s title.

Likely cause

A keyword/regex rule only catches the literal title text — rename the file or paste two paragraphs into an email body and it sails through. The leak is a DOCUMENT, not a pattern, so it needs a document fingerprint (IDM).

Diagnosis

She picks the detector by the data shape: a whole confidential document with partial copies → IDM (fingerprint rule), not regex.

Policies → Profiles → DLP → hover Edit Rules → Rules → New Fingerprint Rule → index the design doc, set similarity threshold 85%

Fix

Index the design document as a fingerprint rule, add it to the DLP profile, and attach the profile to the Gmail-personal-instance CASB inline policy with action Block + User Alert.

Verify

Send a test email with a renamed copy, then one with just two excerpted paragraphs. In Incidents → DLP both show Last Action: Block tagged with the fingerprint rule name; an unrelated document does not trip it.

Quick check · Q3 of 10

Aditya at Flipkart must detect leaked passport images and stray source-code files — but he has zero regex patterns and the data types vary wildly. Best fit?

a) A bigger keyword dictionary
b) An ML classifier (passports, source code), or TYOC for custom types
c) A stricter 16-digit regex
d) Disable EDM and rely on severity

Correct: b. ML classifiers recognise a document/image TYPE (passport, source code, screenshot) with no pattern written — exactly the job here. Dictionaries and regex need literal patterns; EDM needs a structured record list. For a custom type, TYOC trains a model on your own data.

Pause & Predict

Predict: you have an exact list of 50,000 customer records AND you want to catch leaked copies of one secret design PDF. Which detector for each, and why not just one for both?

Answer: EDM for the 50,000 structured records (it matches whole rows of real data with near-zero false positives); IDM for the design PDF (it fingerprints the document so renamed copies and excerpts are caught). They solve different shapes of data — structured table vs unstructured document — so you use both, not one.

④ Actions, incidents & where it all runs

Detection is half the job; the action is the other half. On the Real-time Protection policy you choose what happens when the profile fires: Alert (log only), Block (stop it inline), User Alert (warn + let the user justify), and for CASB/API channels also Quarantine, Encrypt, and Legal Hold. Inline channels (SWG, CASB inline) can block in real time; API protection acts on data already at rest (quarantine, encrypt, legal hold).

When something fires, you live in Incidents > DLP (plus Incidents > Quarantine and Incidents > Legal Hold). Each incident shows the Last Action (allow/block/alert) and lets you change Status, Assign it, set Severity, and run object actions — Encrypt / Restore / Block / Delete. The Quarantine page even has Contact Owners to nudge the file owner.

Figure 5 — Netskope DLP at a glance

Your one-card map of everything in this lesson — keep it open while you build your first profile, and revisit it before any DLP interview.

🖥️ This is the incident screen you’ll triage on: Netskope tenant → Incidents → DLP. It shows the real fields you act on, recreated for clarity; labels in your tenant may differ slightly.

tenant.goskope.com · Incidents → DLP
Rule / Profile: PII-India · EDM customers.csv
Severity: Critical
Last Action: Block
Status: New → In Progress
Assign: soc-tier2@infosys
Object action: Quarantine → Apply

Pro tip — the mental model that sticks

For any DLP task ask two questions: (1) what shape is the data? pattern → regex/dictionary, exact records → EDM, whole document → IDM, file type → ML; (2) what should happen? just watch → Alert, stop it now → Block (inline), it’s already stored → Quarantine/Encrypt/Legal Hold (API). Almost every DLP config maps onto that grid.
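The two-question grid is small enough to write down as a lookup table. A sketch in Python: the keys are the lesson’s own phrases, the function name `plan` is hypothetical, and the channel column follows the actions section above.

```python
from typing import Tuple

# Question 1: what shape is the data?
DETECTOR_BY_SHAPE = {
    "pattern": "Regex / keyword dictionary",
    "exact records": "EDM",
    "whole document": "IDM (fingerprint rule)",
    "file type": "ML classifier (or TYOC)",
}

# Question 2: what should happen? -> (action, where it can run)
ACTION_BY_INTENT = {
    "just watch": ("Alert", "inline or API"),
    "stop it now": ("Block", "inline: SWG or CASB inline"),
    "already stored": ("Quarantine / Encrypt / Legal Hold", "API protection"),
}


def plan(shape: str, intent: str) -> Tuple[str, str, str]:
    """Return (detector, action, channel); raises KeyError for an unknown answer."""
    action, channel = ACTION_BY_INTENT[intent]
    return DETECTOR_BY_SHAPE[shape], action, channel


print(plan("whole document", "stop it now"))
print(plan("exact records", "already stored"))
```

The first call is Sneha’s case (IDM, Block, inline); the second is the OneDrive-at-rest case from the quick check (EDM records handled by API protection).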

Common mistake — the steering blind spot

Symptom: a locally installed Postman client uploads company data and DLP fires nothing, though the event shows up in Skope IT. Cause: the payload was never inspected inline; it’s an API/protocol blind spot. Fix: steer that traffic through inline DLP (SWG/CASB inline) and check the app isn’t sitting on a steering bypass. No detector can fire on traffic the engine never sees.

Prove you’ve got the DLP model

Take any real ask — "stop staff emailing copies of our confidential pricing PDF to personal Gmail" — and name the detector (IDM), the channel (CASB inline + API), the action (Block + Alert / Quarantine), the severity setup (distinct tiers), and where you’d review it (Incidents → DLP). If you can do that cold, you’re ready for the cert and the SOC floor.


Quick check · Q4 of 10

A sensitive file was shared in OneDrive last week — it’s already at rest, not in transit. The business wants it pulled and the owner notified. Which action and channel?

a) Inline Block on the SWG
b) A stricter regex severity threshold
c) A NEAR operator on the rule
d) Quarantine via API protection, then Contact Owners

Correct: d. Data already at rest in a SaaS app is handled out-of-band by API protection — you Quarantine it and use Contact Owners to notify. Inline Block only stops traffic in transit (this leak already happened); regex severity and NEAR are detection tuning, not response actions for stored data.


🧠 In your own words

In one line: why does EDM produce far fewer false positives than a 9-digit SSN regex? Then compare your answer with the expert version.

Expert version: Because EDM matches whole rows of your real indexed records (name + SSN + account together), so a random 9-digit string with no matching row-mates can’t trigger it — whereas a bare regex flags any 9-digit number anywhere.


📖 Glossary

Data identifier (DLP entity): The pattern a rule looks for — predefined by Netskope or custom-built by you.
DLP rule: One detector combining identifiers, regex, dictionaries and exact-match logic. The ingredient.
DLP profile: A collection of rules + classifiers + fingerprint rules — the object you attach to a policy. The recipe.
Keyword dictionary: An uploaded list of terms (e.g. "Confidential", a project codename) the rule matches against.
NEAR (proximity): Operator matching two terms within N characters (≤1000), regardless of order; precision over a bare pattern.
Global Data Identifier (GDI): Flag that makes an identifier re-match every subsequent object/column — essential for CSV/Excel scanning.
EDM (Exact Data Match): Fingerprints structured PII from a UTF-8 CSV/TXT; column groups are AND-ed; tune sensitivity via the Severity Threshold Record count.
IDM (Indexed Document Match): Fingerprints whole documents so copies, renames and excerpts (partial/derivative content) are caught.
ML classifier: One of ~28 predefined models recognising a document/image TYPE (source code, tax form, passport, screenshot) with no pattern; lives under Policies > Profiles > DLP > File Classifiers.
TYOC: Train Your Own Classifier — a customer-trained ML model for a custom data type Netskope doesn’t ship; needs at least 20 positive sample files.
Fingerprint rule (IDM): The Indexed Document Match object, created via Policies > Profiles > DLP > Edit Rules > Rules > New Fingerprint Rule (moving to Fingerprint Groups); its similarity threshold (default 85%) catches excerpts.
Severity threshold: Record vs Aggregate Score, with Low/Medium/High/Critical counts; equal tiers default to Low; "Count only unique record" clears the preset threshold.
DLP On Demand: Netskope’s April-2025 release of DLP as an API service, callable outside the inline path; 3,000+ data identifiers over 1,500+ file types.

📚 Sources

Netskope Docs — “DLP Rules” (rule wizard: entity, advanced expressions, scan options, severity, name). docs.netskope.com/en/netskope-help/data-security/data-loss-prevention/dlp-rules/
Netskope Docs — “DLP Profiles” (profile = collection of rules + classifiers + custom fingerprint rules; predefined profiles mapped to PCI, PHI, PII). docs.netskope.com/en/dlp-profiles/
Netskope Docs — “Use Advanced Expressions” (AND/OR/NOT, NEAR max 1000 characters, Global Data Identifier) & “Select an Exact Match File” (EDM: UTF-8 CSV/TXT, 8 MB UI / 160 GB multipart, column groups AND-ed). docs.netskope.com/en/use-advanced-expressions/ · docs.netskope.com/en/select-an-exact-match-file/
Netskope Docs — “Select a Severity Threshold” (Record vs Aggregate Score; Count only unique record; equal tiers default to Low). docs.netskope.com/en/select-a-severity-threshold/
Netskope press release & Help Net Security — “Netskope One DLP On Demand” (Apr 2025, DLP-as-API; 3,000+ data identifiers over 1,500+ file types, Train Your Own Classifier). netskope.com/press-releases · helpnetsecurity.com/2025/04/08/netskope-one-dlp-on-demand/
Netskope Docs — “File Classifiers” (28 predefined ML classifiers; TYOC needs ≥20 positive files) & “Create Fingerprint Rules” (IDM via fingerprint rules → Fingerprint Groups; similarity threshold default 85%). docs.netskope.com/en/file-classifiers/ · docs.netskope.com/en/create-fingerprint-rules/
Netskope NSK200 (NCCSI) exam blueprint — DLP domain: build/tune profiles, custom regex, EDM vs IDM vs ML classifiers. pass4success.com/netskope/exam/nsk200

What's next?

You can now build detection that catches real leaks. Next we go deeper into Private Access (ZTNA) — giving users per-app access to internal apps without a VPN.

