
Netskope DLP Deep-Dive — Profiles, Rules, EDM/IDM & ML

A DLP rule is one detector. A profile is the recipe. EDM finds your exact customer records, IDM catches copies of a secret document, and ML spots a passport it has never seen before. This lesson shows you how to build detection that actually catches leaks without drowning you in false positives.

📅 2026-06-05 · ⏱ 13 min · 3 live demos · 4 infographics · 🏷 10-Q assessment + AI Tutor inline

⚡ Quick Answer

Build Netskope DLP from the ground up: rules vs profiles vs data identifiers, predefined vs custom (regex, dictionaries, NEAR, severity), Exact Data Match (EDM), Indexed Document Match (IDM), ML classifiers, actions and incident review across SWG, CASB and API.


Pick where you want to start:

1. Rule vs profile: ingredient vs recipe, and where each one lives.
2. Custom detectors: regex, dictionaries, NEAR and severity tuning.
3. EDM · IDM · ML: exact records, document copies, "looks-like".
4. Actions & incidents: block, quarantine, legal hold, then review.

🧠 Warm-up — 3 questions, no score

Just notice which ones make you pause. We answer all three inside the lesson.

1. You need to catch a leak of your exact list of 50,000 customer SSNs with almost no false positives. Best tool?

Answered in EDM · IDM · ML.

2. What does a DLP profile actually contain?

Answered in Rule vs profile.

3. Your Critical alerts never fire — everything lands as Low. Most likely cause?

Answered in Custom detectors.

Most engineers think…

Most engineers think DLP is "write a regex for credit cards and SSNs, switch it on, done." So they ship a raw pattern and call it a data-protection programme.

Wrong, and the helpdesk will hate you. A naked regex such as "any 9–16 digits" matches every long number, so order numbers, ticket IDs and timestamps all trip it. Real Netskope DLP means choosing the right detector (EDM for exact records, IDM for documents, ML for document and image types) and tuning severity thresholds, proximity (NEAR) and column groups so the alerts you get are the leaks you care about, not noise.
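A minimal Python sketch makes the noise concrete. It uses only the standard `re` module; the pattern and the sample sentence are invented for illustration and are not a Netskope predefined identifier.

```python
import re

# A "naked" detector: any standalone run of 9 to 16 digits, no context.
NAKED = re.compile(r"\b\d{9,16}\b")

sample = (
    "Order 4512987730 shipped. "         # an order number
    "Ticket 20240117093015 escalated. "  # a timestamp-style ticket ID
    "Customer SSN 472119931 on file."    # the only value we care about
)

hits = NAKED.findall(sample)
print(hits)  # all three numbers match; only one is a real SSN
```

Two of the three hits are noise, and nothing in the pattern can tell them apart. That missing context is exactly what NEAR, EDM and severity tuning add.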

① Rule vs profile vs identifier — the three words people mix up

Three Netskope words get used interchangeably and shouldn’t be. A data identifier (Netskope also calls it a DLP entity) is the raw pattern — "a credit-card number", "an India PAN". A DLP rule is one detector that combines identifiers with regex, dictionaries and exact-match logic. A DLP profile is the container of several rules plus classifiers and fingerprints — and the profile is what you actually attach to a policy.

The analogy that sticks: a rule is an ingredient, a profile is the recipe. One ingredient (a regex, a dictionary) does nothing on the plate alone. The recipe — several rules plus ML classifiers plus document fingerprints — is what you "cook" inside a Real-time Protection policy. You never attach a bare rule to a policy; you attach the profile.
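If it helps to see the hierarchy as types, here is a hypothetical Python model. The class names (`DataIdentifier`, `DlpRule`, `DlpProfile`, `RealTimePolicy`) are invented for illustration and are not a Netskope API; the only rule it encodes is that a policy accepts a profile, never a bare rule.

```python
from dataclasses import dataclass, field

# Hypothetical types mirroring identifier -> rule -> profile -> policy.

@dataclass(frozen=True)
class DataIdentifier:
    name: str  # the raw pattern, e.g. "India PAN"

@dataclass(frozen=True)
class DlpRule:
    name: str
    identifiers: tuple  # one detector built from identifiers (the ingredient)

@dataclass(frozen=True)
class DlpProfile:
    name: str
    rules: tuple              # several rules...
    classifiers: tuple = ()   # ...plus ML classifiers...
    fingerprints: tuple = ()  # ...plus document fingerprints (the recipe)

@dataclass
class RealTimePolicy:
    name: str
    profiles: list = field(default_factory=list)

    def attach(self, obj) -> None:
        # Only a profile can be attached; a bare rule is rejected.
        if not isinstance(obj, DlpProfile):
            raise TypeError(f"policy expects a DlpProfile, got {type(obj).__name__}")
        self.profiles.append(obj)

pan = DataIdentifier("India PAN")
pan_rule = DlpRule("Custom PAN regex", (pan,))
policy = RealTimePolicy("Block PII uploads")

try:
    policy.attach(pan_rule)  # attaching the ingredient on its own
    rejected = False
except TypeError:
    rejected = True

policy.attach(DlpProfile("PII-India", rules=(pan_rule,)))  # wrap it in a recipe first
```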

👉 So far: identifier = the pattern, rule = one detector, profile = the recipe you attach to a policy. Next: see how they stack into one enforcement chain.
Figure 1 — Identifiers → Rule → Profile → Policy → channels
Build a DLP rule once, wrap it in a profile, attach it to ONE policy, and every channel enforces it. Left: data identifiers, RegEx expressions, keyword dictionaries and exact-match (EDM) feed into a DLP rule (one detector). Several rules plus classifiers and fingerprints make a DLP profile. The profile attaches to one Real-time Protection policy, which is enforced across three channels: SWG (inline web), CASB (inline SaaS) and API protection (at rest). One profile, reused across all channels.
Follow the arrows left to right: detectors feed a rule, rules feed a profile, the profile attaches to ONE policy, and that policy is enforced by SWG, CASB inline and API protection alike.

Build it once and it works everywhere. The same DLP profile that blocks an upload over the SWG also fires on a Salesforce attachment via CASB, and on a file already sitting in OneDrive via API protection. Netskope ships predefined profiles with rule sets mapped to well-known compliance regulations — PCI (payment-card), PHI (health) and PII — so you rarely start from a blank page.
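The "build once, enforce everywhere" idea can be sketched as one detection function shared by every channel. This is a hypothetical Python illustration, not Netskope's engine: the card-number pattern, the channel labels and the `inspect` helper are all assumptions.

```python
import re
from typing import Callable

Profile = Callable[[str], bool]

def make_pci_profile() -> Profile:
    # 16 digits with optional single spaces or hyphens between them.
    card = re.compile(r"\b(?:\d[ -]?){15}\d\b")
    return lambda text: card.search(text) is not None

def inspect(channel: str, content: str, profile: Profile) -> str:
    # Channels differ only in how content arrives; detection is shared.
    verdict = "violation" if profile(content) else "clean"
    return f"{channel}: {verdict}"

pci = make_pci_profile()  # built once...
events = [
    ("SWG upload", "card 4111 1111 1111 1111"),
    ("CASB Salesforce attachment", "card 4111-1111-1111-1111"),
    ("API OneDrive file at rest", "quarterly report, no card data"),
]
results = [inspect(ch, body, pci) for ch, body in events]  # ...reused by all three
```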

Four words you’ll use every day

These are the vocabulary anchors for the whole lesson.

🧩 Data identifier: the raw pattern a rule looks for (a card number, a PAN). Predefined by Netskope or custom-built. The smallest Lego brick of DLP.

🔎 DLP rule: one detector (identifiers + regex + dictionary + exact-match, combined with operators). The ingredient, not the dish.

📋 DLP profile: a collection of rules + classifiers + fingerprint rules. This is the object you attach to a policy. The recipe.

🌐 One profile, many channels: the same profile enforces across SWG, CASB inline and API. Write detection once and reuse it everywhere, so build one profile, not three copies.

Quick check · Q1 of 10

Karthik at Wipro asks: "I built a perfect regex rule for India PAN numbers — why won’t my Real-time Protection policy let me select it?" What did he miss?

Answer: a policy attaches a DLP profile, never a lone rule. The rule (ingredient) must sit inside a profile (recipe) before a Real-time Protection policy can use it. Regex in custom rules is fully supported, and no special licence or reboot is involved.

Pause & Predict

Predict: if the SAME profile fires on a web upload, a Salesforce attachment AND a file already in OneDrive, what does that tell you about WHERE the DLP engine runs?

Answer: It runs once, centrally, inside the single-pass engine — not three separate copies. Each channel (SWG inline, CASB inline, API protection) hands its decrypted content to the same DLP engine. That’s why you build the profile once and reuse it across all enforcement points.

② Custom rules: regex, dictionaries, NEAR & severity

Predefined identifiers cover the common stuff. The day you need something company-specific — an internal project codename, a customer-account format — you build a custom rule. The console path is worth memorising: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention, then New Rule. The wizard walks you through entity → advanced options → scan options → severity threshold → name.

A custom rule has four kinds of detector you can mix: predefined identifiers, your own RegEx expressions, a keyword dictionary, and exact-match criteria. You glue them together with operators on the Advanced Options screen: AND, OR, NOT, and the proximity operator NEAR.

NEAR is the one that separates a beginner rule from a tuned one. It fires only when two identifiers appear within a set distance (max 1000 characters, measured between the outermost characters), and order doesn’t matter. "Account number NEAR a name" is far more precise than "any 16-digit string" because real leaks have context around them.
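A minimal Python sketch of NEAR semantics, assuming (as the text says) that the distance is measured between the outermost characters of the two matches and that order does not matter. The `near` helper, the `AC[0-9]{4}` account format and the name list are illustrative assumptions, not Netskope internals.

```python
import re

MAX_NEAR = 1000  # NEAR distance cap, in characters

def near(text: str, first: re.Pattern, second: re.Pattern, distance: int) -> bool:
    """True if some match of `first` and some match of `second` fit in a window
    of `distance` characters, in either order.

    The window runs from the first character of the earlier match to the
    last character of the later match (the "outermost characters")."""
    if not 0 < distance <= MAX_NEAR:
        raise ValueError(f"distance must be between 1 and {MAX_NEAR}")
    for a in first.finditer(text):
        for b in second.finditer(text):
            if max(a.end(), b.end()) - min(a.start(), b.start()) <= distance:
                return True
    return False

ACCOUNT = re.compile(r"\bAC[0-9]{4}\b")
NAME = re.compile(r"\b(?:Aarti|Vikram|Meera)\b")  # stand-in for a name identifier

close = "Refund for Vikram, account AC2207, approved."
far_apart = "AC2207 " + "filler " * 200 + "Vikram"
```

Because the check is a window rather than a sequence, swapping the two patterns gives the same answer. A plain AND would also accept `far_apart`, where the two values are over 1,400 characters apart.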

Figure 2 — Pick the detector by the data
Pick the detector by the data: regex for patterns, EDM for exact records, IDM for documents, ML for "looks like". Four ways to detect, each with its strength (+) and weakness (−):
RegEx / dictionary: detects any 9–16 digit string or keyword. + Fast to write. − Noisy: matches random numbers.
EDM (Exact Match): detects your real records from a CSV/DB. + Near-zero false positives. − Needs the source data indexed.
IDM (Indexed Doc): detects a whole-document fingerprint. + Catches copies and excerpts. − Index the docs first.
ML classifier: detects "looks like" a passport or code. + No pattern to write. − Probabilistic, not exact.
Rule of thumb: exact list → EDM · secret document → IDM · file/image type → ML · quick pattern → regex.
Read each column top to bottom: what it detects, its strength (green +) and its weakness (red −). Regex is fast but noisy; EDM and IDM are precise but need indexing first; ML needs no pattern at all.

After detectors comes the Severity Threshold screen, and this is where careers are made or tickets are born. You choose a counting mode — Record (how many hits) or Aggregate Score (a weighted total) — and set occurrence counts for Low / Medium / High / Critical. There’s also a Count only unique record toggle so 100 copies of the same SSN don’t inflate one file to Critical.
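Here is a hedged Python sketch of the two counting modes and the unique-record toggle. The tier names and the Low 1 · Med 5 · High 20 · Crit 50 counts come from this lesson; the entity weights and the `classify` logic (take the highest tier whose count is reached) are assumptions for illustration, and the sketch does not reproduce the platform's tie-break for equal thresholds.

```python
from typing import Dict, List, Optional

TIERS = ("Low", "Medium", "High", "Critical")  # ascending

def record_count(matches: List[str], unique_only: bool) -> int:
    # "Count only unique record": 100 copies of one SSN count as 1.
    return len(set(matches)) if unique_only else len(matches)

def aggregate_score(hits: Dict[str, int], weights: Dict[str, int]) -> int:
    # Hypothetical weighting: each entity's hits times its weight, summed.
    return sum(n * weights.get(entity, 1) for entity, n in hits.items())

def classify(count: int, thresholds: Dict[str, int]) -> Optional[str]:
    # Highest tier whose threshold the count reaches; None if below Low.
    tier = None
    for name in TIERS:
        if count >= thresholds[name]:
            tier = name
    return tier

thresholds = {"Low": 1, "Medium": 5, "High": 20, "Critical": 50}
same_ssn = ["472-11-9931"] * 100

raw = classify(record_count(same_ssn, unique_only=False), thresholds)
deduped = classify(record_count(same_ssn, unique_only=True), thresholds)
weighted = classify(aggregate_score({"SSN": 2, "Name": 3}, {"SSN": 25, "Name": 1}), thresholds)
```

With the toggle off, one SSN pasted 100 times reaches Critical; with it on, the same file is Low. Under the assumed weights, two SSN hits plus three names score 53 and reach Critical on their own.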

Figure 3 — Severity threshold decision tree
Give each severity tier a DISTINCT ascending count; equal thresholds silently default everything to Low. Decision tree: pick a counting mode (Record or Aggregate Score). DISTINCT counts (Low 1 · Med 5 · High 20 · Crit 50) classify each tier correctly ✓. EQUAL counts (High 20 = Crit 20) make the platform default to LOW ✗. Symptom you will see: "Critical alerts never fire; everything lands as Low." Cause: two tiers share a threshold value. Fix: distinct ascending counts, or Aggregate Score with weighted entities. Bonus: "Count only unique record" stops 100 copies of one SSN inflating the count (it clears the preset threshold). Record = how many hits; Aggregate Score = sum of entity weights, so one high-weight match can hit Critical.
The trap to avoid: if two tiers share the same count, the platform silently classifies everything as Low and Critical never fires. Give each tier a distinct, ascending count.
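Because the tie silently downgrades everything, it is worth checking the four counts before you save them. A minimal Python pre-flight check; `threshold_problems` is a hypothetical helper, not a Netskope feature.

```python
from typing import Dict, List

TIERS = ("Low", "Medium", "High", "Critical")

def threshold_problems(thresholds: Dict[str, int]) -> List[str]:
    """Return a message for every adjacent pair of tiers that is tied or
    out of order; an empty list means the counts are distinct and ascending."""
    problems = []
    for lower, higher in zip(TIERS, TIERS[1:]):
        lo, hi = thresholds[lower], thresholds[higher]
        if hi == lo:
            problems.append(f"{lower} and {higher} share {lo}: classification falls back to Low")
        elif hi < lo:
            problems.append(f"{higher} ({hi}) is below {lower} ({lo})")
    return problems

good = {"Low": 1, "Medium": 5, "High": 20, "Critical": 50}
tied = {"Low": 1, "Medium": 5, "High": 20, "Critical": 20}
```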
🖥️ This is the screen you’ll build the rule on: Netskope tenant → Policies → Profiles → DLP → Edit Rules → Data Loss Prevention → New Rule. Real wizard fields. (Recreated for clarity; your tenant matches this.)

tenant.goskope.com · Policies → Profiles → DLP → Edit Rules
1. Name the DLP Rule: Custom - Account# NEAR Name
2. DLP Entity: Postal Addresses (US) + custom regex
3. Advanced Expression: RegEx AC[0-9]{4} NEAR keyword dict
   Global Data Identifier: On (match every CSV column)
   Counting mode: Aggregate Score
4. Severity Threshold: Low 1 · Med 5 · High 20 · Crit 50
Update Rule → Apply Changes
Common mistake — the underscore bypass

Symptom: a user appends characters (Confidential_x) and your keyword rule never fires. Cause: the dictionary/regex requires a word boundary right after the keyword, and because the underscore counts as a word character, "Confidential_x" has no boundary after "Confidential", so the match fails. Fix: harden the regex to allow trailing word characters, or pair the keyword identifier with a NEAR context identifier so the match doesn’t hinge on the bare word.
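The bypass is easy to reproduce with Python's standard `re` module; the keyword and sample strings are illustrative.

```python
import re

strict = re.compile(r"\bConfidential\b", re.IGNORECASE)     # needs a boundary after the word
hardened = re.compile(r"\bConfidential\w*", re.IGNORECASE)  # also consumes trailing word characters

samples = ["Confidential", "Confidential_x", "CONFIDENTIAL-draft", "Confidentiality note"]

strict_hits = [s for s in samples if strict.search(s)]
hardened_hits = [s for s in samples if hardened.search(s)]
```

The strict pattern misses "Confidential_x" (and "Confidentiality"); the hardened one catches all four. That wider net is a trade-off, so test it against real samples before rolling it out.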

One more flag matters for spreadsheets: Global Data Identifier (GDI). Without it, an identifier matches once and stops; with it on, the identifier re-matches every column of a CSV or Excel sheet. Forget GDI on a spreadsheet rule and you’ll catch the first leaked row and miss the other 9,999.
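A simplified Python model of the behaviour described above: a non-global identifier stops after its first hit, while a global one keeps matching cell after cell. This is an illustrative assumption about the flag's effect, not Netskope's scanning code; the SSN pattern and sample sheet are invented.

```python
import csv
import io
import re

SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def count_hits(csv_text: str, global_identifier: bool) -> int:
    hits = 0
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            if SSN.match(cell.strip()):
                hits += 1
                if not global_identifier:
                    return hits  # matched once and stopped
    return hits

sheet = "name,ssn\nAarti,472-11-9931\nVikram,559-30-2210\nMeera,601-77-4015\n"
```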

Quick check · Q2 of 10

Priya at ICICI needs a rule that fires only when an account number appears within 1000 characters of a customer name, in either order. Which operator does the job?

Answer: NEAR. It is the proximity operator: it matches two identifiers within a set distance (≤1000 characters) regardless of order. AND would fire if both appear anywhere in a huge file (less precise); OR fires on either alone; NOT excludes. NEAR is the precision tool for "these two, close together".

Pause & Predict

Predict: your raw 9-digit SSN regex is flooding the SOC with false positives from order numbers and timestamps. Name TWO ways to cut the noise without turning the rule off.

Answer: One: clone the predefined rule and add AND NOT a benign-context dictionary (e.g. terms such as "order" and "ticket"). Two: convert to EDM seeded from the real customer table and raise the Severity Threshold Record count so only files with enough genuine records fire. Bonus: add NEAR a contextual keyword so the digits only count when near words like "SSN" or "account".
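The first and bonus fixes can be combined in a short Python sketch. The exclusion dictionary, the context keywords and the 50-character window are assumptions for illustration. In this sketch AND NOT rejects the whole text if a benign term appears anywhere, which may be coarser than you want in production.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b|\b\d{9}\b")
BENIGN = re.compile(r"\b(?:order|ticket)\b", re.IGNORECASE)          # AND NOT dictionary
CONTEXT = re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE)  # NEAR keywords

def ssn_alert(text: str, window: int = 50) -> bool:
    if BENIGN.search(text):  # AND NOT: benign context present, skip
        return False
    for m in SSN.finditer(text):
        for c in CONTEXT.finditer(text):
            # NEAR: both inside one window, either order
            if max(m.end(), c.end()) - min(m.start(), c.start()) <= window:
                return True
    return False

samples = {
    "Order 559302210 shipped": False,    # excluded by the benign dictionary
    "Batch 472119931 processed": False,  # nine digits, but no SSN context nearby
    "Employee SSN: 472-11-9931": True,   # number plus context keyword
}
results = {text: ssn_alert(text) for text in samples}
```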

③ EDM, IDM & ML — precision detection

Regex is a sledgehammer. For the data that actually gets you fined, Netskope has three precision tools. EDM fingerprints your real structured records. IDM fingerprints whole documents. ML classifiers recognise a file’s type without any pattern at all.

The wedding-gate analogy nails the difference. EDM is the exact printed guest list at the gate — only names that match the real list (your DB of SSNs and account numbers) get flagged, no lookalikes. The ML classifier is the bouncer who recognises "this looks like a passport" even for a document he’s never seen. IDM is the librarian’s fingerprint of every book — even a photocopied chapter (a partial or derivative match) gets recognised.

EDM is the false-positive killer. You upload a UTF-8-encoded CSV or TXT with a header row (8 MB via the UI, up to 160 GB with multipart upload), set each column’s normalization to string or number, then build column groups; columns inside a group are AND-ed during the exact match. The console path: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention > Exact Match. Because EDM matches whole rows, a lone 9-digit number can’t fake a record. That’s the magic. You don’t set a separate "minimum matches" field; instead, you control how many matching records it takes to raise an alert on the Severity Threshold screen (the Record count).
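Netskope does not publish how its EDM index works, so treat this Python sketch as an illustration of the idea only: each source row is stored as hashed, normalized column values, and an outbound text counts as a hit only when every column in the group matches the same row. The SHA-256 hashing, the tokenizer and all names are assumptions, and the sketch ignores proximity.

```python
import csv
import hashlib
import io
import re
from typing import Dict, List, Tuple

def normalize(value: str, kind: str) -> str:
    # "number" columns keep digits only; "string" columns are trimmed and case-folded.
    return re.sub(r"\D", "", value) if kind == "number" else value.strip().casefold()

def digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def build_index(source_csv: str, kinds: Dict[str, str]) -> List[Dict[str, str]]:
    # One entry per source row: column -> hash of the normalized value.
    rows = csv.DictReader(io.StringIO(source_csv))
    return [{col: digest(normalize(row[col], kind)) for col, kind in kinds.items()} for row in rows]

def matching_rows(text: str, index: List[Dict[str, str]], kinds: Dict[str, str],
                  group: Tuple[str, ...]) -> int:
    # Hash every token of the outbound text under each normalization...
    tokens = re.findall(r"[\w-]+", text)
    seen = {kind: {digest(normalize(t, kind)) for t in tokens} for kind in set(kinds.values())}
    # ...and count rows where EVERY column in the group is present (AND).
    return sum(all(row[col] in seen[kinds[col]] for col in group) for row in index)

customers = (
    "name,ssn,account\n"
    "Aarti,472-11-9931,AC8841\n"
    "Vikram,559-30-2210,AC2207\n"
    "Meera,601-77-4015,AC9930\n"
)
kinds = {"name": "string", "ssn": "number", "account": "string"}
group = ("name", "ssn", "account")
index = build_index(customers, kinds)

file_a = "order id 559302210"         # lone number, no row-mates
file_b = "Vikram 559-30-2210 AC2207"  # three fields from the same row
file_c = "Vikram 472-11-9931 AC8841"  # real values, but from different rows
```

File C is the interesting case: every value is genuine, but because they come from different rows, no single row is satisfied. The returned count is what you would compare against the Severity Threshold Record count.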

Figure 4 — EDM matches real rows, not lookalikes
EDM only fires when columns from the SAME real row match together; that is how it kills false positives. A CSV of real customer records (customers.csv: Aarti · 472-11-9931 · AC8841, Vikram · 559-30-2210 · AC2207, Meera · 601-77-4015 · AC9930) is indexed into a fingerprint (UTF-8, header row) with a column group (AND). Outbound file A, "order id 559302210", is a lone number with no row-mates: NO match. Outbound file B, "Vikram 559-30-2210 AC2207", has 3 fields from the SAME row, in proximity, and meets the severity Record count: CRITICAL, fires. Why FP drops: a random 9-digit string can't fake a whole ROW.
A lone number (file A) doesn’t match because it has no row-mates. A name + SSN + account from the SAME indexed row, in proximity (file B), satisfies the column group and fires Critical. Tune sensitivity with the Severity Threshold Record count.

Follow one upload through EDM

Here is a leaked customer file scanned against the EDM index, step by step:

① Index: customers.csv uploaded, header row read, columns set to string/number
② Group: a column group of name AND ssn AND account (AND-ed)
③ Scan: the outbound file "Vikram 559-30-2210 AC2207" hits the same indexed row
④ Verdict: ≥ Severity Record count → Critical → action Block, logged to Incidents

IDM covers the document you can’t reduce to a pattern: a confidential design doc, a board deck. It’s implemented as a fingerprint rule (now moving to Fingerprint Groups): you index the file and set a similarity threshold (default 85%) so IDM recognises copies, renamed versions, and even excerpts (partial/derivative matches).

ML classifiers go further. Netskope ships about 28 prebuilt models: document classifiers such as tax form, resume, patent, NDA, bank statement, offer letter and source code, and image classifiers such as passport, driver’s licence, payment card, social security card, photo ID and screenshot. They live under Policies > Profiles > DLP > File Classifiers. With TYOC (Train Your Own Classifier) you train a model on your own data type from at least 20 positive sample files.

Across the platform, Netskope offers 3,000+ predefined data identifiers over 1,500+ file types, and in April 2025 it released DLP as an API service, DLP On Demand.
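Netskope's fingerprinting algorithm is not described in this lesson, so this Python sketch swaps in a common textbook technique (word 5-gram shingles with a containment score) purely to show why renamed copies and excerpts can still clear an 85% similarity threshold. Every name and the sample text are invented.

```python
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]

def shingles(text: str, k: int = 5) -> Set[Shingle]:
    # Overlapping runs of k lower-cased words.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(candidate: str, fingerprint: Set[Shingle], k: int = 5) -> float:
    # Share of the candidate's shingles that also occur in the indexed document
    # (containment, so a short excerpt can still score 1.0).
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & fingerprint) / len(cand)

THRESHOLD = 0.85  # the default similarity threshold quoted in this lesson

design_doc = (
    "The core network uses two redundant firewalls in active standby mode. "
    "All branch traffic is steered through the Mumbai hub before reaching the data centre. "
    "Management interfaces sit on an isolated out of band VLAN."
)
fingerprint = shingles(design_doc)  # "index the file"

renamed_copy = design_doc  # a rename changes the filename, not the content
excerpt = "All branch traffic is steered through the Mumbai hub before reaching the data centre."
unrelated = "Lunch menu for Friday includes paneer tikka, dal makhani and fresh naan for everyone."
```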

Sneha at Infosys faces this

Sneha, an L1 analyst, is told: "Someone is emailing copies and excerpts of our confidential network-design document to a personal Gmail. Stop it." She reaches for a keyword rule on the doc’s title.

Likely cause

A keyword/regex rule only catches the literal title text — rename the file or paste two paragraphs into an email body and it sails through. The leak is a DOCUMENT, not a pattern, so it needs a document fingerprint (IDM).

Diagnosis

She picks the detector by the data shape: a whole confidential document with partial copies → IDM (fingerprint rule), not regex.

Policies → Profiles → DLP → hover Edit Rules → Rules → New Fingerprint Rule → index the design doc, set similarity threshold 85%
Fix

Index the design document as a fingerprint rule, add it to the DLP profile, and attach the profile to the Gmail-personal-instance CASB inline policy with action Block + User Alert.

Verify

In Incidents → DLP, send a test email with a renamed copy and then just two excerpted paragraphs → both show Last Action: Block tagged with the fingerprint rule name; an unrelated document does not trip it.

Quick check · Q3 of 10

Aditya at Flipkart must detect leaked passport images and stray source-code files — but he has zero regex patterns and the data types vary wildly. Best fit?

Answer: ML classifiers. They recognise a document/image TYPE (passport, source code, screenshot) with no pattern written, which is exactly the job here. Dictionaries and regex need literal patterns; EDM needs a structured record list. For a custom type, TYOC trains a model on your own data.

Pause & Predict

Predict: you have an exact list of 50,000 customer records AND you want to catch leaked copies of one secret design PDF. Which detector for each, and why not just one for both?

Answer: EDM for the 50,000 structured records (it matches whole rows of real data with near-zero false positives); IDM for the design PDF (it fingerprints the document so renamed copies and excerpts are caught). They solve different shapes of data — structured table vs unstructured document — so you use both, not one.

④ Actions, incidents & where it all runs

Detection is half the job; the action is the other half. On the Real-time Protection policy you choose what happens when the profile fires: Alert (log only), Block (stop it inline), User Alert (warn + let the user justify), and for CASB/API channels also Quarantine, Encrypt, and Legal Hold. Inline channels (SWG, CASB inline) can block in real time; API protection acts on data already at rest (quarantine, encrypt, legal hold).

When something fires, you live in Incidents > DLP (plus Incidents > Quarantine and Incidents > Legal Hold). Each incident shows the Last Action (allow/block/alert) and lets you change Status, Assign it, set Severity, and run object actions — Encrypt / Restore / Block / Delete. The Quarantine page even has Contact Owners to nudge the file owner.

Figure 5 — Netskope DLP at a glance
Netskope DLP at a glance: the whole detect-and-act map on one card, as nine tiles.
DLP Rule: one detector (ids + regex + dict)
DLP Profile: rules + classifiers + fingerprints
Predefined / Custom: 3,000+ data identifiers vs admin-built
NEAR (proximity): two terms within ≤1000 characters
EDM: exact records from a CSV/DB
IDM: fingerprint whole documents
ML classifier: passport, source code, screenshots
Severity: Record / Aggregate · Low → Critical
Incidents > DLP: alert · block · quarantine · legal hold
Your one-card map of everything in this lesson — keep it open while you build your first profile, and revisit it before any DLP interview.
🖥️ This is the incident screen you’ll triage on: Netskope tenant → Incidents → DLP. Real fields you act on. (Recreated for clarity; your tenant matches this.)

tenant.goskope.com · Incidents → DLP
1. Rule / Profile: PII-India · EDM customers.csv
2. Severity: Critical
3. Last Action: Block
   Status: New → In Progress
   Assign: soc-tier2@infosys
4. Object action: Quarantine → Apply
Pro tip — the mental model that sticks

For any DLP task ask two questions: (1) what shape is the data? pattern → regex/dictionary, exact records → EDM, whole document → IDM, file type → ML; (2) what should happen? just watch → Alert, stop it now → Block (inline), it’s already stored → Quarantine/Encrypt/Legal Hold (API). Almost every DLP config maps onto that grid.
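The two-question grid fits in a lookup table. A minimal Python sketch; the dictionary keys are this lesson's labels, not Netskope settings, and `plan` is a hypothetical helper.

```python
from typing import Tuple

# Question 1: what shape is the data?
DETECTOR = {
    "pattern": "RegEx / dictionary",
    "exact records": "EDM",
    "whole document": "IDM",
    "file type": "ML classifier",
}

# Question 2: what should happen?
RESPONSE = {
    "just watch": "Alert",
    "stop it now": "Block (inline: SWG / CASB inline)",
    "already stored": "Quarantine / Encrypt / Legal Hold (API)",
}

def plan(data_shape: str, goal: str) -> Tuple[str, str]:
    # Question 1 picks the detector, question 2 picks the action.
    return DETECTOR[data_shape], RESPONSE[goal]

pricing_pdf = plan("whole document", "stop it now")
ssn_table = plan("exact records", "already stored")
```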

Common mistake — the steering blind spot

Symptom: a locally-installed Postman uploads company data and DLP fires nothing, though it shows up in Skope IT. Cause: the payload was never inspected inline — it’s an API/protocol blind spot. Fix: steer that traffic through inline DLP (SWG/CASB inline) and check the app isn’t sitting on a steering bypass. No detector can fire on traffic the engine never sees.

Prove you’ve got the DLP model

Take any real ask — "stop staff emailing copies of our confidential pricing PDF to personal Gmail" — and name the detector (IDM), the channel (CASB inline + API), the action (Block + Alert / Quarantine), the severity setup (distinct tiers), and where you’d review it (Incidents → DLP). If you can do that cold, you’re ready for the cert and the SOC floor.

Quick check · Q4 of 10

A sensitive file was shared in OneDrive last week — it’s already at rest, not in transit. The business wants it pulled and the owner notified. Which action and channel?

Answer: Quarantine via API protection. Data already at rest in a SaaS app is handled out-of-band by API protection: you Quarantine it and use Contact Owners to notify the owner. Inline Block only stops traffic in transit (this leak already happened); regex severity and NEAR are detection tuning, not response actions for stored data.


📝 Wrap-up assessment — six more

You've answered 4 inline; six remain. The pass mark is 70% (7 of 10).

Q5 · Remember

In Netskope DLP, which object do you attach to a Real-time Protection policy?

Answer: a DLP profile (the collection of rules + classifiers + fingerprints) is what attaches to a policy. A lone identifier or regex must first live inside a rule, and the rule inside a profile. A severity threshold is a setting on a rule, not an attachable object.
Q6 · Apply

You have your company’s exact list of 80,000 employee bank-account numbers and must stop only those real numbers leaking, with almost no false positives. What do you build?

Answer: EDM. It fingerprints the real records so only genuine account numbers (matched as rows) fire, and the Severity Threshold Record count tunes how many records it takes to alert. A dictionary catches the word, not the numbers; a 16-digit regex flags any 16 digits (huge FP); an ML screenshot classifier is the wrong data type.
Q7 · Apply

Priya needs a rule that fires only when an account ID sits within 1000 characters of a customer name, in any order, to cut noise. Which operator?

Answer: NEAR. It enforces proximity (≤1000 characters) regardless of order, which is precisely "these two, close together". OR fires on either term alone; a file-wide AND is less precise (the two could be pages apart); NOT excludes rather than requires.
Q8 · Analyze

A team reports: "Our Critical DLP alerts never trigger — every credit-card hit lands as Low, even big leaks." Steering and SSL inspection look fine. Most likely root cause?

Answer: two severity tiers share the same threshold value. Netskope’s documented tie-break: if two tiers share a threshold value, classification defaults to Low and Critical never fires. An old Client wouldn’t cause "all Low"; an oversized EDM file errors on upload; an unattached profile would fire nothing at all, not "everything Low".
Q9 · Analyze

A locally-installed Postman uploads a customer database to an external API. Skope IT shows the event, but no DLP incident is raised. Why, and what’s the fix?

Answer: the content was never inspected inline. Skope IT logging without a DLP hit points to an API/protocol blind spot. Steering that traffic through inline DLP (and confirming there is no bypass) is the fix. Severity tuning, detector choice and "just wait" don’t address an uninspected flow.
Q10 · Evaluate

Two designs to protect a confidential design document plus an exact list of 50,000 customer records: (A) one big regex profile for both; (B) IDM for the document + EDM for the records, each with distinct severity tiers. Which is stronger and why?

Answer: design B. A document and a structured record list are different data shapes; IDM fingerprints the document (catching renames and excerpts) and EDM fingerprints the records (whole-row matching, near-zero FP). Distinct severity tiers avoid the default-to-Low trap. A single regex floods the SOC and misses renamed documents entirely.

🧠 In your own words

In one line: why does EDM produce far fewer false positives than a 9-digit SSN regex? Then compare to the expert version.

Expert version: Because EDM matches whole rows of your real indexed records (name + SSN + account together), so a random 9-digit string with no matching row-mates can’t trigger it — whereas a bare regex flags any 9-digit number anywhere.

🗣 Teach a friend

Best way to lock it in: explain it in one line to a teammate.

📖 Glossary

Data identifier (DLP entity)
The pattern a rule looks for — predefined by Netskope or custom-built by you.
DLP rule
One detector combining identifiers, regex, dictionaries and exact-match logic. The ingredient.
DLP profile
A collection of rules + classifiers + fingerprint rules — the object you attach to a policy. The recipe.
Keyword dictionary
An uploaded list of terms (e.g. "Confidential", a project codename) the rule matches against.
NEAR (proximity)
Operator matching two terms within N characters (≤1000), regardless of order: precision over a bare pattern.
Global Data Identifier (GDI)
Flag that makes an identifier re-match every subsequent object/column — essential for CSV/Excel scanning.
EDM (Exact Data Match)
Fingerprints structured PII from a UTF-8 CSV/TXT; column groups are AND-ed; tune sensitivity via the Severity Threshold Record count.
IDM (Indexed Document Match)
Fingerprints whole documents so copies, renames and excerpts (partial/derivative content) are caught.
ML classifier
One of ~28 predefined models recognising a document/image TYPE (source code, tax form, passport, screenshot) with no pattern; lives under Policies > Profiles > DLP > File Classifiers.
TYOC
Train Your Own Classifier — a customer-trained ML model for a custom data type Netskope doesn’t ship; needs at least 20 positive sample files.
Fingerprint rule (IDM)
Indexed Document Match: created via Policies > Profiles > DLP > Edit Rules > Rules > New Fingerprint Rule (moving to Fingerprint Groups); similarity threshold default 85% catches excerpts.
Severity threshold
Record vs Aggregate Score, with Low/Medium/High/Critical counts; equal tiers default to Low; "Count only unique record" clears the preset threshold.
DLP On Demand
Netskope’s April-2025 release of DLP as an API service, callable outside the inline path; 3,000+ data identifiers over 1,500+ file types.

📚 Sources

  1. Netskope Docs — “DLP Rules” (rule wizard: entity, advanced expressions, scan options, severity, name). docs.netskope.com/en/netskope-help/data-security/data-loss-prevention/dlp-rules/
  2. Netskope Docs — “DLP Profiles” (profile = collection of rules + classifiers + custom fingerprint rules; predefined profiles mapped to PCI, PHI, PII). docs.netskope.com/en/dlp-profiles/
  3. Netskope Docs — “Use Advanced Expressions” (AND/OR/NOT, NEAR max 1000 characters, Global Data Identifier) & “Select an Exact Match File” (EDM: UTF-8 CSV/TXT, 8 MB UI / 160 GB multipart, column groups AND-ed). docs.netskope.com/en/use-advanced-expressions/ · docs.netskope.com/en/select-an-exact-match-file/
  4. Netskope Docs — “Select a Severity Threshold” (Record vs Aggregate Score; Count only unique record; equal tiers default to Low). docs.netskope.com/en/select-a-severity-threshold/
  5. Netskope press release & Help Net Security — “Netskope One DLP On Demand” (Apr 2025, DLP-as-API; 3,000+ data identifiers over 1,500+ file types, Train Your Own Classifier). netskope.com/press-releases · helpnetsecurity.com/2025/04/08/netskope-one-dlp-on-demand/
  6. Netskope Docs — “File Classifiers” (28 predefined ML classifiers; TYOC needs ≥20 positive files) & “Create Fingerprint Rules” (IDM via fingerprint rules → Fingerprint Groups; similarity threshold default 85%). docs.netskope.com/en/file-classifiers/ · docs.netskope.com/en/create-fingerprint-rules/
  7. Netskope NSK200 (NCCSI) exam blueprint — DLP domain: build/tune profiles, custom regex, EDM vs IDM vs ML classifiers. pass4success.com/netskope/exam/nsk200

What's next?

You can now build detection that catches real leaks. Next we go deeper into Private Access (ZTNA) — giving users per-app access to internal apps without a VPN.