Most engineers think DLP is "write a regex for credit cards and SSNs, switch it on, done." So they ship a raw pattern and call it a data-protection programme.
Wrong — and the helpdesk will hate you. A naked regex matches any 9–16 digit string, so order numbers, ticket IDs and timestamps all trip it. Real Netskope DLP is choosing the right detector (EDM for exact records, IDM for documents, ML for file types) and tuning severity thresholds + proximity (NEAR) + column groups so the alerts you get are the leaks you care about, not noise.
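To make the noise problem concrete, here is a minimal Python sketch of a naked digit pattern run over ordinary business text. It is not Netskope's engine; the pattern and the sample lines are invented for illustration, and none of the samples contains a card number or SSN.

```python
import re

# A "naked" pattern: any standalone run of 9 to 16 digits.
NAKED_DIGITS = re.compile(r"\b\d{9,16}\b")

# Invented sample lines with no sensitive data in them.
samples = [
    "Order 4829105536 shipped on Tuesday",
    "Ticket ID 100234567 reopened by helpdesk",
    "Event logged at epoch 1712563200",
]

# Every hit here is a false positive.
false_positives = [
    m.group() for line in samples for m in NAKED_DIGITS.finditer(line)
]
print(false_positives)
```

All three lines trip the pattern, which is the alert flood the helpdesk ends up triaging.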
① Rule vs profile vs identifier — the three words people mix up
Three Netskope words get used interchangeably and shouldn’t be. A data identifier (Netskope also calls it a DLP entity) is the raw pattern — "a credit-card number", "an India PAN". A DLP rule is one detector that combines identifiers with regex, dictionaries and exact-match logic. A DLP profile is the container of several rules plus classifiers and fingerprints — and the profile is what you actually attach to a policy.
The analogy that sticks: a rule is an ingredient, a profile is the recipe. One ingredient (a regex, a dictionary) does nothing on the plate alone. The recipe — several rules plus ML classifiers plus document fingerprints — is what you "cook" inside a Real-time Protection policy. You never attach a bare rule to a policy; you attach the profile.
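The identifier, rule, profile and policy relationship can be sketched as a small data model. The class and field names below are hypothetical and do not mirror any Netskope API; the only point is that the policy accepts a profile and rejects a bare rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataIdentifier:
    name: str  # e.g. "India PAN"


@dataclass
class DlpRule:
    name: str
    identifiers: list[DataIdentifier]


@dataclass
class DlpProfile:
    name: str
    rules: list[DlpRule] = field(default_factory=list)
    classifiers: list[str] = field(default_factory=list)
    fingerprints: list[str] = field(default_factory=list)


@dataclass
class RealTimePolicy:
    name: str
    profile: DlpProfile

    def __post_init__(self):
        # Mirrors the lesson's rule: a policy takes a profile, never a bare rule.
        if not isinstance(self.profile, DlpProfile):
            raise TypeError("attach a DlpProfile, not a bare rule")


pan_rule = DlpRule("pan-rule", [DataIdentifier("India PAN")])
pan_profile = DlpProfile("pan-profile", rules=[pan_rule])
policy = RealTimePolicy("block-pan-uploads", pan_profile)
```

Passing `pan_rule` directly to `RealTimePolicy` raises `TypeError` in this sketch, which is the modelled equivalent of a rule not being selectable in the policy screen.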
Build it once and it works everywhere. The same DLP profile that blocks an upload over the SWG also fires on a Salesforce attachment via CASB, and on a file already sitting in OneDrive via API protection. Netskope ships predefined profiles with rule sets mapped to well-known compliance regulations — PCI (payment-card), PHI (health) and PII — so you rarely start from a blank page.
Four words you’ll use every day
These are the vocabulary anchors for the whole lesson.
The raw pattern a rule looks for (a card number, a PAN). Predefined by Netskope or custom-built. The smallest Lego brick of DLP.
One detector: identifiers + regex + dictionary + exact-match, combined with operators. The ingredient, not the dish.
A collection of rules + classifiers + fingerprint rules. This is the object you attach to a policy. The recipe.
The same profile enforces across SWG, CASB inline and API. Write detection once; reuse it everywhere. So: build the profile, not 3 copies.
Karthik at Wipro asks: "I built a perfect regex rule for India PAN numbers — why won’t my Real-time Protection policy let me select it?" What did he miss?
Pause & Predict
Predict: if the SAME profile fires on a web upload, a Salesforce attachment AND a file already in OneDrive, what does that tell you about WHERE the DLP engine runs?
② Custom rules: regex, dictionaries, NEAR & severity
Predefined identifiers cover the common stuff. The day you need something company-specific — an internal project codename, a customer-account format — you build a custom rule. The console path is worth memorising: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention, then New Rule. The wizard walks you through entity → advanced options → scan options → severity threshold → name.
A custom rule has four kinds of detector you can mix: predefined identifiers, your own RegEx expressions, a keyword dictionary, and exact-match criteria. You glue them together with operators on the Advanced Options screen: AND, OR, NOT, and the proximity operator NEAR.
NEAR is the one that separates a beginner rule from a tuned one. It fires only when two identifiers appear within a set distance (max 1000 characters, measured between the outermost characters), and order doesn’t matter. "Account number NEAR a name" is far more precise than "any 16-digit string" because real leaks have context around them.
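A rough Python sketch of the proximity idea follows. It is an illustration under stated assumptions, not Netskope's implementation: the function name `near`, the two patterns and the sample names are hypothetical, and the distance is assumed to run from the first character of the earlier match to the last character of the later one.

```python
import re
from itertools import product


def near(text, pattern_a, pattern_b, max_chars=1000):
    """True if a match of each pattern falls within max_chars, in either order."""
    spans_a = [m.span() for m in re.finditer(pattern_a, text)]
    spans_b = [m.span() for m in re.finditer(pattern_b, text)]
    for (a_start, a_end), (b_start, b_end) in product(spans_a, spans_b):
        # Assumed measurement: outermost characters of the two matches.
        if max(a_end, b_end) - min(a_start, b_start) <= max_chars:
            return True
    return False


ACCOUNT = r"\b\d{16}\b"                     # hypothetical account format
NAME = r"\b(?:Priya Sharma|Karthik Rao)\b"  # hypothetical customer names

close = "Customer Priya Sharma, account 4000123412341234, balance due."
reversed_order = "Account 4000123412341234 belongs to Karthik Rao."
far = "Priya Sharma called. " + "x" * 1200 + " ref 4000123412341234"
no_context = "Order 4000123412341234 dispatched."

print(near(close, ACCOUNT, NAME), near(far, ACCOUNT, NAME))
```

The 16-digit string alone (`no_context`) and the pair separated by more than 1000 characters (`far`) do not fire, while both orderings of a nearby pair do.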
After detectors comes the Severity Threshold screen, and this is where careers are made or tickets are born. You choose a counting mode — Record (how many hits) or Aggregate Score (a weighted total) — and set occurrence counts for Low / Medium / High / Critical. There’s also a Count only unique record toggle so 100 copies of the same SSN don’t inflate one file to Critical.
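The two counting modes and the unique-record toggle can be sketched in a few lines of Python. The function names, weights and threshold numbers are hypothetical, chosen only to show the arithmetic; they are not Netskope defaults.

```python
TIERS = ("Low", "Medium", "High", "Critical")


def tier_for(value, thresholds):
    """Return the highest tier whose threshold the value reaches, else None."""
    level = None
    for tier in TIERS:
        if value >= thresholds[tier]:
            level = tier
    return level


def record_count(hits, unique_only=False):
    """Record mode: count hits, optionally counting each distinct value once."""
    return len(set(hits)) if unique_only else len(hits)


def aggregate_score(hit_counts, weights):
    """Aggregate Score mode: weighted total across identifiers."""
    return sum(weights[name] * n for name, n in hit_counts.items())


# Hypothetical thresholds for illustration only.
THRESHOLDS = {"Low": 1, "Medium": 5, "High": 20, "Critical": 100}

same_ssn = ["123-45-6789"] * 100
print(tier_for(record_count(same_ssn), THRESHOLDS))
print(tier_for(record_count(same_ssn, unique_only=True), THRESHOLDS))
```

With the toggle off, 100 copies of one SSN reach the Critical count; with it on, the same file counts as a single record and lands in Low.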
Symptom: a user appends a character (Confidential_x) and your keyword rule never fires. Cause: the dictionary/regex expects a word boundary right after the keyword, and the appended _x are word characters, so no boundary exists there and the match fails. Fix: harden the regex to allow trailing word characters, or pair the keyword identifier with a NEAR context identifier so the match doesn’t hinge on the bare word.
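The word-boundary behaviour is easy to reproduce with Python's `re` module, used here as a stand-in; Netskope's regex flavour may differ in detail. The two patterns are illustrative, not taken from a Netskope rule.

```python
import re

strict = re.compile(r"\bConfidential\b")    # requires a boundary after the word
hardened = re.compile(r"\bConfidential\w*")  # tolerates trailing word characters

text = "Filename: Confidential_x.docx"

# "_" counts as a word character, so there is no boundary after "Confidential".
strict_hit = strict.search(text)
hardened_hit = hardened.search(text)

print(strict_hit)
print(hardened_hit.group())
```

The strict pattern misses the evasive filename, while the hardened one captures `Confidential_x`.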
One more flag matters for spreadsheets: Global Data Identifier (GDI). Without it, an identifier matches once and stops; with it on, the identifier re-matches every column of a CSV or Excel sheet. Forget GDI on a spreadsheet rule and you’ll catch the first leaked row and miss the other 9,999.
Priya at ICICI needs a rule that fires only when an account number appears within 1000 characters of a customer name, in either order. Which operator does the job?
Pause & Predict
Predict: your raw 9-digit SSN regex is flooding the SOC with false positives from order numbers and timestamps. Name TWO ways to cut the noise without turning the rule off.
③ EDM, IDM & ML — precision detection
Regex is a sledgehammer. For the data that actually gets you fined, Netskope has three precision tools. EDM fingerprints your real structured records. IDM fingerprints whole documents. ML classifiers recognise a file’s type without any pattern at all.
The wedding-gate analogy nails the difference. EDM is the exact printed guest list at the gate — only names that match the real list (your DB of SSNs and account numbers) get flagged, no lookalikes. The ML classifier is the bouncer who recognises "this looks like a passport" even for a document he’s never seen. IDM is the librarian’s fingerprint of every book — even a photocopied chapter (a partial or derivative match) gets recognised.
EDM is the false-positive killer. You upload a UTF-8-encoded CSV or TXT with a header row (8 MB via the UI, up to 160 GB with multipart upload), set each column’s normalization to string or number, then build column groups — and columns inside a group are AND-ed during the exact match. The console path: Policies > Profiles > DLP > Edit Rules > Data Loss Prevention > Exact Match. Because EDM matches whole rows, a lone 9-digit number can’t fake a record — that’s the magic. You don’t set a separate "minimum matches" field; you control how many records before it alerts on the Severity Threshold screen (the Record count).
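The row-matching idea can be sketched in Python as below. This is a toy model under several assumptions: SHA-256 hashing stands in for whatever fingerprinting Netskope actually uses, tokenisation is a naive whitespace split, and the function names, CSV contents and column names are hypothetical.

```python
import csv
import hashlib
import io


def normalize(value, kind):
    """'number' keeps digits only; 'string' trims and lowercases."""
    if kind == "number":
        return "".join(ch for ch in value if ch.isdigit())
    return value.strip().lower()


def fingerprint(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_index(csv_text, column_kinds, column_group):
    """One tuple of hashes per record, covering the columns in the group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        tuple(fingerprint(normalize(row[c], column_kinds[c])) for c in column_group)
        for row in reader
    }


def count_matching_records(document, index, column_kinds):
    """A record counts only if every grouped column appears in the document (AND)."""
    tokens = set()
    for tok in document.split():
        for kind in set(column_kinds.values()):
            value = normalize(tok, kind)
            if value:  # skip tokens that normalise to nothing
                tokens.add(fingerprint(value))
    return sum(all(h in tokens for h in record) for record in index)


CSV_TEXT = "last_name,ssn\nSharma,123-45-6789\nRao,987-65-4321\n"
KINDS = {"last_name": "string", "ssn": "number"}
index = build_index(CSV_TEXT, KINDS, ["last_name", "ssn"])

leak = "Payroll export: Sharma 123-45-6789 approved"
lookalike = "Order 123456789 shipped to warehouse"
print(count_matching_records(leak, index, KINDS))
print(count_matching_records(lookalike, index, KINDS))
```

The lookalike line contains the same nine digits as a real SSN but no matching surname, so the AND across the column group rejects it; the resulting record count is what a severity threshold would then be applied to.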
IDM covers the document you can’t reduce to a pattern: a confidential design doc, a board deck. It’s implemented as a fingerprint rule (now moving to Fingerprint Groups): you index the file and set a similarity threshold (default 85%) so IDM recognises copies, renamed versions, and even excerpts (partial/derivative matches).
ML classifiers go further. Netskope ships ~28 prebuilt models: document classifiers such as tax form, resume, patent, NDA, bank statement, offer letter and source code, and image classifiers such as passport, driver’s licence, payment card, social security card, photo ID and screenshot. They live under Policies > Profiles > DLP > File Classifiers. With TYOC (Train Your Own Classifier) you train a model on your own data type (at least 20 positive sample files).
Across the platform Netskope offers 3,000+ predefined data identifiers over 1,500+ file types, and in April 2025 it released DLP as an API service, DLP On Demand.
Sneha at Infosys faces this
Sneha, an L1 analyst, is told: "Someone is emailing copies and excerpts of our confidential network-design document to a personal Gmail. Stop it." She reaches for a keyword rule on the doc’s title.
A keyword/regex rule only catches the literal title text — rename the file or paste two paragraphs into an email body and it sails through. The leak is a DOCUMENT, not a pattern, so it needs a document fingerprint (IDM).
She picks the detector by the data shape: a whole confidential document with partial copies → IDM (fingerprint rule), not regex.
Policies → Profiles → DLP → hover Edit Rules → Rules → New Fingerprint Rule → index the design doc, set similarity threshold 85%. Index the design document as a fingerprint rule, add it to the DLP profile, and attach the profile to the Gmail-personal-instance CASB inline policy with action Block + User Alert.
In Incidents → DLP, send a test email with a renamed copy and then just two excerpted paragraphs → both show Last Action: Block tagged with the fingerprint rule name; an unrelated document does not trip it.
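Why a fingerprint survives a rename or an excerpt while a title keyword does not can be illustrated with word shingles. This is a simplified sketch, not Netskope's algorithm: the shingle size, the containment measure and the sample texts are all assumptions, and only the 85% threshold comes from the lesson.

```python
def shingles(text, size=5):
    """Set of overlapping word n-grams (shingles) from the text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + size])
        for i in range(max(len(words) - size + 1, 1))
    }


def containment(candidate, indexed, size=5):
    """Fraction of the candidate's shingles that also occur in the indexed doc."""
    cand = shingles(candidate, size)
    return len(cand & shingles(indexed, size)) / len(cand)


THRESHOLD = 0.85  # the default similarity threshold quoted in the lesson

design_doc = (
    "The core network uses two redundant firewalls in each data centre. "
    "Branch offices connect over encrypted tunnels that terminate on the edge routers. "
    "Management traffic is isolated on a dedicated VLAN with strict access control lists. "
    "Backup links fail over automatically when the primary circuit drops for more than five seconds."
)

# An excerpt: the two middle sentences pasted into an email body.
excerpt = " ".join(design_doc.split()[11:36])
unrelated = (
    "The cafeteria menu for next week includes three vegetarian options "
    "and a new dessert counter."
)

print(containment(excerpt, design_doc) >= THRESHOLD)
print(containment(unrelated, design_doc) >= THRESHOLD)
```

The comparison looks only at content, so a renamed copy scores the same as the original, and an excerpt scores high because nearly all of its shingles exist in the indexed document.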
Aditya at Flipkart must detect leaked passport images and stray source-code files — but he has zero regex patterns and the data types vary wildly. Best fit?
Pause & Predict
Predict: you have an exact list of 50,000 customer records AND you want to catch leaked copies of one secret design PDF. Which detector for each, and why not just one for both?
④ Actions, incidents & where it all runs
Detection is half the job; the action is the other half. On the Real-time Protection policy you choose what happens when the profile fires: Alert (log only), Block (stop it inline), User Alert (warn + let the user justify), and for CASB/API channels also Quarantine, Encrypt, and Legal Hold. Inline channels (SWG, CASB inline) can block in real time; API protection acts on data already at rest (quarantine, encrypt, legal hold).
When something fires, you live in Incidents > DLP (plus Incidents > Quarantine and Incidents > Legal Hold). Each incident shows the Last Action (allow/block/alert) and lets you change Status, Assign it, set Severity, and run object actions — Encrypt / Restore / Block / Delete. The Quarantine page even has Contact Owners to nudge the file owner.
For any DLP task ask two questions: (1) what shape is the data? pattern → regex/dictionary, exact records → EDM, whole document → IDM, file type → ML; (2) what should happen? just watch → Alert, stop it now → Block (inline), it’s already stored → Quarantine/Encrypt/Legal Hold (API). Almost every DLP config maps onto that grid.
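The two-question grid can be written down as a pair of lookup tables. The dictionary keys and the `plan` helper are hypothetical names for illustration; the mappings themselves restate the grid above, with the channel for Alert left as "any" as an assumption, since the grid does not tie it to one channel.

```python
DETECTOR_BY_SHAPE = {
    "pattern": "regex/dictionary",
    "exact records": "EDM",
    "whole document": "IDM",
    "file type": "ML classifier",
}

# (action, channel) per goal; "any" for Alert is an assumption.
ACTION_BY_GOAL = {
    "just watch": ("Alert", "any"),
    "stop it now": ("Block", "inline"),
    "already stored": ("Quarantine/Encrypt/Legal Hold", "API"),
}


def plan(shape, goal):
    """Answer both questions for one DLP task."""
    action, channel = ACTION_BY_GOAL[goal]
    return {
        "detector": DETECTOR_BY_SHAPE[shape],
        "action": action,
        "channel": channel,
    }


print(plan("whole document", "stop it now"))
```

An unknown shape or goal raises `KeyError` here, which is a reasonable prompt to go back and ask what the data actually looks like before picking a detector.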
Symptom: a locally-installed Postman uploads company data and DLP fires nothing, though it shows up in Skope IT. Cause: the payload was never inspected inline — it’s an API/protocol blind spot. Fix: steer that traffic through inline DLP (SWG/CASB inline) and check the app isn’t sitting on a steering bypass. No detector can fire on traffic the engine never sees.
Take any real ask — "stop staff emailing copies of our confidential pricing PDF to personal Gmail" — and name the detector (IDM), the channel (CASB inline + API), the action (Block + Alert / Quarantine), the severity setup (distinct tiers), and where you’d review it (Incidents → DLP). If you can do that cold, you’re ready for the cert and the SOC floor.
A sensitive file was shared in OneDrive last week — it’s already at rest, not in transit. The business wants it pulled and the owner notified. Which action and channel?
🧠 In your own words
In one line, why does EDM produce far fewer false positives than a 9-digit SSN regex?
📖 Glossary
- Data identifier (DLP entity): The pattern a rule looks for, predefined by Netskope or custom-built by you.
- DLP rule: One detector combining identifiers, regex, dictionaries and exact-match logic. The ingredient.
- DLP profile: A collection of rules + classifiers + fingerprint rules; the object you attach to a policy. The recipe.
- Keyword dictionary: An uploaded list of terms (e.g. "Confidential", a project codename) the rule matches against.
- NEAR (proximity): Operator matching two terms within N characters (≤1000), regardless of order; precision over a bare pattern.
- Global Data Identifier (GDI): Flag that makes an identifier re-match every subsequent object/column; essential for CSV/Excel scanning.
- EDM (Exact Data Match): Fingerprints structured PII from a UTF-8 CSV/TXT; column groups are AND-ed; tune sensitivity via the Severity Threshold Record count.
- IDM (Indexed Document Match): Fingerprints whole documents so copies, renames and excerpts (partial/derivative content) are caught.
- ML classifier: One of ~28 predefined models recognising a document/image TYPE (source code, tax form, passport, screenshot) with no pattern; lives under Policies > Profiles > DLP > File Classifiers.
- TYOC: Train Your Own Classifier, a customer-trained ML model for a custom data type Netskope doesn’t ship; needs at least 20 positive sample files.
- Fingerprint rule (IDM): The rule type that implements Indexed Document Match, created via Policies > Profiles > DLP > Edit Rules > Rules > New Fingerprint Rule (moving to Fingerprint Groups); the similarity threshold (default 85%) catches excerpts.
- Severity threshold: Record vs Aggregate Score, with Low/Medium/High/Critical counts; equal tiers default to Low; "Count only unique record" clears the preset threshold.
- DLP On Demand: Netskope’s April-2025 release of DLP as an API service, callable outside the inline path; 3,000+ data identifiers over 1,500+ file types.
📚 Sources
- Netskope Docs — “DLP Rules” (rule wizard: entity, advanced expressions, scan options, severity, name). docs.netskope.com/en/netskope-help/data-security/data-loss-prevention/dlp-rules/
- Netskope Docs — “DLP Profiles” (profile = collection of rules + classifiers + custom fingerprint rules; predefined profiles mapped to PCI, PHI, PII). docs.netskope.com/en/dlp-profiles/
- Netskope Docs — “Use Advanced Expressions” (AND/OR/NOT, NEAR max 1000 characters, Global Data Identifier) & “Select an Exact Match File” (EDM: UTF-8 CSV/TXT, 8 MB UI / 160 GB multipart, column groups AND-ed). docs.netskope.com/en/use-advanced-expressions/ · docs.netskope.com/en/select-an-exact-match-file/
- Netskope Docs — “Select a Severity Threshold” (Record vs Aggregate Score; Count only unique record; equal tiers default to Low). docs.netskope.com/en/select-a-severity-threshold/
- Netskope press release & Help Net Security — “Netskope One DLP On Demand” (Apr 2025, DLP-as-API; 3,000+ data identifiers over 1,500+ file types, Train Your Own Classifier). netskope.com/press-releases · helpnetsecurity.com/2025/04/08/netskope-one-dlp-on-demand/
- Netskope Docs — “File Classifiers” (28 predefined ML classifiers; TYOC needs ≥20 positive files) & “Create Fingerprint Rules” (IDM via fingerprint rules → Fingerprint Groups; similarity threshold default 85%). docs.netskope.com/en/file-classifiers/ · docs.netskope.com/en/create-fingerprint-rules/
- Netskope NSK200 (NCCSI) exam blueprint — DLP domain: build/tune profiles, custom regex, EDM vs IDM vs ML classifiers. pass4success.com/netskope/exam/nsk200
What's next?
You can now build detection that catches real leaks. Next we go deeper into Private Access (ZTNA) — giving users per-app access to internal apps without a VPN.