Common interview slip
Many candidates treat Splunk as 'just a log search tool', blur universal forwarders with heavy forwarders, or describe RBA as just another alert rule. Each slip costs marks.
Universal forwarders are lightweight shippers: they forward raw, unparsed data with very little CPU or memory overhead, which makes them suitable for every endpoint. Heavy forwarders parse, filter, mask and route events before they reach the indexer, which is useful for compliance filtering or routing at the collection tier. Risk-Based Alerting is not a single detection rule. It is a risk-score accumulation system: each detection contributes points to an entity's risk object (user, system, IP), and a Risk Notable Event fires only when the cumulative score crosses a threshold, which sharply reduces alert volume while surfacing genuinely risky behaviour. These distinctions are exactly what Splunk interviewers probe.
① Architecture & Forwarders — the pipeline and deployment components
Q: Describe the Splunk architecture — what are the three tiers?
Model answer: Splunk is a three-tier distributed pipeline. Forwarders sit on the data sources (servers, firewalls, endpoints) and ship data to the indexing tier. Indexers receive the data stream, run it through the parsing and indexing pipeline (line breaking, timestamp extraction, assignment of default metadata fields such as host, source and sourcetype), write the events to time-series buckets, and store them. Search heads accept the user's SPL query, distribute it to all indexers (distributed search), then collect and merge the partial results. The clean one-liner: forwarders collect, indexers store, search heads query.
Q: Universal forwarder vs heavy forwarder — when do you use each?
Model answer: A universal forwarder (UF) is a lightweight agent. It does not parse events (the main exception is structured data such as CSV or JSON configured for indexed extractions), uses very little CPU or memory, and ships raw data to the indexer, optionally compressed. It is the default choice for every endpoint and server because its footprint is tiny. A heavy forwarder (HF) is a full Splunk Enterprise installation configured only to forward. It runs the complete parsing pipeline, so it can filter, transform, mask (for PII), route to different indexes and apply index-time transforms (props.conf and transforms.conf) before data reaches the indexer. Use an HF when you need to scrub sensitive fields before indexing, route by source type, or do compliance-driven data reduction at the collection tier. The interview rule: UF for volume/scale, HF for pre-processing.
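To make the "HF for pre-processing" idea concrete, here is a minimal Python sketch of the three things a heavy forwarder is typically asked to do: filter, mask and route. It is a toy model, not Splunk configuration. The card-number pattern, the `level` field and the index names `pci` and `main` are assumptions made up for illustration.

```python
import re

# Hypothetical pattern: any 16-digit run is treated as a card number.
CARD_RE = re.compile(r"\b\d{12}(\d{4})\b")

def preprocess(event: dict):
    """Toy model of heavy-forwarder pre-processing: filter, mask, route."""
    # Filter: drop debug noise so it never reaches the indexer.
    if event.get("level") == "DEBUG":
        return None
    # Mask: keep only the last four digits of card-like numbers.
    masked = CARD_RE.sub(lambda m: "X" * 12 + m.group(1), event["raw"])
    # Route: pick a target index by sourcetype (index names are made up).
    index = "pci" if event.get("sourcetype") == "payments" else "main"
    return {"index": index, "raw": masked}
```

A universal forwarder, by contrast, would pass the raw event through untouched and leave all of this to the indexer.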
Q: Explain the indexer bucket lifecycle — hot, warm, cold, frozen.
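Model answer: When an indexer writes new events it opens a hot bucket, which is actively written to and searchable. When the bucket hits its size or time-span limit (or the indexer restarts), it rolls to warm: read-only, recent data, still on fast storage. When the index exceeds its warm bucket limit (maxWarmDBCount), the oldest warm buckets roll to cold: read-only, older data, typically on cheaper or slower storage. When a bucket exceeds the retention period (frozenTimePeriodInSecs) or the index exceeds its maximum size, the oldest buckets are frozen: Splunk deletes them by default, or archives them to a directory or script you specify (coldToFrozenDir or coldToFrozenScript). SmartStore is a separate feature that keeps warm buckets in remote object storage such as S3; it is not the frozen archive. Hot, warm and cold buckets are all searchable; frozen data is not, and archived buckets must be restored to the thawed directory and rebuilt before they can be searched again. The lifecycle drives storage tiering and is the answer to 'how does Splunk manage disk?'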
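The lifecycle is easiest to remember as a one-way state machine. The sketch below is a toy model in Python; the trigger names are simplified labels chosen for this example, not Splunk setting names.

```python
# Toy state machine for a single bucket.
# Trigger names are illustrative labels, not indexes.conf settings.
TRANSITIONS = {
    ("hot", "size_or_time_limit"): "warm",
    ("warm", "warm_limit_reached"): "cold",
    ("cold", "retention_expired"): "frozen",
}
SEARCHABLE = {"hot", "warm", "cold"}

def roll(state: str, trigger: str) -> str:
    """Return the next lifecycle state, or the same state if the trigger does not apply."""
    return TRANSITIONS.get((state, trigger), state)

def is_searchable(state: str) -> bool:
    """Frozen buckets are deleted or archived, so they are not searchable."""
    return state in SEARCHABLE
```

Note that there is no transition back: a bucket never returns from cold to warm, and a trigger that does not match the current state leaves the bucket where it is.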
Q: What is indexer clustering, and what does the replication factor do?
Model answer: Indexer clustering groups indexers (peer nodes) under a manager node (formerly called the cluster master) so that bucket copies are replicated across peers, which keeps data available when a peer fails. The replication factor (RF) sets how many copies of each bucket's raw data the cluster keeps (default 3, so data survives the loss of RF-1 peers). The search factor (SF) sets how many of those copies are searchable (default 2, so searches keep working if one peer is down). A search head cluster adds HA at the search tier: a deployer distributes apps and one member acts as captain. A typical enterprise configuration: RF=3, SF=2, three or more indexer peers and a three-member search head cluster.
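The RF-1 and SF-1 arithmetic from the answer above can be written down as a tiny helper. This is only a sketch of the rule of thumb; the function and key names are hypothetical.

```python
def cluster_tolerance(replication_factor: int, search_factor: int) -> dict:
    """How many peer failures a cluster tolerates, per the RF-1 / SF-1 rule of thumb."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    return {
        # Raw data survives as long as one copy remains.
        "peers_lost_without_data_loss": replication_factor - 1,
        # Searches continue immediately while one searchable copy remains.
        "peers_lost_with_searchable_copy": search_factor - 1,
    }
```

With the defaults, `cluster_tolerance(3, 2)` says the cluster can lose two peers without losing data, and one peer without any interruption to search.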
When asked about Splunk architecture, say: 'Forwarders collect data from sources, indexers parse and store it in time-series buckets, and search heads run queries by fanning out to all indexers and merging results.' That single sentence — plus knowing UF vs HF and hot/warm/cold/frozen — covers the architecture section of most Splunk interviews.
Which Splunk forwarder does NOT parse events before shipping them?
② SPL & Data Models — search language, CIM and acceleration
Q: Explain how SPL works — what is the pipe model?
Model answer: SPL (Search Processing Language) is pipe-based: each command takes a result set, transforms it, and passes the output to the next command via |. A search starts with a generating command, most commonly the implicit search command (the keyword and field terms written before the first pipe), which returns raw events. Streaming commands then work event by event: eval computes new fields, rex extracts fields with regex, lookup enriches with external data, where filters rows. Transforming commands aggregate events into a results table: stats counts and aggregates (like SQL GROUP BY) and timechart plots values over time. The result flows left to right, and each | step narrows or reshapes it. A typical detection query: index=windows EventCode=4625 | stats count by user, src_ip | where count > 20. It counts failed logins per user and source IP, then keeps the pairs with more than 20.
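For readers who think in code, the same three-stage pipeline can be mirrored in plain Python. This is only an analogy for how each pipe stage consumes the previous stage's output; the event dictionaries are made-up sample data.

```python
from collections import Counter

def failed_login_hotspots(events, threshold=20):
    """Python analogue of:
    index=windows EventCode=4625 | stats count by user, src_ip | where count > 20
    """
    # Stage 1 (base search): keep only failed-logon events from the windows index.
    matched = (e for e in events if e["index"] == "windows" and e["EventCode"] == 4625)
    # Stage 2 (| stats count by user, src_ip): aggregate into one row per pair.
    counts = Counter((e["user"], e["src_ip"]) for e in matched)
    # Stage 3 (| where count > threshold): filter the aggregated rows.
    return [
        {"user": user, "src_ip": ip, "count": n}
        for (user, ip), n in counts.items()
        if n > threshold
    ]
```

After the stats stage the raw events are gone: the where stage sees only the aggregated rows, which is why it can filter on count but not on fields that stats did not carry forward.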
Q: What is the Common Information Model (CIM), and why does Splunk ES depend on it?
Model answer: The CIM (Common Information Model) is Splunk's field-naming standard: it defines canonical field names for each data domain (e.g. src_ip, dest_ip, user, action in the Network Traffic data model). Different vendors use different raw field names for the same thing: one firewall may log the source IP as srcaddr, another as source_address. CIM normalises them so an ES correlation search written against src_ip works across all sources. The normalisation is applied at search time through field aliases, calculated fields, event types and tags shipped in Technology Add-ons (TAs). Splunk Enterprise Security correlation searches are written against CIM data models, so if your data is not CIM-normalised, your ES detections do not fire.
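The effect of a field alias can be sketched in a few lines of Python. The sourcetypes and vendor field names below are invented for illustration; real add-ons define their aliases in props.conf, not in code like this.

```python
# Hypothetical per-sourcetype alias tables: raw vendor field -> CIM field.
FIELD_ALIASES = {
    "vendor_a:firewall": {"srcaddr": "src_ip", "dstaddr": "dest_ip"},
    "vendor_b:firewall": {"source_address": "src_ip", "destination_address": "dest_ip"},
}

def normalise(sourcetype: str, event: dict) -> dict:
    """Add CIM-named copies of vendor fields, keeping the originals (as an alias does)."""
    out = dict(event)
    for raw_name, cim_name in FIELD_ALIASES.get(sourcetype, {}).items():
        if raw_name in event:
            out[cim_name] = event[raw_name]
    return out
```

After normalisation, a single search on `src_ip` matches events from both vendors. An event from a sourcetype with no alias table passes through unchanged and stays invisible to that search, which is the failure mode interviewers like to ask about.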
Q: What is a data model and what is data model acceleration?
Model answer: A data model is a hierarchical schema that defines a structured view over raw events: it declares datasets (like Authentication, Network Traffic) and the fields each dataset expects. You search a data model with | datamodel Authentication Authentication search (data model name, then dataset name) or with Pivot. Data model acceleration pre-summarises the data model into time-series index (TSIDX) summary files so that tstats queries can skip raw event reading, often returning results orders of magnitude faster. ES correlation searches almost always use | tstats count from datamodel=... for speed. The trade-off: acceleration uses extra disk and a background search job to keep the summaries current.
Universal Forwarder: lightweight, ships raw data, minimal footprint. The default for endpoints. Heavy Forwarder: full parsing pipeline, can filter/mask/route before indexing. Use it when you need pre-processing or compliance scrubbing.
Common Information Model: Splunk's field-naming standard that normalises raw fields (src, srcaddr…) to canonical names (src_ip, dest_ip, user, action). ES correlation searches are written against CIM data models, so CIM compliance is mandatory.
Risk-Based Alerting: each detection adds risk points to a risk object (user/system/IP). A Risk Notable fires only when the cumulative score crosses a threshold, cutting alert fatigue and surfacing genuinely risky behaviour.
SOAR playbook: a Python or visual workflow in Splunk SOAR (Phantom) triggered by a Notable Event. It orchestrates enrichment (VirusTotal, geo-IP), containment (block in proxy, quarantine in AD) and notification (Slack), reducing MTTR for routine incidents.
Getting data into Splunk is not the same as getting it working with Enterprise Security. ES correlation searches are written against CIM data model field names (src_ip, user, action). If your source data has not been normalised via a Technology Add-on (TA) with field aliases, event types and tags, the events never populate the data model and the correlation search matches nothing. Always verify CIM normalisation with | datamodel Authentication Authentication search before assuming ES detections work.
You want to search network traffic across 30 days as fast as possible using an accelerated CIM data model. Which command should you use?
③ ES, RBA & SOAR — Enterprise Security, risk scoring and automated response
Q: What is Splunk Enterprise Security (ES) and how does it generate Notable Events?
Model answer: Splunk Enterprise Security (ES) is a premium SIEM app built on top of Splunk. It provides pre-built correlation searches (scheduled SPL searches that run against CIM data models), a Notable Events framework (the analyst queue for investigating detections), Risk-Based Alerting (RBA), threat intelligence management, and asset and identity enrichment. When a correlation search fires it creates a Notable Event in the Incident Review dashboard — similar to a ticket. Analysts work Notable Events by reviewing the evidence, assigning status (New / In Progress / Resolved), and documenting the outcome. The gold line: ES = correlation searches on CIM data → Notable Events → analyst workflow.
Q: What is Risk-Based Alerting (RBA), and how does it differ from traditional alerting?
Model answer: Traditional alerting fires a Notable Event every time a single detection rule triggers — leading to alert fatigue when the same benign behaviour trips the same rule repeatedly. Risk-Based Alerting (RBA) changes the model: each detection rule adds risk points to a risk object (a user, system or other entity) rather than firing directly. A separate Risk Notable correlation search fires only when a risk object's cumulative risk score crosses a threshold — say 100 points — within a time window. The result is far fewer, higher-fidelity alerts. Interviewers like the one-liner: RBA aggregates weak signals per entity so only genuinely risky behaviour escalates.
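The accumulation step is simple enough to sketch in Python. This is a toy model of the idea, not the ES implementation: it assumes the risk events have already been restricted to the time window, and it reads "crosses" as "exceeds". The field names follow the ones used in this lesson (risk_object, risk_object_type, risk_score).

```python
from collections import defaultdict

def risk_notables(risk_events, threshold=100):
    """Sum risk per (type, object) and return the entities whose total exceeds the threshold."""
    totals = defaultdict(int)
    sources = defaultdict(set)
    for ev in risk_events:
        key = (ev["risk_object_type"], ev["risk_object"])
        totals[key] += ev["risk_score"]
        sources[key].add(ev["source"])  # which detections contributed
    return [
        {
            "risk_object": obj,
            "risk_object_type": obj_type,
            "risk_score": total,
            "sources": sorted(sources[(obj_type, obj)]),
        }
        for (obj_type, obj), total in totals.items()
        if total > threshold
    ]
```

Three weak detections worth 40, 40 and 30 points against the same user produce one notable at 110, where traditional alerting would have produced three separate alerts. The grouping key also shows why naming matters: if one rule writes `alice` and another writes `CORP\alice`, the scores land on two different keys and neither may cross the threshold.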
Q: What is Splunk SOAR (Phantom) and how do playbooks work?
Model answer: Splunk SOAR (formerly Phantom) is a Security Orchestration, Automation and Response platform that connects to hundreds of security tools via apps and connectors. A playbook is an automated workflow — written in Python or via a visual editor — that executes a sequence of actions when triggered by an event (e.g. a Notable Event from ES). A typical phishing playbook: receive the alert → extract URLs and hashes → query VirusTotal → if malicious, block the URL in the proxy, quarantine the user in AD, and post a Slack notification → close the Notable Event. Playbooks can be fully automated or prompt an analyst for approval before a destructive action. The key concept: SOAR reduces mean time to respond (MTTR) by automating repetitive triage and containment steps.
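The phishing flow described above can be sketched as an ordinary Python function. This deliberately does not use the SOAR playbook API: the integrations are passed in as plain callables (all hypothetical names) so that only the control flow is shown, including the optional approval gate before destructive actions.

```python
def phishing_playbook(alert, lookup_reputation, block_url, quarantine_user, notify, approve=None):
    """Toy phishing playbook: enrich, decide, contain, notify, close."""
    # Enrichment: ask a reputation service about every URL in the alert.
    malicious = [url for url in alert["urls"] if lookup_reputation(url) == "malicious"]
    if not malicious:
        return {"status": "closed", "verdict": "benign", "actions": []}
    # Optional human approval before any destructive action.
    if approve is not None and not approve(alert, malicious):
        return {"status": "pending_approval", "verdict": "malicious", "actions": []}
    actions = []
    for url in malicious:
        block_url(url)  # containment at the proxy
        actions.append(f"blocked {url}")
    quarantine_user(alert["user"])  # containment in the directory
    actions.append(f"quarantined {alert['user']}")
    notify(f"Phishing contained for {alert['user']}: {len(malicious)} URL(s) blocked")
    actions.append("notified")
    return {"status": "closed", "verdict": "malicious", "actions": actions}
```

Passing `approve=None` models a fully automated playbook; passing a callable that returns False models an analyst declining the containment step, in which case nothing is blocked and the case stays open.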
Q: How does Splunk ES integrate with Splunk SOAR for end-to-end automation?
Model answer: The integration is bidirectional. ES → SOAR: a Notable Event can trigger a SOAR playbook automatically via the adaptive response framework (the event context is typically sent to SOAR by an adaptive response action from the Splunk App for SOAR Export). The playbook enriches and responds, e.g. geo-IP lookup, endpoint isolation via EDR, block in firewall. SOAR → ES: SOAR playbooks can update the Notable Event status, add comments and close it once automated response is complete, giving analysts a full audit trail in the ES Incident Review. The net result: detection in ES, automated triage and containment in SOAR, and a documented resolution back in ES, all without an analyst touching the keyboard for routine detections.
Before enabling RBA in production, verify that each detection rule is actually writing risk events to the risk index. Run index=risk to see recent risk events, check the risk_object, risk_object_type and risk_score fields, and confirm they match your entity naming (user vs username matters). If the same entity is written under inconsistent risk_object values or the wrong risk_object_type, every risk event lands on a different object and the score never accumulates. This is the most common RBA gotcha.
How does Risk-Based Alerting reduce alert fatigue compared to traditional Splunk ES correlation searches?
④ Tuning & Scenarios — scheduler, tstats, summary indexing and real-world fixes
Q: What is summary indexing and when should you use it?
Model answer: Summary indexing is a technique where a scheduled search runs periodically, computes aggregated results (e.g. hourly counts, risk tallies), and writes those summaries as events to a dedicated summary index. A later search queries only the lightweight summary rather than replaying all raw events. Use summary indexing when: you have a very expensive scheduled search that needs to run frequently, you need long-retention aggregates beyond raw data retention, or your reporting queries span months of data. It predates data model acceleration and is still useful for custom aggregations that do not fit a CIM data model. Trade-off: adds complexity and another scheduled job to maintain.
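The two halves of summary indexing, the scheduled roll-up and the later report, can be sketched in Python. This is a toy model with made-up event dictionaries; `_time` is assumed to be epoch seconds.

```python
from collections import Counter

def hourly_summary(raw_events):
    """Scheduled job: collapse raw events into one summary row per (hour, user)."""
    counts = Counter((e["_time"] // 3600 * 3600, e["user"]) for e in raw_events)
    return [
        {"_time": hour, "user": user, "count": n}
        for (hour, user), n in sorted(counts.items())
    ]

def report_from_summary(summary_rows):
    """Later report: total per user, reading only the small summary rows."""
    totals = Counter()
    for row in summary_rows:
        totals[row["user"]] += row["count"]
    return dict(totals)
```

The report touches one row per user per hour instead of every raw event, and it keeps working after the raw events have aged out, as long as the summary rows are retained. The cost is that the summary can only answer questions the roll-up anticipated: here, nothing finer than hourly counts per user.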
Q: What is tstats, and why is it faster than a regular search?
Model answer: | tstats is a generating command that runs statistical queries over TSIDX (time-series index) files, either the index-time fields of ordinary indexes or the acceleration summaries of data models, without reading raw events. Because the TSIDX files contain only indexed field data (not full event text), tstats returns results far faster than a stats search over raw data, especially over long time ranges. The typical ES correlation search pattern: | tstats summariesonly=true count from datamodel=Network_Traffic.All_Traffic where All_Traffic.action=blocked by All_Traffic.src_ip. The summariesonly=true flag means: only use the pre-built summaries, do not fall back to raw events for time ranges that are not yet summarised. The trade-off is that tstats only knows fields that are indexed or included in the accelerated data model, so you cannot tstats over arbitrary search-time fields outside the model.
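What summariesonly changes can be shown with a small Python model. It assumes, purely for illustration, that the acceleration summary is a dictionary of pre-computed counts covering everything before `summarised_until`, and that anything newer exists only as raw events.

```python
from collections import Counter

def tstats_count(summary, raw_events, summarised_until, summariesonly=True):
    """Toy model: count by src_ip from a pre-built summary, optionally topping up from raw."""
    counts = Counter(summary)  # fast path: pre-aggregated counts
    if not summariesonly:
        # Slow path: scan raw events the summary has not caught up with yet.
        for event in raw_events:
            if event["_time"] >= summarised_until:
                counts[event["src_ip"]] += 1
    return dict(counts)
```

With `summariesonly=True` the answer is fast but silently omits the not-yet-summarised tail; with `summariesonly=False` it is complete but pays for a raw scan of that tail. That is the choice a detection author makes when setting the flag.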
Q: How do you manage search scheduler concurrency — what causes searches to skip and how do you fix it?
Model answer: Splunk's search scheduler runs saved searches and correlation searches on a time-based queue. When more searches are due than there are concurrency slots (derived from max_searches_per_cpu and base_max_searches in limits.conf, with max_searches_perc setting the share the scheduler may use), the scheduler skips or defers lower-priority searches and records them with status=skipped in scheduler.log. Fixes: (1) Stagger search schedules so correlation searches do not all fire at the :00 mark. (2) Add search head or indexer capacity so searches finish faster and slots free up. (3) Convert expensive raw-event searches to tstats so they complete in seconds rather than minutes. (4) Raise the schedule priority of critical detection searches. (5) Disable or tune rarely used saved searches. The diagnostic: search scheduler.log (index=_internal sourcetype=scheduler) to find which searches are skipping and why, then use the Search Job Inspector to see why the slow ones run long.
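A back-of-the-envelope model helps when explaining why staggering works. The sketch below assumes the commonly documented limits.conf relationship (total historical searches = max_searches_per_cpu × cores + base_max_searches, of which the scheduler may use max_searches_perc percent) and the default values 1, 6 and 50. Treat both the formula and the defaults as assumptions to verify against the limits.conf reference for your version.

```python
from collections import Counter

def scheduler_slots(cpu_cores, max_searches_per_cpu=1, base_max_searches=6, max_searches_perc=50):
    """Approximate number of concurrent scheduled searches allowed (assumed formula)."""
    total_historical = max_searches_per_cpu * cpu_cores + base_max_searches
    return total_historical * max_searches_perc // 100

def stagger(n_searches, interval_minutes=5):
    """Spread searches round-robin across the minute offsets of the interval."""
    return [i % interval_minutes for i in range(n_searches)]

def peak_due(offsets):
    """Largest number of searches that become due in the same minute."""
    return max(Counter(offsets).values())
```

On a 16-core search head this gives 11 scheduler slots. Twelve 5-minute searches all scheduled at offset 0 need 12 slots at once, so at least one skips; staggered across the five offsets, no more than 3 are due in any minute (this ignores searches that run longer than a minute and overlap the next batch).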
Q: A Splunk analyst sees high lag on a correlation search that queries 30 days of Windows event logs. How do you approach the tuning?
Model answer: First, check whether the data is CIM-normalised and the data model is accelerated. If it is, rewrite the search using | tstats summariesonly=true against the correct data model; this alone often reduces runtime from minutes to seconds. If the search cannot use tstats (e.g. it queries a custom field not in the model), consider summary indexing: run a nightly aggregation job and query the summary for 30-day reporting. Also check the time range: searches that span months across unaccelerated indexes scan every matching raw event on every indexer, and a shorter window such as the last 7 days is often sufficient for detection. Check that index= filters are specific: without an index= term Splunk searches the role's default indexes, which may scan far more data than needed or miss the right index entirely. Finally, check whether the search is accelerated at the report level (Splunk report acceleration). Use the Search Job Inspector to see which command consumes the most time.
Priya at SecureNova in Bengaluru faces this
SecureNova's Splunk ES has a correlation search 'Excessive Failed Logins' that runs every 5 minutes against 30 days of Windows Security logs. The search is skipping 80% of the time, and when it does run it takes 4 minutes — missing detections in the analyst queue.
The search runs a raw stats query over index=windows across 30 days without using data model acceleration. The CIM Authentication data model is accelerated and populated by the Windows logs, but the search does not use it. The search also runs at exactly :00, :05, :10, competing with a dozen other correlation searches for scheduler slots.
In the Search Job Inspector the 'stats count by user' step accounts for 3.5 minutes — it is scanning 30 days of raw events. In scheduler.log the search shows 'search not executed: concurrency limit reached' at each :00 boundary.
Settings ▸ Searches, Reports and Alerts ▸ (find the search) ▸ Edit schedule / Search Job Inspector
Rewrite the search using tstats summariesonly=true against datamodel=Authentication.Authentication, which reduces runtime from 4 minutes to under 5 seconds. Stagger the schedule to :01, :06, :11 (offset by 1 minute) to avoid competing with other searches. Reduce the lookback from 30 days to 24 hours for the real-time detection use case, and use summary indexing for 30-day trend reports.
Scheduler.log shows no skipped runs. Search Job Inspector shows runtime under 10 seconds. Notable Events reappear in the ES Incident Review queue within 5 minutes of events landing on the indexer.
A Splunk correlation search that runs every 5 minutes is marked 'skipped' in the scheduler log. What is the most likely cause?
📖 Glossary
- Universal Forwarder (UF): A lightweight Splunk agent that ships raw, unparsed data to an indexer with minimal CPU and memory overhead. The default choice for every endpoint and server.
- Heavy Forwarder (HF): A full Splunk installation configured to forward. It runs the parsing pipeline, enabling filtering, PII masking, routing and enrichment before data reaches the indexer.
- Hot / Warm / Cold / Frozen: The four bucket lifecycle states on a Splunk indexer. Hot is actively written; warm is recent read-only; cold is older, often on cheaper storage; frozen is deleted by default or archived, and is no longer searchable.
- CIM (Common Information Model): Splunk's field-naming standard that defines canonical field names (src_ip, dest_ip, user, action) across data domains. Technology Add-ons map vendor-specific field names to CIM via field aliases.
- tstats: A Splunk generating command that queries TSIDX files, including the acceleration summaries of an accelerated data model, skipping raw event scanning for much faster results over large time ranges.
- Risk-Based Alerting (RBA): A Splunk ES alerting model where each detection adds risk points to a risk object (user, system, IP). A Risk Notable fires only when the cumulative score crosses a threshold, reducing alert fatigue.
- Splunk SOAR (Phantom): Splunk's Security Orchestration, Automation and Response platform. Python or visual playbooks automate enrichment, containment and notification triggered by ES Notable Events.
- Replication Factor / Search Factor: In an indexer cluster, the replication factor sets how many raw bucket copies exist; the search factor sets how many searchable copies exist. Defaults: RF=3, SF=2.
- Summary Index: A Splunk index populated by a scheduled search that pre-aggregates expensive queries. Later searches query the lightweight summary rather than re-scanning raw events, which is useful for long-retention reports.
- Search Job Inspector: A Splunk UI tool that shows the execution timeline of a search job, breaking it down by command to identify which step consumes the most time. It is the first tool for search performance tuning.
What's next?
Done with the interview prep? Go deeper on Splunk — the full SPL search language, Enterprise Security correlation rule authoring, Risk-Based Alerting design, and SOAR playbook development.