What is the role of the Splunk cluster manager (master node) in an indexer cluster?

Correct: a. The cluster manager orchestrates the indexer cluster: it tracks which peers hold which bucket copies, ensures the replication factor is met, handles peer failure by instructing peers to re-replicate, and manages rolling restarts and upgrades. Search heads and individual indexers do not perform this orchestration role.

Why does Splunk call itself a 'schema-on-read' platform?

Correct: b. Splunk stores raw event text at index time with minimal metadata. Field extraction rules (in props.conf and transforms.conf) are applied at search time — when you run a query — not when the data lands. This schema-on-read means you can define or change fields after ingestion without reindexing, at the cost of search-time CPU for extraction.

A new firewall source type is ingested into Splunk but the ES 'Network Traffic — Allowed' correlation search finds nothing. What is the most likely cause?

Correct: b. ES correlation searches query CIM data models using CIM field names (src_ip, dest_ip, action). If the firewall source type has not been normalised (via a TA with field aliases) to map raw vendor fields to CIM names, its events are invisible to the data model and the correlation search returns nothing. Hot bucket build time, search head HA and SOAR do not cause this.

Which of the following best describes when a Risk Notable fires in Splunk ES with RBA?

Correct: c. In RBA, individual detection rules add risk points to a risk object (user, system, IP) rather than creating Notable Events directly. A dedicated Risk Notable correlation search monitors cumulative risk scores and fires only when the threshold is crossed. This reduces alert fatigue by surfacing entities with multiple weak signals rather than every single detection.

You need a Splunk correlation search to detect brute-force logins across 7 days of data as fast as possible using the Authentication data model. Which approach is best?

Correct: b. tstats with summariesonly=true queries the pre-built TSIDX acceleration files of the Authentication data model, skipping raw event scanning. Over 7 days of Windows logs this is orders of magnitude faster than a raw stats search. The raw index search (a), timechart (c) and rex (d) all scan raw events and would be far slower at scale.

An interviewer asks what you would check first if a Splunk ES correlation search that was working last week is no longer generating Notable Events. Best answer?

Correct: c. A methodical troubleshooting approach checks the most likely causes first: CIM normalisation failure (run | datamodel search), scheduler skipping (check scheduler.log), and whether the threshold or logic was changed. Rebooting and recreating the search are disruptive and skip diagnosis; removing time filters masks the real problem.

Splunk Interview Questions & Answers (2026)

Common interview slip

Many candidates treat Splunk as 'just a log search tool' and blur universal forwarders with heavy forwarders, or say RBA is just another alert rule. Both slips cost marks.

Universal forwarders are lightweight shippers — they send raw data with minimal parsing and almost no CPU/memory overhead, ideal for every endpoint. Heavy forwarders parse, filter, mask and route events before they reach the indexer, useful for data enrichment or compliance filtering at the source. And Risk-Based Alerting is not a single detection rule — it is a risk-score accumulation system: each detection contributes points to an entity's risk object (user, system, IP), and a Risk Notable Event fires only when the cumulative score crosses a threshold, dramatically reducing alert volume while surfacing genuinely risky behaviour. Knowing these distinctions is exactly what Splunk interviewers probe.

① Architecture & Forwarders — the pipeline and deployment components

Q: Describe the Splunk architecture — what are the three tiers?

Model answer: Splunk is a three-tier distributed pipeline. Forwarders sit on the data sources (servers, firewalls, endpoints) and ship data to the middle tier. Indexers receive the data stream, run it through the parsing pipeline (line breaking, timestamp extraction, index-time metadata such as host, source and sourcetype), write the events to time-series buckets, and store them. Search heads accept user queries written in SPL, fan each search out to all indexers (distributed search), collect the partial results and merge them. The clean one-liner: forwarders collect, indexers store, search heads query.
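To make the fan-out and merge step concrete, here is a toy Python model of distributed search. It is only a sketch: the indexer names and events are made up, and real Splunk does this inside the search head and its search peers rather than through any code like this.

```python
from collections import Counter

# Hypothetical data: each indexer (search peer) holds its own slice of events.
INDEXERS = {
    "idx1": ["alice", "bob", "alice"],
    "idx2": ["bob", "carol"],
    "idx3": ["alice"],
}

def peer_partial_count(users):
    """Map step: one indexer counts events per user over its own data only."""
    return Counter(users)

def search_head_merge(partials):
    """Reduce step: the search head merges the partial results from every peer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

partials = [peer_partial_count(users) for users in INDEXERS.values()]
merged = search_head_merge(partials)
print(dict(merged))
```

The point to take into an interview: the heavy lifting happens in parallel on the indexers, and the search head only combines small partial results.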

Q: Universal forwarder vs heavy forwarder — when do you use each?

Model answer: A universal forwarder (UF) is a lightweight agent — it does minimal parsing (just enough to identify events), uses almost no CPU or memory, and ships raw data compressed to the indexer. It is the default choice for every endpoint and server because its footprint is tiny. A heavy forwarder (HF) is a full Splunk installation configured only to forward. It runs the complete parsing pipeline — it can filter, transform, mask (for PII), route to different indexes and even do lookups before data reaches the indexer. Use an HF when you need to scrub sensitive fields before indexing, route by source type, or do compliance-driven data reduction at the collection tier. The interview rule: UF for volume/scale, HF for pre-processing.

Q: Explain the indexer bucket lifecycle — hot, warm, cold, frozen.

Model answer: When an indexer writes new events it opens a hot bucket, which is actively written to and searchable. When the bucket hits its size or time limit the indexer rolls it to warm (read-only, recent data, still on fast storage). When the index exceeds its warm-bucket limits (by default a maximum warm bucket count), the oldest warm buckets roll to cold (read-only, older data, typically on cheaper, slower storage). When cold retention expires (by age or total index size) the bucket is frozen: by default Splunk deletes it, or it archives it to a location you specify (a coldToFrozenDir path or a coldToFrozenScript). Hot, warm and cold buckets are all searchable; frozen data is not searchable unless you thaw it back into the index. The lifecycle drives storage tiering and is the answer to 'how does Splunk manage disk?'

Q: What is indexer clustering, and what does the replication factor do?

Model answer: Indexer clustering groups indexers (peer nodes) under a cluster manager (formerly the master node) so that bucket copies are replicated across peers, keeping data available when a peer fails. The replication factor (RF) sets how many copies of each bucket's raw data exist (default 3, so data survives the loss of RF-1 peers). The search factor (SF) sets how many of those copies are searchable (default 2, so searches keep working if one peer is down). A search head cluster adds HA at the search tier, with a deployer to push configuration and an elected captain to coordinate. The typical enterprise config: RF=3, SF=2, three or more indexer peers and a three-member search head cluster.
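The RF and SF arithmetic above can be sketched in a few lines of Python. This is an illustrative helper, not a Splunk API; the function name and the returned keys are made up for this example.

```python
def cluster_tolerance(replication_factor, search_factor, peer_count):
    """Return how many peer losses a cluster can absorb, per the RF-1 / SF-1 rule."""
    # A cluster needs at least RF peers, and SF cannot exceed RF.
    if not (1 <= search_factor <= replication_factor <= peer_count):
        raise ValueError("expected 1 <= SF <= RF <= number of peers")
    return {
        "peer_losses_without_data_loss": replication_factor - 1,
        "peer_losses_with_a_searchable_copy_left": search_factor - 1,
    }

# The typical enterprise config from the answer above: RF=3, SF=2, three peers.
tolerance = cluster_tolerance(replication_factor=3, search_factor=2, peer_count=3)
print(tolerance)
```

With RF=3 and SF=2 the raw data survives two peer losses, and one peer can go down while a searchable copy is still immediately available.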

Figure 1 — Splunk data pipeline

Data flows from forwarders through indexers into time-series buckets, then search heads fan queries back to all indexers.

Figure 2 — Indexer bucket lifecycle

Events age through four bucket states — hot is actively written, warm is recent read-only, cold moves to cheaper storage, frozen is archived or deleted.

Name all three tiers and their job in one sentence

When asked about Splunk architecture, say: 'Forwarders collect data from sources, indexers parse and store it in time-series buckets, and search heads run queries by fanning out to all indexers and merging results.' That single sentence — plus knowing UF vs HF and hot/warm/cold/frozen — covers the architecture section of most Splunk interviews.

Quick check · Q1 of 10 · Remember

Which Splunk forwarder does NOT parse events before shipping them?

a) The universal forwarder: it ships raw data with minimal overhead
b) The heavy forwarder: it runs the full parsing pipeline before forwarding
c) The search head: it parses events at query time
d) The cluster manager: it parses and replicates bucket metadata

Correct: a. The universal forwarder is a lightweight agent that ships raw compressed data to the indexer with minimal CPU and memory use. The heavy forwarder runs the full Splunk parsing pipeline and can filter, mask and route before forwarding. The search head and cluster manager have different roles.

👉 So far: Three tiers: forwarders (UF = lightweight shipper, HF = parsing + routing), indexers (parsing pipeline, hot/warm/cold/frozen buckets), search heads (distributed search fan-out). Indexer clustering: replication factor = raw copies, search factor = searchable copies. Cluster manager orchestrates bucket management.

② SPL & Data Models — search language, CIM and acceleration

Q: Explain how SPL works — what is the pipe model?

Model answer: SPL (Search Processing Language) is pipe-based: each command takes a result set, transforms it, and passes the output to the next command via |. A search always starts with a generating command (most commonly the implicit search command: the keyword/field terms written before the first pipe) that returns events. Later commands then reshape the results: transforming commands such as stats (counts and aggregates, like SQL GROUP BY) and timechart (values over time) turn events into a results table, while eval computes new fields, rex extracts fields with regex, lookup enriches with external data, and where filters rows. The result flows left to right, and each | step narrows or reshapes it. A typical detection query: index=windows EventCode=4625 | stats count by user, src_ip | where count > 20. It counts failed logins per user and source IP, then keeps only the pairs with more than 20.
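One way to internalise the pipe model is to emulate the detection query in plain Python, one function per pipeline stage. This is a toy analogy with made-up events; the helper names are hypothetical and say nothing about how Splunk implements these commands.

```python
from collections import Counter

def base_search(events, **criteria):
    """Stage 1 (before the first pipe): keep events matching every field=value term."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]

def stats_count_by(events, *fields):
    """Stage 2 (| stats count by ...): one output row per distinct field combination."""
    counts = Counter(tuple(e[f] for f in fields) for e in events)
    return [dict(zip(fields, key), count=n) for key, n in counts.items()]

def where(rows, predicate):
    """Stage 3 (| where ...): filter the result rows."""
    return [r for r in rows if predicate(r)]

# Made-up sample: 25 failed logins for alice, 3 for bob, 40 successful logins.
events = (
    [{"index": "windows", "EventCode": 4625, "user": "alice", "src_ip": "10.0.0.5"}] * 25
    + [{"index": "windows", "EventCode": 4625, "user": "bob", "src_ip": "10.0.0.9"}] * 3
    + [{"index": "windows", "EventCode": 4624, "user": "alice", "src_ip": "10.0.0.5"}] * 40
)

# index=windows EventCode=4625 | stats count by user, src_ip | where count > 20
matched = base_search(events, index="windows", EventCode=4625)
result = where(stats_count_by(matched, "user", "src_ip"), lambda r: r["count"] > 20)
print(result)
```

Each stage only sees the output of the stage before it, which is exactly why filtering early (in the base search) keeps the later stages cheap.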

Q: What is the Common Information Model (CIM), and why does Splunk ES depend on it?

Model answer: The CIM (Common Information Model) is Splunk's field-naming standard: it defines canonical field names for each data domain (e.g. src_ip, dest_ip, user, action in the Network Traffic data model). Different vendors use different raw field names for the same thing; one firewall may log the source IP as src while another logs it as srcaddr. CIM normalises them so an ES correlation search written against src_ip works across all sources. The mapping is implemented through field aliases, calculated fields, event types and tags in add-ons (TAs), and Splunk Enterprise Security correlation searches are written against CIM data models, so if your data is not CIM-normalised, your ES detections do not fire.
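The effect of normalisation can be shown with a small Python sketch. The sourcetypes and vendor field names below are invented for illustration; in a real deployment the mapping lives in a TA's configuration, not in code like this.

```python
# Hypothetical alias tables: raw vendor field name -> CIM field name.
FIELD_ALIASES = {
    "vendor_a:firewall": {"srcaddr": "src_ip", "dstaddr": "dest_ip", "verdict": "action"},
    "vendor_b:firewall": {"source_address": "src_ip", "disposition": "action"},
}

def normalise(event, sourcetype):
    """Add CIM-named aliases next to the raw fields (the raw fields are kept)."""
    out = dict(event)
    for raw_name, cim_name in FIELD_ALIASES.get(sourcetype, {}).items():
        if raw_name in event:
            out[cim_name] = event[raw_name]
    return out

def matches_cim_search(event):
    """Stand-in for a detection written purely against CIM field names."""
    return event.get("action") == "blocked" and "src_ip" in event

raw = {"srcaddr": "203.0.113.7", "verdict": "blocked"}
with_ta = normalise(raw, "vendor_a:firewall")       # sourcetype has a mapping
without_ta = normalise(raw, "vendor_c:firewall")    # no mapping for this sourcetype

print(matches_cim_search(with_ta), matches_cim_search(without_ta))
```

The same raw event is visible to the detection only when a mapping exists for its sourcetype, which is the whole reason ES depends on CIM.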

Q: What is a data model and what is data model acceleration?

Model answer: A data model is a hierarchical schema that defines a structured view over raw events: it declares datasets (for example the Authentication or Network Traffic models and their child datasets) and the fields each dataset expects. You search a data model with | datamodel Authentication Authentication search (data model name, then dataset name) or with Pivot. Data model acceleration pre-summarises the data model into time-series index (TSIDX) summary files so queries with tstats skip raw event reading entirely, often returning results orders of magnitude faster. ES correlation searches almost always use | tstats count from datamodel=... for speed. The trade-off: acceleration uses extra disk and a background search job to keep the summaries current.

Figure 3 — UF vs Heavy Forwarder

Universal forwarders are lightweight shippers; heavy forwarders add pre-processing, filtering and routing at the cost of higher resource use.

📡

UF vs HF

Universal Forwarder: lightweight, ships raw data, minimal footprint — default for endpoints. Heavy Forwarder: full parsing pipeline, can filter/mask/route before indexing — use when you need pre-processing or compliance scrubbing.

🗂️

CIM

Common Information Model — Splunk's field-naming standard that normalises raw fields (src, srcaddr…) to canonical names (src_ip, dest_ip, user, action). ES correlation searches are written against CIM data models, so CIM compliance is mandatory.

⚠️

RBA

Risk-Based Alerting: each detection adds risk points to a risk object (user/system/IP). A Risk Notable fires only when the cumulative score crosses a threshold, cutting alert fatigue and surfacing genuinely risky behaviour.

🤖

SOAR playbook

A Python or visual workflow in Splunk SOAR (Phantom) triggered by a Notable Event. It orchestrates enrichment (VirusTotal, geo-IP), containment (block in proxy, quarantine in AD) and notification (Slack), reducing MTTR for routine incidents.

'My data is in Splunk so ES will work' — the CIM trap

Getting data into Splunk is not the same as getting it working with Enterprise Security. ES correlation searches are written against CIM data model field names (src_ip, user, action). If your source data has not been normalised via a Technology Add-on (TA) with field aliases and tags, its events never populate the data model and the correlation search matches nothing. Always verify CIM normalisation with | datamodel Authentication Authentication search (or the equivalent for your data model) before assuming ES detections work.

Quick check · Q2 of 10 · Apply

You want to search network traffic across 30 days as fast as possible using an accelerated CIM data model. Which command should you use?

a) | stats count by src_ip from index=network
b) | tstats summariesonly=true count from datamodel=Network_Traffic.All_Traffic by All_Traffic.src_ip
c) | timechart span=1d count by src_ip
d) | rex field=_raw '(?P<src_ip>\d+\.\d+\.\d+\.\d+)'

Correct: b. tstats queries the pre-built TSIDX acceleration summaries of a data model without reading raw events, making it orders of magnitude faster for large time ranges. summariesonly=true restricts it to accelerated data. stats, timechart and rex all read raw events and would be far slower over 30 days.

👉 So far: SPL is pipe-based: generating command | transforming commands (stats, timechart, eval, rex, lookup). CIM normalises field names across sources so ES correlation searches work across vendors. Data model acceleration (TSIDX) enables tstats — the fastest way to query large time ranges.

③ ES, RBA & SOAR — Enterprise Security, risk scoring and automated response

Q: What is Splunk Enterprise Security (ES) and how does it generate Notable Events?

Model answer: Splunk Enterprise Security (ES) is a premium SIEM app built on top of Splunk. It provides pre-built correlation searches (scheduled SPL searches that run against CIM data models), a Notable Events framework (the analyst queue for investigating detections), Risk-Based Alerting (RBA), threat intelligence management, and asset and identity enrichment. When a correlation search fires it creates a Notable Event in the Incident Review dashboard — similar to a ticket. Analysts work Notable Events by reviewing the evidence, assigning status (New / In Progress / Resolved), and documenting the outcome. The gold line: ES = correlation searches on CIM data → Notable Events → analyst workflow.

Q: What is Risk-Based Alerting (RBA), and how does it differ from traditional alerting?

Model answer: Traditional alerting fires a Notable Event every time a single detection rule triggers — leading to alert fatigue when the same benign behaviour trips the same rule repeatedly. Risk-Based Alerting (RBA) changes the model: each detection rule adds risk points to a risk object (a user, system or other entity) rather than firing directly. A separate Risk Notable correlation search fires only when a risk object's cumulative risk score crosses a threshold — say 100 points — within a time window. The result is far fewer, higher-fidelity alerts. Interviewers like the one-liner: RBA aggregates weak signals per entity so only genuinely risky behaviour escalates.
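The accumulation logic can be sketched as a toy Python function. The events and scores are made up, the 100-point threshold is the example figure from the answer above, and the time window is left out for brevity; in ES this logic is a scheduled search over risk events, not application code.

```python
from collections import defaultdict

RISK_THRESHOLD = 100  # example threshold from the text, not a product default

def risk_notables(risk_events, threshold=RISK_THRESHOLD):
    """Sum risk per (type, object) and return only entities above the threshold."""
    totals = defaultdict(int)
    for event in risk_events:
        key = (event["risk_object_type"], event["risk_object"])
        totals[key] += event["risk_score"]
    return {key: score for key, score in totals.items() if score > threshold}

# Made-up risk events: three weak signals for alice, one stronger signal for bob.
risk_events = [
    {"risk_object_type": "user", "risk_object": "alice", "risk_score": 40},
    {"risk_object_type": "user", "risk_object": "alice", "risk_score": 35},
    {"risk_object_type": "user", "risk_object": "bob", "risk_score": 60},
    {"risk_object_type": "user", "risk_object": "alice", "risk_score": 30},
]

notables = risk_notables(risk_events)
print(notables)
```

Four detections produce a single Risk Notable: alice's three weak signals add up to 105, while bob's lone 60-point detection stays below the line.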

Q: What is Splunk SOAR (Phantom) and how do playbooks work?

Model answer: Splunk SOAR (formerly Phantom) is a Security Orchestration, Automation and Response platform that connects to hundreds of security tools via apps and connectors. A playbook is an automated workflow — written in Python or via a visual editor — that executes a sequence of actions when triggered by an event (e.g. a Notable Event from ES). A typical phishing playbook: receive the alert → extract URLs and hashes → query VirusTotal → if malicious, block the URL in the proxy, quarantine the user in AD, and post a Slack notification → close the Notable Event. Playbooks can be fully automated or prompt an analyst for approval before a destructive action. The key concept: SOAR reduces mean time to respond (MTTR) by automating repetitive triage and containment steps.
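The decision flow of the phishing playbook can be sketched in plain Python. This is not the Splunk SOAR playbook API: the function, the action names and the reputation lookup are all hypothetical stand-ins, and the sketch only returns the list of actions it would take.

```python
def phishing_playbook(alert, reputation_lookup, approve=lambda action: True):
    """Return the ordered actions for one alert, following the flow in the text."""
    actions = []
    # Enrichment: ask a reputation service (stubbed) about each extracted URL.
    malicious = [url for url in alert["urls"] if reputation_lookup(url) == "malicious"]
    if malicious:
        for url in malicious:
            actions.append(("block_url", url))
        # Destructive step: optionally gated on analyst approval.
        if approve("quarantine_user"):
            actions.append(("quarantine_user", alert["user"]))
        actions.append(("notify_slack", alert["user"]))
    actions.append(("close_notable", alert["id"]))
    return actions

# Made-up alert and verdicts for illustration.
alert = {"id": "N-1", "user": "alice", "urls": ["http://bad.example", "http://ok.example"]}
verdicts = {"http://bad.example": "malicious"}
lookup = lambda url: verdicts.get(url, "clean")

automated = phishing_playbook(alert, lookup)
gated = phishing_playbook(alert, lookup, approve=lambda action: False)
print(automated)
print(gated)
```

Passing a different approve callback models the choice between a fully automated run and one that waits for an analyst before a destructive action.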

Q: How does Splunk ES integrate with Splunk SOAR for end-to-end automation?

Model answer: The integration is bidirectional. ES → SOAR: a Notable Event can trigger a SOAR playbook automatically via the adaptive response framework (the event context is forwarded from Splunk to SOAR, for example through the Splunk App for SOAR Export). The playbook enriches and responds, e.g. geo-IP lookup, endpoint isolation via EDR, block in firewall. SOAR → ES: SOAR playbooks can update the Notable Event status, add comments and close it once automated response is complete, giving analysts a full audit trail in ES Incident Review. The net result: detection in ES, automated triage and containment in SOAR, and a documented resolution back in ES, all without an analyst touching the keyboard for routine detections.

Figure 4 — Splunk ES ecosystem

Splunk ES sits at the centre connecting CIM data models, correlation searches, RBA risk scoring, SOAR playbooks and threat intelligence.

Figure 5 — RBA risk accumulation

Each detection adds points to an entity risk object; a Risk Notable fires only when the cumulative score crosses the threshold.

Test RBA risk contributions before going live

Before enabling RBA in production, verify that each detection rule is actually writing risk events to the risk index. Run index=risk to see recent risk events, check the risk_object, risk_object_type and risk_score fields, and confirm they match your entity naming (user vs username matters). A misconfigured risk_object or risk_object_type means risk events for the same entity land on different objects and the score never accumulates: the most common RBA gotcha.

▶ Watch a Windows log become a Risk Notable — and find why detections go missing

The steps below trace how a Windows failed-login event travels from forwarder to an ES Risk Notable. The classic way the chain breaks is at step ③: if CIM normalisation is missing, the event never reaches the Authentication data model and the detection goes silent.

① Forwarder: A universal forwarder on a Windows server ships a Security EventCode 4625 (failed login) event to the indexer.

▼

② Indexer: The indexer parses the event, extracts raw fields (EventCode, ComputerName, TargetUserName) and writes it to the windows_security index.

▼

③ CIM normalisation: The Windows TA maps raw fields to CIM names (TargetUserName becomes user, ComputerName becomes dest), making the event visible in the Authentication data model.

▼

④ ES correlation: The 'Excessive Failed Logins' correlation search runs tstats against the Authentication data model, detects the pattern, and adds risk points to the user risk object.

Quick check · Q3 of 10 · Understand

How does Risk-Based Alerting reduce alert fatigue compared to traditional Splunk ES correlation searches?

a) It disables all detection rules so fewer alerts fire
b) It fires a Notable Event on every single trigger but groups them by colour
c) It accumulates risk scores per entity and fires a Risk Notable only when the cumulative score crosses a threshold
d) It runs SOAR playbooks automatically without any analyst involvement

Correct: c. RBA changes the alerting model: each detection adds risk points to a risk object (user, system, IP) rather than directly creating a Notable Event. A Risk Notable fires only when cumulative risk exceeds a threshold, surfacing genuinely risky behaviour rather than individual detections. This dramatically reduces noise.

👉 So far: Splunk ES: correlation searches on CIM data models → Notable Events → analyst workflow. RBA: each detection adds risk points to a risk object; Risk Notable fires only when cumulative score crosses threshold — reduces alert fatigue. SOAR (Phantom): Python playbooks automate enrichment and containment, feed status back to ES Notable Events.

④ Tuning & Scenarios — scheduler, tstats, summary indexing and real-world fixes

Q: What is summary indexing and when should you use it?

Model answer: Summary indexing is a technique where a scheduled search runs periodically, computes aggregated results (e.g. hourly counts, risk tallies), and writes those summaries as events to a dedicated summary index. A later search queries only the lightweight summary rather than replaying all raw events. Use summary indexing when: you have a very expensive scheduled search that needs to run frequently, you need long-retention aggregates beyond raw data retention, or your reporting queries span months of data. It predates data model acceleration and is still useful for custom aggregations that do not fit a CIM data model. Trade-off: adds complexity and another scheduled job to maintain.
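The idea can be sketched as two small Python functions: one standing in for the scheduled search that populates the summary, one for the later report that reads only the summary. The events and field names are made up, and real summary indexing is done with scheduled SPL searches rather than code like this.

```python
from collections import Counter

def build_hourly_summary(raw_events):
    """Scheduled job: collapse raw events into one row per (hour, user)."""
    summary = Counter()
    for event in raw_events:
        hour_start = event["_time"] - event["_time"] % 3600  # floor to the hour
        summary[(hour_start, event["user"])] += 1
    return [
        {"_time": hour, "user": user, "count": n}
        for (hour, user), n in sorted(summary.items())
    ]

def report_from_summary(summary_rows):
    """Later report: total per user, reading only the small summary rows."""
    totals = Counter()
    for row in summary_rows:
        totals[row["user"]] += row["count"]
    return dict(totals)

# Made-up raw events (epoch seconds): alice three times across two hours, bob once.
raw_events = [
    {"_time": 0, "user": "alice"},
    {"_time": 10, "user": "alice"},
    {"_time": 20, "user": "bob"},
    {"_time": 3700, "user": "alice"},
]

summary_rows = build_hourly_summary(raw_events)
report = report_from_summary(summary_rows)
print(summary_rows)
print(report)
```

The report never touches the raw events again, so its cost depends on the number of summary rows, not on raw event volume.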

Q: What is tstats, and why is it faster than a regular search?

Model answer: | tstats is a generating command that queries TSIDX (time-series index) files — the pre-built acceleration summaries of data models — without reading raw events. Because the TSIDX files contain only the indexed field metadata (not full event text), tstats returns results far faster than a stats search over raw data, especially over long time ranges. The typical ES correlation search pattern: | tstats summariesonly=true count from datamodel=Network_Traffic.All_Traffic where All_Traffic.action=blocked by All_Traffic.src_ip. The summariesonly=true flag means: only use the pre-built summaries, do not fall back to raw events. The trade-off is that tstats only knows fields that the data model acceleration has computed — you cannot tstats over arbitrary fields not in the accelerated model.

Q: How do you manage search scheduler concurrency — what causes searches to skip and how do you fix it?

Model answer: Splunk's search scheduler runs saved searches and correlation searches on a time-based queue. When more searches are due to run than there are available scheduler slots (governed in limits.conf by max_searches_per_cpu in the [search] stanza and max_searches_perc in the [scheduler] stanza), the scheduler skips lower-priority searches and records them with status=skipped and a reason in scheduler.log. Fixes: (1) Stagger search schedules so correlation searches do not all fire at the :00 mark. (2) Increase search head and indexer capacity so searches finish faster and slots free up. (3) Convert expensive raw-event searches to tstats so they complete in seconds rather than minutes. (4) Raise priority for critical detection searches. (5) Disable or tune infrequently used saved searches. The diagnostic: check scheduler.log (index=_internal sourcetype=scheduler) to find which searches are skipping and why, and use the Search Job Inspector to see why the searches that do run are slow.
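A rough back-of-envelope model of the slot arithmetic, as a Python sketch. It assumes the commonly documented formula (max_searches_per_cpu × CPU cores + base_max_searches, with the scheduler allowed max_searches_perc percent of that) and the commonly cited defaults of 1, 6 and 50. base_max_searches is not mentioned in the text above, and the rounding here is a simplification, so check limits.conf for your version before relying on the numbers.

```python
def search_slots(cpu_cores, max_searches_per_cpu=1, base_max_searches=6,
                 max_searches_perc=50):
    """Return (total concurrent historical searches, slots for scheduled searches)."""
    total = max_searches_per_cpu * cpu_cores + base_max_searches
    scheduled = total * max_searches_perc // 100  # floor: an assumption of this sketch
    return total, scheduled

def skipped_count(searches_due, scheduler_slots):
    """Searches that cannot get a slot when they all come due at the same moment."""
    return max(0, searches_due - scheduler_slots)

# Hypothetical 16-core search head with 13 searches all scheduled for :00.
total, scheduled = search_slots(cpu_cores=16)
skipped_at_top_of_hour = skipped_count(searches_due=13, scheduler_slots=scheduled)

# After staggering, suppose at most 7 searches come due in the same minute.
skipped_after_stagger = skipped_count(searches_due=7, scheduler_slots=scheduled)
print(total, scheduled, skipped_at_top_of_hour, skipped_after_stagger)
```

The model ignores searches still running from earlier intervals, which also hold slots, so real skip counts can be worse than this estimate.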

Q: A Splunk analyst sees high lag on a correlation search that queries 30 days of Windows event logs. How do you approach the tuning?

Model answer: First, check whether the data is CIM-normalised and the data model is accelerated. If it is, rewrite the search using | tstats summariesonly=true against the correct data model; this alone often reduces runtime from minutes to seconds. If the search cannot use tstats (e.g. it queries a custom field not in the model), consider summary indexing: run a nightly aggregation job and query the summary for 30-day reporting. Also check the time range: searches that span months across unaccelerated indexes scan every matching bucket of raw events on every indexer, and a much shorter lookback (hours or a few days) is often sufficient for detection. Check that index= filters are specific and never left out; without an index= term, Splunk searches only the role's default indexes, which can mean missing the data or scanning far more than you need. Finally, check whether the search can be accelerated at the report level (Splunk report acceleration). Use the Search Job Inspector to see which command consumes the most time.

Priya at SecureNova in Bengaluru faces this

SecureNova's Splunk ES has a correlation search 'Excessive Failed Logins' that runs every 5 minutes against 30 days of Windows Security logs. The search is skipping 80% of the time, and when it does run it takes 4 minutes — missing detections in the analyst queue.

Likely cause

The search runs a raw stats query over index=windows across 30 days without using data model acceleration. The Windows Authentication data model is enabled and accelerated but not used. The search also runs at exactly :00, :05, :10 — competing with a dozen other correlation searches for scheduler slots.

Diagnosis

In the Search Job Inspector the 'stats count by user' step accounts for 3.5 minutes — it is scanning 30 days of raw events. In scheduler.log the search shows 'search not executed: concurrency limit reached' at each :00 boundary.

Settings ▸ Searches, Reports and Alerts ▸ (find the search) ▸ Edit schedule / Search Job Inspector

Fix

Rewrite the search using tstats summariesonly=true against datamodel=Authentication.Authentication, which reduces runtime from 4 minutes to under 5 seconds. Stagger the schedule to :01, :06, :11 (offset by 1 minute) to avoid competing with other searches. Reduce the lookback from 30 days to 24 hours for the detection use case, and use summary indexing for 30-day trend reports.

Verify

Scheduler.log shows no skipped runs. Search Job Inspector shows runtime under 10 seconds. Notable Events reappear in the ES Incident Review queue within 5 minutes of events landing on the indexer.

Quick check · Q4 of 10 · Analyze

A Splunk correlation search that runs every 5 minutes is marked 'skipped' in the scheduler log. What is the most likely cause?

a) The search head is running out of TSIDX disk space
b) The forwarder stopped sending data to the indexer
c) More searches are due to run than available scheduler slots, so lower-priority searches are skipped
d) The CIM data model has not been normalised for this source type

Correct: c. Splunk's search scheduler skips lower-priority searches when concurrent demand exceeds available slots (controlled by max_searches_per_cpu). Fix by staggering schedules, converting expensive searches to tstats, or increasing capacity. Disk space, forwarder issues and CIM normalisation do not cause scheduler skipping.

👉 So far: Summary indexing: pre-aggregate expensive searches into a summary index for fast reporting. tstats: queries TSIDX acceleration files without reading raw events — use summariesonly=true. Scheduler skipping: stagger schedules, convert searches to tstats, increase capacity. Job Inspector + scheduler.log for diagnosis.

🧠 In your own words

Type one line: what is Risk-Based Alerting, and how does it differ from a traditional ES correlation search? Then compare with the expert version.

Expert version: A traditional Splunk ES correlation search fires a Notable Event every time a single detection triggers, which leads to alert fatigue in a busy environment. Risk-Based Alerting (RBA) changes the model: each detection rule writes risk points against a risk object (user, system or IP) as a risk event in the risk index rather than creating a Notable directly. A dedicated Risk Notable correlation search fires only when the entity's cumulative risk score crosses a configured threshold within a time window, so many weak signals are absorbed silently and only entities showing a pattern of suspicious behaviour escalate to the analyst queue.

📖 Glossary

Universal Forwarder (UF): A lightweight Splunk agent that ships raw compressed data to an indexer with minimal CPU and memory overhead. The default choice for every endpoint and server.
Heavy Forwarder (HF): A full Splunk installation configured to forward. It runs the parsing pipeline, enabling filtering, PII masking, routing and enrichment before data reaches the indexer.
Hot / Warm / Cold / Frozen: The four bucket lifecycle states on a Splunk indexer. Hot is actively written; warm is recent read-only; cold is older, often on cheaper storage; frozen is archived or deleted.
CIM (Common Information Model): Splunk's field-naming standard that defines canonical field names (src_ip, dest_ip, user, action) across data domains. Technology Add-ons map vendor-specific field names to CIM via field aliases.
tstats: A Splunk generating command that queries pre-built TSIDX acceleration files of an accelerated data model, skipping raw event scanning for dramatically faster results over large time ranges.
Risk-Based Alerting (RBA): A Splunk ES alerting model where each detection adds risk points to a risk object (user, system, IP). A Risk Notable fires only when the cumulative score crosses a threshold, reducing alert fatigue.
Splunk SOAR (Phantom): Splunk's Security Orchestration, Automation and Response platform. Python or visual playbooks automate enrichment, containment and notification triggered by ES Notable Events.
Replication Factor / Search Factor: In an indexer cluster, the replication factor sets how many raw bucket copies exist; the search factor sets how many searchable copies exist. Default RF=3, SF=2.
Summary Index: A Splunk index populated by a scheduled search that pre-aggregates expensive queries. Later searches query the lightweight summary rather than re-scanning raw events — useful for long-retention reports.
Search Job Inspector: A Splunk UI tool that shows the execution timeline of a search job, breaking it down by command to identify which step consumes the most time — the first tool for search performance tuning.

📚 Sources

Splunk — Splunk Enterprise architecture: forwarders, indexers and search heads. docs.splunk.com/Documentation/Splunk/latest/Overview
Splunk — Universal forwarder vs heavy forwarder and the indexer parsing pipeline. docs.splunk.com/Documentation/Forwarder
Splunk — Splunk Enterprise Security: correlation searches, Notable Events and Risk-Based Alerting. docs.splunk.com/Documentation/ES
Splunk — Common Information Model (CIM) and Technology Add-ons for field normalisation. docs.splunk.com/Documentation/CIM
Splunk — Splunk SOAR (Phantom): playbook authoring, apps and adaptive response integration. docs.splunk.com/Documentation/SOAR
Splunk — tstats command, data model acceleration and search performance tuning. docs.splunk.com/Documentation/Splunk/latest/SearchReference/Tstats

What's next?

Done with the interview prep? Go deeper on Splunk — the full SPL search language, Enterprise Security correlation rule authoring, Risk-Based Alerting design, and SOAR playbook development.
