Data Onboarding & Parsing
props.conf, transforms.conf, sourcetypes, time extraction, field extraction
Learning Objectives
- ›Onboard a new data source end-to-end
- ›Configure props.conf for line-breaking, timestamps, sourcetypes
- ›Use transforms.conf for routing, masking, index-time fields
- ›Validate parsing with btool and the data preview UI
Module 1 — Sourcetypes — The First Decision
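Sourcetype tells Splunk how to parse a feed. Use a pretrained or Splunk add-on sourcetype where one exists (e.g. linux_secure is pretrained; cisco:asa and ms:o365:management come from the Splunk Add-ons for Cisco ASA and Microsoft Office 365).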
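Custom sourcetypes go in $SPLUNK_HOME/etc/apps/<your-app>/local/props.conf on the first full Splunk instance that parses the data (indexer or heavy forwarder). A universal forwarder does not normally apply line-breaking or timestamp settings.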
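A minimal sketch of a custom sourcetype, assuming a hypothetical app named acme_onboarding and a hypothetical sourcetype acme:app:log. The inputs.conf stanza shows one way the sourcetype is assigned when the file is read; the monitored path and the index name are placeholders too.

```ini
# inputs.conf (on the forwarder): assign the sourcetype at input time.
# Path, sourcetype and index are hypothetical placeholders.
[monitor:///var/log/acme/app.log]
sourcetype = acme:app:log
index = acme_app

# $SPLUNK_HOME/etc/apps/acme_onboarding/local/props.conf (indexer or heavy forwarder)
# Hypothetical custom sourcetype: every parsing rule for this feed hangs off this stanza.
[acme:app:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
```

Note that .conf files only support comments on their own lines, which is why no comment trails a value.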
Module 2 — props.conf — Parsing Bible
TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD: where the timestamp starts, how it is formatted, and how many characters to scan for it. Get timestamps right or every dashboard lies.
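A sketch of the three timestamp settings, assuming events of the hypothetical acme:app:log sourcetype start with a bracketed UTC timestamp such as [2024-05-01T13:45:07.123Z]. Keeping the lookahead tight stops Splunk from picking up a later date inside the event body.

```ini
[acme:app:log]
# The timestamp begins right after the "[" at the start of the line
TIME_PREFIX = ^\[
# %3N = milliseconds (Splunk's subsecond extension to strptime)
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
# "2024-05-01T13:45:07.123" is 23 characters; scan no further
MAX_TIMESTAMP_LOOKAHEAD = 23
# The trailing "Z" is not parsed, so state the time zone explicitly
TZ = UTC
```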
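LINE_BREAKER with SHOULD_LINEMERGE = false: split the stream into events with one regex instead of the slower line-merging logic. This handles single-line and multi-line events alike, as long as the regex only matches at real event boundaries.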
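A multi-line sketch, assuming every event of the hypothetical acme:app:log sourcetype begins with "[YYYY-MM-DD" and that continuation lines (e.g. Java stack traces) never do. The raised TRUNCATE value is an assumption about event size, not a recommendation.

```ini
[acme:app:log]
SHOULD_LINEMERGE = false
# Break only at newlines followed by "[YYYY-MM-DD". The text matched by the first
# capture group (the newlines) is discarded; stack-trace lines stay in their event.
LINE_BREAKER = ([\r\n]+)(?=\[\d{4}-\d{2}-\d{2})
# Default is 10000 bytes; raised on the assumption that stack traces can be long
TRUNCATE = 20000
```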
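EXTRACT-* / REPORT-*: search-time field extractions. EXTRACT-* holds an inline regex with named capture groups; REPORT-* points to a stanza in transforms.conf.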
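A sketch of both styles against a hypothetical acme:app:log line such as [2024-05-01T13:45:07.123Z] level=INFO user=alice action=login. The transforms.conf stanza name acme_kv is made up for illustration.

```ini
# props.conf
[acme:app:log]
# Inline regex: the named group becomes the field log_level at search time
EXTRACT-level = level=(?<log_level>\w+)
# Reference to the transforms.conf stanza below (also applied at search time)
REPORT-kv = acme_kv

# transforms.conf
[acme_kv]
# Each key=value match becomes a field named after the key
REGEX = (\w+)=(\S+)
FORMAT = $1::$2
```

EXTRACT-* suits one-off regexes; REPORT-* is worth the extra file when the same transform is shared across sourcetypes.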
Module 3 — transforms.conf — Routing & Masking
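TRANSFORMS-* (set in props.conf, pointing at transforms.conf stanzas) runs at index time. SEDCMD-* is a props.conf setting, not a transforms.conf one: a sed-style regex replacement for simple masking (CC numbers, secrets).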
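A minimal masking sketch on the hypothetical acme:app:log sourcetype, assuming card numbers appear as unbroken 16-digit runs; it keeps the last four digits. Because SEDCMD rewrites the raw event before it is written to disk, the original digits are never indexed. For more complex rewrites, a transforms.conf stanza with DEST_KEY = _raw and a FORMAT template is the heavier alternative.

```ini
# props.conf (SEDCMD lives here, not in transforms.conf)
[acme:app:log]
# s/regex/replacement/flags: replace the first 12 of 16 digits, keep the last 4 via \1;
# the g flag masks every match in the event, not just the first
SEDCMD-mask_cc = s/\b\d{12}(\d{4})\b/XXXXXXXXXXXX\1/g
```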
Routing: send specific events to a different index (DEST_KEY = _MetaData:Index) or drop them entirely (DEST_KEY = queue, FORMAT = nullQueue).
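A routing sketch for the cisco:asa sourcetype, assuming a pre-existing index named firewall_denies (hypothetical) and that debug-level %ASA-7- messages are not wanted. The transforms.conf stanza names are also made up. Transforms listed in one TRANSFORMS- setting are applied in the order listed.

```ini
# props.conf
[cisco:asa]
TRANSFORMS-asa = asa_drop_debug, asa_route_denies

# transforms.conf
[asa_drop_debug]
# Severity 7 (debug) messages go to the nullQueue and are never indexed
REGEX = %ASA-7-
DEST_KEY = queue
FORMAT = nullQueue

[asa_route_denies]
# ACL deny messages (message ID 106023) are written to a separate index
REGEX = %ASA-\d-106023
DEST_KEY = _MetaData:Index
FORMAT = firewall_denies
```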
Module 4 — Validation Tools
Data Preview UI (Settings → Add Data) for interactive testing.
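btool: $SPLUNK_HOME/bin/splunk btool props list <sourcetype> --debug prints the merged settings for that stanza, and --debug adds the file each setting comes from, so you can see which config wins.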
SPL Queries
index=* sourcetype=too_small earliest=-1h | stats count by host, source
Lab 9 — Onboard a New Source
- Imagine Cisco ASA firewall logs are arriving with sourcetype=too_small.
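- Write props.conf: assign the sourcetype (a [source::...] stanza with sourcetype = cisco:asa), then set TIME_PREFIX, TIME_FORMAT and LINE_BREAKER in the [cisco:asa] stanza.
- Mask credit-card numbers with SEDCMD in props.conf (or with a transforms.conf stanza that rewrites _raw).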
- Validate with btool and a sample search.
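One possible solution sketch, assuming the ASA syslog files land under a hypothetical /var/log/remote/asa/ directory and each event starts with a syslog header such as Mar 12 10:15:32 fw01 %ASA-6-302013: ... The SEDCMD pattern assumes card numbers are unbroken 16-digit runs.

```ini
# props.conf on the indexer or heavy forwarder

# Override the learned too_small sourcetype for files from this (hypothetical) path
[source::/var/log/remote/asa/*.log]
sourcetype = cisco:asa

[cisco:asa]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# "Mar 12 10:15:32" at the start of the line is 15 characters
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 15
# Mask 16-digit card numbers, keeping the last four digits
SEDCMD-mask_cc = s/\b\d{12}(\d{4})\b/XXXXXXXXXXXX\1/g
```

The syslog header carries no year or time zone, so Splunk infers the year and falls back to its default time zone unless TZ is set.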
Key Takeaways
- ✓Get parsing right at onboarding: index-time settings only affect newly indexed data, so fixing them later means re-indexing
- ✓props.conf and transforms.conf are where most day-to-day admin work happens
- ✓btool is your debugging superpower