What should I log in my data lake?
We’ve been asked a few times to update our “What should I log in my SIEM?” post. Now that the Sentinel data lake is available, it seems like the perfect time to do so. You might be thinking, “Ugh. This is even more confusing. Not only do I have to figure out which logs to ingest, but I also have to decide where to put them.” We’re here to make that a little easier for you.
Use your SIEM for:
Real-time detection and correlation: Immediate alerting on critical events (endpoints, identity, cloud security, perimeter).
Rapid investigation: Live searches for active incidents and threat responses.
High-fidelity, actionable logs: Focus on sources with direct security value (EDR signals, privileged access, authentication, threat alerts).
Use your data lake for:
High-volume, lower-priority logs: Sources that are valuable for deep forensics or periodic hunts but costly to keep “hot” in your SIEM.
Historical analytics: Cross-log searching, long-term trend analysis, and retroactive threat hunting.
Batch analytics and summarization: Use Spark, SQL, or similar tools to enrich, correlate, or summarize data, then forward only the high-risk signals to your SIEM for active monitoring.
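To make the "summarize first, forward only what matters" idea concrete, here is a minimal sketch in plain Python. It is not a Sentinel or Spark API: the event fields (user, source_ip, result), the function names, and the threshold of 5 are all hypothetical, chosen only for illustration. In practice the same logic would more likely run as a Spark or SQL job over data lake tables.

```python
from collections import Counter

# Hypothetical threshold: failed sign-ins per (user, source IP) in one batch
# that we choose to treat as high risk. Tune this for your environment.
FAILED_SIGNIN_THRESHOLD = 5


def summarize_failed_signins(events):
    """Count failed sign-ins per (user, source_ip) across a batch of raw events.

    Each event is assumed to be a dict with 'user', 'source_ip', and 'result'
    keys. These field names are illustrative, not a real log schema.
    """
    counts = Counter()
    for event in events:
        if event.get("result") == "failure":
            counts[(event["user"], event["source_ip"])] += 1
    return counts


def high_risk_signals(counts, threshold=FAILED_SIGNIN_THRESHOLD):
    """Keep only the summaries at or above the threshold, highest count first.

    These compact records are what you would forward to the SIEM, instead of
    every raw sign-in event.
    """
    return [
        {"user": user, "source_ip": ip, "failed_signins": n}
        for (user, ip), n in counts.most_common()
        if n >= threshold
    ]


if __name__ == "__main__":
    # A small made-up batch: one noisy account and ordinary background activity.
    batch = (
        [{"user": "alice", "source_ip": "203.0.113.7", "result": "failure"}] * 6
        + [{"user": "bob", "source_ip": "198.51.100.4", "result": "failure"}] * 2
        + [{"user": "alice", "source_ip": "203.0.113.7", "result": "success"}]
    )
    for signal in high_risk_signals(summarize_failed_signins(batch)):
        print(signal)
```

The raw events stay in the data lake for later hunting, and only the handful of summary records that cross the threshold would be sent on for alerting.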
What to log?
Find a downloadable version of this chart on GitHub.