Why Your SIEM Architecture Needs to Change
Threat-Informed Defense Series
This series is about building the Agentic SOC. Some teams are building it from the ground up; others are taking a very close look at their existing SIEM and data architecture. Either way, the agentic SOC needs more data, and the right data, to work with.
TL;DR
The architecture question for your SIEM isn’t “which tier?” It’s “what use case is each byte serving?”
Every byte of security telemetry serves one or more of five use cases: Detection, Forensic, Hunting, Compliance, AI Agents.
Most SOCs collect data without naming the use case — which is how you end up with 60 days of firewall logs in the analytics tier that no rule ever touches.
A SIEM that treats every byte the same overspends on detection-grade storage for compliance data and underspends on hunting-depth retention for forensic data.
This week: inventory your top five highest-volume log sources and name the use cases each one actually serves. If you can’t, you’ve found your first thing to drop or re-tier.
We established in the pillar that there’s a gap between what attackers do and what most SOCs see. Here’s the second half of that problem: the architecture most SOCs run was built for a threat model that no longer exists.
I spent my early career running analytics-first SIEMs. They worked when they worked and broke quietly when they didn’t. Every quarter we’d negotiate which sources got dropped from ingestion because the bill outpaced the budget, and every quarter the threats that mattered crept further outside the 90-day window. That was a lesson we learned the hard way.
Analytics-first SIEM defaults were designed for signature-based detection of malware payloads. That world is gone. Modern intrusions are credential theft chains, slow lateral movement, and living-off-the-land techniques that unfold across weeks and require behavioral context the analytics tier cannot economically provide. The Verizon Data Breach Investigations Report (DBIR) puts ransomware in 44 percent of breaches, with attacks running an average of 58 days before impact. Your 90-day window can technically see that timeline. Your budget to keep every relevant source in it cannot.
Here’s the thing: the fix is not “buy more SIEM.” It’s to stop treating the architecture conversation as a tier decision and start treating it as a use-case decision.
The Five Data Use Cases
Every byte of security telemetry you collect serves one or more of five use cases. Most SOCs collect data without naming the use case — which is how you end up with 60 days of firewall logs in the analytics tier that no rule ever touches and no analyst ever queries.
Detection — Real-time and near-real-time alerting on known patterns. Sub-five-minute latency for impossible travel, EDR alerts, suspicious mail flow, identity risk events. The job is to fire an alert and trigger a response while the attack is still in motion.
Forensic — Reconstructing what happened after the fact. Was this account compromised six months ago? What did the attacker access during dwell? Forensic work requires high-fidelity data preserved long enough to answer questions you didn’t know to ask when the data was collected.
Hunting — Proactively searching for things no rule fires on. Slow C2 beaconing, low-and-slow data staging, anomalous lateral movement. Hunting needs months of behavioral context and statistical analysis over volumes that would be cost-prohibitive in the analytics tier.
Compliance — Multi-year retention for SOX (7 years), HIPAA, PCI-DSS Req 10, and the regional regulations your legal team tracks. Data has to be searchable and demonstrably immutable, but query latency in minutes is fine. Once those requirements are met, cost per GB is the deciding factor.
AI Agents — A use case that didn’t exist three years ago. Agents need structured access to security data through tools — not raw query strings — so they can investigate, triage, and synthesize across entities. The Sentinel MCP Server’s Entity Analyzer (GA as of April 2026) is the first production example.
Each use case has a different latency tolerance, retention requirement, and cost profile. A SIEM that treats every byte the same overspends on detection-grade storage for data that only serves compliance, and underspends on hunting-depth retention for data that should serve forensics.
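One way to make those differing profiles concrete is a small lookup table, sketched here in Python. The structure and most of the numbers are assumptions for illustration: only the five-minute detection latency, the 90-day analytics window, and the seven-year SOX retention come from this article. The forensic, hunting, and AI agent values are placeholders you would replace with your own requirements.

```python
from dataclasses import dataclass
from enum import Enum


class UseCase(Enum):
    DETECTION = "Detection"
    FORENSIC = "Forensic"
    HUNTING = "Hunting"
    COMPLIANCE = "Compliance"
    AI_AGENTS = "AI Agents"


@dataclass(frozen=True)
class Profile:
    max_latency_minutes: int  # slowest acceptable time to get an answer
    min_retention_days: int   # how long the data must stay searchable


# Illustrative values only. Entries marked "assumed" are placeholders,
# not figures from the article or from any vendor documentation.
PROFILES = {
    UseCase.DETECTION: Profile(5, 90),          # sub-five-minute, 90-day window
    UseCase.FORENSIC: Profile(60, 365),         # assumed
    UseCase.HUNTING: Profile(60, 180),          # assumed ("months" of context)
    UseCase.COMPLIANCE: Profile(60, 7 * 365),   # SOX: 7 years; latency assumed
    UseCase.AI_AGENTS: Profile(5, 90),          # assumed
}


def combined_requirement(use_cases):
    """Tightest latency and longest retention across the named use cases."""
    profiles = [PROFILES[u] for u in use_cases]
    if not profiles:
        raise ValueError("name at least one use case")
    return Profile(
        max_latency_minutes=min(p.max_latency_minutes for p in profiles),
        min_retention_days=max(p.min_retention_days for p in profiles),
    )


need = combined_requirement([UseCase.DETECTION, UseCase.COMPLIANCE])
print(need)
```

With these placeholder values, a source that serves both Detection and Compliance needs five-minute latency and roughly seven years of retention at once. No single storage class prices that sensibly, which is the argument for deciding by use case rather than by tier.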
Do This Next Week
Inventory your top five highest-volume log sources. For each one, name the use cases it serves — Detection, Forensic, Hunting, Compliance, AI Agents. Be honest: if a source is running 50+ GB/day into the analytics tier and the only use case it actually serves is forensic, you’ve found your first candidate to either drop or re-tier. If it serves three use cases but lives in only one tier, you’ve found something the architecture isn’t sized for.
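If you would rather run the inventory as a script than a spreadsheet, the checks above can be sketched in a few lines of Python. Everything here is hypothetical: the source names, volumes, and tier labels ("analytics", "lake") are made-up sample data, and the 50 GB/day threshold is simply the figure used in the paragraph above.

```python
from dataclasses import dataclass, field

USE_CASES = {"Detection", "Forensic", "Hunting", "Compliance", "AI Agents"}


@dataclass
class LogSource:
    name: str
    gb_per_day: float
    tiers: set                      # where the source lands today
    use_cases: set = field(default_factory=set)


def top_sources(sources, n=5):
    """Return the n highest-volume sources, largest first."""
    return sorted(sources, key=lambda s: s.gb_per_day, reverse=True)[:n]


def review(source, volume_threshold_gb=50.0):
    """Return a list of findings for one log source."""
    unknown = source.use_cases - USE_CASES
    if unknown:
        raise ValueError(f"unknown use cases: {sorted(unknown)}")
    findings = []
    if not source.use_cases:
        findings.append("no named use case: candidate to drop or re-tier")
    elif (source.use_cases == {"Forensic"}
          and "analytics" in source.tiers
          and source.gb_per_day >= volume_threshold_gb):
        findings.append("forensic-only in analytics tier: candidate to drop or re-tier")
    if len(source.use_cases) >= 3 and len(source.tiers) == 1:
        findings.append("three or more use cases, one tier: architecture gap")
    return findings


# Hypothetical sample inventory; replace with your own sources and volumes.
inventory = [
    LogSource("firewall", 120.0, {"analytics"}, {"Forensic"}),
    LogSource("dns", 80.0, {"analytics"}),
    LogSource("edr", 60.0, {"analytics", "lake"}, {"Detection", "Forensic", "Hunting"}),
    LogSource("identity sign-ins", 40.0, {"analytics"}, {"Detection", "Hunting", "AI Agents"}),
    LogSource("proxy", 30.0, {"lake"}, {"Hunting"}),
    LogSource("print server", 2.0, {"analytics"}),
]

for src in top_sources(inventory):
    for finding in review(src):
        print(f"{src.name}: {finding}")
```

The output is the worklist: each printed line is a source whose tier placement does not match the use cases you named for it.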
That single exercise turns this article from a model into a worklist. The next step is Article 05, How Microsoft Sentinel Architecture Actually Works Now (coming soon). It covers the three-tier, four-surface platform that serves all five use cases from the same data, the new filter-and-split capability that makes dual-routing native, and the MITRE coverage math that justifies the model.
The Threat-Informed Defense Series — Where We’re Going
A complete blueprint for building the Agentic SOC. 11 articles across three arcs — from understanding the threat landscape to the architecture that detects it to the metrics that prove it works.
Arc 1 — Threat Landscape
Pillar — The Gap Between What Attackers Do and What SOCs See
Article 01 — 24 Attack Patterns Mapped to the Microsoft Detection Stack
Article 02 — Log Sources Your SOC Needs for Detection, Forensics, and Hunting
Arc 2 — SIEM Architecture and Data Lake
Article 05 — How Microsoft Sentinel Architecture Actually Works Now (coming soon)
Article 06 — Design Decisions for Detection Capability (coming soon)
Article 07 — Migrating to Unified SecOps and the Data Lake (coming soon)
Article 08 — Threat Hunting with KQL and the Data Lake (coming soon)
Article 09 — Graph Analytics: Seeing Attack Paths (coming soon)
Arc 3 — Measurement
Article 10 — SOC Metrics for the Agentic Era (coming soon)
Code and examples: github.com/mikepalitto/socautomators



