The Design Decisions That Will Define Your Detection Capability
Agentic SOC Series
The series kickoff made the transition clear: threat-informed defense is the foundation, but the Agentic SOC is the operating model.
That does not mean we skip the hard security engineering work. It means the hard security engineering work matters more.
Agents do not fix bad telemetry.
If the right logs are missing, too expensive to keep, retained for the wrong window, or trapped in the wrong tier, an agent will not reason its way into better detection. It will produce a confident answer faster from incomplete evidence.
Cloudflare’s Project Glasswing write-up is the forcing function. Their team used Anthropic’s Mythos Preview in a scoped vulnerability-research harness across 50+ repositories, then made the dual-use point plainly: “the same capabilities that helped us find bugs in our own code will, in the wrong hands, accelerate the attack side against every application on the Internet.”
That was offensive research, not a SOC case study. But the pattern transfers. Defenders need data architecture that can keep up.
Let’s walk through the decisions that matter.
1. What belongs in the detection engine?
Some data needs to live where detection and response happen.
In Microsoft's security architecture, that means the analytics tier plus Defender XDR, which together provide real-time analytics, incident correlation, alerting, and response.
Put high-signal sources here: sign-ins, audit events, Defender XDR incidents, endpoint alerts, identity alerts, privileged access activity, and exfiltration events.
The test is simple: if the SOC needs to act on it in minutes, do not bury it in a tier designed for long-retention analysis.
2. What belongs in the intelligence platform?
Some data is more valuable for history than speed.
Raw firewall sessions, DNS streams, proxy logs, network flows, endpoint history, and compliance evidence often belong in the data lake: too expensive to keep hot forever, but too valuable to throw away.
This matters because agents need history. A Triage agent can summarize hot alerts. An Investigator and Validator need deeper evidence.
If all you keep is alert data, your agents inherit the same blind spots your analysts already had.
3. Which sources need both?
Dual-ingest is not the default. It is a deliberate choice for sources that serve two jobs: operational detection and historical intelligence.
Start with identity. Entra sign-in data belongs hot for risky sign-ins, impossible travel, password spray, and token abuse. It also belongs in the lake because identity history powers blast-radius and lateral movement analysis.
Other candidates include cloud app activity, endpoint telemetry, DNS, and firewall data. If you cannot explain both paths, it probably should not be dual-ingested.
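The first three decisions reduce to two questions per source: does the SOC act on it in minutes, and does it feed long-window analysis? Here is a minimal sketch of that routing logic. The `Source` fields, the `Tier` names, and the yes/no answers for each example source are illustrative assumptions, not a Microsoft API or a recommended classification.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ANALYTICS = "analytics"  # detection engine: real-time analytics and response
    LAKE = "lake"            # intelligence platform: long-retention history
    BOTH = "both"            # dual-ingest: a deliberate choice, not the default


@dataclass
class Source:
    name: str
    act_within_minutes: bool  # does the SOC need to act on it in minutes?
    needs_history: bool       # does it feed long-window analysis?


def route(source: Source) -> Tier:
    """Pick a tier from the two questions asked in decisions 1 through 3."""
    if source.act_within_minutes and source.needs_history:
        return Tier.BOTH
    if source.act_within_minutes:
        return Tier.ANALYTICS
    # Anything that is not operationally urgent stays out of the hot tier.
    return Tier.LAKE


# Hypothetical answers for three sources; yours will differ.
sources = [
    Source("entra_signin", act_within_minutes=True, needs_history=True),
    Source("defender_xdr_incidents", act_within_minutes=True, needs_history=False),
    Source("raw_firewall_sessions", act_within_minutes=False, needs_history=True),
]
routing = {s.name: route(s).value for s in sources}
```

The point of writing it down this way is that dual-ingest only happens when someone has explicitly answered yes to both questions.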
4. How long do you keep it?
Defaults are not strategy. Defaults are what you get before architecture shows up.
Retention should follow attacker behavior. Initial access and credential abuse need speed. Beaconing, lateral movement, and slow exfiltration need history.
The better question is not “how long should we keep logs?”
The better question is “how long does this tactic take to prove?”
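One way to make that question concrete is to derive retention from the slowest tactic a source helps prove. The sketch below assumes a hypothetical table of proof windows. The day counts are placeholders chosen only to show the mechanics. They are not recommendations and do not come from any vendor guidance.

```python
# Hypothetical proof windows in days. Placeholder values, not recommendations:
# replace them with what your own threat model and incident history support.
PROOF_WINDOW_DAYS = {
    "initial_access": 30,
    "credential_abuse": 30,
    "beaconing": 90,
    "lateral_movement": 180,
    "slow_exfiltration": 365,
}


def required_retention_days(tactics):
    """Return the retention needed to cover the slowest tactic in the list."""
    if not tactics:
        raise ValueError("a source with no threat coverage has no retention rationale")
    return max(PROOF_WINDOW_DAYS[t] for t in tactics)


# A source that supports both fast and slow tactics inherits the longer window.
dns_retention = required_retention_days(["initial_access", "beaconing"])
```

A source with an empty tactic list raises an error on purpose: if nobody can say what it proves, the retention number is a default, not a decision.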
5. What gets normalized?
ASIM helps with source-agnostic detection and hunting. Normalize the pivots: time, user, device, IP address, action, result, and source.
Do not normalize away forensic detail. Keep the raw fields where they matter.
Normalized pivots give agents a consistent way to move across sources. If every source names the same concept differently, the agent wastes time rediscovering your schema.
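A rough sketch of "normalize the pivots, keep the raw fields" follows. The source names and field names in `PIVOT_MAPS` are made up for illustration. They are not the ASIM schema, and in practice ASIM parsers do this work inside Sentinel rather than in Python.

```python
# Hypothetical per-source mappings from raw field names to shared pivots.
PIVOT_MAPS = {
    "signin": {"event_time": "time", "upn": "user", "client_ip": "ip_address", "status": "result"},
    "firewall": {"ts": "time", "src_ip": "ip_address", "action": "action"},
}


def normalize(source, raw):
    """Map known fields to shared pivots and keep the original record intact."""
    mapping = PIVOT_MAPS[source]
    event = {pivot: raw[field] for field, pivot in mapping.items() if field in raw}
    event["source"] = source
    event["raw"] = dict(raw)  # forensic detail is preserved, not normalized away
    return event


fw_event = normalize(
    "firewall",
    {"ts": "2024-01-01T00:00:00Z", "src_ip": "10.0.0.5", "action": "deny", "bytes_out": 1200},
)
```

An agent can pivot on `ip_address` or `time` across every source, and still reach into `raw` when the investigation needs a field the shared schema does not carry.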
6. What comes back from the lake?
The data lake should not be a dead end.
If lake-side hunting or graph analysis produces a repeatable signal that should trigger response, promote it back into the detection engine. If it is informational, keep it in the lake.
This is the feedback loop the Agentic SOC needs: deep analysis creates better signals.
One caveat: Sentinel MCP and custom graph experiences are useful architecture directions, but do not make any pattern that depends on preview features a baseline requirement until you have verified its readiness.
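The promotion rule in this section is small enough to write down. This sketch assumes a hypothetical `LakeFinding` record with two yes/no judgments that an analyst or reviewer supplies. It illustrates the decision, not how Sentinel implements anything.

```python
from dataclasses import dataclass


@dataclass
class LakeFinding:
    name: str
    repeatable: bool               # does the signal show up reliably, not once?
    should_trigger_response: bool  # would the SOC act on it if it fired?


def disposition(finding: LakeFinding) -> str:
    """Promote only signals that are both repeatable and actionable."""
    if finding.repeatable and finding.should_trigger_response:
        return "promote_to_detection_engine"
    return "keep_in_lake"


# Hypothetical findings from lake-side hunting.
findings = [
    LakeFinding("rare_dns_beacon_pattern", repeatable=True, should_trigger_response=True),
    LakeFinding("quarterly_login_trend", repeatable=True, should_trigger_response=False),
]
decisions = {f.name: disposition(f) for f in findings}
```

Both conditions have to hold. A repeatable but informational finding stays in the lake, which keeps the detection engine from filling up with signals nobody will respond to.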
7. Who owns the decision?
Without ownership, routing decisions drift.
A source gets added during an incident and never gets moved. Volume changes. Costs increase. No one notices until the next bill or missed detection.
This does not need a committee. One person with a source-to-tier spreadsheet is better than a governance board that never meets.
Track the source, tier, retention, volume, threat coverage, owner, review date, and rationale. If the answer is “because the person who built it knew why,” you do not have architecture. You have archaeology.
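If a spreadsheet feels too loose, the same record fits in a few lines of code. The sketch below uses the fields listed above. The type choices, the `daily_volume_gb` unit, the example values, and the `needs_attention` check are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SourceRecord:
    source: str
    tier: str
    retention_days: int
    daily_volume_gb: float  # unit is an assumption; use whatever your billing uses
    threat_coverage: List[str]
    owner: str
    review_date: date
    rationale: str


def needs_attention(record: SourceRecord, today: date) -> List[str]:
    """List the governance gaps for one source-to-tier entry."""
    problems = []
    if not record.owner.strip():
        problems.append("no owner")
    if not record.rationale.strip():
        problems.append("no rationale")
    if record.review_date < today:
        problems.append("review overdue")
    return problems


# Hypothetical entry: added during an incident and never revisited.
stale = SourceRecord(
    source="proxy_logs",
    tier="analytics",
    retention_days=90,
    daily_volume_gb=40.0,
    threat_coverage=["slow_exfiltration"],
    owner="",
    review_date=date(2024, 1, 1),
    rationale="added during incident",
)
gaps = needs_attention(stale, today=date(2024, 6, 1))
```

Run a check like this on a schedule and drift shows up as a short list of names, not as a surprise on the next bill.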
Executive Summary for Security Leadership
Agents cannot compensate for missing telemetry, poor routing, or short retention.
Treat every routing decision as a detection capability decision.
Require a source-to-tier map with retention, ownership, review date, and threat coverage.
What is next
Article 02 covers the Unified SecOps migration and the data lake: how to operationalize these decisions without treating a portal move as the project.
Your next step: pick one high-volume source in analytics this week. Write down what it covers, what would break if you moved it, and who owns it.
The Agentic SOC is only as good as the evidence it can reason over. Build the substrate first.
This is Article 01 of “The Agentic SOC on Microsoft Sentinel” series on socautomators.substack.com.



