What you'll learn.
Generic AI models produce inconsistent security reasoning at scale — domain-specific models trained on security data are what close the gap between "interesting demo" and production trust.
The convergence of investigation, detection engineering, and threat hunting into a single AI-augmented system democratizes capabilities that were previously locked behind mature, well-funded programs.
Trust in an AI SOC is measurable: track your disagreement rate with the system over time, and if it's not declining, the system isn't learning your environment.
Description.
The SOC has an economics problem that predates AI by a decade. Four and a half thousand alerts per analyst per day. Three hours burned on triage before anyone touches actual investigation. Seventy-four percent of analysts accepting this as normal while sixty-seven percent look for the exit. We've heard this framing before. What we haven't heard — and what Seth Summersett and Jeff Johns from Embed Security brought to this conversation — is a precise argument for why the current wave of AI SOC tooling either works or doesn't, and what separates the two.
Seth and Jeff launched Embed in 2024 after careers that ran through NSA, FireEye, Mandiant, Google, and Meta. They've sat on both sides of the offense-defense divide, and they're building what they call a security control plane — an agentic system that investigates every alert autonomously, shows its chain of evidence, and tells you when it doesn't have enough data to make a call. That last part matters more than the automation. Most AI security products pitch speed. Embed pitches epistemic honesty — the system knows what it doesn't know, and that's when it pulls a human in.
The conversation went deeper than product. We got into whether generic foundation models can reason like a security analyst (they can't, reliably), what happens to MSSPs and MDRs when false positive rates drop ninety percent, and why the analyst role isn't disappearing — it's shifting from reactive triage to proactive hunting. The defenders just got handed a better set of tools. The question is whether security leaders will reorganize around them.
What we cover.
"Investigation is really the bottleneck at this point" — why detection will scale to infinity but the human investigation layer is where programs break.
"Show our work everywhere" — Embed's chain-of-evidence principle and why getting the right answer the wrong way destroys trust.
"Can I really build something that gets the right answer at scale?" — the gap between a promising demo against a general model and a production system that performs reliably.
"This flips the script for MSSP and MDRs" — how ninety-percent false positive reduction reshapes the unit economics of managed security services.
"Domain-specific models are really important at this point" — Jeff's argument against plugging MCP servers into generic agents and hoping for the best.
"The analyst is validating stuff a lot more than doing the full investigation" — how the SOC analyst role evolves from builder to reviewer, mirroring what's happened to software developers.
"This is one of the best advantages we've been given as defenders" — Seth on why AI changes the offense-defense calculus for the first time in his career.
The Conversation.
The investigation bottleneck is the real problem — detection was never the constraint
Jeff Johns framed this early and it set the trajectory for the rest of the conversation. Detection is here to stay forever. That problem scales to infinity. The bottleneck isn't finding signals — it's doing something with them. Security teams have spent years onboarding every alert source they can into their SIEM so they can claim detection coverage, but coverage without investigation capacity is theater.
Seth reinforced this with what he's seeing in the field. The maturity gap across organizations is wider than most people assume. Large enterprises still struggling to get their SIEM working. Small shops that are dialed in beyond what you'd expect. The common thread isn't size or budget — it's whether the team can actually work through what their detections surface. And the honest answer, for most, is no. Not at the volume they're dealing with.
"Detection is here to stay forever. That is a problem that will continue to go on to infinity. And investigation is really the bottleneck at this point."
The frame that emerged: if you can automate investigation with enough transparency and reliability, you don't just save analyst hours. You shift entire teams from reactive triage into proactive work — threat hunting, detection engineering, the stuff that actually reduces organizational risk. These capabilities have historically been reserved for mature programs with headcount to spare. AI investigation collapses that barrier.
Trust isn't a feature — it's a measurement discipline
This was the sharpest part of the conversation. Jeff and Seth didn't pitch trust as a brand attribute. They described it as something you quantify.
Jeff laid out the progression. When a customer first deploys Embed, the instruction is: look at everything. Check our reasoning on the cases we mark benign, the ones we flag as malicious, and the ones where we say we're not sure. That last category — where the system explicitly says it doesn't have enough data to make a judgment — is a design choice most AI products don't make. Most systems either give you an answer or stay silent. Embed tells you when it's uncertain, and that's when it pulls a human in.
"It's not just the case that you believe something is bad. It might just be that you know that there are gaps in your knowledge and that that's the time to pull somebody in."
Seth extended this into a concrete metric: your disagreement rate with the system should decline over time. If you're constantly telling the system "no" on the same types of cases, it's not learning your environment. That's your signal that something is broken. The system should be absorbing tribal knowledge — which users log in from VPNs, which applications are approved, who's on vacation — and using it to refine its judgments.
"Just getting the right answer is not really enough. I want to see what it did and say, yeah, it did pull this context, it did ask this question, and based on this, I do agree it is bad. If it's not doing that, then it's hard to build that trust."
Stuart landed the point cleanly: if you don't trust the outcome, the whole value proposition collapses. You end up checking every piece of reasoning anyway, which is arguably more work than not having the system at all. Trust isn't a nice-to-have. It's the load-bearing wall.
Domain-specific models vs. the "just plug it in" fallacy
Jeff was direct on this one and it's worth hearing from someone who's been building agent systems for longer than most. Generic foundation models are not reasoning the way a security analyst reasons. You can get interesting results from a general model on a one-off query. You can throw a suspicious artifact at Claude or GPT and get something that looks right. But "looks right once" and "performs reliably at scale" are different problems separated by an enormous engineering gap.
"You can get the right answer, but if you can't get there the right way, then people aren't going to trust it."
Seth put it in practical terms. As a security researcher, he can throw something against a general model and get an answer that seems right once or twice. The question is whether you can build something that gives you the right answer consistently, with transparency, at scale. That's what differentiates a production system from a demo.
Jeff went further on the infrastructure side. It's not enough to take all the MCP servers out there, expose them to your agent, and let it figure out how to use them. That overwhelms the agents. You get unreliable results. The way you organize tools, the models you train on security-specific data, the evaluation frameworks you build around analyst reasoning — all of that is the actual product. The model is a component, not the solution.
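The tool-organization point generalizes. A minimal sketch of the principle (hypothetical tool names, not Embed's architecture): hand the agent a small, task-scoped toolset rather than the whole catalog, so it isn't choosing among dozens of loosely relevant integrations on every step.

```python
# Scope the tools an agent sees to the task at hand instead of exposing
# every available integration at once. All names here are hypothetical.
TOOLSETS = {
    "identity_triage": ["lookup_user", "recent_logins", "vpn_history"],
    "endpoint_triage": ["process_tree", "file_reputation", "host_timeline"],
    "network_triage":  ["dns_history", "ip_reputation", "netflow_summary"],
}

def tools_for(alert_type: str) -> list[str]:
    """Return a curated, task-relevant toolset for this investigation phase."""
    scoped = TOOLSETS.get(alert_type)
    if scoped is None:
        raise ValueError(f"no curated toolset for alert type: {alert_type}")
    return scoped
```

This is the opposite of "plug in every MCP server and hope": the curation itself encodes how an analyst actually works a given class of alert.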
Stuart brought the practitioner's version of this: he's tried to build "almost everything" with generic models and they get him ninety-seven, ninety-eight percent of the way there. It looks good, feels good, smells good. But it's not quite at the point where you can trust it to really let go. And when something breaks, most non-technical users don't know where to go fix it. That's when someone tells you the whole thing needs to be re-architected.
The MSSP and MDR reckoning
Seth didn't hedge on this: AI SOC tooling flips the script for managed security providers. Either they lower costs or they offer more services. Providers that don't move in this direction get left behind.
The math is straightforward. People are the constraining cost in managed security. If you can reduce false positives by ninety percent and have one analyst working across multiple customer environments on the ten percent that matters, the unit economics of the service change entirely. An analyst who was drowning in triage across one environment can now do meaningful, proactive work across several.
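The shape of that math, with illustrative numbers (the ninety-percent reduction is the figure from the conversation; the volumes are assumptions):

```python
# Illustrative volumes only, to show how the unit economics move.
alerts_per_env_per_day = 200    # assumed raw alert volume per customer environment
analyst_capacity_per_day = 80   # assumed alerts one analyst can properly investigate

envs_before = analyst_capacity_per_day / alerts_per_env_per_day  # 0.4: can't cover one
needs_review = alerts_per_env_per_day * (1 - 0.90)               # 20/day survive triage
envs_after = analyst_capacity_per_day / needs_review             # 4.0 environments

print(f"environments one analyst can cover: {envs_before:.1f} -> {envs_after:.1f}")
```

Whatever the exact volumes, a ninety-percent reduction is a 10x multiplier on per-analyst coverage, which is why it restructures the service rather than merely trimming costs.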
This cuts two ways. The optimistic version: costs come down for buyers, or the same spend buys dramatically more capability. Mid-market security teams that could never afford threat hunting or detection engineering get access to those capabilities through their MSSP. The cynical version, which Stuart raised: corporate America will try to capture the margin improvement first. Some providers will keep prices steady and pocket the efficiency gains. Stuart's bet is they get found out — buyers will eventually understand the unit economics and the market will correct.
Jeff agreed on the trajectory. The pressure to drop prices will be real, and it'll be the first time the industry sees people putting their money where their mouth is on AI-driven efficiency. That's a good thing for security buyers.
The analyst role isn't dying — it's being recast
Seth made an analogy that stuck: the shift for security analysts mirrors what's happened to software developers. Developers are reading a lot more code now than they're writing. Analysts are going to be validating a lot more than they're investigating from scratch.
This isn't the "no more analysts" camp. Seth and Jeff are clear that human oversight is still necessary given today's technology. But the nature of the work changes. Embed works with the University of Montana, where student analysts use the platform to upskill rapidly because they can see how the system investigated a case, what it considered, why it reached its conclusion. The chain of evidence becomes a teaching tool, not just an audit trail.
The skills that matter going forward: understanding data, understanding context, and — as Stuart put it — curiosity. The best analysts have always been the ones who push past the textbook answer and ask what happens if I push this button, and what happens if I don't. Systems will change. The instinct to understand why won't.
"I don't think they need some newfangled detection device. The actual avenues to get in haven't changed all that much. It's that attackers can move much faster now with less knowledge."
The threat landscape isn't introducing novel attack vectors. It's accelerating the existing ones. Defenders who have strong fundamentals and augment with AI are in the best position they've been in. The job gets harder — but for the first time, the tools are keeping pace.
Frameworks and concepts
Security control plane (Embed's term for unified investigation, detection engineering, and threat hunting)
Chain of evidence / "show our work" — Embed's transparency principle for AI-driven investigation
Human on the loop vs. human in the loop — distinction between oversight models for autonomous alert processing
Disagreement rate as a trust metric — tracking how often analysts override AI judgments over time
Tribal knowledge capture — encoding analyst context (approved apps, user behavior norms) into AI reasoning
Hosted by Conor Sherman and Stuart Mitchell.