What you'll learn
AI pen-testing agents outperformed nine of ten human professionals in a real enterprise environment at roughly a third of the hourly cost — and the gap is going to widen.
The n8n "Nightmare" CVE (CVSS 10) is a preview of how AI orchestration platforms become supply-chain choke points the moment they're widely adopted.
Moody's just told its institutional audience that AI cyber risk belongs on the balance sheet — and that's the lever that will move security spend, governance, and insurance pricing this year.
Description
This episode runs a tight news loop through three signals that together describe where the security market actually moves in 2026. The first is a new arXiv paper on AI agents versus human pen testers in a live enterprise environment. The second is the n8n "Nightmare" CVE — a CVSS 10 affecting roughly 100,000 servers globally on what's becoming the default orchestration layer for AI-powered business processes. The third is Moody's 2026 Outlook for cyber risk, which finally reads the way security leaders have been writing for a year.
The pen-testing study is the cleanest data point on the AI-vs-human curve we've seen so far. Artemis, the multi-agent framework, found nine valid vulnerabilities across an 8,000-host university network spanning 12 subnets, hit an 82% valid submission rate, and beat nine of the ten human participants. Cost per hour was roughly $18 for the agent team versus $60 for the humans. Stuart's read is that this is the most natural fit for agent deployment in security — clear goal, no fixed playbook, parallelizable enumeration. Conor's read is that the differentiation between top-tier and mid-tier pen testers is going to widen the same way it's about to widen across security as a whole.
The n8n CVE reframes the supply-chain conversation. The exploit lets an authenticated user take over the entire instance of an orchestration layer that, by design, has access to your data and resources. The remediation is straightforward: patch. The structural lesson is that as AI orchestration platforms get adopted, they become the new control plane, and the time-to-exploit window is now days, not weeks.
The Moody's report ties it together. When a ratings agency tells the institutional market that AI cyber risk needs governance, defense, and pricing attention, that's the lever that moves insurance premiums and board-level prioritization.
The closing segment covers Claude Code one-shotting work that took a Google team a year, and a public CISO ripping out SailPoint to rebuild it with Claude. It sets up the harder question of when "build, don't buy" becomes the default and when it becomes a tech-debt trap.
What we cover
"Artemis beats nine of ten" — the arXiv pen-testing study and what it means for the offensive-security career path
"$18 versus $60 an hour" — the unit-economics flip and the question of AI inflation as the model market matures
"the n8n Nightmare" — a CVSS 10 on an AI orchestration layer running on ~100,000 servers
"runtime defense as non-negotiable" — the patching window collapses to hours when attackers have AI patch-diffing
"Moody's reads cyber" — when a ratings agency starts pricing AI risk, that's the lever
"build vs. buy gets re-litigated" — Claude Code one-shotting in an hour what took a Google team a year
"software cannibalism" — public CISO replaces SailPoint with a Claude build, and the tech-debt question that follows
"two things you don't roll yourself" — Conor's hard line on identity and crypto
Thank you to our Sponsors:
Hampton North is the premier US-based cybersecurity search firm. Start building your security team with Hampton North.
Sysdig is the leader in AI-powered real-time cloud defense; stop watching and start defending.
The conversation
Agents beat nine of ten human pen testers — at a third of the cost
The new arXiv study put Artemis, a multi-agent pen-testing framework, head-to-head against ten human professionals in a live enterprise environment — an 8,000-host university network across twelve subnets. Artemis discovered nine valid vulnerabilities, hit an 82% valid submission rate, and finished second overall. It outperformed nine of the ten humans. Cost per hour: roughly $18 for the agent team versus $60 for human pen testers, holding hours-to-valid-vulnerability roughly constant.
The strengths and weaknesses pattern matches the broader AI capability curve. Agents excel at systematic enumeration and parallel exploitation. They struggle with GUI-driven attacks and produce slightly higher false-positive rates. Stuart's framing is the most useful one — pen testing is the cleanest fit for agentic deployment in security because the goal is unambiguous, the playbook is intentionally not fixed, and the work is naturally parallelizable. The same conditions that make a senior pen tester effective are conditions an agent can replicate at scale.
The longer-term question is unit economics. Today, $18/hour is a 70% discount on the $60 human rate. AI inflation is real: the model APIs are running at a loss for now, and prices will rise. But the productivity gap is widening too. The second-order effect Stuart called out is that top-tier human pen testers, the ones who find vulnerabilities a model can't articulate a reason to look for, are about to be in extreme demand, even with agents like XBOW having topped HackerOne's leaderboard. The middle tier is where the squeeze lands.
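The arithmetic behind that discount, and how much room it leaves for price increases, can be sketched in a few lines of Python. The two hourly rates come from the episode; the price multipliers in the loop are hypothetical values chosen only for illustration, and the sketch assumes agent and human hours stay equally productive.

```python
# Hourly rates quoted in the episode (USD per hour).
AGENT_RATE = 18.0
HUMAN_RATE = 60.0


def discount(agent_rate: float, human_rate: float) -> float:
    """Fraction saved by paying the agent rate instead of the human rate."""
    return 1.0 - agent_rate / human_rate


def parity_multiplier(agent_rate: float, human_rate: float) -> float:
    """How many times the agent rate can rise before it equals the human rate."""
    return human_rate / agent_rate


if __name__ == "__main__":
    print(f"discount today: {discount(AGENT_RATE, HUMAN_RATE):.0%}")
    print(f"price rise to parity: {parity_multiplier(AGENT_RATE, HUMAN_RATE):.2f}x")

    # Hypothetical price increases, not figures from the episode.
    for multiplier in (1.5, 2.0, 3.0):
        new_rate = AGENT_RATE * multiplier
        remaining = discount(new_rate, HUMAN_RATE)
        print(f"{multiplier:.1f}x -> ${new_rate:.0f}/hour, discount {remaining:.0%}")
```

Under those assumptions the agent rate could rise a little over threefold before the cost advantage disappears, which is why the productivity gap matters as much as the sticker price.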
The n8n Nightmare and the new orchestration supply chain
The CVSS 10 vulnerability disclosed this week in n8n, since patched and responsibly disclosed by Theo Lasue, is a useful preview of the supply-chain risk that's about to define 2026. n8n is the orchestration layer that an increasing number of organizations use to connect AI systems to business processes. By design, the platform has authenticated access to data and resources. By design, it executes workflows that engage tools, skills, and downstream systems. When the exploit allows authenticated users to take over an entire instance, the blast radius is the entire AI-orchestrated half of the business.
The remediation is straightforward: patch. The structural lesson is that as AI orchestration platforms become the control plane for autonomous business workflows, they become exactly the kind of tightly coupled, high-privilege chokepoint that supply-chain attackers target. Two operating implications follow. First, runtime protection in production is non-negotiable when the exploit window collapses to hours and your team can't reliably push changes to business systems that fast. Second, your cloud defense platform should surface this kind of CVE within minutes of disclosure, with clear, actionable workflow guidance. That capability isn't a nice-to-have anymore.
Moody's just put AI cyber risk on the balance sheet
The Moody's 2026 Outlook is the document worth circulating to your board. The headline language is uncharacteristically blunt for a ratings agency. AI-powered cyber attacks are rising — more sophisticated phishing, deepfakes, adaptive malware, autonomous attacks. Cryptocurrency theft is escalating. Recent cloud outages signal the catastrophic potential of an exploit at scale. AI-driven defenses are essential but not a silver bullet. Autonomous AI introduces new risks that require strong governance. Regulatory harmonization remains a problem — the EU is advancing coordinated frameworks, the US and APAC are diverging, and that divergence creates exploitable gaps.
For a security leader trying to ground a "why now" conversation with the executive team, this is the artifact to use. When a ratings agency frames AI cyber risk as a balance-sheet item, the next-order effect is in the insurance market — carriers take their cues from rating agencies, and AI-specific liability riders are coming. If your cyber insurance renewal is in the next few months, expect those conversations to start, and expect your AI governance posture to influence your premium. ISO 42001 is the framework worth getting close to. Walter Haydock at StackAware is the practitioner worth listening to on the governance side.
Claude Code, build-vs-buy, and the software cannibalism debate
The provocative segment of the episode came from a public post by a Google engineer claiming Claude Code built in one hour what their team had spent over a year building. That story landed alongside a public CISO ripping out SailPoint as their identity platform and rebuilding equivalent functionality with Claude. The conversation that follows is the one every CISO and CTO will be having this year.
Stuart's framing is sharp on the unit economics: if the choice is paying a vendor $170,000 a year for software you used to pay $70,000 for, and Claude Code can give you 80% of the functionality in a week, the budget conversation gets interesting fast. Conor's pushback is on the tech debt and support cost. The finish line for software isn't when the build is done; it's the years of maintenance, edge cases, regressions, and user support that follow. Build-it-yourself moves the support cost from the vendor's P&L to your engineering team's calendar, and most organizations underestimate that by an order of magnitude.
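Both sides of that argument can be put into one rough cost model. The sketch below is illustrative only: the $170,000 vendor fee is the figure quoted in the episode, while the build cost, the planned maintenance budget, and the ten-year horizon are hypothetical numbers picked to show the shape of the trade-off. It also ignores the 80%-of-functionality gap and treats all costs as flat per year.

```python
from typing import Optional

# Vendor price quoted in the episode (USD per year).
VENDOR_ANNUAL_FEE = 170_000.0


def cumulative_buy(years: int, annual_fee: float = VENDOR_ANNUAL_FEE) -> float:
    """Total spend on the vendor after the given number of years."""
    return annual_fee * years


def cumulative_build(years: int, build_cost: float, annual_maintenance: float) -> float:
    """Total spend on an in-house build: one-off build plus yearly upkeep."""
    return build_cost + annual_maintenance * years


def first_year_build_is_cheaper(
    build_cost: float,
    annual_maintenance: float,
    annual_fee: float = VENDOR_ANNUAL_FEE,
    horizon_years: int = 10,
) -> Optional[int]:
    """First year in which building has cost less than buying, or None."""
    for year in range(1, horizon_years + 1):
        if cumulative_build(year, build_cost, annual_maintenance) < cumulative_buy(year, annual_fee):
            return year
    return None


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the episode.
    BUILD_COST = 50_000.0
    PLANNED_MAINTENANCE = 20_000.0

    scenarios = (
        ("maintenance as planned", PLANNED_MAINTENANCE),
        ("maintenance 10x the plan", PLANNED_MAINTENANCE * 10),
    )
    for label, maintenance in scenarios:
        year = first_year_build_is_cheaper(BUILD_COST, maintenance)
        outcome = f"cheaper from year {year}" if year else "never cheaper within horizon"
        print(f"{label}: build is {outcome}")
```

With these made-up inputs, the build wins immediately if maintenance goes to plan and never wins if maintenance runs an order of magnitude over. The decision turns on the maintenance estimate, not on the build cost.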
The middle-ground answer is probably that SaaS providers themselves need to use this same toolkit to make their products more flexible — letting customers buy two of ten features at proportional pricing instead of getting bloated all-in pricing they can't justify. That's a healthier outcome than every CISO rolling their own identity stack. Conor's hard line stays: don't roll your own crypto and don't roll your own identity. Everything else is on the table.
Show notes
Guests — solo episode (Conor Sherman and Stuart Mitchell, hosts; no in-studio guest)
Books mentioned — none
Frameworks / models / tools named — Artemis (multi-agent pen-testing framework, arXiv paper); HackerOne; n8n (orchestration platform); CVSS 10 "Nightmare" vulnerability in n8n; Sysdig runtime cloud defense (referenced as Conor's employer); Claude Code (Anthropic); SailPoint (referenced as the identity platform a CISO replaced); ISO 42001 (referenced); Moody's 2026 Outlook for cyber risk
Other people / shows / resources referenced — Damien (Sysdig, hosting Boston event with Nebulock); Mark Sutton, CISO at Bain Capital (Boston CISO panel); Theo Lasue (n8n vulnerability disclosure researcher); XBOW (referenced as the AI agent ranked at the top of HackerOne's leaderboard at one point); Jaana Dogan (Google engineer whose Claude Code post sparked the build-vs-buy segment); Walter Haydock, CEO of StackAware (prior Zero Signal guest, governance recommendation); Daniel Miessler (referenced re: top-vs-mid talent stratification prediction); Keith Hoodlet (referenced as prior Zero Signal Black Hat guest); Clint Gibler (referenced as prior guest); Plot AI (always-on hardware-recording device discussed in the closing philosophy segment); Black Mirror "The Entire History of You" (referenced as analogy)
Hosted by Conor Sherman and Stuart Mitchell.