What you'll learn
AI pen testing has already crossed the bar of finding non-trivial vulnerabilities at junior-tester level — the open question is consistency, not capability.
The shift-down framing — building security primitives into the languages, frameworks, and IDE rule files developers already use — is the strategic answer that scales when shift-left has hit its ceiling.
Vibe coding combined with rule-file-based security context could plausibly produce more secure code than traditional development — eliminating whole classes of OWASP top 10 vulnerabilities at the generation step.
Description
Clint Gibler is the head of security research at Semgrep, the creator of TLDR Sec, and the host of the Modern Security Podcast. This conversation works through where AI pen testing actually is in capability terms, what shift-down means for AppSec strategy, and the supply-chain attack surface now showing up across MCP servers, model artifacts, and typo-squatted libraries.
Clint's read on AI pen testing is calibrated rather than hyped. AI-based pen testing systems can already find non-trivial bugs at the level of an average junior pen tester — XBOW, Runsybl, and others have published proof points. The unanswered question is consistency. Capability is here; reliability is the next mountain. The career-pipeline implication is sharp — the junior on-ramp is closing, and the response can't just be more AI replacing more juniors. The way forward is augmented juniors with always-on senior mentorship via the model.
The middle of the episode pivots to the shift-down framing Clint developed in his recent conversation with Phil Venables. The shift-left era has hit a ceiling. The next move is to build secure-by-default primitives into the languages, libraries, and frameworks engineers already use. Combined with vibe coding and rule-file-based context (Cursor rules, Claude Code's CLAUDE.md, and equivalent files in every agentic IDE), the OWASP top 10 could meaningfully shrink in the next two to three years for the first time in a decade.
What we cover
"AI pen testers find real bugs today" — what XBOW and Runsybl have already proven, and what's still open
"the junior on-ramp problem" — and the AI-as-always-on-mentor counter
"agentic chains and the loss of items between steps" — why scaffolding matters as much as the model
"shift down instead of shift left" — Phil Venables's framing and why secure defaults beat AppSec gating
"vibe coding meets the rule file" — security awareness training the LLM reads on every prompt
"the long-tail OWASP top 10 reduction" — Clint's two-to-three-year prediction
"supply chain risk in the AI stack" — pickle files, typo-squatted packages, MCP server surface
"fuzzing as the perfect LLM use case" — and the broader pattern of "domains where you can validate the result"
Thank you to our Sponsors:
Hampton North is the premier US-based cybersecurity search firm. Start building your security team with Hampton North.
Sysdig is the leader in AI-powered real-time cloud defense; stop watching and start defending.
The conversation
AI pen testing — capable today, consistency tomorrow
Clint's framing on AI pen testing is the most calibrated take on the topic the show has aired this season. The capability question is settled. AI-based penetration testing systems — XBOW, Runsybl, and the cohort of similar projects — have published write-ups demonstrating non-trivial vulnerability discovery in real applications, at the level of bugs an average junior pen tester would find. The unsettled question is consistency. Point the same system at the same application 20 times: how often does it surface the same bug? That is the operational question the industry hasn't fully answered yet, because LLMs are probabilistic and non-deterministic by nature.
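A quick way to make that consistency question concrete, as a minimal sketch: `run_pentest` below is a hypothetical stand-in for any AI pen testing system, and the metric is just the fraction of independent runs that surface a known bug.

```python
def detection_rate(runs: list[set[str]], bug_id: str) -> float:
    """Fraction of independent runs in which a known bug was surfaced."""
    return sum(bug_id in findings for findings in runs) / len(runs)

# run_pentest() is hypothetical: each call returns the set of vulnerability
# IDs one full pen test run reported against the same target application.
# runs = [run_pentest(target_app) for _ in range(20)]
# print(detection_rate(runs, "idor-payments-endpoint"))
```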
The implication for the talent pipeline is the harder problem. The junior security on-ramp — the SOC analyst seat, the entry-level pen tester role — is the one most directly being eaten by AI tooling that does the same job for less money. The honest response is the same one several Zero Signal guests have converged on: the people who win in this market are the ambitious ones who treat the AI as an always-on senior mentor and use it to compress their own learning curve. The middle-of-the-bell-curve juniors who don't will struggle.
Scaffolding matters more than the model
The most useful technical framing in the episode was Clint's distinction between model capability and system scaffolding. Most production agentic systems aren't pure model output — they're a mix of deterministic orchestration code calling the model at specific decision points, with structured prompts, tool-use boundaries, and consistency checks built around the LLM. The example he walked through — a webinar he ran with Scott Behrens at Netflix where they built an orchestration that threat-models, scans, triages, and reports on a repository — illustrated the failure mode. The scan step finds 13 vulnerabilities. The reporting step often only writes up 11 of them. Items get lost between steps as agentic chains get longer.
The fix isn't a better model. It's better scaffolding — deterministic code that holds the list of things to do, validates state between steps, and ensures consistency across the pipeline. Clint's estimate is that good scaffolding gets you 2-5x, sometimes 10x better performance than the same model with naive orchestration. The CISO implication is direct. When evaluating an AI security vendor, the right question isn't "what model do you use" — it's "show me your scaffolding architecture."
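What good scaffolding looks like is mostly ordinary code. A minimal sketch of the pattern, with `scan` and `write_up` as hypothetical stand-ins for the model-driven steps:

```python
def scan(repo: str) -> list[dict]:
    # Hypothetical stand-in for the scan step; returns findings
    # like {"id": "VULN-3", "detail": "..."}.
    return [{"id": f"VULN-{i}"} for i in range(13)]

def write_up(findings: list[dict]) -> list[dict]:
    # Hypothetical stand-in for an LLM reporting step. Long agentic
    # chains sometimes drop items, which the code below catches.
    return findings[:11]

def run_pipeline(repo: str) -> list[dict]:
    findings = scan(repo)
    ids = {f["id"] for f in findings}   # deterministic code owns the list
    report = write_up(findings)
    missing = ids - {r["id"] for r in report}
    if missing:
        # Validate state between steps: re-drive the model on dropped
        # items rather than trusting the chain to carry state end to end.
        report += write_up([f for f in findings if f["id"] in missing])
    return report

print(len(run_pipeline("example/repo")))  # 13, not 11
```

The point is that the consistency check lives in plain code, not in the model.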
AI is one tool, but it's not the solution in itself.
Shift down — secure defaults as the next strategic move for AppSec
The strongest conceptual contribution of the episode was the shift-down framing Clint developed in his recent Modern Security Podcast conversation with Phil Venables. The shift-left era moved security checks from production-time PDFs back into pull requests, and then further back into the IDE. That made the feedback loop tighter, but it still puts the burden on developers to understand security and act on it. The shift-down move inverts that. Build security primitives into the programming languages, frameworks, and libraries developers already use, so the secure path is the default path. Output encoding handled by the framework eliminates whole classes of XSS. Parameterized queries by default eliminate SQL injection. Safe XML parsing by default eliminates XXE.
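The parameterized-query case is the easiest to see in code. A minimal sketch using Python's standard sqlite3 module; the same placeholder pattern exists in essentially every database driver:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")
conn.execute("INSERT INTO users VALUES ('a@example.com', 'Ada')")

email = "a@example.com' OR '1'='1"  # attacker-controlled input

# Unsafe path: string formatting leaves injection up to developer discipline.
#   conn.execute(f"SELECT name FROM users WHERE email = '{email}'")

# Shift-down path: the driver binds the input as data, never as SQL, so the
# secure way is also the path of least resistance.
rows = conn.execute("SELECT name FROM users WHERE email = ?", (email,)).fetchall()
print(rows)  # [] -- the injection payload matches nothing
```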
The historical examples Clint cited — Google, Netflix, DocuSign, Meta — have all built internal security engineering teams whose job is to produce these secure-by-default primitives. The catch is that this is genuinely difficult work requiring a dedicated security engineering function most organizations don't have. AI changes that calculus. With the right rule-file context inside Cursor, Claude Code, or any of the agentic IDEs, the LLM can be coached into generating code that follows the company's secure-by-default patterns, without each developer needing to internalize them first.
There's the potential, with the right engineering, for vibe coding to be more secure than normally writing code
The OWASP top 10 has been substantively the same for 15 years. Clint's prediction is that the generic-class-of-vulnerability portion of that list could meaningfully shrink as vibe coding meets rule-file-based security context.
The rule file is the new security awareness training
The operational practice Clint surfaced is the one every CISO should adopt this week. Most companies have an internal wiki or onboarding training that describes how the team writes code. Nobody reads it consistently. The new pattern is to express the same content as a markdown rule file (Cursor rules, Claude Code's CLAUDE.md, and equivalent files in every agentic IDE) that the model loads on every prompt. The security guardrails that used to live as documentation nobody read are now context the LLM is constrained by every time it generates code. XSS patterns to avoid. Auth flows to use. Library names LLMs tend to hallucinate (and so should be vetted). Internal frameworks that should be preferred over building from scratch.
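To make that concrete, here is a hypothetical excerpt of such a rule file; the section names and rules are illustrative, not drawn from any published example:

```markdown
# security-rules.md (loaded by the agentic IDE on every prompt)

## Output handling
- Never build HTML by string concatenation; use the template engine,
  which auto-escapes output (XSS).

## Database access
- All queries go through the ORM or parameterized statements; raw SQL
  string formatting is forbidden (SQL injection).

## Dependencies
- Only import packages already present in the lockfile; flag any new
  dependency for review (hallucinated or typo-squatted packages).

## Auth
- Use the internal auth middleware for session checks; never hand-roll
  token validation.
```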
There's now a fast-growing public ecosystem of these security rule files, and the early-adopter security teams are starting to publish their own internal versions. Generation-time guardrails get applied 100% of the time. Slide-deck security awareness training does not.
Supply chain risks in the AI stack
The closing technical segment walked through the supply-chain attack surface that's now active across the AI stack. Pickle files on Hugging Face: model artifacts that look benign on static scan because the malicious code only deserializes and executes at runtime. Vibe-coded package recommendations that don't exist: typo-squatting on hallucinated packages, surfaced by Socket and others. MCP server compromise, which Trail of Bits has built specific tooling against, including the context-protector proxy that pins tool descriptions and re-validates on change. For the model-artifact problem, the defender's response is private model registries (the Artifactory equivalent for models, which Clint noted should exist if it doesn't already), runtime defense around the LLM serving environment, and treating any third-party model as untrusted code until proven otherwise.
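Why static scanning misses the pickle case is worth seeing directly. A minimal, deliberately benign sketch of the mechanism:

```python
import os
import pickle

class Payload:
    # pickle calls __reduce__ to learn how to rebuild an object; an attacker
    # can return any callable plus its arguments, which run at load time.
    def __reduce__(self):
        return (os.system, ("echo this ran at deserialization time",))

blob = pickle.dumps(Payload())
# On disk, `blob` is opaque bytes with nothing for a static scanner to flag.
# The callable only executes when someone loads the "model":
pickle.loads(blob)
```

The malicious behavior doesn't exist as scannable code in the artifact; it materializes at load time, which is why registry controls need to be paired with runtime defense.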
Fuzzing and the broader pattern of "domains where you can validate the result"
The closing spicy take Clint offered is the one to take into product and engineering planning. The best LLM use cases are domains where the result can be programmatically validated without a human in the loop. Fuzzing is the canonical example — an LLM can generate thousands of harnesses, most of which might be garbage, but the program either crashes or it doesn't. The validation is automatic. Google Project Zero and DeepMind have demonstrated this at scale. Any domain with that shape — automated test generation, security scanner rule writing, configuration validation, compliance-as-code — is where the highest-leverage AI integration happens.
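The shape of that validation loop is simple enough to sketch. `llm_generate_harnesses` below is a hypothetical stand-in for any model call that emits candidate harness files:

```python
import subprocess

def llm_generate_harnesses() -> list[str]:
    # Hypothetical: an LLM call that writes candidate fuzz-harness files
    # to disk and returns their paths. Any model or provider works here.
    return []

def finds_crash(harness_path: str, timeout: int = 60) -> bool:
    """The oracle is automatic and binary: the target crashes or it doesn't."""
    try:
        result = subprocess.run(
            ["python", harness_path], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False  # a hang isn't a crash; discard the candidate
    return result.returncode != 0  # non-zero exit means crash or assertion

# Most generated harnesses are garbage; the cheap oracle filters them with
# no human in the loop, which is what makes the domain LLM-shaped.
keepers = [h for h in llm_generate_harnesses() if finds_crash(h)]
```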
Show notes
Guest — Clint Gibler, Head of Security Research at Semgrep; creator of the TLDR Sec newsletter; host of the Modern Security Podcast
Books mentioned — Building Secure and Reliable Systems (Google) — referenced as part of the historical shift-down lineage
Frameworks / models / tools named — TLDR Sec (Clint's newsletter); Modern Security Podcast (Clint's podcast); Semgrep; XBOW (AI pen testing system); Runsybl (AI pen testing system); shift-down framing (Phil Venables conversation on Modern Security Podcast); secure defaults / secure-by-default primitives; Cursor rule files; Claude Code context files; OWASP top 10; CaMeL framework (Google DeepMind, dual-agent prompt-injection defense); the dual-agent model (Simon Willison); Trail of Bits MCP Context Protector (tool-description pinning and re-validation); Socket (typo-squatted package detection); compliance as code
Other people / shows / resources referenced — Phil Venables, former CISO of Google Cloud and Goldman Sachs; Scott Behrens, principal security engineer at Netflix; John Steven; Daniel Miessler; James Chiappetta at Blackstone; Sam Altman; Simon Willison; Rami McCarthy (Clint's collaborator on the prompt-injection defenses repo)
Hosted by Conor Sherman and Stuart Mitchell.