Why Agent Trust Must Be Computed, Not Granted

Episode #39 | Sean Catlett | Observability Is Not Legibility for Agents


What you'll learn

Why "computed trust" means an agent earns its autonomy from runtime evidence — a known identity, an explicit delegation, an observed intent, and a risk score it can act on — instead of getting approval gates bolted on after it has already broken something.
The difference between a "workforce" and a "workflow" model of AI, and why almost every governance tool on the market today only works on the bounded workflow half while the unbounded, assistive workforce half is where the real exposure is growing.
Why seeing an agent's activity is not the same as understanding it — and the missing runtime layer that would let an agent weigh second- and third-order effects, and the trust of the systems and people it's about to touch, before it acts.

Description

Sean Catlett has spent the last decade in the seats where security stops being theoretical. He was Reddit's founding CISO, the Chief Security Officer at Slack, and the leader who ran security and trust and safety as one organization at Bumble before most companies understood why those two functions belong in the same room. He is now the co-founder of Polymodal, an early-stage company — just past its pre-seed — building novel ways to observe and interact with AI agents. That arc, from control-position CISO to founder working on the trust primitives underneath agents, is exactly why this conversation lands where it does: on what it actually takes to lead an AI transformation instead of getting layered out of it.

The through-line is a claim Sean has strong conviction on: trust has to be computed, not granted. As organizations rush to make their business units agentic, there's a clear vision at the start and a measurable outcome at the end, but a gap in the middle where nobody can say why an agent should be trusted to act. Sean walks that gap concretely — the CISO's shifting mandate and shrinking margin for the "office of no," judgment as the one thing a leader can never delegate, the workforce-versus-workflow split that decides whether your governance holds, and the runtime sensing layer that could let agents earn autonomy through evidence. It's built for security and trust leaders who are being asked to enable AI adoption without opening the floodgates, and who would rather stay slightly ahead of what's coming than have it happen to them.

What we cover

"observability is not legibility" — the core idea that shipping logs and watching sessions tells you an agent did something, not what it means or whether it was safe.
"we are paid for our judgment" — why judgment is the one function a security leader can't outsource to a model, and what that implies for how teams are built.
"workforce instead of workflow" — the distinction that separates bounded, automatable AI from the persistent, assistive, non-deterministic kind, and why it matters for governance.
"a chat bot with extra steps" — what you actually have when you deploy an agent without knowing its identity, its delegation, or its intent.
"water flowing downhill" — why the reflexive security move of blocking an agent fails, and what a lack of observation hides from you when you think you've stopped it.
"here is root on a machine, have fun" — the horror of handing agents high-execution environments, and how they route around guardrails through subsystems you never thought to lock down.
"escape velocity" — why the companies that reach it with this technology win, and why the security team that enables that pace is the one that grows.
"the future is here, it's just not evenly distributed" — where this all lands for the CISO role, and why the leaders who keep building will pull away from the ones who don't.

Thank you to our Sponsors:

Hampton North is the premier US based cybersecurity search firm. Start building your security team with Hampton North.

Sysdig is the leader in AI-powered real-time cloud defense; stop watching and start defending.

The conversation

The CISO gets in the room or gets layered out

Sean has lived through enough platform shifts — cloud, mobile, the whole arc back to the dot-com era — to recognize the pattern in this one. Each time, the organization turns to the people in control positions and asks the same question: how are we going to make this safe? That moment is an opening, and it's a fork. A security leader can embrace the technology and become the person who figures out how to drive it, or they can construct walls, dictate the technology choices, and become the obstacle a top-down AI initiative has to route around. He points to the trend of CISO roles being converted into AI-enablement jobs, or CISOs being layered by a board-led or CEO-led mandate, as the visible edge of that fork.

What decides which side you land on isn't tenure. It's adaptability — and, uncomfortably, your batting average. A leader who has been right for years can still be lethal to a business if the way they're right is by saying no to capability the business desperately needs to scale.

❝

"you are correct, but like two percent of the time, I think that that's ultimately not a great place to be."

— Sean Catlett

The flip side is just as real: opening the floodgates exposes private company information and creates unsecured data through the hidden ways AI gets trained and used. Sean's answer is neither wall nor floodgate. It's adaptation — and the thing that makes this transformation different from the ones before it is that the technology itself can help you adapt. Embraced by a security or trust leader, it can move a program at or slightly ahead of the business. All it takes is being slightly correct, slightly ahead, instead of letting it happen to you.

Judgment is the one thing you can't delegate

The deepest thread in the conversation is what a security leader is actually paid for. Sean is unambiguous about it, and he says it to his own teams constantly.

❝

"We don't get to outsource that. We don't get to push that off into some other place."

— Sean Catlett

Judgment is the function that survives automation: parsing information to make good decisions quickly, the way you would in incident response or a security operations center. Conor sharpens it with a litmus test from the insurance world: professional-liability coverage backs a human's judgment call, but if an agent makes the decision on your behalf, that's not covered. Delegate your thinking to something else, and you delegate away your protection. Sean agrees that's true today, though he notes he watched startups at London Tech Week pitching agent-error insurance, and he asks the right question in return: where are the actual loss tables, or is it just a hallucinated mess?

Crucially, the goal isn't to avoid error, because Sean doesn't think that's possible. It's to reduce the blast radius of an error and eliminate it fast, a discipline security has practiced for years. Hook an agent to your production data stores with no training and let it build, and you're in for a tough time. Wall it off behind massive gates, and no one gets the capability. The path between is iterative: open certain things up, evaluate well, then expand, the way the industry already learned to do with vulnerability management. His number-one piece of advice for leaders trying to build the judgment to do this: go try to build something yourself, and you'll gain both the skill and the empathy for the people in your business already doing it.

Workforce, not workflow

The distinction Sean keeps returning to is the one he thinks most people are missing. Most of the dollars spent on AI today go into workflow — applying intelligence to a bounded process to scale the business, with reasonable guardrails, doing a defined thing within a domain. That's real, and those roles are ripe for disruption. But it's not where the hard problem lives.

The harder, more future-facing category is workforce: long-running, persistent, assistive agents that you make your own, that operate with a non-deterministic shell, that surprise you or remind you of things you forgot — that start to feel less like a script and more like a colleague.

❝

"I think you will have this era of what I would consider workforce, which is helping you ideate, helping you create new capabilities and helping you shape maybe your team or your team structure and be more adaptive."

— Sean Catlett

The reason the distinction matters for a security leader is that the two demand different governance, and the tooling built for one doesn't cover the other. You can put a box around a workflow. A workforce agent, by design, doesn't stay in the box. Sean sees this producing smaller teams — more of them — instead of large teams where many people do the same disruptable work, and he's careful to reframe the industry's fixation on speed: for the roles security actually does, accuracy and the ability to take more inputs and make a better decision beat raw scale.

Computing trust: identity, delegation, intent

Ask Sean what an agent even is and he smirks, because the definition changes daily — a collection of sessions and skills applied to a workspace, a definition written in a Markdown file, a workflow with one non-deterministic step. The industry hasn't landed on it. But the primitives for trusting one are old and familiar: identity, delegation, the things security has worked with for a very long time.

❝

"the elements of what is this identity and what have I delegated to it? And then ultimately what's its intent and where is it trying to go?"

— Sean Catlett

The move he advocates is to treat this like workforce instead of workflow — to understand an agent the way you'd understand a team member: its skills, its abilities, the data it can reach, what you granted it. It starts to look a lot like access management, one of the genuinely hard problems the industry has never fully solved. The difference in the agent's favor is that these things can be described in code, hosted from known locations, and observed at initiation. And for now, there's no expectation of agent privacy — you can demand full inspection and transparency, which is a gift, at least until the agent becomes your digital twin and you start wanting the same guarantees for it that you'd want for yourself.

Without those primitives — identity, delegation, intent, boundaries — Sean's verdict is blunt: what you have is "just like a chat bot with extra steps." Stuart adds the uncomfortable mirror: we're actually far worse than we think at defining limitations and expectations for our human teams, and with agents the same undefined-scope problem plays out at a speed and power level where the damage is much larger before anyone catches it.
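To make the three primitives concrete, here is a minimal Python sketch of what checking them before an action might look like. It is an illustration under stated assumptions, not anything described in the episode: the class names, the scope strings such as `tickets:read`, and the `evaluate_action` function are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """What a human principal explicitly granted to the agent (hypothetical shape)."""
    granted_by: str
    scopes: frozenset[str]  # e.g. {"tickets:read", "tickets:write"}


@dataclass(frozen=True)
class AgentContext:
    """The three primitives from the conversation, captured at initiation."""
    identity: str | None           # who the agent is
    delegation: Delegation | None  # what it was granted, and by whom
    declared_intent: str | None    # what it says it is trying to do


def evaluate_action(ctx: AgentContext, required_scope: str) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). Any missing primitive blocks the action."""
    reasons: list[str] = []
    if not ctx.identity:
        reasons.append("unknown identity")
    if ctx.delegation is None:
        reasons.append("no explicit delegation")
    elif required_scope not in ctx.delegation.scopes:
        reasons.append(f"scope {required_scope!r} was never delegated")
    if not ctx.declared_intent:
        reasons.append("no declared intent to evaluate")
    return (not reasons, reasons)


# Hypothetical agent: known identity, one delegated scope, a stated intent.
ctx = AgentContext(
    identity="svc-support-agent",
    delegation=Delegation("alice", frozenset({"tickets:read"})),
    declared_intent="summarize open tickets",
)
print(evaluate_action(ctx, "tickets:read"))  # (True, [])
print(evaluate_action(ctx, "payroll:read"))  # (False, ["scope 'payroll:read' was never delegated"])
```

The design choice worth noticing is that the function returns reasons, not just a boolean. An agent with no identity, no delegation, and no intent fails all three checks at once, which is the "chat bot with extra steps" case made explicit.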

Observability is not legibility

This is the line that names the episode, and it's the crux of the trust problem. Early intrusion detection and EDR had the same failure mode: seeing an event is not knowing what it means.

❝

"just because I see something does not mean I know what it means."

— Sean Catlett

Shipping logs and capturing sessions is a good and necessary first step — Sean is explicit that you need it. But without knowledge of how LLMs work, what providers and harnesses do differently, what an API return or even a "turn" is, you'll struggle to interpret what you're looking at. Apply only legacy knowledge to the space and you reach for the reflex that always feels good.

❝

"That feels good. We're gonna stop it. We're gonna say that can't do that."

— Sean Catlett

The problem is that blocking an agent is like water flowing downhill. If you think you've stopped it, you may just not have the observation capability to see where it went. Sean gives the concrete version: an agent on a Windows machine reaching the Windows Subsystem for Linux on the other side, or creating Docker containers and granting itself capability inside them, operating in parts of the machine you never thought to lock down. Conor names his own version of the horror — a CTO who wanted to run the Claude Chrome plugin, and the instinct to find a safer solution "for the love of God" before handing an agent that much reach.
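One way to picture the observation gap is to follow the agent's whole process tree instead of only the process you blocked. The Python sketch below assumes you already collect process-creation events as `(pid, parent_pid, command)` tuples. The event data, the `ESCAPE_HATCHES` set, and the function name are hypothetical, and the command parsing is deliberately naive (it would mis-handle executable paths that contain spaces).

```python
from collections import defaultdict

# Hypothetical list of binaries that open a second execution environment on the host.
ESCAPE_HATCHES = {"wsl.exe", "docker", "docker.exe"}


def find_escape_paths(events, agent_pid):
    """Walk every descendant of agent_pid and return (pid, command) pairs
    for descendants that launched an escape-hatch binary."""
    children = defaultdict(list)
    commands = {}
    for pid, ppid, command in events:
        children[ppid].append(pid)
        commands[pid] = command

    hits, stack, seen = [], [agent_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        parts = commands.get(pid, "").split()
        # Take the executable's base name, tolerating Windows or POSIX separators.
        binary = parts[0].replace("\\", "/").rsplit("/", 1)[-1].lower() if parts else ""
        if pid != agent_pid and binary in ESCAPE_HATCHES:
            hits.append((pid, commands[pid]))
        stack.extend(children.get(pid, []))
    return sorted(hits)


# Made-up events: the agent (pid 100) reaches WSL through a shell and starts a container.
events = [
    (100, 1, "agent-runtime"),
    (101, 100, "powershell.exe -NoProfile"),
    (102, 101, "wsl.exe -e bash"),
    (103, 100, "docker run --privileged ubuntu"),
    (200, 1, "docker ps"),  # unrelated to the agent, so it is ignored
]
print(find_escape_paths(events, 100))
# [(102, 'wsl.exe -e bash'), (103, 'docker run --privileged ubuntu')]
```

A control that only inspects the agent's own process would miss pid 102, because the jump into WSL happens one level down. The sketch only finds where the water went; deciding what it means is still the legibility problem.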

The missing layer: teaching agents to compute their own risk

If blocking doesn't work and observing isn't enough, what does computed trust actually look like at runtime? This is where Sean gets specific about the layer he thinks is missing. Agents don't fail because they can't reason; they fail because they lack context and any reward-punishment signal about consequences.

❝

"can we give it more sensing capability where at runtime we're actually educating it about some of the things it's going to do?"

— Sean Catlett

Part of it is already emerging in the models themselves — newer systems are being trained around irreversibility, asking whether an action can be undone and how much effort recovery would take, and interestingly extending that to information disclosure, not just destructive CRUD operations. But training alone isn't comprehensive and gives no sense of second- and third-order effects. So the rest is a bit of business graph, a bit of loading the right runtime context — not your full ISO 27001 compliance stack, but the specific policy facts the agent needs to avoid a given error — and a bit of LLM-as-judge, providing a nudge or a grade at the pace the agent operates.
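As a rough illustration of that runtime layer, the Python sketch below combines irreversibility, recovery effort, disclosure, and downstream reach into a single number and pairs it with a few loaded policy facts. Everything here is assumed for the example: the field names, the weights, and the caps are invented, and the score is a hand-written heuristic standing in for whatever an LLM-as-judge or a business graph would actually supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    description: str
    reversible: bool         # can the action be undone?
    recovery_hours: float    # estimated effort to recover if it goes wrong
    discloses_data: bool     # does information leave its current boundary?
    downstream_systems: int  # crude proxy for second- and third-order reach


# Illustrative weights only; a real program would derive these from its own risk appetite.
WEIGHTS = {"irreversible": 40, "recovery": 20, "disclosure": 25, "reach": 15}


def risk_score(action: ProposedAction) -> int:
    """Return a 0-100 score the agent can read before it acts."""
    score = 0.0
    if not action.reversible:
        score += WEIGHTS["irreversible"]
    # Recovery effort and reach are capped so one factor cannot dominate.
    score += WEIGHTS["recovery"] * min(action.recovery_hours / 24.0, 1.0)
    if action.discloses_data:
        score += WEIGHTS["disclosure"]
    score += WEIGHTS["reach"] * min(action.downstream_systems / 5.0, 1.0)
    return round(score)


def runtime_nudge(action: ProposedAction, policy_facts: list) -> str:
    """Build the short message handed back to the agent at runtime."""
    facts = "; ".join(policy_facts) if policy_facts else "none loaded"
    return f"risk={risk_score(action)}/100 for '{action.description}'. Relevant policy: {facts}"


drop_table = ProposedAction("drop staging table", reversible=False,
                            recovery_hours=12, discloses_data=False, downstream_systems=2)
print(risk_score(drop_table))  # 56
print(runtime_nudge(drop_table, ["staging data is restorable from nightly backup"]))
```

Note that disclosure is scored even when the action itself is reversible, which mirrors the point that sending information out cannot be undone the way a write can be rolled back.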

❝

"it actually is pretty good at consuming things like a score or some sort of a number that grades it or asking it for a grade."

— Sean Catlett

Conor's reaction captures why this is powerful: give an agent a computed harm score, and an action in the 85th percentile becomes a signal to step back and think deeper before proceeding. The catch — and both agree on it, laughing — is that computing that score drops you right back into the oldest unsolved problem in security: knowing your organization's actual risk appetite. But that's a problem the CISO is already supposed to own, and it puts the leader back in control of the conversation instead of on the sidelines.
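The percentile idea can be sketched in a few lines of Python: rank the current score against a baseline of previously observed scores, then step back when it lands at or above a threshold. The baseline values and function names below are made up, and the `pause_at` threshold is where the organization's risk appetite would have to be written down as an actual number.

```python
from bisect import bisect_right


def percentile_rank(history, score):
    """Percentage of previously observed scores that are <= score."""
    if not history:
        return 100.0  # no baseline yet, so treat the action as unusual
    ordered = sorted(history)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def decide(history, score, pause_at=85.0):
    """Return 'step_back' when the score sits at or above the pause_at percentile."""
    return "step_back" if percentile_rank(history, score) >= pause_at else "proceed"


# Made-up baseline of risk scores for this agent's earlier actions.
baseline = [5, 10, 12, 15, 20, 22, 30, 35, 40, 90]
print(percentile_rank(baseline, 38), decide(baseline, 38))  # 80.0 proceed
print(percentile_rank(baseline, 56), decide(baseline, 56))  # 90.0 step_back
```

Treating an empty baseline as the 100th percentile is a deliberate conservative default for this sketch: a brand-new agent has not yet produced the evidence that would let it proceed unreviewed.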

Security and trust belong in one org

The last movement returns to something Sean did before it was common: at Reddit and Bumble he brought security and trust and safety under one banner. His case for it is practical, not ideological. The threats overlap more than the org charts admit. A submitted photo, for example, carries technical sourcing signals you can use to block or rate-limit it before a human moderator ever sees it, and treating that as pure infrastructure security misses the abuse angle entirely.

❝

"there's just an empathy to having teams work together on a similar problem, but see that they can actually kind of cross-pollinate and share."

— Sean Catlett

He ties it directly to agents through a point Conor raises from a prior guest, Nicole Kerrigan of Darktrace, whose prediction for the year is an explosion of insider threat driven by agents. Sean's framing dissolves the distinction: whether a token gets used to walk the back office by a person, by a person claiming they didn't know their LLM was doing it, or by an LLM doing it on its own while thinking it's being helpful — you don't need to tell them apart to respond, contain, and run normal incident response.

But single-owner accountability has a trap, and Sean names it. Consolidating security, trust, and integrity under one executive also concentrates all of that risk on one person.

❝

"if you are the only person making those trade offs, then other people can leave a lot of risk for you."

— Sean Catlett

The better structure distributes the trade-offs — dotted lines into product and operations teams, embedded owners who make the risk calls where the work happens — so that the person dealing with a live attack can tell another team why their launch needs to slow down, and be heard. Conor closes on the same principle from a recent CISO guest: co-locate the risk with the business owner, and let the security leader show up as a credible witness who helps design the solution rather than absorbing all the exposure alone.

Show notes

Guests — Sean Catlett, co-founder of Polymodal; Reddit's founding CISO; former Chief Security Officer at Slack; ran security and trust and safety together at Bumble.

Books mentioned — None named in the conversation.

Frameworks / models / tools named — computed trust (identity, delegation, intent, boundaries); workforce vs. workflow; observability is not legibility; irreversibility as a trained model behavior; LLM-as-judge / runtime risk scoring; business / enterprise context graph; engineering-led and threat-led security teams; Google NotebookLM; the Claude Chrome plugin; Windows Subsystem for Linux; Docker containers.

Other people / shows / resources referenced — Stuart Mitchell (co-host); Guy Rosen (leaving Meta to advise on AI); Nicole Kerrigan (AI strategy, Darktrace — prior Zero Signal guest); Polymodal; Bumble; Reddit; Slack; London Tech Week; Black Hat EU

Hosted by Conor Sherman and Stuart Mitchell.
