The confused deputy

The pattern is older than the iPhone. It's called the confused deputy, and it's been documented in security literature since 1988. Every AI-equipped SaaS app, including ours, has to understand it.

What a confused deputy actually is

Norm Hardy described the original case in a 1988 paper. A trusted program, the deputy, holds privileges that some other party, the requester, doesn't have. The requester asks the deputy to perform an action. The deputy performs it using its own authority rather than checking whether the requester was entitled to ask.

The classic example involves a compiler. The compiler has write access to a system directory because it records usage in a billing file there. A user with no write access to that billing file asks the compiler to send its debugging output to the billing file's name. The compiler, holding write access to the directory, does what it's told. It doesn't ask whether the user was permitted to write there. It just writes, and the billing records are gone.
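A toy model makes the failure concrete. This is a minimal sketch under stated assumptions, not Hardy's original system: the in-memory `FILES` and `ACL` tables, the principal names, and the `compile_confused` / `compile_checked` functions are all hypothetical.

```python
# Hypothetical in-memory "filesystem": path -> contents, plus an ACL saying
# which principals may write each path. Not a real OS API.
FILES: dict[str, str] = {"/sysx/bill": "billing records"}
ACL: dict[str, set[str]] = {
    "/sysx/bill": {"compiler"},               # only the compiler may write billing
    "/home/alice/out": {"alice", "compiler"},
}


class PermissionDenied(Exception):
    pass


def write_as(principal: str, path: str, data: str) -> None:
    """Write `data` to `path` if `principal` holds write permission on it."""
    if principal not in ACL.get(path, set()):
        raise PermissionDenied(f"{principal} may not write {path}")
    FILES[path] = data


def compile_confused(requester: str, output_path: str) -> None:
    # The deputy writes with ITS OWN authority; who asked is never considered.
    write_as("compiler", output_path, "debug output")


def compile_checked(requester: str, output_path: str) -> None:
    # The deputy first asks: could the requester have written here themselves?
    if requester not in ACL.get(output_path, set()):
        raise PermissionDenied(f"{requester} may not write {output_path}")
    write_as("compiler", output_path, "debug output")
```

Calling `compile_confused("alice", "/sysx/bill")` overwrites the billing file even though Alice holds no write permission on it; `compile_checked` refuses the same call. The requester check is a simplified stand-in. Hardy's own remedy was capabilities: the requester hands the deputy a designation that carries its own authority, so the deputy can't write anywhere the requester couldn't.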

Figure 1 · The privilege didn't leak. The deputy used it for the wrong person.

The privilege didn't leak. The deputy used it for someone who wasn't entitled to invoke it.

AI assistants are confused deputies by default. The assistant holds privileges the human user typically doesn't: it can read internal documentation, query support systems, escalate cases, sometimes execute account changes. A user talks to it in natural language. The assistant tries to be helpful.

On the pattern: the phrase "be helpful" is doing more work than we usually acknowledge.

Why AI is a perfect modern instance

In Hardy's 1988 setting, everything about the deputy was narrow. A compiler wrote files. A print spooler queued jobs. The privileges were few, the operations were fixed, and the deputy's logic was small enough to read.

Modern AI assistants invert all three of those properties.

The privileges are broad. A support assistant for a major platform often has read access to user records, read access to internal documentation, and write access to action queues. Some have outbound mail privileges. Some have privileges to issue refunds.

The operations are open-ended. The assistant isn't restricted to "write file" or "queue job." It accepts free-form natural language and translates that into actions. The set of possible inputs is, in practice, infinite.

The logic isn't inspectable. The decision about whether a given request should be honoured is made by a language model on the basis of patterns the system's authors didn't write down. There's no source code you can read to know what the deputy will do next month, because the deputy is a probability distribution.

The argument: a confused deputy attack against an AI assistant isn't a clever exploit. It's the default outcome unless the system is carefully bounded.

What seems to have happened at Meta

Based on the public reporting, attackers approached Meta's AI support chatbot, identified themselves as the legitimate owners of accounts they didn't own, and asked the assistant to take actions on those accounts. The assistant, by design, was helpful. The actions were taken.

We don't have the implementation details. We're not going to speculate on what's broken inside Meta. Their engineers face the same problem ours do, and the public reporting isn't enough to fairly assign root cause.

What we can do is name the pattern, because it explains what the reporting describes.

The mistake: the assistant treated "I am the owner of this account" as a credential. It isn't. It's an assertion.

The work of binding that assertion to a real identity, then bounding what actions the assistant can take on that identity's behalf, has to happen outside the conversational layer. If it happens inside, the model's helpfulness will route around it.

Three design principles that prevent this

There are dozens of ways to address confused-deputy risk in AI systems. Three of them are non-negotiable.

Principle I

Identity binding

The AI agent acts on behalf of an authenticated user, not on behalf of whoever's currently talking to it. The authentication happens before the conversation starts, at the session layer, and the identity is held outside the model's context. The model doesn't decide who the user is. The session does.

If your AI assistant infers user identity from the conversation, you have built a confused deputy.
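A minimal sketch of that split, assuming a Python agent layer: `Session`, `ToolCall`, `dispatch`, `IdentityError`, and the `OWNED_ACCOUNTS` table are hypothetical names for illustration. The auth layer builds the `Session`; the model only proposes `ToolCall`s, and the dispatcher decides whose behalf they run on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Created by the auth layer after login; the model never constructs one."""
    user_id: str


@dataclass(frozen=True)
class ToolCall:
    """What the model proposes: an action name and its arguments."""
    name: str
    args: dict


class IdentityError(Exception):
    pass


# Hypothetical ownership table: which accounts each authenticated user controls.
OWNED_ACCOUNTS: dict[str, set[str]] = {"u_alice": {"acct_1"}, "u_mallory": {"acct_9"}}


def dispatch(session: Session, call: ToolCall) -> str:
    # The model may not say who it is acting for; that comes only from the session.
    if "acting_user" in call.args:
        raise IdentityError("identity must come from the session, not the model")
    account = call.args.get("account_id")
    # Bind the target account to the session identity, whatever the chat claimed.
    if account not in OWNED_ACCOUNTS.get(session.user_id, set()):
        raise IdentityError(f"{session.user_id} does not own {account}")
    return f"{call.name} on {account} for {session.user_id}"
```

If Mallory types "I am the owner of acct_1" and the model obligingly proposes `ToolCall("reset_email", {"account_id": "acct_1"})`, `dispatch(Session("u_mallory"), ...)` still raises `IdentityError`. The transcript is simply not an input to the authorization decision, so there is nothing for the model's helpfulness to route around.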

Principle II

Capability separation

The assistant can only perform operations the authenticated user themselves can perform. If the user can't reset another user's password through the normal product surface, the AI assistant can't do it on their behalf either.

This sounds obvious. It's surprisingly hard. Most AI assistants are given a superset of user permissions for "convenience," with the expectation that they'll exercise that superset only when appropriate. Convenience plus probabilistic reasoning equals exploit surface.
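One way to encode this, sketched under assumptions: the permission names and both tables below are hypothetical, and in a real product `USER_PERMISSIONS` would be the same authorization model the normal UI already enforces. The assistant's effective authority is the intersection of what the user may do and what the assistant is wired to call, never the union.

```python
# Hypothetical per-user permissions, taken from the product's own auth model.
USER_PERMISSIONS: dict[str, set[str]] = {
    "u_alice": {"read_own_orders", "reset_own_password", "delete_account"},
    "u_admin": {"read_own_orders", "reset_own_password", "reset_any_password"},
}

# Operations the assistant has tools for at all (a further restriction).
ASSISTANT_TOOLS: set[str] = {"read_own_orders", "reset_own_password", "reset_any_password"}


def effective_permissions(user_id: str) -> set[str]:
    # Intersection: the assistant can never exceed the user it acts for,
    # and can't do things it has no tool for even if the user could.
    return USER_PERMISSIONS.get(user_id, set()) & ASSISTANT_TOOLS


def authorize(user_id: str, operation: str) -> bool:
    return operation in effective_permissions(user_id)
```

`authorize("u_alice", "reset_any_password")` returns `False` even though the assistant has a tool for it, because Alice couldn't do that through the product herself; the same call for `u_admin` returns `True`. Wiring a tool into the assistant grants nothing on its own.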

Principle III

Out-of-band confirmation on irreversible actions

Anything that can't be undone (account ownership transfer, irreversible delete, outbound communication to a customer's mailing list) requires confirmation through a channel the AI doesn't control. Usually that's an email to the user's verified address, or a notification in the product interface they actually own.

The point isn't to slow the assistant down. The point is to interrupt the deputy.
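A sketch of the interruption, assuming a Python agent layer and an injected `send_out_of_band` callable (hypothetical; in practice an email to the verified address or an in-product notification). `PendingAction` and `ConfirmationGate` are illustrative names. The model can ask for an irreversible action, but it only ever receives a status string. The one-time token travels through the other channel, and only the handler for that channel calls `confirm`.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingAction:
    user_id: str
    description: str
    execute: Callable[[], None]


@dataclass
class ConfirmationGate:
    # Hypothetical sender for the out-of-band channel: (user_id, token) -> None.
    # The model has no reference to it and never sees what it sends.
    send_out_of_band: Callable[[str, str], None]
    _pending: dict[str, PendingAction] = field(default_factory=dict)

    def request(self, action: PendingAction) -> str:
        """Called by the agent layer. Returns a status for the model, never the token."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = action
        self.send_out_of_band(action.user_id, token)
        return "confirmation sent to your verified address"

    def confirm(self, token: str) -> bool:
        """Called by the email-link handler or dashboard, not by the model."""
        action = self._pending.pop(token, None)  # single use: the token is consumed
        if action is None:
            return False
        action.execute()
        return True
```

Nothing runs at `request` time; the action executes only when someone holding the out-of-band token calls `confirm`, and a second `confirm` with the same token returns `False`. A production version would also expire tokens; that is omitted to keep the sketch short.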

From the build

How Auraflow approached this

We don't run a customer-facing AI assistant. Our agent layer is for the merchant, operating their own store, with their own AI key. But the same patterns apply, and we built the boundary in early because retrofitting it later is expensive.

The merchant authenticates to the Auraflow dashboard. The merchant's session, not the agent's reasoning, holds the identity. The agent can only take actions in categories the merchant has explicitly enabled. Anything that touches outbound mail, list deletes, or external integrations requires merchant approval inside the dashboard, not inside the conversation.
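A hypothetical sketch of the routing rule that paragraph describes, not Auraflow's actual code: the category strings, the `Decision` enum, and the `route` function are illustrative, and `enabled_by_merchant` stands in for whatever the dashboard settings store returns.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_DASHBOARD_APPROVAL = "needs_dashboard_approval"
    DENY = "deny"


# Illustrative names for the categories described as always needing approval.
APPROVAL_REQUIRED: set[str] = {"outbound_mail", "list_delete", "external_integration"}


def route(action_category: str, enabled_by_merchant: set[str]) -> Decision:
    # Categories the merchant never enabled are refused outright.
    if action_category not in enabled_by_merchant:
        return Decision.DENY
    # Enabled but sensitive: park it for approval in the dashboard,
    # outside the conversation the agent can influence.
    if action_category in APPROVAL_REQUIRED:
        return Decision.NEEDS_DASHBOARD_APPROVAL
    return Decision.ALLOW
```

With `enabled = {"draft_copy", "outbound_mail"}` (where `draft_copy` is a made-up low-risk category), `route("draft_copy", enabled)` is `Decision.ALLOW`, `route("outbound_mail", enabled)` is `Decision.NEEDS_DASHBOARD_APPROVAL`, and `route("list_delete", enabled)` is `Decision.DENY`. Checking enablement first means an approval request can never reach a category the merchant switched off.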

We're not claiming this is novel. The novel work was Hardy's, in 1988. Our job was to apply it before our product surface required it. The same posture shapes how we handle visitor data and what the CLV engine is allowed to do with it.

There are obvious next questions. What about supply-chain prompts? What about model-side attacks that try to subvert the boundary? We have answers, and we'll write about them separately. The principle layer is the right place to start.

Strong AI agents are about to be everywhere. The interesting question isn't how smart they are. It's what they're allowed to do, on whose behalf, and who's accountable when they do it on the wrong person's.