When the AI Editor Dispatched Yesterday’s Story

Archive note: This incident occurred on December 29, 2025, during
development of SoCalNomad’s experimental managing-editor agent.

I asked the automated editor to approve and dispatch one story
cluster.

It dispatched a different one.

The requested cluster was new. The dispatched cluster was an older
identifier that had appeared repeatedly during earlier debugging. The
agent had not invented a random number. It had remembered a real number
from the wrong part of the conversation and treated it as the current
instruction.

This was more troubling than an ordinary syntax error because the
response still looked intentional.

Conversation Is Not an Execution Plan

The managing editor used conversational context to answer questions,
review candidates, and call tools. During the debugging session, one
cluster had become prominent because several failed attempts involved
it. Later, when another cluster was approved, the old identifier
remained salient in memory.

The tool call used the stale identifier.

From the model’s perspective, both numbers were plausible tokens in
the active context. From the workflow’s perspective, only one was
authorized for dispatch. The system had failed to encode that
distinction.

A prompt saying “use the cluster the user requested” was not enough.
The request needed to become structured, validated state before any side
effect occurred.

The Missing Contract

The dispatch tool accepted a cluster identifier. It did not prove
that the identifier matched the current reviewed object.

A stronger contract would bind several facts together:

  • The cluster identifier requested by the user
  • The decision record that approved that cluster
  • The assignment being created
  • A short-lived operation or confirmation token
  • The current conversation turn or command identifier

The dispatch operation could then reject a request if those facts did
not agree.

Instead of allowing the language model to restate an identifier from
memory, ordinary code should carry the selected identifier from the
review result into the confirmation and dispatch steps.
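The contract described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Decision, DispatchRequest, validate_dispatch, and DispatchRejected are hypothetical and do not come from the SoCalNomad codebase, the cluster numbers in the usage note are made up, and the assignment and confirmation token are left out for brevity. The point is that the dispatch boundary compares the request against a decision record written by the review step, rather than trusting whatever identifier arrives.

```python
from dataclasses import dataclass


class DispatchRejected(Exception):
    """Raised when the facts bound to a dispatch request disagree."""


@dataclass(frozen=True)
class Decision:
    # Hypothetical record written by the review step, not by the model.
    decision_id: str
    cluster_id: int
    command_id: str  # the user command that approved this cluster
    approved: bool


@dataclass(frozen=True)
class DispatchRequest:
    # What the dispatch tool receives.
    cluster_id: int
    decision_id: str
    command_id: str


def validate_dispatch(request: DispatchRequest,
                      decisions: dict[str, Decision]) -> Decision:
    """Return the matching decision, or raise if any bound fact disagrees."""
    decision = decisions.get(request.decision_id)
    if decision is None or not decision.approved:
        raise DispatchRejected("no approved decision for this request")
    if decision.cluster_id != request.cluster_id:
        raise DispatchRejected(
            f"cluster {request.cluster_id} does not match approved "
            f"cluster {decision.cluster_id}"
        )
    if decision.command_id != request.command_id:
        raise DispatchRejected("decision belongs to a different command")
    return decision
```

With a decision approving an illustrative cluster 200, a request naming cluster 100 under the same decision_id raises DispatchRejected instead of dispatching the stale story.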

Memory Was Useful but Overprivileged

Removing memory entirely would have reduced the risk, but it would
also have made the editor less useful for status questions and follow-up
discussion.

The better conclusion was that conversational memory should influence
explanation, not authorize irreversible action.

This resembles a familiar security principle: untrusted input can
help construct a request, but a privileged boundary must validate the
request before execution. An LLM’s memory is not malicious, but it is
probabilistic and can combine nearby facts incorrectly. That makes it
unsuitable as the sole carrier of an operational identifier.

Confirmation Must Repeat the Critical Fact

A human-facing confirmation step could have exposed the error:

Dispatch cluster 916 to Publishing Desk?

The identifier, title, and destination should all be shown
immediately before execution. Confirmation should operate on a stored
pending action, not ask the model to reconstruct the action in another
free-form response.
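One way to keep the prompt honest is to render it from the stored action with ordinary string formatting, so the text the human sees cannot drift from what will run. The sketch below assumes a hypothetical PendingAction record with a title and destination field; neither the class nor the function is taken from the original system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingAction:
    # Hypothetical stored action; field names are assumptions.
    cluster_id: int
    title: str
    destination: str


def confirmation_prompt(action: PendingAction) -> str:
    """Build the confirmation text from stored fields only."""
    return (
        f'Dispatch cluster {action.cluster_id} ("{action.title}") '
        f"to {action.destination}?"
    )
```

Because the prompt is derived from the record rather than from a model response, a stale identifier would show up in front of the human before anything is dispatched.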

For higher-impact operations, the pending action can be represented
as a database row:

  • operation_id
  • cluster_id
  • action
  • requested_by
  • expires_at
  • confirmed_at

The final tool call receives operation_id, and the
server resolves the cluster from that record. The model never gets
another opportunity to substitute a remembered identifier.
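A minimal sketch of that flow, using Python's built-in sqlite3 module with an in-memory database. The column names follow the list above; the table name pending_actions, the function names, the column types, and the default 120-second lifetime are assumptions made for illustration.

```python
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE pending_actions (
    operation_id TEXT PRIMARY KEY,
    cluster_id   INTEGER NOT NULL,
    action       TEXT NOT NULL,
    requested_by TEXT NOT NULL,
    expires_at   REAL NOT NULL,
    confirmed_at REAL
)
"""


def create_pending(db, cluster_id, action, requested_by,
                   ttl_seconds=120.0, now=None):
    """Persist the user's selection and return an opaque operation_id."""
    now = time.time() if now is None else now
    operation_id = uuid.uuid4().hex
    db.execute(
        "INSERT INTO pending_actions VALUES (?, ?, ?, ?, ?, NULL)",
        (operation_id, cluster_id, action, requested_by, now + ttl_seconds),
    )
    db.commit()
    return operation_id


def confirm_and_resolve(db, operation_id, now=None):
    """Mark the operation confirmed and return the stored cluster_id.

    The caller supplies only operation_id; the cluster comes from the row.
    """
    now = time.time() if now is None else now
    cur = db.execute(
        "UPDATE pending_actions SET confirmed_at = ? "
        "WHERE operation_id = ? AND confirmed_at IS NULL AND expires_at > ?",
        (now, operation_id, now),
    )
    if cur.rowcount != 1:
        db.rollback()
        raise LookupError("operation is unknown, expired, or already confirmed")
    row = db.execute(
        "SELECT cluster_id FROM pending_actions WHERE operation_id = ?",
        (operation_id,),
    ).fetchone()
    db.commit()
    return row[0]
```

The single conditional UPDATE makes each operation usable at most once and only before it expires, so a repeated or late tool call fails rather than dispatching again.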

Capacity and Idempotency Were Not Enough

The newsroom design already checked whether the Publishing Desk was
healthy and whether a cluster had an active assignment. Those controls
prevented some duplicate work, but they did not prove that the correct
cluster had been selected.

This incident separated three different questions:

  1. Is the requested action permitted?
  2. Is it safe to execute now?
  3. Is this the action the user actually requested?

Capacity checks answer the second. Unique constraints answer part of
the first. Neither answers the third.

Intent binding needed its own control.
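The separation can be made concrete by giving each question its own check. The sketch below is hypothetical: DispatchContext, its fields, and check_dispatch are illustrative names, and the mapping of each guard to a question follows the paragraph above rather than the original code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchContext:
    desk_healthy: bool
    has_active_assignment: bool
    selected_cluster_id: int   # persisted when the user made the selection
    requested_cluster_id: int  # identifier arriving with the tool call


def check_dispatch(ctx: DispatchContext) -> list[str]:
    """Return one failure message per question that is not satisfied."""
    failures = []
    # 1. Permitted? (the role played by a unique constraint)
    if ctx.has_active_assignment:
        failures.append("permitted: cluster already has an active assignment")
    # 2. Safe to execute now? (capacity and health checks)
    if not ctx.desk_healthy:
        failures.append("safe: destination desk is not healthy")
    # 3. Is this what the user asked for? (intent binding)
    if ctx.requested_cluster_id != ctx.selected_cluster_id:
        failures.append("intent: requested cluster differs from the selection")
    return failures
```

A request can pass the first two checks and still fail the third, which is the shape of the failure in this incident.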

The Practical Rule

After this failure, the useful rule became:

Never let conversational recall choose the primary key for a
consequential operation.

Let the model interpret the request. Let it explain alternatives. Let
it summarize the record. But once the user selects an object, persist
that selection and pass it through deterministic code.

The mistaken dispatch was recoverable because the older story had
already been published and the surrounding guards limited the damage.
That was luck, assisted by partial safeguards.

The larger lesson was not that AI agents forget. It was that they
remember loosely. A system that gives them tools must decide which facts
can remain conversational and which facts need the rigidity of a
transaction.