
Why your AI agent is flaky — and 7 rules that make it reliable

A flaky agent is almost never a model problem. It's a specification problem.

You built an AI agent. In the demo it was magic. In the wild it loops, hallucinates a tool call, "forgets" the output format you spelled out twice, and occasionally does something mildly alarming with your filesystem.

Here's the uncomfortable truth after shipping a lot of these: a flaky agent is almost never a model problem. It's a specification problem. The model is doing exactly what your prompt, your tools, and your control loop told it to do — which turns out to be far less than you thought you said.

Below are seven rules that consistently move an agent from "cool demo" to "I trust this on real work." None require a bigger model. Rule 4 comes with three paste-able guardrails for Claude Code specifically.

1. Make the success criteria machine-checkable, not vibes

"Summarize this well" is unfalsifiable. The agent can't tell when it's done, and neither can you. Replace every vibe with something a script could check, for example "valid JSON with a non-empty summary field of at most 80 words."

The win isn't the format — it's that failure becomes detectable. If you can write an assert for the output, you can build a retry around it. If you can't, you have no idea how often it's wrong.
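Here is a minimal sketch of what "an assert for the output" can look like. It assumes a hypothetical summarization task whose output must be a JSON object with a `summary` string and a `sources` array; the field names and the 80-word limit are illustrative choices, not part of any library.

```javascript
// Hypothetical success criteria for a "summarize this" task.
// The field names (summary, sources) and the 80-word limit are assumptions.
function checkSummary(rawOutput, { maxWords = 80 } = {}) {
  const failures = [];
  let parsed;
  try {
    parsed = JSON.parse(rawOutput);
  } catch {
    return { ok: false, failures: ["output is not valid JSON"] };
  }
  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, failures: ["output is not a JSON object"] };
  }
  if (typeof parsed.summary !== "string" || parsed.summary.trim() === "") {
    failures.push("summary is missing or empty");
  } else if (parsed.summary.trim().split(/\s+/).length > maxWords) {
    failures.push(`summary exceeds ${maxWords} words`);
  }
  if (!Array.isArray(parsed.sources) || parsed.sources.length === 0) {
    failures.push("sources must be a non-empty array");
  }
  return { ok: failures.length === 0, failures };
}
```

The returned `failures` list doubles as retry feedback: paste it back into the next attempt instead of a generic "try again."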

2. Give tools narrow contracts and loud errors

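Most "the agent called the wrong tool" bugs are really "the tool description was ambiguous" bugs. A tool named search that sometimes means web search and sometimes a database lookup will get called for the wrong one. Two rules, both in the heading: give each tool a narrow contract (one job, with a name and description that say exactly what it does and takes), and make it fail loudly (an explicit error the model can act on, never an empty or misleading result).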
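A rough sketch of a narrow, loud tool, under stated assumptions: the tool name `lookup_order_by_id`, the ID format, and the in-memory `ORDERS` map are all hypothetical stand-ins for whatever your agent actually talks to.

```javascript
// Hypothetical tool registry: each tool does one job and validates its input.
// The tool name, ID format, and in-memory "database" are illustrative assumptions.
const ORDERS = new Map([["A-100", { id: "A-100", status: "shipped" }]]);

const tools = new Map([
  ["lookup_order_by_id", {
    description: "Look up ONE order in the orders database by exact order ID (e.g. 'A-100'). Not a web search.",
    run({ orderId }) {
      if (typeof orderId !== "string" || !/^[A-Z]-\d+$/.test(orderId)) {
        return { ok: false, error: `invalid orderId ${JSON.stringify(orderId)}; expected a form like "A-100"` };
      }
      const order = ORDERS.get(orderId);
      if (!order) {
        // Loud: "not found" is an explicit error, not an empty result.
        return { ok: false, error: `no order with id ${orderId}` };
      }
      return { ok: true, result: order };
    },
  }],
]);

function callTool(name, args) {
  const tool = tools.get(name);
  if (!tool) {
    const known = [...tools.keys()].join(", ");
    return { ok: false, error: `unknown tool "${name}"; available tools: ${known}` };
  }
  return tool.run(args ?? {});
}
```

Every error string tells the model what went wrong and what a valid call looks like, which is usually enough for it to correct itself on the next step.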

3. Bound the loop — always

Every autonomous agent needs three hard limits or it will eventually run forever: a max step count, a wall-clock timeout, and a no-progress detector (if the last two actions are identical, stop). The model will not reliably stop itself. This is your job, in code, outside the prompt.
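The three limits fit in a few lines. This is a sketch, not a framework: `step` is a hypothetical function you supply that takes the action history and returns `{ action, done }`, with `action` serialized to a string so two actions can be compared.

```javascript
// Bounded agent loop. `step` is a hypothetical caller-supplied function:
// it takes the action history and returns { action: string, done: boolean }.
function runBounded(step, { maxSteps = 20, timeoutMs = 60_000, now = Date.now } = {}) {
  const startedAt = now();
  const history = [];
  for (let i = 0; i < maxSteps; i++) {
    // Wall-clock limit, checked before each step.
    if (now() - startedAt > timeoutMs) {
      return { status: "timeout", history };
    }
    const { action, done } = step(history);
    if (done) {
      history.push(action);
      return { status: "done", history };
    }
    // No-progress detector: the same action twice in a row means we are stuck.
    if (history.length > 0 && history[history.length - 1] === action) {
      return { status: "no_progress", history };
    }
    history.push(action);
  }
  return { status: "max_steps", history };
}
```

One limitation to keep in mind: the timeout is only checked between steps, so a single step that hangs (a stuck network call, say) needs its own timeout inside `step`.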

4. Prefer deterministic guardrails over polite requests

Anything in the prompt is a request. The model usually honors it. "Usually" is not a security model. If an action is dangerous or irreversible, gate it in code, not in English. In Claude Code you do this with hooks — deterministic scripts that run before a tool call and can block it. The three I put in almost every project:

Guardrail A: block edits to sensitive paths (a PreToolUse hook that reads the pending tool call as JSON on stdin and refuses writes to .env, .git/, or secrets):

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "node -e \"const i=JSON.parse(require('fs').readFileSync(0,'utf8')); const p=(i.tool_input||{}).file_path||''; if(/(^|\\/)\\.env|(^|\\/)\\.git\\/|secrets/i.test(p)){console.error('Blocked: protected path '+p);process.exit(2)}\""
      }]
    }]
  }
}

Exit code 2 tells Claude Code the action was blocked and feeds the reason back to the model — so it adjusts instead of silently failing.

Guardrail B: never let destructive commands through (a second PreToolUse entry, with a Bash matcher that hard-fails on rm -rf, git reset --hard, fork bombs):

{
  "matcher": "Bash",
  "hooks": [{
    "type": "command",
    "command": "node -e \"const i=JSON.parse(require('fs').readFileSync(0,'utf8')); const c=(i.tool_input||{}).command||''; if(/rm\\s+-rf|git\\s+reset\\s+--hard|:\\(\\)\\{/.test(c)){console.error('Blocked destructive command');process.exit(2)}\""
  }]
}

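Guardrail C: auto-format after every edit, so output stays consistent without asking (this entry goes under PostToolUse, not PreToolUse, because it has to run once the file is written):

{
  "matcher": "Edit|Write",
  "hooks": [{ "type": "command", "command": "node -e \"const f=(JSON.parse(require('fs').readFileSync(0,'utf8')).tool_input||{}).file_path; if(f) require('child_process').spawnSync('prettier',['--write',f],{stdio:'ignore'})\"" }]
}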

The pattern that matters: the model proposes, deterministic code disposes. Hooks run whether or not the model "felt like" honoring an instruction.
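One-liners get hard to maintain once the rules grow. A sketch of the same checks as a plain function you can unit-test, assuming the hook receives the pending tool call as JSON on stdin with `tool_name` and `tool_input` fields (which is how the Claude Code hooks reference describes it). The function name `decide` and the rule lists are my own illustration.

```javascript
// Pure decision logic for a PreToolUse hook, kept separate so it can be unit-tested.
// Assumes the hook input is a JSON object with tool_name and tool_input fields;
// the patterns below are illustrative, not an exhaustive safety list.
const PROTECTED_PATH = /(^|\/)\.env|(^|\/)\.git\/|secrets/i;
const DESTRUCTIVE_CMD = /rm\s+-rf|git\s+reset\s+--hard|:\(\)\{/;

function decide(input) {
  const tool = input.tool_name ?? "";
  const args = input.tool_input ?? {};
  if ((tool === "Edit" || tool === "Write") && PROTECTED_PATH.test(args.file_path ?? "")) {
    return { exitCode: 2, reason: `Blocked: protected path ${args.file_path}` };
  }
  if (tool === "Bash" && DESTRUCTIVE_CMD.test(args.command ?? "")) {
    return { exitCode: 2, reason: "Blocked destructive command" };
  }
  return { exitCode: 0, reason: "" };
}

// Wiring sketch for a hook file (left as a comment so the logic above stays testable):
//   const input = JSON.parse(require("fs").readFileSync(0, "utf8"));
//   const { exitCode, reason } = decide(input);
//   if (reason) console.error(reason);
//   process.exit(exitCode);
```

Pattern matching on command strings is a backstop, not a sandbox: `rm -fr` or a script that wraps the call would slip past these particular patterns, so treat the list as something you extend as you see real attempts.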

5. Make state explicit and external

If your agent's "memory" is just the growing chat transcript, it will drift — early instructions get diluted and cost climbs every turn. Keep the durable facts (the task, the constraints, what's done) in a small structured object you re-inject each step, and let the transcript be disposable. An agent that re-reads its own goal every loop stays on task far longer than one trusting a 40-message context to hold.
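A minimal sketch of that structured object and the re-injection step. The field names and the chat-style `{ role, content }` message shape are assumptions; adapt them to whatever your model API expects.

```javascript
// Durable task state kept outside the transcript. Field names are assumptions.
function createState(task, constraints) {
  return { task, constraints: [...constraints], done: [] };
}

function markDone(state, item) {
  // Return a new object so each step's state can be logged or diffed.
  return { ...state, done: [...state.done, item] };
}

// Build the messages for ONE step: the state header plus only the most recent turns.
function buildMessages(state, transcript, { keepLast = 4 } = {}) {
  const header = [
    `TASK: ${state.task}`,
    `CONSTRAINTS: ${state.constraints.join("; ") || "none"}`,
    `DONE SO FAR: ${state.done.join("; ") || "nothing yet"}`,
  ].join("\n");
  // slice(-0) would return the whole array, so handle keepLast = 0 explicitly.
  const recent = keepLast > 0 ? transcript.slice(-keepLast) : [];
  return [{ role: "system", content: header }, ...recent];
}
```

The transcript can now be truncated aggressively, because nothing the agent must remember lives only there.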

6. Test it like software, because it is

You wouldn't ship a function with zero tests; don't ship an agent with zero either. Build a tiny eval set — even 10–15 representative inputs with checkable expected properties (rule #1 makes this possible). Run it on every prompt change. The first time a "harmless" tweak silently breaks 3 of 12 cases, you'll understand why this is the highest-leverage 30 minutes in the project.
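A tiny eval harness is less code than most people expect. In this sketch `agent` is any function from an input string to an output string; the three cases and `fakeAgent` are hypothetical stand-ins so the example runs without a model.

```javascript
// Tiny eval harness. `agent` is any function from input string to output string;
// the cases and the stand-in agent below are hypothetical.
function runEvals(agent, cases) {
  const results = cases.map(({ name, input, check }) => {
    let passed = false;
    let error = null;
    try {
      passed = check(agent(input)) === true;
    } catch (err) {
      // A crash inside the agent or the check counts as a failure, not a skipped case.
      error = String(err && err.message ? err.message : err);
    }
    return { name, passed, error };
  });
  const failed = results.filter((r) => !r.passed);
  return { total: results.length, passed: results.length - failed.length, failed };
}

const cases = [
  { name: "returns JSON", input: "order A-100", check: (out) => typeof JSON.parse(out) === "object" },
  { name: "echoes the order id", input: "order A-100", check: (out) => JSON.parse(out).orderId === "A-100" },
  { name: "rejects empty input", input: "", check: (out) => JSON.parse(out).error === "empty input" },
];

// Stand-in for a real agent call, so the harness can be run as-is.
const fakeAgent = (input) => {
  if (input === "") return JSON.stringify({ error: "empty input" });
  const match = input.match(/[A-Z]-\d+/);
  return JSON.stringify({ orderId: match ? match[0] : null });
};

const report = runEvals(fakeAgent, cases);
console.log(`${report.passed}/${report.total} passed`); // prints "3/3 passed"
```

In CI you would likely set a non-zero exit code when `report.failed` is non-empty, so a prompt change that breaks cases cannot merge quietly.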

7. Fail loudly to a human, never silently to the user

When the agent is uncertain or a guardrail trips, the worst outcome is confidently shipping a wrong answer. Design an explicit "I'm not sure, here's why" path. A reliable agent that occasionally says "I couldn't verify X, stopping" earns more trust than a confident one that's wrong 5% of the time on work that matters.
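One way to make the "I'm not sure" path explicit is to force every run through a result envelope. This is a sketch with made-up names (`finalize`, `deliver`, the `needs_human` status); the point is only that an unverified answer has no route to the end user.

```javascript
// Hypothetical result envelope: every run ends as "ok" or "needs_human".
function finalize({ answer, checks }) {
  // `checks` is a list of { name, passed } produced by your verification steps.
  const unverified = checks.filter((c) => !c.passed).map((c) => c.name);
  if (unverified.length > 0) {
    return {
      status: "needs_human",
      message: `I couldn't verify: ${unverified.join(", ")}. Stopping.`,
      draft: answer, // kept for the reviewer, never sent to the end user
    };
  }
  return { status: "ok", answer };
}

function deliver(result, { toUser, toReviewer }) {
  if (result.status === "ok") {
    toUser(result.answer);
  } else {
    toReviewer(result);
  }
}
```

Because `deliver` is the only place output leaves the system, a failed check cannot be skipped by accident in some other code path.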


The one-paragraph version

Reliability is not a model you buy; it's a discipline you impose. Make success checkable (#1, #6), make tools unambiguous and loud (#2), bound the loop (#3), enforce the dangerous stuff in code rather than prose (#4), externalize state (#5), and fail loud to a human (#7). Do those and an ordinary model behaves like a dependable one.


Building agents for real? Penloom makes two practical packs — the Agent Builder's Toolkit and the Claude Code Power-User Pack — that take these rules from checklist to copy-paste.