SHIELD.md: Give Your Agent a Security Policy

A simple, updatable way to tell your agent how to react when something risky shows up

When SHIELD helps

SHIELD makes the most sense when you’re already thinking about risk: you install skills from ClawHub or elsewhere, you’re opening DMs to more people, or you want a single place to list “don’t run this,” “ask me first,” or “log it.” It doesn’t replace sandboxing or skill review. It gives the agent explicit, updatable instructions in one file, so you don’t have to repeat those rules across SOUL and AGENTS. If you’re just running locally for yourself and trust every skill, you might not need it yet. If you’re scaling up or sharing an agent, it’s worth a look.

Where SHIELD sits in your agent’s files

OpenClaw agents use a handful of Markdown files in the workspace. Each has a job:

AGENTS.md – How the agent is structured and what it can refer to.
HEARTBEAT.md – Planned tasks, reminders, cron.
IDENTITY.md – Who the agent is and what it’s for.
MEMORY.md – What the agent should remember and rules for what gets saved.
SOUL.md – Personality and boundaries.
TOOLS.md – Which tools exist and when the agent is allowed to call them.
USER.md – Who’s in charge (you).
SKILL.md (per skill) – Extra capabilities; see Skills.

SHIELD.md is the security layer: “when you see this kind of threat, do this,” where “this” is one of block, ask for approval, or log. It doesn’t change the agent’s job; it adds a policy you can update as threats show up or fade. New malicious skill? Add an entry. False positive? Remove it or relax the action.

What’s actually in SHIELD.md

SHIELD is a Markdown file with a defined structure. In practice you’re maintaining:

A threat feed – A list of active threats: known bad skills, sketchy domains, prompt-injection patterns. Each entry has a category, a severity, and an action: log, require_approval, or block.
Scope – Which events the policy applies to, for example incoming or outgoing prompts, skill installs, skill runs, tool calls, outbound network requests, secret reads, and MCP connections. You decide which of these you want SHIELD to watch.
Decision rules – When an event matches a threat, the agent applies that threat’s action: log (continue but record it), require_approval (ask you first), or block (don’t do it). If several threats match, the strictest action wins: block over require_approval over log.
A Decision block – Before the agent does the risky thing, it has to output a short Decision block: action, scope, threat id, reason. If the action is block, it stops and doesn’t run the skill, call the tool, or hit the network. In other words, the policy lives in context and the agent is instructed to respect it.
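To make those four pieces concrete, here is a minimal Python sketch of one way a helper script might model a threat entry and render a Decision block. Everything named here is an assumption for illustration: the field names (`id`, `category`, `severity`, `action`, `scopes`), the scope strings, the threat id, and the block’s exact layout. The real SHIELD v0 fields and Decision block syntax are defined in the community spec.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # The three actions named in the text.
    LOG = "log"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class Threat:
    # Hypothetical field names; check the SHIELD v0 spec for the real schema.
    id: str
    category: str            # e.g. "supply_chain" or "prompt_injection"
    severity: str            # e.g. "low", "medium", "high", "critical"
    action: Action
    scopes: frozenset[str]   # hypothetical event names this entry watches


@dataclass(frozen=True)
class Decision:
    action: Action
    scope: str
    threat_id: str
    reason: str

    def render(self) -> str:
        # Illustrative layout: the four fields the text lists, one per line.
        return "\n".join([
            "DECISION",
            f"action: {self.action.value}",
            f"scope: {self.scope}",
            f"threat_id: {self.threat_id}",
            f"reason: {self.reason}",
        ])


bad_skill = Threat(
    id="THREAT-001",  # hypothetical id
    category="supply_chain",
    severity="critical",
    action=Action.BLOCK,
    scopes=frozenset({"skill_install", "skill_run"}),
)

decision = Decision(
    action=bad_skill.action,
    scope="skill_install",
    threat_id=bad_skill.id,
    reason="skill name matches a known malicious package",
)
print(decision.render())
```

Here the action is `block`, so an agent following the policy would emit this block and stop instead of installing the skill.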

The spec includes threat categories like prompt injection, dangerous tool use, malicious MCP, memory tampering, supply-chain (bad skills or deps), fraud, policy bypass, and general anomalies. You add and remove entries as you go—no need to rewrite the rest of the agent.

What happens when something triggers

Something happens: a skill install, a tool call, an outbound request, a secret read, etc.
The agent checks SHIELD: does this event match any threat in the feed? Matching uses explicit conditions (skill name, domain, URL, path)—no guessing.
If there’s a match, the agent takes that threat’s action; if several match, the strictest one wins (block > require_approval > log).
The agent outputs a Decision block and then either continues (log), asks you (require_approval), or bails (block).
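The flow above boils down to explicit matching plus a precedence order. Below is a minimal sketch, assuming each threat lists the event scopes it covers and a few literal conditions (exact skill names, exact domains, path prefixes). These field and scope names are hypothetical, not the spec’s schema, and returning `None` when nothing matches is this sketch’s own choice.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Strictness order from the text: block > require_approval > log.
PRECEDENCE = {"log": 0, "require_approval": 1, "block": 2}


@dataclass(frozen=True)
class Threat:
    # Hypothetical fields for illustration; not the SHIELD v0 schema.
    id: str
    action: str                        # "log", "require_approval", or "block"
    scopes: frozenset                  # event types this entry watches
    skill_names: frozenset = frozenset()
    domains: frozenset = frozenset()
    path_prefixes: tuple = ()


@dataclass(frozen=True)
class Event:
    scope: str                         # hypothetical names, e.g. "network_egress"
    skill_name: Optional[str] = None
    url: Optional[str] = None
    path: Optional[str] = None


def matches(threat: Threat, event: Event) -> bool:
    """Explicit conditions only (exact names/domains, literal path prefixes)."""
    if event.scope not in threat.scopes:
        return False
    if event.skill_name is not None and event.skill_name in threat.skill_names:
        return True
    if event.url is not None and urlparse(event.url).hostname in threat.domains:
        return True
    if event.path is not None:
        return any(event.path.startswith(p) for p in threat.path_prefixes)
    return False


def decide(threats: list[Threat], event: Event) -> Optional[Threat]:
    """Return the matching threat with the strictest action, or None if none match."""
    hits = [t for t in threats if matches(t, event)]
    return max(hits, key=lambda t: PRECEDENCE[t.action], default=None)


threats = [
    Threat("T-1", "log", frozenset({"network_egress"}), domains=frozenset({"paste.example.com"})),
    Threat("T-2", "block", frozenset({"network_egress"}), domains=frozenset({"paste.example.com"})),
    Threat("T-3", "require_approval", frozenset({"secret_read"}), path_prefixes=("/home/me/.ssh/",)),
]
hit = decide(threats, Event("network_egress", url="https://paste.example.com/raw/1"))
print(hit.id, hit.action)  # T-2 block
```

Both T-1 and T-2 match the request, and block outranks log, so T-2 decides the outcome.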

So SHIELD is guardrails in a file. You tell the agent in SOUL, AGENTS, and MEMORY to load SHIELD and follow it before doing the sensitive stuff. The file is plain Markdown—edit it, version it, share it.

What SHIELD can’t do (v0)

SHIELD v0 is meant as early guardrails, not a locked-down security boundary:

It’s not enforced by the runtime – The model has to choose to follow it. So you need to spell that out in SOUL, AGENTS, and MEMORY: “Before you install a skill, run a tool, or hit the network, check SHIELD and obey the Decision block.”
Prompt injection can still try to bypass it – Someone might try to convince the agent to ignore the policy. That’s why sandboxing and DM pairing still matter. SHIELD is an extra layer, not a substitute.
Behavior isn’t perfectly consistent – Different runs and models may comply differently. Use it to cut down accidental risk and known bad stuff, not as your only control.
Context is finite – The threat list lives in context. The spec suggests keeping the list tight (e.g. 25 active entries) and each entry short, so the policy doesn’t eat up the context window.
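Because the whole list has to fit in context, a small check (say, in a pre-commit hook) can flag a policy that has grown too large. This sketch takes entries as plain strings, since how you extract them depends on your template. The 25-entry cap reuses the figure above; the 200-character per-entry limit is an arbitrary assumption, not from the spec.

```python
MAX_ACTIVE_ENTRIES = 25   # figure cited above; confirm against the spec you follow
MAX_ENTRY_CHARS = 200     # arbitrary illustrative limit, not from the spec


def check_budget(entries, max_entries=MAX_ACTIVE_ENTRIES, max_chars=MAX_ENTRY_CHARS):
    """Return human-readable problems; an empty list means the feed fits the budget."""
    problems = []
    if len(entries) > max_entries:
        problems.append(f"{len(entries)} active entries; trim to {max_entries} or fewer")
    for i, text in enumerate(entries, start=1):
        if len(text) > max_chars:
            problems.append(f"entry {i} is {len(text)} chars (limit {max_chars})")
    return problems


if __name__ == "__main__":
    # Hypothetical feed that has grown past the cap.
    feed = [f"THREAT-{n:03d}: block skill 'bad-skill-{n}'" for n in range(30)]
    for problem in check_budget(feed):
        print(problem)
```

Character counts are only a rough proxy for tokens, but they are enough to catch a list that is drifting out of bounds.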

Bottom line: pair SHIELD with sandboxing, allowlists, tool policies, and real skill review. For high-risk setups, keep using isolated machines and least privilege.

How to get started

Add SHIELD.md at your agent root (same place as AGENTS.md, SOUL.md, and the rest).
Use the SHIELD v0 format – Purpose, scope, threat categories, the three actions (log, require_approval, block), Decision block format, and your list of active threats. The full template and field details are in the community spec (see “Where this came from” below).
Tell the agent to use it – In SOUL.md, AGENTS.md, and MEMORY.md, add clear instructions: load SHIELD before any skill install or run, tool call, network egress, secret access, or MCP connection; output a Decision block; and respect block and require_approval.
Fill it with threats – You can bootstrap from something like MoltThreat (a curated threat feed for agents that can generate a local policy file). Then add or drop entries as you hear about new threats or need to tune out false positives.
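The “tell the agent” step is easy to script so every agent gets the same wording. A minimal sketch: it refuses to run without a SHIELD.md at the agent root, then appends one instruction paragraph to SOUL.md, AGENTS.md, and MEMORY.md unless it is already there. The instruction text and the `<!-- shield-policy -->` marker used to detect it are assumptions; reword both to suit your files.

```python
from pathlib import Path

# Hypothetical marker and wording; adapt to your agent's voice.
MARKER = "<!-- shield-policy -->"
INSTRUCTION = (
    f"{MARKER}\n"
    "Before any skill install or run, tool call, network egress, secret access, "
    "or MCP connection: load SHIELD.md, output a Decision block, and obey it. "
    "Never proceed on block; wait for the user on require_approval.\n"
)
TARGETS = ("SOUL.md", "AGENTS.md", "MEMORY.md")


def wire_up_shield(agent_root: Path) -> list[str]:
    """Append the SHIELD instruction to each target file once; return the files changed."""
    if not (agent_root / "SHIELD.md").is_file():
        raise FileNotFoundError(f"no SHIELD.md in {agent_root}")
    changed = []
    for name in TARGETS:
        path = agent_root / name
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        if MARKER in text:
            continue  # already wired up; re-running is a no-op
        # Separate from existing content with one blank line.
        prefix = text.rstrip("\n") + "\n\n" if text else ""
        path.write_text(prefix + INSTRUCTION, encoding="utf-8")
        changed.append(name)
    return changed
```

Run it from your agent root with `wire_up_shield(Path("."))`. Because of the marker check, a second run changes nothing.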

Because it’s just a file in your workspace, you can edit it, put it in version control, and reuse it across agents or teams.

Common questions

Is SHIELD enough on its own? No. It’s a policy the model is asked to follow. You still need sandboxing, DM pairing, tool policies, and reviewing skills before you install them. SHIELD is a structured way to say “when you see X, do Y”—it doesn’t replace those other controls.

Do I need MoltThreat? No. MoltThreat is one way to get a threat list that fits the SHIELD format. You can also write your own entries (e.g. “block this skill name,” “require approval for this domain”) or mix a community feed with your own rules.
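Mixing a community feed with your own rules is essentially a keyed merge. A minimal sketch, assuming entries are dicts with a hypothetical `id` key: local entries override feed entries with the same id (for example, relaxing a noisy block to log), and a set of dropped ids removes feed entries you’ve judged to be false positives. This is not MoltThreat’s format or API.

```python
def merge_feeds(feed, local, dropped_ids=frozenset()):
    """Combine a community feed with local rules.

    - Entries are dicts with a hypothetical "id" key.
    - Local entries replace feed entries that share an id.
    - Ids in dropped_ids are removed entirely (e.g. known false positives).
    """
    merged = {}
    for entry in feed:
        merged[entry["id"]] = entry
    for entry in local:
        merged[entry["id"]] = entry  # override, or a brand-new local rule
    # Dicts keep insertion order: feed order first, then new local rules.
    return [entry for entry_id, entry in merged.items() if entry_id not in dropped_ids]


feed = [
    {"id": "FEED-1", "action": "block", "match": "domain:docs.example.org"},
    {"id": "FEED-2", "action": "block", "match": "domain:cdn.example.net"},
]
local = [
    {"id": "FEED-2", "action": "log", "match": "domain:cdn.example.net"},  # relaxed
    {"id": "LOCAL-1", "action": "require_approval", "match": "domain:paste.example.com"},
]
policy = merge_feeds(feed, local, dropped_ids={"FEED-1"})
```

Keeping overrides and drops in your own file, rather than editing the feed copy, means you can re-pull the community list without losing your tuning.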

Where’s the full template? The SHIELD v0 spec, Decision block syntax, and field definitions come from the community. The source is credited below; that’s where to get the exact template and any updates (e.g. SHIELD v1).

Where this came from

SHIELD v0 was proposed as an open standard so agents can have a consistent, readable security policy. The idea and template come from the community—see fr0gger_’s post on X for the full write-up. That post also introduces MoltThreat, a curated threat database for agents that can help keep your SHIELD.md in sync with known risks. For the full spec, Decision block format, and the recommendation_agent mini-language, grab the template from there or any linked repo. This page is a summary for OpenClaw users; the official spec may evolve (e.g. v1).

In short: SHIELD.md is a Markdown security policy you put in your agent’s workspace. You list threats and what to do (log, require approval, block). The agent is told to check it before risky actions and to output a Decision block. It’s guardrails, not a vault—use it with sandboxing and skill review.