Soul & Agent Identity
Why your agent's identity matters more than its tools
In OpenClaw, an agent's identity—often called its soul—lives in workspace files like SOUL.md and AGENTS.md. These files define who the agent is, how it reasons, and what it will or won't do. Research and practitioner experience consistently show that identity shapes behavior more than adding more skills or tools: a well-defined soul makes the same capabilities produce much better output; a generic or misaligned one degrades it. This page explains why, and how to design souls that work with OpenClaw's workspace and Setup Wizard.
The soul is the set of identity and role instructions you give the agent—typically in SOUL.md (personality, boundaries, anti-patterns) and AGENTS.md (structure, what it can refer to, operating rules). Together with TOOLS.md and optional SHIELD.md, these files live in your workspace and are loaded into the system prompt. They don't change what the model can do (that's tools and skills); they change how it uses them. Memory holds facts and history; the soul holds beliefs, constraints, and identity. If you want the agent to behave differently, change the soul—not just the memory file.
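As a rough sketch of that assembly (assuming the file names above; OpenClaw's actual prompt construction is internal and may differ):

```python
from pathlib import Path

# File names follow the workspace convention described above. The real
# OpenClaw prompt assembly is internal; this only sketches the idea.
WORKSPACE_FILES = ["SOUL.md", "AGENTS.md", "TOOLS.md", "SHIELD.md"]

def assemble_system_prompt(workspace: Path) -> str:
    """Concatenate workspace identity files, soul first, into one prompt."""
    parts = []
    for name in WORKSPACE_FILES:
        path = workspace / name
        if path.exists():  # SHIELD.md (and others) may be absent
            parts.append(path.read_text().strip())
    return "\n\n".join(parts)
```

Keeping SOUL.md at position zero is what puts identity at the front of the context window, ahead of tools and memory.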
Studies across 162 different roles and four major LLM families found that adding role labels like "You are a helpful assistant" or "You are an expert at X" to system prompts produced no statistically significant improvement over having no role at all; the effect was largely random. So the most common pattern in agent building—a one-line identity—often burns context tokens for zero benefit. Think of it like a hospital: someone with a badge that says "Surgeon" doesn't get a scalpel until you know where they trained, what they specialize in, and what complications they've managed. The badge activates nothing; the depth behind it activates everything. The same holds for LLMs: "You are a mathematician" triggers broad, shallow associations; the model gestures vaguely at math. But a specific expert description—e.g. a number theorist with deep expertise in analytic methods, two decades on prime gap distributions, published work on sieve techniques—lights up specific knowledge clusters and locks the model onto much tighter, more reliable behavior. Research on expert prompting shows that detailed expert identities (generated or hand-written) lead to answers rated significantly higher in quality than generic prompts. So: use a detailed soul or none at all. The middle ground ("You are an expert at X") is usually a waste of tokens.
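For illustration, compare the two at the top of a SOUL.md (both identities invented here):

```markdown
<!-- Generic: broad, shallow associations -->
You are an expert mathematician.

<!-- Specific: activates tight knowledge clusters -->
I am a number theorist with deep expertise in analytic methods. I have
spent two decades on prime gap distributions and have published on
sieve techniques. I do not hand-wave error terms, and I say so when a
bound is heuristic rather than proven.
```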
LLMs give the first and last tokens in the context window the most attention ("Lost in the Middle" and similar research). Information in the middle is weighted less and can even hurt performance when critical instructions are buried there. So: your agent's soul should go first in the system prompt. Every token you put before it (operational boilerplate, tool lists, long memory dumps) dilutes it. OpenClaw's prompt assembly uses your workspace files; keeping identity at the top ensures the model actually follows who the agent is before it processes tasks and tools. See Context for how context is built and compacted.
How you write the soul matters. Rules ("Always check X before Y") read like a checklist; the agent complies literally. Experiential identity ("I've learned that X because [experience that taught it]") reads like a person with judgment, and research on role-play and persona prompting shows it often outperforms rule lists.
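For example, a rule and its experiential rewrite might look like this in a SOUL.md (content invented for illustration):

```markdown
<!-- Rule form: complied with literally -->
Always run the full test suite before declaring a fix complete.

<!-- Experiential form: applied with judgment -->
I've learned to run the full test suite before declaring a fix
complete, because a "trivial" one-line change I once shipped quietly
broke three downstream modules.
```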
Converting behavioral rules into "I've learned that [insight] because [experience]" makes the agent behave like someone who has that experience, not just someone following instructions. A well-calibrated soul multiplies the value of the same tools; a misaligned one can actively degrade output.
You don't have to hand-write every soul. Research shows that LLM-generated expert identities often outperform human-written ones: the model is better at designing the expert it needs to become than a generic description you write for it. A practical pattern is two-stage prompting: first, give the agent the task or domain and have it generate a detailed expert identity (specific background, specializations, years of experience, what it would and wouldn't do); second, have it answer or work as that expert. In OpenClaw you can run a session where the agent drafts or refines SOUL.md for a given role, you review and edit, then use that file in the workspace for future runs. The Setup Wizard helps keep workspace layout and config consistent when you add or change souls.
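The two-stage pattern can be sketched as a small helper; `complete` stands in for whatever prompt-in, text-out model call your stack provides (its name and signature are assumptions, not an OpenClaw API):

```python
from typing import Callable

def expert_then_answer(task: str, complete: Callable[[str], str]) -> str:
    """Stage 1: have the model design the expert it needs to become.
    Stage 2: have it work as that expert. `complete` is any
    prompt-in, text-out model call; the name is an assumption."""
    identity = complete(
        "Describe, in first person, the ideal expert for this task: "
        "specific background, specializations, years of experience, "
        "and what this expert would refuse to do.\n\nTask: " + task
    )
    return complete(identity + "\n\nWorking as this expert, complete the task:\n" + task)
```

In practice you would save the generated identity as SOUL.md after review, rather than regenerating it on every run.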
Identity and domain need to match. A "revenue" soul aimed at pricing and growth works when the task is revenue; the same soul on a pure engineering task can underperform or feel off. When the soul is well aimed at the task (e.g. a reviewer who can reason like a skeptic, a customer, and a technologist, then synthesize), performance doesn't just add—it multiplies. So: fewer, sharper agents, each with a soul tuned to a clear domain, often beat many generic agents. Use per-agent or per-session workspaces and distinct SOUL.md / AGENTS.md per role so each agent has a coherent identity.
OpenClaw supports sub-agents (spawned tasks that run with their own context). A sub-agent typically starts with zero identity—no soul, no long history. It gets a task and maybe some skills; it's more like a function call: spec in, result out. That's appropriate for bounded, well-defined work. The mistake is treating sub-agents like full agents: giving them a grand title ("You are the CTO") without the context to back it up produces generic output. Instead: give sub-agents the values and standards of the parent, plus a precise task. For example: "You are a code security auditor. Apply these standards: [specific list]. Your task: review this authentication module." Identity does not inherit; values and constraints do. For multi-agent setups, see Multi-Agent Team and Sessions.
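A sub-agent spec along these lines can be composed mechanically: a precise role, the parent's inherited standards, and a bounded task. The helper below is illustrative, not an OpenClaw API:

```python
def subagent_prompt(role: str, parent_standards: list[str], task: str) -> str:
    """Compose a sub-agent spec: precise role, inherited standards,
    bounded task. Illustrative only, not an OpenClaw API."""
    standards = "\n".join("- " + s for s in parent_standards)
    return (
        "You are a " + role + ".\n"
        "Apply these standards:\n" + standards + "\n"
        "Your task: " + task
    )
```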
When real decisions depend on agent output, a single soul can miss errors that multiple perspectives catch. Research on multi-expert prompting (e.g. simulating several experts in a chain of thought, each responding independently, then aggregating or selecting the best answer) has shown substantial gains in truthfulness, factuality, and reduced toxicity compared to a single expert. In practice: for high-stakes tasks, route outputs through multiple expert souls or reviewers—e.g. a second session with a different SOUL.md that critiques or validates the first agent's answer, or a small team of agents with distinct identities that vote or synthesize. It costs more tokens and time, but for outputs that actually matter it's often the minimum responsible approach. See Multi-Agent Team for session-based roles and coordination.
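The routing step can be sketched as below: run one answer past several independent reviewer identities and collect their critiques for a later vote or synthesis. As before, `complete` is a stand-in for a model call, not an OpenClaw API:

```python
from typing import Callable

def multi_expert_review(answer: str, reviewer_souls: list[str],
                        complete: Callable[[str], str]) -> list[str]:
    """Run one agent's output past several independent reviewer
    identities; a later step can vote or synthesize. `complete`
    stands in for a model call and is an assumption."""
    return [
        complete(soul + "\n\nCritique the following answer; flag errors "
                 "and unsupported claims.\n\n" + answer)
        for soul in reviewer_souls
    ]
```

Each reviewer sees the answer independently, which is what makes disagreement between them informative.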
What an agent refuses to do is often as diagnostic of expertise as what it produces. Many practitioners budget a significant part of the soul (e.g. 30–40%) to anti-patterns—specific behaviors the agent will never do, written as clear identity claims. Not vague traits ("I don't micromanage") but concrete behaviors ("I don't rewrite a delegate's output instead of giving feedback") so the model can catch itself in real time. Hard constraints and "Not My Domain" sections tend to work better than soft guidance like "try to delegate." Every time you resolve an ambiguity in the soul, the agent tends to get sharper.
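A concrete shape for this part of SOUL.md (entries invented for illustration):

```markdown
## Not My Domain
- I don't rewrite a delegate's output instead of giving feedback.
- I don't estimate timelines for work I haven't scoped.

## Anti-patterns
- I never present a guess as a verified fact; I flag uncertainty explicitly.
- I never skip reading an error message before proposing a fix.
```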
An agent whose soul never updates is frozen from day one. Treat the soul as a living document: refine it based on what the agent does well or poorly, and add a feedback loop (e.g. you or another process that reviews outputs and updates SOUL.md or AGENTS.md). That way the agent can improve over time instead of repeating the same mistakes. Your workspace is just files—edit with any editor, version with Git, and use the Setup Wizard to regenerate or merge config when you change structure; the soul files themselves you edit directly.
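One minimal way to close the loop is a helper that appends dated lessons to SOUL.md; the function name and the "Lessons" section are illustrative, not part of OpenClaw:

```python
from datetime import date
from pathlib import Path

def record_lesson(workspace: Path, lesson: str) -> None:
    """Append a dated lesson to SOUL.md so review findings accumulate in
    the identity file. Helper name and 'Lessons' header are invented."""
    soul = workspace / "SOUL.md"
    text = soul.read_text() if soul.exists() else ""
    if "## Lessons" not in text:
        text += "\n## Lessons\n"
    text += "- (" + date.today().isoformat() + ") " + lesson + "\n"
    soul.write_text(text)
```

Because the workspace is plain files, the same loop works whether the reviewer is you, another agent, or a scheduled process, and Git gives you the history of how the identity evolved.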