귀찮은개발자2026-04-1514 min

Same Claude, Different Roles — Splitting Into a 5-Agent Team Changed My Code Quality

A hands-on build log of my dev-stage agent team. 5 roles — Planner, Coder, Reviewer, Tester, Debugger — wired up with MCP servers and a Supabase state table. Folder layout, system prompts, skills, DB schema, and run script included.

목차 (12)

Same Claude, Different Role — Why Quality Shifts
Prerequisites (Follow-Along Checklist)
Folder Layout — Global vs Project
5 System Prompts (Copy-Paste Ready)
Skill Files — The Recipes Each Role Carries
MCP Servers — Permission Per Role
Supabase agent_state — Schema and Migration
The Actual Routine — Turn Order and Script
What Changed, What Didn't (Honestly)
Built on the EP.17 Harness
FAQ
Closing

April 2026 · Lazy Developer EP.19

I pushed a PR. The Claude I'd spun up as Reviewer flagged 3 edge cases and 1 memory leak. In the previous session, the Claude I'd spun up as Coder had called the same code "complete." Same Claude 4.6. Only the role was different.

That gap is why I built an agent team from scratch. Writing and validating solo meant obvious issues slipped through. A session set up as Reviewer doesn't hand out "looks fine" easily. Tell it to nitpick and it nitpicks. After confirming that, I decided to split the whole development flow by role.

EP.17 laid down harness engineering (Rules, Commands, Hooks). Sitting on top of it now is a 5-person team — Planner, Coder, Reviewer, Tester, Debugger. Claude Code subagents isolate context by design, so they don't share cumulative state across sessions. My routine needed that sharing, so I layered MCP servers for Agent-to-Agent communication and Supabase for task state on top. This post is the build log, written so you can follow along from folder layout to actual turn order.

Quick Take

Same Claude 4.6, 5 sessions with different roles — Planner, Coder, Reviewer, Tester, Debugger
Different system prompts and referenced Skills shift the output distribution — Reviewer nitpicks by default
Agent-to-Agent through MCP servers, cross-session state in a Supabase agent_state table
Folder layout, system prompts, Skills, DB schema, run script — all shared at copy-paste level
Per feature: Planner → Coder → Reviewer → Tester → Debugger in sequence
Built on the EP.17 harness. What one session used to do is split across five
What changed: edge cases and leaks caught earlier / What didn't: I still check last

Table of Contents

Same Claude, Different Role — Why Quality Shifts
Prerequisites (Follow-Along Checklist)
Folder Layout — Global vs Project
5 System Prompts (Copy-Paste Ready)
Skill Files — The Recipes Each Role Carries
MCP Servers — Permission Per Role
Supabase agent_state — Schema and Migration
The Actual Routine — Turn Order and Script
What Changed, What Didn't (Honestly)
Built on the EP.17 Harness
FAQ

Same Claude, Different Role — Why Quality Shifts

The principle is simple. Claude answers in line with the role you give it. Tell it "you're the developer who wrote this code," and it answers defensively. Switch to "you're a reviewer, your job is to find mistakes," and it goes straight for edge cases. Same model, different output distribution.

The problem was writing and validating in the same session. Ask Claude "did this go well?" and it tends to protect what it just wrote. Humans do the same. Reviewing your own code has blind spots. That's why I split the reviewer out.

Role separation doesn't end with one line of system prompt. Each role reads different Skills, has different tool permissions, and produces different output formats. The role isn't just "who" — it includes "what input triggers what format." The rest of this post is that "it includes" broken down, file by file.

Prerequisites (Follow-Along Checklist)

Four things you need before following along. If something's missing, set it up first.

Prerequisites

Claude Code CLI — official recommendation: curl -fsSL https://claude.ai/install.sh | bash (native binary for macOS/Linux). Homebrew: brew install --cask claude-code. npm (@anthropic-ai/claude-code) is currently deprecated
Anthropic API key or Claude subscription — connects on first claude run via browser login
A Supabase project (free tier is fine — only the agent_state table is needed)
Node.js 18+ — needed to run the MCP servers yourself

IDE doesn't matter. VS Code, Cursor, plain terminal — any of them work. I run Claude Code CLI inside Cursor. The team runs via shell scripts, so being comfortable with tmux helps.

Assuming you're set up, the next sections build real files. Order: folder layout → system prompts → Skills → MCP servers → DB → run script. Set it up once, clone it to every project.

Folder Layout — Global vs Project

Same principle as EP.17. Put anything project-agnostic in the global area; put project-specific things inside the project. Agent definitions, system prompts, and shared Skills go global. Project rules and work state stay in-project.

~/.claude/                          ← global (one-time setup)

  CLAUDE.md                       ← global rules & forbiddens

  agents/

    planner.md                    ← Planner system prompt

    coder.md                      ← Coder system prompt

    reviewer.md                   ← Reviewer system prompt

    tester.md                     ← Tester system prompt

    debugger.md                   ← Debugger system prompt

  skills/

    plan-and-spec.md

    implement.md

    code-review.md

    write-tests.md

    trace-error.md

  mcp/

    supabase-server.ts            ← custom MCP wrapper

    docs-search-server.ts

    logs-server.ts

  bin/

    agent-run.sh                  ← turn runner

project/                            ← per project

  CLAUDE.md                       ← rules for this project

  docs/tasks/{task_id}.md         ← per-task work log

  .env.local                      ← SUPABASE_URL, ANON_KEY

The key point: the global area is built once and reused across projects. Starting a new project means writing a CLAUDE.md at the project root — the team attaches automatically. No per-project setup beyond that.

5 System Prompts (Copy-Paste Ready)

Each Agent is a single markdown file. Filename becomes the Agent name. Follow the official Claude Code subagent format: YAML frontmatter with name, description, tools, model, and the system prompt in the body. The /agents slash command gives you an interactive setup too.

Below are the prompts I actually use, trimmed for clarity. Copy them and tweak for your stack.

Dev-stage agent team structure — 5 roles Planner, Coder, Reviewer, Tester, Debugger with MCP tools — 5-agent team — role, Skills, and MCP tools all separated / GoCodeLab

~/.claude/agents/planner.md

---

name: planner

description: Designs specs, implementation plans, and risks. Writes no code — outputs spec.json only.

tools: Read, Grep, Glob, mcp__supabase__query, mcp__docs_search__search

model: inherit

---

You are the Planner. You do not write code. You write the spec.

Input: user's natural-language request.

Output: spec.json (schema below).

spec.json required fields

  - goal: one-line summary

  - user_stories: array

  - api_endpoints: method, path, I/O

  - components: new / modified components

  - data_model: new tables / columns

  - risks: N+1, missing caching, race conditions, etc.

  - test_outline: scenarios the Tester will use

Forbidden

  - modifying files, running git, running migrations

  - writing spec.json multiple times (final version only)

~/.claude/agents/coder.md

---

name: coder

description: Reads Planner's spec.json and implements it. Stops before commit.

tools: Read, Write, Edit, Bash, Grep, Glob, mcp__supabase__query

model: inherit

---

You are the Coder. Read Planner's spec.json and implement it.

Do not add features outside the spec.

Sequence

  1. Read spec.json in full

  2. List files affected

  3. Implement — record decision rationale in implementation_notes.json

  4. Run build / type check once (exit into debugger state on failure)

Forbidden

  - committing, git push

  - running migrations (if not in spec)

  - accessing .env or production secrets

~/.claude/agents/reviewer.md

---

name: reviewer

description: Full review of changed code. Edge cases, security, performance. No modifications.

tools: Read, Grep, Glob, Bash(git diff:*), mcp__docs_search__search

model: inherit

---

You are the Reviewer. You do not modify code. You nitpick.

"Looks fine" is only allowed after three review passes.

Review checklist

  - edge cases (null, empty, overflow)

  - race conditions, concurrency

  - memory leaks, resource cleanup

  - security (XSS, SQL injection, missing permissions)

  - naming, consistency

  - missing error handling

Output

  review_findings.json

    - severity: critical / major / minor

    - file, line, description, suggested_fix

Forbidden — modifying files, auto-formatting, refactoring suggestions outside spec

~/.claude/agents/tester.md

---

name: tester

description: Writes, runs, and summarizes tests. Writes only inside tests/.

tools: Read, Write(tests/**), Edit(tests/**), Bash(npm test:*, vitest:*, jest:*), Grep, Glob

model: inherit

---

You are the Tester. Read spec.test_outline and build test cases.

Tests must be able to fail. Tests that only pass are rejected.

Priority

  1. Happy path — 1 or 2 cases

  2. Edge cases — 3+ (null, empty, boundary)

  3. Error paths (exceptions, permissions, network)

Output

  - test files under tests/

  - test_results.json (pass/fail/skip, failure summary)

  - on failure, set status=debugging and invoke debugger

~/.claude/agents/debugger.md

---

name: debugger

description: Traces failure logs, reproduces, proposes minimal fixes. No file modifications.

tools: Read, Grep, Glob, Bash(git log:*, git show:*), mcp__logs__tail

model: inherit

---

You are the Debugger. You do not modify code. You name the cause and propose the minimal fix.

Sequence

  1. Read failure logs from test_results.json

  2. Define reproduction steps (as short as possible)

  3. Three hypotheses → eliminate two with evidence

  4. Write minimal fix into debug_trace.json

Forbidden

  - modifying files, altering tests arbitrarily, deleting logs

  - speculation ("probably here") without attached evidence

These five files share a pattern: "what you do NOT do" is stated explicitly. Reviewer — no edits. Debugger — no edits. Coder — no commits. Keeping each role inside its lane is half of the quality story.

Skill Files — The Recipes Each Role Carries

The system prompt defines "who you are." The Skill defines "how you do it." Every session reads its Skills on start and follows them. Here's a condensed version of code-review.md, used by the Reviewer.

# ~/.claude/skills/code-review.md

Input — git diff + implementation_notes.json

Procedure

  1. Full diff scan

  2. Per changed file, apply the checklist

    [ ] null / undefined guards

    [ ] empty array / empty string cases

    [ ] async functions: missing catch

    [ ] DB queries: N+1 likelihood

    [ ] user input: escaping / validation

    [ ] permissions: caller identity missing

    [ ] logs: sensitive data present

Output (review_findings.json)

  { "severity": "critical|major|minor",

    "file": "...", "line": 42,

    "description": "...",

    "suggested_fix": "..." }

Example (reference)

  diff: users.filter(u => u.email.includes('@'))

  finding: { severity: "major", line: 12, description:

  "TypeError when u.email is null" }

Skills include checklists, examples, and output formats. Prose explanations are minimized. When Claude references them at runtime, density beats readability. Same rule I wrote in EP.17.

I've built plan-and-spec.md, implement.md, write-tests.md, and trace-error.md in the same shape. Each Skill is bound to an Agent's system prompt with "read this Skill before starting."

MCP Servers — Permission Per Role

Five agents working on the same project share tools but need different permission levels. Wrapping them in MCP servers is the right fit. When an Agent hits an MCP endpoint, the server checks the role and serves only the allowed operations.

Here's the core of the supabase wrapper. Reads the role header and filters queries accordingly.

// ~/.claude/mcp/supabase-server.ts (core excerpt)

const policy = {

  planner:  { read: true, write: false },

  coder:    { read: true, write: true, migrate: false },

  reviewer: { read: true, write: false },

  tester:   { read: true, write: true, tables: ['test_*'] },

  debugger: { read: true, write: false },

};

server.tool('supabase.query', async (req) => {

  const role = req.meta.agent_role;

  const p = policy[role];

  if (!p) throw new Error('unknown role');

  if (req.op === 'insert' && !p.write) {

    throw new Error(`${role} has no write`);

  }

  return await supabase.from(req.table)[req.op](req.payload);

});

This one server centralizes DB permissions for all five Agents. No need to repeat prohibition rules in every system prompt — the server rejects anything outside role permissions. Agent prompts stay readable; security lives on the server side.

Same pattern for docs-search (restrict to project markdown files) and logs (restrict to last 24 hours). Putting boundaries on the server — not in the Agent — also makes debugging easier. If something goes wrong, check the MCP server log.

Supabase agent_state — Schema and Migration

The moment I split roles, I hit a wall. Coder sessions didn't know about the spec Planner had written. Different sessions don't share memory. So I created an agent_state table in Supabase and write task state there. Each Agent writes only to its own slot.

-- supabase/migrations/20260415_agent_state.sql

CREATE TABLE agent_state (

  task_id text primary key,

  project text not null,

  current_owner text,               -- planner | coder | ...

  status text not null default 'planning',

  spec jsonb,

  implementation_notes jsonb,

  review_findings jsonb,

  test_results jsonb,

  debug_trace jsonb,

  created_at timestamptz default now(),

  updated_at timestamptz default now()

);

CREATE INDEX idx_agent_state_project ON agent_state(project);

CREATE INDEX idx_agent_state_status ON agent_state(status);

-- keep status as text for flexibility

-- planning → coding → reviewing → testing → debugging? → done

Columns are split so it's obvious which stage broke. If a session dies mid-turn, "status=reviewing, review_findings is null" immediately tells me the Reviewer turn died. Restart = run that turn again.

State-transition guards live in a Supabase Edge Function. Moving from "planning" to "coding" requires that the spec column is populated — the DB enforces it. Protects against Agents skipping steps.

There's a file trail too. docs/tasks/{task_id}.md holds human-readable output per Agent. DB is the state record, files are the reading record. When I look at output and ask "what happened?" — I open the file.

The Actual Routine — Turn Order and Script

Turns are driven by one shell script. Claude Code auto-loads subagent files at ~/.claude/agents/<name>.md. The script just tells the main session "read the current state for this task_id and delegate to that subagent." --mcp-config pins MCP servers, --append-system-prompt injects the task context.

# ~/.claude/bin/agent-run.sh

# usage: agent-run.sh <task_id> <agent>

TASK_ID=$1

AGENT=$2

# 1. pull current state (Supabase REST)

STATE=$(curl -s "$SUPABASE_URL/rest/v1/agent_state?task_id=eq.$TASK_ID" \

  -H "apikey: $SUPABASE_ANON_KEY")

# 2. launch Claude Code (subagents auto-loaded)

claude \

  --mcp-config "$HOME/.claude/mcp.json" \

  --append-system-prompt "Task: $TASK_ID. State: $STATE. Delegate to the '$AGENT' subagent only." \

  -p "Use the $AGENT subagent to handle task $TASK_ID. Read the current agent_state, perform only your role, write back to your own column, and exit."

# 3. after exit, propose the next Agent (human gates it)

NEXT_STATUS=$(curl -s "$SUPABASE_URL/rest/v1/agent_state?task_id=eq.$TASK_ID&select=status" -H "apikey: $SUPABASE_ANON_KEY" | jq -r '.[0].status')

echo "current status: $NEXT_STATUS"

echo "next suggested: $(next-agent.sh $NEXT_STATUS)"

Claude Code subagents isolate context by design. Calling a subagent from the main session starts it with a clean context, its own system prompt, and only its allowed tools. Results come back as a summary — cross-session memory isn't shared. That's exactly why I introduced the Supabase agent_state table: subagents needed "accumulated state" to pass between each other, and that has to live outside any single session.

A human gates between Agents. I don't auto-launch the next Agent. I read the output, decide to move on, and kick off the next turn. I tried full auto once and it wandered off course — manual gating is the current setup.

Here's the turn trace for a single feature.

# turn order for one feature

[me]       → agent-run.sh task-001 planner

        "add a usage chart to the dashboard"

planner   → spec.json → status=coding

[me]       → scan spec → agent-run.sh task-001 coder

coder     → implement → implementation_notes.json

        → build OK → status=reviewing

[me]       → agent-run.sh task-001 reviewer

reviewer  → read diff → review_findings.json

        1 critical, 2 major → status=coding (rework)

coder     → apply findings → status=reviewing (round 2)

reviewer  → OK → status=testing

tester    → write & run → 2 failures → status=debugging

debugger  → root cause → debug_trace.json → status=coding

coder     → minimal fix → status=testing

tester    → all pass → status=done

[me]       → review diff → commit myself

Key point: turns are closed. A single Agent session does only its job, leaves output in files/DB, and exits. The next Agent doesn't inherit the previous session — it reads artifacts. Sessions can die without affecting restart, and re-running a middle stage is easy.

Commits stay with me. No Agent has commit permissions — the point where code enters my repo is where I want the human in the loop. Team of five, final gate is still me.

What Changed, What Didn't (Honestly)

What changed: edge cases get caught earlier. Reviewer is built to nitpick, so "looks good" doesn't come cheap. Same code Coder called fine, Reviewer finds three problems in. Coder reflects them, Reviewer re-runs, quality climbs another notch.

Debugging sped up. Debugger is a separate session, so the context is clean. It focuses on the failure log without any "I just wrote this, it should be fine" bias from Coder. Swapping sessions killed that bias.

What didn't change: final review is still mine. Five Agents pass the code, I still skim the diff. Tester sometimes writes meaningless tests. Reviewer sometimes demands a bar nothing can pass. Debugger sometimes names the wrong cause. Team or not, the last gate is a human.

Cost went up too. About 2–3x the tokens of a single-session run. Each role loads its own system prompt, and the same code gets read by multiple sessions. On the other hand, rework rounds dropped, so total wall-clock time actually went down. A token-for-time trade.

Built on the EP.17 Harness

EP.17 wired up harness engineering: Rules (CLAUDE.md), Commands (/plan-and-spec, /tdd, /verify, ...), Hooks (block-dangerous.sh). The scaffold a single Claude session works inside.

EP.19 stacks an Agent team on that harness. Same materials. The plan-and-spec command became Planner's Skill. The tdd command became Tester's Skill. Each Agent runs Commands on its turn. Hooks remain a global safety net across all Agents.

CLAUDE.md layers — Global, Project, Agent team-only 3 tiers — CLAUDE.md layers — merged top-down through the Agent team scope / GoCodeLab

Put another way: EP.17 was a single-session workflow. EP.19 is the same workflow split into five. Starting with the split from day one would've been complicated — building the harness first, then placing a team on top, was the right sequence. Gathering materials first and assigning roles afterward is simpler.

FAQ

Q. Can the same model produce different results just by changing the role?

Yes. When the system prompt and referenced skill differ, the output distribution shifts noticeably. A session set up as Reviewer doesn't hand out "looks fine" easily. Ask it to nitpick and it nitpicks. Role setup changes the output.

Q. Aren't Claude Code subagents enough on their own?

For simple cases, yes. I needed cross-session state sharing, so I went custom. Work state persists in Supabase, and Agent-to-Agent communication runs through MCP servers. Subagents are designed to isolate context — they don't carry cumulative state across sessions by design.

Q. Is 5 the right size?

For my workflow, yes. Planner, Coder, Reviewer, Tester, Debugger. I tried splitting further, but roles started overlapping. With only 4, there was a gap — no dedicated debugger.

Q. Why MCP + an external DB?

Multiple agents had to pass the same task forward. The spec Planner wrote had to reach Coder; the code Coder wrote had to reach Reviewer. Sessions don't share memory, so I store task state in Supabase and let each Agent read its slot on its turn.

Q. How does this differ from EP.17 harness engineering?

EP.17 wired up Rules, Commands, and Hooks for a single Claude session. EP.19 stacks multiple Agents on top of that harness so each session carries one role. Same materials — what one session used to do is now split across five.

Closing

Role separation is less prompt engineering than workflow engineering. Same model, same codebase — split the session, and the result shifts. Ask the Coder to review, it shields its own work. Split off a Reviewer session, and it nitpicks. That gap is why I built the team.

80% first, then fix the remaining 20% as real problems show up. This team grew that way. Planner and Coder first. Adding Reviewer showed clearly how quality changed. Test automation was thin, so Tester joined. Debugger got split out last to kill debugging bias. Teams should grow by need.

Final review is still mine. Even after five Agents, I skim the code myself. Reviewer can be too strict, Tester's tests can be meaningless, Debugger can point the wrong way. With a team, the last gate is human. Take that as a given, and role separation alone changes code quality — measurably.

References

Lazy Developer EP.17

Claude Code Harness Engineering — automated the workflow scaffold

Rules / Commands / Hooks for a single Claude session. The precursor to this team.

Read →

Lazy Developer EP.01

Tired of re-explaining context, I automated CLAUDE.md

The shared-context starting point. Reused in team setup.

Read →

Lazy Developer EP.18

Vibecoding security checklist — what Reviewer carries around

The exact checklist the Reviewer session runs with.