singularity-forge/docs/dev/ADR-011-swarm-chat-and-debate-mode.md


# ADR-011: Swarm chat and debate mode for ephemeral subagents
**Date**: 2026-04-29
**Status**: accepted (Option A implemented; full swarm chat deferred)
## Context
sf's `subagent` tool originally dispatched one or more subagents in **parallel fire-and-forget** mode (`subagent({ tasks: [...] })`). All tasks ran concurrently; none saw each other; the parent collected results and synthesised.
This is sufficient for many cases (parallel research, parallel gate evaluation), but it has a structural gap for **adversarial review** and **multi-stakeholder negotiation**:
- An advocate's strongest defence never gets stress-tested by the challenger — they fire monologues in parallel.
- A multi-stakeholder swarm (the canonical Vision Alignment Meeting roles in `plan-milestone`: PM, User Advocate, Combatant, Architect, …) never actually negotiates; each issues a verdict the parent then weighs.
- The parent is the only synthesiser — there's no convergence dynamic among the subagents themselves.
The user asked whether agent-to-agent communication could happen inside ephemeral swarm tasks, sharing the chat machinery rather than waiting for the long-lived persistent-agent layer (SPEC §1718) to land.
## Decision
**Implement Option A now; defer Option C.**
Round-robin debate mode is implemented on the existing `subagent` tool as
`subagent({ mode: "debate", rounds: N, tasks: [...] })`. It gives each
participant the prior rounds' transcript and keeps the parent as synthesiser.
Full inbox-based swarm chat remains deferred until the persistent-agent layer
(`agents`, `agent_messages`, `agent_inbox`, `send_message` tool) lands in
current `.sf`/DB-backed state. That machinery should not be rebuilt inside the
ephemeral subagent extension.
## Alternatives Considered
### Option A — Round-robin debate mode (IMPLEMENTED)
Add `mode: "debate"` and `rounds: N` to the `subagent` tool. Each round, every task sees the previous round's outputs.
```ts
subagent({
  mode: "debate",
  rounds: 3,
  tasks: [
    { agent: "reviewer", task: "Make case for X. ..." },
    { agent: "reviewer", task: "Attack X. ..." }
  ]
})
```
- **Cost**: `rounds × tasks` tokens.
- **Determinism**: still reasonable — outputs are sequenced deterministically per round.
- **Fit**: best for adversarial review where the challenger should engage with the advocate's strongest defence. Minor extension of the existing `subagent` contract.
- **Why not**: doesn't support free-form many-to-many messaging. Each task speaks once per round in a fixed order.
- **Why first**: smallest change, biggest immediate quality win, reusable as a primitive.
**Implementation:** `src/resources/extensions/subagent/index.ts`.
**Regression test:** `src/tests/subagent-debate-mode.test.ts`.
### Option B — Shared scratchpad
Subagents share a JSON scratchpad written between turns. Each subagent reads what the others wrote, appends, hands off.
- **Pros**: state is explicit and auditable; low protocol complexity.
- **Cons**: feels mechanical — agents don't "talk", they write to a buffer. No spontaneous response.
- **Verdict**: rejected. If we're going to add inter-agent state, do it as messaging (Option A or C), not a buffer.
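To make the "buffer, not messaging" objection concrete, a minimal sketch of what the Option B scratchpad would have looked like (all names here are illustrative, not a real API): entries carry an author and a turn number but no addressee, so nothing in the structure invites a reply.

```typescript
// Hypothetical Option B scratchpad: a shared append-only buffer.
// Entries have an author but no addressee -- no spontaneous response.

type ScratchpadEntry = {
  agent: string;
  turn: number;
  content: string; // free-form notes written "into the void"
};

type Scratchpad = { entries: ScratchpadEntry[] };

// Append immutably; the turn number is just the next slot in the buffer.
function appendEntry(pad: Scratchpad, agent: string, content: string): Scratchpad {
  const turn = pad.entries.length + 1;
  return { entries: [...pad.entries, { agent, turn, content }] };
}
```

Contrast with debate mode, where each prompt explicitly threads in what the other participants said, so a response engages rather than coexists.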
### Option C — Ephemeral swarm with inbox (long-term target)
Reuse the future persistent-agent infrastructure once it exists in current `.sf`/DB-backed state, but scope each ephemeral swarm by `swarm_id` with a TTL. Swarm agents can `send_message` to each other freely during the task; on `synthesize()`, the swarm's rows get archived.
```ts
swarm({
  ttl_ms: 600_000,
  agents: [
    { id: "pm", model_tier: "planning", system: "..." },
    { id: "user", model_tier: "validation", system: "..." },
    { id: "combatant", model_tier: "validation", system: "..." },
    { id: "architect", model_tier: "validation", system: "..." }
  ],
  initial: { from: "moderator", to: "all", content: "Roadmap proposal: ..." }
})
```
- **Pros**: open negotiation; most powerful for multi-stakeholder Vision Alignment Meeting; reuses persistent-agent machinery.
- **Cons**: path-dependent (harder to reproduce); harder to budget tokens; swarm convergence isn't guaranteed without a moderator. Depends on the persistent-agent layer landing first.
- **Verdict**: target end state; not first.
## Consequences
**Positive**
- **Higher-quality adversarial review** — the challenger actually engages the advocate's strongest defence, instead of issuing a parallel monologue.
- **Multi-stakeholder pressure testing** — the Vision Alignment Meeting can use bounded debate rounds instead of only a parallel survey.
- **Reusable primitive** — debate mode can be invoked from any skill that today does `subagent({ tasks: [advocate, challenger] })` (currently `advisory-partner`, `brainstorming`, `requesting-code-review`).
**Negative**
- **Cost grows linearly with rounds.** A 3-round debate is roughly 3× the tokens. Callers should reserve budget accordingly.
- **Determinism drops.** A fire-and-collect batch is reproducible from prompts alone; a debate is path-dependent. Trace recording becomes more important — `.sf/traces/` must capture each round.
- **Synthesis complexity rises** — the parent must summarise a debate transcript, not just collect verdicts. The synthesis prompt itself becomes a tunable artefact.
**Risks and mitigations**
- *Risk:* runaway debate — agents loop without converging.
- *Mitigation:* hard `rounds` cap; convergence heuristic (stop when no new claims appear in a round).
- *Risk:* one agent dominates and silences the others.
- *Mitigation:* moderator role injects a turn-order constraint; per-agent token budget within a round.
- *Risk:* debate quality is only marginally better than parallel fire-and-collect.
- *Mitigation:* A/B harness: run both modes on the same fixture set and compare verdict accuracy on a benchmark of known good/bad designs. If the lift is under 10% accuracy, deprecate debate mode rather than extending it toward Option C.
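The "no new claims in a round" convergence heuristic can be sketched as a set-membership check over extracted claims. The claim extraction below (normalized sentence splitting) is a deliberately naive placeholder; `extractClaims` and `hasNewClaims` are illustrative names, not part of the `subagent` extension.

```typescript
// Naive convergence check for the runaway-debate mitigation: stop the
// debate early once a round contributes no claim we haven't seen before.

// Placeholder claim extraction: split into sentences, normalize case.
function extractClaims(output: string): Set<string> {
  return new Set(
    output
      .split(/[.!?]+\s*/)
      .map((s) => s.trim().toLowerCase())
      .filter((s) => s.length > 0),
  );
}

// Fold a round's outputs into the seen-set; report whether anything was new.
function hasNewClaims(seen: Set<string>, roundOutputs: string[]): boolean {
  let found = false;
  for (const out of roundOutputs) {
    for (const claim of extractClaims(out)) {
      if (!seen.has(claim)) {
        seen.add(claim);
        found = true;
      }
    }
  }
  return found;
}
```

A real implementation would want semantic rather than string-level claim identity, but even this crude version bounds pathological loops cheaply.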
## Out of Scope
- **Persistent inter-agent messaging across runs** — belongs in current `.sf`/DB-backed persistent-agent state when that layer exists; orthogonal to ephemeral swarms.
- **Cross-session swarm replay** — a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
- **Human-in-the-loop debate** — swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing `discuss` flow).
## Implementation Notes (Option A)
1. `subagent` accepts `mode: "parallel" | "debate"` on `tasks` batches.
2. `rounds` defaults to `2`, is capped at `5`, and is valid only with
`mode: "debate"`.
3. Debate requires at least two participants.
4. Each round runs the participant tasks, then appends their outputs to an
in-memory transcript.
5. Later rounds receive the transcript under `Debate transcript so far`.
6. The final round asks each participant to end with `FINAL_VERDICT`.
7. The parent still owns synthesis and persistence; debate mode does not create
persistent agent messages.
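The loop in notes 1–6 can be sketched as follows. This is a minimal illustration, not the actual code in `src/resources/extensions/subagent/index.ts`; the type and function names (`DebateTask`, `runDebate`, `runTask`) are invented for the sketch, and the real implementation dispatches tasks through the extension's own runner.

```typescript
// Sketch of the round-robin debate loop: fixed participant order per round,
// transcript appended only after a round completes, verdict cue in the last round.

type DebateTask = { agent: string; task: string };
type TranscriptEntry = { round: number; agent: string; output: string };

// Build one participant's prompt for a round, threading in earlier rounds'
// transcript and, in the final round, the FINAL_VERDICT instruction.
function buildPrompt(
  task: DebateTask,
  round: number,
  totalRounds: number,
  transcript: TranscriptEntry[],
): string {
  let prompt = task.task;
  if (transcript.length > 0) {
    const history = transcript
      .map((e) => `[round ${e.round}] ${e.agent}: ${e.output}`)
      .join("\n");
    prompt += `\n\nDebate transcript so far:\n${history}`;
  }
  if (round === totalRounds) {
    prompt += "\n\nThis is the final round. End with FINAL_VERDICT.";
  }
  return prompt;
}

function runDebate(
  tasks: DebateTask[],
  rounds: number,
  runTask: (agent: string, prompt: string) => string,
): TranscriptEntry[] {
  if (tasks.length < 2) throw new Error("debate requires at least two participants");
  if (rounds < 1 || rounds > 5) throw new Error("rounds must be between 1 and 5");
  const transcript: TranscriptEntry[] = [];
  for (let round = 1; round <= rounds; round++) {
    // Buffer the round's outputs so participants see only *prior* rounds,
    // then commit them to the transcript in deterministic task order.
    const roundEntries: TranscriptEntry[] = [];
    for (const t of tasks) {
      roundEntries.push({
        round,
        agent: t.agent,
        output: runTask(t.agent, buildPrompt(t, round, rounds, transcript)),
      });
    }
    transcript.push(...roundEntries);
  }
  return transcript;
}
```

Buffering within a round is what keeps the per-round sequencing deterministic, as claimed in the Option A trade-offs.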
## Sequencing
| When | Why |
|---|---|
| Now | Option A is available as bounded debate mode on `subagent`. |
| Six months of Option A in production | Decide whether full swarm-chat with inbox is worth the build. |
| Persistent-agent layer projected into `.sf`/DB state | Revisit Option C when inbox/message persistence exists as current runtime state. |
## References
- Older persistent-agent and inter-agent messaging SPEC notes — external design evidence only; project accepted facts into `.sf`/DB-backed state before treating them as operational.
- `src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md` — current single/parallel/debate/chain guidance.
- `src/resources/extensions/sf/skills/advisory-partner/SKILL.md` — primary consumer of adversarial dispatch today.
- `src/resources/extensions/sf/prompts/gate-evaluate.md` — pre-execution Q3/Q4 gates.
- `src/resources/extensions/sf/prompts/validate-milestone.md` — post-execution 3-reviewer pattern.