sf's `subagent` tool originally dispatched one or more subagents in **parallel fire-and-collect** mode (`subagent({ tasks: [...] })`). All tasks ran concurrently, none saw the others' output, and the parent collected the results and synthesised them.
This is sufficient for many cases (parallel research, parallel gate evaluation), but it has a structural gap for **adversarial review** and **multi-stakeholder negotiation**:
- An advocate's strongest defence never gets stress-tested by the challenger: both issue monologues in parallel, so neither sees the other's argument.
- A multi-stakeholder swarm (the canonical Vision Alignment Meeting roles in `plan-milestone`: PM, User Advocate, Combatant, Architect, …) never actually negotiates; each issues a verdict the parent then weighs.
- The parent is the only synthesiser — there's no convergence dynamic among the subagents themselves.
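A minimal TypeScript sketch of the original fire-and-collect shape follows. It assumes a hypothetical `runTask` callback standing in for one subagent invocation. `ParallelTask`, `Verdict` and `fireAndCollect` are illustrative names, not sf's actual internals. The point is structural: each task receives only its own prompt, so the challenger cannot respond to the advocate, and only the parent ever sees all outputs together.

```typescript
// Hypothetical task shape; the real `subagent` task schema may differ.
interface ParallelTask {
  role: string;
  prompt: string;
}

interface Verdict {
  role: string;
  output: string;
}

// Stand-in for one subagent invocation. It receives only its own task.
type RunTask = (task: ParallelTask) => Promise<string>;

// Fire-and-collect: every task starts at once and none can see another's output.
// The parent gets the verdicts only after all tasks finish, and must synthesise them itself.
async function fireAndCollect(tasks: ParallelTask[], runTask: RunTask): Promise<Verdict[]> {
  return Promise.all(
    tasks.map(async (task) => ({ role: task.role, output: await runTask(task) })),
  );
}
```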
The user asked whether agent-to-agent communication could happen inside ephemeral swarm tasks, sharing the chat machinery rather than waiting for the long-lived persistent-agent layer (SPEC §17–18) to land.
- **Determinism**: still reasonable — outputs are sequenced deterministically per round.
- **Fit**: best for adversarial review where the challenger should engage with the advocate's strongest defence. Minor extension of the existing `subagent` contract.
- **Why not**: doesn't support free-form many-to-many messaging. Each task speaks once per round in a fixed order.
- **Why first**: smallest change, biggest immediate quality win, reusable as a primitive.
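The round-robin debate these bullets describe can be sketched as a nested loop. This is a minimal sketch under stated assumptions: `speak` is a hypothetical callback wrapping one subagent call that is handed the transcript so far, and `DebateTask`, `Turn`, `runDebate` and the `rounds` parameter are illustrative names, not the existing `subagent` API.

```typescript
// Hypothetical shapes; not the real `subagent` task schema.
interface DebateTask {
  role: string;
  prompt: string;
}

interface Turn {
  round: number; // 1-based round index
  role: string;
  output: string;
}

// One subagent call: the speaker gets its own task plus every earlier turn.
type Speak = (task: DebateTask, transcript: readonly Turn[]) => Promise<string>;

// Round-robin debate: each round, every task speaks exactly once, in the fixed
// order of `tasks`, after seeing all earlier turns. `rounds` is a hard cap.
async function runDebate(tasks: DebateTask[], rounds: number, speak: Speak): Promise<Turn[]> {
  const transcript: Turn[] = [];
  for (let round = 1; round <= rounds; round++) {
    for (const task of tasks) {
      // Pass a snapshot so a speaker cannot observe later appends.
      const output = await speak(task, transcript.slice());
      transcript.push({ round, role: task.role, output });
    }
  }
  return transcript;
}
```

The fixed inner loop is what keeps the turn sequence reproducible, and it is also what rules out free-form many-to-many messaging: an agent can only respond when its slot comes up.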
Once the persistent-agent infrastructure exists in `.sf`/DB-backed state, reuse it for ephemeral swarms, scoping each swarm by `swarm_id` with a TTL. Swarm agents can `send_message` to each other freely during the task; on `synthesize()`, the swarm's rows are archived.
- **Pros**: open negotiation; the most powerful option for the multi-stakeholder Vision Alignment Meeting; reuses the persistent-agent machinery.
- **Cons**: path-dependent (harder to reproduce); harder to budget tokens; swarm convergence isn't guaranteed without a moderator. Depends on the persistent-agent layer landing first.
- **Verdict**: target end state; not first.
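A minimal in-memory sketch of the swarm scoping described above, assuming a single process and a caller-supplied clock. `SwarmStore`, `open`, `inbox` and `archivedRows` are hypothetical names, `sendMessage` stands in for the `send_message` tool, and a real version would write rows to the `.sf`/DB-backed store rather than to arrays.

```typescript
// Hypothetical row shape; a real store would persist these in `.sf`/DB-backed state.
interface MessageRow {
  swarmId: string;
  from: string;
  to: string;
  body: string;
  sentAt: number; // clock value when the message was sent
}

class SwarmStore {
  private live: MessageRow[] = [];
  private readonly archived: MessageRow[] = [];
  private readonly expiresAt = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Open an ephemeral swarm scope that expires `ttlMs` after creation.
  open(swarmId: string, ttlMs: number): void {
    this.expiresAt.set(swarmId, this.now() + ttlMs);
  }

  // Stand-in for `send_message`: any member may message any other, but only
  // while the swarm is open and its TTL has not elapsed.
  sendMessage(swarmId: string, from: string, to: string, body: string): void {
    const expiry = this.expiresAt.get(swarmId);
    if (expiry === undefined || this.now() >= expiry) {
      throw new Error(`swarm ${swarmId} is closed or expired`);
    }
    this.live.push({ swarmId, from, to, body, sentAt: this.now() });
  }

  inbox(swarmId: string, agent: string): MessageRow[] {
    return this.live.filter((m) => m.swarmId === swarmId && m.to === agent);
  }

  // On synthesize(): close the scope and move the swarm's rows to the archive.
  synthesize(swarmId: string): readonly MessageRow[] {
    const rows = this.live.filter((m) => m.swarmId === swarmId);
    this.live = this.live.filter((m) => m.swarmId !== swarmId);
    this.archived.push(...rows);
    this.expiresAt.delete(swarmId);
    return rows;
  }

  // Archived rows are read-only: callers get a (shallow) frozen copy.
  archivedRows(swarmId: string): readonly MessageRow[] {
    return Object.freeze(this.archived.filter((m) => m.swarmId === swarmId));
  }
}
```

A real store would also need to reap swarms whose TTL lapses before `synthesize()` is called; that sweep is omitted here.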
## Consequences
**Positive**
- **Higher-quality adversarial review** — the challenger actually engages the advocate's strongest defence, instead of issuing a parallel monologue.
- **Reusable primitive** — debate mode can be invoked from any skill that today does `subagent({ tasks: [advocate, challenger] })` (currently `advisory-partner`, `brainstorming`, `requesting-code-review`).
**Negative**
- **Determinism drops.** A fire-and-collect batch is reproducible from prompts alone; a debate is path-dependent. Trace recording therefore matters more: `.sf/traces/` must capture every round.
- **Synthesis complexity rises** — the parent must summarise a debate transcript, not just collect verdicts. The synthesis prompt itself becomes a tunable artefact.
**Risks and mitigations**
- *Risk:* runaway debate — agents loop without converging.
- *Mitigation:* hard `rounds` cap; convergence heuristic (stop when no new claims appear in a round).
- *Risk:* one agent dominates and silences the others.
- *Mitigation:* moderator role injects a turn-order constraint; per-agent token budget within a round.
- *Risk:* debate quality is only marginally better than parallel-fire-and-collect.
- *Mitigation:* an A/B harness that runs both modes on the same fixture set and compares verdict accuracy on a benchmark of known good/bad designs. If the accuracy lift is below 10%, defer Option A indefinitely.
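A sketch of the rounds cap, the no-new-claims convergence check, the per-agent token budget and the A/B gate, under explicit assumptions: each turn has already been reduced to a list of normalised claim strings and a token count (claim extraction is not shown), and the 10% threshold is read as 10 percentage points of absolute accuracy, since the text does not say whether the lift is absolute or relative. All names are hypothetical.

```typescript
// Hypothetical per-turn record: claims are assumed to be pre-extracted, normalised strings.
interface ClaimTurn {
  round: number; // 1-based
  role: string;
  claims: string[];
  tokens: number;
}

// Hard cap plus convergence heuristic: stop at `maxRounds`, or once a round
// (after the first) introduces no claim that was not already made earlier.
function shouldStop(turns: ClaimTurn[], round: number, maxRounds: number): boolean {
  if (round >= maxRounds) return true;
  if (round <= 1) return false;
  const earlier = new Set(turns.filter((t) => t.round < round).flatMap((t) => t.claims));
  const current = turns.filter((t) => t.round === round).flatMap((t) => t.claims);
  return current.every((claim) => earlier.has(claim));
}

// Per-agent token budget within one round: returns the roles that exceeded it.
// Enforcement (e.g. truncating the turn) is left to the caller.
function overBudget(turns: ClaimTurn[], round: number, perAgentBudget: number): string[] {
  const used = new Map<string, number>();
  for (const t of turns) {
    if (t.round === round) used.set(t.role, (used.get(t.role) ?? 0) + t.tokens);
  }
  return Array.from(used.entries())
    .filter(([, tokens]) => tokens > perAgentBudget)
    .map(([role]) => role);
}

// A/B gate: accuracy lift of debate over fire-and-collect on the same fixtures,
// in absolute percentage points (an assumption; the threshold could also be relative).
function accuracyLiftPoints(debateCorrect: number, parallelCorrect: number, fixtures: number): number {
  return ((debateCorrect - parallelCorrect) * 100) / fixtures;
}

// Defer when the measured lift falls below the threshold.
function shouldDefer(liftPoints: number, thresholdPoints = 10): boolean {
  return liftPoints < thresholdPoints;
}
```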
## Out of scope
- **Persistent inter-agent messaging across runs**: belongs in the `.sf`/DB-backed persistent-agent state once that layer exists; orthogonal to ephemeral swarms.
- **Cross-session swarm replay** — a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
- **Human-in-the-loop debate** — swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing `discuss` flow).
## References
- Older persistent-agent and inter-agent messaging SPEC notes: external design evidence only. Accepted facts must be projected into `.sf`/DB-backed state before being treated as operational.