Merge branch 'main' into feat/gsd-headless-command

This commit is contained in:
TÂCHES 2026-03-16 18:44:18 -06:00 committed by GitHub
commit e0c1cc2f9d
53 changed files with 7593 additions and 208 deletions


@ -6,6 +6,17 @@ Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **`gsd sessions`** — interactive session picker: lists all saved sessions for the current directory with date, message count, and first-message preview; lets you pick one to resume. Compare with `--continue`, which always resumes the most recent session. (#721)
- **13 new browser tools** — shipped from the #698 feature additions (10 features, some of which map to multiple tools): `browser_save_pdf`, `browser_save_state`, `browser_restore_state`, `browser_mock_route`, `browser_block_urls`, `browser_clear_routes`, `browser_emulate_device`, `browser_extract`, `browser_visual_diff`, `browser_zoom_region`, `browser_generate_test`, `browser_check_injection`, `browser_action_cache` (#698)
### Fixed
- Shift-Tab now navigates to previous tab in the workflow visualizer (#717)
- Capture resolutions are now executed after triage instead of only being classified (#714)
- Screenshot constraining uses independent width/height caps to prevent squishing (#725)
- `auto.lock` is written at process startup; remote sessions are now detected in the dashboard (#723)
- Cross-platform test compatibility: use `process.ppid` instead of PID 1
## [2.22.0] - 2026-03-16
### Added


@ -277,6 +277,7 @@ On first run, GSD launches a branded setup wizard that walks you through LLM pro
| `gsd update` | Update GSD to the latest version |
| `gsd headless [cmd]` | Run `/gsd` commands without TUI (CI, cron, scripts) |
| `gsd --continue` (`-c`) | Resume the most recent session for the current directory |
| `gsd sessions` | Interactive session picker — browse and resume any saved session |
---
@ -418,7 +419,7 @@ GSD ships with 14 extensions, all loaded automatically:
| Extension | What it provides |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| **GSD** | Core workflow engine, auto mode, commands, dashboard |
| **Browser Tools** | Playwright-based browser with form intelligence, intent-ranked element finding, and semantic actions |
| **Browser Tools** | Playwright-based browser with form intelligence, intent-ranked element finding, semantic actions, PDF export, session state persistence, network mocking, device emulation, structured extraction, visual diffing, region zoom, test code generation, and prompt injection detection |
| **Search the Web** | Brave Search, Tavily, or Jina page extraction |
| **Google Search** | Gemini-powered web search with AI-synthesized answers |
| **Context7** | Up-to-date library/framework documentation |


@ -73,6 +73,7 @@
| `gsd --print "msg"` (`-p`) | Single-shot prompt mode (no TUI) |
| `gsd --mode <text\|json\|rpc\|mcp>` | Output mode for non-interactive use |
| `gsd --list-models [search]` | List available models and exit |
| `gsd sessions` | Interactive session picker — list all saved sessions for the current directory and choose one to resume |
| `gsd --debug` | Enable structured JSONL diagnostic logging for troubleshooting dispatch and state issues |
| `gsd config` | Re-run the setup wizard (LLM provider + tool keys) |
| `gsd update` | Update GSD to the latest version |


@ -126,6 +126,14 @@ gsd --continue # or gsd -c
Resumes the most recent session for the current directory.
To browse and pick from all saved sessions:
```bash
gsd sessions
```
Shows each session's date, message count, and first-message preview so you can choose which one to resume.
## Next Steps
- [Auto Mode](./auto-mode.md) — deep dive into autonomous execution


@ -1,15 +1,15 @@
# Browser-Tools Feature Additions — Implementation Requirements
> Ref: [#698](https://github.com/gsd-build/gsd-2/issues/698)
> Status: Proposal — open for contributor review
> Status: **Shipped** — all 10 features implemented and merged to main
## Current State
Browser-tools ships **47 tools** across 10 modules (~8,300 lines). The extension wraps Playwright's Chromium instance with intent resolution, semantic actions, assertions, state diffing, an action timeline, HAR/trace export, and a deterministic ref system. Context is managed via `lifecycle.ts` (browser/context/page lifecycle) and `state.ts` (session tracking).
Browser-tools shipped **47 tools** across 10 modules (~8,300 lines) at the time this proposal was written. Implementing these 10 features added 13 tools (some features map to multiple tools), bringing the total to 60.
Key existing capabilities: `browser_navigate`, `browser_click`, `browser_evaluate`, `browser_assert`, `browser_diff`, `browser_batch`, `browser_find_best`, `browser_act`, `browser_trace_start/stop`, `browser_export_har`, `browser_set_viewport`, `browser_screenshot`, `browser_snapshot_refs`.
Key existing capabilities at proposal time: `browser_navigate`, `browser_click`, `browser_evaluate`, `browser_assert`, `browser_diff`, `browser_batch`, `browser_find_best`, `browser_act`, `browser_trace_start/stop`, `browser_export_har`, `browser_set_viewport`, `browser_screenshot`, `browser_snapshot_refs`.
No existing support for: storage state persistence, route interception, PDF export, structured data extraction, device emulation profiles, visual diffing, or test code generation.
**Implemented tools** (shipped in main): `browser_save_pdf`, `browser_save_state`, `browser_restore_state`, `browser_mock_route`, `browser_block_urls`, `browser_clear_routes`, `browser_emulate_device`, `browser_extract`, `browser_visual_diff`, `browser_zoom_region`, `browser_generate_test`, `browser_check_injection`, `browser_action_cache`.
---


@ -0,0 +1,91 @@
/**
* bash-background.test.ts: tests for rewriteBackgroundCommand
*
* Regression for #733: `cmd &` causes the bash tool to hang indefinitely
* because the background process inherits the piped stdout/stderr and keeps
* them open. rewriteBackgroundCommand injects >/dev/null 2>&1 before & when
* the command does not already redirect stdout.
*/
import { describe, it } from "node:test";
import assert from "node:assert/strict";
import { rewriteBackgroundCommand } from "./bash.js";
describe("rewriteBackgroundCommand", () => {
describe("no-op cases (no & operator)", () => {
it("passes through a plain command unchanged", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080");
assert.equal(r.rewritten, false);
assert.equal(r.command, "python -m http.server 8080");
});
it("passes through a command with && (logical AND)", () => {
const r = rewriteBackgroundCommand("npm install && npm start");
assert.equal(r.rewritten, false);
});
it("passes through a command with & inside a string", () => {
const r = rewriteBackgroundCommand("echo 'foo & bar'");
assert.equal(r.rewritten, false);
});
});
describe("rewrite cases (& backgrounding)", () => {
it("rewrites bare background command", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080 &");
assert.equal(r.rewritten, true);
assert.ok(r.command.includes(">/dev/null 2>&1"), "injects stdout redirect");
assert.ok(r.command.includes("&"), "preserves background operator");
});
it("rewrites background command with trailing whitespace", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080 & ");
assert.equal(r.rewritten, true);
assert.ok(r.command.includes(">/dev/null 2>&1"));
});
it("rewrites background command with & disown", () => {
const r = rewriteBackgroundCommand("node server.js & disown");
assert.equal(r.rewritten, true);
assert.ok(r.command.includes(">/dev/null 2>&1"));
});
it("does NOT double-inject when stdout already redirected (>)", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080 > server.log &");
assert.equal(r.rewritten, false, "already has > redirect");
});
it("does NOT inject when already redirected to /dev/null", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080 >/dev/null 2>&1 &");
assert.equal(r.rewritten, false, "already fully redirected");
});
it("does NOT inject when command uses a pipe", () => {
const r = rewriteBackgroundCommand("python -m http.server 8080 | tee server.log &");
assert.equal(r.rewritten, false, "stdout piped elsewhere");
});
});
describe("compound commands", () => {
it("rewrites only the backgrounded segment in a compound command", () => {
const r = rewriteBackgroundCommand("echo starting; python -m http.server 8080 &");
assert.equal(r.rewritten, true);
assert.ok(r.command.includes(">/dev/null 2>&1 &"));
assert.ok(r.command.includes("echo starting"), "non-background part preserved");
});
it("handles multiple backgrounded commands", () => {
const r = rewriteBackgroundCommand("node server.js &\npython worker.py &");
assert.equal(r.rewritten, true);
const occurrences = (r.command.match(/\/dev\/null/g) ?? []).length;
assert.ok(occurrences >= 2, "both background commands rewritten");
});
});
describe("nohup / already-safe patterns pass through", () => {
it("nohup ... & passes through unchanged (already redirects)", () => {
const r = rewriteBackgroundCommand("nohup python -m http.server 8080 > /dev/null 2>&1 &");
assert.equal(r.rewritten, false);
});
});
});


@ -43,6 +43,71 @@ function getTempFilePath(): string {
return join(tmpdir(), `pi-bash-${id}.log`);
}
/**
* Detect whether a command fragment ends with an unquoted & (background operator).
* Returns true for patterns like: `cmd &`, `cmd arg &`, `cmd & disown`, `(cmd) &`.
* Returns false when & appears inside a string literal or as &&.
*/
function endsWithBackgroundOperator(fragment: string): boolean {
// Remove content inside single-quoted strings to avoid false positives
const stripped = fragment.replace(/'[^']*'/g, "''");
// Match trailing & not preceded by another & (i.e., not &&)
return /(?<!&)&\s*(?:disown\s*)?(?:#.*)?$/.test(stripped.trim());
}
/**
* Determine whether a command segment already redirects stdout away from the terminal.
* Checks for >, >>, &>, |, /dev/null redirects.
*/
function hasOutputRedirect(segment: string): boolean {
// Remove single-quoted strings to avoid matching inside them
const stripped = segment.replace(/'[^']*'/g, "''");
// Match >, >> not preceded by 2 (stderr-only) — we only care about stdout
// Also match &> (combined), >&, or a pipe | which routes stdout elsewhere
return /(?<!\d)(?:>>?|&>|>&|\|)/.test(stripped);
}
/**
* Rewrite a command that uses & for backgrounding so the background process
* does not inherit the bash tool's stdout/stderr pipes.
*
* Without this, `python -m http.server 8080 &` causes the bash tool to hang
* indefinitely because Node.js keeps the pipe open until every process that
* inherited it exits, including the long-running server.
*
* The rewrite adds `>/dev/null 2>&1` before each & where stdout is not already
* redirected, ensuring the background process detaches from the pipes; the
* caller then surfaces a brief advisory in the tool output.
*
* Returns { command: string; rewritten: boolean }.
*/
export function rewriteBackgroundCommand(command: string): { command: string; rewritten: boolean } {
// Quick pre-check: if there's no & at all, skip the more expensive processing
if (!command.includes("&")) return { command, rewritten: false };
// Split on ; and newlines to handle compound commands.
// We rewrite each segment independently.
// Note: this is intentionally simple and covers the common LLM patterns.
// It does not attempt to parse complex nested subshells.
const segments = command.split(/(?<=[;\n])/);
let anyRewritten = false;
const rewrittenSegments = segments.map((segment) => {
if (!endsWithBackgroundOperator(segment)) return segment;
if (hasOutputRedirect(segment)) return segment;
anyRewritten = true;
// Insert >/dev/null 2>&1 before the trailing & (and optional disown/comment)
return segment.replace(
/(?<!&)(&\s*(?:disown\s*)?(?:#.*)?)$/,
">/dev/null 2>&1 $1",
);
});
if (!anyRewritten) return { command, rewritten: false };
return { command: rewrittenSegments.join(""), rewritten: true };
}
const bashSchema = Type.Object({
command: Type.String({ description: "Bash command to execute" }),
timeout: Type.Optional(Type.Number({ description: "Timeout in seconds (optional, no default timeout)" })),
@ -239,8 +304,25 @@ export function createBashTool(cwd: string, options?: BashToolOptions): AgentToo
}
}
// Rewrite background commands (&) to redirect output away from the pipes.
// Without this, `cmd &` causes the tool to hang because the background
// process inherits the piped stdout/stderr and keeps them open indefinitely.
const bgResult = rewriteBackgroundCommand(command);
const effectiveCommand = bgResult.command;
if (bgResult.rewritten) {
// Surface a brief advisory so the LLM knows what happened.
// The rewrite is transparent for the common case; explicit detachment
// (nohup, start_new_session) is preferred for robustness.
onUpdate?.({
content: [{
type: "text" as const,
text: "Note: Background command output redirected to /dev/null to prevent pipe hang. Use nohup or setsid for reliable detachment.",
}],
details: undefined,
});
}
// Apply command prefix if configured (e.g., "shopt -s expand_aliases" for alias support)
const resolvedCommand = sanitizeCommand(commandPrefix ? `${commandPrefix}\n${command}` : command);
const resolvedCommand = sanitizeCommand(commandPrefix ? `${commandPrefix}\n${effectiveCommand}` : effectiveCommand);
const spawnContext = resolveSpawnContext(resolvedCommand, cwd, spawnHook);
return new Promise((resolve, reject) => {
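For reference, the rewrite logic above can be condensed into one self-contained sketch; the two helper regexes are inlined here (the real module factors them into `endsWithBackgroundOperator` and `hasOutputRedirect`):

```typescript
// Condensed sketch of rewriteBackgroundCommand: split compound commands,
// then inject >/dev/null 2>&1 before any trailing unquoted & whose segment
// does not already redirect stdout.
function rewriteBg(command: string): { command: string; rewritten: boolean } {
  if (!command.includes("&")) return { command, rewritten: false };
  const segments = command.split(/(?<=[;\n])/); // keep ; and \n delimiters
  let any = false;
  const out = segments.map((seg) => {
    const stripped = seg.replace(/'[^']*'/g, "''"); // ignore & inside quotes
    const endsBg = /(?<!&)&\s*(?:disown\s*)?(?:#.*)?$/.test(stripped.trim());
    const redirected = /(?<!\d)(?:>>?|&>|>&|\|)/.test(stripped);
    if (!endsBg || redirected) return seg;
    any = true;
    return seg.replace(/(?<!&)(&\s*(?:disown\s*)?(?:#.*)?)$/, ">/dev/null 2>&1 $1");
  });
  return any ? { command: out.join(""), rewritten: true } : { command, rewritten: false };
}
```

For example, `rewriteBg("python -m http.server 8080 &")` yields `python -m http.server 8080 >/dev/null 2>&1 &`, while `npm install && npm start` passes through untouched.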


@ -7,6 +7,7 @@ export {
type BashToolOptions,
bashTool,
createBashTool,
rewriteBackgroundCommand,
} from "./bash.js";
export {
type BashInterceptorRule,


@ -235,6 +235,7 @@ export {
type BashToolInput,
type BashToolOptions,
bashTool,
rewriteBackgroundCommand,
checkBashInterception,
type CompiledInterceptor,
compileInterceptor,


@ -190,7 +190,11 @@ const authStorage = AuthStorage.create(authFilePath)
loadStoredEnvKeys(authStorage)
migratePiCredentials(authStorage)
const modelRegistry = new ModelRegistry(authStorage)
// Resolve models.json path with fallback to ~/.pi/agent/models.json
const { resolveModelsJsonPath } = await import('./models-resolver.js')
const modelsJsonPath = resolveModelsJsonPath()
const modelRegistry = new ModelRegistry(authStorage, modelsJsonPath)
const settingsManager = SettingsManager.create(agentDir)
// Run onboarding wizard on first launch (no LLM provider configured)


@ -1,4 +1,8 @@
interface McpTool {
/**
* Minimal tool interface matching GSD's AgentTool shape.
* Avoids a direct dependency on @gsd/pi-agent-core from this compiled module.
*/
export interface McpToolDef {
name: string
description: string
parameters: Record<string, unknown>
@ -17,8 +21,22 @@ interface McpTool {
// specifiers dynamically so tsc treats them as `any`.
const MCP_PKG = '@modelcontextprotocol/sdk'
/**
* Starts a native MCP (Model Context Protocol) server over stdin/stdout.
*
* This enables GSD's tools (read, write, edit, bash, grep, glob, ls, etc.)
* to be used by external AI clients such as Claude Desktop, VS Code Copilot,
* and any MCP-compatible host.
*
* The server registers all tools from the agent session's tool registry and
* maps MCP tools/list and tools/call requests to GSD tool definitions and
* execution, respectively.
*
* All MCP SDK imports are dynamic to avoid subpath export resolution issues
* with TypeScript's NodeNext module resolution.
*/
export async function startMcpServer(options: {
tools: McpTool[]
tools: McpToolDef[]
version?: string
}): Promise<void> {
const { tools, version = '0.0.0' } = options
@ -31,7 +49,8 @@ export async function startMcpServer(options: {
const StdioServerTransport = stdioMod.StdioServerTransport
const { ListToolsRequestSchema, CallToolRequestSchema } = typesMod
const toolMap = new Map<string, McpTool>()
// Build a lookup map for fast tool resolution on calls
const toolMap = new Map<string, McpToolDef>()
for (const tool of tools) {
toolMap.set(tool.name, tool)
}
@ -41,14 +60,16 @@ export async function startMcpServer(options: {
{ capabilities: { tools: {} } },
)
// tools/list — return every registered GSD tool with its JSON Schema parameters
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: tools.map((t: McpTool) => ({
tools: tools.map((t: McpToolDef) => ({
name: t.name,
description: t.description,
inputSchema: t.parameters,
})),
}))
// tools/call — execute the requested tool and return content blocks
server.setRequestHandler(CallToolRequestSchema, async (request: any) => {
const { name, arguments: args } = request.params
const tool = toolMap.get(name)
@ -60,7 +81,14 @@ export async function startMcpServer(options: {
}
try {
const result = await tool.execute(`mcp-${Date.now()}`, args ?? {}, undefined, undefined)
const result = await tool.execute(
`mcp-${Date.now()}`,
args ?? {},
undefined, // no AbortSignal
undefined, // no onUpdate callback
)
// Convert AgentToolResult content blocks to MCP content format
const content = result.content.map((block: any) => {
if (block.type === 'text') return { type: 'text' as const, text: block.text ?? '' }
if (block.type === 'image') return { type: 'image' as const, data: block.data ?? '', mimeType: block.mimeType ?? 'image/png' }
@ -73,6 +101,7 @@ export async function startMcpServer(options: {
}
})
// Connect to stdin/stdout transport
const transport = new StdioServerTransport()
await server.connect(transport)
process.stderr.write(`[gsd] MCP server started (v${version})\n`)
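The tools/call handler's content conversion can be sketched standalone. The `Block` shape and the stringify fallback for unrecognized block types are assumptions here, since the handler body is truncated in this view:

```typescript
// Convert AgentTool-style content blocks to MCP content format (sketch).
type Block = { type: string; text?: string; data?: string; mimeType?: string };

function toMcpContent(blocks: Block[]) {
  return blocks.map((b) => {
    if (b.type === "text") return { type: "text" as const, text: b.text ?? "" };
    if (b.type === "image")
      return { type: "image" as const, data: b.data ?? "", mimeType: b.mimeType ?? "image/png" };
    // Fallback for unknown block types (assumption, not shown in the diff)
    return { type: "text" as const, text: JSON.stringify(b) };
  });
}
```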

src/models-resolver.ts Normal file

@ -0,0 +1,55 @@
/**
* Models.json resolution with fallback to ~/.pi/agent/models.json
*
* GSD uses ~/.gsd/agent/models.json, but for a smooth migration/development
* experience, this module provides resolution logic that:
*
* 1. Reads ~/.gsd/agent/models.json if it exists
* 2. Falls back to ~/.pi/agent/models.json if GSD file doesn't exist
* 3. Prefers the GSD file when both exist (GSD takes precedence; no merging is performed)
*/
import { existsSync, readFileSync } from 'node:fs'
import { homedir } from 'node:os'
import { join } from 'node:path'
import { agentDir } from './app-paths.js'
const GSD_MODELS_PATH = join(agentDir, 'models.json')
const PI_MODELS_PATH = join(homedir(), '.pi', 'agent', 'models.json')
/**
* Resolve the path to models.json with fallback logic.
*
* Priority:
* 1. ~/.gsd/agent/models.json (exists) → return this path
* 2. ~/.pi/agent/models.json (exists) → return this path (fallback)
* 3. Neither exists → return GSD path (will be created)
*
* @returns The path to use for models.json
*/
export function resolveModelsJsonPath(): string {
if (existsSync(GSD_MODELS_PATH)) {
return GSD_MODELS_PATH
}
if (existsSync(PI_MODELS_PATH)) {
return PI_MODELS_PATH
}
return GSD_MODELS_PATH
}
/**
* Check if both GSD and PI models.json files exist.
*/
export function hasBothModelsFiles(): boolean {
return existsSync(GSD_MODELS_PATH) && existsSync(PI_MODELS_PATH)
}
/**
* Get the paths to both models.json files.
*/
export function getModelsPaths(): { gsd: string; pi: string } {
return {
gsd: GSD_MODELS_PATH,
pi: PI_MODELS_PATH,
}
}
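The first-exists-wins order above can be illustrated with the existence check injected as a parameter (a sketch for illustration only; the real module calls `existsSync` against the fixed home-directory paths):

```typescript
// Resolution sketch: GSD path wins, legacy PI path is the fallback,
// and the GSD path is returned (to be created) when neither exists.
function resolveWithFallback(
  gsdPath: string,
  piPath: string,
  exists: (p: string) => boolean,
): string {
  if (exists(gsdPath)) return gsdPath;
  if (exists(piPath)) return piPath;
  return gsdPath;
}
```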


@ -10,12 +10,19 @@ import {
import type { BgProcess, OutputDigest, OutputLine, GetOutputOptions } from "./types.js";
import {
ERROR_PATTERNS,
ERROR_PATTERN_UNION,
WARNING_PATTERN_UNION,
READINESS_PATTERN_UNION,
BUILD_COMPLETE_PATTERN_UNION,
TEST_RESULT_PATTERN_UNION,
WARNING_PATTERNS,
URL_PATTERN,
PORT_PATTERN,
PORT_PATTERN_SOURCE,
READINESS_PATTERNS,
BUILD_COMPLETE_PATTERNS,
TEST_RESULT_PATTERNS,
LINE_DEDUP_MAX,
} from "./types.js";
import { addEvent, pushAlert } from "./process-manager.js";
import { transitionToReady } from "./readiness-detector.js";
@ -24,8 +31,8 @@ import { formatUptime, formatTimeAgo } from "./utilities.js";
// ── Output Analysis ────────────────────────────────────────────────────────
export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "stderr"): void {
// Error detection
if (ERROR_PATTERNS.some(p => p.test(line))) {
// Error detection — single union regex instead of .some(p => p.test(line))
if (ERROR_PATTERN_UNION.test(line)) {
bg.recentErrors.push(line.trim().slice(0, 200)); // Cap line length
if (bg.recentErrors.length > 50) bg.recentErrors.splice(0, bg.recentErrors.length - 50);
@ -40,8 +47,8 @@ export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "std
}
}
// Warning detection
if (WARNING_PATTERNS.some(p => p.test(line))) {
// Warning detection — single union regex
if (WARNING_PATTERN_UNION.test(line)) {
bg.recentWarnings.push(line.trim().slice(0, 200));
if (bg.recentWarnings.length > 50) bg.recentWarnings.splice(0, bg.recentWarnings.length - 50);
}
@ -56,9 +63,10 @@ export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "std
}
}
// Port extraction
// Port extraction — PORT_PATTERN needs /g for exec(), and a /g regex carries
// lastIndex state across calls, so a fresh instance is built per call from PORT_PATTERN_SOURCE
const portRe = new RegExp(PORT_PATTERN_SOURCE, "gi");
let portMatch: RegExpExecArray | null;
const portRe = new RegExp(PORT_PATTERN.source, PORT_PATTERN.flags);
while ((portMatch = portRe.exec(line)) !== null) {
const port = parseInt(portMatch[1], 10);
if (port > 0 && port <= 65535 && !bg.ports.includes(port)) {
@ -71,7 +79,7 @@ export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "std
}
}
// Readiness detection
// Readiness detection — single union regex
if (bg.status === "starting") {
// Check custom ready pattern first
if (bg.readyPattern) {
@ -83,14 +91,14 @@ export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "std
}
// Check built-in readiness patterns
if (bg.status === "starting" && READINESS_PATTERNS.some(p => p.test(line))) {
if (bg.status === "starting" && READINESS_PATTERN_UNION.test(line)) {
transitionToReady(bg, `Readiness pattern matched: ${line.trim().slice(0, 100)}`);
}
}
// Recovery detection: if we were in error and see a success pattern
if (bg.status === "error") {
if (READINESS_PATTERNS.some(p => p.test(line)) || BUILD_COMPLETE_PATTERNS.some(p => p.test(line))) {
if (READINESS_PATTERN_UNION.test(line) || BUILD_COMPLETE_PATTERN_UNION.test(line)) {
bg.status = "ready";
bg.recentErrors = [];
addEvent(bg, { type: "recovered", detail: "Process recovered from error state" });
@ -98,10 +106,22 @@ export function analyzeLine(bg: BgProcess, line: string, stream: "stdout" | "std
}
}
// Dedup tracking
// Dedup tracking — evict oldest entry when map exceeds LINE_DEDUP_MAX (LRU via Map insertion order)
bg.totalRawLines++;
const lineHash = line.trim().slice(0, 100);
bg.lineDedup.set(lineHash, (bg.lineDedup.get(lineHash) || 0) + 1);
const existing = bg.lineDedup.get(lineHash);
if (existing !== undefined) {
// Re-insert to update insertion order (move to tail = most recent)
bg.lineDedup.delete(lineHash);
bg.lineDedup.set(lineHash, existing + 1);
} else {
if (bg.lineDedup.size >= LINE_DEDUP_MAX) {
// Evict oldest entry (Map iteration order = insertion order = LRU at head)
const oldest = bg.lineDedup.keys().next().value;
if (oldest !== undefined) bg.lineDedup.delete(oldest);
}
bg.lineDedup.set(lineHash, 1);
}
}
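The dedup bookkeeping above relies on JavaScript `Map` iteration order being insertion order: deleting and re-setting a key moves it to the tail, so the head is always the least recently seen entry. A standalone sketch of the same pattern:

```typescript
// Bounded LRU counter: at most MAX unique keys; the oldest key is
// evicted when a new key would exceed the cap.
const MAX = 3;
const dedup = new Map<string, number>();

function touch(key: string): void {
  const n = dedup.get(key);
  if (n !== undefined) {
    dedup.delete(key); // re-insert to move the key to the tail (most recent)
    dedup.set(key, n + 1);
    return;
  }
  if (dedup.size >= MAX) {
    const oldest = dedup.keys().next().value; // head = least recently seen
    if (oldest !== undefined) dedup.delete(oldest);
  }
  dedup.set(key, 1);
}
```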
// ── Digest Generation ──────────────────────────────────────────────────────
@ -154,12 +174,12 @@ export function getHighlights(bg: BgProcess, maxLines: number = 15): string[] {
for (let i = 0; i < bg.output.length; i++) {
const entry = bg.output[i];
let score = 0;
if (ERROR_PATTERNS.some(p => p.test(entry.line))) score += 10;
if (WARNING_PATTERNS.some(p => p.test(entry.line))) score += 5;
if (ERROR_PATTERN_UNION.test(entry.line)) score += 10;
if (WARNING_PATTERN_UNION.test(entry.line)) score += 5;
if (URL_PATTERN.test(entry.line)) score += 3;
if (READINESS_PATTERNS.some(p => p.test(entry.line))) score += 8;
if (TEST_RESULT_PATTERNS.some(p => p.test(entry.line))) score += 7;
if (BUILD_COMPLETE_PATTERNS.some(p => p.test(entry.line))) score += 6;
if (READINESS_PATTERN_UNION.test(entry.line)) score += 8;
if (TEST_RESULT_PATTERN_UNION.test(entry.line)) score += 7;
if (BUILD_COMPLETE_PATTERN_UNION.test(entry.line)) score += 6;
// Boost recent lines so highlights favor fresh output over stale
if (i >= bg.output.length - 50) score += 2;
if (score > 0) {


@ -39,6 +39,8 @@ export function setPendingAlerts(alerts: string[]): void {
export function addOutputLine(bg: BgProcess, stream: "stdout" | "stderr", line: string): void {
bg.output.push({ stream, line, ts: Date.now() });
if (stream === "stdout") bg.stdoutLineCount++;
else bg.stderrLineCount++;
if (bg.output.length > MAX_BUFFER_LINES) {
const excess = bg.output.length - MAX_BUFFER_LINES;
bg.output.splice(0, excess);
@ -60,8 +62,6 @@ export function pushAlert(bg: BgProcess, message: string): void {
}
export function getInfo(p: BgProcess): BgProcessInfo {
const stdoutLines = p.output.filter(l => l.stream === "stdout").length;
const stderrLines = p.output.filter(l => l.stream === "stderr").length;
return {
id: p.id,
label: p.label,
@ -72,8 +72,8 @@ export function getInfo(p: BgProcess): BgProcessInfo {
exitCode: p.exitCode,
signal: p.signal,
outputLines: p.output.length,
stdoutLines,
stderrLines,
stdoutLines: p.stdoutLineCount,
stderrLines: p.stderrLineCount,
status: p.status,
processType: p.processType,
ports: p.ports,
@ -161,6 +161,8 @@ export function startProcess(opts: StartOptions): BgProcess {
commandHistory: [],
lineDedup: new Map(),
totalRawLines: 0,
stdoutLineCount: 0,
stderrLineCount: 0,
envKeys: Object.keys(opts.env || {}),
restartCount: 0,
startConfig: {


@ -90,10 +90,14 @@ export interface BgProcess {
lastWarningCount: number;
/** Command history for shell-type sessions */
commandHistory: string[];
/** Dedup tracker: hash → count of repeated lines */
/** Dedup tracker: hash → count of repeated lines (capped at LINE_DEDUP_MAX entries) */
lineDedup: Map<string, number>;
/** Total raw lines (before dedup) for token savings calc */
totalRawLines: number;
/** Tracked stdout line count (incremented in addOutputLine, avoids O(n) filter) */
stdoutLineCount: number;
/** Tracked stderr line count (incremented in addOutputLine, avoids O(n) filter) */
stderrLineCount: number;
/** Env snapshot (keys only, no values for security) */
envKeys: string[];
/** Restart count */
@ -163,6 +167,8 @@ export interface ProcessManifest {
export const MAX_BUFFER_LINES = 5000;
export const MAX_EVENTS = 200;
export const DEAD_PROCESS_TTL = 10 * 60 * 1000;
/** Maximum unique entries in the per-process lineDedup Map before LRU eviction. */
export const LINE_DEDUP_MAX = 500;
export const PORT_PROBE_TIMEOUT = 500;
export const READY_POLL_INTERVAL = 250;
export const DEFAULT_READY_TIMEOUT = 30000;
@ -249,3 +255,29 @@ export const BUILD_COMPLETE_PATTERNS: RegExp[] = [
/webpack\s+\d+\.\d+/i,
/bundle\s+(?:is\s+)?ready/i,
];
// ── Compiled union regexes (single-pass alternatives to .some(p => p.test(line))) ──
// Built once at module load — eliminates per-line iteration over the pattern arrays.
// Each source is wrapped in a non-capturing group so top-level alternation inside
// one pattern cannot bleed across the join.
export const ERROR_PATTERN_UNION = new RegExp(
ERROR_PATTERNS.map(p => `(?:${p.source})`).join("|"),
"i",
);
export const WARNING_PATTERN_UNION = new RegExp(
WARNING_PATTERNS.map(p => `(?:${p.source})`).join("|"),
"i",
);
export const READINESS_PATTERN_UNION = new RegExp(
READINESS_PATTERNS.map(p => `(?:${p.source})`).join("|"),
"i",
);
export const BUILD_COMPLETE_PATTERN_UNION = new RegExp(
BUILD_COMPLETE_PATTERNS.map(p => `(?:${p.source})`).join("|"),
"i",
);
export const TEST_RESULT_PATTERN_UNION = new RegExp(
TEST_RESULT_PATTERNS.map(p => `(?:${p.source})`).join("|"),
"i",
);
/** Source of PORT_PATTERN, exported so analyzeLine can build a fresh /g regex per call (a shared /g instance would carry lastIndex state between lines). */
export const PORT_PATTERN_SOURCE = PORT_PATTERN.source;
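A minimal sketch of the union-compile approach, using a toy pattern list (assumed, not the real ERROR_PATTERNS). Wrapping each source in a non-capturing group is a defensive step so that top-level alternation inside one pattern cannot bleed across the join:

```typescript
// Compile a list of patterns into one alternation tested in a single pass.
const SAMPLE_PATTERNS: RegExp[] = [/error/i, /fail(?:ed|ure)/i, /exception/i];

const SAMPLE_UNION = new RegExp(
  SAMPLE_PATTERNS.map((p) => `(?:${p.source})`).join("|"),
  "i",
);
```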


@ -41,6 +41,8 @@ export interface AutoDashboardData {
profileDowngraded?: boolean;
/** Number of pending captures awaiting triage (0 if none or file missing) */
pendingCaptureCount: number;
/** Cross-process: another auto-mode session detected via auto.lock (PID, startedAt) */
remoteSession?: { pid: number; startedAt: string; unitType: string; unitId: string };
}
// ─── Unit Description Helpers ─────────────────────────────────────────────────


@ -130,6 +130,16 @@ export function verifyExpectedArtifact(unitType: string, unitId: string, base: s
if (!absPath) return unitType === "replan-slice";
if (!existsSync(absPath)) return false;
// plan-slice must produce a plan with actual task entries, not just a scaffold.
// The plan file may exist from a prior discussion/context step with only headings
// but no tasks. Without this check the artifact is considered "complete" and the
// unit gets skipped — but deriveState still returns phase:"planning" because the
// plan has no tasks, creating an infinite skip loop (#699).
if (unitType === "plan-slice") {
const planContent = readFileSync(absPath, "utf-8");
if (!/^- \[[xX ]\] \*\*T\d+:/m.test(planContent)) return false;
}
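The task-entry regex in the plan-slice check above accepts checklist lines like the following (a sketch; the plan format is inferred from the regex itself):

```typescript
// Matches GFM-style task checkboxes with a bold task id, e.g. "- [ ] **T1: ...".
const TASK_RE = /^- \[[xX ]\] \*\*T\d+:/m;

const realTask = "- [x] **T3: Wire up the API client**"; // completed task entry
const scaffold = "## Tasks\n\n(no entries yet)";          // headings-only scaffold
```

TASK_RE matches realTask but not scaffold, so a headings-only plan file is not treated as a completed artifact.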
// execute-task must also have its checkbox marked [x] in the slice plan
if (unitType === "execute-task") {
const parts = unitId.split("/");


@ -223,6 +223,11 @@ const MAX_LIFETIME_DISPATCHES = 6;
/** Tracks recovery attempt count per unit for backoff and diagnostics. */
const unitRecoveryCount = new Map<string, number>();
/** Tracks consecutive skips per unit — catches infinite skip loops where deriveState
* keeps returning the same already-completed unit. Reset on any real dispatch. */
const unitConsecutiveSkips = new Map<string, number>();
const MAX_CONSECUTIVE_SKIPS = 3;
/** Persisted completed-unit keys — survives restarts. Loaded from .gsd/completed-units.json. */
const completedKeySet = new Set<string>();
@ -349,8 +354,12 @@ let lastBaselineCharCount: number | undefined;
/** SIGTERM handler registered while auto-mode is active — cleared on stop/pause. */
let _sigtermHandler: (() => void) | null = null;
/** Tool calls currently being executed — prevents false idle detection during long-running tools. */
const inFlightTools = new Set<string>();
/**
* Tool calls currently being executed — prevents false idle detection during long-running tools.
* Maps toolCallId → start timestamp (ms) so the idle watchdog can detect tools that have been
* running suspiciously long (e.g., a Bash command hung because `&` kept stdout open).
*/
const inFlightTools = new Map<string, number>();
type BudgetAlertLevel = 0 | 75 | 90 | 100;
@ -429,11 +438,11 @@ export function isAutoPaused(): boolean {
/**
* Mark a tool execution as in-flight. Called from index.ts on tool_execution_start.
* Prevents the idle watchdog from declaring the agent idle while tools are executing.
* Records start time so the idle watchdog can detect tools hung longer than the idle timeout.
*/
export function markToolStart(toolCallId: string): void {
if (!active) return;
inFlightTools.add(toolCallId);
inFlightTools.set(toolCallId, Date.now());
}
/**
@ -443,6 +452,16 @@ export function markToolEnd(toolCallId: string): void {
inFlightTools.delete(toolCallId);
}
/**
* Returns the age (ms) of the oldest currently in-flight tool, or 0 if none.
* Exported for testing.
*/
export function getOldestInFlightToolAgeMs(): number {
if (inFlightTools.size === 0) return 0;
const oldestStart = Math.min(...inFlightTools.values());
return Date.now() - oldestStart;
}
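The oldest-age computation above can be checked with a quick standalone sketch (same shape: a Map of toolCallId to start timestamp, with the clock passed in for testability):

```typescript
// Age of the oldest in-flight entry, or 0 when the map is empty.
function oldestAgeMs(inFlight: Map<string, number>, now: number): number {
  if (inFlight.size === 0) return 0;
  return now - Math.min(...inFlight.values());
}
```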
/**
* Return the base path to use for the auto.lock file.
* Always uses the original project root (not the worktree) so that
@ -636,6 +655,7 @@ export async function stopAuto(ctx?: ExtensionContext, pi?: ExtensionAPI): Promi
stepMode = false;
unitDispatchCount.clear();
unitRecoveryCount.clear();
unitConsecutiveSkips.clear();
inFlightTools.clear();
lastBudgetAlertLevel = 0;
unitLifetimeDispatches.clear();
@ -726,6 +746,7 @@ export async function startAuto(
basePath = base;
unitDispatchCount.clear();
unitLifetimeDispatches.clear();
unitConsecutiveSkips.clear();
// Re-initialize metrics in case ledger was lost during pause
if (!getLedger()) initMetrics(base);
// Ensure milestone ID is set on git service for integration branch resolution
@@ -798,6 +819,9 @@ export async function startAuto(
pausedSessionFile = null;
}
// Write lock on resume so cross-process status detection works (#723).
writeLock(lockBase(), "resuming", currentMilestoneId ?? "unknown", completedUnits.length);
await dispatchNextUnit(ctx, pi);
return;
}
@@ -1004,6 +1028,7 @@ export async function startAuto(
basePath = base;
unitDispatchCount.clear();
unitRecoveryCount.clear();
unitConsecutiveSkips.clear();
lastBudgetAlertLevel = 0;
unitLifetimeDispatches.clear();
completedKeySet.clear();
@@ -1133,6 +1158,11 @@ export async function startAuto(
: "Will loop until milestone complete.";
ctx.ui.notify(`${modeLabel} started. ${scopeMsg}`, "info");
// Write initial lock file immediately so cross-process status detection
// works even before the first unit is dispatched (#723).
// The lock is updated with unit-specific info on each dispatch and cleared on stop.
writeLock(lockBase(), "starting", currentMilestoneId ?? "unknown", 0);
// Secrets collection gate — collect pending secrets before first dispatch
const mid = state.activeMilestone!.id;
try {
@@ -1585,7 +1615,7 @@ export async function handleAgentEnd(
return;
}
const sessionFile = ctx.sessionManager.getSessionFile();
writeLock(basePath, triageUnitType, triageUnitId, completedUnits.length, sessionFile);
writeLock(lockBase(), triageUnitType, triageUnitId, completedUnits.length, sessionFile);
// Start unit timeout for triage (use same supervisor config as hooks)
clearUnitTimeout();
@@ -1931,6 +1961,7 @@ async function dispatchNextUnit(
// Reset stuck detection for new milestone
unitDispatchCount.clear();
unitRecoveryCount.clear();
unitConsecutiveSkips.clear();
unitLifetimeDispatches.clear();
// Clear completed-units.json for the finished milestone
try {
@@ -2298,6 +2329,26 @@ async function dispatchNextUnit(
// Cross-validate: does the expected artifact actually exist?
const artifactExists = verifyExpectedArtifact(unitType, unitId, basePath);
if (artifactExists) {
// Guard against infinite skip loops: if deriveState keeps returning the
// same completed unit, consecutive skips will trip this breaker. Evict the
// key so the next dispatch forces full reconciliation instead of looping.
const skipCount = (unitConsecutiveSkips.get(idempotencyKey) ?? 0) + 1;
unitConsecutiveSkips.set(idempotencyKey, skipCount);
if (skipCount > MAX_CONSECUTIVE_SKIPS) {
unitConsecutiveSkips.delete(idempotencyKey);
completedKeySet.delete(idempotencyKey);
removePersistedKey(basePath, idempotencyKey);
invalidateStateCache();
ctx.ui.notify(
`Skip loop detected: ${unitType} ${unitId} skipped ${skipCount} times without advancing. Evicting completion record and forcing reconciliation.`,
"warning",
);
_skipDepth++;
await new Promise(r => setTimeout(r, 50));
await dispatchNextUnit(ctx, pi);
_skipDepth = Math.max(0, _skipDepth - 1);
return;
}
ctx.ui.notify(
`Skipping ${unitType} ${unitId} — already completed in a prior session. Advancing.`,
"info",
@@ -2327,6 +2378,24 @@ async function dispatchNextUnit(
persistCompletedKey(basePath, idempotencyKey);
completedKeySet.add(idempotencyKey);
invalidateStateCache();
// Same consecutive-skip guard as the idempotency path above.
const skipCount2 = (unitConsecutiveSkips.get(idempotencyKey) ?? 0) + 1;
unitConsecutiveSkips.set(idempotencyKey, skipCount2);
if (skipCount2 > MAX_CONSECUTIVE_SKIPS) {
unitConsecutiveSkips.delete(idempotencyKey);
completedKeySet.delete(idempotencyKey);
removePersistedKey(basePath, idempotencyKey);
invalidateStateCache();
ctx.ui.notify(
`Skip loop detected: ${unitType} ${unitId} skipped ${skipCount2} times without advancing. Evicting completion record and forcing reconciliation.`,
"warning",
);
_skipDepth++;
await new Promise(r => setTimeout(r, 50));
await dispatchNextUnit(ctx, pi);
_skipDepth = Math.max(0, _skipDepth - 1);
return;
}
ctx.ui.notify(
`Skipping ${unitType} ${unitId} — artifact exists but completion key was missing. Repaired and advancing.`,
"info",
@@ -2342,6 +2411,8 @@ async function dispatchNextUnit(
// Pattern A→B→A→B would reset retryCount every time; this map catches it.
const dispatchKey = `${unitType}/${unitId}`;
const prevCount = unitDispatchCount.get(dispatchKey) ?? 0;
// Real dispatch reached — clear the consecutive-skip counter for this unit.
unitConsecutiveSkips.delete(dispatchKey);
debugLog("dispatch-unit", {
type: unitType,
@@ -2856,13 +2927,27 @@ async function dispatchNextUnit(
if (Date.now() - runtime.lastProgressAt < idleTimeoutMs) return;
// Agent has tool calls currently executing (await_job, long bash, etc.) —
// not idle, just waiting for tool completion.
// not idle, just waiting for tool completion. But only suppress recovery
// if the tool started recently. A tool in-flight for longer than the idle
// timeout is likely stuck — e.g., `python -m http.server 8080 &` keeps the
// shell's stdout/stderr open, causing the Bash tool to hang indefinitely.
if (inFlightTools.size > 0) {
writeUnitRuntimeRecord(basePath, unitType, unitId, currentUnit.startedAt, {
lastProgressAt: Date.now(),
lastProgressKind: "tool-in-flight",
});
return;
const oldestStart = Math.min(...inFlightTools.values());
const toolAgeMs = Date.now() - oldestStart;
if (toolAgeMs < idleTimeoutMs) {
writeUnitRuntimeRecord(basePath, unitType, unitId, currentUnit.startedAt, {
lastProgressAt: Date.now(),
lastProgressKind: "tool-in-flight",
});
return;
}
// Oldest tool has been running >= idleTimeoutMs — treat as a stuck/hung
// tool (e.g., background process holding stdout open). Fall through to
// idle recovery without resetting the progress clock.
ctx.ui.notify(
`Stalled tool detected: a tool has been in-flight for ${Math.round(toolAgeMs / 60000)}min. Treating as hung — attempting idle recovery.`,
"warning",
);
}
// Before triggering recovery, check if the agent is actually producing
@@ -3287,6 +3372,14 @@ export {
buildLoopRemediationSteps,
} from "./auto-recovery.js";
/**
* Test-only: expose skip-loop state for unit tests.
* Not part of the public API.
*/
export function _getUnitConsecutiveSkips(): Map<string, number> { return unitConsecutiveSkips; }
export function _resetUnitConsecutiveSkips(): void { unitConsecutiveSkips.clear(); }
export { MAX_CONSECUTIVE_SKIPS };
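The consecutive-skip breaker added in this diff can be sketched standalone. Hedged model with assumed names and an assumed threshold (the real logic also evicts the persisted completion key and invalidates the state cache):

```typescript
// Condensed model of the skip-loop breaker — illustrative only.
const MAX_CONSECUTIVE_SKIPS = 3; // assumed threshold; the real constant lives in auto.ts
const skips = new Map<string, number>();

/** Returns true when the breaker trips: evict and force full reconciliation. */
function recordSkip(key: string): boolean {
  const n = (skips.get(key) ?? 0) + 1;
  skips.set(key, n);
  if (n > MAX_CONSECUTIVE_SKIPS) {
    skips.delete(key); // clear so the next dispatch re-derives state
    return true;
  }
  return false;
}

let tripped = false;
let rounds = 0;
while (!tripped && rounds < 10) {
  tripped = recordSkip("slice/S01");
  rounds++;
}
console.log({ tripped, rounds }); // trips on the 4th consecutive skip
```

Note the counter is also cleared on every real dispatch (as the diff does just before dispatching), so only uninterrupted runs of skips for the same unit can trip the breaker.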
/**
* Dispatch a hook unit directly, bypassing normal pre-dispatch hooks.
* Used for manual hook triggers via /gsd run-hook.

View file

@@ -319,16 +319,23 @@ export class GSDDashboardOverlay {
const centered = (content: string) => row(centerLine(content, contentWidth));
const title = th.fg("accent", th.bold("GSD Dashboard"));
const isRemote = !!this.dashData.remoteSession;
const status = this.dashData.active
? `${Date.now() % 2000 < 1000 ? th.fg("success", "●") : th.fg("dim", "○")} ${th.fg("success", "AUTO")}`
: this.dashData.paused
? th.fg("warning", "⏸ PAUSED")
: th.fg("dim", "idle");
: isRemote
? `${Date.now() % 2000 < 1000 ? th.fg("success", "●") : th.fg("dim", "○")} ${th.fg("success", "AUTO")} ${th.fg("dim", `(PID ${this.dashData.remoteSession!.pid})`)}`
: th.fg("dim", "idle");
const worktreeName = getActiveWorktreeName();
const worktreeTag = worktreeName
? ` ${th.fg("warning", `${worktreeName}`)}`
: "";
const elapsed = th.fg("dim", formatDuration(this.dashData.elapsed));
const elapsed = this.dashData.active || this.dashData.paused
? th.fg("dim", formatDuration(this.dashData.elapsed))
: isRemote
? th.fg("dim", `since ${this.dashData.remoteSession!.startedAt.replace("T", " ").slice(0, 19)}`)
: "";
lines.push(row(joinColumns(`${title} ${status}${worktreeTag}`, elapsed, contentWidth)));
lines.push(blank());
@@ -344,6 +351,13 @@ export class GSDDashboardOverlay {
} else if (this.dashData.paused) {
lines.push(row(th.fg("dim", "/gsd auto to resume")));
lines.push(blank());
} else if (isRemote) {
const rs = this.dashData.remoteSession!;
const unitDisplay = rs.unitType === "starting" || rs.unitType === "resuming"
? rs.unitType
: `${unitLabel(rs.unitType)} ${rs.unitId}`;
lines.push(row(th.fg("text", `Remote session: ${unitDisplay}`)));
lines.push(blank());
} else {
lines.push(row(th.fg("dim", "No unit running · /gsd auto to start")));
lines.push(blank());

View file

@@ -6,7 +6,7 @@
* Standalone module: only imports node:child_process and node:path.
*/
import { execFileSync } from "node:child_process";
import { execFileSync, execFile } from "node:child_process";
import { resolve } from "node:path";
// ─── Types ──────────────────────────────────────────────────────────────────
@@ -32,10 +32,23 @@ const EXEC_OPTS = {
stdio: ["pipe", "pipe", "pipe"] as ["pipe", "pipe", "pipe"],
};
function git(args: string[], cwd: string): string {
/** Synchronous git — used where sequential control flow is required (fallback paths). */
function gitSync(args: string[], cwd: string): string {
return execFileSync("git", args, { ...EXEC_OPTS, cwd }).trim();
}
/** Async git — returns stdout on success, empty string on any error. */
function gitAsync(args: string[], cwd: string): Promise<string> {
return new Promise((resolve) => {
execFile(
"git",
args,
{ encoding: "utf-8", timeout: 5000, cwd },
(err, stdout) => resolve(err ? "" : stdout.trim()),
);
});
}
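The empty-string-on-error contract is what makes the later `out || gitAsync([...])` fallback chain work: any failure collapses to a falsy value. A self-contained sketch of the same wrapper with a generic command standing in for git (assumed names, not GSD code):

```typescript
import { execFile } from "node:child_process";

// Same shape as gitAsync: resolve stdout on success, empty string on any error.
function runAsync(cmd: string, args: string[]): Promise<string> {
  return new Promise((resolve) => {
    execFile(cmd, args, { encoding: "utf-8", timeout: 5000 }, (err, stdout) =>
      resolve(err ? "" : stdout.trim()),
    );
  });
}

// A failed primary collapses to "", so the fallback is a one-line `||` chain
// rather than nested try/catch blocks:
const out = await runAsync("definitely-not-a-real-command", [])
  .then((o) => o || runAsync("echo", ["fallback"]));
console.log(out); // "fallback"
```

The trade-off: errors and genuinely empty output are indistinguishable, which is acceptable here because both cases mean "this query contributed no file paths."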
function splitLines(output: string): string[] {
return output
.split("\n")
@@ -49,6 +62,8 @@ function splitLines(output: string): string[] {
* Returns recently-changed file paths, deduplicated and sorted by recency
* (most recent first). Combines committed diffs, staged changes, and
* unstaged/untracked files from `git status`.
*
* The three git queries (log, diff --cached, status) run concurrently.
*/
export async function getRecentlyChangedFiles(
cwd: string,
@@ -59,40 +74,23 @@ export async function getRecentlyChangedFiles(
const dir = resolve(cwd);
try {
// 1. Committed changes in the last N commits (or since sinceDays)
let committedFiles: string[] = [];
try {
const days = Math.max(1, Math.floor(Number(sinceDays)));
if (!Number.isFinite(days)) throw new Error("invalid sinceDays");
const raw = git(["log", "--diff-filter=ACMR", "--name-only", "--pretty=format:", `--since=${days} days ago`], dir);
committedFiles = splitLines(raw);
} catch {
// Fallback: use HEAD~10
try {
const raw = git(["diff", "--name-only", "HEAD~10"], dir);
committedFiles = splitLines(raw);
} catch {
// Shallow clone or <10 commits — ignore
}
}
const days = Math.max(1, Math.floor(Number(sinceDays)));
if (!Number.isFinite(days)) throw new Error("invalid sinceDays");
// 2. Staged changes
let stagedFiles: string[] = [];
try {
const raw = git(["diff", "--cached", "--name-only"], dir);
stagedFiles = splitLines(raw);
} catch {
// ignore
}
// Run all three queries concurrently — they read independent git state
const [logRaw, stagedRaw, statusRaw] = await Promise.all([
// 1. Committed changes since N days ago (fallback to HEAD~10 on error)
gitAsync(["log", "--diff-filter=ACMR", "--name-only", "--pretty=format:", `--since=${days} days ago`], dir)
.then((out) => out || gitAsync(["diff", "--name-only", "HEAD~10"], dir)),
// 2. Staged changes
gitAsync(["diff", "--cached", "--name-only"], dir),
// 3. Unstaged / untracked
gitAsync(["status", "--porcelain"], dir),
]);
// 3. Unstaged / untracked via porcelain status
let statusFiles: string[] = [];
try {
const raw = git(["status", "--porcelain"], dir);
statusFiles = splitLines(raw).map((line) => line.slice(3)); // strip XY + space
} catch {
// ignore
}
const committedFiles = splitLines(logRaw);
const stagedFiles = splitLines(stagedRaw);
const statusFiles = splitLines(statusRaw).map((line) => line.slice(3)); // strip XY + space
// Deduplicate, preserving insertion order (most-recent-first: status → staged → committed)
const seen = new Set<string>();
@@ -113,6 +111,9 @@ export async function getRecentlyChangedFiles(
/**
* Returns richer change metadata: change type and approximate line counts.
*
* The three git queries (diff --cached --numstat, diff --numstat, status --porcelain)
* run concurrently — they read independent git state.
*/
export async function getChangedFilesWithContext(
cwd: string,
@@ -120,6 +121,13 @@ export async function getChangedFilesWithContext(
const dir = resolve(cwd);
try {
// Run all three queries concurrently
const [cachedNumstat, unstagedNumstat, statusRaw] = await Promise.all([
gitAsync(["diff", "--cached", "--numstat"], dir),
gitAsync(["diff", "--numstat"], dir),
gitAsync(["status", "--porcelain"], dir),
]);
const result: ChangedFileInfo[] = [];
const seen = new Set<string>();
@@ -131,57 +139,42 @@
};
// 1. Staged files with numstat
try {
const numstat = git(["diff", "--cached", "--numstat"], dir);
for (const line of splitLines(numstat)) {
const [added, deleted, filePath] = line.split("\t");
if (!filePath) continue;
const lines =
added === "-" || deleted === "-"
? undefined
: Number(added) + Number(deleted);
add({ path: filePath, changeType: "staged", linesChanged: lines });
}
} catch {
// ignore
for (const line of splitLines(cachedNumstat)) {
const [added, deleted, filePath] = line.split("\t");
if (!filePath) continue;
const lines =
added === "-" || deleted === "-"
? undefined
: Number(added) + Number(deleted);
add({ path: filePath, changeType: "staged", linesChanged: lines });
}
// 2. Unstaged modifications with numstat
try {
const numstat = git(["diff", "--numstat"], dir);
for (const line of splitLines(numstat)) {
const [added, deleted, filePath] = line.split("\t");
if (!filePath) continue;
const lines =
added === "-" || deleted === "-"
? undefined
: Number(added) + Number(deleted);
add({ path: filePath, changeType: "modified", linesChanged: lines });
}
} catch {
// ignore
for (const line of splitLines(unstagedNumstat)) {
const [added, deleted, filePath] = line.split("\t");
if (!filePath) continue;
const lines =
added === "-" || deleted === "-"
? undefined
: Number(added) + Number(deleted);
add({ path: filePath, changeType: "modified", linesChanged: lines });
}
// 3. Untracked / deleted from porcelain status
try {
const raw = git(["status", "--porcelain"], dir);
for (const line of splitLines(raw)) {
const code = line.slice(0, 2);
const filePath = line.slice(3);
if (seen.has(filePath)) continue;
for (const line of splitLines(statusRaw)) {
const code = line.slice(0, 2);
const filePath = line.slice(3);
if (seen.has(filePath)) continue;
if (code.includes("?")) {
add({ path: filePath, changeType: "added" });
} else if (code.includes("D")) {
add({ path: filePath, changeType: "deleted" });
} else if (code.includes("A")) {
add({ path: filePath, changeType: "added" });
} else {
add({ path: filePath, changeType: "modified" });
}
if (code.includes("?")) {
add({ path: filePath, changeType: "added" });
} else if (code.includes("D")) {
add({ path: filePath, changeType: "deleted" });
} else if (code.includes("A")) {
add({ path: filePath, changeType: "added" });
} else {
add({ path: filePath, changeType: "modified" });
}
} catch {
// ignore
}
return result;

View file

@@ -41,7 +41,8 @@ export type DoctorIssueCode =
| "activity_log_bloat"
| "state_file_stale"
| "state_file_missing"
| "gitignore_missing_patterns";
| "gitignore_missing_patterns"
| "unresolvable_dependency";
export interface DoctorIssue {
severity: DoctorSeverity;
@@ -1041,6 +1042,24 @@ export async function runGSDDoctor(basePath: string, options?: { fix?: boolean;
});
}
// Check for unresolvable dependency IDs — catches range syntax like "S01-S04"
// that the parser expanded but that don't match any actual slice in the roadmap.
// Also catches plain typos or IDs referencing slices not yet defined.
const knownSliceIds = new Set(roadmap.slices.map(s => s.id));
for (const dep of slice.depends) {
if (!knownSliceIds.has(dep)) {
issues.push({
severity: "warning",
code: "unresolvable_dependency",
scope: "slice",
unitId,
message: `Slice ${unitId} depends on "${dep}" which is not a slice ID in this roadmap. This permanently blocks the slice. Use comma-separated IDs: \`depends:[S01,S02]\``,
file: relMilestoneFile(basePath, milestoneId, "ROADMAP"),
fixable: false,
});
}
}
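The new doctor check reduces to a set-membership test over the roadmap's slice IDs. A hypothetical minimal form (simplified types — the real doctor emits full `DoctorIssue` records with file paths and severities):

```typescript
// Illustrative only — not the actual doctor implementation.
interface Slice { id: string; depends: string[] }

function findUnresolvableDeps(slices: Slice[]): { sliceId: string; dep: string }[] {
  const known = new Set(slices.map((s) => s.id));
  return slices.flatMap((s) =>
    s.depends.filter((d) => !known.has(d)).map((dep) => ({ sliceId: s.id, dep })),
  );
}

// Range syntax like "S01-S04" is never a real slice ID, so a slice that
// depends on it can never become unblocked:
const issues = findUnresolvableDeps([
  { id: "S01", depends: [] },
  { id: "S05", depends: ["S01-S04"] },
]);
console.log(issues); // [{ sliceId: "S05", dep: "S01-S04" }]
```

The same check also catches plain typos and forward references to slices that were never defined, which is why it is a warning rather than an auto-fixable issue.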
const slicePath = resolveSlicePath(basePath, milestoneId, slice.id);
if (!slicePath) continue;

View file

@@ -27,6 +27,7 @@ import { isAutoActive } from "./auto.js";
import { loadPrompt } from "./prompt-loader.js";
import { gsdRoot } from "./paths.js";
import { formatDuration } from "./history.js";
import { getAutoWorktreePath } from "./auto-worktree.js";
// ─── Types ────────────────────────────────────────────────────────────────────
@@ -54,6 +55,7 @@ interface ForensicReport {
basePath: string;
activeMilestone: string | null;
activeSlice: string | null;
activeWorktree: string | null;
unitTraces: UnitTrace[];
metrics: MetricsLedger | null;
completedKeys: string[];
@@ -143,8 +145,11 @@ async function buildForensicReport(basePath: string): Promise<ForensicReport> {
activeSlice = state.activeSlice?.id ?? null;
} catch { /* state derivation failure is non-fatal */ }
// 2. Scan activity logs (last 5)
const unitTraces = scanActivityLogs(basePath);
// 1b. Check for active auto-worktree
const activeWorktree = activeMilestone ? getAutoWorktreePath(basePath, activeMilestone) : null;
// 2. Scan activity logs (last 5) — worktree-aware
const unitTraces = scanActivityLogs(basePath, activeMilestone);
// 3. Load metrics
const metrics = loadLedgerFromDisk(basePath);
@@ -178,20 +183,16 @@ async function buildForensicReport(basePath: string): Promise<ForensicReport> {
}
}
// 8. GSD version
let gsdVersion = "unknown";
try {
const pkgPath = join(dirname(fileURLToPath(import.meta.url)), "../../../../package.json");
if (existsSync(pkgPath)) {
gsdVersion = JSON.parse(readFileSync(pkgPath, "utf-8")).version ?? "unknown";
}
} catch { /* non-fatal */ }
// 8. GSD version — use GSD_VERSION env var set by the loader at startup.
// Extensions run from ~/.gsd/agent/extensions/gsd/ at runtime, so path-traversal
// from import.meta.url would resolve to ~/package.json (wrong on every system).
const gsdVersion = process.env.GSD_VERSION || "unknown";
// 9. Run anomaly detectors
if (metrics?.units) detectStuckLoops(metrics.units, anomalies);
if (metrics?.units) detectCostSpikes(metrics.units, anomalies);
detectTimeouts(unitTraces, anomalies);
detectMissingArtifacts(completedKeys, basePath, anomalies);
detectMissingArtifacts(completedKeys, basePath, activeMilestone, anomalies);
detectCrash(crashLock, anomalies);
detectDoctorIssues(doctorIssues, anomalies);
detectErrorTraces(unitTraces, anomalies);
@@ -202,6 +203,7 @@ async function buildForensicReport(basePath: string): Promise<ForensicReport> {
basePath,
activeMilestone,
activeSlice,
activeWorktree: activeWorktree ? relative(basePath, activeWorktree) : null,
unitTraces,
metrics,
completedKeys,
@@ -216,48 +218,78 @@
const ACTIVITY_FILENAME_RE = /^(\d+)-(.+?)-(.+)\.jsonl$/;
function scanActivityLogs(basePath: string): UnitTrace[] {
const activityDir = join(gsdRoot(basePath), "activity");
if (!existsSync(activityDir)) return [];
function scanActivityLogs(basePath: string, activeMilestone?: string | null): UnitTrace[] {
const activityDirs = resolveActivityDirs(basePath, activeMilestone);
const allTraces: UnitTrace[] = [];
const files = readdirSync(activityDir).filter(f => f.endsWith(".jsonl")).sort();
const lastFiles = files.slice(-5);
const traces: UnitTrace[] = [];
for (const activityDir of activityDirs) {
if (!existsSync(activityDir)) continue;
for (const file of lastFiles) {
const match = ACTIVITY_FILENAME_RE.exec(file);
if (!match) continue;
const files = readdirSync(activityDir).filter(f => f.endsWith(".jsonl")).sort();
const lastFiles = files.slice(-5);
const seq = parseInt(match[1]!, 10);
const unitType = match[2]!;
const unitId = match[3]!;
const filePath = join(activityDir, file);
for (const file of lastFiles) {
const match = ACTIVITY_FILENAME_RE.exec(file);
if (!match) continue;
let entries: unknown[] = [];
const nativeResult = nativeParseJsonlTail(filePath, MAX_JSONL_BYTES);
if (nativeResult) {
entries = nativeResult.entries;
} else {
try {
const raw = readFileSync(filePath, "utf-8");
entries = parseJSONL(raw);
} catch { continue; }
const seq = parseInt(match[1]!, 10);
const unitType = match[2]!;
const unitId = match[3]!;
const filePath = join(activityDir, file);
let entries: unknown[] = [];
const nativeResult = nativeParseJsonlTail(filePath, MAX_JSONL_BYTES);
if (nativeResult) {
entries = nativeResult.entries;
} else {
try {
const raw = readFileSync(filePath, "utf-8");
entries = parseJSONL(raw);
} catch { continue; }
}
const trace = extractTrace(entries);
const stat = statSync(filePath, { throwIfNoEntry: false });
allTraces.push({
file: activityDirs.length > 1 ? `[${relative(basePath, activityDir)}] ${file}` : file,
unitType,
unitId,
seq,
trace,
mtime: stat?.mtimeMs ?? 0,
});
}
const trace = extractTrace(entries);
const stat = statSync(filePath, { throwIfNoEntry: false });
traces.push({
file,
unitType,
unitId,
seq,
trace,
mtime: stat?.mtimeMs ?? 0,
});
}
return traces.sort((a, b) => b.seq - a.seq);
// Sort by mtime descending so the most recent traces (regardless of source) come first
return allTraces.sort((a, b) => b.mtime - a.mtime).slice(0, 5);
}
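The worktree-aware scan ends with a merge-and-trim step: traces from every activity directory are pooled, then sorted by mtime so the five most recent win regardless of source. A reduced sketch (assumed shapes, hypothetical file names):

```typescript
// Illustrative only — models the final sort/slice in scanActivityLogs.
interface TraceLite { file: string; mtime: number }

function newestFive(perDir: TraceLite[][]): TraceLite[] {
  return perDir.flat().sort((a, b) => b.mtime - a.mtime).slice(0, 5);
}

const merged = newestFive([
  // worktree activity dir — most recent progress lives here
  [{ file: "worktree/07-slice-S02.jsonl", mtime: 400 }],
  // stale root activity dir
  [{ file: "root/05-slice-S01.jsonl", mtime: 200 },
   { file: "root/06-hook-pre.jsonl", mtime: 300 }],
]);
console.log(merged.map((t) => t.file));
// newest first: worktree trace outranks both stale root traces
```

Sorting by mtime (rather than by per-file sequence number, as before) is what keeps stale root logs from masking newer worktree progress.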
/**
* Resolve activity directories to scan for forensics.
* If an active auto-worktree exists for the milestone, its activity dir
* is included first (preferred) so stale root logs don't mask worktree progress.
*/
function resolveActivityDirs(basePath: string, activeMilestone?: string | null): string[] {
const dirs: string[] = [];
// Check for active auto-worktree activity logs
if (activeMilestone) {
const wtPath = getAutoWorktreePath(basePath, activeMilestone);
if (wtPath) {
const wtActivityDir = join(wtPath, ".gsd", "activity");
if (existsSync(wtActivityDir)) {
dirs.push(wtActivityDir);
}
}
}
// Always include root activity logs
const rootActivityDir = join(gsdRoot(basePath), "activity");
dirs.push(rootActivityDir);
return dirs;
}
// ─── Completed Keys Loader ────────────────────────────────────────────────────
@@ -336,21 +368,27 @@ function detectTimeouts(traces: UnitTrace[], anomalies: ForensicAnomaly[]): void
}
}
function detectMissingArtifacts(completedKeys: string[], basePath: string, anomalies: ForensicAnomaly[]): void {
function detectMissingArtifacts(completedKeys: string[], basePath: string, activeMilestone: string | null, anomalies: ForensicAnomaly[]): void {
// Also check the worktree path for artifacts — they may exist there but not at root
const wtBasePath = activeMilestone ? getAutoWorktreePath(basePath, activeMilestone) : null;
for (const key of completedKeys) {
const slashIdx = key.indexOf("/");
if (slashIdx === -1) continue;
const unitType = key.slice(0, slashIdx);
const unitId = key.slice(slashIdx + 1);
if (!verifyExpectedArtifact(unitType, unitId, basePath)) {
const rootHasArtifact = verifyExpectedArtifact(unitType, unitId, basePath);
const wtHasArtifact = wtBasePath ? verifyExpectedArtifact(unitType, unitId, wtBasePath) : false;
if (!rootHasArtifact && !wtHasArtifact) {
anomalies.push({
type: "missing-artifact",
severity: "error",
unitType,
unitId,
summary: `Completed key ${key} but artifact missing or invalid`,
details: `The unit is recorded as completed but verifyExpectedArtifact() returns false. The completion state is stale.`,
details: `The unit is recorded as completed but verifyExpectedArtifact() returns false at both project root and worktree. The completion state is stale.`,
});
}
}
@@ -416,6 +454,7 @@ function saveForensicReport(basePath: string, report: ForensicReport, problemDes
`**GSD Version:** ${report.gsdVersion}`,
`**Active Milestone:** ${report.activeMilestone ?? "none"}`,
`**Active Slice:** ${report.activeSlice ?? "none"}`,
`**Active Worktree:** ${report.activeWorktree ?? "none"}`,
``,
`## Problem Description`,
``,
@@ -559,6 +598,10 @@ function formatReportForPrompt(report: ForensicReport): string {
sections.push(`### GSD Version: ${report.gsdVersion}`);
sections.push(`### Active Milestone: ${report.activeMilestone ?? "none"}`);
sections.push(`### Active Slice: ${report.activeSlice ?? "none"}`);
if (report.activeWorktree) {
sections.push(`### Active Worktree: ${report.activeWorktree}`);
sections.push(`Note: Activity logs were scanned from both the worktree and the project root. Worktree logs take priority.`);
}
let result = sections.join("\n");
if (result.length > MAX_BYTES) {

View file

@@ -821,8 +821,9 @@ export async function showDiscuss(
if (choice === "discuss_draft") {
const discussMilestoneTemplates = inlineTemplate("context", "Context");
const structuredQuestionsAvailable = pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
const basePrompt = loadPrompt("guided-discuss-milestone", {
milestoneId: mid, milestoneTitle, inlinedTemplates: discussMilestoneTemplates,
milestoneId: mid, milestoneTitle, inlinedTemplates: discussMilestoneTemplates, structuredQuestionsAvailable,
});
const seed = draftContent
? `${basePrompt}\n\n## Prior Discussion (Draft Seed)\n\n${draftContent}`
@ -831,9 +832,10 @@ export async function showDiscuss(
dispatchWorkflow(pi, seed, "gsd-discuss");
} else if (choice === "discuss_fresh") {
const discussMilestoneTemplates = inlineTemplate("context", "Context");
const structuredQuestionsAvailable = pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
pendingAutoStart = { ctx, pi, basePath, milestoneId: mid, step: false };
dispatchWorkflow(pi, loadPrompt("guided-discuss-milestone", {
milestoneId: mid, milestoneTitle, inlinedTemplates: discussMilestoneTemplates,
milestoneId: mid, milestoneTitle, inlinedTemplates: discussMilestoneTemplates, structuredQuestionsAvailable,
}), "gsd-discuss");
} else if (choice === "skip_milestone") {
const milestoneIds = findMilestoneIds(basePath);
@@ -1136,8 +1138,9 @@ export async function showSmartEntry(
if (choice === "discuss_draft") {
const discussMilestoneTemplates = inlineTemplate("context", "Context");
const structuredQuestionsAvailable = pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
const basePrompt = loadPrompt("guided-discuss-milestone", {
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates,
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates, structuredQuestionsAvailable,
});
const seed = draftContent
? `${basePrompt}\n\n## Prior Discussion (Draft Seed)\n\n${draftContent}`
@@ -1146,9 +1149,10 @@ export async function showSmartEntry(
dispatchWorkflow(pi, seed, "gsd-discuss");
} else if (choice === "discuss_fresh") {
const discussMilestoneTemplates = inlineTemplate("context", "Context");
const structuredQuestionsAvailable = pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
pendingAutoStart = { ctx, pi, basePath, milestoneId, step: stepMode };
dispatchWorkflow(pi, loadPrompt("guided-discuss-milestone", {
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates,
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates, structuredQuestionsAvailable,
}), "gsd-discuss");
} else if (choice === "skip_milestone") {
const milestoneIds = findMilestoneIds(basePath);
@@ -1220,8 +1224,9 @@ export async function showSmartEntry(
}));
} else if (choice === "discuss") {
const discussMilestoneTemplates = inlineTemplate("context", "Context");
const structuredQuestionsAvailable = pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
dispatchWorkflow(pi, loadPrompt("guided-discuss-milestone", {
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates,
milestoneId, milestoneTitle, inlinedTemplates: discussMilestoneTemplates, structuredQuestionsAvailable,
}));
} else if (choice === "skip_milestone") {
const milestoneIds = findMilestoneIds(basePath);

View file

@@ -1,15 +1,24 @@
// @ts-ignore — @modelcontextprotocol/sdk types may not be in extensions tsconfig
import { Server } from '@modelcontextprotocol/sdk/server'
// @ts-ignore
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio'
// @ts-ignore
import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types'
/**
* MCP (Model Context Protocol) server for the GSD extension.
*
* This module provides the same MCP server functionality as src/mcp-server.ts
* but can be loaded via jiti in the extension runtime context. It enables
* GSD's tools to be used by external AI clients (Claude Desktop, VS Code
* Copilot, etc.) via the MCP standard protocol over stdin/stdout.
*/
interface McpTool {
name: string
description: string
parameters: Record<string, unknown>
execute(toolCallId: string, params: Record<string, unknown>, signal?: AbortSignal, onUpdate?: unknown): Promise<{ content: Array<{ type: string; text?: string; data?: string; mimeType?: string }> }>
execute(
toolCallId: string,
params: Record<string, unknown>,
signal?: AbortSignal,
onUpdate?: unknown,
): Promise<{
content: Array<{ type: string; text?: string; data?: string; mimeType?: string }>
}>
}
export async function startMcpServer(options: {
@@ -18,6 +27,16 @@ export async function startMcpServer(options: {
}): Promise<void> {
const { tools, version = '0.0.0' } = options
// Dynamic imports — MCP SDK subpath exports use a "./*" wildcard pattern
// that cannot be statically resolved by all TypeScript configurations.
// @ts-ignore
const { Server } = await import('@modelcontextprotocol/sdk/server')
// @ts-ignore
const { StdioServerTransport } = await import('@modelcontextprotocol/sdk/server/stdio.js')
// @ts-ignore
const sdkTypes = await import('@modelcontextprotocol/sdk/types')
const { ListToolsRequestSchema, CallToolRequestSchema } = sdkTypes
const toolMap = new Map<string, McpTool>()
for (const tool of tools) {
toolMap.set(tool.name, tool)
@@ -28,9 +47,10 @@ export async function startMcpServer(options: {
{ capabilities: { tools: {} } },
)
// tools/list — return every registered GSD tool with its JSON Schema parameters
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: tools.map((t) => ({
tools: tools.map((t: McpTool) => ({
name: t.name,
description: t.description,
inputSchema: t.parameters,
@@ -38,6 +58,7 @@ export async function startMcpServer(options: {
}
})
// tools/call — execute the requested tool and return content blocks
server.setRequestHandler(CallToolRequestSchema, async (request: any) => {
const { name, arguments: args } = request.params
const tool = toolMap.get(name)
@@ -56,15 +77,15 @@ export async function startMcpServer(options: {
undefined,
)
const content = result.content.map((block) => {
const content = result.content.map((block: any) => {
if (block.type === 'text') {
return { type: 'text' as const, text: block.text }
return { type: 'text' as const, text: block.text ?? '' }
}
if (block.type === 'image') {
return {
type: 'image' as const,
data: block.data,
mimeType: block.mimeType,
data: block.data ?? '',
mimeType: block.mimeType ?? 'image/png',
}
}
return { type: 'text' as const, text: JSON.stringify(block) }

View file

@@ -31,6 +31,11 @@ Then:
3. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
4. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
5. When implementing non-trivial runtime behavior (async flows, API boundaries, background processes, error paths), add or preserve agent-usable observability. Skip this for simple changes where it doesn't apply.
**Background process rule:** Never use bare `command &` to run background processes. The shell's `&` operator leaves stdout/stderr attached to the parent, which causes the Bash tool to hang indefinitely waiting for those streams to close. Always redirect output before backgrounding:
- Correct: `command > /dev/null 2>&1 &` or `nohup command > /dev/null 2>&1 &`
- Example: `python -m http.server 8080 > /dev/null 2>&1 &` (NOT `python -m http.server 8080 &`)
- Preferred: use the `bg_shell` tool if available — it manages process lifecycle correctly without stream-inheritance issues
6. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
7. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
8. If the task touches UI, browser flows, DOM behavior, or user-visible web state:

View file

@@ -1,5 +1,108 @@
Discuss milestone {{milestoneId}} ("{{milestoneTitle}}"). Identify gray areas, ask the user about them, and write `{{milestoneId}}-CONTEXT.md` in the milestone directory with the decisions. Use the **Context** output template below. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow; do not override required artifact rules.
**Structured questions available: {{structuredQuestionsAvailable}}**
{{inlinedTemplates}}
**Investigate between question rounds to make your questions smarter.** Before each round of questions, do enough lightweight research that your questions are grounded in reality — not guesses about what exists or what's possible. Check library docs (`resolve_library`/`get_library_docs`) when tech choices are relevant, search the web (`search-the-web` with `freshness`/`domain` filters, then `fetch_page` for full content) to verify the landscape, scout the codebase (`rg`, `find`, `scout`) to understand what already exists. Don't go deep — just enough that your next question reflects what's actually true. The goal is to ask questions the user can't answer by saying "did you check the docs?" or "look at the code."
---
## Interview Protocol
### Before your first question round
Do a lightweight targeted investigation so your questions are grounded in reality:
- Scout the codebase (`rg`, `find`, or `scout`) to understand what already exists that this milestone touches or builds on
- Check the roadmap context above (if present) to understand what surrounds this milestone
- Identify the 3–5 biggest behavioural and architectural unknowns: things where the user's answer will materially change what gets built
Do **not** go deep — just enough that your questions reflect what's actually true rather than what you assume.
### Question rounds
Ask **1–3 questions per round**. Keep each question focused on one of:
- **What they're building** — concrete enough to explain to a stranger
- **Why it needs to exist** — the problem it solves or the desire it fulfills
- **Who it's for** — user, team, themselves
- **What "done" looks like** — observable outcomes, not abstract goals
- **The biggest technical unknowns / risks** — what could fail, what hasn't been proven
- **What external systems/services this touches** — APIs, databases, third-party services
**If `{{structuredQuestionsAvailable}}` is `true`:** use `ask_user_questions` for each round. 1–3 questions per call, each as a separate question object. Keep option labels short (3–5 words). Always include a freeform "Other / let me explain" option. When the user picks that option or writes a long freeform answer, switch to plain text follow-up for that thread before resuming structured questions.
**If `{{structuredQuestionsAvailable}}` is `false`:** ask questions in plain text. Keep each round to 1–3 focused questions. Wait for answers before asking the next round.
After the user answers, investigate further if any answer opens a new unknown, then ask the next round.
### Check-in after each round
After each round of answers, ask:
> "I think I have a solid picture of this milestone. Ready to wrap up and write the context file, or is there more to cover?"
**If `{{structuredQuestionsAvailable}}` is `true`:** use `ask_user_questions` with options:
- "Wrap up — write the context file" *(recommended after ~23 rounds)*
- "Keep going — more to discuss"
**If `{{structuredQuestionsAvailable}}` is `false`:** ask in plain text.
If the user wants to keep going, keep asking. Stop when they say wrap up.
---
## Questioning philosophy
**Start open, follow energy.** Let the user's enthusiasm guide where you dig deeper.
**Challenge vagueness, make abstract concrete.** When the user says something abstract ("it should be smart" / "good UX"), push for specifics.
**Questions must be about the experience, not the implementation.** Never ask "what auth provider?" — ask "when someone logs in, what should that feel like?" Implementation is your job. Understanding what they want to experience is the discussion's job.
**Position-first framing.** Have opinions. "I'd lean toward X because Y — does that match your thinking?" is better than "what do you think about X vs Y?"
**Negative constraints.** Ask what would disappoint them. What they explicitly don't want. Negative constraints are sharper than positive wishes.
**Anti-patterns — never do these:**
- Checklist walking through predetermined topics regardless of what the user said
- Canned generic questions that could apply to any project
- Corporate speak ("What are your key success metrics?")
- Rapid-fire questions without acknowledging answers
- Asking about technical skill level
---
## Depth Verification
Before moving to the wrap-up gate, verify you have covered:
- [ ] What they're building — concrete enough to explain to a stranger
- [ ] Why it needs to exist
- [ ] Who it's for
- [ ] What "done" looks like
- [ ] The biggest technical unknowns / risks
- [ ] What external systems/services this touches
**Print a structured depth summary in chat first** — using the user's own terminology. Cover what you understood, what shaped your understanding, and any areas of remaining uncertainty.
**Then confirm:**
**If `{{structuredQuestionsAvailable}}` is `true`:** use `ask_user_questions` with:
- header: "Depth Check"
- question: "Did I capture the depth right?"
- options: "Yes, you got it (Recommended)", "Not quite — let me clarify"
- **The question ID must contain `depth_verification`** (e.g. `depth_verification_confirm`) — this enables the write-gate downstream.
**If `{{structuredQuestionsAvailable}}` is `false`:** ask in plain text: "Did I capture that correctly? Anything I missed?" Wait for confirmation before proceeding.
If they clarify, absorb the correction and re-verify.
---
## Output
Once the user confirms depth:
1. Use the **Context** output template below
2. `mkdir -p` the milestone directory if needed
3. Write `{{milestoneId}}-CONTEXT.md` — preserve the user's exact terminology, emphasis, and framing. Do not paraphrase nuance into generic summaries. The context file is downstream agents' only window into this conversation.
4. Commit: `git add {{milestoneId}}-CONTEXT.md && git commit -m "docs({{milestoneId}}): milestone context from discuss"`
5. Say exactly: `"{{milestoneId}} context written."` — nothing else.

View file

@@ -51,6 +51,7 @@ Apply these when decomposing and ordering slices:
- **Completion must imply capability.** If every slice in this roadmap were completed exactly as written, the milestone's promised outcome should actually work at the proof level claimed. Do not write slices that can all be checked off while the user-visible capability still does not exist.
- **Don't invent risks.** If the project is straightforward, skip the proof strategy and just ship value in smart order. Not everything has major unknowns.
- **Ship features, not proofs.** A completed slice should leave the product in a state where the new capability is actually usable through its real interface. A login flow slice ends with a working login page, not a middleware function. An API slice ends with endpoints that return real data from a real store, not hardcoded fixtures. A dashboard slice ends with a real dashboard rendering real data, not a component that renders mock props. If a slice can't ship the real thing yet because a dependency isn't built, it should ship with realistic stubs that are clearly marked for replacement — but the user-facing surface must be real.
- **Dependency format is comma-separated, never range syntax.** Write `depends:[S01,S02,S03]` — not `depends:[S01-S03]`. Range syntax is not a valid format and permanently blocks the slice.
- **Ambition matches the milestone.** The number and depth of slices should match the milestone's ambition. A milestone promising "core platform with auth, data model, and primary user loop" should have enough slices to actually deliver all three as working features — not two proof-of-concept slices and a note that "the rest will come in the next milestone." If the milestone's context promises an outcome, the roadmap must deliver it.
- **Right-size the decomposition.** Match slice count to actual complexity. If the work is small enough to build and verify in one pass, it's one slice — don't split it into three just because you can identify sub-steps. Multiple requirements can share a single slice. Conversely, don't cram genuinely independent capabilities into one slice just to keep the count low. Let the work dictate the structure.

View file

@@ -154,7 +154,7 @@ Templates showing the expected format for each artifact type are in:
**External facts:** Use `search-the-web` + `fetch_page`, or `search_and_read` for one-call extraction. Use `freshness` for recency. Never state current facts from training data without verification.
**Background processes:** Use `bg_shell` with `start` + `wait_for_ready` for servers, watchers, and daemons. Never poll with `sleep`/retry loops — `wait_for_ready` exists for this. For status checks, use `digest` (~30 tokens), not `output` (~2000 tokens). Use `highlights` (~100 tokens) when you need significant lines only. Use `output` only when actively debugging.
**Background processes:** Use `bg_shell` with `start` + `wait_for_ready` for servers, watchers, and daemons. Never use `bash` with `&` or `nohup` to background a process — the `bash` tool waits for stdout to close, so backgrounded children that inherit the file descriptors cause it to hang indefinitely. Never poll with `sleep`/retry loops — `wait_for_ready` exists for this. For status checks, use `digest` (~30 tokens), not `output` (~2000 tokens). Use `highlights` (~100 tokens) when you need significant lines only. Use `output` only when actively debugging.
**One-shot commands:** Use `async_bash` for builds, tests, and installs. The result is pushed to you when the command exits — no polling needed. Use `await_job` to block on a specific job.
@@ -169,6 +169,7 @@ Templates showing the expected format for each artifact type are in:
- Never use `cat` to read a file you might edit — `read` gives you the exact text `edit` needs.
- Never `grep` for a function definition when `lsp` go-to-definition is available.
- Never poll a server with `sleep 1 && curl` loops — use `bg_shell` `wait_for_ready`.
- Never use `bash` with `&` to background a process — it hangs because the child inherits stdout. Use `bg_shell` `start` instead.
- Never use `bg_shell` `output` for a status check — use `digest`.
- Never read files one-by-one to understand a subsystem — use `rg` or `scout` first.
- Never guess at library APIs from training data — use `get_library_docs`.

View file

@@ -0,0 +1,91 @@
You are executing GSD auto-mode.
## UNIT: Validate Milestone {{milestoneId}} ("{{milestoneTitle}}") — Remediation Round {{remediationRound}}
## Working Directory
Your working directory is `{{workingDirectory}}`. All file reads, writes, and shell commands MUST operate relative to this directory. Do NOT `cd` to any other directory.
## Your Role in the Pipeline
All slices are done. Before the **complete-milestone agent** closes this milestone, you reconcile planned work against what was actually delivered. You audit success criteria against evidence, inventory deferred work across all slice summaries and UAT results, and classify gaps. If auto-remediable gaps exist on the first pass, you append remediation slices to the roadmap so the pipeline can execute them before completion. After remediation slices run, you re-validate. The milestone only proceeds to completion once validation passes.
This is a gate, not a formality. But most milestones pass — bias toward "pass" unless you find concrete evidence of unmet criteria or meaningful gaps.
All relevant context has been preloaded below — the roadmap, all slice summaries, UAT results, requirements, decisions, and project context are inlined. Start working immediately without re-reading these files.
{{inlinedContext}}
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during validation, without relaxing required verification or artifact rules.
Then:
### Step 1: Audit Success Criteria
Enumerate each success criterion from the roadmap's `## Success Criteria` section. For each criterion, map it to concrete evidence from slice summaries, UAT results, or observable behavior.
Format each criterion as:
- `Criterion text` → **MET** — evidence: {{specific slice summary, UAT result, test output, or observable behavior}}
- `Criterion text` → **NOT MET** — gap: {{what's missing and why}}
Every criterion must have a definitive verdict. Do not mark a criterion as MET without specific evidence.
### Step 2: Inventory Deferred Work
Scan ALL slice summaries for:
- `Known Limitations` sections
- `Follow-ups` sections
- `Deviations` sections
Scan ALL UAT results for:
- `Not Proven By This UAT` sections
- Any PARTIAL or FAIL verdicts
Check:
- `.gsd/REQUIREMENTS.md` for Active requirements not yet Validated
- `.gsd/CAPTURES.md` for unresolved deferred captures
Collect every item into a single inventory. Do not skip items because they seem minor — the classification step handles prioritization.
### Step 3: Classify Each Gap
For every unmet criterion and every deferred work item, classify it as one of:
- **auto-remediable** — can be fixed by adding a new slice (missing feature, unfixed bug, untested path, incomplete integration)
- **human-required** — needs Lex's input (design decision, external service dependency, manual verification, judgment call, ambiguous requirement)
- **acceptable** — known limitation that's OK to ship (documented trade-off, explicitly scoped for a future milestone, minor rough edge with no user impact)
Be conservative with **auto-remediable**. Only classify a gap as auto-remediable if you're confident a slice can resolve it without human judgment. When in doubt, classify as **human-required**.
### Step 4: Act on Gaps
**If this is remediation round 0 AND auto-remediable gaps exist:**
1. Define remediation slices to address auto-remediable gaps. Follow the exact roadmap slice format:
`- [ ] **S0X: Title** \`risk:medium\` \`depends:[]\``
Include a brief description of what each slice must accomplish.
2. Append these slices to `{{roadmapPath}}` after existing slices (do not modify completed slices).
3. Update the boundary map in the roadmap if the new slices introduce new integration points.
4. Set verdict to `needs-remediation`.
**If this is remediation round 1 or higher:**
Do NOT add more slices. At this point either:
- All remaining gaps are acceptable — set verdict to `pass`
- Remaining gaps need Lex's input — set verdict to `needs-attention`
Never add remediation slices after round 0. If round 0 remediation didn't close the gaps, escalate.
**If no auto-remediable gaps exist (any round):**
- If all criteria are MET and deferred items are acceptable or human-required only — set verdict to `pass` (with human-required items noted)
- If human-required items are blocking — set verdict to `needs-attention`
### Step 5: Write Validation Report
Write `{{validationPath}}` using the milestone-validation template. Fill all frontmatter fields and every section. The report must be a complete record of the validation — a future agent reading only this file should understand what was checked, what passed, and what remains.
**You MUST write `{{validationPath}}` before finishing.**
When done, say: "Milestone {{milestoneId}} validated."

View file

@@ -1,5 +1,45 @@
import type { RoadmapSliceEntry, RiskLevel } from "./types.js";
/**
* Expand dependency shorthand into individual slice IDs.
*
* Handles two common LLM-generated patterns that the roadmap parser
* previously treated as single literal IDs (silently blocking slices):
*
* "S01-S04" ["S01", "S02", "S03", "S04"] (range syntax)
* "S01..S04" ["S01", "S02", "S03", "S04"] (dot-range syntax)
*
* Plain IDs ("S01", "S02") and empty strings pass through unchanged.
*/
export function expandDependencies(deps: string[]): string[] {
const result: string[] = [];
for (const dep of deps) {
const trimmed = dep.trim();
if (!trimmed) continue;
// Match range syntax: S01-S04 or S01..S04 (case-insensitive prefix)
const rangeMatch = trimmed.match(/^([A-Za-z]+)(\d+)(?:-|\.\.)+([A-Za-z]+)(\d+)$/);
if (rangeMatch) {
const prefixA = rangeMatch[1]!.toUpperCase();
const startNum = parseInt(rangeMatch[2]!, 10);
const prefixB = rangeMatch[3]!.toUpperCase();
const endNum = parseInt(rangeMatch[4]!, 10);
// Only expand when both prefixes match and range is valid
if (prefixA === prefixB && startNum <= endNum) {
const width = rangeMatch[2]!.length; // preserve zero-padding (S01 not S1)
for (let i = startNum; i <= endNum; i++) {
result.push(`${prefixA}${String(i).padStart(width, "0")}`);
}
continue;
}
}
result.push(trimmed);
}
return result;
}
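For illustration, here is a standalone sketch of the same expansion logic (a simplified, hypothetical copy of the function above, without the module's types) together with the outputs it produces:

```typescript
// Standalone sketch of the dependency range expansion for illustration.
// Mirrors expandDependencies above; not the shipped module.
function expandDeps(deps: string[]): string[] {
  const result: string[] = [];
  for (const dep of deps) {
    const trimmed = dep.trim();
    if (!trimmed) continue;
    // Range syntax: S01-S04 or S01..S04
    const m = trimmed.match(/^([A-Za-z]+)(\d+)(?:-|\.\.)+([A-Za-z]+)(\d+)$/);
    if (m && m[1]!.toUpperCase() === m[3]!.toUpperCase()) {
      const start = parseInt(m[2]!, 10);
      const end = parseInt(m[4]!, 10);
      if (start <= end) {
        const width = m[2]!.length; // keep zero-padding: S01, not S1
        for (let i = start; i <= end; i++) {
          result.push(`${m[1]!.toUpperCase()}${String(i).padStart(width, "0")}`);
        }
        continue;
      }
    }
    result.push(trimmed); // plain IDs pass through unchanged
  }
  return result;
}

console.log(expandDeps(["S01-S03"]));         // ["S01", "S02", "S03"]
console.log(expandDeps(["S01..S02", "S05"])); // ["S01", "S02", "S05"]
console.log(expandDeps(["S09", ""]));         // ["S09"]
```

Invalid ranges (mismatched prefixes, descending bounds) fall through to the plain-ID path, so the parser degrades to the old literal behaviour rather than guessing.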
function extractSlicesSection(content: string): string {
const headingMatch = /^## Slices\s*$/m.exec(content);
if (!headingMatch || headingMatch.index == null) return "";
@@ -33,7 +73,7 @@ export function parseRoadmapSlices(content: string): RoadmapSliceEntry[] {
const depsMatch = rest.match(/`depends:\[([^\]]*)\]`/);
const depends = depsMatch && depsMatch[1]!.trim()
? depsMatch[1]!.split(",").map(s => s.trim())
? expandDependencies(depsMatch[1]!.split(",").map(s => s.trim()))
: [];
currentSlice = { id, title, risk, depends, done, demo: "" };

View file

@@ -22,6 +22,7 @@ import { readFileSync, readdirSync, existsSync, statSync } from "node:fs";
import { basename, join } from "node:path";
import { nativeParseJsonlTail } from "./native-parser-bridge.js";
import { nativeWorkingTreeStatus, nativeDiffStat } from "./native-git-bridge.js";
import { getAutoWorktreePath } from "./auto-worktree.js";
// ─── Types ────────────────────────────────────────────────────────────────────
@@ -296,12 +297,45 @@ export function synthesizeCrashRecovery(
* Replaces the old shallow getLastActivityDiagnostic().
*/
export function getDeepDiagnostic(basePath: string): string | null {
const activityDir = join(basePath, ".gsd", "activity");
const trace = readLastActivityLog(activityDir);
// Try worktree activity logs first if an auto-worktree is active
let trace: ExecutionTrace | null = null;
try {
const mid = readActiveMilestoneId(basePath);
if (mid) {
const wtPath = getAutoWorktreePath(basePath, mid);
if (wtPath) {
const wtActivityDir = join(wtPath, ".gsd", "activity");
trace = readLastActivityLog(wtActivityDir);
}
}
} catch { /* non-fatal — fall through to root */ }
// Fall back to root activity logs
if (!trace || trace.toolCallCount === 0) {
const activityDir = join(basePath, ".gsd", "activity");
trace = readLastActivityLog(activityDir);
}
if (!trace || trace.toolCallCount === 0) return null;
return formatTraceSummary(trace);
}
/**
* Read the active milestone ID directly from STATE.md without async deriveState().
* Looks for `**Active Milestone:** M001` pattern.
*/
function readActiveMilestoneId(basePath: string): string | null {
try {
const statePath = join(basePath, ".gsd", "STATE.md");
if (!existsSync(statePath)) return null;
const content = readFileSync(statePath, "utf-8");
const match = /\*\*Active Milestone:\*\*\s*(\S+)/i.exec(content);
return match?.[1] ?? null;
} catch {
return null;
}
}
// ─── Formatting ───────────────────────────────────────────────────────────────
function formatRecoveryPrompt(

View file

@@ -0,0 +1,62 @@
---
id: {{milestoneId}}
remediation_round: {{round}}
verdict: pass | needs-remediation | needs-attention
slices_added: []
human_required_items: 0
validated_at: {{date}}
---
# {{milestoneId}}: Milestone Validation
## Success Criteria Audit
<!-- For each success criterion from the roadmap, list the criterion text,
verdict (MET / NOT MET), and the specific evidence or gap.
Every criterion must appear here with a definitive verdict. -->
- **Criterion:** {{criterionText}}
**Verdict:** {{MET or NOT MET}}
**Evidence:** {{sliceSummary, UATResult, testOutput, or observableBehavior}}
## Deferred Work Inventory
<!-- Every deferred, incomplete, or flagged item found across all slice summaries
and UAT results. Include the source so a reader can trace back to the original. -->
| Item | Source | Classification | Disposition |
|------|--------|----------------|-------------|
| {{itemDescription}} | {{sliceId or UAT reference}} | {{auto-remediable / human-required / acceptable}} | {{what happens with this item}} |
## Requirement Coverage
<!-- Active requirements from REQUIREMENTS.md that are not yet Validated.
If no REQUIREMENTS.md exists, write "No requirements tracking active." -->
- **{{requirementId}}**: {{status}} — {{disposition: covered by remediation slice / acceptable gap / needs attention}}
## Remediation Slices
<!-- New slices appended to the roadmap to address auto-remediable gaps.
Include the full slice definition as written to the roadmap.
If no slices were added, write "None required." -->
{{remediationSliceDefinitions OR "None required."}}
## Requires Attention
<!-- Items classified as human-required, with enough context for Lex to make a decision.
Ordered by priority (blocking items first).
If none, write "None." -->
- **{{itemTitle}}** ({{priority: blocking / non-blocking}})
Context: {{whatTheItemIs, whereItCameFrom, whyItNeedsHumanInput}}
## Verdict
<!-- One-paragraph summary assessment.
State the verdict (pass / needs-remediation / needs-attention),
the number of criteria met vs total, and the key finding
that determined the verdict. -->
{{verdictSummary}}

View file

@@ -0,0 +1,186 @@
import test from "node:test";
import assert from "node:assert/strict";
import { mkdirSync, mkdtempSync, writeFileSync, existsSync, readFileSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import { writeLock, readCrashLock, clearLock, isLockProcessAlive } from "../crash-recovery.ts";
// ─── writeLock creates auto.lock in .gsd/ ────────────────────────────────
test("writeLock creates auto.lock with correct structure", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
writeLock(dir, "starting", "M001", 0);
const lockPath = join(dir, ".gsd", "auto.lock");
assert.ok(existsSync(lockPath), "auto.lock should exist after writeLock");
const data = JSON.parse(readFileSync(lockPath, "utf-8"));
assert.equal(data.pid, process.pid, "lock should contain current PID");
assert.equal(data.unitType, "starting", "lock should contain unit type");
assert.equal(data.unitId, "M001", "lock should contain unit ID");
assert.equal(data.completedUnits, 0, "lock should show 0 completed units");
assert.ok(data.startedAt, "lock should have startedAt timestamp");
rmSync(dir, { recursive: true, force: true });
});
test("writeLock updates existing lock with new unit info", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
writeLock(dir, "starting", "M001", 0);
writeLock(dir, "execute-task", "M001/S01/T01", 2, "/tmp/session.jsonl");
const data = JSON.parse(readFileSync(join(dir, ".gsd", "auto.lock"), "utf-8"));
assert.equal(data.unitType, "execute-task", "lock should be updated to new unit type");
assert.equal(data.unitId, "M001/S01/T01", "lock should be updated to new unit ID");
assert.equal(data.completedUnits, 2, "completed count should be updated");
assert.equal(data.sessionFile, "/tmp/session.jsonl", "session file should be recorded");
rmSync(dir, { recursive: true, force: true });
});
// ─── readCrashLock reads auto.lock data ──────────────────────────────────
test("readCrashLock returns null when no lock file exists", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
const lock = readCrashLock(dir);
assert.equal(lock, null, "should return null when no lock file");
rmSync(dir, { recursive: true, force: true });
});
test("readCrashLock returns lock data when file exists", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
writeLock(dir, "plan-milestone", "M002", 5);
const lock = readCrashLock(dir);
assert.ok(lock, "should return lock data");
assert.equal(lock!.unitType, "plan-milestone");
assert.equal(lock!.unitId, "M002");
assert.equal(lock!.completedUnits, 5);
rmSync(dir, { recursive: true, force: true });
});
// ─── clearLock removes auto.lock ─────────────────────────────────────────
test("clearLock removes the lock file", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
writeLock(dir, "starting", "M001", 0);
assert.ok(existsSync(join(dir, ".gsd", "auto.lock")), "lock should exist before clear");
clearLock(dir);
assert.ok(!existsSync(join(dir, ".gsd", "auto.lock")), "lock should be removed after clear");
rmSync(dir, { recursive: true, force: true });
});
test("clearLock is safe when no lock file exists", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
// Should not throw
clearLock(dir);
rmSync(dir, { recursive: true, force: true });
});
// ─── isLockProcessAlive detects live vs dead PIDs ────────────────────────
test("isLockProcessAlive returns false for dead PID", () => {
const lock = {
pid: 9999999,
startedAt: new Date().toISOString(),
unitType: "execute-task",
unitId: "M001/S01/T01",
unitStartedAt: new Date().toISOString(),
completedUnits: 0,
};
assert.equal(isLockProcessAlive(lock), false, "dead PID should return false");
});
test("isLockProcessAlive returns false for own PID (recycled)", () => {
const lock = {
pid: process.pid,
startedAt: new Date().toISOString(),
unitType: "execute-task",
unitId: "M001/S01/T01",
unitStartedAt: new Date().toISOString(),
completedUnits: 0,
};
assert.equal(isLockProcessAlive(lock), false, "own PID should return false (recycled)");
});
test("isLockProcessAlive returns false for invalid PID", () => {
const lock = {
pid: -1,
startedAt: new Date().toISOString(),
unitType: "execute-task",
unitId: "M001/S01/T01",
unitStartedAt: new Date().toISOString(),
completedUnits: 0,
};
assert.equal(isLockProcessAlive(lock), false, "negative PID should return false");
});
// ─── Cross-process detection via lock file ───────────────────────────────
test("lock file enables cross-process auto-mode detection", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
// Use the parent process PID — guaranteed alive on all platforms (Unix and Windows).
// PID 1 (init) only works on Unix; on Windows it doesn't exist.
const alivePid = process.ppid;
const lockData = {
pid: alivePid,
startedAt: new Date().toISOString(),
unitType: "execute-task",
unitId: "M001/S01/T02",
unitStartedAt: new Date().toISOString(),
completedUnits: 3,
};
writeFileSync(join(dir, ".gsd", "auto.lock"), JSON.stringify(lockData, null, 2));
const lock = readCrashLock(dir);
assert.ok(lock, "should read the lock");
assert.equal(lock!.pid, alivePid);
// Parent PID is always alive — isLockProcessAlive should detect it
const alive = isLockProcessAlive(lock!);
assert.equal(alive, true, "parent PID should be detected as alive");
rmSync(dir, { recursive: true, force: true });
});
test("stale lock from dead process is detected as not alive", () => {
const dir = mkdtempSync(join(tmpdir(), "gsd-lock-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
// Simulate a stale lock from a process that no longer exists
const lockData = {
pid: 9999999,
startedAt: "2026-03-01T00:00:00Z",
unitType: "plan-slice",
unitId: "M001/S02",
unitStartedAt: "2026-03-01T00:05:00Z",
completedUnits: 1,
};
writeFileSync(join(dir, ".gsd", "auto.lock"), JSON.stringify(lockData, null, 2));
const lock = readCrashLock(dir);
assert.ok(lock, "should read the stale lock");
assert.equal(isLockProcessAlive(lock!), false, "dead process should not be alive");
rmSync(dir, { recursive: true, force: true });
});
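The shipped `isLockProcessAlive` lives in `crash-recovery.ts`; a minimal hypothetical sketch consistent with the behaviour these tests pin down (not the actual implementation) might look like:

```typescript
// Hypothetical liveness check consistent with the tests above — a sketch,
// not the shipped crash-recovery.ts code. Assumes a lock with a numeric pid.
function sketchIsLockProcessAlive(lock: { pid: number }): boolean {
  // Invalid PIDs can never refer to a live auto-mode process.
  if (!Number.isInteger(lock.pid) || lock.pid <= 0) return false;
  // Our own PID in the lock means the writer died and the PID was recycled.
  if (lock.pid === process.pid) return false;
  try {
    // Signal 0 probes for process existence without delivering a signal.
    process.kill(lock.pid, 0);
    return true;
  } catch {
    return false; // ESRCH: no such process
  }
}

console.log(sketchIsLockProcessAlive({ pid: process.ppid })); // parent is alive
console.log(sketchIsLockProcessAlive({ pid: -1 }));           // false
```

The `process.kill(pid, 0)` probe is why `process.ppid` works cross-platform in the last tests: the parent PID always names a live process on both Unix and Windows, whereas PID 1 exists only on Unix.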

View file

@@ -320,3 +320,67 @@ test("verifyExpectedArtifact detects roadmap [x] change despite parse cache", ()
cleanup(base);
}
});
// ─── verifyExpectedArtifact: plan-slice empty scaffold regression (#699) ──
test("verifyExpectedArtifact rejects plan-slice with empty scaffold", () => {
const base = makeTmpBase();
try {
const sliceDir = join(base, ".gsd", "milestones", "M001", "slices", "S01");
mkdirSync(sliceDir, { recursive: true });
writeFileSync(join(sliceDir, "S01-PLAN.md"), "# S01: Test Slice\n\n## Tasks\n\n");
assert.strictEqual(
verifyExpectedArtifact("plan-slice", "M001/S01", base),
false,
"Empty scaffold should not be treated as completed artifact",
);
} finally {
cleanup(base);
}
});
test("verifyExpectedArtifact accepts plan-slice with actual tasks", () => {
const base = makeTmpBase();
try {
const sliceDir = join(base, ".gsd", "milestones", "M001", "slices", "S01");
mkdirSync(sliceDir, { recursive: true });
writeFileSync(join(sliceDir, "S01-PLAN.md"), [
"# S01: Test Slice",
"",
"## Tasks",
"",
"- [ ] **T01: Implement feature** `est:2h`",
"- [ ] **T02: Write tests** `est:1h`",
].join("\n"));
assert.strictEqual(
verifyExpectedArtifact("plan-slice", "M001/S01", base),
true,
"Plan with task entries should be treated as completed artifact",
);
} finally {
cleanup(base);
}
});
test("verifyExpectedArtifact accepts plan-slice with completed tasks", () => {
const base = makeTmpBase();
try {
const sliceDir = join(base, ".gsd", "milestones", "M001", "slices", "S01");
mkdirSync(sliceDir, { recursive: true });
writeFileSync(join(sliceDir, "S01-PLAN.md"), [
"# S01: Test Slice",
"",
"## Tasks",
"",
"- [x] **T01: Implement feature** `est:2h`",
"- [ ] **T02: Write tests** `est:1h`",
].join("\n"));
assert.strictEqual(
verifyExpectedArtifact("plan-slice", "M001/S01", base),
true,
"Plan with completed task entries should be treated as completed artifact",
);
} finally {
cleanup(base);
}
});

View file

@@ -0,0 +1,123 @@
/**
 * auto-skip-loop.test.ts — Tests for the consecutive-skip loop breaker.
*
* Regression for #728: auto-mode infinite skip loop on previously completed
* plan-slice units when deriveState keeps returning the same unit.
*
* The skip paths in dispatchNextUnit track consecutive skips per unit via
* unitConsecutiveSkips. When the same unit is skipped > MAX_CONSECUTIVE_SKIPS
* times without a real dispatch in between, the completion record is evicted
* so deriveState can reconcile.
*/
import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import {
_getUnitConsecutiveSkips,
_resetUnitConsecutiveSkips,
MAX_CONSECUTIVE_SKIPS,
} from "../auto.ts";
import { persistCompletedKey, removePersistedKey, loadPersistedKeys } from "../auto-recovery.ts";
import { createTestContext } from "./test-helpers.ts";
const { assertEq, assertTrue, report } = createTestContext();
function makeTmpBase(): string {
const dir = mkdtempSync(join(tmpdir(), "gsd-skip-loop-test-"));
mkdirSync(join(dir, ".gsd"), { recursive: true });
return dir;
}
async function main(): Promise<void> {
// ─── Counter starts at zero ────────────────────────────────────────────
console.log("\n=== skip loop counter: initial state ===");
{
_resetUnitConsecutiveSkips();
const map = _getUnitConsecutiveSkips();
assertEq(map.size, 0, "counter map starts empty after reset");
}
// ─── Counter increments correctly ────────────────────────────────────
console.log("\n=== skip loop counter: increments on repeated calls ===");
{
_resetUnitConsecutiveSkips();
const map = _getUnitConsecutiveSkips();
const key = "plan-slice/M001/S04";
for (let i = 1; i <= MAX_CONSECUTIVE_SKIPS; i++) {
const prev = map.get(key) ?? 0;
map.set(key, prev + 1);
}
assertEq(map.get(key), MAX_CONSECUTIVE_SKIPS, `counter reaches MAX_CONSECUTIVE_SKIPS (${MAX_CONSECUTIVE_SKIPS})`);
}
// ─── Threshold constant is sane ──────────────────────────────────────
console.log("\n=== skip loop counter: threshold is reasonable ===");
{
assertTrue(MAX_CONSECUTIVE_SKIPS >= 3, "threshold allows a few legitimate skips");
assertTrue(MAX_CONSECUTIVE_SKIPS <= 10, "threshold catches loops quickly");
}
// ─── Reset clears all keys ────────────────────────────────────────────
console.log("\n=== skip loop counter: reset clears all keys ===");
{
_resetUnitConsecutiveSkips();
const map = _getUnitConsecutiveSkips();
map.set("plan-slice/M001/S01", 2);
map.set("plan-slice/M001/S02", 1);
assertEq(map.size, 2, "map has 2 entries before reset");
_resetUnitConsecutiveSkips();
assertEq(_getUnitConsecutiveSkips().size, 0, "map empty after reset");
}
// ─── Eviction path: persistCompletedKey + removePersistedKey round-trip
// (simulates what the loop-breaker does) ───────────────────────────
console.log("\n=== skip loop counter: eviction removes persisted key ===");
{
_resetUnitConsecutiveSkips();
const base = makeTmpBase();
try {
const key = "plan-slice/M001/S04";
const keySet = new Set<string>();
persistCompletedKey(base, key);
loadPersistedKeys(base, keySet);
assertTrue(keySet.has(key), "key persisted before eviction");
// Simulate loop-breaker eviction
keySet.delete(key);
removePersistedKey(base, key);
const keySet2 = new Set<string>();
loadPersistedKeys(base, keySet2);
assertTrue(!keySet2.has(key), "key absent after eviction");
} finally {
rmSync(base, { recursive: true, force: true });
}
}
// ─── Counter resets per-key, not globally ─────────────────────────────
console.log("\n=== skip loop counter: per-key isolation ===");
{
_resetUnitConsecutiveSkips();
const map = _getUnitConsecutiveSkips();
map.set("plan-slice/M001/S04", MAX_CONSECUTIVE_SKIPS + 1);
map.set("plan-slice/M001/S05", 1);
// Deleting S04 (eviction) should not affect S05
map.delete("plan-slice/M001/S04");
assertTrue(!map.has("plan-slice/M001/S04"), "S04 evicted");
assertEq(map.get("plan-slice/M001/S05"), 1, "S05 counter unaffected");
}
_resetUnitConsecutiveSkips();
report();
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
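The counter-and-eviction mechanism these tests exercise can be modeled in isolation. The sketch below is a hypothetical reconstruction of the loop breaker, not the actual `dispatchNextUnit` code in `auto.ts`; the threshold value and function names are assumptions for illustration:

```typescript
// Hypothetical model of the per-unit consecutive-skip loop breaker.
// The real logic lives in dispatchNextUnit in auto.ts and may differ.
const MAX_CONSECUTIVE_SKIPS = 5; // assumed threshold, for illustration only

type SkipResult = { evict: boolean; count: number };

// Record one skip for a unit; signal eviction once the threshold is crossed
// so deriveState can reconcile the stale completion record.
function recordSkip(counters: Map<string, number>, key: string): SkipResult {
  const count = (counters.get(key) ?? 0) + 1;
  counters.set(key, count);
  if (count > MAX_CONSECUTIVE_SKIPS) {
    counters.delete(key); // reset streak so the unit gets a fresh run post-eviction
    return { evict: true, count };
  }
  return { evict: false, count };
}

// A real dispatch in between clears the unit's skip streak.
function recordDispatch(counters: Map<string, number>, key: string): void {
  counters.delete(key);
}
```

Per-key isolation falls out of using the unit key as the map key: evicting one unit never touches another unit's streak.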


@ -585,6 +585,64 @@ Discovered an issue.
rmSync(dtBase, { recursive: true, force: true });
}
// ─── unresolvable_dependency: range syntax dep warns ─────────────────
console.log("\n=== doctor: unresolvable_dependency warns for leftover range ID ===");
{
// Simulate a roadmap where expandDependencies did NOT expand (pre-fix stored artifact)
// by writing a dep that looks like a range but doesn't match any real slice.
const base = mkdtempSync(join(tmpdir(), "gsd-doctor-udep-"));
const mDir2 = join(base, ".gsd", "milestones", "M001");
const sDir2 = join(mDir2, "slices", "S01");
const tDir2 = join(sDir2, "tasks");
mkdirSync(tDir2, { recursive: true });
writeFileSync(join(mDir2, "M001-ROADMAP.md"), [
"# M001: Test",
"",
"## Slices",
"- [x] **S01: Done** `risk:low` `depends:[]`",
" > After this: done",
"- [ ] **S02: Blocked** `risk:low` `depends:[S99]`",
" > After this: also done",
].join("\n") + "\n");
writeFileSync(join(sDir2, "S01-PLAN.md"), "# S01\n\n**Goal:** g\n**Demo:** d\n\n## Tasks\n- [x] **T01: t** `est:5m`\n");
writeFileSync(join(tDir2, "T01-SUMMARY.md"), "---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01\n## What Happened\nDone.\n");
const r = await runGSDDoctor(base, { fix: false });
const udepIssues = r.issues.filter(i => i.code === "unresolvable_dependency");
assertTrue(udepIssues.length > 0, "unresolvable_dependency fires for unknown dep S99");
assertEq(udepIssues[0]?.severity, "warning", "severity is warning");
assertTrue(udepIssues[0]?.message.includes("S99"), "message names the bad dep");
rmSync(base, { recursive: true, force: true });
}
// ─── unresolvable_dependency: valid deps do not warn ─────────────────
console.log("\n=== doctor: no unresolvable_dependency for valid deps ===");
{
const base = mkdtempSync(join(tmpdir(), "gsd-doctor-udep-ok-"));
const mDir2 = join(base, ".gsd", "milestones", "M001");
const sDir2 = join(mDir2, "slices", "S01");
const tDir2 = join(sDir2, "tasks");
mkdirSync(tDir2, { recursive: true });
writeFileSync(join(mDir2, "M001-ROADMAP.md"), [
"# M001: Test",
"",
"## Slices",
"- [x] **S01: Done** `risk:low` `depends:[]`",
" > After this: done",
"- [ ] **S02: Next** `risk:low` `depends:[S01]`",
" > After this: next done",
].join("\n") + "\n");
writeFileSync(join(sDir2, "S01-PLAN.md"), "# S01\n\n**Goal:** g\n**Demo:** d\n\n## Tasks\n- [x] **T01: t** `est:5m`\n");
writeFileSync(join(tDir2, "T01-SUMMARY.md"), "---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01\n## What Happened\nDone.\n");
const r = await runGSDDoctor(base, { fix: false });
const udepIssues = r.issues.filter(i => i.code === "unresolvable_dependency");
assertEq(udepIssues.length, 0, "no unresolvable_dependency for valid S01 dep");
rmSync(base, { recursive: true, force: true });
}
report();
}


@ -1,6 +1,6 @@
/**
 * In-flight tool tracking tests: verify that markToolStart/markToolEnd
* correctly manage the in-flight tools set used by the idle watchdog to
* correctly manage the in-flight tools map used by the idle watchdog to
* distinguish "agent waiting on long-running tool" from "agent is idle".
*
* Background: The idle watchdog checks every 15s for agent progress. Without
@ -8,12 +8,15 @@
* can run 20+ minutes for evaluations, deployments, test suites) are falsely
* declared idle and interrupted by recovery steering messages.
*
* The fix hooks tool_execution_start/end events to track active tool calls.
* When tools are in-flight, the watchdog resets lastProgressAt instead of
* triggering idle recovery.
* The fix hooks tool_execution_start/end events to track active tool calls
* with start timestamps. When tools are in-flight and started recently
* (< idleTimeoutMs), the watchdog resets lastProgressAt instead of triggering
* idle recovery. When a tool has been in-flight for longer than idleTimeoutMs,
* it is treated as stuck (e.g., `command &` keeping stdout open) and recovery
* proceeds anyway.
*/
import { markToolStart, markToolEnd, isAutoActive } from "../auto.ts";
import { markToolStart, markToolEnd, isAutoActive, getOldestInFlightToolAgeMs } from "../auto.ts";
import { createTestContext } from './test-helpers.ts';
const { assertEq, assertTrue, report } = createTestContext();
@ -49,9 +52,17 @@ const { assertEq, assertTrue, report } = createTestContext();
// ═══ Integration contract: expected exports from auto.ts ═════════════════════
{
console.log("\n=== auto.ts exports markToolStart and markToolEnd ===");
console.log("\n=== auto.ts exports markToolStart, markToolEnd, and getOldestInFlightToolAgeMs ===");
assertEq(typeof markToolStart, "function", "markToolStart should be a function");
assertEq(typeof markToolEnd, "function", "markToolEnd should be a function");
assertEq(typeof getOldestInFlightToolAgeMs, "function", "getOldestInFlightToolAgeMs should be a function");
}
{
console.log("\n=== getOldestInFlightToolAgeMs: returns 0 when no tools in-flight ===");
// When auto-mode is inactive, inFlightTools map is empty → age is 0
const age = getOldestInFlightToolAgeMs();
assertEq(age, 0, "should return 0 when no tools are in-flight");
}
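The watchdog behavior described in the header comment can be sketched independently. This is an illustrative model assuming a map of tool-call IDs to start timestamps; the real bookkeeping in `auto.ts` may differ:

```typescript
// Hypothetical sketch of in-flight tool age tracking; illustrative only.
const inFlight = new Map<string, number>(); // toolCallId -> start timestamp (ms)

function markToolStart(id: string, now = Date.now()): void {
  inFlight.set(id, now);
}

function markToolEnd(id: string): void {
  inFlight.delete(id);
}

// 0 when nothing is in-flight; otherwise the age of the oldest active tool.
function getOldestInFlightToolAgeMs(now = Date.now()): number {
  let oldest = Infinity;
  for (const start of inFlight.values()) oldest = Math.min(oldest, start);
  return oldest === Infinity ? 0 : now - oldest;
}

// Watchdog decision: a recently started in-flight tool counts as progress;
// one running longer than idleTimeoutMs is treated as stuck and recovery
// proceeds anyway.
function shouldResetIdleTimer(idleTimeoutMs: number, now = Date.now()): boolean {
  const age = getOldestInFlightToolAgeMs(now);
  return age > 0 && age < idleTimeoutMs;
}
```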
{


@ -1,5 +1,5 @@
import { parseRoadmap } from "../files.ts";
import { parseRoadmapSlices } from "../roadmap-slices.ts";
import { parseRoadmapSlices, expandDependencies } from "../roadmap-slices.ts";
import { createTestContext } from './test-helpers.ts';
const { assertEq, assertTrue, report } = createTestContext();
@ -38,4 +38,46 @@ assertEq(roadmap.title, "M003: Current", "roadmap title preserved");
assertEq(roadmap.vision, "Build the thing.", "roadmap vision preserved");
assertTrue(roadmap.boundaryMap.length === 1, "boundary map still parsed");
// ─── expandDependencies unit tests ─────────────────────────────────────
console.log("\n=== expandDependencies: plain IDs pass through ===");
assertEq(expandDependencies([]), [], "empty list");
assertEq(expandDependencies(["S01"]), ["S01"], "single plain ID");
assertEq(expandDependencies(["S01", "S03"]), ["S01", "S03"], "multiple plain IDs");
console.log("\n=== expandDependencies: dash range expansion ===");
assertEq(expandDependencies(["S01-S04"]), ["S01", "S02", "S03", "S04"], "S01-S04 expands correctly");
assertEq(expandDependencies(["S01-S01"]), ["S01"], "single-element range");
assertEq(expandDependencies(["S03-S05"]), ["S03", "S04", "S05"], "mid-range expansion");
console.log("\n=== expandDependencies: dot-range expansion ===");
assertEq(expandDependencies(["S01..S03"]), ["S01", "S02", "S03"], "S01..S03 dot range");
console.log("\n=== expandDependencies: zero-padding preserved ===");
assertEq(expandDependencies(["S01-S03"]), ["S01", "S02", "S03"], "zero-padded IDs preserved");
console.log("\n=== expandDependencies: mixed list ===");
assertEq(expandDependencies(["S01-S03", "S05"]), ["S01", "S02", "S03", "S05"], "range + plain mixed");
console.log("\n=== expandDependencies: invalid range passes through unchanged ===");
assertEq(expandDependencies(["S04-S01"]), ["S04-S01"], "reversed range not expanded (start > end)");
assertEq(expandDependencies(["S01-T04"]), ["S01-T04"], "mismatched prefix not expanded");
// ─── parseRoadmapSlices: range syntax in depends ─────────────────────
console.log("\n=== parseRoadmapSlices: range syntax in depends expanded ===");
{
const rangeContent = `# M016: Test\n\n## Slices\n- [x] **S01: A** \`risk:low\` \`depends:[]\`\n- [x] **S02: B** \`risk:low\` \`depends:[]\`\n- [x] **S03: C** \`risk:low\` \`depends:[]\`\n- [x] **S04: D** \`risk:low\` \`depends:[]\`\n- [ ] **S05: E** \`risk:low\` \`depends:[S01-S04]\`\n > After this: all done\n`;
const rangeSlices = parseRoadmapSlices(rangeContent);
assertEq(rangeSlices.length, 5, "5 slices parsed");
assertEq(rangeSlices[4]?.depends, ["S01", "S02", "S03", "S04"], "S01-S04 range expanded to individual IDs");
}
console.log("\n=== parseRoadmapSlices: comma-separated depends still works ===");
{
const commaContent = `# M001: Test\n\n## Slices\n- [ ] **S05: E** \`risk:low\` \`depends:[S01,S02,S03,S04]\`\n > After this: done\n`;
const commaSlices = parseRoadmapSlices(commaContent);
assertEq(commaSlices[0]?.depends, ["S01", "S02", "S03", "S04"], "comma-separated depends unchanged");
}
report();
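The behavior pinned down by these assertions can be captured in a short sketch. This is a hypothetical reimplementation consistent with the tests above, not the actual `expandDependencies` in `roadmap-slices.ts`:

```typescript
// Hypothetical range expansion matching the test contract above:
// "S01-S04" and "S01..S03" expand, zero-padding is preserved, reversed
// ranges and mismatched prefixes pass through unchanged.
function expandDependencies(deps: string[]): string[] {
  const out: string[] = [];
  for (const dep of deps) {
    const m = dep.match(/^([A-Z]+)(\d+)(?:-|\.\.)\1(\d+)$/);
    if (!m) {
      out.push(dep); // plain ID or mismatched prefix: pass through
      continue;
    }
    const [, prefix, startStr, endStr] = m;
    const start = Number(startStr);
    const end = Number(endStr);
    if (start > end) {
      out.push(dep); // reversed range: not expanded
      continue;
    }
    const width = startStr.length; // preserve zero-padding (S01, not S1)
    for (let i = start; i <= end; i++) {
      out.push(`${prefix}${String(i).padStart(width, "0")}`);
    }
  }
  return out;
}
```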


@ -103,10 +103,21 @@ async function indexSlice(basePath: string, milestoneId: string, sliceId: string
};
}
export async function indexWorkspace(basePath: string): Promise<GSDWorkspaceIndex> {
export interface IndexWorkspaceOptions {
/**
* When true, run validatePlanBoundary and validateCompleteBoundary for each slice.
   * Skipped by default: validation is expensive (content analysis) and only needed
* for explicit doctor/audit flows. The /gsd status dashboard and scope pickers
* don't need the full issue list.
*/
validate?: boolean;
}
export async function indexWorkspace(basePath: string, opts: IndexWorkspaceOptions = {}): Promise<GSDWorkspaceIndex> {
const milestoneIds = findMilestoneIds(basePath);
const milestones: WorkspaceMilestoneTarget[] = [];
const validationIssues: ValidationIssue[] = [];
const runValidation = opts.validate === true;
for (const milestoneId of milestoneIds) {
const roadmapPath = resolveMilestoneFile(basePath, milestoneId, "ROADMAP") ?? undefined;
@ -118,11 +129,27 @@ export async function indexWorkspace(basePath: string): Promise<GSDWorkspaceInde
if (roadmapContent) {
const roadmap = parseRoadmap(roadmapContent);
title = titleFromRoadmapHeader(roadmapContent, milestoneId);
for (const slice of roadmap.slices) {
const indexedSlice = await indexSlice(basePath, milestoneId, slice.id, slice.title, slice.done);
// Parallelise all per-slice I/O: indexSlice + (optional) validation calls run concurrently.
// Order is preserved via Promise.all on an array built from roadmap.slices.
const sliceResults = await Promise.all(
roadmap.slices.map(async (slice) => {
if (runValidation) {
const [indexedSlice, planIssues, completeIssues] = await Promise.all([
indexSlice(basePath, milestoneId, slice.id, slice.title, slice.done),
validatePlanBoundary(basePath, milestoneId, slice.id),
validateCompleteBoundary(basePath, milestoneId, slice.id),
]);
return { indexedSlice, issues: [...planIssues, ...completeIssues] };
}
const indexedSlice = await indexSlice(basePath, milestoneId, slice.id, slice.title, slice.done);
return { indexedSlice, issues: [] as ValidationIssue[] };
}),
);
for (const { indexedSlice, issues } of sliceResults) {
slices.push(indexedSlice);
validationIssues.push(...await validatePlanBoundary(basePath, milestoneId, slice.id));
validationIssues.push(...await validateCompleteBoundary(basePath, milestoneId, slice.id));
validationIssues.push(...issues);
}
}
}
@ -173,7 +200,8 @@ export async function listDoctorScopeSuggestions(basePath: string): Promise<Arra
}
export async function getSuggestedNextCommands(basePath: string): Promise<string[]> {
const index = await indexWorkspace(basePath);
// Run validation here since we surface a /gsd doctor audit hint when issues exist.
const index = await indexWorkspace(basePath, { validate: true });
const scope = index.active.milestoneId && index.active.sliceId
? `${index.active.milestoneId}/${index.active.sliceId}`
: index.active.milestoneId;
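The parallelisation above leans on `Promise.all` preserving input order regardless of settle order. That property can be demonstrated in isolation (the `indexed:` labels here are placeholders, not real indexer output):

```typescript
// Promise.all resolves to results in input order even when the promises
// settle out of order -- the property the parallel slice indexing relies on.
async function indexInOrder(ids: string[]): Promise<string[]> {
  return Promise.all(
    ids.map(async (id, i) => {
      // Later items finish first, but results stay aligned with `ids`.
      await new Promise((resolve) => setTimeout(resolve, (ids.length - i) * 10));
      return `indexed:${id}`;
    }),
  );
}
```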


@ -0,0 +1,54 @@
import test from 'node:test'
import assert from 'node:assert/strict'
import { join } from 'node:path'
import { fileURLToPath, pathToFileURL } from 'node:url'
const projectRoot = join(fileURLToPath(import.meta.url), '..', '..', '..')
/**
* Resolve dist path as a file:// URL for cross-platform dynamic import.
* On Windows, bare paths like `D:\...\mcp-server.js` fail with
* ERR_UNSUPPORTED_ESM_URL_SCHEME because Node's ESM loader requires
* file:// URLs for absolute paths.
*/
function distUrl(filename: string): string {
return pathToFileURL(join(projectRoot, 'dist', filename)).href
}
test('mcp-server module imports without errors', async () => {
// Import from the compiled dist output to avoid subpath resolution issues
// that occur when the resolve-ts test hook rewrites .js -> .ts paths.
const mod = await import(distUrl('mcp-server.js'))
assert.ok(mod, 'module should be importable')
assert.strictEqual(typeof mod.startMcpServer, 'function', 'startMcpServer should be a function')
})
test('startMcpServer accepts the correct argument shape', async () => {
const { startMcpServer } = await import(distUrl('mcp-server.js'))
assert.strictEqual(typeof startMcpServer, 'function')
assert.strictEqual(startMcpServer.length, 1, 'startMcpServer should accept one argument')
})
test('startMcpServer can be called with mock tools', async () => {
const { startMcpServer } = await import(distUrl('mcp-server.js'))
// Create a mock tool matching the McpToolDef interface
const mockTool = {
name: 'test_tool',
description: 'A test tool',
parameters: { type: 'object', properties: {} },
execute: async () => ({
content: [{ type: 'text', text: 'hello' }],
}),
}
// Verify the function can be called with the correct signature
// without throwing during argument validation. It will attempt to
// connect to stdin/stdout as an MCP transport, which won't work in
// a test environment, but the Server instance is created successfully.
assert.doesNotThrow(() => {
void startMcpServer({ tools: [mockTool], version: '0.0.0-test' })
.catch(() => { /* expected: no MCP client on stdin */ })
})
})
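The `file://` conversion that `distUrl` performs can be checked directly; this standalone helper (a sketch, with an assumed name) shows the same pattern using only Node built-ins:

```typescript
import { pathToFileURL } from "node:url";
import { join } from "node:path";

// pathToFileURL yields a file:// URL on every platform, which is what
// Node's ESM loader requires for dynamic import() of absolute paths
// (a bare D:\...\x.js path fails with ERR_UNSUPPORTED_ESM_URL_SCHEME).
function toImportUrl(...segments: string[]): string {
  return pathToFileURL(join(...segments)).href;
}
```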

vscode-extension/.gitignore vendored Normal file

@ -0,0 +1,3 @@
dist/
node_modules/
*.vsix


@ -0,0 +1,9 @@
.vscode/**
.vscode-test/**
src/**
.gitignore
tsconfig.json
**/*.ts
!dist/**
node_modules/**
**/*.map


@ -0,0 +1,11 @@
# Changelog
## [0.1.0]
Initial release.
- Full RPC client — spawns `gsd --mode rpc`, JSON line framing, all 25 RPC commands
- Sidebar dashboard — connection status, model info, thinking level, token usage, cost, quick actions
- Chat participant — `@gsd` in VS Code Chat with streaming responses
- 15 commands with keyboard shortcuts
- Auto-start and auto-compaction configuration

vscode-extension/LICENSE Normal file

@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Lex Christopherson
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,91 @@
# GSD-2 — VS Code Extension
Control the [GSD-2 coding agent](https://github.com/gsd-build/gsd-2) directly from VS Code. Run autonomous coding sessions, chat with `@gsd` in VS Code Chat, and monitor your agent from a sidebar dashboard — all without leaving the editor.
## Requirements
GSD must be installed before activating this extension:
```bash
npm install -g gsd-pi
```
Node.js ≥ 20.6.0 and Git are required.
## Features
### Sidebar Dashboard
Click the GSD icon in the Activity Bar to open the agent dashboard. It shows:
- Connection status (connected / disconnected)
- Active model and provider
- Thinking level
- Token usage and session cost
- Quick action buttons: Start, Stop, New Session, Compact, Abort
### Chat Integration (`@gsd`)
Use `@gsd` in VS Code Chat (`Ctrl+Shift+I`) to send messages to the agent:
```
@gsd refactor the auth module to use JWT
@gsd /gsd auto
@gsd what's the current milestone status?
```
### Commands
All commands are accessible via `Ctrl+Shift+P`:
| Command | Description |
|---------|-------------|
| **GSD: Start Agent** | Connect to the GSD agent |
| **GSD: Stop Agent** | Disconnect the agent |
| **GSD: New Session** | Start a fresh conversation |
| **GSD: Send Message** | Send a message to the agent |
| **GSD: Abort Current Operation** | Interrupt the current operation |
| **GSD: Steer Agent** | Send a steering message mid-operation |
| **GSD: Switch Model** | Pick a model from QuickPick |
| **GSD: Cycle Model** | Rotate to the next configured model |
| **GSD: Set Thinking Level** | Choose off / low / medium / high |
| **GSD: Cycle Thinking Level** | Rotate through thinking levels |
| **GSD: Compact Context** | Manually trigger context compaction |
| **GSD: Export Conversation as HTML** | Save the session as HTML |
| **GSD: Show Session Stats** | Display token usage and cost |
| **GSD: Run Bash Command** | Execute a shell command via the agent |
| **GSD: List Available Commands** | Browse and run GSD slash commands |
### Keyboard Shortcuts
| Shortcut | Command |
|----------|---------|
| `Ctrl+Shift+G Ctrl+Shift+N` | New Session |
| `Ctrl+Shift+G Ctrl+Shift+M` | Cycle Model |
| `Ctrl+Shift+G Ctrl+Shift+T` | Cycle Thinking Level |
## Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| `gsd.binaryPath` | `"gsd"` | Path to the GSD binary if not on PATH |
| `gsd.autoStart` | `false` | Start the agent automatically when the extension activates |
| `gsd.autoCompaction` | `true` | Enable automatic context compaction |
## Quick Start
1. Install GSD: `npm install -g gsd-pi`
2. Install this extension
3. Open a project folder in VS Code
4. `Ctrl+Shift+P` → **GSD: Start Agent**
5. Use `@gsd` in Chat or the sidebar to interact with the agent
## How It Works
The extension spawns `gsd --mode rpc` in the background and communicates over JSON-RPC via stdin/stdout. All 25 RPC commands are supported, including streaming events for real-time sidebar updates.
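The stdio transport described here amounts to newline-delimited JSON framing. The sketch below is an illustrative model of that framing, not the extension's actual `GsdClient` code:

```typescript
// Newline-delimited JSON framing for a stdio RPC transport (illustrative).
function frame(message: object): string {
  return JSON.stringify(message) + "\n";
}

// Accumulate stdout chunks and emit one parsed object per complete line,
// handling messages split across chunk boundaries.
function makeLineParser(onMessage: (msg: unknown) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}
```

Requests are written to the child's stdin with `frame(...)`; responses and streaming events arrive on stdout and are reassembled by the parser even when a frame is split across chunks.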
## Links
- [GSD Documentation](https://github.com/gsd-build/gsd-2/tree/main/docs)
- [Getting Started](https://github.com/gsd-build/gsd-2/blob/main/docs/getting-started.md)
- [Issue Tracker](https://github.com/gsd-build/gsd-2/issues)

vscode-extension/logo.jpg Normal file
Binary file not shown (10 KiB).

vscode-extension/package-lock.json generated Normal file
File diff suppressed because it is too large.

@ -0,0 +1,182 @@
{
"name": "gsd-2",
"displayName": "GSD-2",
"description": "VS Code integration for the GSD-2 coding agent — sidebar dashboard, @gsd chat participant, and 15 commands",
"publisher": "FluxLabs",
"version": "0.1.0",
"icon": "logo.jpg",
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/gsd-build/gsd-2"
},
"homepage": "https://github.com/gsd-build/gsd-2/blob/main/vscode-extension/README.md",
"bugs": {
"url": "https://github.com/gsd-build/gsd-2/issues"
},
"keywords": [
"ai",
"agent",
"coding",
"gsd",
"chat",
"automation",
"claude",
"openai",
"llm"
],
"galleryBanner": {
"color": "#1a1a2e",
"theme": "dark"
},
"engines": {
"vscode": "^1.95.0"
},
"categories": [
"AI",
"Chat"
],
"activationEvents": [
"onStartupFinished"
],
"main": "dist/extension.js",
"contributes": {
"commands": [
{
"command": "gsd.start",
"title": "GSD: Start Agent"
},
{
"command": "gsd.stop",
"title": "GSD: Stop Agent"
},
{
"command": "gsd.newSession",
"title": "GSD: New Session"
},
{
"command": "gsd.sendMessage",
"title": "GSD: Send Message"
},
{
"command": "gsd.cycleModel",
"title": "GSD: Cycle Model"
},
{
"command": "gsd.cycleThinking",
"title": "GSD: Cycle Thinking Level"
},
{
"command": "gsd.compact",
"title": "GSD: Compact Context"
},
{
"command": "gsd.abort",
"title": "GSD: Abort Current Operation"
},
{
"command": "gsd.exportHtml",
"title": "GSD: Export Conversation as HTML"
},
{
"command": "gsd.sessionStats",
"title": "GSD: Show Session Stats"
},
{
"command": "gsd.runBash",
"title": "GSD: Run Bash Command"
},
{
"command": "gsd.switchModel",
"title": "GSD: Switch Model"
},
{
"command": "gsd.setThinking",
"title": "GSD: Set Thinking Level"
},
{
"command": "gsd.steer",
"title": "GSD: Steer Agent"
},
{
"command": "gsd.listCommands",
"title": "GSD: List Available Commands"
}
],
"keybindings": [
{
"command": "gsd.newSession",
"key": "ctrl+shift+g ctrl+shift+n",
"mac": "cmd+shift+g cmd+shift+n"
},
{
"command": "gsd.cycleModel",
"key": "ctrl+shift+g ctrl+shift+m",
"mac": "cmd+shift+g cmd+shift+m"
},
{
"command": "gsd.cycleThinking",
"key": "ctrl+shift+g ctrl+shift+t",
"mac": "cmd+shift+g cmd+shift+t"
}
],
"viewsContainers": {
"activitybar": [
{
"id": "gsd",
"title": "GSD",
"icon": "$(hubot)"
}
]
},
"views": {
"gsd": [
{
"type": "webview",
"id": "gsd-sidebar",
"name": "GSD Agent"
}
]
},
"chatParticipants": [
{
"id": "gsd.agent",
"name": "gsd",
"fullName": "GSD Agent",
"description": "GSD-2 coding agent",
"isSticky": true
}
],
"configuration": {
"title": "GSD",
"properties": {
"gsd.binaryPath": {
"type": "string",
"default": "gsd",
"description": "Path to the GSD binary"
},
"gsd.autoStart": {
"type": "boolean",
"default": false,
"description": "Automatically start the GSD agent when the extension activates"
},
"gsd.autoCompaction": {
"type": "boolean",
"default": true,
"description": "Enable automatic context compaction"
}
}
}
},
"scripts": {
"build": "tsc",
"watch": "tsc --watch",
"package": "vsce package",
"publish": "vsce publish"
},
"devDependencies": {
"@types/vscode": "^1.95.0",
"@vscode/vsce": "^3.7.1",
"typescript": "^5.7.0"
}
}


@ -0,0 +1,283 @@
import * as vscode from "vscode";
import type { AgentEvent, GsdClient } from "./gsd-client.js";
/**
* Registers the @gsd chat participant that forwards messages to the
* GSD RPC client and streams tool execution events back to the chat.
*/
export function registerChatParticipant(
context: vscode.ExtensionContext,
client: GsdClient,
): vscode.Disposable {
const participant = vscode.chat.createChatParticipant("gsd.agent", async (
request: vscode.ChatRequest,
_chatContext: vscode.ChatContext,
response: vscode.ChatResponseStream,
token: vscode.CancellationToken,
) => {
// Auto-start the agent if not connected
if (!client.isConnected) {
response.progress("Starting GSD agent...");
try {
await client.start();
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
response.markdown(`**Failed to start GSD agent:** ${msg}\n\nMake sure \`gsd\` is installed (\`npm install -g gsd-pi\`) and try again.`);
return;
}
}
// Build the full message, injecting any #file references
let message = request.prompt.trim();
if (!message) {
response.markdown("Please provide a message.");
return;
}
const fileContext = await buildFileContext(request);
if (fileContext) {
message = `${fileContext}\n\n${message}`;
}
// Track streaming state
let agentDone = false;
let totalInputTokens = 0;
let totalOutputTokens = 0;
const filesWritten: string[] = [];
const filesRead: string[] = [];
const eventHandler = (event: AgentEvent) => {
switch (event.type) {
case "agent_start":
response.progress("GSD is working...");
break;
case "tool_execution_start": {
const toolName = event.toolName as string;
const toolInput = event.toolInput as Record<string, unknown> | undefined;
const detail = describeToolCall(toolName, toolInput);
response.progress(detail);
// Track file paths for anchors
if (toolInput?.file_path) {
const fp = String(toolInput.file_path);
if (toolName === "Write" || toolName === "Edit") {
if (!filesWritten.includes(fp)) filesWritten.push(fp);
} else if (toolName === "Read") {
if (!filesRead.includes(fp)) filesRead.push(fp);
}
}
break;
}
case "message_update": {
const assistantEvent = event.assistantMessageEvent as Record<string, unknown> | undefined;
if (!assistantEvent) break;
if (assistantEvent.type === "text_delta") {
const delta = assistantEvent.delta as string | undefined;
if (delta) {
response.markdown(delta);
}
} else if (assistantEvent.type === "thinking_delta") {
// Thinking shown inline — prefix with italic so it's visually distinct
const delta = assistantEvent.delta as string | undefined;
if (delta) {
response.markdown(`*${delta}*`);
}
}
break;
}
case "message_end": {
const usage = event.usage as { inputTokens?: number; outputTokens?: number } | undefined;
if (usage) {
if (usage.inputTokens) totalInputTokens += usage.inputTokens;
if (usage.outputTokens) totalOutputTokens += usage.outputTokens;
}
break;
}
case "agent_end":
agentDone = true;
break;
}
};
const subscription = client.onEvent(eventHandler);
token.onCancellationRequested(() => {
client.abort().catch(() => {});
});
try {
await client.sendPrompt(message);
// Wait for agent_end or cancellation
await new Promise<void>((resolve) => {
if (agentDone) {
resolve();
return;
}
const checkDone = client.onEvent((evt) => {
if (evt.type === "agent_end") {
checkDone.dispose();
resolve();
}
});
token.onCancellationRequested(() => {
checkDone.dispose();
resolve();
});
});
// Show clickable file anchors for written files
if (filesWritten.length > 0) {
response.markdown("\n\n**Files changed:**");
for (const fp of filesWritten) {
const uri = resolveFileUri(fp);
if (uri) {
response.anchor(uri, fp);
response.markdown(" ");
}
}
}
// Token usage summary
if (totalInputTokens > 0 || totalOutputTokens > 0) {
response.markdown(
`\n\n---\n*${totalInputTokens.toLocaleString()} in / ${totalOutputTokens.toLocaleString()} out tokens*`,
);
}
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String(err);
response.markdown(`\n**Error:** ${errorMessage}`);
} finally {
subscription.dispose();
}
});
participant.iconPath = new vscode.ThemeIcon("hubot");
// Follow-up suggestions after each response
participant.followupProvider = {
provideFollowups: (_result, _context, _token) => {
return [
{
prompt: "/gsd status",
label: "$(info) Check status",
title: "Check project status",
},
{
prompt: "/gsd auto",
label: "$(rocket) Run auto mode",
title: "Run autonomous mode",
},
{
prompt: "/gsd capture",
label: "$(note) Capture a thought",
title: "Capture a thought mid-session",
},
];
},
};
return participant;
}
// ─── Helpers ─────────────────────────────────────────────────────────────────
/**
* Build a file context block from any #file references in the chat request.
*/
async function buildFileContext(request: vscode.ChatRequest): Promise<string | null> {
if (!request.references || request.references.length === 0) {
return null;
}
const parts: string[] = [];
for (const ref of request.references) {
if (ref.value instanceof vscode.Uri) {
try {
const bytes = await vscode.workspace.fs.readFile(ref.value);
const content = Buffer.from(bytes).toString("utf-8");
const relativePath = vscode.workspace.asRelativePath(ref.value);
parts.push(`File: ${relativePath}\n\`\`\`\n${content}\n\`\`\``);
} catch {
// Skip unreadable files
}
} else if (ref.value instanceof vscode.Location) {
try {
const doc = await vscode.workspace.openTextDocument(ref.value.uri);
const text = doc.getText(ref.value.range);
const relativePath = vscode.workspace.asRelativePath(ref.value.uri);
const { start, end } = ref.value.range;
parts.push(`File: ${relativePath} (lines ${start.line + 1}-${end.line + 1})\n\`\`\`\n${text}\n\`\`\``)
} catch {
// Skip unreadable ranges
}
}
}
return parts.length > 0 ? parts.join("\n\n") : null;
}
/**
* Produce a human-readable progress label for a tool call.
*/
function describeToolCall(toolName: string, input?: Record<string, unknown>): string {
if (!input) {
return `Running: ${toolName}`;
}
switch (toolName) {
case "Read":
return `Reading: ${shortenPath(String(input.file_path ?? ""))}`;
case "Write":
return `Writing: ${shortenPath(String(input.file_path ?? ""))}`;
case "Edit":
return `Editing: ${shortenPath(String(input.file_path ?? ""))}`;
case "Bash": {
const cmd = String(input.command ?? "");
return `$ ${cmd.length > 80 ? cmd.slice(0, 77) + "…" : cmd}`;
}
case "Glob":
return `Searching: ${input.pattern ?? ""}`;
case "Grep":
return `Grep: ${input.pattern ?? ""}`;
case "WebSearch":
return `Searching web: ${String(input.query ?? "").slice(0, 60)}`;
case "WebFetch":
return `Fetching: ${String(input.url ?? "").slice(0, 60)}`;
default:
return `Running: ${toolName}`;
}
}
/**
 * Shorten an absolute path to its last 3 segments for display.
*/
function shortenPath(fp: string): string {
const parts = fp.replace(/\\/g, "/").split("/");
return parts.slice(-3).join("/");
}
/**
* Attempt to resolve a file path string to a VS Code URI.
*/
function resolveFileUri(fp: string): vscode.Uri | null {
try {
const workspaceFolders = vscode.workspace.workspaceFolders;
if (!workspaceFolders || workspaceFolders.length === 0) {
return null;
}
// Absolute path
if (fp.startsWith("/") || /^[A-Za-z]:[\\/]/.test(fp)) {
return vscode.Uri.file(fp);
}
// Relative path — resolve against first workspace folder
return vscode.Uri.joinPath(workspaceFolders[0].uri, fp);
} catch {
return null;
}
}


@ -0,0 +1,359 @@
import * as vscode from "vscode";
import { GsdClient, ThinkingLevel } from "./gsd-client.js";
import { registerChatParticipant } from "./chat-participant.js";
import { GsdSidebarProvider } from "./sidebar.js";
let client: GsdClient | undefined;
let sidebarProvider: GsdSidebarProvider | undefined;
function requireConnected(): boolean {
if (!client?.isConnected) {
vscode.window.showWarningMessage("GSD agent is not running.");
return false;
}
return true;
}
function handleError(err: unknown, context: string): void {
const msg = err instanceof Error ? err.message : String(err);
vscode.window.showErrorMessage(`${context}: ${msg}`);
}
export function activate(context: vscode.ExtensionContext): void {
const config = vscode.workspace.getConfiguration("gsd");
const binaryPath = config.get<string>("binaryPath", "gsd");
const cwd = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath ?? process.cwd();
client = new GsdClient(binaryPath, cwd);
context.subscriptions.push(client);
// Log stderr to an output channel
const outputChannel = vscode.window.createOutputChannel("GSD-2 Agent");
context.subscriptions.push(outputChannel);
client.onError((msg) => {
outputChannel.appendLine(`[stderr] ${msg}`);
});
client.onConnectionChange((connected) => {
if (connected) {
vscode.window.setStatusBarMessage("$(hubot) GSD connected", 3000);
} else {
vscode.window.setStatusBarMessage("$(hubot) GSD disconnected", 3000);
}
});
// -- Sidebar -----------------------------------------------------------
sidebarProvider = new GsdSidebarProvider(context.extensionUri, client);
context.subscriptions.push(
vscode.window.registerWebviewViewProvider(
GsdSidebarProvider.viewId,
sidebarProvider,
),
);
// -- Chat participant ---------------------------------------------------
context.subscriptions.push(registerChatParticipant(context, client));
// -- Commands -----------------------------------------------------------
// Start
context.subscriptions.push(
vscode.commands.registerCommand("gsd.start", async () => {
try {
await client!.start();
// Apply auto-compaction setting
const autoCompaction = vscode.workspace.getConfiguration("gsd").get<boolean>("autoCompaction", true);
await client!.setAutoCompaction(autoCompaction).catch(() => {});
sidebarProvider?.refresh();
vscode.window.showInformationMessage("GSD agent started.");
} catch (err) {
handleError(err, "Failed to start GSD");
}
}),
);
// Stop
context.subscriptions.push(
vscode.commands.registerCommand("gsd.stop", async () => {
await client!.stop();
sidebarProvider?.refresh();
vscode.window.showInformationMessage("GSD agent stopped.");
}),
);
// New Session
context.subscriptions.push(
vscode.commands.registerCommand("gsd.newSession", async () => {
if (!requireConnected()) return;
try {
await client!.newSession();
sidebarProvider?.refresh();
vscode.window.showInformationMessage("New GSD session started.");
} catch (err) {
handleError(err, "Failed to start new session");
}
}),
);
// Send Message
context.subscriptions.push(
vscode.commands.registerCommand("gsd.sendMessage", async () => {
if (!requireConnected()) return;
const message = await vscode.window.showInputBox({
prompt: "Enter message for GSD",
placeHolder: "What should I do?",
});
if (!message) return;
try {
await client!.sendPrompt(message);
} catch (err) {
handleError(err, "Failed to send message");
}
}),
);
// Abort
context.subscriptions.push(
vscode.commands.registerCommand("gsd.abort", async () => {
if (!requireConnected()) return;
try {
await client!.abort();
vscode.window.showInformationMessage("Operation aborted.");
} catch (err) {
handleError(err, "Failed to abort");
}
}),
);
// Cycle Model
context.subscriptions.push(
vscode.commands.registerCommand("gsd.cycleModel", async () => {
if (!requireConnected()) return;
try {
const result = await client!.cycleModel();
if (result) {
vscode.window.showInformationMessage(
`Model: ${result.model.provider}/${result.model.id} (thinking: ${result.thinkingLevel})`,
);
} else {
vscode.window.showInformationMessage("No other models available.");
}
sidebarProvider?.refresh();
} catch (err) {
handleError(err, "Failed to cycle model");
}
}),
);
// Switch Model (QuickPick)
context.subscriptions.push(
vscode.commands.registerCommand("gsd.switchModel", async () => {
if (!requireConnected()) return;
try {
const models = await client!.getAvailableModels();
if (models.length === 0) {
vscode.window.showInformationMessage("No models available.");
return;
}
const items = models.map((m) => ({
label: `${m.provider}/${m.id}`,
description: m.contextWindow ? `${Math.round(m.contextWindow / 1000)}k context` : undefined,
provider: m.provider,
modelId: m.id,
}));
const selected = await vscode.window.showQuickPick(items, {
placeHolder: "Select a model",
});
if (!selected) return;
await client!.setModel(selected.provider, selected.modelId);
vscode.window.showInformationMessage(`Model set to ${selected.label}`);
sidebarProvider?.refresh();
} catch (err) {
handleError(err, "Failed to switch model");
}
}),
);
// Cycle Thinking Level
context.subscriptions.push(
vscode.commands.registerCommand("gsd.cycleThinking", async () => {
if (!requireConnected()) return;
try {
const result = await client!.cycleThinkingLevel();
if (result) {
vscode.window.showInformationMessage(`Thinking level: ${result.level}`);
} else {
vscode.window.showInformationMessage("Cannot change thinking level for this model.");
}
sidebarProvider?.refresh();
} catch (err) {
handleError(err, "Failed to cycle thinking level");
}
}),
);
// Set Thinking Level (QuickPick)
context.subscriptions.push(
vscode.commands.registerCommand("gsd.setThinking", async () => {
if (!requireConnected()) return;
const levels: ThinkingLevel[] = ["off", "low", "medium", "high"];
const selected = await vscode.window.showQuickPick(levels, {
placeHolder: "Select thinking level",
});
if (!selected) return;
try {
await client!.setThinkingLevel(selected as ThinkingLevel);
vscode.window.showInformationMessage(`Thinking level set to ${selected}`);
sidebarProvider?.refresh();
} catch (err) {
handleError(err, "Failed to set thinking level");
}
}),
);
// Compact Context
context.subscriptions.push(
vscode.commands.registerCommand("gsd.compact", async () => {
if (!requireConnected()) return;
try {
await client!.compact();
vscode.window.showInformationMessage("Context compacted.");
sidebarProvider?.refresh();
} catch (err) {
handleError(err, "Failed to compact context");
}
}),
);
// Export HTML
context.subscriptions.push(
vscode.commands.registerCommand("gsd.exportHtml", async () => {
if (!requireConnected()) return;
try {
const saveUri = await vscode.window.showSaveDialog({
defaultUri: vscode.Uri.file("gsd-conversation.html"),
filters: { "HTML Files": ["html"] },
});
if (!saveUri) return; // user cancelled the save dialog
const result = await client!.exportHtml(saveUri.fsPath);
vscode.window.showInformationMessage(`Conversation exported to ${result.path}`);
} catch (err) {
handleError(err, "Failed to export HTML");
}
}),
);
// Session Stats
context.subscriptions.push(
vscode.commands.registerCommand("gsd.sessionStats", async () => {
if (!requireConnected()) return;
try {
const stats = await client!.getSessionStats();
const lines: string[] = [];
if (stats.inputTokens !== undefined) lines.push(`Input tokens: ${stats.inputTokens.toLocaleString()}`);
if (stats.outputTokens !== undefined) lines.push(`Output tokens: ${stats.outputTokens.toLocaleString()}`);
if (stats.cacheReadTokens !== undefined) lines.push(`Cache read: ${stats.cacheReadTokens.toLocaleString()}`);
if (stats.cacheWriteTokens !== undefined) lines.push(`Cache write: ${stats.cacheWriteTokens.toLocaleString()}`);
if (stats.totalCost !== undefined) lines.push(`Cost: $${stats.totalCost.toFixed(4)}`);
if (stats.turnCount !== undefined) lines.push(`Turns: ${stats.turnCount}`);
if (stats.messageCount !== undefined) lines.push(`Messages: ${stats.messageCount}`);
if (stats.duration !== undefined) lines.push(`Duration: ${Math.round(stats.duration / 1000)}s`);
vscode.window.showInformationMessage(
lines.length > 0 ? lines.join(" | ") : "No stats available.",
);
} catch (err) {
handleError(err, "Failed to get session stats");
}
}),
);
// Run Bash Command
context.subscriptions.push(
vscode.commands.registerCommand("gsd.runBash", async () => {
if (!requireConnected()) return;
const command = await vscode.window.showInputBox({
prompt: "Enter bash command to execute",
placeHolder: "ls -la",
});
if (!command) return;
try {
const result = await client!.runBash(command);
outputChannel.appendLine(`[bash] $ ${command}`);
if (result.stdout) outputChannel.appendLine(result.stdout);
if (result.stderr) outputChannel.appendLine(`[stderr] ${result.stderr}`);
outputChannel.appendLine(`[exit code: ${result.exitCode}]`);
outputChannel.show(true);
if (result.exitCode === 0) {
vscode.window.showInformationMessage("Bash command completed successfully.");
} else {
vscode.window.showWarningMessage(`Bash command exited with code ${result.exitCode}`);
}
} catch (err) {
handleError(err, "Failed to run bash command");
}
}),
);
// Steer Agent
context.subscriptions.push(
vscode.commands.registerCommand("gsd.steer", async () => {
if (!requireConnected()) return;
const message = await vscode.window.showInputBox({
prompt: "Enter steering message (interrupts current operation)",
placeHolder: "Focus on the error handling instead",
});
if (!message) return;
try {
await client!.steer(message);
} catch (err) {
handleError(err, "Failed to steer agent");
}
}),
);
// List Available Commands
context.subscriptions.push(
vscode.commands.registerCommand("gsd.listCommands", async () => {
if (!requireConnected()) return;
try {
const commands = await client!.getCommands();
if (commands.length === 0) {
vscode.window.showInformationMessage("No slash commands available.");
return;
}
const items = commands.map((cmd) => ({
label: `/${cmd.name}`,
description: cmd.description ?? "",
detail: `Source: ${cmd.source}${cmd.location ? ` (${cmd.location})` : ""}`,
}));
const selected = await vscode.window.showQuickPick(items, {
placeHolder: "Available slash commands",
});
if (selected) {
// Send the selected command as a prompt
await client!.sendPrompt(selected.label);
}
} catch (err) {
handleError(err, "Failed to list commands");
}
}),
);
// -- Auto-start ---------------------------------------------------------
if (config.get<boolean>("autoStart", false)) {
vscode.commands.executeCommand("gsd.start");
}
}
export function deactivate(): void {
client?.dispose();
sidebarProvider?.dispose();
client = undefined;
sidebarProvider = undefined;
}


@@ -0,0 +1,520 @@
import { ChildProcess, spawn } from "node:child_process";
import * as vscode from "vscode";
/**
* Mirrors the RPC command/response protocol from the GSD agent.
* These types are intentionally kept minimal and self-contained so the
* extension has no dependency on the agent packages at runtime.
*/
export type ThinkingLevel = "off" | "low" | "medium" | "high";
export interface RpcSessionState {
model?: { provider: string; id: string; contextWindow?: number };
thinkingLevel: ThinkingLevel;
isStreaming: boolean;
isCompacting: boolean;
steeringMode: "all" | "one-at-a-time";
followUpMode: "all" | "one-at-a-time";
sessionFile?: string;
sessionId: string;
sessionName?: string;
autoCompactionEnabled: boolean;
messageCount: number;
pendingMessageCount: number;
}
export interface ModelInfo {
provider: string;
id: string;
contextWindow?: number;
reasoning?: boolean;
}
export interface SessionStats {
inputTokens?: number;
outputTokens?: number;
cacheReadTokens?: number;
cacheWriteTokens?: number;
totalCost?: number;
messageCount?: number;
turnCount?: number;
duration?: number;
}
export interface BashResult {
stdout: string;
stderr: string;
exitCode: number | null;
}
export interface SlashCommand {
name: string;
description?: string;
source: "extension" | "prompt" | "skill";
location?: "user" | "project" | "path";
path?: string;
}
export interface RpcResponse {
id?: string;
type: "response";
command: string;
success: boolean;
data?: unknown;
error?: string;
}
export interface AgentEvent {
type: string;
[key: string]: unknown;
}
type PendingRequest = {
resolve: (response: RpcResponse) => void;
reject: (error: Error) => void;
timer: ReturnType<typeof setTimeout>;
};
/**
* Client that spawns `gsd --mode rpc` and communicates via JSON lines
* over stdin/stdout. Emits VS Code events for streaming responses.
*/
export class GsdClient implements vscode.Disposable {
private process: ChildProcess | null = null;
private pendingRequests = new Map<string, PendingRequest>();
private requestId = 0;
private buffer = "";
private restartCount = 0;
private restartTimestamps: number[] = [];
private readonly _onEvent = new vscode.EventEmitter<AgentEvent>();
readonly onEvent = this._onEvent.event;
private readonly _onConnectionChange = new vscode.EventEmitter<boolean>();
readonly onConnectionChange = this._onConnectionChange.event;
private readonly _onError = new vscode.EventEmitter<string>();
readonly onError = this._onError.event;
private disposables: vscode.Disposable[] = [];
constructor(
private readonly binaryPath: string,
private readonly cwd: string,
) {
this.disposables.push(this._onEvent, this._onConnectionChange, this._onError);
}
get isConnected(): boolean {
return this.process !== null && this.process.exitCode === null;
}
/**
* Spawn the GSD agent in RPC mode.
*/
async start(): Promise<void> {
if (this.process) {
return;
}
this.process = spawn(this.binaryPath, ["--mode", "rpc", "--no-session"], {
cwd: this.cwd,
stdio: ["pipe", "pipe", "pipe"],
env: { ...process.env },
});
this.buffer = "";
this.process.stdout?.on("data", (chunk: Buffer) => {
this.buffer += chunk.toString("utf8");
this.drainBuffer();
});
this.process.stderr?.on("data", (chunk: Buffer) => {
const text = chunk.toString("utf8").trim();
if (text) {
this._onError.fire(text);
}
});
this.process.on("exit", (code, signal) => {
this.process = null;
this.rejectAllPending(`GSD process exited (code=${code}, signal=${signal})`);
this._onConnectionChange.fire(false);
if (code !== 0 && signal !== "SIGTERM") {
const now = Date.now();
this.restartTimestamps.push(now);
// Keep only timestamps within the last 60 seconds
this.restartTimestamps = this.restartTimestamps.filter(t => now - t < 60_000);
if (this.restartTimestamps.length > 3) {
// Too many crashes within 60s — stop retrying
this._onError.fire(
`GSD process crashed ${this.restartTimestamps.length} times within 60s. Not restarting. Use "GSD: Start Agent" to retry manually.`,
);
} else if (this.restartCount < 3) {
this.restartCount++;
setTimeout(() => this.start(), 1000 * this.restartCount);
}
}
});
this._onConnectionChange.fire(true);
this.restartCount = 0;
}
/**
* Stop the GSD agent process.
*/
async stop(): Promise<void> {
if (!this.process) {
return;
}
const proc = this.process;
this.process = null;
proc.kill("SIGTERM");
await new Promise<void>((resolve) => {
const timeout = setTimeout(() => {
proc.kill("SIGKILL");
resolve();
}, 2000);
proc.on("exit", () => {
clearTimeout(timeout);
resolve();
});
});
this.rejectAllPending("Client stopped");
this._onConnectionChange.fire(false);
}
// =========================================================================
// Prompting
// =========================================================================
/**
* Send a prompt message to the agent.
* Returns once the command is acknowledged; streaming events follow via onEvent.
*/
async sendPrompt(message: string): Promise<void> {
const response = await this.send({ type: "prompt", message });
this.assertSuccess(response);
}
/**
* Interrupt the agent with a steering message while it is streaming.
*/
async steer(message: string): Promise<void> {
const response = await this.send({ type: "steer", message });
this.assertSuccess(response);
}
/**
* Send a follow-up message after the agent has completed.
*/
async followUp(message: string): Promise<void> {
const response = await this.send({ type: "follow_up", message });
this.assertSuccess(response);
}
/**
* Abort current operation.
*/
async abort(): Promise<void> {
const response = await this.send({ type: "abort" });
this.assertSuccess(response);
}
// =========================================================================
// State
// =========================================================================
/**
* Get current session state.
*/
async getState(): Promise<RpcSessionState> {
const response = await this.send({ type: "get_state" });
this.assertSuccess(response);
return response.data as RpcSessionState;
}
// =========================================================================
// Model
// =========================================================================
/**
* Set the active model.
*/
async setModel(provider: string, modelId: string): Promise<void> {
const response = await this.send({ type: "set_model", provider, modelId });
this.assertSuccess(response);
}
/**
* Get available models.
*/
async getAvailableModels(): Promise<ModelInfo[]> {
const response = await this.send({ type: "get_available_models" });
this.assertSuccess(response);
return (response.data as { models: ModelInfo[] }).models;
}
/**
* Cycle through available models.
*/
async cycleModel(): Promise<{ model: ModelInfo; thinkingLevel: ThinkingLevel; isScoped: boolean } | null> {
const response = await this.send({ type: "cycle_model" });
this.assertSuccess(response);
return response.data as { model: ModelInfo; thinkingLevel: ThinkingLevel; isScoped: boolean } | null;
}
// =========================================================================
// Thinking
// =========================================================================
/**
* Set the thinking level explicitly.
*/
async setThinkingLevel(level: ThinkingLevel): Promise<void> {
const response = await this.send({ type: "set_thinking_level", level });
this.assertSuccess(response);
}
/**
* Cycle through thinking levels (off -> low -> medium -> high -> off).
*/
async cycleThinkingLevel(): Promise<{ level: ThinkingLevel } | null> {
const response = await this.send({ type: "cycle_thinking_level" });
this.assertSuccess(response);
return response.data as { level: ThinkingLevel } | null;
}
// =========================================================================
// Compaction
// =========================================================================
/**
* Manually compact the conversation context.
*/
async compact(customInstructions?: string): Promise<unknown> {
const cmd: Record<string, unknown> = { type: "compact" };
if (customInstructions) {
cmd.customInstructions = customInstructions;
}
const response = await this.send(cmd);
this.assertSuccess(response);
return response.data;
}
/**
* Enable or disable automatic compaction.
*/
async setAutoCompaction(enabled: boolean): Promise<void> {
const response = await this.send({ type: "set_auto_compaction", enabled });
this.assertSuccess(response);
}
// =========================================================================
// Retry
// =========================================================================
/**
* Enable or disable automatic retry on failure.
*/
async setAutoRetry(enabled: boolean): Promise<void> {
const response = await this.send({ type: "set_auto_retry", enabled });
this.assertSuccess(response);
}
/**
* Abort a pending retry.
*/
async abortRetry(): Promise<void> {
const response = await this.send({ type: "abort_retry" });
this.assertSuccess(response);
}
// =========================================================================
// Bash
// =========================================================================
/**
* Execute a bash command via the agent.
*/
async runBash(command: string): Promise<BashResult> {
const response = await this.send({ type: "bash", command });
this.assertSuccess(response);
return response.data as BashResult;
}
/**
* Abort a running bash command.
*/
async abortBash(): Promise<void> {
const response = await this.send({ type: "abort_bash" });
this.assertSuccess(response);
}
// =========================================================================
// Session
// =========================================================================
/**
* Start a new session.
*/
async newSession(): Promise<void> {
const response = await this.send({ type: "new_session" });
this.assertSuccess(response);
}
/**
* Get session statistics (token counts, cost, etc.).
*/
async getSessionStats(): Promise<SessionStats> {
const response = await this.send({ type: "get_session_stats" });
this.assertSuccess(response);
return response.data as SessionStats;
}
/**
* Export the conversation as HTML.
*/
async exportHtml(outputPath?: string): Promise<{ path: string }> {
const cmd: Record<string, unknown> = { type: "export_html" };
if (outputPath) {
cmd.outputPath = outputPath;
}
const response = await this.send(cmd);
this.assertSuccess(response);
return response.data as { path: string };
}
/**
* Switch to a different session file.
*/
async switchSession(sessionPath: string): Promise<void> {
const response = await this.send({ type: "switch_session", sessionPath });
this.assertSuccess(response);
}
/**
* Set the display name for the current session.
*/
async setSessionName(name: string): Promise<void> {
const response = await this.send({ type: "set_session_name", name });
this.assertSuccess(response);
}
/**
* Get all conversation messages.
*/
async getMessages(): Promise<unknown[]> {
const response = await this.send({ type: "get_messages" });
this.assertSuccess(response);
return (response.data as { messages: unknown[] }).messages;
}
/**
* Get the text of the last assistant response.
*/
async getLastAssistantText(): Promise<string | null> {
const response = await this.send({ type: "get_last_assistant_text" });
this.assertSuccess(response);
return (response.data as { text: string | null }).text;
}
/**
* List available slash commands.
*/
async getCommands(): Promise<SlashCommand[]> {
const response = await this.send({ type: "get_commands" });
this.assertSuccess(response);
return (response.data as { commands: SlashCommand[] }).commands;
}
dispose(): void {
void this.stop(); // fire-and-forget; stop() resolves once the process exits
for (const d of this.disposables) {
d.dispose();
}
}
// -- Private helpers ------------------------------------------------------
private drainBuffer(): void {
while (true) {
const newlineIdx = this.buffer.indexOf("\n");
if (newlineIdx === -1) {
break;
}
let line = this.buffer.slice(0, newlineIdx);
this.buffer = this.buffer.slice(newlineIdx + 1);
if (line.endsWith("\r")) {
line = line.slice(0, -1);
}
if (!line) {
continue;
}
this.handleLine(line);
}
}
private handleLine(line: string): void {
let data: Record<string, unknown>;
try {
data = JSON.parse(line);
} catch {
return; // ignore non-JSON lines
}
// Response to a pending request
if (data.type === "response" && typeof data.id === "string" && this.pendingRequests.has(data.id)) {
const pending = this.pendingRequests.get(data.id)!;
this.pendingRequests.delete(data.id);
clearTimeout(pending.timer);
pending.resolve(data as unknown as RpcResponse);
return;
}
// Streaming event
this._onEvent.fire(data as AgentEvent);
}
private send(command: Record<string, unknown>): Promise<RpcResponse> {
if (!this.process?.stdin) {
return Promise.reject(new Error("GSD client not started"));
}
const id = `req_${++this.requestId}`;
const fullCommand = { ...command, id };
return new Promise<RpcResponse>((resolve, reject) => {
const timer = setTimeout(() => {
this.pendingRequests.delete(id);
reject(new Error(`Timeout waiting for response to ${command.type}`));
}, 30_000);
this.pendingRequests.set(id, { resolve, reject, timer });
this.process!.stdin!.write(JSON.stringify(fullCommand) + "\n");
});
}
private assertSuccess(response: RpcResponse): void {
if (!response.success) {
throw new Error(response.error ?? "Unknown RPC error");
}
}
private rejectAllPending(reason: string): void {
for (const [, pending] of this.pendingRequests) {
clearTimeout(pending.timer);
pending.reject(new Error(reason));
}
this.pendingRequests.clear();
}
}
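The newline-delimited JSON framing that `drainBuffer` and `handleLine` implement can be sketched in isolation. A minimal standalone splitter (the `splitJsonLines` helper is hypothetical, not part of the extension), assuming the same rules as above: CRLF line endings tolerated, blank lines skipped, non-JSON lines ignored:

```typescript
// Minimal sketch of the newline-delimited JSON framing used by GsdClient:
// accumulate stdout chunks, emit one parsed object per complete line, and
// keep the trailing partial line buffered for the next chunk.
function splitJsonLines(
  buffer: string,
  chunk: string,
): { messages: unknown[]; rest: string } {
  let data = buffer + chunk;
  const messages: unknown[] = [];
  let idx: number;
  while ((idx = data.indexOf("\n")) !== -1) {
    let line = data.slice(0, idx);
    data = data.slice(idx + 1);
    if (line.endsWith("\r")) line = line.slice(0, -1); // tolerate CRLF
    if (!line) continue; // skip blank lines
    try {
      messages.push(JSON.parse(line));
    } catch {
      // ignore non-JSON lines, as handleLine does
    }
  }
  return { messages, rest: data };
}
```

The partial-line buffering matters because stdout `data` events deliver arbitrary chunk boundaries: a single JSON message may arrive split across two chunks, or several messages may arrive in one.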


@@ -0,0 +1,445 @@
import * as vscode from "vscode";
import type { GsdClient, SessionStats, ThinkingLevel } from "./gsd-client.js";
/**
* WebviewViewProvider that renders a sidebar panel showing connection status,
* model info, thinking level, token usage, cost, and quick action controls.
*/
export class GsdSidebarProvider implements vscode.WebviewViewProvider {
public static readonly viewId = "gsd-sidebar";
private view?: vscode.WebviewView;
private disposables: vscode.Disposable[] = [];
private refreshTimer: ReturnType<typeof setInterval> | undefined;
constructor(
private readonly extensionUri: vscode.Uri,
private readonly client: GsdClient,
) {
this.disposables.push(
client.onConnectionChange(() => this.refresh()),
client.onEvent((evt) => {
// Refresh on streaming state changes
if (evt.type === "agent_start" || evt.type === "agent_end") {
this.refresh();
}
}),
);
}
resolveWebviewView(
webviewView: vscode.WebviewView,
_context: vscode.WebviewViewResolveContext,
_token: vscode.CancellationToken,
): void {
this.view = webviewView;
webviewView.webview.options = {
enableScripts: true,
};
webviewView.webview.onDidReceiveMessage(async (msg: { command: string; value?: string }) => {
switch (msg.command) {
case "start":
await vscode.commands.executeCommand("gsd.start");
break;
case "stop":
await vscode.commands.executeCommand("gsd.stop");
break;
case "newSession":
await vscode.commands.executeCommand("gsd.newSession");
break;
case "cycleModel":
await vscode.commands.executeCommand("gsd.cycleModel");
break;
case "cycleThinking":
await vscode.commands.executeCommand("gsd.cycleThinking");
break;
case "switchModel":
await vscode.commands.executeCommand("gsd.switchModel");
break;
case "setThinking":
await vscode.commands.executeCommand("gsd.setThinking");
break;
case "compact":
await vscode.commands.executeCommand("gsd.compact");
break;
case "abort":
await vscode.commands.executeCommand("gsd.abort");
break;
case "exportHtml":
await vscode.commands.executeCommand("gsd.exportHtml");
break;
case "sessionStats":
await vscode.commands.executeCommand("gsd.sessionStats");
break;
case "listCommands":
await vscode.commands.executeCommand("gsd.listCommands");
break;
case "toggleAutoCompaction":
if (this.client.isConnected) {
const state = await this.client.getState().catch(() => null);
if (state) {
await this.client.setAutoCompaction(!state.autoCompactionEnabled).catch(() => {});
this.refresh();
}
}
break;
}
});
// Periodic refresh while connected (for token stats)
this.refreshTimer = setInterval(() => {
if (this.client.isConnected) {
this.refresh();
}
}, 10_000);
this.refresh();
}
async refresh(): Promise<void> {
if (!this.view) {
return;
}
let modelName = "N/A";
let sessionId = "N/A";
let sessionName = "";
let messageCount = 0;
let thinkingLevel: ThinkingLevel = "off";
let isStreaming = false;
let isCompacting = false;
let autoCompaction = false;
let stats: SessionStats | null = null;
if (this.client.isConnected) {
try {
const state = await this.client.getState();
modelName = state.model
? `${state.model.provider}/${state.model.id}`
: "Not set";
sessionId = state.sessionId;
sessionName = state.sessionName ?? "";
messageCount = state.messageCount;
thinkingLevel = state.thinkingLevel as ThinkingLevel;
isStreaming = state.isStreaming;
isCompacting = state.isCompacting;
autoCompaction = state.autoCompactionEnabled;
} catch {
// State fetch failed, show defaults
}
try {
stats = await this.client.getSessionStats();
} catch {
// Stats fetch failed
}
}
const connected = this.client.isConnected;
this.view.webview.html = this.getHtml({
connected,
modelName,
sessionId,
sessionName,
messageCount,
thinkingLevel,
isStreaming,
isCompacting,
autoCompaction,
stats,
});
}
dispose(): void {
if (this.refreshTimer) {
clearInterval(this.refreshTimer);
}
for (const d of this.disposables) {
d.dispose();
}
}
private getHtml(info: {
connected: boolean;
modelName: string;
sessionId: string;
sessionName: string;
messageCount: number;
thinkingLevel: ThinkingLevel;
isStreaming: boolean;
isCompacting: boolean;
autoCompaction: boolean;
stats: SessionStats | null;
}): string {
const statusColor = info.connected ? "#4ec9b0" : "#f44747";
const statusText = info.connected
? info.isStreaming
? "Processing..."
: info.isCompacting
? "Compacting..."
: "Connected"
: "Disconnected";
const inputTokens = info.stats?.inputTokens?.toLocaleString() ?? "-";
const outputTokens = info.stats?.outputTokens?.toLocaleString() ?? "-";
const cost = info.stats?.totalCost !== undefined ? `$${info.stats.totalCost.toFixed(4)}` : "-";
const thinkingBadge = info.thinkingLevel !== "off"
? `<span class="badge">${info.thinkingLevel}</span>`
: `<span class="badge muted">off</span>`;
const autoCompBadge = info.autoCompaction
? `<span class="badge">on</span>`
: `<span class="badge muted">off</span>`;
const streamingIndicator = info.isStreaming
? `<div class="streaming-indicator"><span class="spinner"></span> Agent is working...</div>`
: "";
const nonce = getNonce();
return /* html */ `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Content-Security-Policy" content="default-src 'none'; style-src 'unsafe-inline'; script-src 'nonce-${nonce}';">
<style>
body {
font-family: var(--vscode-font-family);
font-size: var(--vscode-font-size);
color: var(--vscode-foreground);
padding: 12px;
margin: 0;
}
.status-row {
display: flex;
align-items: center;
gap: 8px;
margin-bottom: 12px;
}
.status-dot {
width: 10px;
height: 10px;
border-radius: 50%;
background: ${statusColor};
flex-shrink: 0;
}
.streaming-indicator {
display: flex;
align-items: center;
gap: 8px;
padding: 6px 10px;
margin-bottom: 12px;
background: var(--vscode-editor-background);
border-radius: 4px;
border: 1px solid var(--vscode-focusBorder);
font-size: 12px;
}
.spinner {
width: 12px;
height: 12px;
border: 2px solid var(--vscode-foreground);
border-top-color: transparent;
border-radius: 50%;
animation: spin 0.8s linear infinite;
}
@keyframes spin {
to { transform: rotate(360deg); }
}
.section {
margin-bottom: 14px;
}
.section-title {
font-size: 11px;
text-transform: uppercase;
opacity: 0.6;
margin-bottom: 6px;
letter-spacing: 0.5px;
}
.info-table {
width: 100%;
}
.info-table td {
padding: 3px 0;
vertical-align: middle;
}
.info-table td:first-child {
opacity: 0.7;
padding-right: 12px;
white-space: nowrap;
}
.info-table td:last-child {
word-break: break-all;
}
.badge {
display: inline-block;
padding: 1px 6px;
border-radius: 3px;
font-size: 11px;
background: var(--vscode-badge-background);
color: var(--vscode-badge-foreground);
}
.badge.muted {
opacity: 0.5;
}
.badge.clickable {
cursor: pointer;
}
.badge.clickable:hover {
opacity: 0.8;
}
.btn-group {
display: flex;
flex-direction: column;
gap: 6px;
}
.btn-row {
display: flex;
gap: 6px;
}
.btn-row button {
flex: 1;
}
button {
display: block;
width: 100%;
padding: 6px 14px;
border: none;
border-radius: 2px;
cursor: pointer;
font-size: var(--vscode-font-size);
color: var(--vscode-button-foreground);
background: var(--vscode-button-background);
}
button:hover {
background: var(--vscode-button-hoverBackground);
}
button.secondary {
color: var(--vscode-button-secondaryForeground);
background: var(--vscode-button-secondaryBackground);
}
button.secondary:hover {
background: var(--vscode-button-secondaryHoverBackground);
}
.token-stats {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 4px 12px;
font-size: 12px;
}
.token-stats .label {
opacity: 0.7;
}
.token-stats .value {
text-align: right;
font-variant-numeric: tabular-nums;
}
</style>
</head>
<body>
<div class="status-row">
<div class="status-dot"></div>
<strong>${statusText}</strong>
</div>
${streamingIndicator}
<div class="section">
<div class="section-title">Session</div>
<table class="info-table">
<tr><td>Model</td><td>${escapeHtml(info.modelName)}</td></tr>
<tr><td>Session</td><td>${escapeHtml(info.sessionName || info.sessionId)}</td></tr>
<tr><td>Messages</td><td>${info.messageCount}</td></tr>
<tr>
<td>Thinking</td>
<td>${thinkingBadge}</td>
</tr>
<tr>
<td>Auto-compact</td>
<td>${autoCompBadge}</td>
</tr>
</table>
</div>
${info.connected && info.stats ? `
<div class="section">
<div class="section-title">Token Usage</div>
<div class="token-stats">
<span class="label">Input</span>
<span class="value">${inputTokens}</span>
<span class="label">Output</span>
<span class="value">${outputTokens}</span>
<span class="label">Cost</span>
<span class="value">${cost}</span>
</div>
</div>
` : ""}
<div class="section">
<div class="section-title">Controls</div>
<div class="btn-group">
${info.connected
? `<button data-command="stop">Stop Agent</button>
<div class="btn-row">
<button class="secondary" data-command="newSession">New Session</button>
<button class="secondary" data-command="switchModel">Model</button>
</div>
<div class="btn-row">
<button class="secondary" data-command="cycleThinking">Thinking</button>
<button class="secondary" data-command="toggleAutoCompaction">Auto-Compact</button>
</div>`
: `<button data-command="start">Start Agent</button>`
}
</div>
</div>
${info.connected ? `
<div class="section">
<div class="section-title">Actions</div>
<div class="btn-group">
<div class="btn-row">
<button class="secondary" data-command="compact">Compact</button>
<button class="secondary" data-command="exportHtml">Export</button>
</div>
<div class="btn-row">
<button class="secondary" data-command="abort">Abort</button>
<button class="secondary" data-command="listCommands">Commands</button>
</div>
</div>
</div>
` : ""}
<script nonce="${nonce}">
const vscode = acquireVsCodeApi();
document.addEventListener('click', (e) => {
const btn = e.target.closest('[data-command]');
if (btn) {
vscode.postMessage({ command: btn.dataset.command });
}
});
</script>
</body>
</html>`;
}
}
function escapeHtml(text: string): string {
return text
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;");
}
function getNonce(): string {
const chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
let nonce = "";
for (let i = 0; i < 32; i++) {
nonce += chars.charAt(Math.floor(Math.random() * chars.length));
}
return nonce;
}


@@ -0,0 +1,19 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"lib": ["ES2022"],
"outDir": "dist",
"rootDir": "src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}