47 commits

Author SHA1 Message Date
Mikael Hugo
725affd126 feat(self-feedback): purpose_anchor on entries (ADR-0000 restoration, v71)
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.

  - Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
    NULL allowed for legacy rows; fresh-DB CREATE includes the column.
  - sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
    (camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
    pushes a LIKE %fragment% filter into the DB layer so triage doesn't
    have to pull the full table.
  - rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
    projection for legacy rows where the column is NULL.
  - headless-feedback CLI: `feedback add --purpose <fragment>` persists
    the anchor; `feedback list --purpose <fragment>` filters by it.
    Omission stays valid — restoration is additive, not breaking.
  - help-text + migration test updated; new vitest covers add/list
    round-trip, NULL-on-omit legacy compat, substring match, and the
    help-text documentation contract.
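
The substring filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in sf-db-self-feedback.js: `FeedbackRow`, `toLikePattern`, and `filterByPurpose` are hypothetical names, and both the wildcard escaping and the case-insensitive match are assumptions the commit does not state.

```typescript
// Hypothetical row shape; the real rowToSelfFeedback projection is not shown in this log.
interface FeedbackRow {
  id: string;
  purposeAnchor: string | null; // null models a legacy row with no anchor
}

// Build the value bound to a `purpose_anchor LIKE :pattern` clause.
// Escaping % and _ is an assumption; the commit only says "LIKE %fragment%".
function toLikePattern(fragment: string): string {
  const escaped = fragment.replace(/[\\%_]/g, (ch) => "\\" + ch);
  return "%" + escaped + "%";
}

// In-memory equivalent of the filter, handy for reasoning about results.
// Case-insensitive matching is assumed here, not confirmed by the source.
function filterByPurpose(rows: FeedbackRow[], fragment: string): FeedbackRow[] {
  const needle = fragment.toLowerCase();
  return rows.filter(
    (row) =>
      row.purposeAnchor !== null &&
      row.purposeAnchor.toLowerCase().indexOf(needle) !== -1
  );
}
```

Legacy rows with a NULL anchor never match a fragment in this sketch, which is consistent with omission staying valid for inserts while filtered listing only returns anchored rows.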

Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:51:52 +02:00
Mikael Hugo
aa0d57371e feat(headless): enforce ADR-0000 PDD-fields gate at new-milestone
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) to be the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.

- new src/resources/extensions/sf/headless-pdd-check.js: scans
  context for the eight fields (heading and inline-label forms) and
  reports missing/sparse, plus a minimum-spine check (Purpose +
  Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
  bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
  PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
  is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
  sparse, inline-label form, Falsifier-as-Evidence spine, and the
  doctrine field order.
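
A field scan of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the contents of headless-pdd-check.js: the function names, the `minChars` sparseness threshold, and the exact heading and inline-label patterns are all hypothetical, and the minimum-spine check is left out.

```typescript
// The eight fields, in the order the commit message lists them.
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];

interface PddCheckResult {
  missing: string[];
  sparse: string[];
  ok: boolean;
}

// Return the text attached to a field, or null when the field is absent.
function findFieldBody(context: string, field: string): string | null {
  const lines = context.split(/\r?\n/);
  const wanted = field.toLowerCase();
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    // Heading form: "## Purpose" followed by body lines up to the next heading.
    const heading = /^#{1,6}\s+(.+?)\s*$/.exec(line);
    if (heading && heading[1].toLowerCase() === wanted) {
      const body: string[] = [];
      for (let j = i + 1; j < lines.length && !/^#{1,6}\s/.test(lines[j].trim()); j++) {
        body.push(lines[j]);
      }
      return body.join("\n").trim();
    }
    // Inline-label form: "Purpose: some text".
    const colon = line.indexOf(":");
    if (colon > 0 && line.slice(0, colon).trim().toLowerCase() === wanted) {
      return line.slice(colon + 1).trim();
    }
  }
  return null;
}

// minChars is a made-up threshold for "too thin"; the real rule is not stated.
function checkPddFields(context: string, minChars = 20): PddCheckResult {
  const missing: string[] = [];
  const sparse: string[] = [];
  for (const field of PDD_FIELDS) {
    const body = findFieldBody(context, field);
    if (body === null) missing.push(field);
    else if (body.length < minChars) sparse.push(field);
  }
  return { missing, sparse, ok: missing.length === 0 && sparse.length === 0 };
}
```

Returning the missing and sparse lists separately is what lets a refusal message name exactly which fields the operator needs to fix.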

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:37:06 +02:00
Mikael Hugo
cf32e79578 feat(memory-embeddings): read SF_LLM_GATEWAY_KEY from env as auth.json fallback
Enables CI and containerised deployments without writing secrets to disk.
Auth.json still takes precedence when present.

- readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var
- SF_LLM_GATEWAY_URL env var also supported for endpoint override
- Added tests for env fallback, auth.json preference, and default URL
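
The precedence rule can be sketched as below. This is a hedged illustration, not the body of readGatewayFromAuthJson: `resolveGateway`, the object shapes, and `PLACEHOLDER_DEFAULT_URL` are hypothetical, the real default URL is not given in the commit, and how SF_LLM_GATEWAY_URL interacts with a URL stored in auth.json is an assumption.

```typescript
// Environment is passed in explicitly so the sketch stays pure and testable.
type EnvSketch = { [name: string]: string | undefined };

interface AuthJsonGateway {
  key?: string;
  url?: string;
}

interface ResolvedGateway {
  key: string;
  url: string;
  source: "auth.json" | "env";
}

// Placeholder only; the actual default endpoint is not stated in the commit.
const PLACEHOLDER_DEFAULT_URL = "https://gateway.invalid/v1";

function resolveGateway(
  auth: AuthJsonGateway | null,
  env: EnvSketch
): ResolvedGateway | null {
  // Assumption: the env URL applies whenever auth.json does not carry its own URL.
  const url = env["SF_LLM_GATEWAY_URL"] || PLACEHOLDER_DEFAULT_URL;
  // auth.json wins when it holds a key.
  if (auth !== null && auth.key) {
    return { key: auth.key, url: auth.url || url, source: "auth.json" };
  }
  // Otherwise fall back to the environment variable.
  const envKey = env["SF_LLM_GATEWAY_KEY"];
  if (envKey) return { key: envKey, url: url, source: "env" };
  return null;
}
```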

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:13:40 +02:00
Mikael Hugo
d8f56e6704 feat(cli): add sf key subcommand for auth.json management
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.

  sf key set    <provider> <api-key>   add/rotate stored key
  sf key get    <provider>             show masked key (last 4 chars)
  sf key remove <provider> [--yes]     remove credential
  sf key list                          list all providers + status

Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.
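
The "last 4 chars only" masking could be as small as the sketch below. `maskKey` is a hypothetical helper, and both the `*` mask character and the choice to fully mask very short keys are assumptions, not behavior confirmed by the commit.

```typescript
// Show only the trailing `visible` characters of a secret.
// Keys no longer than `visible` are masked entirely (an assumption).
function maskKey(apiKey: string, visible = 4): string {
  if (apiKey.length <= visible) return "*".repeat(apiKey.length);
  return "*".repeat(apiKey.length - visible) + apiKey.slice(-visible);
}
```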

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:04 +02:00
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
18aa257ede refactor: rename review gate agent
2026-05-14 19:43:01 +02:00
Mikael Hugo
12f5eb2279 feat(triage): wire --apply CLI + canonical resolve_issue evidence kinds
Three coupled changes that together complete the operator-facing
--apply surface for sf headless triage:

1. headless.ts: parse --apply from commandArgs and forward to
   handleTriage. The triage option flow now distinguishes inspect
   (--list, --json), one-shot (--run), and orchestrated apply
   (--apply) cleanly.

2. help-text.ts: triage subcommand line + examples block now document
   the --apply mode (triage-decider → rubber-duck pipeline).

3. bootstrap/db-tools.js: resolve_issue tool now accepts the full
   canonical evidence-kind set instead of hardcoding "agent-fix":
   - agent-fix (default; commit-based fix evidence)
   - human-clear (stale, superseded, false positive, intentional close)
   - promoted-to-requirement (with required requirement_id)
   The tool surfaces a clear error when promoted-to-requirement is
   used without requirement_id. The promptGuidelines are updated to walk
   callers through choosing the right kind.

   self-feedback-db.test.mjs extended with coverage for all three
   evidence kinds + the missing-requirement_id rejection path.
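
The evidence-kind rules from item 3 can be sketched as a small validator. This is illustrative only: `validateEvidence`, the argument shape, and the error wording are hypothetical and do not reproduce the resolve_issue tool in bootstrap/db-tools.js.

```typescript
type EvidenceKind = "agent-fix" | "human-clear" | "promoted-to-requirement";

const EVIDENCE_KINDS: EvidenceKind[] = [
  "agent-fix",
  "human-clear",
  "promoted-to-requirement",
];

// Hypothetical argument shape for the sketch.
interface ResolveArgs {
  kind?: string;
  requirementId?: string;
}

type EvidenceValidation =
  | { ok: true; kind: EvidenceKind; requirementId: string | null }
  | { ok: false; error: string };

function validateEvidence(args: ResolveArgs): EvidenceValidation {
  // agent-fix is the default kind when none is supplied.
  const requested = args.kind === undefined ? "agent-fix" : args.kind;
  const kind = EVIDENCE_KINDS.find((k) => k === requested);
  if (kind === undefined) {
    return { ok: false, error: "unknown evidence kind: " + requested };
  }
  // promoted-to-requirement must carry the requirement it was promoted to.
  if (kind === "promoted-to-requirement" && !args.requirementId) {
    return { ok: false, error: "promoted-to-requirement requires requirement_id" };
  }
  return { ok: true, kind: kind, requirementId: args.requirementId || null };
}
```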

Together these make sf headless triage --apply genuinely useful: the
agent can produce a plan with any outcome, rubber-duck reviews it,
and the runner applies via resolve_issue with the right evidence
kind per decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:23:10 +02:00
Mikael Hugo
001740680b feat(headless,auto): surface self-feedback queue at autonomous-loop idle
Two thin slices toward sf-mp4rxkwb-l4baga:

1. Help text. The triage and reflect commands have shipped over the
   last few commits but neither was discoverable via `sf headless help`.
   Add both to the command list + add five usage examples covering the
   piping and --run patterns.

2. Bail-time queue notifier. When the autonomous loop is about to break
   for "no-active-milestone" or "milestone-complete" while open
   self-feedback entries still exist, surface the queue with a clear
   pointer to `sf headless triage --list` / `--run`. Best-effort wrapper
   that never throws — the proper fix (triage as a real unit type with
   begin/dispatch/checkpoint/complete lifecycle) is the larger remaining
   slice of the parent entry; this just makes the queue VISIBLE at the
   exact moment operators historically lost track of it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 07:44:34 +02:00
Mikael Hugo
62b19d7ba4 feat(reflection): wire LLM dispatch (sf headless reflect --run)
Phase 1B of the reflection layer: complete the operator-driven loop by
adding actual LLM dispatch. Phase 1A (commit e161a59e2) shipped the
corpus assembler + prompt template + the prompt-emit operator surface.
This commit wires the dispatch end so `sf headless reflect --run`
produces a real report on disk without manual model piping.

Why shell out to the gemini CLI rather than SF's provider abstraction:
reflection is a single-prompt one-shot inference. Going through SF's
full agent dispatch would require a session, model registry, tool
registration, recovery shell — overkill for "render this prompt,
capture text." The gemini CLI handles auth (~/.gemini/oauth_creds.json),
Code Assist project discovery, and protocol drift on its behalf.
Subprocess cost is paid once per reflection (rare).

Implementation:

- reflection.js: runGeminiReflection(prompt, options) spawns
  `gemini --yolo --model <model> -p "<directive>"` and pipes the giant
  rendered template via stdin (gemini -p reads stdin and appends).
  Returns { ok, content, cleanFinish, exitCode, error, stderr }; never
  throws. Defaults to gemini-3-pro-preview (0% used on AI Ultra,
  strongest agentic model with quota). 8-minute timeout.

  cleanFinish detected by REFLECTION_COMPLETE terminator (emitted by
  the prompt template's output contract) — operator gets a warning when
  the report is truncated.

- headless-reflect.ts: --run flag triggers dispatch + report write
  via writeReflectionReport. --model overrides the default. Errors
  surface as JSON or text per --json. Successful runs emit the report
  path on stdout; failures emit error + truncated stderr.

- help-text.ts: documents --run and --model flags.

- Tests (4 new, 13 total): use a fake `gemini` binary on PATH to
  exercise the spawn path without real OAuth/network — covers
  ok+cleanFinish, non-zero exit, hang/timeout, missing-terminator.
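
The terminator-based clean-finish check mentioned above might be sketched like this. `detectCleanFinish` is a hypothetical helper; requiring the terminator to be the last non-whitespace text and stripping it from the returned content are assumptions rather than documented behavior of reflection.js.

```typescript
const REFLECTION_TERMINATOR = "REFLECTION_COMPLETE";

interface FinishCheck {
  content: string;
  cleanFinish: boolean;
}

// A report counts as cleanly finished only when the terminator is the
// final non-whitespace text of the model output (assumed rule).
function detectCleanFinish(output: string): FinishCheck {
  const trimmed = output.replace(/\s+$/, "");
  if (trimmed.endsWith(REFLECTION_TERMINATOR)) {
    const body = trimmed.slice(0, trimmed.length - REFLECTION_TERMINATOR.length);
    return { content: body.replace(/\s+$/, ""), cleanFinish: true };
  }
  // No terminator: treat the report as possibly truncated.
  return { content: trimmed, cleanFinish: false };
}
```

A caller can then warn the operator whenever `cleanFinish` is false, while still writing whatever content was captured.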

All 1538 SF extension tests pass; typecheck clean.

Phase 2 follow-up (still gated on sf-mp4rxkwb-l4baga
triage-not-a-first-class-unit-type landing): reflection-pass becomes a
real autonomous-loop unit type, milestone-close auto-triggers it, the
report's `Recommended new self-feedback entries` section gets parsed
and the entries auto-filed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:33:16 +02:00
Mikael Hugo
e161a59e2f feat(reflection): add Phase 1A reflection layer (corpus + prompt + sf headless reflect)
Addresses self-feedback entry sf-mp4uzvcd-pazg6v
(architecture-defect:no-reflection-layer-over-self-feedback-corpus): SF
detected symptoms and triaged individual entries but had no layer that
reasoned about the corpus to recognize recurring structural patterns.
The same architectural pressure expressed itself across multiple entries
with different exact-kind strings; nothing escalated the pattern to a
class. The cognitive work fell on the operator.

This commit ships Phase 1A — the data-assembly + prompt half of the
reflection layer + an operator-driven entry point. Phase 1B (LLM dispatch
via the autonomous loop as a real unit type) lands once
sf-mp4rxkwb-l4baga (triage-not-a-first-class-unit-type) is in.

Files:
- src/resources/extensions/sf/reflection.js (new)
  - assembleReflectionCorpus(basePath): bundles open + recent-resolved
    self-feedback (full json), last 50 commits via git log, milestone +
    slice + task state, all milestone validation verdicts, and prior
    reflection report into one struct. Returns null on prerequisite
    failure (DB closed) so callers downgrade gracefully.
  - renderReflectionCorpusBrief(corpus): renders the corpus into a
    markdown brief the LLM consumes in one turn.
  - writeReflectionReport(basePath, content): persists to
    .sf/reflection/<timestamp>-report.md so next pass detects "what
    changed since last reflection."

- src/resources/extensions/sf/prompts/reflection-pass.md (new)
  - {{include:working-directory}} prefix.
  - Reasoning order: cluster by structural shape (not exact kind),
    identify recurring patterns, identify commit/ledger gaps, identify
    stale validation drift, identify the deepest architectural concern,
    compare against prior report.
  - Output contract: structured markdown report with named sections,
    terminator REFLECTION_COMPLETE for clean-finish detection.
  - Constraints: don't fix anything (reflection layer not executor),
    don't resolve entries without commit-SHA evidence, don't invent IDs.

- src/headless-reflect.ts (new) — sf headless reflect [--json]
  - Pre-opens the project DB via auto-start.openProjectDbIfPresent
    (one-shot bypass path doesn't run the full SF agent bootstrap).
  - Default: emits the rendered prompt brief (template + corpus) for
    operators to pipe into any model. Lets the corpus-assembly layer
    ship and validate before the LLM-dispatch layer is wired.
  - --json: emits raw corpus snapshot for tooling.

- src/headless.ts: registers the new "reflect" command after the
  existing usage block.
- src/help-text.ts: documents it in the headless command list.

- src/resources/extensions/sf/tests/reflection.test.mjs (new, 9 tests):
  null-when-DB-closed; collects open + recent-resolved; excludes >30d
  resolutions; captures milestone/slice/task tree; captures validation
  verdicts; commits returned as array (best-effort tmpdir is ok); brief
  renders all major sections; entry IDs/severity/kind appear in brief;
  writeReflectionReport round-trips through assembleReflectionCorpus's
  previousReport read.

Live smoke verified: sf headless reflect against the real .sf/sf.db
returns 15 open + 23 recent-resolved entries, 50 commits, 2 milestones,
1 validation file (correctly surfacing M001's stale needs-attention
verdict against actual 5/5 slices done — exactly the case that
motivated this layer).

Total: +848 LOC, full SF extension suite (1534 tests) passes,
typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:27:29 +02:00
Mikael Hugo
383e495085 feat(headless,gemini-cli): add sf headless usage + unify gemini quota path
Adds a machine-readable headless surface for live LLM-provider usage and
unifies the gemini-cli quota fetch through one helper, removing the
duplication that existed between usage-bar.js and the new package.

1. snapshotGeminiCliAccount in @singularity-forge/google-gemini-cli-provider

   - Single source of truth for { projectId, userTierId, userTierName,
     paidTier, models[] } via setupUser + retrieveUserQuota.
   - Dedups buckets per modelId, keeping the worst (lowest remainingFraction)
     so consumers always see the most-restrictive window. Code Assist
     sometimes returns multiple buckets per model; the pessimistic choice
     is what every consumer needs.
   - discoverGeminiCliModels(cwd?) wraps it for catalog-cache callers that
     only need the IDs.

2. sf headless usage subcommand

   - New src/headless-usage.ts handler. text (default) and --json output.
     Uses the package's snapshot directly — no RPC child, no jiti
     gymnastics — matching the shape of headless-uok-status / headless-doctor.
   - Wired into src/headless.ts after the doctor block.
   - Help text adds the command line.

3. usage-bar.js refactored to delegate

   - fetchGeminiUsage no longer imports gemini-cli-core directly. It calls
     snapshotGeminiCliAccount and reshapes the result into the existing
     { provider, displayName, windows[] } UI contract.
   - Eliminates the duplicate setupUser + retrieveUserQuota code path.
   - The fast existsSync(~/.gemini/oauth_creds.json) pre-flight stays
     so unauth'd users get a friendly message without paying for OAuth
     bootstrap.

4. Model registry refactor (separate track committed alongside)

   - src/resources/extensions/sf/model-registry.ts (new) consolidates
     canonical model identity, capability tier, and generation tags into
     one source of truth that auto-model-selection, benchmark-selector,
     and model-router now consume instead of maintaining parallel maps.
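
The per-model bucket deduplication described in item 1 can be sketched as follows. Only `modelId` and `remainingFraction` come from the commit text; `QuotaBucket` and `keepWorstBuckets` are hypothetical names, and the real snapshotGeminiCliAccount presumably carries more fields.

```typescript
interface QuotaBucket {
  modelId: string;
  remainingFraction: number; // 0 means exhausted, 1 means untouched
}

// Keep one bucket per model: the one with the least quota remaining,
// so consumers see the most restrictive window.
function keepWorstBuckets(buckets: QuotaBucket[]): QuotaBucket[] {
  const worst = new Map<string, QuotaBucket>();
  for (const bucket of buckets) {
    const seen = worst.get(bucket.modelId);
    if (seen === undefined || bucket.remainingFraction < seen.remainingFraction) {
      worst.set(bucket.modelId, bucket);
    }
  }
  return Array.from(worst.values());
}
```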

All 1487 tests pass (151 files); typecheck clean for both the package
and the SF extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:42:53 +02:00
Mikael Hugo
797db16ae8 feat(sf): S03/T04 — add UOK gate health to sf headless status uok
Adds a new `sf headless status uok` subcommand that queries
gate-run stats and circuit-breaker state from sf.db and formats
them as a markdown table or JSON (--json flag).

- src/headless-uok-status.ts: handler that loads sf-db-gates
  directly (avoids the unimported getDistinctGateIds in gate-runner)
- src/headless.ts: bypass RPC, route 'status uok' to handler
- src/help-text.ts: document the new subcommand
- tests/headless-uok-status.test.mjs: coverage via 19 node:test tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 18:31:03 +02:00
Mikael Hugo
378ab702e1 feat(sf): streamline uok state and direct modes
2026-05-08 05:51:06 +02:00
Mikael Hugo
6fc054e7c3 sf snapshot: uncommitted changes after 49m inactivity
2026-05-08 01:07:24 +02:00
Mikael Hugo
89677b7e9b sf snapshot: uncommitted changes after 110m inactivity
2026-05-08 00:17:47 +02:00
Mikael Hugo
e35cc3c6b8 docs: align schedule and package state wording
2026-05-07 03:36:56 +02:00
Mikael Hugo
76b218762b fix: harden sf autonomous runtime
2026-05-06 06:02:46 +02:00
Mikael Hugo
a1fd6cfc05 fix: separate headless transport from autonomous mode
2026-05-06 02:24:15 +02:00
Mikael Hugo
ab6cad4c84 fix: clean provider surfaces and core build
2026-05-05 16:31:53 +02:00
Mikael Hugo
4c98cb8c33 fix: make autonomous mode canonical
2026-05-05 15:42:10 +02:00
Mikael Hugo
f11c877224 style: format repository with biome
2026-05-05 14:31:16 +02:00
Mikael Hugo
47c806d733 fix: version sf extension runtime sources
2026-05-04 23:27:20 +02:00
Mikael Hugo
12e7333f1c feat: stabilize autonomous workflow system
2026-05-01 20:18:50 +02:00
Mikael Hugo
cd69e85608 Harden SF model routing and harness contracts
2026-04-30 07:41:24 +02:00
Mikael Hugo
b24f426f2b batch: snapshot of in-flight v2 work
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.

Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
  trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries

The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.

The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:42:31 +02:00
Mikael Hugo
f98a1e360e batch: codex-rescue session output (multiple in-flight tasks)
Combined output of multiple parallel codex-rescue runs that produced
working-tree edits but didn't commit. Tasks contributing:

- prefs: per-provider model allow-list (provider_model_allow) — manual
- TUI scroll + unresponsive (a7884d1a / bt3fpn4y2)
- planningMeeting required (aa09e904 / br127l763)
- Logs UX 4-pack (a5c65314 / btcplhu7f)
- Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp)
- sf_task_complete atomic + retry (a7a079b4 / b20cy5owv)
- Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98)
- Per-role slice prompts (a94c3e1a)
- Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl)
- Schema sweep (ac994b1e / task-moifq7pu-83coqz)
- Flow audit (ad26ecfd / bttj4vrqm)

Typecheck passes. Tests not run as a full suite — spot-check after merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 11:52:42 +02:00
ace-pm
6b0ac484ba refactor: update log prefixes and string values from gsd- to sf- namespace
Updates channel prefixes, log messages, comments, and configuration values
across daemon, mcp-server, and related packages to complete the rebrand from
gsd to sf-run naming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:37:12 +02:00
ace-pm
35dc87ef53 chore: sync workspace state after rebrand
- Rebrand commits already in history (gsd → forge)
- Sync pre-existing doc, docker, and CI config updates
- All rebrand artifacts verified in place:
  * Native crates: forge-engine, forge-ast, forge-grep
  * Log prefixes: [forge] across 22+ files
  * Binary: ~/bin/sf-run
  * Workspace scopes: @sf-run/*, @singularity-forge/*
  * Nix flake: Rust toolchain ready

System ready for: nix develop && bun run build:native

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:54:20 +02:00
Nils Reeh
15bccca78f feat(graph): implement knowledge graph system (closes #4202)
Ports the v1 graphify system to v2 as a native TypeScript implementation.
The knowledge graph builds semantic relationships between milestones, slices,
tasks, and knowledge entries — and injects relevant subgraphs automatically
into every agent dispatch prompt.

## Core implementation (packages/mcp-server/src/readers/graph.ts)

- `buildGraph(projectDir)` — walks all .gsd/ artifacts (STATE.md,
  milestone PLANs, slice PLANs, KNOWLEDGE.md), extracts nodes and edges
  with confidence tiers (EXTRACTED / INFERRED / AMBIGUOUS). Parse errors
  skip the node rather than crashing.
- `writeGraph(gsdRoot, graph)` — atomic write via tmp file + rename.
- `writeSnapshot(gsdRoot)` — saves a diff baseline before each rebuild.
- `graphQuery(projectDir, term, budget?)` — BFS subgraph search with
  case-insensitive matching on label + description; trims AMBIGUOUS edges
  first, then INFERRED, respecting the token budget (default 4 000).
- `graphStatus(projectDir)` — freshness check; stale = older than 24 h.
- `graphDiff(projectDir)` — compares current graph to last snapshot,
  returns added / removed / changed counts for nodes and edges.
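
The budget trimming order used by `graphQuery` (AMBIGUOUS edges first, then INFERRED) can be illustrated with the sketch below. It is not the code in graph.ts: `GraphEdge`, `trimToBudget`, and the flat `TOKENS_PER_EDGE` cost model are hypothetical, and the BFS that selects the subgraph is omitted.

```typescript
type Confidence = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

interface GraphEdge {
  from: string;
  to: string;
  confidence: Confidence;
}

// Crude cost model (assumption): every edge costs the same number of tokens.
const TOKENS_PER_EDGE = 25;

// Drop the least trusted edges until the estimate fits the budget.
// EXTRACTED edges are never dropped in this sketch.
function trimToBudget(edges: GraphEdge[], budget = 4000): GraphEdge[] {
  const kept = edges.slice();
  for (const tier of ["AMBIGUOUS", "INFERRED"] as const) {
    // Walk backwards so removing an element does not skip its neighbour.
    for (let i = kept.length - 1; i >= 0 && kept.length * TOKENS_PER_EDGE > budget; i--) {
      if (kept[i].confidence === tier) kept.splice(i, 1);
    }
  }
  return kept;
}
```

Trimming by confidence tier rather than by position means a tight budget degrades the least certain relationships first and leaves directly extracted facts intact.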

## MCP tool (packages/mcp-server/src/server.ts)

Registers `gsd_graph` immediately after `gsd_knowledge` with four modes:
build | query | status | diff. All errors returned as isError: true.

## CLI subcommand (src/cli.ts, src/help-text.ts)

`gsd graph build|status|query <term>|diff` — follows the established
`if (cliFlags.messages[0] === '...')` dispatch pattern. Uses
`resolveGsdRoot()` for git-root-aware path resolution (not a naive
`.gsd` append). Help text updated with correct positional argument format.

## Auto-rebuild after slice completion
(src/resources/extensions/gsd/tools/complete-slice.ts)

Fire-and-forget `buildGraph → writeGraph` triggered after every slice
completion. Uses `@gsd-build/mcp-server` package import (not a relative
src path) and `resolveGsdRoot()` for correct path resolution in monorepos.

## Graph-aware dispatch injection
(src/resources/extensions/gsd/graph-context.ts,
 src/resources/extensions/gsd/auto-prompts.ts)

`inlineGraphSubgraph(projectDir, term, { budget })` queries the graph and
formats the result as a `### Knowledge Graph Context` markdown block,
consistent with all other inlined context blocks. Adds a stale warning
annotation when the graph is older than 24 h. Returns null (graceful
skip) when graph.json is missing, the query returns zero nodes, or the
import fails — no agent dispatch is ever blocked by graph availability.

Injected into three prompt builders:
- `buildResearchSlicePrompt` — 3 000 token budget
- `buildPlanSlicePrompt`     — 3 000 token budget
- `buildExecuteTaskPrompt`   — 2 000 token budget

## Tests

- 22 tests for the core graph reader (graph.test.ts)
- 14 tests for the dispatch injection helper (graph-context.test.ts)
- All tests use real on-disk fixtures (no module mocking needed)
- Full suite: 6 318 passed, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 02:20:49 +02:00
Claude
0ed576ac00 Make model selection model-agnostic
Remove hard-coded Anthropic/Claude defaults and silent provider swaps so
the app honors whatever model/provider the user has configured.

- src/cli.ts: drop the anthropic->claude-code auto-migration blocks that
  were rewriting the user's saved defaultProvider on every startup.
- packages/pi-coding-agent/src/core/model-resolver.ts: delete the
  defaultModelPerProvider table, drop the "recommended variant" swap
  that silently upgraded e.g. claude-opus-4-6 to -extended, and replace
  the provider-iteration first-available fallback with provider-sticky
  (user's saved provider first, then first registry entry).
- src/startup-model-validation.ts: replace the openai/anthropic-first
  fallback chain with Pi-default -> same-provider -> first-available.
- src/help-text.ts: use a generic provider/model-id example for --model
  instead of claude-opus-4-6.
- src/tests/startup-model-validation.test.ts: update the fallback test
  to assert provider stickiness rather than a specific Claude model id.
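
The provider-sticky fallback (saved provider first, then the first registry entry) could be sketched as below. `RegistryModel` and `pickFallbackModel` are hypothetical names, and the real model-resolver.ts works against a richer registry than a flat array.

```typescript
interface RegistryModel {
  provider: string;
  id: string;
}

// Prefer any model from the user's saved provider; only when none exists
// fall back to the first registry entry. No vendor is hard-coded.
function pickFallbackModel(
  registry: RegistryModel[],
  savedProvider: string | null
): RegistryModel | null {
  if (savedProvider !== null) {
    const sticky = registry.find((model) => model.provider === savedProvider);
    if (sticky !== undefined) return sticky;
  }
  return registry.length > 0 ? registry[0] : null;
}
```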

https://claude.ai/code/session_01CvuUuzuVjRcQN25263nG6V
2026-04-13 14:03:35 +00:00
Tom Boucher
9b6ff01471 docs: add provider setup guide for third-party LLM providers (#3294)
* docs: add provider setup guide and improve onboarding hints

Fixes #2161

Add docs/providers.md with step-by-step setup instructions for every
supported LLM provider: OpenRouter, Ollama, LM Studio, vLLM, SGLang,
and all built-in providers. Includes env var names, example configs,
common pitfalls, and verification steps.

Improve onboarding wizard:
- Add URL hints to provider selection list
- Show common local endpoints when choosing Custom (OpenAI-compatible)
- Add post-setup guidance for OpenRouter and custom endpoints
- Reference docs/providers.md for compat troubleshooting

Update cross-references in getting-started.md, troubleshooting.md,
docs/README.md, and help-text.ts to link to the new guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: verify config help mentions OpenRouter, Ollama, and docs/providers.md

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: trek-e <trek-e@users.noreply.github.com>
2026-04-05 00:48:19 -04:00
Tom Boucher
05b7cb95cb fix: route gsd auto to headless runner to prevent hang on piped stdin/stdout (#3057)
`gsd auto` was not handled as a subcommand — it fell through to the
interactive TUI, which hangs indefinitely when stdin/stdout are piped
(non-TTY). Add `auto` as a recognized subcommand that rewrites argv
and delegates to `runHeadless(parseHeadlessArgs(...))`, matching the
existing `gsd headless auto` behavior.

Also adds `gsd auto` to TTY error hints and help text.

Closes #2732

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:44:04 -06:00
Lex Christopherson
d355ab93fb test: Added --output-format text|json|stream-json flag, standardized ex…
- "src/headless-types.ts"
- "src/headless-events.ts"
- "src/headless.ts"
- "src/help-text.ts"
- "src/tests/headless-cli-surface.test.ts"

GSD-Task: S02/T01
2026-03-26 11:34:21 -06:00
Jay The Reaper
68902466ac fix(core): address PR review feedback for non-apikey provider support (#2452)
- Strip apiKey from options at streamSimple registration boundary for
  externalCli/none providers — enforced structurally, not by convention
- Add registration-time validation: externalCli/none requires streamSimple,
  rejects contradictory apiKey, improved error messages mentioning authMode
- Cache legacy hook module imports to prevent side-effect double-execution
- Add isReady() trust boundary documentation
- Add inline comments on compaction-orchestrator apiKey flow
- Refactor package-commands.test.ts to use t.after() cleanup
- Add lifecycle-hooks.test.ts with 24 unit tests for readManifestRuntimeDeps,
  collectRuntimeDependencies, verifyRuntimeDependencies, resolveLocalSourcePath
- Expand model-registry-auth-mode.test.ts with streamSimple apiKey boundary
  tests and registration validation tests (80 total tests across all files)
- Add afterRemove deleted-directory edge case test
- Fix help-text.ts wording: "lifecycle hooks" → "post-install validation"
- Fix event.message null check documentation (intentional tightening)
2026-03-25 08:45:20 -06:00
Jay The Reaper
bc278d12d9 feat(core): support for 'non-api-key' provider extensions like Claude Code CLI (#2382)
* feat(core): add generic native post-install hooks for package install

* feat(core): add before/after install/remove lifecycle hooks

* refactor(core): remove postInstall alias from lifecycle hook fallback

* feat(core): complete authMode support for keyless providers

The initial authMode implementation fixed model-registry, sdk, and
fallback-resolver but missed agent-session.ts (6 callsites) and
compaction-orchestrator.ts (2 callsites) that block externalCli
providers at runtime.

Architecture: separate readiness gating from credential retrieval.
- isProviderRequestReady(): authMode-aware readiness check
- getApiKey()/getApiKeyForProvider(): return undefined for
  externalCli/none providers instead of triggering auth errors
- All 8 callsites in agent-session and compaction-orchestrator
  now gate on readiness, not key presence
- Downstream signatures (compaction, branch-summarization) accept
  apiKey: string | undefined
- Replaced hardcoded ollama exception in discoverModels with
  isProviderRequestReady

Zero behavioral change for classic apiKey/oauth providers.

* feat(core): add isReady callback for provider readiness verification

Extensions can now provide an isReady() callback when registering any
provider. isProviderRequestReady() calls it before default auth checks,
allowing providers to verify actual reachability (CLI authenticated,
API key valid, service online) rather than relying solely on credential
presence.
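
The split between readiness gating and credential retrieval can be sketched as follows. The four auth modes come from the commit; everything else is hypothetical. In particular, the synchronous `isReady` signature and the choice to treat its result as final are assumptions, since the commit only says the callback runs before the default auth checks.

```typescript
type AuthMode = "apiKey" | "oauth" | "externalCli" | "none";

// Hypothetical provider shape for the sketch.
interface ProviderSketch {
  authMode: AuthMode;
  credential?: string;     // API key or OAuth token, when one is stored
  isReady?: () => boolean; // optional probe; synchronous here for brevity
}

function isKeyless(provider: ProviderSketch): boolean {
  return provider.authMode === "externalCli" || provider.authMode === "none";
}

// Credential retrieval: keyless providers yield undefined instead of an error.
function getApiKeySketch(provider: ProviderSketch): string | undefined {
  return isKeyless(provider) ? undefined : provider.credential;
}

// Readiness gating: callers check this rather than key presence.
function isProviderRequestReadySketch(provider: ProviderSketch): boolean {
  // Assumption: when a probe exists, its answer decides.
  if (provider.isReady !== undefined) return provider.isReady();
  if (isKeyless(provider)) return true;
  return Boolean(provider.credential);
}
```

Because call sites gate on readiness, a signature such as `apiKey: string | undefined` downstream no longer blocks providers that authenticate through an external CLI.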

* test(core): expand authMode test coverage

Cover all four auth modes (apiKey, oauth, externalCli, none),
isReady callback behavior, getProviderAuthMode defaults,
isProviderRequestReady for each mode, getAvailable filtering,
and getApiKey early-return for keyless providers.

* chore: remove provider-api-bridge files from this branch

These files implement GSD core → provider-api wiring (deps + tool
registry) and belong in a separate PR. Reverts register-extension.ts
to upstream state.
2026-03-24 15:50:12 -06:00
TÂCHES
35cee7b05f feat: add -w/--worktree CLI flag for isolated worktree sessions (#1247)
* feat: add -w/--worktree CLI flag to start in an isolated worktree

Enables `gsd -w` to auto-create a randomly-named worktree (adjective-verbing-noun
pattern) and `gsd -w my-feature` for named worktrees. Reuses existing worktree
infrastructure under .gsd/worktrees/ with worktree/<name> branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: full worktree lifecycle — subcommands, auto-commit on exit, status banners

Major improvements to the -w/--worktree system:

- `gsd worktree list` — show worktrees with status (files changed, commits, dirty)
- `gsd worktree merge [name]` — squash-merge into main and clean up
- `gsd worktree clean` — remove all merged/empty worktrees
- `gsd worktree remove <name>` — remove with --force safety gate
- `gsd -w` (no name) resumes the only active worktree instead of creating a new one
- `gsd -w` with multiple active worktrees shows a picker
- Auto-commit dirty work on session exit (session_shutdown hook)
- Status banner on normal `gsd` launch when unmerged worktrees exist
- Full help text with lifecycle documentation (`gsd worktree --help`)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:57:25 -06:00
Juan Francisco Lebrero
fe0f4f35e6 feat: add --events flag for JSONL stream filtering (#1000)
Allow orchestrators to filter the JSONL event stream to specific event
types, reducing stdout noise. The filter applies only to output —
internal processing (completion detection, supervised mode, answer
injection) is unaffected.

- New `--events <types>` flag (comma-separated, implies `--json`)
- Filter applied at stdout write point, all events still processed internally
- Updated help-text and SKILL.md with examples
- Tests for argument parsing and filter matching logic
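
A rough sketch of filtering at the write point, assuming each event is an object with a `type` field. The function names and the event shape are hypothetical, not taken from the implementation.

```typescript
// Hypothetical event shape: any object carrying a string `type`.
interface StreamEvent {
  type: string;
  [key: string]: unknown;
}

// Parse the comma-separated value of `--events` into a set of type names.
function parseEventsFlag(value: string): Set<string> {
  return new Set(
    value
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
}

// Process every event internally, but serialize only the ones that
// pass the filter. A null filter means "emit everything".
function writeFiltered(
  events: StreamEvent[],
  filter: Set<string> | null,
  handle: (event: StreamEvent) => void,
): string[] {
  const lines: string[] = [];
  for (const event of events) {
    handle(event); // completion detection etc. still sees every event
    if (filter === null || filter.has(event.type)) {
      lines.push(JSON.stringify(event)); // one JSONL line per event
    }
  }
  return lines;
}
```

Keeping the filter next to the write call, rather than at the point where events are received, is what leaves internal processing unaffected.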
2026-03-17 17:35:44 -06:00
Juan Francisco Lebrero
8df6f7b75a feat: add --answers flag for headless answer injection (#982)
Pre-supply answers and secrets for non-interactive headless runs via a
declarative JSON file. Two main use cases:

1. Provide secrets that today get lost in headless mode (secure_env_collect
   returns null in RPC mode). Secrets are injected as env vars into the
   RPC child process.
2. Override default auto-responses when the first option isn't desired.

Uses two-phase correlation: observe tool_execution_start events for
question metadata, then match extension_ui_request events by title to
look up pre-supplied answers. Out-of-order events are buffered with a
500ms timeout.

Coexists with --supervised: injector tries first, then supervised mode,
then auto-responder.
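
The fallback order can be sketched as a chain of responders that each either answer or pass. All names here are hypothetical, and the event correlation and 500ms buffering are left out; the sketch covers only the precedence between injector, supervised mode, and auto-responder.

```typescript
// A responder returns an answer, or undefined to defer to the next one.
type Responder = (title: string, options: string[]) => string | undefined;

// Injector: looks up a pre-supplied answer by question title.
function fromAnswers(answers: Map<string, string>): Responder {
  return (title) => answers.get(title);
}

// Supervised mode: defers unless a supervisor callback is attached.
function supervised(
  ask?: (title: string, options: string[]) => string,
): Responder {
  return (title, options) => (ask ? ask(title, options) : undefined);
}

// Auto-responder: falls back to the first option.
const autoResponder: Responder = (_title, options) => options[0];

// Walk the chain in order and return the first answer produced.
function resolveAnswer(
  title: string,
  options: string[],
  chain: Responder[],
): string | undefined {
  for (const responder of chain) {
    const answer = responder(title, options);
    if (answer !== undefined) return answer;
  }
  return undefined;
}
```

A question with a pre-supplied answer never reaches the later responders, so the injector can override the default first-option choice.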
2026-03-17 17:19:55 -06:00
Juan Francisco Lebrero
99c3375f18 feat: add gsd headless query for instant state inspection (#951)
* feat: add `gsd headless query` for structured state inspection

Add read-only query commands that return parseable JSON without
spawning an LLM session. Decouples orchestrators from .gsd/ internals.

Targets: phase, cost, progress, next

* simplify: single `query` command returning full snapshot

Replace 4 query targets (phase/cost/progress/next) with one command
that returns everything in a single JSON object. Caller uses jq.

Also document query in README.md and docs/commands.md.

* docs: update gsd-headless skill and references

- SKILL.md: add missing flags (--supervised, --max-restarts, --response-timeout)
- references/commands.md: add query, discuss, remote, inspect, forensics
- references/multi-session.md: fix spawning syntax, use query for budget

* fix: remove integration tests that entered via merge

These files belong to the feat/headless-orchestration-skill branch
and were accidentally included during the upstream/main merge.
They contain TS errors (sessionTerminated scope issue) that break CI.

* fix: restore headless-command.ts deleted by accident
2026-03-17 16:03:59 -06:00
Juan Francisco Lebrero
bdbe739ebc feat: headless orchestration skill + supervised mode (#905) 2026-03-17 11:08:15 -06:00
TÂCHES
69d37d3196 feat: add headless new-milestone command for programmatic milestone creation (#781)
Enables fully headless project creation from specification documents via
`gsd headless new-milestone --context spec.md`. Supports file input,
stdin piping, inline text, and optional auto-mode chaining with --auto.

Key changes:
- headless.ts: new CLI flags (--context, --context-text, --auto, --verbose),
  context loading (file/stdin/inline), .gsd/ bootstrapping, auto-mode chaining
- commands.ts: /gsd new-milestone command routing via headless context temp file
- guided-flow.ts: showHeadlessMilestoneCreation(), bootstrapGsdProject(),
  buildHeadlessDiscussPrompt() for non-interactive milestone creation
- prompts/discuss-headless.md: headless variant of discuss.md that skips Q&A
  rounds and works entirely from the provided specification
- help-text.ts: documentation for new-milestone subcommand and flags

Closes #765

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 21:28:56 -06:00
frizynn
f56b8c69f0 fix: simplify headless flags, add missing imports, document headless mode
- Remove --verbose flag from headless (use --json for detailed output)
- Remove redundant sawToolExecution state variable
- Remove unused rejectCompletion
- Add missing build*Prompt imports in auto.ts (fixes CI typecheck:extensions)
- Document headless mode in README.md and docs/commands.md
- Simplify help text with examples instead of exhaustive command catalog
2026-03-16 19:46:56 -03:00
frizynn
b09e2a549c feat: add gsd headless CLI subcommand for non-interactive auto-mode
Adds a first-class `gsd headless` command that runs auto-mode without a
TUI by spawning a child process in RPC mode via RpcClient. Useful for
CI/CD pipelines, scripts, and unattended execution.

CLI interface:
  gsd headless                  - Run auto-mode until complete
  gsd headless --step           - Run one unit only (sends /gsd next)
  gsd headless --timeout 300000 - Custom timeout (default 5 min)
  gsd headless --json           - Forward RPC events as JSONL to stdout
  gsd headless --verbose        - Show full agent text and tool results
  gsd headless --model <id>     - Override model

Exit codes: 0 = complete, 1 = error/timeout, 2 = blocked
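
For scripts that branch on the result, the exit code mapping can be written as a small lookup. The `Outcome` type and function name are hypothetical; only the three codes come from this commit.

```typescript
// Hypothetical outcome labels for a headless run.
type Outcome = "complete" | "error" | "timeout" | "blocked";

// Map a run outcome to the documented process exit code.
function exitCodeFor(outcome: Outcome): number {
  switch (outcome) {
    case "complete":
      return 0;
    case "error":
    case "timeout":
      return 1; // errors and timeouts share one code
    case "blocked":
      return 2;
  }
}
```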

Features:
- Extension UI auto-responder (handles select, confirm, input, editor,
  notify, setStatus, setWidget, setTitle, set_editor_text)
- Completion detection via terminal notification keywords + idle timeout
- Human-readable progress output to stderr
- SIGINT/SIGTERM forwarding for clean shutdown
- Child process crash detection
- Completion summary with diagnostics on failure
2026-03-16 19:45:39 -03:00
sgodoy90
72cef21876 feat: add gsd sessions subcommand for session picker
Add a new `gsd sessions` subcommand that lists all saved sessions for
the current directory and lets the user interactively pick one to resume.

Currently `gsd --continue` only resumes the most recent session, with no
way to access older conversations. This change adds:

- `gsd sessions` subcommand that calls SessionManager.list() to enumerate
  all sessions for the current working directory
- Interactive numbered list showing date, message count, session name (if
  set), and a preview of the first message
- Selection by number to resume any past session via SessionManager.open()
- Subcommand help text (`gsd sessions --help`)
- Help text entry in the main `gsd --help` output

The implementation uses only existing SessionManager APIs (list, open) -
no SDK changes required.
2026-03-16 15:27:10 -06:00
Jeremy McSpadden
8d56ab2893 feat: add MCP server mode, /lint skill, E2E smoke tests
- Add native MCP server mode (--mode mcp): exposes GSD's tools via
  Model Context Protocol over stdin/stdout for Claude Desktop, VS Code,
  and other MCP-compatible clients. Uses @modelcontextprotocol/sdk.
- Add /lint skill: auto-detects ESLint, Biome, Prettier, rustfmt,
  gofmt, Black, Ruff and runs with structured output
- Add 6 E2E smoke tests: --version, --help, config --help, update
  --help, --list-models, and --mode text --print startup
- Fix diff-context.ts stdio type for CI compatibility
- Fix token-counter.ts tiktoken import for extensions typecheck
- Update help text and CLI to include --mode mcp
2026-03-16 13:56:31 -05:00
Jeremy McSpadden
0b3163d297 feat: add /review skill, /test skill, chokidar file watcher, subcommand help
- Add /review skill: reviews staged/unstaged/commit changes for security,
  performance, bugs, and quality with structured findings by severity
- Add /test skill: auto-detects test framework, generates comprehensive
  tests for source files, or runs suites with failure analysis
- Add chokidar file watcher: watches ~/.gsd/agent/ for config changes
  (settings.json, auth.json, models.json, extensions/) with debounced
  events on an EventBus
- Add --help per subcommand: `gsd config --help` and `gsd update --help`
  show subcommand-specific usage information
- 8 new file-watcher tests (start/stop, event emission, debouncing,
  unrelated file filtering)
2026-03-16 13:47:25 -05:00
Jeremy McSpadden
a79e953caa refactor: deduplicate help text, cross-platform validate-pack, fix dev.js
- Extract duplicated help text from loader.ts and cli.ts into shared
  help-text.ts module (single source of truth)
- Convert validate-pack.sh to Node.js for Windows compatibility
- Fix dev.js using unnecessary npx for tsc (it's a devDependency,
  use node_modules/.bin/tsc directly)
2026-03-16 13:29:31 -05:00