Commit graph

3806 commits

Author SHA1 Message Date
Jeremy
6dc7c0ec1d test(01-05): add capability-aware routing integration tests
- Full pipeline with capability_routing: true returns capability-scored decision
- capability_routing: false falls back to tier-only with no capabilityScores
- Single eligible model (pinned) skips scoring and uses tier-only
- Unknown model gets uniform score of 50 and competes in scoring
- capabilityOverrides change scoring outcome in scoreEligibleModels
- capabilityOverrides pass through resolveModelForComplexity to STEP 2
- Regression guards: routing disabled, unknown model, no-downgrade-needed all pass
- All 51 tests pass (42 existing + 9 new integration)
2026-04-04 10:56:23 -05:00
Jeremy
1645be072c feat(01-05): fire before_model_select hook, add verbose scoring output, load capability overrides
- Fire pi.emitBeforeModelSelect() in selectAndApplyModel before resolveModelForComplexity
- Hook override bypasses capability scoring entirely with tier-only selectionMethod
- Verbose output shows capability-scored breakdown: model scores sorted descending
- Add loadCapabilityOverrides() to model-router.ts for deep-merge with built-in profiles
- Extend resolveModelForComplexity signature with optional capabilityOverrides parameter
- Pass capabilityOverrides through to scoreEligibleModels in STEP 2
2026-04-04 10:56:22 -05:00
Jeremy
6cc42bb504 feat(01-04): register before_model_select placeholder handler in GSD hooks
- Add before_model_select handler registration inside registerHooks()
- Handler returns undefined (no override) to let capability scoring proceed
- Comment references ADR-004 for traceability
- Serves as documentation and ensures event type is registered for Plan 05 wiring
2026-04-04 10:56:06 -05:00
Jeremy
1cea7fb8bc feat(01-04): add BeforeModelSelectEvent to extension API and wire emission
- Add BeforeModelSelectEvent interface and BeforeModelSelectResult type to types.ts
- Add on('before_model_select') subscription overload to ExtensionAPI interface
- Add emitBeforeModelSelect() method to ExtensionAPI interface and ExtensionRuntimeState
- Implement emitBeforeModelSelect() on ExtensionRunner using invokeHandlers (first-override-wins)
- Bind runner's emitBeforeModelSelect into shared runtime at construction time
- Wire emitBeforeModelSelect delegation through createExtensionAPI in loader.ts
2026-04-04 10:56:06 -05:00
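The "first-override-wins" dispatch named in this entry can be sketched roughly as follows. This is a minimal illustration and not the project's `invokeHandlers`: the event and result shapes and the function name are hypothetical, and the sketch is synchronous for brevity where the real handlers may be async.

```typescript
// Hypothetical event/result shapes; the commit does not spell out the real fields.
interface SelectEventSketch { unitType: string; eligibleModels: string[] }
interface SelectResultSketch { model: string }

type HandlerSketch<E, R> = (event: E) => R | undefined;

// Call handlers in registration order and stop at the first one that
// returns something other than undefined (undefined means "no override").
function firstOverrideWinsSketch<E, R>(
  handlers: HandlerSketch<E, R>[],
  event: E,
): R | undefined {
  for (const handler of handlers) {
    const result = handler(event);
    if (result !== undefined) return result;
  }
  return undefined;
}
```

With this shape, a placeholder handler that returns `undefined` leaves the decision to whatever runs next.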
Jeremy
1866ccf781 feat(01-03): wire taskMetadata from selectAndApplyModel to resolveModelForComplexity
- Pass unitType and classification.taskMetadata as 5th and 6th args to resolveModelForComplexity
- Completes end-to-end data pipeline: classifier extracts metadata, attaches to ClassificationResult, auto-model-selection passes through to router for capability scoring
2026-04-04 10:56:06 -05:00
Jeremy
accee43563 feat(01-03): insert STEP 2 capability scoring into resolveModelForComplexity
- Add unitType and taskMetadata optional params to resolveModelForComplexity
- Replace findModelForTier with getEligibleModels for multi-model eligible set
- Insert STEP 2 scoring block: activates when capability_routing enabled, eligible.length > 1, unitType provided
- Add buildFallbackChain helper to deduplicate fallback assembly logic
- Scoring returns capability-scored selectionMethod with capabilityScores and taskRequirements
- Single-model and zero-model paths fall through to tier-only behavior
- All 42 existing tests pass unchanged (backward compat via optional params)
2026-04-04 10:55:28 -05:00
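The three activation conditions for STEP 2 listed in this entry amount to a single predicate. The sketch below only restates them; the function name and parameter shapes are assumptions, not the signature of `resolveModelForComplexity`.

```typescript
type SelectionMethodSketch = "capability-scored" | "tier-only";

// Capability scoring applies only when routing is enabled, more than one
// model is eligible, and a unit type is known; otherwise tier-only applies.
function selectionMethodSketch(
  config: { capability_routing?: boolean },
  eligibleModels: string[],
  unitType?: string,
): SelectionMethodSketch {
  const scoringActive =
    config.capability_routing === true &&
    eligibleModels.length > 1 &&
    unitType !== undefined;
  return scoringActive ? "capability-scored" : "tier-only";
}
```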
Jeremy
bf918d30d5 test(01-02): add unit tests for scoring functions and taskMetadata passthrough
- Add scoreModel, computeTaskRequirements, scoreEligibleModels, getEligibleModels
  describe blocks to model-router.test.ts (27 new tests)
- Add ClassificationResult taskMetadata describe block to complexity-classifier.test.ts
  (4 new tests: execute-task populated, hook undefined, plan-slice undefined, extractTaskMetadata export)
- Add getModelTier unknown-default tests verifying standard tier (not heavy) per D-15
- All 42 model-router tests pass, all 32 complexity-classifier tests pass
- All 36 pre-existing capability-router tests continue to pass
2026-04-04 10:54:02 -05:00
Jeremy
409cd77cbc feat(01-01): add taskMetadata to ClassificationResult and export extractTaskMetadata
- Add taskMetadata?: TaskMetadata to ClassificationResult in complexity-classifier.ts
- Add taskMetadata?: TaskMetadata to ClassificationResult in types.ts (duplicate interface)
- Export extractTaskMetadata() so it can be imported by model-router.ts
- Refactor classifyUnitComplexity() to extract metadata once for execute-task (eliminates double-extraction at adaptive learning step)
- Populate taskMetadata field on ClassificationResult for execute-task units
- Set taskMetadata: undefined explicitly on hook unit fast-path
2026-04-04 10:53:45 -05:00
Jeremy
0ccd3fd8a4 feat(01-01): add capability types, data tables, and scoring functions to model-router
- Import TaskMetadata from complexity-classifier
- Add capability_routing?: boolean to DynamicRoutingConfig
- Add capabilityScores, taskRequirements, selectionMethod fields to RoutingDecision
- Add ModelCapabilities interface (7 dimensions: coding, debugging, research, reasoning, speed, longContext, instruction)
- Add MODEL_CAPABILITY_PROFILES data table with 9 model profiles
- Add BASE_REQUIREMENTS data table with 11 unit type vectors
- Add exported scoreModel() pure function (weighted average, 0-100 range)
- Add exported computeTaskRequirements() with metadata-driven vector refinement
- Add exported scoreEligibleModels() with cost-preferring tie-break sorting
- Add exported getEligibleModels() extracted from findModelForTier() logic
- Add selectionMethod: "tier-only" to all 5 return sites in resolveModelForComplexity()
- Change getModelTier() unknown default from "heavy" to "standard" (per D-15)
- Add capability_routing: true to defaultRoutingConfig()
2026-04-04 10:53:45 -05:00
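A weighted-average score with a cost-preferring tie-break, as described in this entry, might look like the sketch below. The seven dimension names come from the commit message; the type names, the `costPerMTok` field, and the fallback of 50 for an empty requirement vector are assumptions made for illustration.

```typescript
const DIMENSIONS = [
  "coding", "debugging", "research", "reasoning",
  "speed", "longContext", "instruction",
] as const;
type Dimension = (typeof DIMENSIONS)[number];

type CapabilitiesSketch = Record<Dimension, number>;          // each value 0-100
type RequirementsSketch = Partial<Record<Dimension, number>>; // weights, assumed >= 0

// Weighted average of capability values, with requirement values as weights.
// Returning 50 when no positive weight is present is an assumption.
function scoreModelSketch(caps: CapabilitiesSketch, reqs: RequirementsSketch): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of DIMENSIONS) {
    const weight = reqs[dim] ?? 0;
    if (weight <= 0) continue;
    weighted += caps[dim] * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 50 : weighted / totalWeight;
}

interface CandidateSketch { id: string; caps: CapabilitiesSketch; costPerMTok: number }

// Sort by score descending; when scores are equal, prefer the cheaper model.
function rankSketch(models: CandidateSketch[], reqs: RequirementsSketch) {
  return models
    .map((m) => ({ id: m.id, cost: m.costPerMTok, score: scoreModelSketch(m.caps, reqs) }))
    .sort((a, b) => b.score - a.score || a.cost - b.cost);
}
```

Because the weights are normalised by their sum, the result stays in the 0-100 range whenever every capability value does.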
Jeremy
e89bf7d18e test(01-01): add failing tests for capability-aware model routing
- Tests for scoreModel weighted average, edge cases (empty/unknown dims)
- Tests for computeTaskRequirements with all branch paths (docs, concurrency, migration, large-file)
- Tests for MODEL_CAPABILITY_PROFILES (9 models, 7 dimensions each)
- Tests for BASE_REQUIREMENTS (all 11 unit types)
- Tests for scoreEligibleModels (sorting, tie-breaking, unknown models, overrides)
- Tests for getEligibleModels (tier filtering, explicit config, empty result)
- Tests for DynamicRoutingConfig.capability_routing and RoutingDecision.selectionMethod
2026-04-04 10:51:31 -05:00
Jeremy
851bb0bebe fix(pi-coding-agent): upgrade Kotlin LSP to official Kotlin/kotlin-lsp
Closes #3493
2026-04-04 10:45:15 -05:00
Jeremy
3f9fa9351f fix(test): use correct RequirementCounts type fields in edge case tests
Replace non-existent `invalidated` field with the correct type fields
(`outOfScope`, `blocked`, `total`) to pass typecheck.
2026-04-04 10:25:00 -05:00
Jeremy
181243a933 chore: init gsd
2026-04-04 10:00:43 -05:00
Jeremy
62cc474002 test(gsd): fill state machine E2E verification gaps (#3498)
Add 102 integration tests across two new files covering state machine
edge cases, runtime failures, and boundary conditions not exercised
by the existing live-validation suite.

Closes #3498
2026-04-04 10:00:07 -05:00
github-actions[bot]
6aaa244742 release: v2.60.0
2026-04-04 14:46:08 +00:00
Tibsfox
920fed7122 fix(pi-ai): extend repairToolJson to handle XML tags and truncated numbers
The existing repairToolJson only handles YAML bullet lists (#2660).
Two additional malformation patterns from smaller models now cause
tool call failures and stuck retry loops:

1. XML parameter tags mixed into JSON values (#3403):
   LLMs (especially Haiku-class) sometimes emit hybrid XML/JSON
   syntax like <parameter name="X">value</parameter> inside
   JSON string values. Add stripXmlParameterTags() to remove
   the tags while preserving content.

2. Truncated numeric values (#3464):
   Smaller models emit incomplete numbers like "exitCode": -,
   or "durationMs": , when values are cut off mid-generation.
   Add repairTruncatedNumbers() to replace these with 0.

Both repairs run before the existing YAML bullet repair phase.
The AJV validation layer (coerceTypes: true) then handles any
remaining string-to-number coercion.

Adds 13 new tests covering detection and repair for both patterns.

Closes #3464, closes #3403, addresses #3369
2026-04-04 03:59:55 -07:00
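The two repairs described in this entry can be approximated with regular expressions. The sketch below is an assumption about how such repairs could work, not the code in `repairToolJson`; the `Sketch` function names are hypothetical, and the number repair only inspects positions directly after a quoted key and a colon.

```typescript
// Remove <parameter ...> and </parameter> tags, keeping the text between them.
function stripXmlParameterTagsSketch(raw: string): string {
  return raw.replace(/<\/?parameter\b[^>]*>/g, "");
}

// Replace a missing or bare "-" numeric value (a quoted key, a colon, then an
// optional "-", followed directly by "," "}" or "]") with 0.
function repairTruncatedNumbersSketch(raw: string): string {
  return raw.replace(
    /("\s*:\s*)-?(\s*[,}\]])/g,
    (_match: string, before: string, after: string) => `${before}0${after}`,
  );
}

// Run both repairs, then parse. Any further repair phase is out of scope here.
function repairAndParseSketch(raw: string): unknown {
  return JSON.parse(repairTruncatedNumbersSketch(stripXmlParameterTagsSketch(raw)));
}
```

A complete negative number such as `"exitCode": -5,` is left alone, because the pattern requires the delimiter to follow the optional minus sign immediately.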
Tom Boucher
a061e3c276 feat: stop/backtrack capture classifications for milestone regression (#3488)
* feat: add stop/backtrack capture classifications for milestone regression (#3487)

Adds a 4-layer methodology for halting auto-mode and backtracking to
previous milestones when captures indicate the user wants to stop or
that a milestone missed critical features:

1. Type layer: "stop" and "backtrack" classification types in captures.ts
2. Guard layer: pre-dispatch stop check in runGuards() pauses auto-mode
   before the next unit dispatches
3. Resolution layer: executeBacktrack() writes BACKTRACK-TRIGGER.md and
   milestone regression markers for state machine detection
4. Protection layer: revertExecutorResolvedCaptures() detects and reverts
   captures silenced by non-triage agents (resolved without classification)

Also adds fast-path stop detection in auto-post-unit.ts that pattern-matches
pending capture text for stop keywords without waiting for triage.

Closes #3487

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add slice-level skip with gsd_skip_slice tool (#3477)

Adds "skipped" as a closed status alongside "complete" and "done":

- status-guards.ts: isClosedStatus() recognizes "skipped"
- state.ts: isStatusDone() recognizes "skipped"
- gsd-db.ts: getActiveSliceFromDb() skips slices with status "skipped"
- db-tools.ts: new gsd_skip_slice tool for rethink and manual use
- rethink.md: added "Skip a slice" operation to rethink prompt
- rethink.ts: buildRethinkData shows skipped slice counts

Skipped slices satisfy dependencies for downstream slices, allowing
auto-mode to advance past them. Slice data is preserved for reference.

Relates to #3477

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 4 issues found in adversarial review of PR #3488

1. triage-ui.ts: Restore stop/backtrack entries in CLASSIFICATION_LABELS
   and ALL_CLASSIFICATIONS — the Record<Classification, ...> type requires
   all union members, and runtime lookups would crash on stop/backtrack.
   Also auto-confirm stop/backtrack in the triage confirmation flow
   (matching the triage-captures.md prompt directive).

2. triage-resolution.ts: Replace require("node:fs") in clearBacktrackTrigger
   with ESM import of unlinkSync — consistent with the rest of the codebase.

3. auto-post-unit.ts: Anchor STOP_PATTERN regex to start-of-string (^) to
   prevent false positives on captures like "add a pause button" or "stop
   the timer from re-rendering" which are feature descriptions, not halt
   directives.

4. status-guards.test.ts: Add missing test case for isClosedStatus("skipped")
   to cover the new status value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update tool-naming test count for gsd_skip_slice

The new gsd_skip_slice tool (no alias) brings the total from 29 to 30.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 01:40:33 -04:00
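The slice-skip behaviour in this entry (a "skipped" slice counts as closed and satisfies downstream dependencies) can be illustrated with a small sketch. The `SliceSketch` shape and both function names are hypothetical; only the three closed status values come from the commit message.

```typescript
interface SliceSketch { id: string; status: string; dependsOn: string[] }

const CLOSED_STATUSES_SKETCH = new Set(["complete", "done", "skipped"]);

function isClosedSketch(status: string): boolean {
  return CLOSED_STATUSES_SKETCH.has(status);
}

// The next active slice is the first open slice whose dependencies are all
// closed. A skipped dependency counts as closed, so work can advance past it.
function nextActiveSliceSketch(slices: SliceSketch[]): SliceSketch | undefined {
  const closedIds = new Set(
    slices.filter((s) => isClosedSketch(s.status)).map((s) => s.id),
  );
  return slices.find(
    (s) => !isClosedSketch(s.status) && s.dependsOn.every((dep) => closedIds.has(dep)),
  );
}
```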
Tom Boucher
7d5bf63b2d feat: GSD context optimization with model routing and context masking
* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 01:02:35 -04:00
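The last two fixes in this entry (matching on `role: "toolResult"` with array content, and truncating without mutating shared message objects) can be sketched together. The message shape below is a simplified assumption based on the commit text, and the `[truncated]` marker is invented for the example.

```typescript
interface TextContentSketch { type: "text"; text: string }
interface MessageSketch { role: string; content: string | TextContentSketch[] }

// Return a new message list in which long tool-result text blocks are
// truncated. Untouched messages are returned as-is; changed ones are copies.
function truncateToolResultsSketch(
  messages: MessageSketch[],
  maxChars: number,
): MessageSketch[] {
  return messages.map((message) => {
    if (message.role !== "toolResult" || !Array.isArray(message.content)) {
      return message;
    }
    return {
      ...message,
      content: message.content.map((block) =>
        block.text.length > maxChars
          ? { ...block, text: `${block.text.slice(0, maxChars)}\n[truncated]` }
          : block,
      ),
    };
  });
}
```

Copying with spread keeps the original conversation objects intact, which is the property the state-corruption fix above relies on.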
Nils Reeh
131cb1bed2 fix(remote-questions): fire configured channels in interactive mode
tryRemoteQuestions was gated behind if (!ctx.hasUI), so Telegram/Slack/
Discord were never contacted when GSD ran with a terminal UI. The test
message sent during setup always worked (direct API call, no guard), which
made the feature appear configured but non-functional in practice.

Move the remote call before the hasUI guard so configured channels fire
regardless of UI availability. When no remote channel is configured,
tryRemoteQuestions returns null and the local UI is used as before.

Adds a source-level regression test asserting that tryRemoteQuestions is
called before the !ctx.hasUI branch.

Closes #3480

Verified with AI.
2026-04-04 01:51:18 +02:00
Jeremy McSpadden
bb47f5a087 Merge pull request #2287 from jeremymcs/worktree-btw-implementation
feat: add /btw skill — ephemeral side questions from conversation context
2026-04-03 15:18:03 -05:00
deseltrus
58208634ac fix(pi-coding-agent): cancel stale retries after model switch
Explicit model changes aborted the active backoff sleep but left already-queued retry callbacks alive. That let stale provider retries continue after a user or runtime model switch, which could keep surfacing the old provider's backoff errors in the new session state.

Cancel the full retry generation on explicit model switches by clearing queued continue callbacks in RetryHandler, then cover the behavior with regression tests for both the session switch ordering and the queued-retry cancellation path.
2026-04-03 16:21:21 +02:00
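One way to cancel a whole "retry generation" is to stamp each queued callback with a generation number and bump that number on a model switch. The class below is a sketch of that idea under assumed names; it is not the project's `RetryHandler`.

```typescript
class RetryQueueSketch {
  private generation = 0;
  private queued: Array<{ generation: number; run: () => void }> = [];

  // Queue a retry callback, tagged with the current generation.
  schedule(run: () => void): void {
    this.queued.push({ generation: this.generation, run });
  }

  // Called on an explicit model switch: drop queued callbacks and
  // invalidate any that a flush has already picked up.
  cancelAll(): void {
    this.generation += 1;
    this.queued = [];
  }

  // Run queued callbacks that still belong to the current generation.
  // Returns how many callbacks actually ran.
  flush(): number {
    const due = this.queued;
    this.queued = [];
    let ran = 0;
    for (const entry of due) {
      if (entry.generation !== this.generation) continue;
      entry.run();
      ran += 1;
    }
    return ran;
  }
}
```

The generation check matters when a callback itself triggers a switch mid-flush: entries taken from the queue earlier are then skipped instead of running against the old provider.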
Jeremy McSpadden
df93cdc43c Merge pull request #3432 from deseltrus/fix/slash-command-session-routing
fix: route non-builtin slash commands after TUI dispatch
2026-04-03 05:08:01 -05:00
github-actions[bot]
7589509156 release: v2.59.0
2026-04-03 06:18:06 +00:00
deseltrus
b7e0173e50 fix: route non-builtin slash commands after TUI dispatch
The TUI slash dispatcher started treating any unrecognized /command as handled before session.prompt() could resolve extension commands, prompt templates, or /skill:* inputs. That blocked valid non-builtin slash commands and also let /export swallow unrelated /export* prefixes.

Move unknown-command detection to the interactive entry points, allow only known builtins or session-resolved slash commands through, gate /skill:* on the skill-command setting, and tighten /export matching to exact command tokens.
2026-04-03 06:44:09 +02:00
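Exact command-token matching, as opposed to prefix matching, can be sketched as follows. The helper names are hypothetical; the point is only that `/export` is compared against the whole first token, so inputs such as `/exportall` no longer match.

```typescript
// Extract the first whitespace-delimited token of the input ("" if empty).
function commandTokenSketch(input: string): string {
  return input.trim().split(/\s+/, 1)[0] ?? "";
}

// Exact-token comparison: "/export out.html" matches, "/exportall" does not.
function isExportCommandSketch(input: string): boolean {
  return commandTokenSketch(input) === "/export";
}
```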
Jeremy McSpadden
db4fa32854 Merge pull request #3426 from justinwyer/fix/configurable-security-overrides
fix(security): add configurable overrides for command allowlist and SSRF blocklist
2026-04-02 12:29:57 -05:00
Justin Wyer
95875c41c5 refactor(test): consolidate regression and override tests into #666 test files
Move regression tests and override tests from standalone files into
the existing test files introduced by PR #666:

- resolve-config-value.test.ts: add REGRESSION #666 describe block
  and setAllowedCommandPrefixes override tests
- url-utils.test.ts: add REGRESSION #666 describe block and
  setFetchAllowedUrls override tests
- Delete: regression-666.test.ts, resolve-config-value-override.test.ts,
  url-utils-override.test.ts

Same 59 tests, fewer files, tests live next to the code they test.
2026-04-02 14:06:19 +02:00
Justin Wyer
d5f581fe6b test: add regression tests for #666 (fails on main, passes on fix)
Two regression tests that prove the bug introduced by PR #666:

1. Non-default credential tool (sops) is silently blocked by the
   hardcoded SAFE_COMMAND_PREFIXES with no way to override.
2. Private IP URL is silently blocked by isBlockedUrl() with no
   way to allowlist.

Both tests use dynamic import to check for the override functions,
so they run cleanly on both main (where they fail) and this branch
(where they pass). Verified in a git worktree of main.
2026-04-02 14:03:34 +02:00
Justin Wyer
5ae78d8f8b docs: document command allowlist and fetch_page URL blocking
- custom-models.md: add Command Allowlist section under Value Resolution
  explaining the restriction, default list, and how to override via
  allowedCommandPrefixes setting or GSD_ALLOWED_COMMAND_PREFIXES env var

- configuration.md: add URL Blocking (fetch_page) section documenting
  what's blocked by default, why, and how to allowlist specific hosts
  via fetchAllowedUrls setting or GSD_FETCH_ALLOWED_URLS env var

- configuration.md: add both env vars to the Environment Variables table
2026-04-02 13:55:07 +02:00
Justin Wyer
71caa18552 fix(security): add configurable overrides for command allowlist and SSRF blocklist
PR #666 introduced hardcoded SAFE_COMMAND_PREFIXES and SSRF URL
blocklists with no override mechanism. Users with non-standard
credential tools (sops, doppler, age, infisical) or needing to fetch
from internal URLs (self-hosted docs, VPN services) were silently
blocked with no recourse.

Add two global-only settings (ignored in project-level settings.json
to preserve the security property against malicious repos):

- allowedCommandPrefixes: replaces the built-in command allowlist
- fetchAllowedUrls: exempts hostnames from SSRF blocking

Both also support env var overrides (GSD_ALLOWED_COMMAND_PREFIXES,
GSD_FETCH_ALLOWED_URLS) for CI/container environments. Env vars
take precedence over settings.json.

Security model: global-only keys are stripped from project settings
at load time via stripGlobalOnlyKeys(), applied at all three
assignment points for this.projectSettings. The merge function
stays untouched — no future caller can accidentally skip stripping.

15 new tests covering override behavior, cache invalidation,
allowlist exemptions, and global-only enforcement.
2026-04-02 13:45:05 +02:00
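The security model in this entry (global-only keys stripped from project settings, env vars taking precedence over settings.json) can be sketched as below. The two key names and the env var name come from the commit message; the function names, the `SettingsSketch` type, and the comma-separated env format are assumptions.

```typescript
type SettingsSketch = Record<string, unknown>;

const GLOBAL_ONLY_KEYS_SKETCH = ["allowedCommandPrefixes", "fetchAllowedUrls"] as const;

// Return a copy of project-level settings without the global-only keys,
// so a repository cannot widen its own allowlists.
function stripGlobalOnlyKeysSketch(projectSettings: SettingsSketch): SettingsSketch {
  const copy: SettingsSketch = { ...projectSettings };
  for (const key of GLOBAL_ONLY_KEYS_SKETCH) delete copy[key];
  return copy;
}

// Precedence: env var, then global settings, then the built-in list.
// A comma-separated env value is an assumption made for this sketch.
function resolveAllowedPrefixesSketch(
  env: Record<string, string | undefined>,
  globalSettings: SettingsSketch,
  builtIn: string[],
): string[] {
  const fromEnv = env["GSD_ALLOWED_COMMAND_PREFIXES"];
  if (fromEnv) {
    return fromEnv.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  }
  const fromSettings = globalSettings["allowedCommandPrefixes"];
  if (Array.isArray(fromSettings)) {
    return fromSettings.filter((v): v is string => typeof v === "string");
  }
  return builtIn;
}
```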
Jeremy McSpadden
46d5fa56af Merge pull request #2312 from jeremymcs/fix/tui-review
fix(tui): comprehensive TUI review — layout, flow, rendering, and state fixes
2026-04-01 16:38:31 -05:00
Jeremy
a3a2f2e3b3 test(tui): update provider-manager tests for confirmation-based removal
Tests now match the new hasAuth guard and double-press r confirmation
flow introduced in the TUI review PR.
2026-04-01 16:24:14 -05:00
Jeremy McSpadden
d0555857c2 Merge pull request #2976 from jeremymcs/splash-header-updates-clean
feat(splash): add remote channel indicator to tools row
2026-04-01 16:14:23 -05:00
Jeremy McSpadden
b2abff3ce5 Merge pull request #3138 from jeremymcs/claude/add-stale-commit-check-GIbgw
feat(doctor): stale commit safety check with gsd snapshot and auto-cleanup
2026-04-01 14:22:21 -05:00
Jeremy McSpadden
03bb723dfb Merge pull request #3204 from jeremymcs/fix/stream-re-catch-all-json-parse
fix(error-classifier): catch-all V8 JSON.parse pattern replaces STREAM_RE whack-a-mole
2026-04-01 14:22:07 -05:00
Jeremy
f7cb3ec07b chore(merge): resolve conflict with upstream/main for PR #3204
Keep catch-all STREAM_RE from PR; upstream's 5-variant whack-a-mole is
superseded by the /in JSON at position \d+/ pattern. Also drop the now-
stale comment about checking stream before server/connection (no longer
needed since catch-all avoids those false-positive overlaps).
2026-04-01 14:05:28 -05:00
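The catch-all pattern quoted in this entry can be exercised directly. The sketch below assumes the error text contains the phrase `in JSON at position N`; the exact wording of `JSON.parse` errors varies across engine versions, and the sample messages are illustrative strings, not captured logs.

```typescript
// Catch-all for JSON.parse failures that report a character position.
const STREAM_RE_SKETCH = /in JSON at position \d+/;

function looksLikeStreamParseErrorSketch(message: string): boolean {
  return STREAM_RE_SKETCH.test(message);
}
```

A single positional pattern covers every message variant that ends in the same phrase, which is why it can replace a list of per-variant expressions.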
Jeremy
d929e9ceed chore(merge): resolve conflicts with upstream/main for PR #3138
- auto-worktree.ts: take upstream's MERGE_HEAD cleanup wording/order
- state.ts: take upstream's inline disk→DB reconciliation (#2631)
  over the simpler "always call deriveStateFromDb" approach
2026-04-01 14:04:16 -05:00
Tom Boucher
6e7721406e Merge pull request #2680 from jeremymcs/refactor/vscode-extension-design
feat(vscode): sidebar redesign, SCM provider, checkpoints, diagnostics [3/3]
2026-04-01 11:51:36 -04:00
Tom Boucher
0415f41eee Merge pull request #3116 from jeremymcs/refactor/planning-tier-heavy
refactor(complexity): reclassify planning phases from standard to heavy tier
2026-04-01 11:49:11 -04:00
Tom Boucher
adaa661a87 Merge pull request #3401 from gsd-build/claude/resolve-pr-3322-conflict-GQXIc
fix(worktree): resolve merge conflict for PR #3322 — adopt comprehensive pre-merge cleanup
2026-04-01 11:31:31 -04:00
Claude
1edf172463 test(worktree): add regression test for SQUASH_MSG/MERGE_MSG pre-merge cleanup (#2912)
Satisfies CI require-tests gate by adding a test that verifies the
comprehensive pre-merge cleanup (step 7b) removes stale SQUASH_MSG and
MERGE_MSG files — the enhancement over the prior MERGE_HEAD-only cleanup.

https://claude.ai/code/session_01SSHD9RNwVGNxAJZEgNZpgZ
2026-04-01 15:19:18 +00:00
Claude
e5b6a6a1b9 fix(worktree): resolve merge conflict for PR #3322 — adopt comprehensive pre-merge cleanup
Main already had a simpler step 7c (removing only MERGE_HEAD). The PR's
step 7b is more thorough: it also removes SQUASH_MSG and MERGE_MSG,
matching the existing post-merge cleanup pattern. Replace 7c with 7b.

https://claude.ai/code/session_01SSHD9RNwVGNxAJZEgNZpgZ
2026-04-01 15:11:12 +00:00
Tom Boucher
77220d1dde Merge pull request #2283 from jeremymcs/feat/codebase-map
feat(gsd): codebase map — structural orientation for fresh agent contexts
2026-04-01 10:49:01 -04:00
Jeremy McSpadden
04ebe3f0a0 feat(extensions): add Ollama extension for first-class local LLM support (#3371)
Self-contained extension at src/resources/extensions/ollama/ that
auto-detects a running Ollama instance, discovers locally pulled models,
and registers them as a first-class provider with zero configuration.

Features:
- Auto-discovery of local models via /api/tags on session_start
- Capability detection (vision, reasoning, context window) for 40+ model families
- /ollama slash command with status, list, pull, remove, ps subcommands
- ollama_manage LLM-callable tool for agent-driven model operations
- Onboarding flow with auto-detect (no API key required)
- Non-blocking async probe — doesn't delay TUI paint
- Respects OLLAMA_HOST env var for non-default endpoints

Core changes (minimal):
- Add "ollama" to KnownProvider in pi-ai types
- Add "ollama" key resolution in env-api-keys.ts
- Add "ollama" default model in model-resolver.ts
- Add "Ollama (Local)" to onboarding wizard with probe flow
2026-04-01 08:37:31 -06:00
Jeremy
2cc01c11ee fix(merge): clean stale MERGE_HEAD before squash merge (#2912)
A pre-existing MERGE_HEAD (from a failed prior merge, libgit2 native path,
or external tooling) blocks git merge --squash. Remove stale merge state
files before starting the squash merge, not just after.
2026-03-31 17:48:45 -05:00
Jeremy
0e978d4565 fix(state): always run disk→DB reconciliation when DB is available (#2631)
When DB was available but empty, deriveState skipped deriveStateFromDb
entirely, bypassing the disk→DB sync logic. Milestones created outside
the DB write path were never discovered.
2026-03-31 17:34:05 -05:00
Jeremy
36b03890da fix(git-service): fix merge-base ancestry check and .gsd/ leakage in snapshot absorption
- Check HEAD~1 (newest snapshot) instead of resetTarget (pre-snapshot
  base) for remote ancestry. The old check produced a false positive when
  the remote was at the pre-snapshot base but snapshots were local-only.
- Re-run smartStage() after soft reset so RUNTIME_EXCLUSION_PATHS
  apply to the absorbed commit. Without this, .gsd/ state files from
  snapshot commits leaked into the real commit.
2026-03-31 17:25:29 -05:00
Jeremy
fa0651bfd6 feat(doctor): stale commit safety check with gsd snapshot and auto-cleanup
Adds a safety mechanism that detects uncommitted changes idle past a
configurable threshold (default: 30 min), auto-snapshots tracked files
using `git add -u`, and cleans up snapshot commits when real work lands.

- New `stale_uncommitted_changes` doctor issue with auto-snapshot fix
- Detection in health widget (60s), pre-dispatch gate, and /gsd doctor
- `nativeAddTracked()` stages only tracked files (no secrets/binaries)
- `absorbSnapshotCommits()` squashes `gsd snapshot:` commits into next
  real autoCommit via soft reset + re-commit
- Configurable via `stale_commit_threshold_minutes` preference (0=off)
2026-03-31 17:25:29 -05:00
Jeremy McSpadden
e0d130e682 feat(extensions): wire up topological sort and unified registry filtering (#3152)
- Add extension-manifest.ts and extension-sort.ts to pi-coding-agent
  with manifest reading and Kahn's BFS topological sort algorithm
- Add extensionPathsTransform hook to DefaultResourceLoader that runs
  between path merging and loadExtensions() — enables pre-load
  filtering and reordering without modifying pi internals
- Wire GSD's buildResourceLoader() to provide a transform that:
  1. Filters ALL extensions (including community) through the GSD registry
  2. Sorts in topological dependency order via sortExtensionPaths()
- Mark discoverAndLoadExtensions() as @deprecated (dead code path)
- Add 16 tests covering manifest reading, dependency sorting, cycles,
  missing deps, and non-array deps

Previously, dependencies.extensions in manifests was decorative (sort
existed but was never called), and gsd extensions disable only worked
for bundled extensions. Community extensions in ~/.gsd/agent/extensions/
bypassed the registry entirely.
2026-03-31 11:54:48 -06:00
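Kahn's BFS algorithm, which this entry names for dependency ordering, can be sketched as below. This is a generic illustration and not the contents of `extension-sort.ts`: the function name and return shape are hypothetical, and ignoring missing dependencies and reporting cycle members separately are assumptions.

```typescript
// deps maps each extension id to the ids it depends on.
function topoSortSketch(
  deps: Map<string, string[]>,
): { order: string[]; cyclic: string[] } {
  const ids = Array.from(deps.keys());
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of ids) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  deps.forEach((requires, id) => {
    for (const dep of requires) {
      if (!deps.has(dep)) continue; // missing dependency: ignored in this sketch
      inDegree.set(id, (inDegree.get(id) ?? 0) + 1);
      dependents.get(dep)?.push(id);
    }
  });

  // Start with nodes that have no unmet dependencies, in insertion order.
  const queue = ids.filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Anything never emitted is part of, or downstream of, a cycle.
  const emitted = new Set(order);
  const cyclic = ids.filter((id) => !emitted.has(id));
  return { order, cyclic };
}
```

Every dependency appears in `order` before the extensions that require it, so loading in that order lets each extension assume its dependencies are already registered.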
Jeremy McSpadden
f0059a5498 fix(extensions): update provides.hooks in 7 extension manifests to match actual registrations (#3157)
Audit found that 7 bundled extensions had incomplete provides.hooks
arrays in their manifests. Updated each to match actual pi.on() calls:

- async-jobs: +session_before_switch, session_shutdown
- bg-shell: +8 hooks (session_compact, session_tree, etc.)
- browser-tools: +session_start
- context7: +session_shutdown
- google-search: +session_shutdown
- gsd: +12 hooks (bash_transform, tool_call, tool_result, etc.)
- search-the-web: +session_start

Closes #3156
2026-03-31 11:54:41 -06:00
Jeremy McSpadden
1e89090136 test(state-machine): add regression suite — 86 tests across 6 files (#3161) (#3162)
Comprehensive validation of the GSD state machine identified 7 HIGH, 14 MEDIUM,
and 16 LOW findings. This adds regression and integration tests covering:

Unit tests (49):
- Event replay idempotency (M4 lossy blocker replay, M5 duplicate evidence)
- Reconciliation edge cases (fork detection, entity keys, conflict detection)
- Completion hierarchy guards (vacuous truth, phantom parents, rollback fidelity)
- State derivation parity (ghost milestones, phase transitions, DB/FS consistency)
- Stuck detection coverage (all 3 rules + documented gap for 3-unit cycles)

Integration tests (37):
- Full happy-path lifecycle (pre-planning → complete)
- 12 completion guard edge cases with real handlers
- 7 reopen operations including H5 (no reopen-milestone exists)
- Phantom parent auto-creation (H6)
- State derivation consistency with live DB
- Event log integrity across operations
- M12: stale SUMMARY.md causes reconciler to override reopen

Closes #3161
2026-03-31 11:54:30 -06:00