Commit graph

82 commits

Author SHA1 Message Date
Mikael Hugo
deeb4dbd4e sf snapshot: uncommitted changes after 61m inactivity 2026-05-07 16:39:39 +02:00
Mikael Hugo
343ee5c89e sf snapshot: uncommitted changes after 158m inactivity 2026-05-07 10:01:56 +02:00
Mikael Hugo
5157223e4c fix: record requested headless command 2026-05-07 00:40:05 +02:00
Mikael Hugo
2d465b11fd test: add comprehensive Phase 1 coverage for dispatch loop (48 tests)
- Add metrics.test.ts: 21 tests for unit outcome recording, model performance tracking, fire-and-forget safety, persistence, error handling
- Add triage-self-feedback.test.ts: 27 tests for report classification, confidence thresholds, auto-fix, deduplication, severity categorization, async safety

Purpose: Increase coverage of critical autonomous dispatch paths from 40% to 60%+.
Covers fire-and-forget patterns (metrics recording and auto-fix application must not
block dispatch), concurrent recording safety, graceful degradation on error.

Tests validate:
  ✓ Unit outcome recording without blocking
  ✓ Per-task-type model performance tracking
  ✓ Fire-and-forget error handling (metrics/fixes don't break dispatch)
  ✓ Concurrent metric recording race conditions
  ✓ Persistence atomicity
  ✓ Report classification by type/severity
  ✓ Confidence thresholds (0.85-0.95 per type)
  ✓ Auto-fix deduplication and prioritization
  ✓ Async triage without blocking dispatch

Phase 1 complete: 48 tests, all passing.
Phase 2: Recovery path hardening (recovery/forensics)
Phase 3: Property-based FSM testing (fast-check)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 00:38:19 +02:00
Mikael Hugo
553ba23b89 integrate: hook quick wins into UOK dispatch loop
Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: After triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 22:34:41 +02:00
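A minimal sketch of the fire-and-forget hook described in 553ba23b89, assuming a learner whose recordOutcome() returns a promise. The UnitOutcome and Learner shapes, the store parameter, and the onLearnerError callback are hypothetical; the commit message does not show the real signatures of recordUnitOutcome() or ModelLearner.

```typescript
// Hypothetical shapes, for illustration only.
interface UnitOutcome {
  taskType: string;
  model: string;
  success: boolean;
}

interface Learner {
  recordOutcome(outcome: UnitOutcome): Promise<void>;
}

// Records the outcome first, then notifies the learner without awaiting it.
// A learner failure is reported through onLearnerError and never rethrown,
// so outcome recording cannot be blocked or broken by the learning step.
function recordUnitOutcome(
  outcome: UnitOutcome,
  store: UnitOutcome[],
  learner: Learner,
  onLearnerError: (err: unknown) => void,
): void {
  store.push(outcome);
  try {
    void learner.recordOutcome(outcome).catch(onLearnerError);
  } catch (err) {
    // Covers a learner that throws synchronously before returning a promise.
    onLearnerError(err);
  }
}
```

The caller gets control back as soon as the outcome is stored; whether the learner succeeded is only visible through the error callback.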
Mikael Hugo
42c651d106 fix: show verbose prompt traces 2026-05-06 06:45:15 +02:00
Mikael Hugo
a95e2947df fix: reconcile sift warmup observability 2026-05-06 06:22:09 +02:00
Mikael Hugo
76b218762b fix: harden sf autonomous runtime 2026-05-06 06:02:46 +02:00
Mikael Hugo
a1fd6cfc05 fix: separate headless transport from autonomous mode 2026-05-06 02:24:15 +02:00
Mikael Hugo
46db1e95ef refactor: remove legacy autonomous aliases 2026-05-05 18:47:50 +02:00
Mikael Hugo
ab6cad4c84 fix: clean provider surfaces and core build 2026-05-05 16:31:53 +02:00
Mikael Hugo
4c98cb8c33 fix: make autonomous mode canonical 2026-05-05 15:42:10 +02:00
Mikael Hugo
2d9c2018af chore: clean repo quality gates 2026-05-05 14:55:11 +02:00
Mikael Hugo
f11c877224 style: format repository with biome 2026-05-05 14:31:16 +02:00
Mikael Hugo
ed4a4bc93a chore: commit current worktree state 2026-05-04 19:28:39 +02:00
Mikael Hugo
44204e0424 chore(sf): add optional token telemetry 2026-05-02 11:50:34 +02:00
Mikael Hugo
26be0b4153 fix(sf): stabilize headless auto flow 2026-05-02 11:34:41 +02:00
Mikael Hugo
12538bbfa3 sf snapshot: pre-dispatch, uncommitted changes after 32m inactivity 2026-05-02 11:25:51 +02:00
Mikael Hugo
1990d2a2ee feat: Renamed textBuffer to assistantTextBuffer in headless.ts and vali…
- src/headless.ts
- .sf/REQUIREMENTS.md

SF-Task: S01/T04
2026-05-02 08:48:44 +02:00
Mikael Hugo
3a3ea29c51 chore(sf): test backfill, parse helpers, parallel session pickups 2026-05-02 02:26:01 +02:00
Mikael Hugo
dda9793cd6 feat(sf): port sf-home, memory-embeddings, component-types, workflow-install + sweep
- sf-home.ts: new — resolves ~/.sf/ path and SF home dir helpers (port of gsd-home.ts)
- memory-embeddings.ts: new — embedding helpers for memory similarity search
- component-types.ts: new — Component, ComponentManifest, ComponentHook type defs
- workflow-install.ts: new — workflow installation from local/remote sources
- auto-post-unit.ts: clearEvidenceFromDisk after successful verification
- routing-history.ts: add cost-per-token tracking to routing decisions
- workflow-{manifest,templates}.ts: hardening sweep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:22:13 +02:00
Mikael Hugo
9e8361da23 chore(sf): minor self-feedback + workflow-template tweaks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:21:13 +02:00
Mikael Hugo
df8fca8cc7 feat(sf): workflow-plugins port, sf-db expansions, worktree-manager hardening
- workflow-plugins.ts: new — unified plugin discovery, 4 execution modes
  (oneshot, yaml-step, markdown-phase, auto-milestone), hot-reload support
- sf-db.ts: add milestone ghosting/reservation, hook_runs table, memory
  embedding schema, subscription token usage tracking
- worktree-manager.ts: active-worktree tracking, health check cascade,
  dangling-ref pruning, sync-on-switch
- atomic-write.ts: add writeJsonAtomic convenience wrapper
- workflow-logger.ts: add "plugins" LogComponent variant
- workflow-templates.ts: template hot-reload + validation sweep
- scaffold-versioning.ts: versioned drift detection improvements
- preferences-migrations.ts: v3→v4 subscription cost fields migration
- self-feedback.ts: feedback loop dedup window
- headless.ts: EXIT_RELOAD + notification dedup boundary (final)
- tests/auto-vs-autonomous.test.ts: expand coverage for both code paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:20:14 +02:00
Mikael Hugo
abb3d76ffa chore(sf): minor sweep — gate-registry dedup, token-counter, worktree-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:18:03 +02:00
Mikael Hugo
86026c9e4f feat(sf): final UOK parity pass + secondary agent sweep
Evidence-collector (matches gsd2 exactly):
- recordToolCall now takes toolCallId as first arg (parallel-call fix)
- recordToolResult matches by toolCallId, not last-unresolved heuristic
- saveEvidenceToDisk now atomic tmp-rename JSON (not appendFileSync JSONL)
- clearEvidenceFromDisk added; resetEvidence takes no args
- stricter isEvidenceArray validator

auto/loop.ts:
- PID guard in loadStuckState prevents cross-test state pollution
- pid field added to saveStuckState payload
- saveCustomVerifyRetryCounts uses atomicWriteSync (crash-safe)

auto/run-unit.ts:
- chdir failure marked isTransient:true (dir may exist on retry)

auto/session.ts:
- canAskUser field added with reset() support

auto/phases.ts:
- currentUnit = null in closeoutAndStop (no stale refs after stop)

bootstrap/provider-error-resume.ts:
- resetTransientRetryState injectable via ProviderErrorResumeDeps

Secondary sweep (worktree, workflow, token-counter, verification-gate,
activity-log, doctor-environment, json-persistence, scaffold-keeper tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:17:21 +02:00
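The parallel-call fix in 86026c9e4f (matching results by toolCallId instead of a last-unresolved heuristic) can be sketched as below. The EvidenceLog class and its field names are hypothetical; only the idea of passing toolCallId as the first argument comes from the commit message.

```typescript
// Hypothetical evidence entry, for illustration only.
interface EvidenceEntry {
  toolCallId: string;
  toolName: string;
  result?: string;
}

class EvidenceLog {
  private readonly entries = new Map<string, EvidenceEntry>();

  // The id comes first so that parallel calls to the same tool stay distinct.
  recordToolCall(toolCallId: string, toolName: string): void {
    this.entries.set(toolCallId, { toolCallId, toolName });
  }

  // Attaches the result to the call with the same id.
  // Returns false when no call with that id was recorded.
  recordToolResult(toolCallId: string, result: string): boolean {
    const entry = this.entries.get(toolCallId);
    if (entry === undefined) return false;
    entry.result = result;
    return true;
  }

  get(toolCallId: string): EvidenceEntry | undefined {
    return this.entries.get(toolCallId);
  }
}
```

With a last-unresolved heuristic, a result arriving for the first of two in-flight calls would be attached to the second; keying by id removes that ambiguity.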
Mikael Hugo
9db94ed77e chore(sf): residual session work — final consolidation
Last batch from the parallel swarm session: docstring tweaks,
verification-gate doc additions, workflow-reconcile and worktree-command
follow-ups, doctor-environment cleanup. Typecheck clean.

Most of the session work landed in earlier commits (8be8f4774, 3045538cb,
038938f2a, ed85252fc, 4f4b584e5, etc.); this commit is the residual
working-tree state after all swarms reported.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:17:03 +02:00
Mikael Hugo
f1cef7c476 feat(sf): multi-agent sweep — paths, verification, auto closeout, bootstrap, worktree
- paths.ts: add resolveSliceSummaryPath, resolveCheckpointPath, task-summary helpers
- bootstrap/system-context.ts: worktree active context + codebase-map inject
- auto.ts: plumb autonomousMode flag, startAuto options expansion
- auto/loop.ts: Math.max(0,...) clock-skew guard in enforceMinRequestInterval
- auto/session.ts: add lastUnitAgentEndMessages and PreExecFailure tracking
- auto-post-unit.ts: clearEvidenceFromDisk after verification, isDeterministicPolicyError
- auto-unit-closeout.ts: populate lastPreExecFailure on gate failures
- cache.ts: fix TTL helper arg counts
- codebase-generator.ts: add incremental refresh helpers
- commands/handlers/auto.ts: wire autonomousMode and plan-v2 flags
- context-budget.ts: remove stale context-budget trimming (was dead code)
- dispatch-guard.ts: trim unused guards
- doctor-{environment,runtime-checks}.ts: expand health checks
- execution-instruction-guard.ts: add approval-boundary guard
- gate-registry.ts: de-dup gate registration on reload
- gitignore.ts: add .sf/worktrees to default gitignore
- notification-store.ts: add dedup window + category grouping
- pre-execution-checks.ts: add provider-readiness pre-check
- preferences.ts: subscription cost helpers + allow_flat_rate_providers
- production-mutation-approval.ts: approval-required flag on mutation tools
- state.ts: remove redundant fallback (now handled in deriveState)
- token-counter.ts: subscription token usage tracking
- verification-gate.ts: gate retry on bounded failure class
- workflow-{projections,reconcile,template-compiler,templates}: hardening
- worktree-{command,manager}: path normalization + active-worktree tracking
- tests/verification-evidence.test.ts: new — evidence load/save/clear coverage
- tests/provider-errors.test.ts: add missing provider-delay tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:16:13 +02:00
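One way to read the Math.max(0,...) clock-skew guard in f1cef7c476 is sketched here. The function name remainingDelayMs and its parameters are hypothetical, and where exactly the clamp sits inside enforceMinRequestInterval is an assumption; the commit message names only the guard itself.

```typescript
// Returns how many milliseconds to wait before the next request may be sent.
// If the clock moved backwards (nowMs < lastRequestMs), the elapsed time is
// clamped to 0 so the wait never exceeds minIntervalMs.
function remainingDelayMs(
  nowMs: number,
  lastRequestMs: number,
  minIntervalMs: number,
): number {
  const elapsedMs = Math.max(0, nowMs - lastRequestMs);
  return Math.max(0, minIntervalMs - elapsedMs);
}
```

Without the first clamp, a backwards clock jump of four seconds would add four seconds to the wait; with it, the wait is bounded by the configured interval.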
Mikael Hugo
a611cd5792 feat: introduce repo-vcs skill and add JSDoc annotations across core modules
- Add repository-vcs-context.ts to detect and inject VCS context (Git/Jujutsu)
  into the agent system prompt; wire in repo-vcs bundled skill trigger
- Add src/resources/skills/repo-vcs/ skill for commit, push, and safe-push workflows
- Add JSDoc Purpose/Consumer annotations to app-paths, bundled-extension-paths,
  errors, extension-discovery, extension-registry, headless-types, headless, and traces
- Add justfile and just to flake.nix devShell
- Fill out new-user-onboarding.md spec (Draft) and core-beliefs.md (Status: Accepted)
- Add notification-event-model.md design doc and notification-source-hygiene.md spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 21:36:32 +02:00
Mikael Hugo
12e7333f1c feat: stabilize autonomous workflow system 2026-05-01 20:18:50 +02:00
Mikael Hugo
2111da8e60 sf snapshot: pre-dispatch, uncommitted changes after 53m inactivity 2026-04-30 19:10:38 +02:00
Mikael Hugo
d8a9d63c87 feat: Replaced bare error writes in cli.ts, headless.ts, and startup-mo…
- src/cli.ts
- src/headless.ts
- src/startup-model-validation.ts

SF-Task: S04/T03
2026-04-30 15:43:29 +02:00
Mikael Hugo
8677e73046 sf snapshot: pre-dispatch, uncommitted changes after 97m inactivity 2026-04-30 15:11:45 +02:00
Mikael Hugo
62d430ab23 Add provider smoke benchmark and headless updates 2026-04-30 10:19:18 +02:00
Mikael Hugo
6ccce42c62 Add headless bootstrap and TODO triage tests 2026-04-30 09:21:24 +02:00
Mikael Hugo
cd69e85608 Harden SF model routing and harness contracts 2026-04-30 07:41:24 +02:00
Mikael Hugo
a45f873124 chore: snapshot WIP before resuming M004/S03 auto
84 files spanning provider capabilities, model routing, headless
runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted
so headless auto can resume M004 (Production Readiness) S03
(Verification Gate Validation) on a clean tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:31:19 +02:00
Mikael Hugo
120d7deda8 fix: keep headless alive for provider auto-resume 2026-04-29 20:16:23 +02:00
Mikael Hugo
2ed1638153 fix: add headless heartbeat output 2026-04-29 19:29:43 +02:00
Mikael Hugo
c5df4b46a6 fix(headless): await auto loop in headless mode 2026-04-29 15:37:17 +02:00
Mikael Hugo
df614a3e47 fix(headless): split idle-timeout role from deadlock-backstop role
The single IDLE_TIMEOUT_MS constant was conflating two different jobs:
"are we done?" vs "is the agent stuck?". For multi-turn commands (auto,
next, discuss, plan), the first question is wrong — those signal
completion explicitly via "auto-mode stopped" terminal notifications,
and child-process exit catches crashes. The 120s I'd just bumped
multi-turn to was still in idle-detection mindset; that's not what we
need from this timer.

New semantics:
- IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle
  really does mean done.
- NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with
  pauses for thinking between bootstrap steps.
- MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan.
  Not a "done" detector; a deadlock recovery bound. Long enough to
  never bother slow LLM reasoning or chained tool calls; short enough
  to recover from a true hang within a reasonable window. Real
  completion comes from terminal notifications + child-process exit,
  both already wired.

Code reads cleaner too: effectiveIdleTimeout selection now mirrors the
three-way conceptual split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:18:58 +02:00
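The three-way split described in df614a3e47 can be sketched as follows. The constant names and values are taken from the commit message; the command string "new-milestone", the MULTI_TURN_COMMANDS set, and the signature of effectiveIdleTimeout are assumptions made for illustration.

```typescript
const IDLE_TIMEOUT_MS = 15 * 1000; // quick commands: idle means done
const NEW_MILESTONE_IDLE_TIMEOUT_MS = 120 * 1000; // bounded creative task
const MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 * 60 * 1000; // deadlock recovery bound

// Multi-turn commands named in the commit message.
const MULTI_TURN_COMMANDS = new Set(["auto", "next", "discuss", "plan"]);

function isMultiTurnCommand(command: string): boolean {
  return MULTI_TURN_COMMANDS.has(command);
}

// Picks the timer for a command. For multi-turn commands the value is a
// backstop against hangs, not a completion detector.
function effectiveIdleTimeout(command: string): number {
  if (isMultiTurnCommand(command)) return MULTI_TURN_DEADLOCK_BACKSTOP_MS;
  if (command === "new-milestone") return NEW_MILESTONE_IDLE_TIMEOUT_MS;
  return IDLE_TIMEOUT_MS;
}
```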
Mikael Hugo
c239ad6c9d fix(headless): use long idle timeout for auto/next/discuss/plan
The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf
headless auto would dispatch a task, the LLM would make 1-2 tool calls,
pause to reason about the next step, exceed 15s of "no events", and
headless would declare "Status: complete" — exiting at ~35s with the task
barely started (123 events but only 2 tool calls).

The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason
("LLM may pause between tool calls e.g. after mkdir, before writing
files"). The same applies to auto/next/discuss/plan — all multi-turn
commands where the LLM thinks longer between actions, especially on
non-trivial tasks. isMultiTurnCommand was already defined for related
logic; this just wires it into the idle-timeout decision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:13:43 +02:00
Mikael Hugo
9b718f8e36 fix(headless): repair missing sf project symlink 2026-04-29 14:43:30 +02:00
Mikael Hugo
b24f426f2b batch: snapshot of in-flight v2 work
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.

Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
  trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries

The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.

The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:42:31 +02:00
Mikael Hugo
f98a1e360e batch: codex-rescue session output (multiple in-flight tasks)
Combined output of multiple parallel codex-rescue runs that produced
working-tree edits but didn't commit. Tasks contributing:

- prefs: per-provider model allow-list (provider_model_allow) — manual
- TUI scroll + unresponsive (a7884d1a / bt3fpn4y2)
- planningMeeting required (aa09e904 / br127l763)
- Logs UX 4-pack (a5c65314 / btcplhu7f)
- Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp)
- sf_task_complete atomic + retry (a7a079b4 / b20cy5owv)
- Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98)
- Per-role slice prompts (a94c3e1a)
- Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl)
- Schema sweep (ac994b1e / task-moifq7pu-83coqz)
- Flow audit (ad26ecfd / bttj4vrqm)

Typecheck passes. Tests not run as a full suite — spot-check after merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 11:52:42 +02:00
Mikael Hugo
e2147c0694 sf snapshot: pre-dispatch, uncommitted changes after 43m inactivity 2026-04-25 06:34:49 +02:00
Mikael Hugo
7b6c9dd099 sf snapshot: pre-dispatch, uncommitted changes after 4703m inactivity 2026-04-25 05:51:29 +02:00
ace-pm
51b65fd490 fix: symlink extensions + silent catches masking real errors
Real bugs from 2nd-pass scan:

1. extension-registry.ts: discoverAllManifests skipped symlinked extension
   dirs because Dirent.isDirectory() returns false for symlinks. Dev-workflow
   symlinks under ~/.sf/agent/extensions/ were invisible to list/enable/
   disable/info. Matches the regression documented in
   symlink-extension-discovery.test.ts — the test inlines the correct logic,
   but this callsite still had the buggy form. Now accepts isDirectory() ||
   isSymbolicLink().

2. headless.ts SIGINT handler: client.stop() failures were double-silenced
   (inner .catch(()=>{}), outer try{}catch{}). Interactive mode logs stop
   errors to stderr. Restored head/headless parity — still fire-and-forget
   (exit code is forced via process.exit) but failures are observable.

3. openai-codex-responses.ts SSE parser: malformed data frames were silently
   dropped so broken streams looked identical to clean ones. Now debug-logs
   the parse error with the chunk context so broken streams are
   distinguishable in logs. Stream continues on bad chunk (one bad frame
   shouldn't kill the whole generation).

4. web/cleanup-service.ts generated script: bare 'catch {}' around four native
   git calls (nativeBranchList, nativeDetectMainBranch, nativeBranchListMerged,
   nativeForEachRef). A failed main-branch detection silently left mainBranch
   undefined-shaped, then the next native call operated on garbage. Now emits
   console.warn so failures surface in the subprocess log.

5. web/undo-service.ts generated script: git revert failure was silenced;
   when --no-commit failed, user saw commitsReverted=0 with no reason. Now
   logs the revert error before attempting --abort (abort itself remains
   best-effort silent).

False positives from the same scan (investigated and dismissed):
- auto-worktree.ts #2505: code uses ':(exclude).sf/milestones' pathspec +
  shelter-and-restore, which is a better fix than the 'drop --include-untracked'
  approach the test comment describes. Test comment is stale; source is correct.
- Lifecycle handler unhandled rejections across 5 extensions: extensions/runner.ts
  already try/catches handler invocations and routes to emitError. Wrapping the
  individual handlers would be redundant.
2026-04-21 02:01:41 +02:00
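The symlink fix in item 1 of 51b65fd490 comes down to the predicate below. DirentLike is a hypothetical stand-in for the entries returned by a directory listing with file types, and isExtensionDirCandidate is an illustrative name, not the one used in extension-registry.ts.

```typescript
// Minimal stand-in for a directory entry; only the two checks used here.
interface DirentLike {
  name: string;
  isDirectory(): boolean;
  isSymbolicLink(): boolean;
}

// A directory entry reports isDirectory() === false when it is a symlink,
// even if the link points at a directory, so both checks are needed.
function isExtensionDirCandidate(entry: DirentLike): boolean {
  return entry.isDirectory() || entry.isSymbolicLink();
}
```

A symlink can also point at a regular file, so a caller would probably still need to handle a target that turns out not to be a directory.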
Mikael Hugo
941eb4c830 headless: clean up sf headless auto stderr output
Three fixes to make the headless progress stream readable at a glance:

1. Filter TUI footer widget keys from setStatus — 0-emoji, 0-color-band,
   authority, ollama, sf-fast, and sf-auto are sticky indicators for the
   interactive TUI footer, not workflow phases. They no longer leak
   through as [phase] ollama / [phase] sf-fast noise.

2. Unify tag prefix column width at 11 chars via a new tag() helper in
   headless-ui.ts. All of [tool], [agent], [forge], [phase], [thinking],
   [cost], [text] now align on the same column, matching the existing
   [headless] and [thinking] widths.

3. Dedupe consecutive identical progress lines in headless.ts so a
   widget that re-emits the same setStatus on every LLM call prints
   once instead of flooding stderr. Two different lines still both show;
   only adjacent duplicates collapse.

Also tightens parsePhaseLabel so an unknown bare statusKey with no
message returns null rather than leaking the raw key — a defense in
depth if the footer-widget allowlist drifts behind a new extension.

Tests: 4 new cases in headless-progress.test.ts covering footer-key
suppression, bare-key suppression, workflow-phase passthrough, and
tag-alignment. 88/88 pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 05:47:02 +02:00
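Fixes 2 and 3 in 941eb4c830 (fixed-width tag prefixes and collapsing adjacent duplicate lines) can be sketched as below. The 11-character width comes from the commit message; the tag() signature and the dedupeAdjacent helper are assumptions, not the code in headless-ui.ts or headless.ts.

```typescript
const TAG_WIDTH = 11;

// Pads "[name]" with trailing spaces so every prefix fills the same column.
function tag(name: string): string {
  return `[${name}]`.padEnd(TAG_WIDTH);
}

// Drops a line only when it equals the line emitted immediately before it.
function dedupeAdjacent(lines: string[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    if (out.length === 0 || out[out.length - 1] !== line) {
      out.push(line);
    }
  }
  return out;
}
```

Only adjacent repeats collapse: a line that reappears after a different line is printed again, which matches the behavior the commit message describes.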
ace-pm
f92ee8d64c Rename @sf-run/* → @singularity-forge/* package scope
- All 373 source files updated
- Package.json scopes in all workspace packages
- Loader workspace symlink dir updated
- RpcClient import unified from pi-coding-agent (fixes type mismatch)
- Scripts, configs, flake.nix updated
- Workspace symlinks rebuilt
2026-04-15 22:56:33 +02:00
ace-pm
9d739dfa5d Rename GSD→SF: complete rebrand from fork origin
- All gsdDir/gsdRoot/gsdHome → sfDir/sfRootDir/sfHome
- GSDWorkspace* → SFWorkspace* interfaces
- bootstrapGsdProject → bootstrapProject
- runGSDDoctor → runSFDoctor
- GsdClient → SfClient, gsd-client.ts → sf-client.ts
- .gsd/ → .sf/ in all tests, docs, docker, native, vscode
- Auto-migration: headless detects .gsd/ → renames to .sf/
- Deleted gsd-phase-state.ts backward-compat re-export
- Renamed bin/gsd-from-source → bin/sf-from-source
- Updated mintlify docs, github workflows, docker configs
2026-04-15 18:33:47 +02:00