Commit graph

177 commits

Author SHA1 Message Date
Mikael Hugo
b0fce94f9e feat: record retrieval evidence across context tools 2026-05-07 18:17:41 +02:00
Mikael Hugo
05f185256c docs: record local cli survey cross-check 2026-05-07 17:22:03 +02:00
Mikael Hugo
b1a7749763 fix: harden widget and provider auth handling 2026-05-07 17:20:52 +02:00
Mikael Hugo
8088489e38 sf snapshot: uncommitted changes after 258m inactivity 2026-05-07 15:37:55 +02:00
Mikael Hugo
87362f27fc docs: remove mcp server roadmap residue 2026-05-07 06:25:59 +02:00
Mikael Hugo
5c32d91124 feat: promote schedule and self-feedback state to db 2026-05-07 05:34:42 +02:00
Mikael Hugo
fce0c4c781 Tier 1.1: Implement vault credential resolver for provider keys
- Add vault-credential-resolver.js: Async credential resolution with vault:// URI support
- Integration with vault-resolver.js (low-level Vault client)
- Update doctor-providers.js to detect and report vault URIs
- Synchronous doctor checks (no network I/O) with lazy async resolution
- Fail-open semantics: vault unavailable -> fall back to plaintext
- 28 tests for credential resolver (all passing)
- ADR-0078: Architecture and auth chain documentation

Features:
- vault://secret/path/to/secret#fieldname URI format
- Auth chain: VAULT_TOKEN -> ~/.vault-token -> AppRole (reserved)
- Helper functions: couldBeVaultUri, hasProviderCredentialEnvVar, resolveProviderCredential, getCredentialValue, formatCredentialInfo
- Full backward compatibility with plaintext keys and auth.json
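
A minimal sketch of how the `vault://secret/path/to/secret#fieldname` format and the fail-open rule above might fit together. The names `VaultRef`, `parseVaultUri`, `resolveCredentialSketch`, and the injected `lookup` callback are hypothetical and are not the helpers listed in this commit; the real resolver may differ.

```typescript
// Hypothetical parsed form of a vault:// reference (not from the repository).
interface VaultRef {
  path: string; // e.g. "secret/path/to/secret"
  field: string; // e.g. "fieldname"
}

const VAULT_PREFIX = "vault://";

// Returns a VaultRef for a well-formed vault:// URI, or null for anything else
// (plaintext keys, or URIs with an empty path or a missing #field).
function parseVaultUri(value: string): VaultRef | null {
  if (!value.startsWith(VAULT_PREFIX)) return null;
  const rest = value.slice(VAULT_PREFIX.length);
  const hash = rest.indexOf("#");
  if (hash <= 0 || hash === rest.length - 1) return null;
  return { path: rest.slice(0, hash), field: rest.slice(hash + 1) };
}

// Fail-open resolution. `lookup` stands in for a Vault client call and
// `plaintextFallback` for a key found elsewhere; both are assumptions.
async function resolveCredentialSketch(
  value: string,
  lookup: (ref: VaultRef) => Promise<string>,
  plaintextFallback?: string,
): Promise<string | undefined> {
  const ref = parseVaultUri(value);
  if (ref === null) return value; // already a plaintext key
  try {
    return await lookup(ref);
  } catch {
    return plaintextFallback; // Vault unavailable: fall back instead of throwing
  }
}
```

Keeping the parser synchronous and free of I/O matches the idea that doctor checks can detect vault URIs without touching the network, while the async step is deferred until a value is actually needed.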

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 04:59:07 +02:00
Mikael Hugo
87aa04cf05 Tier 1.3: Add spec/runtime/evidence schema separation (v32)
Implements the 3-table normalization model for milestone, slice, and task entities:

- 9 new tables: {milestone,slice,task}_{specs,evidence} + runtime tables
- milestone_specs: immutable record of intent (vision, goals, risks, proof strategy)
- slice_specs: immutable slice-level intent
- task_specs: immutable task verification criteria
- {entity}_evidence: append-only audit trail with timestamps and phase metadata
- Indices on evidence tables for efficient chronological queries

Key improvements:
- Spec immutability: Write-once specs preserve original intent
- Audit trail: Evidence chain enables data archaeology and decision history
- Query efficiency: Each table contains only relevant columns
- Re-planning clarity: Multiple spec versions can exist for same entity ID
- Forensic capability: Timestamp + phase metadata on evidence rows
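
The write-once spec and append-only evidence ideas can be illustrated with a small in-memory model. This is only a sketch: the actual tables live in SQLite, and the field names used here (`version`, `intent`, `note`) are assumptions, since the commit does not list the columns.

```typescript
// Illustrative in-memory model of the spec/evidence split (column names assumed).
interface SpecRow {
  readonly entityId: string;
  readonly version: number;
  readonly intent: string;
}

interface EvidenceRow {
  readonly entityId: string;
  readonly phase: string;
  readonly timestamp: number;
  readonly note: string;
}

class SpecEvidenceSketch {
  private readonly specs: SpecRow[] = [];
  private readonly evidence: EvidenceRow[] = [];

  // Write-once: re-planning adds a new version; earlier rows are never edited.
  addSpec(entityId: string, intent: string): SpecRow {
    const version = this.specs.filter((s) => s.entityId === entityId).length + 1;
    const row: SpecRow = Object.freeze({ entityId, version, intent });
    this.specs.push(row);
    return row;
  }

  // Append-only: evidence rows carry a timestamp and phase and are never updated.
  appendEvidence(entityId: string, phase: string, note: string, timestamp: number = Date.now()): void {
    this.evidence.push(Object.freeze({ entityId, phase, timestamp, note }));
  }

  specVersions(entityId: string): SpecRow[] {
    return this.specs.filter((s) => s.entityId === entityId);
  }

  // Chronological audit trail for one entity.
  history(entityId: string): EvidenceRow[] {
    return this.evidence
      .filter((e) => e.entityId === entityId)
      .sort((a, b) => a.timestamp - b.timestamp);
  }
}
```

Because nothing is ever overwritten, the original intent stays readable next to later re-plans, which is the property the audit-trail bullets describe.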

Migration:
- Schema version bumped to 32
- Migration runs on first open of existing databases
- No data loss; existing milestone/slice/task rows preserved
- Creates spec and evidence tables from existing columns (future work)

This is Phase 1 of Tier 1.3 implementation (schema definition + basic setup).
Phases 2-5 (migration, data layer updates, tool updates, tests) follow in next PRs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 04:20:32 +02:00
Mikael Hugo
4f217cc88c docs: promote sf state guidance 2026-05-07 03:59:38 +02:00
Mikael Hugo
932f17b93a refactor: rename workflow tool boundary 2026-05-07 03:45:41 +02:00
Mikael Hugo
e35cc3c6b8 docs: align schedule and package state wording 2026-05-07 03:36:56 +02:00
Mikael Hugo
3e6827e7dc docs: remove stale direct db and mcp guidance 2026-05-07 03:33:14 +02:00
Mikael Hugo
9ab0b9fe63 docs: tighten legacy state fallback wording 2026-05-07 03:25:20 +02:00
Mikael Hugo
39382f7e54 docs: clarify db-backed state guidance 2026-05-07 03:20:20 +02:00
Mikael Hugo
2fae96d539 docs: align runtime state and mcp boundaries 2026-05-07 03:09:55 +02:00
Mikael Hugo
f192dbfca0 docs: add ADR-076 for UOK memory integration decisions
Document the three-phase integration of SF memory system with UOK:

Phase 1: Unit outcome recording (recordUnitOutcomeInMemory)
- Records success/failure patterns with 0.9/0.5 confidence
- Fire-and-forget async, never blocks execution

Phase 2: Dispatch ranking enhancement (enhanceUnitRankingWithMemory)
- Queries memory for similar patterns
- Boosts matching candidates by up to 15% (conservative limit)
- Deterministic embeddings ensure reproducible ranking

Phase 3: Gate context enrichment (enrichGateResultWithMemory)
- Diagnostic only; never changes gate pass/fail logic
- Helps operators understand recurring issues

All memory operations gracefully degrade if DB unavailable.
56 test cases validate integration across all phases.
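
As a rough illustration of the Phase 2 cap, the sketch below boosts each candidate's score by at most 15%, scaled by a similarity value in [0, 1]. A multiplicative boost, the `Candidate` shape, and the precomputed `similarity` map are all assumptions made for the example; the ADR only states the 15% ceiling.

```typescript
interface Candidate {
  id: string;
  score: number;
}

// The 15% ceiling comes from the commit message; everything else is assumed.
const MAX_MEMORY_BOOST = 0.15;

// `similarity` maps candidate id -> similarity to remembered success patterns.
// Values are clamped to [0, 1] so the boost can never exceed the ceiling.
function boostWithMemory(candidates: Candidate[], similarity: Map<string, number>): Candidate[] {
  return candidates
    .map((c) => {
      const sim = Math.min(1, Math.max(0, similarity.get(c.id) ?? 0));
      return { id: c.id, score: c.score * (1 + MAX_MEMORY_BOOST * sim) };
    })
    // Ties are broken by id so the ordering stays deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}
```

A small cap like this lets memory reorder close calls without letting a learned pattern override a clearly better base ranking.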

Relates to ADR-0075 (UOK gates), ADR-008 (SF tools).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 02:05:01 +02:00
Mikael Hugo
a8634d4a3b docs: add memory system integration guide for developers
Practical quick-start guide for using SF's autonomous memory system:

- Record unit outcomes (success/failure patterns)
- Enhance dispatch ranking with learned patterns
- Add context to gate failures
- Core memory operations (create, query, relations)
- Common integration patterns
- Graceful degradation strategy
- Performance notes and best practices
- Testing with mocked memory
- Debugging helpers

Guide covers:
- Fire-and-forget async pattern
- Never blocks dispatch/execution
- Testing strategies for memory-enhanced code
- Performance characteristics
- Architecture decision: memory is SF-internal
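
The fire-and-forget pattern named above could look roughly like the following. `fireAndForget` is a hypothetical helper written for this illustration, not a function from the guide.

```typescript
// Starts `op` without awaiting it. Synchronous throws and async rejections are
// both routed to `onError`, so the caller is never blocked and never sees an error.
function fireAndForget(
  op: () => Promise<unknown> | unknown,
  onError: (err: unknown) => void = () => {},
): void {
  Promise.resolve()
    .then(op) // runs in a microtask; a sync throw becomes a rejection
    .catch((err: unknown) => {
      try {
        onError(err);
      } catch {
        // A failing error handler must not surface either.
      }
    });
}
```

Usage would be something like `fireAndForget(() => recordOutcome(unit))`, where `recordOutcome` is a placeholder for whatever memory write is being made; dispatch continues immediately whether or not the write succeeds.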

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 02:03:34 +02:00
Mikael Hugo
b384c8e0df docs: clarify memory system is SF-internal, not MCP-exposed
Add architecture decision: Memory is not exposed as MCP server.

- SF is an MCP client only (consumes external MCP tools)
- Memory is internal SF infrastructure (uses SQLite, fire-and-forget async)
- Memory exposed as SF tools only (capture, query, graph)
- No external MCP exposure needed (memory is autonomous learning, not a service)

This keeps SF's learning system private and prevents:
- External memory pollution
- Uncontrolled confidence scoring
- Inconsistent learning patterns
- Loss of autonomy (memory decisions stay internal)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 01:41:33 +02:00
Mikael Hugo
b6ea800e2e docs: comprehensive SF memory system architecture reference
Add MEMORY-SYSTEM-ARCHITECTURE.md documenting:
- All 10 memory modules (store, embeddings, relations, etc.)
- Core functions and APIs for each module
- Storage schema (SQLite tables)
- Integration points (UOK, dispatch, gates)
- Usage examples and architecture diagram
- Performance characteristics
- Graceful degradation strategy
- Data retention and growth management

This serves as:
1. Reference guide for developers using memory system
2. Architecture overview of autonomous learning
3. Integration point documentation for extensions
4. Future enhancement roadmap

Discovered during UOK memory integration work:
- Memory system already complete (no duplication needed)
- Used for pattern learning, dispatch ranking, and diagnostics
- Node 24 native SQLite backend (no external deps)
- Fire-and-forget async operations (never blocks)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 01:36:08 +02:00
Mikael Hugo
3f099e240c Update test coverage plan: Phase 3 complete
- Phase 1: 48 tests (metrics + triage) ✓
- Phase 2: 31 tests (crash recovery) ✓
- Phase 3: 17 tests (property-based FSM) ✓
- Total: 96 critical path tests + 25 env schema tests = 121 new tests
- All passing, coverage targets met

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 01:01:47 +02:00
Mikael Hugo
2d465b11fd test: add comprehensive Phase 1 coverage for dispatch loop (48 tests)
- Add metrics.test.ts: 21 tests for unit outcome recording, model performance tracking, fire-and-forget safety, persistence, error handling
- Add triage-self-feedback.test.ts: 27 tests for report classification, confidence thresholds, auto-fix, deduplication, severity categorization, async safety

Purpose: Increase coverage of critical autonomous dispatch paths from 40% to 60%+.
Covers fire-and-forget patterns (metrics recording and auto-fix application must not
block dispatch), concurrent recording safety, graceful degradation on error.

Tests validate:
  ✓ Unit outcome recording without blocking
  ✓ Per-task-type model performance tracking
  ✓ Fire-and-forget error handling (metrics/fixes don't break dispatch)
  ✓ Concurrent metric recording race conditions
  ✓ Persistence atomicity
  ✓ Report classification by type/severity
  ✓ Confidence thresholds (0.85-0.95 per type)
  ✓ Auto-fix deduplication and prioritization
  ✓ Async triage without blocking dispatch

Phase 1 complete: 48 tests, all passing.
Phase 2: Recovery path hardening (recovery/forensics)
Phase 3: Property-based FSM testing (fast-check)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 00:38:19 +02:00
Mikael Hugo
6be23806fe feat: comprehensive environment schema with type-safe validation
- Expand env.ts with completeSfEnvSchema covering all 80+ SF_* variables
- Organize variables into logical categories (core, directories, performance, debug, extensions, recovery, settings, misc)
- Add typed API: getCompleteSfEnv(), parseCompleteSfEnv(), getEnvValidationSummary()
- Support graceful degradation (missing config returns partial data, never throws)
- Add 25 comprehensive test cases covering schema, parsing, defaults, round-trips
- Document in docs/ENV.md with quick start, API reference, migration guide
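
A simplified sketch of the "never throws" parsing idea follows. The variable names `SF_DEBUG`, `SF_MAX_WORKERS`, and `SF_STATE_DIR`, their defaults, and the `parseSfEnvSketch` function are invented for illustration; they are not taken from `completeSfEnvSchema`, and this version substitutes defaults where the real API is described as returning partial data.

```typescript
type EnvSource = Record<string, string | undefined>;

// Hypothetical subset of configuration; the real schema covers 80+ SF_* variables.
interface SfEnvSketch {
  debug: boolean;
  maxWorkers: number;
  stateDir: string;
}

interface EnvParseResult {
  values: SfEnvSketch;
  warnings: string[];
}

const ENV_DEFAULTS: SfEnvSketch = { debug: false, maxWorkers: 4, stateDir: ".sf" };

// Invalid or missing values fall back to defaults and are reported as warnings
// instead of exceptions, so startup can continue with a usable configuration.
function parseSfEnvSketch(env: EnvSource): EnvParseResult {
  const warnings: string[] = [];
  const values: SfEnvSketch = { ...ENV_DEFAULTS };

  const debug = env["SF_DEBUG"];
  if (debug !== undefined) {
    values.debug = debug === "1" || debug.toLowerCase() === "true";
  }

  const workers = env["SF_MAX_WORKERS"];
  if (workers !== undefined) {
    const n = Number.parseInt(workers, 10);
    if (Number.isInteger(n) && n > 0) values.maxWorkers = n;
    else warnings.push(`SF_MAX_WORKERS: ignoring invalid value "${workers}"`);
  }

  const dir = env["SF_STATE_DIR"];
  if (dir !== undefined && dir !== "") values.stateDir = dir;

  return { values, warnings };
}
```

Passing the environment in as a plain record, rather than reading `process.env` inside the parser, keeps the function easy to test and makes round-trip checks straightforward.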

Purpose: Prevent silent misconfiguration by centralizing environment validation,
enabling IDE auto-completion, and providing clear defaults. Callers get type-safe
access to all config instead of scattered process.env reads.

Consumers: loader.ts for startup validation, all modules reading configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 00:31:59 +02:00
Mikael Hugo
f2db20b4d6 docs: add SQLite migration guide for Node 24 upgrade
Comprehensive guide for migrating from JSON to node:sqlite when Node 24 is available:
- Schema design (model_outcomes + model_stats tables)
- Phase-by-phase refactoring approach
- Data migration from JSON with backward compatibility
- Testing strategy with new SQLite-specific tests
- Future opportunities: dashboards, trend analysis, A/B testing, federated learning

This doc serves as a roadmap for ~2 days of work when Node 24 becomes standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 23:03:50 +02:00
Mikael Hugo
553ba23b89 integrate: hook quick wins into UOK dispatch loop
Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: After triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase
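
The auto-fix hook described in item 2 might be sketched as below, assuming a report carries a numeric `confidence` and that fixes are applied through an injected `fix` callback. The `TriageReportSketch` type and both function names are hypothetical; only the 0.85 threshold and the `reportsAutoFixed` counter come from the commit message.

```typescript
interface TriageReportSketch {
  id: string;
  confidence: number;
}

// Threshold from the commit message: only reports above 0.85 are auto-fixed.
const AUTO_FIX_THRESHOLD = 0.85;

function selectAutoFixable(reports: TriageReportSketch[]): TriageReportSketch[] {
  return reports.filter((r) => r.confidence > AUTO_FIX_THRESHOLD);
}

// Applies fixes one by one. A failing fix is swallowed so it cannot block
// report application; only successful fixes are counted.
async function applyAutoFixesSketch(
  reports: TriageReportSketch[],
  fix: (report: TriageReportSketch) => Promise<void>,
): Promise<{ reportsAutoFixed: number }> {
  let reportsAutoFixed = 0;
  for (const report of selectAutoFixable(reports)) {
    try {
      await fix(report);
      reportsAutoFixed += 1;
    } catch {
      // Intentionally ignored: fixing is best-effort.
    }
  }
  return { reportsAutoFixed };
}
```

Returning a count instead of throwing gives triage metrics something to record even when some fixes fail.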

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 22:34:41 +02:00
Mikael Hugo
0e2edfdebf feat: implement 3 quick wins for SF self-evolution
Quick Win 1: Close Self-Report Feedback Loop [9/10 impact]
- Added self-report-fixer.js module with automatic fix classification
- Pattern-based detection for high-confidence fixes (e.g., prompt rubrics)
- Deduplication and severity-based categorization of reports
- Designed for extension into triage-self-feedback pipeline

Quick Win 2: Activate Continuous Model Learning [8/10 impact]
- Added model-learner.js with ModelPerformanceTracker class
- Per-task-type tracking: success rate, latency, cost, token efficiency
- Auto-demotion for models failing >50% on specific task types
- A/B testing infrastructure for hypothesis testing on low-risk tasks
- Failure analysis with pattern detection (e.g., timeouts, quality issues)
- Storage: .sf/model-performance.json, .sf/model-failure-log.jsonl
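
A compact sketch of per-task-type tracking with the >50% failure rule. The class name `ModelTallySketch` and the `minSamples` guard are assumptions added for the example (the commit does not say how many attempts are needed before demotion), and persistence to `.sf/model-performance.json` is left out.

```typescript
interface Tally {
  attempts: number;
  failures: number;
}

class ModelTallySketch {
  private readonly tallies = new Map<string, Tally>();
  private readonly minSamples: number;

  // minSamples is an assumed guard against demoting on very little data.
  constructor(minSamples: number = 5) {
    this.minSamples = minSamples;
  }

  record(model: string, taskType: string, success: boolean): void {
    const key = `${model}::${taskType}`;
    const tally = this.tallies.get(key) ?? { attempts: 0, failures: 0 };
    tally.attempts += 1;
    if (!success) tally.failures += 1;
    this.tallies.set(key, tally);
  }

  // Demote only when the failure rate for this model on this task type exceeds 50%.
  shouldDemote(model: string, taskType: string): boolean {
    const tally = this.tallies.get(`${model}::${taskType}`);
    if (tally === undefined || tally.attempts < this.minSamples) return false;
    return tally.failures / tally.attempts > 0.5;
  }
}
```

Keying by model and task type together means a model that struggles with one kind of task can still be routed to the kinds it handles well.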

Quick Win 3: Automate Knowledge Injection [7/10 impact]
- Added knowledge-injector.js with semantic similarity scoring
- Integrated into auto-prompts.js for execute-task prompts
- queryKnowledge already exists in context-store.js (60% done)
- Enhanced with: semantic matching, confidence filtering, contradiction detection
- Tracks knowledge usage for feedback loop

Integration:
- Modified auto-prompts.js to inject knowledge via knowledgeInjection variable
- Added getKnowledgeInjection helper for graceful degradation
- All new modules pass build check and are in dist/

Status: Core infrastructure in place; ready for integration into dispatch loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 22:01:37 +02:00
Mikael Hugo
8fd59e156d sf snapshot: uncommitted changes after 321m inactivity 2026-05-06 21:53:05 +02:00
Mikael Hugo
6471e10245 sf snapshot: uncommitted changes after 64m inactivity 2026-05-06 16:28:31 +02:00
Mikael Hugo
f655188814 sf snapshot: uncommitted changes after 93m inactivity 2026-05-06 11:37:27 +02:00
Mikael Hugo
a73ea845e7 sf snapshot: uncommitted changes after 61m inactivity 2026-05-06 10:04:20 +02:00
Mikael Hugo
500a9d1c1d fix: move unit runtime under uok ownership 2026-05-06 07:02:28 +02:00
Mikael Hugo
76b218762b fix: harden sf autonomous runtime 2026-05-06 06:02:46 +02:00
Mikael Hugo
adf28d69b4 feat: run solver eval from autonomous lifecycle 2026-05-06 04:02:40 +02:00
Mikael Hugo
a1fd6cfc05 fix: separate headless transport from autonomous mode 2026-05-06 02:24:15 +02:00
Mikael Hugo
3960e42b26 docs: align sf purpose doctrine and docs 2026-05-06 00:38:36 +02:00
Mikael Hugo
d75ebfe7c3 sf snapshot: uncommitted changes after 43m inactivity 2026-05-05 21:39:56 +02:00
Mikael Hugo
22fa995500 fix: avoid lockfile churn during doctor install 2026-05-05 20:24:30 +02:00
Mikael Hugo
ab6cad4c84 fix: clean provider surfaces and core build 2026-05-05 16:31:53 +02:00
Mikael Hugo
4c98cb8c33 fix: make autonomous mode canonical 2026-05-05 15:42:10 +02:00
Mikael Hugo
55e7dd0e02 fix: clean generated harness residue 2026-05-05 15:04:34 +02:00
Mikael Hugo
00a118ea71 chore: commit current workspace state 2026-05-05 14:46:18 +02:00
Mikael Hugo
47c806d733 fix: version sf extension runtime sources 2026-05-04 23:27:20 +02:00
Mikael Hugo
ed4a4bc93a chore: commit current worktree state 2026-05-04 19:28:39 +02:00
Mikael Hugo
a37737c4af docs: memory-relations.ts is now ranker-live
Updates 23c5de38b (which flagged the table as storage-only) to reflect
that 55b14c3f7 wired the ranker consumer (graph-boost in
getRelevantMemoriesRanked) and b9bff3762 wired the writer
(co-extraction linkage in applyMemoryActions). The graph-aware
pipeline is now end-to-end live, with named relation types,
auto-linking confidence (0.5), intra-pool boost, and damping (0.4).
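
One way an intra-pool graph boost with damping could work is sketched here. Only the damping value (0.4) and the auto-link confidence (0.5) appear in the commit message; the additive formula, the symmetric treatment of edges, and every name below are assumptions, not the logic in `getRelevantMemoriesRanked`.

```typescript
interface RankedMemory {
  id: string;
  score: number;
}

interface MemoryRelation {
  from: string;
  to: string;
  confidence: number; // e.g. 0.5 for auto-linked pairs
}

const GRAPH_DAMPING = 0.4;

// Each memory gains damping * confidence * (neighbour's base score) for every
// relation whose two endpoints are both in the candidate pool.
function applyGraphBoost(pool: RankedMemory[], relations: MemoryRelation[]): RankedMemory[] {
  const base = new Map<string, number>();
  for (const m of pool) base.set(m.id, m.score);

  const boosted = new Map<string, number>(base);
  for (const rel of relations) {
    const fromScore = base.get(rel.from);
    const toScore = base.get(rel.to);
    if (fromScore === undefined || toScore === undefined) continue; // intra-pool only
    boosted.set(rel.from, (boosted.get(rel.from) ?? 0) + GRAPH_DAMPING * rel.confidence * toScore);
    boosted.set(rel.to, (boosted.get(rel.to) ?? 0) + GRAPH_DAMPING * rel.confidence * fromScore);
  }

  return pool
    .map((m) => ({ id: m.id, score: boosted.get(m.id) ?? m.score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}
```

Using base scores for the boost, rather than already-boosted ones, keeps the result independent of the order in which relations are processed.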

Honest description for contributors reading top-down.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:13:56 +02:00
Mikael Hugo
23c5de38bf docs: clarify memory-relations.ts is storage-only today
The architecture.md entry implied memory-relations.ts contributes to
ranking ("knowledge-graph edges between memories"). The read consumer
doesn't exist yet — getRelevantMemoriesRanked uses cosine + static
score, not graph traversal. Relations are written via /sf memory
import / createMemoryRelation but never read for ranking.

Updated the description so a contributor reading this file knows the
graph-traversal pipeline is the next logical extension, not something
that currently runs.
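
For readers unfamiliar with the "cosine + static score" ranking mentioned above, here is a generic sketch. How the two terms are actually combined is not stated in the commit, so the simple weighted sum, the `staticWeight` parameter, and the `MemoryEntry` shape are assumptions.

```typescript
interface MemoryEntry {
  id: string;
  embedding: number[];
  staticScore: number; // e.g. a stored confidence or importance value
}

// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Ranks memories by similarity to the query plus a weighted static term.
// No relation edges are consulted here, which is the storage-only state described.
function rankByCosineAndStatic(query: number[], memories: MemoryEntry[], staticWeight: number = 1): string[] {
  return memories
    .map((m) => ({ id: m.id, score: cosineSimilarity(query, m.embedding) + staticWeight * m.staticScore }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .map((m) => m.id);
}
```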

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:52:38 +02:00
Mikael Hugo
daa192a572 docs: list memory-* modules in architecture.md
The repo's architecture file listed only `memory-extractor.ts` and
`memory-store.ts` — the rest of the memory subsystem
(`memory-embeddings.ts`, `memory-embeddings-llm-gateway.ts`,
`memory-relations.ts`, `memory-source-store.ts`) had no entry, so a
new contributor reading the file would miss them entirely.

Added one-line descriptions for each, including the gateway adapter's
opt-in env-var contract (`SF_LLM_GATEWAY_KEY`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:29:03 +02:00
Mikael Hugo
a3ef4bdf3f fix(sf): remove workflow tool aliases 2026-05-02 18:32:50 +02:00
Mikael Hugo
ba4bab1034 fix(sf): correct stale .sf milestone paths in prompts + ADR-impl absolute links
prompts/parallel-research-slices.md step 3 told the dispatcher to verify
research at `.sf/{{mid}}/`, but slice research files actually live at
`.sf/milestones/{{mid}}/slices/<sliceId>/<sliceId>-RESEARCH.md`. Step 3
verification could only ever fail.

prompts/validate-milestone.md sent the three milestone-validation reviewer
agents to wrong paths:
- parentTrace pointed at `.sf/{{milestoneId}}/S0X-SUMMARY.md` (slice
  summaries actually live at `.sf/milestones/{{milestoneId}}/slices/S0X/`)
- Reviewer A read `.sf/{{milestoneId}}/REQUIREMENTS.md` (the file is at
  project-level `.sf/REQUIREMENTS.md`)
- Reviewer A scanned `.sf/{{milestoneId}}/` for slice SUMMARYs (wrong dir)
- Reviewer C read `.sf/{{milestoneId}}/CONTEXT.md` (actual file is
  `.sf/milestones/{{milestoneId}}/{{milestoneId}}-CONTEXT.md`)

Reviewers would either return false MISSING / FAIL verdicts or have to
re-discover the layout.
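
The corrected layout can be captured in a few path helpers, so prompts and reviewers derive locations from one place. The function names are hypothetical; the path templates are the ones quoted in this commit message.

```typescript
// Project-level requirements file.
const REQUIREMENTS_PATH = ".sf/REQUIREMENTS.md";

function milestoneDir(milestoneId: string): string {
  return `.sf/milestones/${milestoneId}`;
}

// Directory holding a slice's files, including its summary.
function sliceDir(milestoneId: string, sliceId: string): string {
  return `${milestoneDir(milestoneId)}/slices/${sliceId}`;
}

function sliceResearchPath(milestoneId: string, sliceId: string): string {
  return `${sliceDir(milestoneId, sliceId)}/${sliceId}-RESEARCH.md`;
}

function milestoneContextPath(milestoneId: string): string {
  return `${milestoneDir(milestoneId)}/${milestoneId}-CONTEXT.md`;
}
```

For example, with placeholder IDs `M001` and `S01`, `sliceResearchPath("M001", "S01")` yields `.sf/milestones/M001/slices/S01/S01-RESEARCH.md`.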

docs/dev/ADR-{008,009}-IMPLEMENTATION-PLAN.md "Related ADR" links pointed
to absolute paths inside a contributor's old Mac (`/Users/jeremymcspadden/
Github/sf-2/...`). Replaced with sibling-file relative paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:06:16 +02:00
Mikael Hugo
21113e18a9 fix: update remaining stale repo and scope refs to singularity-forge
After fixing forensics.md and error-classifier.ts last fire, swept the
rest of the tree for the same class of stale reference:

- scripts/validate-pack.js: criticalPackages list used `@sf` and
  `@sf-build` scopes, neither of which exists in node_modules; this is in CI
  (.github/workflows/ci.yml) + prepublishOnly, so the validation step
  was failing to find anything. Now `@singularity-forge/pi-coding-agent`
  and `@singularity-forge/rpc-client` (the actual scope).
- src/resources/skills/github-workflows/references/gh/SKILL.md: same
  GraphQL bug as forensics.md (owner:"sf-build" name:"sf-2") and
  three `gh project` commands using owner sf-build. The gh issue
  create command above already used singularity-forge/sf-run, so the
  follow-up calls always failed. Also retitled "sf-2 Backlog" to
  "sf-run Backlog".
- src/resources/extensions/sf/bootstrap/system-context.ts: deprecation
  warning linked to https://github.com/sf-build/SF/issues/1492.
- packages/mcp-server/README.md, packages/rpc-client/README.md: 9 refs
  to `@sf-build/...` for installable package names, which would mislead
  anyone copy-pasting into npm install.
- docs/user-docs/troubleshooting.md (+ zh-CN): GitHub Issues link
  pointed at github.com/sf-build/SF/issues.
- docs/user-docs/getting-started.md (+ zh-CN): clone URL was correct
  but the next `cd` was `cd sf-2/docker`, which won't exist after a
  fresh clone of sf-run.
- docs/dev/ci-cd-pipeline.md: GHCR org was `sf-build`.

Code comments containing "sf-2" / "sf-build" in non-active places
(parsers.ts banner, error message URLs in tests, dev-doc absolute
paths from a contributor's Mac) left alone — they're informational
and not addressed by users or runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:01:55 +02:00
Mikael Hugo
61485c5bef fix(sf): remove legacy completion tool aliases 2026-05-02 17:51:38 +02:00
Mikael Hugo
85a0188fe1 fix(sf): stabilize auto notices and package checks 2026-05-02 12:39:27 +02:00