Monitor and Poll

Check the status of an SF project, handle blockers, track costs, and decide next actions.

Checking Project State

The query command is your primary monitoring tool. It returns in about 50 ms, makes no LLM calls (so it costs nothing), and returns the full project snapshot.

cd /path/to/project
sf headless query

Key fields to inspect

# Overall status
sf headless query | jq '{
  phase: .state.phase,
  milestone: .state.activeMilestone.id,
  slice: .state.activeSlice.id,
  task: .state.activeTask.id,
  progress: .state.progress,
  cost: .cost.total
}'

# What should happen next
sf headless query | jq '.next'
# Returns: { "action": "dispatch", "unitType": "execute-task", "unitId": "M001/S01/T01" }

# Is it done?
sf headless query | jq '.state.phase'
# "complete" = done, "blocked" = needs you, anything else = in progress

Phase meanings

| Phase | Meaning | Your action |
| --- | --- | --- |
| pre-planning | Milestone exists, no slices planned yet | Run auto or next |
| needs-discussion | Ambiguities need resolution | Supply answers or run with defaults |
| discussing | Discussion in progress | Wait |
| researching | Codebase/library research | Wait |
| planning | Creating task plans | Wait |
| executing | Writing code | Wait |
| verifying | Checking must-haves | Wait |
| summarizing | Recording what happened | Wait |
| advancing | Moving to next task/slice | Wait |
| evaluating-gates | Quality checks before execution | Wait or run next |
| validating-milestone | Final milestone checks | Wait |
| completing-milestone | Archiving and cleanup | Wait |
| complete | Done | Verify deliverables |
| blocked | Needs human input | Handle blocker (see below) |
| paused | Explicitly paused | Resume with auto |
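
For scripted monitoring, the phase table can be collapsed into a small lookup. The following is a minimal sketch: the function name phase_action and the action labels it prints (run-auto, answer, wait, verify, unblock, unknown) are hypothetical and not part of SF; only the phase names come from the table.

```shell
# Hypothetical helper: map a phase name (as read from .state.phase)
# to a coarse action label for the orchestrator. The labels are made up
# for this sketch; the phase names are the ones listed in the table.
phase_action() {
  case "$1" in
    pre-planning|paused)  echo "run-auto" ;;   # start or resume with auto
    needs-discussion)     echo "answer" ;;     # supply answers or defaults
    blocked)              echo "unblock" ;;    # needs human input
    complete)             echo "verify" ;;     # check deliverables
    discussing|researching|planning|executing|verifying|summarizing|advancing|evaluating-gates|validating-milestone|completing-milestone)
                          echo "wait" ;;       # SF is working
    *)                    echo "unknown" ;;    # phase not in the table
  esac
}

# Usage sketch (requires sf and jq):
#   phase_action "$(sf headless query | jq -r '.state.phase')"
```

Falling through to "unknown" rather than "wait" keeps an unrecognized phase visible instead of silently treating it as progress.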

Handling Blockers

When the exit code is 10 or the phase is blocked:

# 1. Understand the blocker
sf headless query | jq '{phase: .state.phase, blockers: .state.blockers, nextAction: .state.nextAction}'

# 2. Option A: Steer around it
sf headless steer "Skip the database dependency, use in-memory storage instead"

# 3. Option B: Supply pre-built answers
cat > fix.json << 'EOF'
{
  "questions": { "blocked_question_id": "workaround_option" },
  "defaults": { "strategy": "first_option" }
}
EOF
sf headless --answers fix.json auto

# 4. Option C: Force a specific phase
sf headless dispatch replan

# 5. Option D: Escalate to user
echo "SF build blocked. Phase: $(sf headless query | jq -r '.state.phase')"
echo "Manual intervention required."
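
To route between these options automatically, an orchestrator can branch on the exit code first. A minimal sketch, assuming exit code 0 means the step succeeded (the usual shell convention, not stated in this document); exit code 10 for a blocker is taken from the text above, and classify_exit is a hypothetical name.

```shell
# Hypothetical helper: turn an sf exit code into a status word.
# 10 = blocked comes from this document; 0 = ok is an assumption based on
# shell convention; every other code is treated as a generic error.
classify_exit() {
  case "$1" in
    0)  echo "ok" ;;
    10) echo "blocked" ;;
    *)  echo "error" ;;
  esac
}

# Usage sketch (requires sf):
#   sf headless --output-format json next >result.json 2>/dev/null
#   status=$(classify_exit "$?")
#   [ "$status" = "blocked" ] && sf headless query | jq '.state.blockers'
```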

Cost Tracking

# Current cumulative cost
sf headless query | jq '.cost.total'

# Per-worker breakdown
sf headless query | jq '.cost.workers'

# After a step (from HeadlessJsonResult)
RESULT=$(sf headless --output-format json next 2>/dev/null)
echo "$RESULT" | jq '.cost'

Budget enforcement pattern

MAX_BUDGET=15.00

check_budget() {
  TOTAL=$(sf headless query | jq -r '.cost.total')
  OVER=$(echo "$TOTAL > $MAX_BUDGET" | bc -l)
  if [ "$OVER" = "1" ]; then
    echo "Budget exceeded: \$$TOTAL > \$$MAX_BUDGET"
    sf headless stop
    return 1
  fi
  return 0
}
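
The comparison in check_budget depends on bc, which is not installed on every minimal image. As a sketch of an alternative, the same floating-point comparison can be done with awk; over_budget is a hypothetical name, and non-numeric input such as "null" is treated as 0.

```shell
# Hypothetical helper: succeed (exit 0) when total is strictly greater
# than max. awk does the floating-point comparison; adding 0 forces a
# numeric interpretation, so a non-numeric total such as "null" counts as 0.
over_budget() {
  awk -v total="$1" -v max="$2" \
    'BEGIN { if ((total + 0) > (max + 0)) exit 0; exit 1 }'
}

# Usage sketch (requires sf and jq):
#   TOTAL=$(sf headless query | jq -r '.cost.total')
#   if over_budget "$TOTAL" "$MAX_BUDGET"; then sf headless stop; fi
```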

Poll-and-React Loop

For agents that need to periodically check on a build:

cd /path/to/project

poll_project() {
  STATE=$(sf headless query 2>/dev/null)
  if [ -z "$STATE" ]; then
    echo "NO_PROJECT"
    return
  fi

  PHASE=$(echo "$STATE" | jq -r '.state.phase')
  COST=$(echo "$STATE" | jq -r '.cost.total')
  PROGRESS=$(echo "$STATE" | jq -r '"\(.state.progress.milestones.done)/\(.state.progress.milestones.total) milestones, \(.state.progress.tasks.done)/\(.state.progress.tasks.total) tasks"')

  case "$PHASE" in
    complete)
      echo "COMPLETE cost=\$$COST progress=$PROGRESS"
      ;;
    blocked)
      BLOCKER=$(echo "$STATE" | jq -r '.state.nextAction // "unknown"')
      echo "BLOCKED reason=$BLOCKER cost=\$$COST"
      ;;
    *)
      NEXT=$(echo "$STATE" | jq -r '.next.action // "none"')
      echo "IN_PROGRESS phase=$PHASE next=$NEXT cost=\$$COST progress=$PROGRESS"
      ;;
  esac
}
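
poll_project reports a single snapshot; something still has to call it repeatedly. The following is a minimal driver sketch. The function name poll_until, its arguments, and its return codes are hypothetical (returning 10 for a blocker simply mirrors the exit code mentioned earlier). It relies only on the status-line prefixes printed by poll_project.

```shell
# poll_until <poll-command> <interval-seconds> <max-polls>
# Calls the poll command until it prints a terminal status line.
# Returns 0 on COMPLETE, 10 on BLOCKED, 1 on NO_PROJECT, and 2 if the
# poll limit is reached first. All names and codes are illustrative.
poll_until() {
  poll_cmd=$1
  interval=$2
  max_polls=$3
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    line=$("$poll_cmd")
    echo "$line"
    case "$line" in
      COMPLETE*)   return 0 ;;
      BLOCKED*)    return 10 ;;
      NO_PROJECT*) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 2
}

# Usage sketch: check every 60 seconds, give up after 120 polls.
#   poll_until poll_project 60 120
```

Capping the number of polls keeps an agent from waiting forever on a build that never reaches a terminal phase.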

Resuming Work

If a build was interrupted or you need to continue:

cd /path/to/project

# Check current state
sf headless query | jq '.state.phase'

# Resume from where it left off
sf headless --output-format json auto 2>/dev/null

# Or resume a specific session
sf headless --resume "$SESSION_ID" --output-format json auto 2>/dev/null
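
The two resume commands can be wrapped so callers do not have to branch themselves. A minimal sketch, assuming SESSION_ID is either unset or holds a session ID obtained elsewhere; resume_build is a hypothetical name, and both sf invocations are the ones shown above.

```shell
# Hypothetical wrapper: resume a specific session when SESSION_ID is set,
# otherwise resume from wherever the project left off.
resume_build() {
  if [ -n "${SESSION_ID:-}" ]; then
    sf headless --resume "$SESSION_ID" --output-format json auto 2>/dev/null
  else
    sf headless --output-format json auto 2>/dev/null
  fi
}
```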

Supervised Mode

Use --supervised when you want SF to ask you questions interactively rather than auto-answering or blocking. SF writes UI requests to stdout as JSONL; you respond via stdin.

When to use it: You're the orchestrator running in a loop and want to intercept SF's questions yourself instead of pre-supplying an answers file.

# Launch in supervised mode: SF writes extension_ui_request events to stdout
# and waits for your response on stdin before continuing.
# A plain pipe is one-way, so run SF as a bash coprocess (bash 4+) to get
# handles on both its stdout and its stdin.
coproc SF { sf headless --supervised --json auto 2>/dev/null; }
exec 3<&"${SF[0]}" 4>&"${SF[1]}"   # keep the handles valid after SF exits

while IFS= read -r line <&3; do
  TYPE=$(echo "$line" | jq -r '.type' 2>/dev/null)

  if [ "$TYPE" = "extension_ui_request" ]; then
    # SF is asking a question: inspect it and respond
    TITLE=$(echo "$line" | jq -r '.title // .message // "?"')
    OPTIONS=$(echo "$line" | jq -r '.options[]?.label // empty' | head -5)
    echo "SF asks: $TITLE" >&2
    echo "Options: $OPTIONS" >&2

    # Send your answer back on SF's stdin (the option label or value)
    echo "first_option" >&4   # replace with your selection logic
  fi
done
exec 3<&- 4>&-   # close the handles once SF's output ends

--response-timeout N sets how long SF waits for your response, in milliseconds (default 30000). If you don't respond in time, SF blocks with exit code 10.

Simpler alternative: If you just want to pre-answer known questions without interactive handling, use --answers <file> instead; see references/answer-injection.md.

Crash Recovery

When SF exits unexpectedly (crash, OOM, signal) or .sf/ state looks corrupted:

cd /path/to/project

# 1. Check if the project directory is intact
ls .sf/ 2>/dev/null || { echo "No .sf/ — project state lost, start fresh"; exit 1; }

# 2. Run doctor — detects and auto-fixes common state corruption
sf headless doctor

# 3. Check what state SF thinks it's in
sf headless query | jq '{phase: .state.phase, next: .next}'

# 4. If query fails (state unreadable), inspect STATE.md directly
cat .sf/STATE.md 2>/dev/null

# 5. Resume from current state
sf headless --output-format json auto 2>/dev/null

# 6. If a specific session was interrupted, resume by session ID
sf headless --resume "$SESSION_ID" --output-format json auto 2>/dev/null

Common crash scenarios:

| Symptom | Cause | Fix |
| --- | --- | --- |
| query returns empty / parse error | .sf/STATE.md corrupted | Run sf headless doctor |
| Phase stuck at advancing | Slice summary write interrupted | Run sf headless next to retry |
| Phase stuck at completing-milestone | Milestone archive write interrupted | Run sf headless dispatch complete |
| Zombie .sf/ lock file | Previous process killed mid-write | Run sf headless doctor |
| exit 1 with no JSON output | SF itself crashed (OOM, signal) | Check system logs; resume with --resume |

If doctor can't recover the state, the safest path is to read .sf/milestones/*/ROADMAP.md to see what completed, then start a new milestone for remaining work.
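
The first steps of the recovery sequence can be folded into one check that tells the caller which path to take. This is a sketch only: check_recovery, its status words (NO_STATE, READABLE, UNREADABLE), and its return codes are hypothetical. It uses just the doctor and query commands shown above and treats empty query output as unreadable state.

```shell
# Hypothetical helper: run doctor, then report whether project state is
# readable. Prints NO_STATE (return 1) when .sf/ is missing, READABLE
# (return 0) when query produces output, and UNREADABLE (return 2) otherwise.
check_recovery() {
  project_dir=${1:-.}
  [ -d "$project_dir/.sf" ] || { echo "NO_STATE"; return 1; }
  (
    cd "$project_dir" || exit 1
    # Best effort: doctor's own result is judged by the query that follows.
    sf headless doctor >/dev/null 2>&1
    state=$(sf headless query 2>/dev/null)
    if [ -n "$state" ]; then
      echo "READABLE"
    else
      echo "UNREADABLE"
      exit 2
    fi
  )
}

# Usage sketch:
#   case "$(check_recovery /path/to/project)" in
#     READABLE)   sf headless --output-format json auto 2>/dev/null ;;
#     UNREADABLE) cat .sf/STATE.md ;;
#   esac
```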

Reading Build Artifacts

After completion, inspect what SF produced:

cd /path/to/project

# Project summary
cat .sf/PROJECT.md

# What was decided
cat .sf/DECISIONS.md

# Requirements and their validation status
cat .sf/REQUIREMENTS.md

# Milestone summary
cat .sf/milestones/M001-*/M001-*-SUMMARY.md 2>/dev/null

# Git history (SF commits per-slice)
git log --oneline
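
Before reading the artifacts, it can help to confirm which of them exist, since cat fails noisily on a missing file. A minimal sketch; list_artifacts is a hypothetical name, and the three file names are the top-level ones listed above.

```shell
# Hypothetical helper: report which top-level artifacts exist under .sf/.
# Takes the .sf directory as an optional argument (defaults to ./.sf).
list_artifacts() {
  sf_dir=${1:-.sf}
  for name in PROJECT.md DECISIONS.md REQUIREMENTS.md; do
    if [ -f "$sf_dir/$name" ]; then
      echo "present $name"
    else
      echo "missing $name"
    fi
  done
}
```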