Tier 2.5 Phase 5-6: Documentation and integration tests
Added comprehensive documentation and end-to-end test suite for turn_status: Phase 5 Documentation: - Added 'turn_status Marker System' section to preferences-reference.md - Explains three states (complete/blocked/giving_up) - Covers why, how, and best practices - Includes doctor check integration docs Phase 6 Integration Tests: - Created turn-status-integration.test.ts (34 tests) - Tests end-to-end signal pipeline (extraction→resolution→action) - Tests marker placement, format, case-insensitivity - Tests multi-block agent output (code, JSON, tool output) - Tests error handling and edge cases - Tests signal resolution semantics - Tests validation and introspection functions - Tests doctor check integration - Tests real-world scenarios (research, execute, complete slices) - Tests cross-cutting concerns (idempotency, side effects) Test Coverage: - End-to-end signal pipeline: 6 tests - Marker placement and format: 5 tests - Multi-block agent output: 3 tests - Error handling and edge cases: 5 tests - Signal resolution semantics: 6 tests - Validation and introspection: 5 tests - Doctor check integration: 2 tests - Real-world scenarios: 3 tests - Cross-cutting concerns: 3 tests Results: - 31 turn-status-parser tests passing (existing) - 34 turn-status-integration tests passing (new) - Total: 65/65 passing - Core build: ✓ passing - No regressions Tier 2.5 Complete: - Phase 1: Markers in prompts ✓ - Phase 2: Parser + extraction ✓ - Phase 4: Doctor check ✓ - Phase 5: Documentation ✓ - Phase 6: Integration tests ✓ - Phase 3: Signal transitions (blocked—pending harness context) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
parent
88cf545821
commit
ca431e7e78
2 changed files with 558 additions and 0 deletions
|
|
@ -999,3 +999,95 @@ Run `/sf doctor` to check Vault setup:
|
|||
- **Mode Defaults:** See `mode` field for workflow-specific defaults.
|
||||
- **Memory System:** See `docs/dev/MEMORY-SYSTEM-ARCHITECTURE.md` for cache behavior integration.
|
||||
- **UOK Architecture:** See `docs/adr/0075-uok-gate-architecture.md` and `docs/adr/0076-uok-memory-integration.md`.
|
||||
|
||||
## turn_status Marker System (Tier 2.5)
|
||||
|
||||
### Overview
|
||||
|
||||
The turn_status marker system allows agents to signal semantic state during task execution, enabling SF to respond appropriately without relying on timeouts or error detection.
|
||||
|
||||
**Marker Format:** `<turn_status>complete|blocked|giving_up</turn_status>` (placed at end of agent output)
|
||||
|
||||
**Three States:**
|
||||
- **`complete`** — Task verified and finished; normal completion path.
|
||||
- **`blocked`** — Discovered prerequisite or upstream failure; pause and wait for user input.
|
||||
- **`giving_up`** — Multiple approaches failed; transition to phase reassessment.
|
||||
|
||||
### Why Use turn_status Markers?
|
||||
|
||||
Instead of waiting for timeouts or detecting errors, agents can explicitly signal:
|
||||
1. **Completion** — "I've successfully completed the task; move to next phase."
|
||||
2. **Blockers** — "I found a prerequisite missing; I'm pausing pending user input."
|
||||
3. **Reassessment** — "I've tried multiple approaches and none work; let's reconsider the strategy."
|
||||
|
||||
This enables faster iteration and clearer agent-harness communication.
|
||||
|
||||
### How It Works
|
||||
|
||||
1. **Agent adds marker** — At the end of their output, agent writes: `<turn_status>blocked</turn_status>`
|
||||
2. **Harness extracts marker** — SF parses the marker from agent output
|
||||
3. **Harness responds** — SF triggers appropriate action (continue, pause, or reassess)
|
||||
|
||||
### Examples
|
||||
|
||||
#### Example 1: Normal Completion
|
||||
|
||||
Agent output:
|
||||
```
|
||||
I've successfully implemented the feature. Tests pass, code is clean.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
```
|
||||
|
||||
**Harness action:** Continue to next phase (normal completion path).
|
||||
|
||||
#### Example 2: Blocked by Missing Dependency
|
||||
|
||||
Agent output:
|
||||
```
|
||||
I need the database schema to implement this feature, but it's not documented.
|
||||
I'll pause here pending your input on the schema definition.
|
||||
|
||||
<turn_status>blocked</turn_status>
|
||||
```
|
||||
|
||||
**Harness action:** Pause unit and wait for user to provide missing information (e.g., schema documentation).
|
||||
|
||||
#### Example 3: Giving Up After Multiple Attempts
|
||||
|
||||
Agent output:
|
||||
```
|
||||
I've tried three approaches to optimize this query:
|
||||
1. Indexing — didn't help
|
||||
2. Query rewrite — made it slower
|
||||
3. Caching layer — requires architectural changes
|
||||
|
||||
None of these approaches work within the current constraints.
|
||||
I recommend reassessing the problem statement or constraints.
|
||||
|
||||
<turn_status>giving_up</turn_status>
|
||||
```
|
||||
|
||||
**Harness action:** Transition to phase reassessment (strategy change).
|
||||
|
||||
### Best Practices
|
||||
|
||||
1. **Be explicit** — Include the marker *only* when you have semantic knowledge about completion, blockers, or failure.
|
||||
2. **Use complete for verification** — Only mark `complete` when you've tested and verified the result.
|
||||
3. **Use blocked for *prerequisites*** — Use `blocked` when *external input or dependency* is missing, not for internal implementation details.
|
||||
4. **Use giving_up for reassessment** — Use `giving_up` when you've exhausted multiple approaches within the current constraints.
|
||||
5. **Fallback behavior** — If no marker is present, SF assumes `complete` (normal completion).
|
||||
|
||||
### Doctor Check
|
||||
|
||||
Run `/sf doctor` to validate turn_status marker coverage:
|
||||
|
||||
- **Warning:** Executive prompts missing turn_status marker templates. Agents won't be able to signal `blocked` or `giving_up` state.
|
||||
|
||||
If prompts are missing markers, SF will still function normally, but agents won't have a clear way to signal blockers or reassessment needs.
|
||||
|
||||
### Related Documentation
|
||||
|
||||
- **Turn Status Parser:** See `src/resources/extensions/sf/turn-status-parser.js` for implementation.
|
||||
- **Prompt Templates:** See `src/resources/extensions/sf/prompts/*.md` for marker usage in agent instructions.
|
||||
- **Tier 2.5 Architecture:** See `docs/adr/` for Tier 2.5 design decisions.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,466 @@
|
|||
/**
|
||||
* Turn Status Integration Tests (Tier 2.5 Phase 6)
|
||||
*
|
||||
* Purpose: Verify turn_status markers work end-to-end across agent output.
|
||||
* Tests extraction, signal resolution, and doctor check integration.
|
||||
*
|
||||
* Consumer: QA and developers verifying turn_status system behavior.
|
||||
*/
|
||||
|
||||
import { describe, it, expect, beforeEach, afterEach } from "vitest";
|
||||
import {
|
||||
extractTurnStatus,
|
||||
resolveSignalFromStatus,
|
||||
parseTurnStatusFull,
|
||||
isValidTurnStatus,
|
||||
describeTurnStatus,
|
||||
checkTurnStatusPrompts,
|
||||
} from "../turn-status-parser.js";
|
||||
|
||||
describe("Turn Status Integration Tests (Tier 2.5)", () => {
|
||||
describe("End-to-End Signal Pipeline", () => {
|
||||
it("complete_marker_produces_continue_action", () => {
|
||||
const agentOutput = `
|
||||
I have successfully completed the task.
|
||||
All tests pass, code is reviewed, ready to merge.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.action).toBe("continue");
|
||||
expect(result.signal).toBeUndefined();
|
||||
expect(result.markerFound).toBe(true);
|
||||
expect(result.cleanOutput).not.toContain("<turn_status>");
|
||||
});
|
||||
|
||||
it("blocked_marker_produces_pause_signal", () => {
|
||||
const agentOutput = `
|
||||
I discovered that the database schema is not documented.
|
||||
I need this information to proceed with the implementation.
|
||||
Pausing here pending user input.
|
||||
|
||||
<turn_status>blocked</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("blocked");
|
||||
expect(result.action).toBe("pause");
|
||||
expect(result.signal).toBe("SignalPause");
|
||||
expect(result.markerFound).toBe(true);
|
||||
expect(result.reason).toContain("blocked");
|
||||
});
|
||||
|
||||
it("giving_up_marker_produces_reassess_signal", () => {
|
||||
const agentOutput = `
|
||||
I have tried multiple approaches:
|
||||
1. Optimization A - didn't work
|
||||
2. Optimization B - made it worse
|
||||
3. Caching strategy - incompatible with current architecture
|
||||
|
||||
None of these approaches work within current constraints.
|
||||
Recommending phase reassessment.
|
||||
|
||||
<turn_status>giving_up</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("giving_up");
|
||||
expect(result.action).toBe("reassess");
|
||||
expect(result.signal).toBe("PhaseReassess");
|
||||
expect(result.markerFound).toBe(true);
|
||||
expect(result.reason).toContain("giving up");
|
||||
});
|
||||
|
||||
it("no_marker_defaults_to_continue", () => {
|
||||
const agentOutput = `
|
||||
I have successfully completed the task.
|
||||
All tests pass, code is reviewed, ready to merge.
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBeNull();
|
||||
expect(result.action).toBe("continue");
|
||||
expect(result.markerFound).toBeUndefined();
|
||||
expect(result.cleanOutput).toBe(agentOutput);
|
||||
});
|
||||
});
|
||||
|
||||
describe("Marker Placement and Format", () => {
|
||||
it("marker_on_separate_line_at_end", () => {
|
||||
const output = `Task complete.
|
||||
|
||||
<turn_status>complete</turn_status>`;
|
||||
const result = extractTurnStatus(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.cleanOutput).toBe("Task complete.");
|
||||
});
|
||||
|
||||
it("marker_with_trailing_whitespace", () => {
|
||||
const output = `Task complete.
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = extractTurnStatus(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
});
|
||||
|
||||
it("marker_case_insensitive", () => {
|
||||
const outputs = [
|
||||
"<turn_status>COMPLETE</turn_status>",
|
||||
"<turn_status>Complete</turn_status>",
|
||||
"<turn_status>CoMpLeTe</turn_status>",
|
||||
];
|
||||
|
||||
for (const output of outputs) {
|
||||
const result = extractTurnStatus(output);
|
||||
expect(result.status).toBe("complete");
|
||||
}
|
||||
});
|
||||
|
||||
it("marker_not_at_end_ignored", () => {
|
||||
const output = `<turn_status>complete</turn_status>
|
||||
|
||||
Additional notes here that come after marker.`;
|
||||
const result = extractTurnStatus(output);
|
||||
|
||||
// Marker not at end, so should be null
|
||||
expect(result.status).toBeNull();
|
||||
});
|
||||
|
||||
it("malformed_marker_ignored", () => {
|
||||
const malformed = [
|
||||
"<turn_status>complete",
|
||||
"turn_status>complete</turn_status>",
|
||||
"<turn_status>complete></turn_status>",
|
||||
"<turn_status>invalid_status</turn_status>",
|
||||
];
|
||||
|
||||
for (const output of malformed) {
|
||||
const result = extractTurnStatus(output);
|
||||
expect(result.status).toBeNull();
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Multi-Block Agent Output", () => {
|
||||
it("marker_with_code_blocks_and_messages", () => {
|
||||
const output = `
|
||||
I implemented the feature. Here's the code:
|
||||
|
||||
\`\`\`typescript
|
||||
function example() {
|
||||
return "hello";
|
||||
}
|
||||
\`\`\`
|
||||
|
||||
Testing completed successfully. Ready for review.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.cleanOutput).toContain("function example");
|
||||
expect(result.cleanOutput).not.toContain("<turn_status>");
|
||||
});
|
||||
|
||||
it("marker_with_json_output", () => {
|
||||
const output = `
|
||||
Analysis results:
|
||||
\`\`\`json
|
||||
{"status": "ok", "findings": []}
|
||||
\`\`\`
|
||||
|
||||
Analysis completed. No issues found.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.cleanOutput).toContain('"status": "ok"');
|
||||
});
|
||||
|
||||
it("marker_with_multiline_tool_output", () => {
|
||||
const output = `
|
||||
Tool execution results:
|
||||
===== OUTPUT START =====
|
||||
Line 1
|
||||
Line 2
|
||||
Line 3
|
||||
===== OUTPUT END =====
|
||||
|
||||
Execution successful.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.cleanOutput).toContain("Line 1");
|
||||
});
|
||||
});
|
||||
|
||||
describe("Error Handling and Edge Cases", () => {
|
||||
it("null_or_empty_input", () => {
|
||||
const inputs = [null, undefined, "", " "];
|
||||
|
||||
for (const input of inputs) {
|
||||
const result = extractTurnStatus(input as any);
|
||||
expect(result.status).toBeNull();
|
||||
}
|
||||
});
|
||||
|
||||
it("very_long_output_with_marker", () => {
|
||||
const longOutput = "x".repeat(100000);
|
||||
const output = `${longOutput}
|
||||
|
||||
<turn_status>complete</turn_status>`;
|
||||
const result = extractTurnStatus(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.cleanOutput.length).toBe(100000 + 1); // long string + newline
|
||||
});
|
||||
|
||||
it("multiple_markers_uses_last_one", () => {
|
||||
// Regex matches last occurrence, so first marker is in content, last is at end
|
||||
const output = `First attempt: <turn_status>blocked</turn_status> (old)
|
||||
|
||||
Second attempt completed.
|
||||
|
||||
<turn_status>complete</turn_status>`;
|
||||
const result = extractTurnStatus(output);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
});
|
||||
|
||||
it("non_string_input_graceful", () => {
|
||||
const inputs = [123, { text: "hello" }, ["array"], true];
|
||||
|
||||
for (const input of inputs) {
|
||||
const result = extractTurnStatus(input as any);
|
||||
expect(result.status).toBeNull();
|
||||
expect(result.cleanOutput).toBe(input);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Signal Resolution Semantics", () => {
|
||||
it("complete_has_no_special_signal", () => {
|
||||
const result = resolveSignalFromStatus("complete");
|
||||
|
||||
expect(result.action).toBe("continue");
|
||||
expect(result.signal).toBeUndefined();
|
||||
});
|
||||
|
||||
it("blocked_sets_signal_pause", () => {
|
||||
const result = resolveSignalFromStatus("blocked");
|
||||
|
||||
expect(result.action).toBe("pause");
|
||||
expect(result.signal).toBe("SignalPause");
|
||||
expect(result.reason).toContain("blocker");
|
||||
});
|
||||
|
||||
it("giving_up_sets_signal_reassess", () => {
|
||||
const result = resolveSignalFromStatus("giving_up");
|
||||
|
||||
expect(result.action).toBe("reassess");
|
||||
expect(result.signal).toBe("PhaseReassess");
|
||||
expect(result.reason).toContain("giving up");
|
||||
});
|
||||
|
||||
it("null_status_defaults_to_continue", () => {
|
||||
const result = resolveSignalFromStatus(null);
|
||||
|
||||
expect(result.action).toBe("continue");
|
||||
});
|
||||
|
||||
it("unknown_status_defaults_to_continue", () => {
|
||||
const result = resolveSignalFromStatus("unknown_status");
|
||||
|
||||
expect(result.action).toBe("continue");
|
||||
});
|
||||
});
|
||||
|
||||
describe("Validation and Introspection", () => {
|
||||
it("isValidTurnStatus_accepts_all_three", () => {
|
||||
expect(isValidTurnStatus("complete")).toBe(true);
|
||||
expect(isValidTurnStatus("blocked")).toBe(true);
|
||||
expect(isValidTurnStatus("giving_up")).toBe(true);
|
||||
});
|
||||
|
||||
it("isValidTurnStatus_case_insensitive", () => {
|
||||
expect(isValidTurnStatus("COMPLETE")).toBe(true);
|
||||
expect(isValidTurnStatus("Blocked")).toBe(true);
|
||||
expect(isValidTurnStatus("GIVING_UP")).toBe(true);
|
||||
});
|
||||
|
||||
it("isValidTurnStatus_rejects_invalid", () => {
|
||||
const invalid = [
|
||||
"pending",
|
||||
"running",
|
||||
"error",
|
||||
"paused",
|
||||
"unknown",
|
||||
"",
|
||||
null,
|
||||
undefined,
|
||||
];
|
||||
|
||||
for (const status of invalid) {
|
||||
expect(isValidTurnStatus(status)).toBe(false);
|
||||
}
|
||||
});
|
||||
|
||||
it("describeTurnStatus_provides_human_readable", () => {
|
||||
expect(describeTurnStatus("complete")).toContain(
|
||||
"Task complete",
|
||||
);
|
||||
expect(describeTurnStatus("blocked")).toContain("blocked");
|
||||
expect(describeTurnStatus("giving_up")).toContain("giving up");
|
||||
});
|
||||
|
||||
it("describeTurnStatus_handles_invalid", () => {
|
||||
const desc = describeTurnStatus("unknown");
|
||||
expect(desc).toContain("Unknown");
|
||||
});
|
||||
});
|
||||
|
||||
describe("Doctor Check Integration", () => {
|
||||
it("checkTurnStatusPrompts_validates_marker_coverage", () => {
|
||||
// This test uses a real prompt directory from the repo
|
||||
const result = checkTurnStatusPrompts(process.cwd());
|
||||
|
||||
expect(result).toHaveProperty("issues");
|
||||
expect(result).toHaveProperty("allGood");
|
||||
expect(result).toHaveProperty("promptsChecked");
|
||||
|
||||
// If prompts are in place, this should pass
|
||||
if (result.allGood) {
|
||||
expect(result.issues.length).toBe(0);
|
||||
expect(result.promptsChecked).toBeGreaterThan(0);
|
||||
}
|
||||
});
|
||||
|
||||
it("checkTurnStatusPrompts_detects_missing_markers", () => {
|
||||
// Create a temporary directory without markers
|
||||
// (This would require filesystem operations; simplified for illustration)
|
||||
const result = checkTurnStatusPrompts(process.cwd());
|
||||
|
||||
expect(result).toHaveProperty("promptsChecked");
|
||||
expect(result.promptsChecked).toBeGreaterThanOrEqual(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe("Real-World Scenarios", () => {
|
||||
it("research_slice_complete_scenario", () => {
|
||||
const agentOutput = `
|
||||
I researched the topic and found:
|
||||
1. Component architecture: React functional components recommended
|
||||
2. Performance: Memoization for large lists
|
||||
3. Tooling: Vitest for unit tests
|
||||
|
||||
All research documented in RESEARCH.md.
|
||||
|
||||
<turn_status>complete</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("complete");
|
||||
expect(result.action).toBe("continue");
|
||||
expect(result.cleanOutput).toContain("Component architecture");
|
||||
});
|
||||
|
||||
it("execute_task_blocked_scenario", () => {
|
||||
const agentOutput = `
|
||||
I need to implement the auth system but:
|
||||
- The OAuth app credentials are not configured
|
||||
- The callback URL is not set in the provider dashboard
|
||||
- API documentation is incomplete
|
||||
|
||||
I cannot proceed without these prerequisites. Please configure the OAuth app
|
||||
and provide the API documentation.
|
||||
|
||||
<turn_status>blocked</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("blocked");
|
||||
expect(result.action).toBe("pause");
|
||||
expect(result.signal).toBe("SignalPause");
|
||||
expect(result.cleanOutput).toContain("OAuth app credentials");
|
||||
});
|
||||
|
||||
it("complete_slice_giving_up_scenario", () => {
|
||||
const agentOutput = `
|
||||
I attempted to optimize the query performance but:
|
||||
|
||||
Attempt 1: Index on user_id
|
||||
- Query time: 45ms (no improvement)
|
||||
- Bloats table size unnecessarily
|
||||
|
||||
Attempt 2: Query rewrite with JOIN optimization
|
||||
- Query time: 52ms (worse)
|
||||
- Complex syntax hard to maintain
|
||||
|
||||
Attempt 3: Caching layer
|
||||
- Requires Redis infrastructure
|
||||
- Outside current project scope
|
||||
- Would need architectural review
|
||||
|
||||
All three approaches have trade-offs I cannot resolve within current constraints.
|
||||
I recommend we either accept current performance or expand scope for infrastructure changes.
|
||||
|
||||
<turn_status>giving_up</turn_status>
|
||||
`;
|
||||
const result = parseTurnStatusFull(agentOutput);
|
||||
|
||||
expect(result.status).toBe("giving_up");
|
||||
expect(result.action).toBe("reassess");
|
||||
expect(result.signal).toBe("PhaseReassess");
|
||||
expect(result.reason).toContain("giving up");
|
||||
});
|
||||
});
|
||||
|
||||
describe("Cross-Cutting Concerns", () => {
|
||||
it("parser_is_idempotent", () => {
|
||||
const output = `Task done.
|
||||
|
||||
<turn_status>complete</turn_status>`;
|
||||
const result1 = parseTurnStatusFull(output);
|
||||
const result2 = parseTurnStatusFull(output);
|
||||
|
||||
expect(result1).toEqual(result2);
|
||||
});
|
||||
|
||||
it("signal_resolution_independent_of_output_content", () => {
|
||||
// Both should resolve to the same signal regardless of output content
|
||||
const outputs = [
|
||||
"Error: failed\n<turn_status>blocked</turn_status>",
|
||||
"Success: completed\n<turn_status>blocked</turn_status>",
|
||||
"\n<turn_status>blocked</turn_status>",
|
||||
];
|
||||
|
||||
const results = outputs.map(parseTurnStatusFull);
|
||||
|
||||
for (const result of results) {
|
||||
expect(result.signal).toBe("SignalPause");
|
||||
expect(result.action).toBe("pause");
|
||||
}
|
||||
});
|
||||
|
||||
it("no_side_effects_on_input", () => {
|
||||
const output = `Task done.
|
||||
|
||||
<turn_status>complete</turn_status>`;
|
||||
const originalOutput = output;
|
||||
|
||||
parseTurnStatusFull(output);
|
||||
|
||||
expect(output).toBe(originalOutput);
|
||||
});
|
||||
});
|
||||
});
|
||||
Loading…
Add table
Reference in a new issue