# Native Performance Optimizations — deriveState, JSONL, Paths, Parsing
## Overview
Four native Rust optimizations to eliminate hot-path bottlenecks in GSD's dispatch cycle.
They build on the existing git2 migration and native parser infrastructure.

---
## 1. Native deriveState — Eliminate Frontmatter Re-serialization
### Problem
`state.ts:134-176` — When `nativeBatchParseGsdFiles()` returns parsed files, the JS
side re-serializes frontmatter back into YAML strings so downstream parsers can re-parse
them. This is a round-trip waste: Rust parses → JS re-serializes → JS re-parses.
### Solution
The native batch parser already returns `{ metadata: JSON, body, sections }`.
One option is to stop re-serializing frontmatter to YAML in JS altogether: modify
`cachedLoadFile()` to return the raw body directly, and update downstream parsers to
accept pre-parsed metadata. That would eliminate the entire re-serialization loop
(lines 143-172).

However, the parsers (`parseRoadmap`, `parseSummary`, `parsePlan`, etc.) all expect
raw markdown strings with frontmatter, so changing their signatures would be a massive
refactor. Instead:

**Approach: Make Rust return the original file content alongside parsed data.**

Add a new field `raw_content: String` to `ParsedGsdFile` (surfaced in JS as `rawContent`)
that holds the complete original file content. The JS batch cache stores this directly,
eliminating the re-serialization entirely. Downstream parsers get exactly what
`loadFile()` would return.
### Implementation
- **Rust** (`gsd_parser.rs`): Add `raw_content` field to `ParsedGsdFile`, populate with
the original file content read from disk.
- **TS** (`native-parser-bridge.ts`): Expose `rawContent` in `BatchParsedFile`.
- **TS** (`state.ts`): Replace the 30-line re-serialization loop with
`fileContentCache.set(absPath, f.rawContent)`.
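
Below is a minimal sketch of the `state.ts` side. It assumes the bridge type carries the `{ metadata, body, sections }` fields described above plus the new `rawContent`. The `path` field, the `populateFileContentCache` helper, and the POSIX-style path join are hypothetical illustrations, not the current code.

```typescript
// Sketch only: `path` and the exact field types are assumptions,
// not the real BatchParsedFile definition in native-parser-bridge.ts.
interface BatchParsedFile {
  path: string;       // assumed: path relative to the .gsd/ root
  metadata: string;   // JSON-encoded frontmatter from Rust
  body: string;
  sections: unknown;  // shape not specified in this plan
  rawContent: string; // new: the complete original file content
}

// Hypothetical helper standing in for the re-serialization loop in state.ts.
function populateFileContentCache(
  files: BatchParsedFile[],
  gsdRoot: string,
  fileContentCache: Map<string, string>,
): void {
  for (const f of files) {
    // Assumes "/" separators; real code would use path.join.
    const absPath = `${gsdRoot}/${f.path}`;
    // No JSON.parse of metadata and no YAML rebuilding:
    // cache the exact text that loadFile() would have returned.
    fileContentCache.set(absPath, f.rawContent);
  }
}
```

The cache now holds the on-disk text verbatim, so every parser that reads from `fileContentCache` keeps its current signature and input format.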
### Impact
Eliminates the ~30-line JS string-building loop that runs on every dispatch, and removes
the `JSON.parse` of metadata that was only used to re-serialize it back to YAML.

---
## 2. Native JSONL Streaming Parser
### Problem
`session-forensics.ts:68-78` — Parses JSONL by `split("\n").map(JSON.parse)` with a
10MB cap. Large session files cause OOM or slowness.
### Solution
Add a Rust JSONL parser that streams through the file with constant memory, returning
structured data. Uses `serde_json` for parsing and handles arbitrary file sizes.
### Implementation
- **Rust** (`gsd_parser.rs`): Add a `parse_jsonl_tail(path, max_entries?)` function that:
  1. Memory-maps or streams the file from the tail
  2. Parses each line as JSON
  3. Returns the last N entries as a JSON array string
- **TS** (`native-parser-bridge.ts`): Add bridge function.
- **TS** (`session-forensics.ts`): Use native parser, fall back to JS implementation.
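
A small TypeScript reference can pin down the intended tail semantics, and it could also serve as the JS fallback. This is a sketch, not the planned Rust code. `parseJsonlTail` is a hypothetical name, and skipping malformed lines is an assumption about the desired behavior. The sketch also works on an in-memory string, whereas the Rust version would run the same backwards scan over a memory map or file chunks to keep memory bounded.

```typescript
// Hypothetical reference for parse_jsonl_tail: return the last `maxEntries`
// valid JSON lines, in file order, scanning backwards from the end.
function parseJsonlTail(text: string, maxEntries: number = Infinity): unknown[] {
  const entries: unknown[] = [];
  let end = text.length; // exclusive end of the line being examined
  while (end > 0 && entries.length < maxEntries) {
    // Start of the current line: just past the previous "\n", or 0.
    const start = text.lastIndexOf("\n", end - 1) + 1;
    const line = text.slice(start, end).trim();
    if (line.length > 0) {
      try {
        entries.push(JSON.parse(line));
      } catch {
        // Assumption: skip malformed lines, e.g. a truncated final record after a crash.
      }
    }
    end = start - 1; // step over the "\n" that precedes this line
  }
  return entries.reverse(); // collected newest-first; restore file order
}
```

Scanning backwards means the parsing work is proportional to the last N entries rather than to the whole file, which is the property the native version needs on large session logs.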
### Impact
Handles arbitrary file sizes, with an estimated 3-5x speedup when parsing 10MB files.

---
## 3. Native Directory Tree Index
### Problem
In `paths.ts:20-34`, `cachedReaddirSync()` caches per directory, but the caches are
cleared on every dispatch via `invalidateAllCaches()`. Each `resolveMilestoneFile`,
`resolveSliceFile`, and `resolveTaskFile` call triggers its own directory reads.
### Solution
Add a Rust function that walks the entire `.gsd/` tree once and returns a flat
file listing. The JS side builds a Map from this, making all path resolution O(1)
lookups instead of repeated `readdirSync` + regex matching.
### Implementation
- **Rust** (`gsd_parser.rs`): `batchParseGsdFiles` already walks the tree.
Add `scan_gsd_tree(directory)`, which returns `Vec<{ path, isDir, name }>` for
all entries (not just `.md` files).
- **TS** (`native-parser-bridge.ts`): Add bridge function.
- **TS** (`paths.ts`): Add native tree cache. On first access, call native scan
and build lookup maps. `clearPathCache()` clears the native cache too.
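
One way the `paths.ts` side could turn the flat listing into lookup maps is sketched below. The sketch makes three assumptions. Entry paths are relative to `.gsd/` and use `/` separators. The scan is injected as a callback rather than calling the real bridge. `GsdTreeIndex`, `getTreeIndex`, and `clearNativeTreeCache` are hypothetical names, and `clearNativeTreeCache` stands in for the native part of `clearPathCache()`.

```typescript
// Mirrors the `Vec<{ path, isDir, name }>` returned by scan_gsd_tree (assumed shape).
interface TreeEntry {
  path: string; // assumed: relative to .gsd/, "/"-separated
  isDir: boolean;
  name: string;
}

// Hypothetical index built from a single native scan.
class GsdTreeIndex {
  private readonly children = new Map<string, string[]>(); // parent dir -> child names
  private readonly dirs = new Set<string>();

  constructor(entries: TreeEntry[]) {
    for (const e of entries) {
      const slash = e.path.lastIndexOf("/");
      const parent = slash === -1 ? "" : e.path.slice(0, slash);
      const siblings = this.children.get(parent);
      if (siblings) siblings.push(e.name);
      else this.children.set(parent, [e.name]);
      if (e.isDir) this.dirs.add(e.path);
    }
  }

  // Stand-in for a cachedReaddirSync() call: one Map lookup, no syscall.
  readdir(dir: string): readonly string[] {
    return this.children.get(dir) ?? [];
  }

  isDirectory(path: string): boolean {
    return this.dirs.has(path);
  }
}

let treeIndex: GsdTreeIndex | null = null;

// Scan lazily on first access, then reuse until the cache is cleared.
function getTreeIndex(scan: () => TreeEntry[]): GsdTreeIndex {
  if (treeIndex === null) {
    treeIndex = new GsdTreeIndex(scan());
  }
  return treeIndex;
}

// Hypothetical counterpart to clearing the native cache in clearPathCache().
function clearNativeTreeCache(): void {
  treeIndex = null; // next getTreeIndex() call rescans
}
```

A resolver such as `resolveMilestoneFile` would keep its existing regex matching but run it over `getTreeIndex(...).readdir(dir)`. Each directory lookup then becomes a `Map` hit rather than a `readdirSync` call.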
### Impact
Eliminates an estimated 20-50 `readdirSync` calls per dispatch and turns
`resolveDir`/`resolveFile` into O(1) map lookups.

---
## 4. Expand Native Markdown Parsing
### Problem
`files.ts` parsers (`parsePlan`, `parseSummary`, `parseContinue`) still use JS regex.
Each runs ~10-20 regex patterns per file. Only `parseRoadmap` has a native implementation.
### Solution
Add native Rust implementations for `parsePlan` and `parseSummary` — the two parsers
called most frequently during `deriveState`. `parseContinue` is called infrequently
and can stay in JS.
### Implementation
- **Rust** (`gsd_parser.rs`): Add `parse_plan_file(content)` and `parse_summary_file(content)`.
- **TS** (`native-parser-bridge.ts`): Add bridge functions with JS fallback.
- **TS** (`files.ts`): Call native versions first, fall back to JS.
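
The native-first call with JS fallback fits in a small wrapper. The following is a hypothetical sketch. `withJsFallback` is not an existing helper, and `nativeFn` is assumed to be `null` when the native addon failed to load. Falling back on any thrown error, not only on a load failure, is a design assumption.

```typescript
// Hypothetical wrapper: prefer the native parser, fall back to the JS one.
function withJsFallback<T>(
  nativeFn: ((content: string) => T) | null,
  jsFn: (content: string) => T,
): (content: string) => T {
  if (nativeFn === null) {
    return jsFn; // addon not loaded: use the JS parser for every call
  }
  const native: (content: string) => T = nativeFn; // narrowed; safe to capture
  return (content: string): T => {
    try {
      return native(content);
    } catch {
      // Per-call fallback, e.g. on input the native parser rejects.
      return jsFn(content);
    }
  };
}
```

In `files.ts`, a call like `const parsePlan = withJsFallback(nativeParsePlan, jsParsePlan)` would leave call sites unchanged. Both argument names are placeholders. Keeping the JS parser wired in also leaves it available as a reference to compare native output against in tests.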
### Impact
An estimated 3-5x faster parse per file. With ~20 files per `deriveState` call, this
saves roughly 20-40ms.

---
## Implementation Order
1. **deriveState raw content** (smallest change, biggest immediate impact)
2. **Directory tree index** (eliminates readdirSync overhead)
3. **JSONL streaming parser** (helps crash recovery path)
4. **Plan/Summary native parsers** (improves parsing throughput)
## Files Modified
### Rust
- `native/crates/engine/src/gsd_parser.rs`: new functions and the `raw_content` field
### TypeScript
- `src/resources/extensions/gsd/native-parser-bridge.ts` — new bridge functions
- `src/resources/extensions/gsd/state.ts` — simplified batch cache
- `src/resources/extensions/gsd/paths.ts` — native tree cache
- `src/resources/extensions/gsd/session-forensics.ts` — native JSONL
- `src/resources/extensions/gsd/files.ts` — native plan/summary parsers