docs(sf): surface SF_LLM_GATEWAY_* env vars in PREFERENCES template

These are runtime-only settings (not YAML keys), and the previous template mentioned only the YAML phase toggles. Operators discovering the embedding/rerank surface had to read the source. This adds a clear table at the bottom of PREFERENCES.md so the env-var contract is documented next to the rest of the skill prefs.

Documents SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL, SF_LLM_GATEWAY_EMBED_MODEL, and SF_LLM_GATEWAY_RERANK_MODEL, including the silent-fallback semantics and the agent_end backfill cadence.

Markdown-only; no recompile needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:00:15 +02:00
commit 16cf479781
parent 8299c7ac2b
1 changed file with 13 additions and 0 deletions
--- a/src/resources/extensions/sf/templates/PREFERENCES.md
+++ b/src/resources/extensions/sf/templates/PREFERENCES.md
@@ -117,3 +117,16 @@ pre_dispatch_hooks: []
 # SF Skill Preferences

 See `~/.sf/agent/extensions/sf/docs/preferences-reference.md` for full field documentation and examples.
+
+## Environment variables (not in YAML — set via shell)
+
+These are runtime-only; SF reads them at startup, never persists them, never logs the key.
+
+| Variable | Purpose | Default |
+|---|---|---|
+| `SF_LLM_GATEWAY_KEY` | Bearer token for the inference-fabric llm-gateway. **When unset, embeddings are disabled** and `getRelevantMemoriesRanked` falls back to static (confidence × hit-count) ranking. | (unset) |
+| `SF_LLM_GATEWAY_URL` | OpenAI-compatible endpoint base, including `/v1`. | `https://llm-gateway.centralcloud.com/v1` |
+| `SF_LLM_GATEWAY_EMBED_MODEL` | Embedding model id served by the gateway. | `qwen/qwen3-embedding-4b` |
+| `SF_LLM_GATEWAY_RERANK_MODEL` | Rerank model id. When unset OR no rerank worker is online, rerank silently degrades and the cosine pass alone ranks results. | (unset) |
+
+Once `SF_LLM_GATEWAY_KEY` is set, the agent_end hook opportunistically backfills embeddings for any memories without vectors (50 per turn, 16 per batch). `/sf memory search "<query>"` lights up the embedding-ranked path; without the key it shows static rank.
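
The contract in the table above can be sketched in TypeScript. This is a minimal illustration and not SF's actual implementation: `GatewayConfig`, `resolveGatewayConfig`, `readVar`, and `planBackfill` are hypothetical names. It also assumes that an empty value counts as unset, that rerank needs the key as well as a model id, and that "50 per turn, 16 per batch" means at most 50 memories per turn, sent in batches of up to 16. Whether a rerank worker is online can only be known at request time, so it is not modelled here.

```typescript
// Hypothetical sketch of the env-var contract; none of these names come from SF's source.
type Env = Record<string, string | undefined>;

interface GatewayConfig {
  key: string | undefined;
  url: string;
  embedModel: string;
  rerankModel: string | undefined;
  embeddingsEnabled: boolean; // false => static (confidence x hit-count) ranking
  rerankConfigured: boolean;  // false => the cosine pass alone ranks results
}

// Defaults as listed in the PREFERENCES.md table.
const DEFAULT_URL = "https://llm-gateway.centralcloud.com/v1";
const DEFAULT_EMBED_MODEL = "qwen/qwen3-embedding-4b";

// Assumption: empty or whitespace-only values are treated as unset.
function readVar(env: Env, name: string): string | undefined {
  const value = env[name]?.trim();
  return value ? value : undefined;
}

function resolveGatewayConfig(env: Env): GatewayConfig {
  const key = readVar(env, "SF_LLM_GATEWAY_KEY");
  const rerankModel = readVar(env, "SF_LLM_GATEWAY_RERANK_MODEL");
  return {
    key,
    url: readVar(env, "SF_LLM_GATEWAY_URL") ?? DEFAULT_URL,
    embedModel: readVar(env, "SF_LLM_GATEWAY_EMBED_MODEL") ?? DEFAULT_EMBED_MODEL,
    rerankModel,
    embeddingsEnabled: key !== undefined,
    // Assumption: rerank requires both the key and a rerank model id.
    rerankConfigured: key !== undefined && rerankModel !== undefined,
  };
}

// Assumed reading of the cadence: cap the work per turn, then split it into batches.
function planBackfill(pending: number, perTurn = 50, perBatch = 16): number[] {
  const total = Math.min(Math.max(pending, 0), perTurn);
  const batches: number[] = [];
  for (let done = 0; done < total; done += perBatch) {
    batches.push(Math.min(perBatch, total - done));
  }
  return batches;
}
```

In a Node process, `resolveGatewayConfig(process.env)` would give the effective settings. Under the assumptions above, 120 memories without vectors would yield batches of 16, 16, 16, and 2 in a single turn, with the remainder left for later turns.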