docs(sf): surface SF_LLM_GATEWAY_* env vars in PREFERENCES template

These are runtime-only settings (not YAML keys), and the previous template
mentioned only the YAML phase toggles, so operators discovering the
embedding/rerank surface had to read the source. Add a table at the
bottom of PREFERENCES.md so the env-var contract is documented next to
the rest of the skill prefs.

Documents: SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL,
SF_LLM_GATEWAY_EMBED_MODEL, SF_LLM_GATEWAY_RERANK_MODEL — including the
silent-fallback semantics and the agent_end backfill cadence.

Markdown-only; no recompile needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-02 23:00:15 +02:00
parent 8299c7ac2b
commit 16cf479781

@@ -117,3 +117,16 @@ pre_dispatch_hooks: []
# SF Skill Preferences
See `~/.sf/agent/extensions/sf/docs/preferences-reference.md` for full field documentation and examples.
## Environment variables (not in YAML — set via shell)
These are runtime-only; SF reads them at startup, never persists them, never logs the key.
| Variable | Purpose | Default |
|---|---|---|
| `SF_LLM_GATEWAY_KEY` | Bearer token for the inference-fabric llm-gateway. **When unset, embeddings are disabled** and `getRelevantMemoriesRanked` falls back to static (confidence × hit-count) ranking. | (unset) |
| `SF_LLM_GATEWAY_URL` | OpenAI-compatible endpoint base, including `/v1`. | `https://llm-gateway.centralcloud.com/v1` |
| `SF_LLM_GATEWAY_EMBED_MODEL` | Embedding model id served by the gateway. | `qwen/qwen3-embedding-4b` |
| `SF_LLM_GATEWAY_RERANK_MODEL` | Rerank model id. When unset OR no rerank worker is online, rerank silently degrades and the cosine pass alone ranks results. | (unset) |
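The table's defaults and fallback rules can be illustrated with a minimal Python sketch. This is not SF's actual implementation: `GatewayConfig`, `load_gateway_config`, and the `ranking_mode` labels are hypothetical names chosen for illustration, and treating an empty string as "unset" is an assumption. Only the variable names and default values are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the table; everything else here is illustrative.
DEFAULT_URL = "https://llm-gateway.centralcloud.com/v1"
DEFAULT_EMBED_MODEL = "qwen/qwen3-embedding-4b"


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical container for the four SF_LLM_GATEWAY_* settings."""

    key: Optional[str]
    url: str
    embed_model: str
    rerank_model: Optional[str]

    @property
    def embeddings_enabled(self) -> bool:
        # No key means embeddings are disabled.
        return self.key is not None

    @property
    def ranking_mode(self) -> str:
        # Labels are made up for this sketch. A configured rerank model can
        # still degrade to "cosine" at runtime if no rerank worker is online;
        # that cannot be known from the environment alone.
        if not self.embeddings_enabled:
            return "static"
        return "cosine+rerank" if self.rerank_model else "cosine"


def load_gateway_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Read the variables once, applying the documented defaults."""

    def get(name: str) -> Optional[str]:
        # Assumption: an empty or whitespace-only value counts as unset.
        value = env.get(name, "").strip()
        return value or None

    return GatewayConfig(
        key=get("SF_LLM_GATEWAY_KEY"),
        url=get("SF_LLM_GATEWAY_URL") or DEFAULT_URL,
        embed_model=get("SF_LLM_GATEWAY_EMBED_MODEL") or DEFAULT_EMBED_MODEL,
        rerank_model=get("SF_LLM_GATEWAY_RERANK_MODEL"),
    )


if __name__ == "__main__":
    cfg = load_gateway_config()
    # Print the mode only; the key itself is never printed.
    print(cfg.ranking_mode, cfg.url, cfg.embed_model)
```

Passing a plain dictionary instead of `os.environ` makes the fallback behavior easy to check without touching the real shell environment.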
Once `SF_LLM_GATEWAY_KEY` is set, the agent_end hook opportunistically backfills embeddings for any memories without vectors (50 per turn, 16 per batch). `/sf memory search "<query>"` lights up the embedding-ranked path; without the key it shows static rank.
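The backfill cadence can be read as a per-turn cap that is split into fixed-size request batches. The sketch below assumes that reading (at most 50 memories selected per turn, sent in batches of up to 16); `plan_backfill` and the memory ids are hypothetical, and how SF actually orders or selects memories is not stated here.

```python
from typing import List, Sequence

MAX_PER_TURN = 50  # from the text: 50 per turn
BATCH_SIZE = 16    # from the text: 16 per batch


def plan_backfill(
    pending_ids: Sequence[str],
    max_per_turn: int = MAX_PER_TURN,
    batch_size: int = BATCH_SIZE,
) -> List[List[str]]:
    """Split this turn's share of vectorless memories into embed batches."""
    # Take at most max_per_turn ids; the rest wait for a later turn.
    selected = list(pending_ids[:max_per_turn])
    # Chunk the selection into consecutive batches of at most batch_size.
    return [
        selected[start:start + batch_size]
        for start in range(0, len(selected), batch_size)
    ]


if __name__ == "__main__":
    pending = [f"mem-{i}" for i in range(120)]  # made-up ids
    batches = plan_backfill(pending)
    print([len(batch) for batch in batches])
```

Under this interpretation, a backlog of 120 memories would take three turns to clear, with each full turn issuing four embedding requests of 16, 16, 16, and 2 items.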