- Add vault-credential-resolver.js: Async credential resolution with vault:// URI support
- Integration with vault-resolver.js (low-level Vault client)
- Update doctor-providers.js to detect and report vault URIs
- Synchronous doctor checks (no network I/O) with lazy async resolution
- Fail-open semantics: vault unavailable -> fall back to plaintext
- 28 tests for credential resolver (all passing)
- ADR-0078: Architecture and auth chain documentation

Features:
- vault://secret/path/to/secret#fieldname URI format
- Auth chain: VAULT_TOKEN -> ~/.vault-token -> AppRole (reserved)
- Helper functions: couldBeVaultUri, hasProviderCredentialEnvVar, resolveProviderCredential, getCredentialValue, formatCredentialInfo
- Full backward compatibility with plaintext keys and auth.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| id | title | status | date |
|---|---|---|---|
| 0078 | Vault Credential Resolution for Provider Keys | accepted | 2026-05-07 |
ADR-0078: Vault Credential Resolution for Provider Keys
Problem
SF v3.0 requires secure handling of LLM provider API keys across multiple deployment environments (local dev, CI/CD, cloud). Currently, API keys are stored as plaintext in:
- Environment variables (`.env`, shell, CI secrets)
- Auth storage files (`auth.json`)
This approach has security and operational risks:
- Secret sprawl: Keys duplicated across many environment configs
- Audit gap: No audit trail of which systems accessed which secrets
- Rotation friction: Manual key updates across multiple systems
- Principle of Least Privilege violation: All agents have access to all keys
Decision
Implement Vault credential resolution that:
- Allows provider keys to reference HashiCorp Vault URIs instead of plaintext
- Maintains backward compatibility with plaintext keys and auth.json
- Uses fail-open semantics: if Vault unavailable, falls back to plaintext
- Supports async resolution at runtime (no blocking on startup)
- Keeps doctor checks synchronous (fast health check without HTTP calls)
URI Format
vault://secret/path/to/secret#fieldname
Examples:
ANTHROPIC_API_KEY=vault://secret/anthropic/prod#api_key
OPENAI_API_KEY=vault://secret/openai/prod#api_key
GROQ_API_KEY=vault://secret/groq/prod#api_key
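To make the URI format concrete, here is a minimal parsing sketch. The function name `parseVaultUriSketch` and the returned `{ mount, path, field }` shape are assumptions for illustration; the real `parseVaultUri()` in vault-resolver.js may behave differently, and treating the `#fieldname` part as required is also an assumption.

```javascript
// Hypothetical parser for illustration only; not the vault-resolver.js implementation.
function parseVaultUriSketch(uri) {
  const prefix = "vault://";
  if (typeof uri !== "string" || !uri.startsWith(prefix)) return null;

  const rest = uri.slice(prefix.length);
  const hashIndex = rest.indexOf("#");
  if (hashIndex === -1) return null; // assumption: the field name is required

  const fullPath = rest.slice(0, hashIndex);
  const field = rest.slice(hashIndex + 1);
  const slashIndex = fullPath.indexOf("/");
  if (slashIndex <= 0 || field === "") return null;

  // The first path segment is treated as the mount point, the rest as the secret path.
  const mount = fullPath.slice(0, slashIndex);
  const path = fullPath.slice(slashIndex + 1);
  if (path === "") return null;

  return { mount, path, field };
}
```

Under these assumptions, `vault://secret/anthropic/prod#api_key` splits into mount `secret`, path `anthropic/prod`, and field `api_key`, while a plaintext key yields `null`.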
Authentication Chain
In order of preference:
- `VAULT_ADDR` and `VAULT_TOKEN` environment variables
- `~/.vault-token` file (standard Vault client behavior)
- AppRole (`VAULT_ROLE_ID` + `VAULT_SECRET_ID`): reserved for future use
- Fail open: if no auth method is available, return the URI unresolved, as plaintext
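The lookup order above could be sketched roughly as follows. `resolveVaultTokenSketch` and its `{ token, source }` return shape are hypothetical, AppRole is left out because it is reserved for future use, and `VAULT_ADDR` handling is assumed to live elsewhere.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch of the token lookup order; not the real resolveVaultToken().
function resolveVaultTokenSketch(env = process.env, homeDir = os.homedir()) {
  // 1. An explicit token in the environment takes precedence.
  const fromEnv = (env.VAULT_TOKEN || "").trim();
  if (fromEnv) return { token: fromEnv, source: "env" };

  // 2. Fall back to the token file used by the standard Vault client.
  try {
    const tokenFile = path.join(homeDir, ".vault-token");
    const fromFile = fs.readFileSync(tokenFile, "utf8").trim();
    if (fromFile) return { token: fromFile, source: "file" };
  } catch {
    // Missing or unreadable file: fall through to the next step.
  }

  // 3. No auth method available: the caller applies fail-open behavior.
  return null;
}
```

Returning `null` rather than throwing keeps the fail-open decision with the caller, which matches the last step of the chain.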
Resolution Chain for Provider Keys
When SF or pi-ai needs a provider credential:
- Check environment variable (e.g., `ANTHROPIC_API_KEY`)
- If value starts with `vault://`, call async resolver to fetch from Vault
- If Vault unavailable, use URI string as plaintext (fail-open)
- Otherwise, check auth.json
- Return undefined if not found
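The resolution chain above can be illustrated with a small sketch. Everything here is an assumption made for the example: `resolveProviderKeySketch`, the injected `fetchFromVault` function, the `authStore` object standing in for auth.json (keyed by env var name), and the `source` labels.

```javascript
// Hypothetical sketch; fetchFromVault and authStore are injected stand-ins,
// not the real resolver or auth.json reader.
async function resolveProviderKeySketch(envVarName, { env, fetchFromVault, authStore }) {
  const value = env[envVarName];

  if (typeof value === "string" && value !== "") {
    // Plaintext keys are returned untouched.
    if (!value.startsWith("vault://")) {
      return { value, source: "env" };
    }
    try {
      const secret = await fetchFromVault(value);
      if (secret) return { value: secret, source: "vault" };
    } catch {
      // Vault unreachable or unauthenticated: fall through to fail-open.
    }
    // Fail-open: the unresolved URI string is returned as-is.
    return { value, source: "env-unresolved" };
  }

  // No env var set: fall back to stored credentials.
  const stored = authStore[envVarName];
  if (stored) return { value: stored, source: "auth.json" };
  return undefined;
}
```

Injecting the Vault fetcher keeps the sketch testable without a Vault instance and makes the fail-open branch easy to exercise.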
Doctor Checks (Synchronous)
Health checks remain fast by:
- Checking that the env var exists AND is non-empty (it doesn't matter whether it's a URI)
- Reporting "Vault" as the source when the env var contains `vault://`, without resolving it
- Deferring actual resolution until the credentials are used
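A synchronous check along these lines only inspects the string and never contacts Vault. The name `describeEnvCredentialSketch` and its return shape are hypothetical; the real `resolveKey()` in doctor-providers.js may report results differently.

```javascript
// Hypothetical sketch of a synchronous doctor check; no network I/O is performed.
function describeEnvCredentialSketch(envVarName, env = process.env) {
  const value = env[envVarName];
  if (typeof value !== "string" || value.trim() === "") {
    return { found: false, source: null };
  }
  // String inspection only: the vault URI is detected but not resolved.
  const source = value.startsWith("vault://") ? "vault" : "env";
  return { found: true, source };
}
```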
Implementation
New Modules
vault-credential-resolver.js: Provider credential resolution with vault support
- `couldBeVaultUri(value)`: Check if value looks like a vault URI (no network I/O)
- `hasProviderCredentialEnvVar(envVarName)`: Check if env var exists (no network I/O)
- `resolveProviderCredential(envValue)`: Resolve vault URI to actual key (async)
- `resolveProviderCredentials(map)`: Resolve multiple credentials (async)
- `getCredentialValue(result, strictMode)`: Extract/validate resolved value
- `formatCredentialInfo(result, providerId)`: Format for doctor output (masks value)

vault-resolver.js (existing): Low-level vault client
- `parseVaultUri(uri)`: Parse vault:// URIs
- `resolveVaultToken()`: Resolve auth token from env/file/AppRole
- `resolveSecret(uri, opts)`: Fetch secret from Vault with fail-open
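Since doctor output masks credential values, a masking helper might look like the sketch below. `maskSecretSketch` and the choice to show only the last four characters are assumptions; the actual masking rule in `formatCredentialInfo()` is not specified here.

```javascript
// Hypothetical masking helper; the real formatCredentialInfo() may mask differently.
function maskSecretSketch(value, visible = 4) {
  if (typeof value !== "string" || value === "") return "(empty)";
  // Short values are fully hidden so the mask never reveals most of the secret.
  if (value.length <= visible * 2) return "****";
  return "****" + value.slice(-visible);
}
```

A fixed-width prefix avoids leaking the length of the secret in logs.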
Integration Points
- doctor-providers.js: Updated to detect vault URIs
  - `resolveKey()` now checks `couldBeVaultUri()` for vault:// URIs
  - Reports "vault" as source for vault URIs (no blocking)
- pi-ai `getEnvApiKey()`: No changes needed initially
  - Returns vault:// URI as-is (callers must resolve async if needed)
  - Future: add async variant `getEnvApiKeyAsync()` for direct vault support
- pi-coding-agent resolve-config-value.ts: Already supports vault URIs
  - `resolveConfigValueAsync()` handles vault:// URIs
  - Used when pi-ai actually makes API calls
- SF agent setup: Can initialize credential cache
  - Pre-resolve commonly-used credentials at startup
  - Cache with TTL (default 5 minutes, configurable)
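A credential cache with a TTL could be as small as the sketch below. `CredentialCacheSketch` is a hypothetical class; only the 5-minute default comes from the text above, and the injectable clock is an assumption added so expiry can be tested.

```javascript
// Hypothetical TTL cache; not the actual SF cache implementation.
class CredentialCacheSketch {
  constructor(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so tests do not need to sleep
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are evicted lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Keying entries by the vault URI (rather than by provider) would let two providers that share a secret share one cache entry, though that choice is not stated in the ADR.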
Rationale
Why Fail-Open?
- Vault may not be available in all environments (local dev, offline use)
- Graceful degradation allows fallback to plaintext keys without blocking
- Operator can choose strict mode if needed
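The difference between fail-open and strict mode can be sketched as a final validation step. The `{ value }` result shape and the name `getCredentialValueSketch` are assumptions; the real `getCredentialValue(result, strictMode)` may define "unresolved" differently.

```javascript
// Hypothetical sketch of strict-mode validation on a resolved credential.
function getCredentialValueSketch(result, strictMode = false) {
  if (!result || typeof result.value !== "string") return undefined;

  // A value that still looks like a vault URI was never resolved.
  const unresolved = result.value.startsWith("vault://");
  if (unresolved && strictMode) {
    throw new Error("Vault credential could not be resolved (strict mode)");
  }
  // Fail-open: hand back whatever value is available.
  return result.value;
}
```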
Why Async?
- Network I/O to Vault happens at credential usage time, not startup
- Startup remains fast (doctor checks are synchronous)
- Credentials can be refreshed by re-resolving throughout session
Why Not Modify pi-ai getEnvApiKey?
- `getEnvApiKey` is sync; vault resolution is async
- Cleaner separation: pi-ai doesn't know about vault
- SF or pi-coding-agent handles async resolution at the point of use
- Allows gradual migration: new code uses async, old code still works with plaintext
Vault KV v2 API
Vault path structure:
secret/ # Mount point
├── anthropic/ # Provider
│ ├── prod # Environment/secret name
│ │ └── api_key # Field in secret
│ └── dev
└── openai/
├── prod
│ ├── api_key
│ └── org_id
└── staging
URI to fetch api_key from secret/anthropic/prod:
vault://secret/anthropic/prod#api_key
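For reference, a KV v2 read goes to `/v1/<mount>/data/<path>` with the token in the `X-Vault-Token` header, and the stored fields come back nested under `data.data`. The sketch below shows that mapping; the helper names and the example address are hypothetical.

```javascript
// Hypothetical helper: maps mount and secret path to the KV v2 read endpoint.
function buildKvV2ReadUrlSketch(vaultAddr, mount, secretPath) {
  const base = vaultAddr.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/${mount}/data/${secretPath}`;
}

// KV v2 wraps the stored fields in data.data; returns undefined if absent.
function extractFieldSketch(responseBody, field) {
  return responseBody?.data?.data?.[field];
}
```

With `VAULT_ADDR=https://vault.example.com:8200` (an example address), the URI above would map to `https://vault.example.com:8200/v1/secret/data/anthropic/prod`, and `api_key` would be read from `data.data.api_key` in the response.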
Query Patterns (Future)
With vault URIs persisted in config, audit/operations teams can:
-- Find all provider credentials using vault
SELECT provider_id, env_var_name, env_var_value FROM provider_config
WHERE env_var_value LIKE 'vault://%';
-- Reconstruct which services were using which vault secrets
SELECT config.provider_id, secrets.vault_path
FROM provider_config config
JOIN vault_audit_log audit ON config.env_var_value = audit.uri
JOIN vault_secrets secrets ON audit.secret_id = secrets.id;
Security Considerations
- Token Storage: VAULT_TOKEN or ~/.vault-token must be protected (owner-only readable)
- Network: Use HTTPS for Vault connections (VAULT_ADDR should be https://)
- Audit: Enable Vault audit logging to track secret access
- AppRole Rotation: Rotate VAULT_SECRET_ID regularly (future implementation)
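- Plaintext Fallback: Fail-open is an explicit choice. Operators must be aware that when Vault is unreachable or no auth method is available, the unresolved value is used as-is, so Vault is effectively bypassed unless strict mode is enabled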
Backward Compatibility
- Plaintext API keys continue to work unchanged
- Existing auth.json credentials unaffected
- No breaking changes to SF or pi-ai APIs
- Doctor checks work exactly the same (just report vault as source when applicable)
Testing Strategy
- Unit tests: Vault resolver with mocked fetch
  - URI parsing (valid/invalid formats)
  - Auth chain (env, file; AppRole not yet)
  - Caching TTL
  - Fail-open behavior
- Integration tests (manual, requires Vault instance)
  - End-to-end: set `ANTHROPIC_API_KEY=vault://...`, verify SF picks it up
  - Auth chain: test each auth method (`VAULT_TOKEN`, `~/.vault-token`)
  - Doctor checks: verify "Vault" source reported without network I/O
- Regression tests
  - Plaintext keys still work
  - auth.json still used as fallback
  - No new test failures in existing suite
Future Work
- AppRole support: For CI/CD without token files
- Dynamic credentials: Use Vault to generate temporary DB/API credentials
- Automated key rotation: Periodically fetch fresh credentials from Vault
- Audit integration: Log which credentials were used (for compliance)
- Multi-environment: Support `vault://secret/anthropic/prod#api_key` vs `vault://secret/anthropic/staging#api_key` per phase