diff --git a/packages/pi-ai/package.json b/packages/pi-ai/package.json
index ef927165d..a3e5d4978 100644
--- a/packages/pi-ai/package.json
+++ b/packages/pi-ai/package.json
@@ -26,6 +26,7 @@
     "@anthropic-ai/sdk": "^0.73.0",
     "@anthropic-ai/vertex-sdk": "^0.14.4",
     "@aws-sdk/client-bedrock-runtime": "^3.983.0",
+    "@google/gemini-cli-core": "0.38.2",
     "@google/genai": "^1.40.0",
     "@mistralai/mistralai": "^1.14.1",
     "@sinclair/typebox": "^0.34.41",
diff --git a/packages/pi-ai/src/providers/google-gemini-cli-core-plan.md b/packages/pi-ai/src/providers/google-gemini-cli-core-plan.md
new file mode 100644
index 000000000..6b3e70a8e
--- /dev/null
+++ b/packages/pi-ai/src/providers/google-gemini-cli-core-plan.md
@@ -0,0 +1,133 @@
+# Re-platforming `google-gemini-cli` onto `@google/gemini-cli-core`
+
+**Status:** Dependency installed (2026-04-19). Refactor pending next iteration.
+
+## Goal
+
+Replace the handwritten `fetch()` transport in `google-gemini-cli.ts` with calls
+into `@google/gemini-cli-core`'s `CodeAssistServer` so requests to
+`cloudcode-pa.googleapis.com` are byte-for-byte indistinguishable from the
+official `gemini` CLI. Upside: free OAuth quota treatment, automatic inheritance
+of upstream improvements, no reverse-engineered User-Agent / Client-Metadata
+drift.
+
+## Scope
+
+**In-scope**
+- `provider: "google-gemini-cli"` stream paths in `google-gemini-cli.ts`
+  (functions `streamGoogleGeminiCli` and `streamSimpleGoogleGeminiCli`).
+
+**Out-of-scope (keep handwritten)**
+- `provider: "google-antigravity"`: different sandbox endpoints
+  (`daily-cloudcode-pa.sandbox.googleapis.com`), different auth contract
+  (Antigravity IDE-scoped), different User-Agent requirements. cli-core
+  does not target these endpoints.
+- `provider: "google"` (API key) and `provider: "google-vertex"`: unrelated
+  transports, stay on `@google/genai` directly.
+
+## API mapping (cli-core 0.38.2)
+
+| Today (handwritten) | After (cli-core) |
+|------------------------------------------------------------|------------------------------------------------------------------------|
+| `fetch(CLOUD_CODE_ASSIST_ENDPOINT + ":streamGenerateContent?alt=sse", …)` | `await server.generateContentStream(req, promptId, role)` returns `AsyncGenerator` |
+| Manual SSE body parsing (`response.body.getReader()` + `TextDecoder`) | cli-core yields already-parsed `GenerateContentResponse` chunks |
+| Custom retry loop (429/5xx with backoff, endpoint cascade) | cli-core has internal retry in `requestStreamingPost` |
+| Header assembly (`User-Agent`, `X-Goog-Api-Client`, `Client-Metadata`) | cli-core sets its own correct headers; just pass `httpOptions.headers` for extras |
+| OAuth token carried in SF `apiKey` as `{ token, projectId }` JSON | Either keep (build `OAuth2Client` + set credentials) OR let cli-core load from `~/.gemini/oauth_creds.json` via `getOauthClient()` |
+
+Relevant cli-core exports:
+
+```ts
+import { CodeAssistServer, CODE_ASSIST_ENDPOINT, type HttpOptions } from "@google/gemini-cli-core";
+import { getOauthClient } from "@google/gemini-cli-core/dist/src/code_assist/oauth2.js";
+import { AuthType } from "@google/gemini-cli-core";
+import type { GenerateContentParameters, GenerateContentResponse } from "@google/genai";
+```
+
+## Two integration strategies
+
+### Strategy A: Transport-only (incremental, lower risk)
+
+Keep SF's existing auth storage (`apiKey` JSON blob with `{ token, projectId }`).
+At each request:
+
+```ts
+import { OAuth2Client } from "google-auth-library";
+import { CodeAssistServer } from "@google/gemini-cli-core";
+
+const authClient = new OAuth2Client();
+authClient.setCredentials({ access_token: token }); // token, projectId: unpacked from SF's apiKey JSON
+const server = new CodeAssistServer(authClient, projectId, {
+  headers: { /* extras if any */ },
+});
+
+for await (const chunk of await server.generateContentStream(req, promptId, "USER")) {
+  // feed chunk into existing AssistantMessageEventStream adapter
+}
+```
+
+Pros: no SF auth-layer changes, minimal blast radius.
+Cons: SF still does OAuth refresh manually; cli-core's auto-refresh benefit is lost.
+
+### Strategy B: Full cli-core auth (target state)
+
+Drop the `apiKey` unpacking for `google-gemini-cli`. At provider init:
+
+```ts
+const authClient = await getOauthClient(AuthType.LOGIN_WITH_GOOGLE, config);
+const server = new CodeAssistServer(authClient, projectId);
+```
+
+cli-core reads `~/.gemini/oauth_creds.json` (migrated to keychain on newer
+installs), refreshes tokens, writes back. SF's `/login` flow for this provider
+becomes "run the real `gemini` binary first", exactly what the user asked for.
+
+Pros: full integration benefit, SF drops ~80 lines of auth management.
+Cons: breaks existing SF auth storage path for this provider; users must
+re-authenticate via `gemini` CLI at least once.
+
+Recommendation: **A first** (one commit, verifiable), **B second** as a
+follow-up once A is stable.
+
+## Implementation checklist (Strategy A)
+
+1. Add factory helper `buildCodeAssistServer(token, projectId)` that constructs
+   `OAuth2Client` + `CodeAssistServer`. Put it alongside the existing helpers
+   near the top of `google-gemini-cli.ts`.
+2. In `streamGoogleGeminiCli` (line 320): branch on `model.provider`. When
+   `"google-gemini-cli"`, use the new helper and replace the `fetch()` block
+   (lines ~392-450) with `server.generateContentStream()` consumption. When
+   `"google-antigravity"`, keep the existing codepath unchanged.
+3. Convert cli-core's `GenerateContentResponse` chunks to SSE-equivalent
+   processing via the existing `processStreamChunk` helper (or inline the
+   minimal equivalent; cli-core already parses the JSON).
+4. Remove `GEMINI_CLI_HEADERS` constant (cli-core sets its own).
+5. Keep `ANTIGRAVITY_*` constants for the antigravity path.
+6. Update `streamSimpleGoogleGeminiCli` similarly.
+7. Tests:
+   - Replace `global.fetch` mocks targeting `cloudcode-pa.googleapis.com` with
+     `CodeAssistServer` prototype mocks (`generateContentStream` returns a
+     mocked AsyncGenerator).
+   - Keep antigravity tests unchanged (still fetch-based).
+8. Live smoke test against a `gemini-*` model in dr-repo or a scratch project,
+   confirm OAuth flow works, streaming response arrives, cost is reported.
+
+## Retry semantics
+
+cli-core's internal retry on `requestStreamingPost` handles 429/5xx with
+exponential backoff and consults `Retry-After` headers. That subsumes the
+existing `MAX_RETRIES` / `BASE_DELAY_MS` loop in SF for this provider.
+Keep the loop for antigravity (different endpoint, different quirks).
+
+`extractRetryDelay` and `isRetryableError` helpers become antigravity-only.
+
+## Why this matters (recap)
+
+- **Free OAuth quota**: Google subsidises the official CLI's free tier.
+  Blending our requests in byte-for-byte preserves access.
+- **Bot-detection resilience**: User-Agent / Client-Metadata drift risk goes
+  to zero, since cli-core is the authoritative client.
+- **Upstream improvements**: new tool formats, grounding, session caching,
+  quota displays ship via `npm update @google/gemini-cli-core`.
+- **Our proxy becomes "the CLI, programmable"**: identical upstream behavior,
+  hookable local endpoint for any OpenAI-compatible tool.
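
Outside the patch itself, the test approach in checklist item 7 (a `generateContentStream` that returns a mocked AsyncGenerator) can be sketched without importing cli-core. This is only a minimal sketch under stated assumptions: `ChunkLike`, `StreamingServerLike`, `collectText`, and `mockServer` are hypothetical names that do not exist in the repo, and the chunk shape merely mirrors the `candidates[].content.parts[].text` layout of `GenerateContentResponse`. The real adapter would presumably feed chunks into the existing `AssistantMessageEventStream` path rather than concatenate text.

```typescript
// Hypothetical minimal shapes; the real types come from @google/genai and cli-core.
interface ChunkLike {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Assumed to match the call shape used in the plan:
// await server.generateContentStream(req, promptId, role) -> AsyncGenerator
interface StreamingServerLike {
  generateContentStream(
    req: unknown,
    promptId: string,
    role: string,
  ): Promise<AsyncGenerator<ChunkLike>>;
}

// Consume already-parsed chunks and concatenate every text part, in order.
async function collectText(
  server: StreamingServerLike,
  req: unknown,
  promptId: string,
): Promise<string> {
  let text = "";
  for await (const chunk of await server.generateContentStream(req, promptId, "USER")) {
    for (const candidate of chunk.candidates ?? []) {
      for (const part of candidate.content?.parts ?? []) {
        if (typeof part.text === "string") text += part.text;
      }
    }
  }
  return text;
}

// Test double: replays a fixed list of chunks through an async generator.
function mockServer(chunks: ChunkLike[]): StreamingServerLike {
  return {
    async generateContentStream() {
      async function* replay() {
        for (const chunk of chunks) yield chunk;
      }
      return replay();
    },
  };
}
```

Depending on a narrow interface instead of the concrete `CodeAssistServer` class is one way to keep the stream-consuming code testable without touching `global.fetch`; patching the class prototype, as the checklist suggests, would work with the same replay generator.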