# Shadow AI Benchmarking Feature

## Context

The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data, without affecting users. Comparing the outputs side by side supports informed model-migration decisions.

## Architecture Overview

- **New admin page** at `/admin/shadow-ai` with model configuration, run trigger, and history
- **New backend module** `shadow-ai` with a controller, a service, and 3 entities
- **3 new DB tables** in the `shared` schema for model configs, runs, and results
- **Shared AI caller utility** to avoid duplicating HTTP logic
- **Minimal changes** to existing services: make the prompt-building methods public and export the services from their modules

---

## Phase 1: Shared AI Caller Utility

### New file: `backend/src/common/utils/ai-caller.ts`

Extract the HTTP POST logic (currently duplicated in both `callAI()` methods) into a reusable function:

```typescript
export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 minutes)
}): Promise<{
  content: string; // cleaned JSON string (markdown fences and <think> blocks stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>
```

Handles: HTTPS POST to `{apiUrl}/chat/completions`, timeout, markdown fence stripping, `<think>` block removal, and response timing.
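The following is a minimal sketch of the shared caller. It uses Node 18+ globals (`fetch`, `AbortController`) in place of whatever HTTP client the existing `callAI()` methods use, and it assumes the standard OpenAI response shape (`choices[0].message.content`, `usage`). The `cleanAIContent` helper and the `ChatMessage`/`TokenUsage` names are hypothetical.

```typescript
// Hypothetical sketch of backend/src/common/utils/ai-caller.ts (not the existing
// callAI() code). Assumes Node 18+ globals (fetch, AbortController) and an
// OpenAI-style response: choices[0].message.content plus usage.

export interface ChatMessage {
  role: string;
  content: string;
}

export interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical helper: drops <think>...</think> reasoning blocks, then unwraps a
// reply that is entirely wrapped in a ``` or ```json markdown fence.
export function cleanAIContent(raw: string): string {
  let text = raw.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
  const fenced = text.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  if (fenced) text = fenced[1] ?? text;
  return text.trim();
}

export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: ChatMessage[];
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 minutes)
}): Promise<{ content: string; usage?: TokenUsage; responseTimeMs: number }> {
  const controller = new AbortController();
  // Abort the request if the model has not answered within the timeout.
  const timer = setTimeout(() => controller.abort(), params.timeoutMs ?? 600000);
  const started = Date.now();
  try {
    const url = `${params.apiUrl.replace(/\/+$/, '')}/chat/completions`;
    const res = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${params.apiKey}`,
      },
      body: JSON.stringify({
        model: params.model,
        messages: params.messages,
        temperature: params.temperature,
        max_tokens: params.maxTokens,
      }),
      signal: controller.signal,
    });
    if (!res.ok) {
      const detail = (await res.text()).slice(0, 500);
      throw new Error(`AI API returned HTTP ${res.status}: ${detail}`);
    }
    const body = (await res.json()) as {
      choices?: Array<{ message?: { content?: string } }>;
      usage?: TokenUsage;
    };
    return {
      content: cleanAIContent(body.choices?.[0]?.message?.content ?? ''),
      usage: body.usage,
      responseTimeMs: Date.now() - started,
    };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Splitting the cleanup into a pure function keeps it testable without a network. The planned return type carries only the cleaned `content`. If `raw_response` in `shadow_run_results` is meant to hold the reply before fence and `<think>` stripping, the result would also need to carry the raw string.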
## Phase 2: Expose Existing Prompt Builders

### `backend/src/modules/health-scores/health-scores.service.ts`

- Change `private` → `public` on:
  - `gatherOperatingData(qr)` (line 252)
  - `gatherReserveData(qr)` (line 523)
  - `buildOperatingPrompt(data)` (line 790)
  - `buildReservePrompt(data)` (line 930)
  - `checkDataReadiness(qr, scoreType)` (used to validate that the required data exists)

### `backend/src/modules/health-scores/health-scores.module.ts`

- Add `exports: [HealthScoresService]`

### `backend/src/modules/investment-planning/investment-planning.service.ts`

- Add a new public method `buildPromptForSchema(schemaName: string)` that:
  1. Creates a query runner (released in a `finally` block) and sets `search_path` to the tenant schema
  2. Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) through the query runner directly, bypassing the request-scoped `TenantService`
  3. Calls the existing `buildPromptMessages()` with the gathered data
  4. Returns `Array<{ role: string; content: string }>`
- Change `buildPromptMessages()` from `private` to `public` (line 880)

### `backend/src/modules/investment-planning/investment-planning.module.ts`

- Add `exports: [InvestmentPlanningService]`

## Phase 3: Database Tables & Entities

### 3 new tables in the `shared` schema

**`shared.shadow_ai_models`**: alternate model configurations (slots A and B)

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| slot | VARCHAR(10) | CHECK IN ('A', 'B'), UNIQUE |
| name | VARCHAR(100) | Display label |
| api_url | VARCHAR(500) | OpenAI-compatible endpoint |
| api_key | VARCHAR(500) | Bearer token |
| model_name | VARCHAR(200) | Model identifier |
| is_active | BOOLEAN | Default true |
| created_at | TIMESTAMPTZ | |
| updated_at | TIMESTAMPTZ | |

**`shared.shadow_runs`**: one row per comparison execution

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| tenant_id | UUID FK | → shared.organizations |
| feature | VARCHAR(30) | CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations') |
| status | VARCHAR(20) | CHECK IN ('running', 'completed', 'partial', 'failed') |
| triggered_by | UUID FK | → shared.users |
| prompt_messages | JSONB | Exact messages sent to all models (proof of identical input) |
| started_at | TIMESTAMPTZ | |
| completed_at | TIMESTAMPTZ | |
| created_at | TIMESTAMPTZ | |

**`shared.shadow_run_results`**: one row per model per run (up to 3 per run)

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| run_id | UUID FK | → shared.shadow_runs, ON DELETE CASCADE |
| model_role | VARCHAR(20) | CHECK IN ('production', 'alternate_a', 'alternate_b'); UNIQUE (run_id, model_role) |
| model_name | VARCHAR(200) | Snapshot of the model used |
| api_url | VARCHAR(500) | Snapshot of the endpoint used |
| raw_response | TEXT | Unprocessed AI response |
| parsed_response | JSONB | Validated structured output |
| response_time_ms | INTEGER | |
| token_usage | JSONB | `{ prompt_tokens, completion_tokens, total_tokens }` |
| status | VARCHAR(20) | CHECK IN ('pending', 'running', 'success', 'error') |
| error_message | TEXT | |
| created_at | TIMESTAMPTZ | |

### Entity files

- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`

All follow the `@Entity({ schema: 'shared', name: '...' })` pattern.

## Phase 4: Shadow AI Backend Module

### New directory: `backend/src/modules/shadow-ai/`

### `shadow-ai.service.ts`

**Model CRUD:**

- `getModels()`: returns both slots with API keys masked (only the last 4 characters shown)
- `upsertModel(slot, dto)`: INSERTs or UPDATEs the config for slot A or B
- `deleteModel(slot)`: removes the model config for a slot

**Run Execution:**

- `triggerRun(tenantId, feature, userId)`:
  1. Look up the tenant's `schema_name` in `shared.organizations`
  2. Build the prompt messages by calling the appropriate exposed method:
     - `operating_health`: create a query runner → set `search_path` → `healthScoresService.gatherOperatingData(qr)` → `healthScoresService.buildOperatingPrompt(data)`, then release the query runner
     - `reserve_health`: same pattern with the reserve methods
     - `investment_recommendations`: `investmentPlanningService.buildPromptForSchema(schemaName)`
  3. Insert the `shadow_runs` row with `prompt_messages` stored as JSONB
  4. Read the production config from environment variables and the alternate configs from the DB
  5. Insert 1-3 `shadow_run_results` rows as `'pending'` (production plus each active alternate)
  6. Return `{ runId }` immediately
  7. Fire-and-forget: call all models in parallel using `callOpenAICompatible()`
     - Per-feature parameters: operating/reserve use temperature 0.1 and `max_tokens` 2048; investment uses temperature 0.3 and `max_tokens` 4096
  8. Update each result row as its call settles: `success` with the parsed response, timing, and token usage, or `error` with `error_message` (errors are recorded, never rethrown)
  9. Once every call has settled, set `completed_at` and the run status: `completed` if all models succeeded, `partial` if some failed, `failed` if all failed

**History:**

- `getRunHistory(page, limit, tenantFilter?, featureFilter?)`: paginated list, joined to `shared.organizations` for the tenant name
- `getRunDetail(runId)`: the full run plus all of its results

### `shadow-ai.controller.ts`

All endpoints use the `@UseGuards(JwtAuthGuard)` + `requireSuperadmin(req)` pattern from `admin.controller.ts`.
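The guard pattern covers authorization, but each handler still has to validate its own inputs before calling the service. The sketch below reuses `requireSuperadmin(req)` as-is and covers only parameter parsing. `parseSlot`, `parseRunRequest`, `parsePagination`, and `HttpError` are hypothetical names; in Nest the errors would more naturally be `BadRequestException`. The pagination defaults (page 1, limit 20, maximum 100) are assumptions. The allowed values mirror the Phase 3 CHECK constraints.

```typescript
// Hypothetical parameter parsing for shadow-ai.controller.ts. HttpError stands in
// for Nest's BadRequestException; pagination defaults are assumptions.

export const SLOTS = ['A', 'B'] as const;
export type Slot = (typeof SLOTS)[number];

export const FEATURES = [
  'operating_health',
  'reserve_health',
  'investment_recommendations',
] as const;
export type Feature = (typeof FEATURES)[number];

export class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Type guard: true if v is one of the allowed string literals.
function isOneOf<T extends string>(values: readonly T[], v: unknown): v is T {
  return typeof v === 'string' && (values as readonly string[]).includes(v);
}

/** `:slot` path param for PUT/DELETE /admin/shadow-ai/models/:slot. */
export function parseSlot(raw: string): Slot {
  const slot = raw.toUpperCase();
  if (!isOneOf(SLOTS, slot)) throw new HttpError(400, `slot must be A or B, got "${raw}"`);
  return slot;
}

/** Body of POST /admin/shadow-ai/runs. */
export function parseRunRequest(body: unknown): { tenantId: string; feature: Feature } {
  const { tenantId, feature } = (body ?? {}) as Record<string, unknown>;
  if (typeof tenantId !== 'string' || !UUID_RE.test(tenantId)) {
    throw new HttpError(400, 'tenantId must be a UUID');
  }
  if (!isOneOf(FEATURES, feature)) {
    throw new HttpError(400, `feature must be one of: ${FEATURES.join(', ')}`);
  }
  return { tenantId, feature };
}

/** `?page&limit` on GET /admin/shadow-ai/runs, clamped to sane bounds. */
export function parsePagination(page?: string, limit?: string) {
  const p = Math.max(1, Number.parseInt(page ?? '', 10) || 1);
  const l = Math.min(100, Math.max(1, Number.parseInt(limit ?? '', 10) || 20));
  return { page: p, limit: l, offset: (p - 1) * l };
}
```

In a handler this might read roughly as `requireSuperadmin(req); return this.shadowAiService.upsertModel(parseSlot(slot), dto);`. Keeping the parsers as plain functions makes them testable without starting Nest.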
| Method | Path | Body/Params |
|--------|------|-------------|
| GET | `/admin/shadow-ai/models` | (none) |
| PUT | `/admin/shadow-ai/models/:slot` | `{ name, apiUrl, apiKey, modelName, isActive }` |
| DELETE | `/admin/shadow-ai/models/:slot` | (none) |
| POST | `/admin/shadow-ai/runs` | `{ tenantId, feature }` |
| GET | `/admin/shadow-ai/runs` | `?page&limit&tenantId&feature` |
| GET | `/admin/shadow-ai/runs/:id` | (none) |

The frontend calls these paths under the `/api` prefix (e.g. `GET /api/admin/shadow-ai/models`).

### `shadow-ai.module.ts`

```typescript
import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
import { HealthScoresModule } from '../health-scores/health-scores.module';
import { InvestmentPlanningModule } from '../investment-planning/investment-planning.module';
import { UsersModule } from '../users/users.module'; // path assumed
import { ShadowAiModel } from './entities/shadow-ai-model.entity';
import { ShadowRun } from './entities/shadow-run.entity';
import { ShadowRunResult } from './entities/shadow-run-result.entity';
import { ShadowAiController } from './shadow-ai.controller';
import { ShadowAiService } from './shadow-ai.service';

@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule, // provides the exported HealthScoresService
    InvestmentPlanningModule, // provides the exported InvestmentPlanningService
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
export class ShadowAiModule {}
```

### Register in `backend/src/app.module.ts`

- Add `import { ShadowAiModule }` and include it in the `imports` array

## Phase 5: Frontend (Admin Shadow AI Page)

### New file: `frontend/src/pages/admin/AdminShadowAiPage.tsx`

**Layout**: Mantine `Tabs` with 3 tabs

#### Tab 1: "Model Configuration"

- Three `Card` components in a `SimpleGrid cols={3}`:
  - **Production** (read-only): shows the model name and API URL if a dedicated endpoint exposes them (no such endpoint is in the controller table yet); otherwise a hardcoded "From environment config" label
  - **Alternate A**: form with `TextInput` (name, API URL, model name), `PasswordInput` (API key), `Switch` (active), and Save/Delete buttons
  - **Alternate B**: same form
- Fetches via `GET /api/admin/shadow-ai/models`
- Saves via `PUT /api/admin/shadow-ai/models/A` or `/B`; deletes via `DELETE` on the same path

#### Tab 2: "Run Comparison"

- `Select` dropdown for the tenant (reusing `GET /api/admin/organizations`, already used by AdminPage)
- `Select` for the feature type (Operating Health / Reserve Health / Investment Recommendations)
- `Button` "Run Shadow Comparison"
- On trigger: `POST /api/admin/shadow-ai/runs` → receive `runId`
- Poll `GET /api/admin/shadow-ai/runs/:id` every 3 s via `refetchInterval` until the run's status is no longer `'running'`
- Show per-model progress indicators during the run
- Once complete, render the results with the shared comparison component described below

#### Tab 3: "History"

- `Table` with columns: Date, Tenant, Feature, Status (Badge), Duration (`completed_at` minus `started_at`)
- Filter controls: tenant `Select`, feature `Select`
- Clicking a row expands it inline or opens a modal with the full comparison
- Pagination

#### Shared Component: Side-by-Side Results Display

- `SimpleGrid cols={3}` (fewer columns if only some models were configured)
- Each column:
  - Header: model name + response-time `Badge`
  - **For health scores**: score in a `RingProgress`, label `Badge`, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
  - **For investment**: overall assessment text, recommendation cards with type/priority badges, risk notes
  - Collapsible raw JSON via `Accordion`
- **Diff highlighting**: where parsed values differ across models, apply a subtle background highlight (e.g. `yellow.0` from the Mantine theme), found with a simple recursive comparison of the JSON keys and values

### Route addition: `frontend/src/App.tsx`

Within the `/admin` route group, after the existing admin page route (match the sibling routes' path style; inside a nested `/admin` route, React Router v6 also accepts the relative `path="shadow-ai"`):

```tsx
{/* Also import AdminShadowAiPage at the top of App.tsx */}
<Route path="/admin/shadow-ai" element={<AdminShadowAiPage />} />
```

### Sidebar nav: `frontend/src/components/layout/Sidebar.tsx`

In the `isAdminOnly` section (after the "Admin Panel" NavLink, around line 134):

```tsx
<NavLink
  /* label text assumed; give it the same icon prop as the "Admin Panel" NavLink */
  label="Shadow AI"
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>
```

## Implementation Order

1. **`ai-caller.ts`**: shared utility (no dependencies)
2. **Health scores + investment planning**: make methods public, add exports, add `buildPromptForSchema`
3. **Entities**: 3 TypeORM entity files
4. **Service + Controller + Module**: Shadow AI backend
5. **Register the module** in `app.module.ts`
6. **Frontend page**: `AdminShadowAiPage.tsx`
7. **Route + Sidebar**: wire up navigation

## Verification

1. **Backend**: start the server and confirm there are no TypeORM errors for the new entities
2. **Model config**: use the admin UI to save, load, and delete alternate model configs
3. **Run comparison**: select a tenant, trigger a run, and verify that all 3 models received identical prompts (matching the stored `prompt_messages`)
4. **Results display**: confirm the side-by-side output renders correctly for all 3 feature types
5. **History**: verify that past runs are persisted and browsable
6. **Auth**: confirm that non-superadmin users get 403 on all shadow-ai endpoints
7. **Production safety**: verify that production AI behavior is unchanged; shadow runs should only read tenant data and write to the three new `shared` tables, never to production score or recommendation records

## Key Files to Modify

- `backend/src/modules/health-scores/health-scores.service.ts`: make 5 methods public
- `backend/src/modules/health-scores/health-scores.module.ts`: add exports
- `backend/src/modules/investment-planning/investment-planning.service.ts`: add `buildPromptForSchema()`, make `buildPromptMessages()` public
- `backend/src/modules/investment-planning/investment-planning.module.ts`: add exports
- `backend/src/app.module.ts`: register `ShadowAiModule`
- `frontend/src/App.tsx`: add route
- `frontend/src/components/layout/Sidebar.tsx`: add nav item

## New Files

- `backend/src/common/utils/ai-caller.ts`
- `backend/src/modules/shadow-ai/shadow-ai.module.ts`
- `backend/src/modules/shadow-ai/shadow-ai.service.ts`
- `backend/src/modules/shadow-ai/shadow-ai.controller.ts`
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`
- `frontend/src/pages/admin/AdminShadowAiPage.tsx`
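The **Diff highlighting** item in Phase 5's shared results component calls for a simple recursive comparison of JSON keys and values. The sketch below assumes each model's `parsed_response` is plain JSON. `findDifferingPaths` and its path format (`factors[0].impact`) are hypothetical names, not existing code.

```typescript
// Hypothetical helper for the results grid: given each model's parsed_response,
// collect the JSON paths whose values are not identical across all models.

export type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function isArray(v: Json | undefined): v is Json[] {
  return Array.isArray(v);
}

export function findDifferingPaths(
  values: Array<Json | undefined>,
  path = '',
  out: Set<string> = new Set<string>(),
): Set<string> {
  if (values.length === 0) return out;

  if (values.every(isObject)) {
    // Objects: recurse into the union of keys. A key missing from one model
    // arrives as undefined and is therefore reported as a difference.
    const keys = new Set<string>();
    for (const v of values) for (const k of Object.keys(v)) keys.add(k);
    for (const k of Array.from(keys)) {
      findDifferingPaths(values.map((v) => v[k]), path ? `${path}.${k}` : k, out);
    }
  } else if (values.every(isArray)) {
    // Arrays: compare position by position up to the longest array.
    const longest = Math.max(...values.map((v) => v.length));
    for (let i = 0; i < longest; i++) {
      findDifferingPaths(values.map((v) => v[i]), `${path}[${i}]`, out);
    }
  } else {
    // Leaves or mixed types: differ if any serialised value differs.
    const first = JSON.stringify(values[0]);
    if (values.some((v) => JSON.stringify(v) !== first)) out.add(path);
  }
  return out;
}
```

A result card could then check `diffs.has('score')` before applying the `yellow.0` background. Leaves are compared by their JSON serialisation, so `72` and `"72"` count as different. An array element present in only one response is flagged as a whole (e.g. `factors[1]`).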