HOA_Financial_Platform/docs/shadow-ai-benchmarking-plan.md
JoeBot 5ae6f672be feat: add shadow AI benchmarking for admin model comparison
Add a new admin-only feature that allows the platform owner to benchmark
the production AI model against up to 2 alternate models (any OpenAI-compatible
API) using real tenant data, without impacting users.

Backend:
- Shared AI caller utility (ai-caller.ts) for OpenAI-compatible endpoints
- Shadow AI module with service, controller, and 3 entities
- 6 admin API endpoints for model config CRUD, run trigger, and history
- Auto-creates shadow_ai_models, shadow_runs, shadow_run_results tables
- Exposes health-scores and investment-planning prompt builders for reuse

Frontend:
- New admin page at /admin/shadow-ai with 3 tabs:
  - Model Configuration (production + 2 alternate slots)
  - Run Comparison (tenant select, feature select, side-by-side results)
  - History (filterable run log with detail drill-down)
- Full side-by-side output display with diff highlighting
- Sidebar navigation link for AI Benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 07:16:38 -04:00


# Shadow AI Benchmarking Feature
## Context
The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data — without impacting users. This enables informed model migration decisions by comparing outputs side-by-side.
## Architecture Overview
- **New admin page** at `/admin/shadow-ai` with model configuration, run trigger, and history
- **New backend module** `shadow-ai` with controller, service, and 3 entities
- **3 new DB tables** in the `shared` schema for model configs, runs, and results
- **Shared AI caller utility** to avoid duplicating HTTP logic
- **Minimal changes** to existing services: make prompt-building methods public and export modules
---
## Phase 1: Shared AI Caller Utility
### New file: `backend/src/common/utils/ai-caller.ts`
Extract the HTTP POST logic (currently duplicated in both `callAI()` methods) into a reusable function:
```typescript
export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 minutes)
}): Promise<{
  content: string; // cleaned JSON string (fences + <think> stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>
```
Handles: HTTPS POST to `{apiUrl}/chat/completions`, timeout, markdown fence stripping, `<think>` block removal, timing.
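The cleanup step can be sketched as a small pure helper. This is a sketch only: the real `ai-caller.ts` also performs the HTTPS POST, bearer auth, and timeout handling, and `cleanModelOutput` is an illustrative name rather than a mandated export.

```typescript
// Sketch of the response-cleanup step: strip markdown code fences and
// <think>…</think> reasoning blocks so the remainder parses as JSON.
export function cleanModelOutput(raw: string): string {
  return raw
    // Remove reasoning blocks, including multi-line content.
    .replace(/<think>[\s\S]*?<\/think>/g, '')
    // Remove code fences (three backticks, optionally followed by a language tag).
    .replace(/`{3}[a-zA-Z]*\n?/g, '')
    .trim();
}
```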
## Phase 2: Expose Existing Prompt Builders
### `backend/src/modules/health-scores/health-scores.service.ts`
- Change `private` → `public` on:
  - `gatherOperatingData(qr)` (line 252)
  - `gatherReserveData(qr)` (line 523)
  - `buildOperatingPrompt(data)` (line 790)
  - `buildReservePrompt(data)` (line 930)
  - `checkDataReadiness(qr, scoreType)` (used to validate data exists)
### `backend/src/modules/health-scores/health-scores.module.ts`
- Add `exports: [HealthScoresService]`
### `backend/src/modules/investment-planning/investment-planning.service.ts`
- Add a new public method `buildPromptForSchema(schemaName: string)` that:
  1. Creates a query runner and sets `search_path` to the tenant schema
  2. Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) using the query runner directly (bypassing the request-scoped `TenantService`)
  3. Calls the existing `buildPromptMessages()` with the gathered data
  4. Returns `Array<{ role: string; content: string }>`
- Change `buildPromptMessages()` from `private` to `public` (line 880)
### `backend/src/modules/investment-planning/investment-planning.module.ts`
- Add `exports: [InvestmentPlanningService]`
## Phase 3: Database Tables & Entities
### 3 new tables in `shared` schema
**`shared.shadow_ai_models`** — Alternate model configurations (slots A and B)
| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| slot | VARCHAR(10) | CHECK IN ('A', 'B'), UNIQUE |
| name | VARCHAR(100) | Display label |
| api_url | VARCHAR(500) | OpenAI-compatible endpoint |
| api_key | VARCHAR(500) | Bearer token |
| model_name | VARCHAR(200) | Model identifier |
| is_active | BOOLEAN | Default true |
| created_at | TIMESTAMPTZ | |
| updated_at | TIMESTAMPTZ | |
**`shared.shadow_runs`** — One row per comparison execution
| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| tenant_id | UUID FK | → shared.organizations |
| feature | VARCHAR(30) | CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations') |
| status | VARCHAR(20) | CHECK IN ('running', 'completed', 'partial', 'failed') |
| triggered_by | UUID FK | → shared.users |
| prompt_messages | JSONB | Exact messages sent to all models (proof of identical input) |
| started_at | TIMESTAMPTZ | |
| completed_at | TIMESTAMPTZ | |
| created_at | TIMESTAMPTZ | |
**`shared.shadow_run_results`** — One row per model per run (up to 3 per run)
| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| run_id | UUID FK | → shadow_runs ON DELETE CASCADE |
| model_role | VARCHAR(20) | CHECK IN ('production', 'alternate_a', 'alternate_b'), UNIQUE(run_id, model_role) |
| model_name | VARCHAR(200) | Snapshot of model used |
| api_url | VARCHAR(500) | Snapshot of endpoint used |
| raw_response | TEXT | Unprocessed AI response |
| parsed_response | JSONB | Validated structured output |
| response_time_ms | INTEGER | |
| token_usage | JSONB | { prompt_tokens, completion_tokens, total_tokens } |
| status | VARCHAR(20) | CHECK IN ('pending', 'running', 'success', 'error') |
| error_message | TEXT | |
| created_at | TIMESTAMPTZ | |
### Entity files
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`
All use `@Entity({ schema: 'shared', name: '...' })` pattern.
## Phase 4: Shadow AI Backend Module
### New directory: `backend/src/modules/shadow-ai/`
### `shadow-ai.service.ts`
**Model CRUD:**
- `getModels()` — Return both slots, mask API keys (show last 4 chars)
- `upsertModel(slot, dto)` — INSERT/UPDATE config for slot A or B
- `deleteModel(slot)` — Remove model config
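The key masking in `getModels()` can be a one-line helper. `maskApiKey` is a hypothetical helper name for illustration, not part of the planned service API:

```typescript
// Sketch: never return a stored API key in full; show only the last 4 chars.
export function maskApiKey(key: string): string {
  if (key.length <= 4) return '****';
  return '****' + key.slice(-4);
}
```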
**Run Execution:**
- `triggerRun(tenantId, feature, userId)`:
  1. Look up the tenant's `schema_name` from `shared.organizations`
  2. Build prompt messages by calling the appropriate exposed method:
     - `operating_health`: create query runner → set `search_path` → `healthScoresService.gatherOperatingData(qr)` → `healthScoresService.buildOperatingPrompt(data)`
     - `reserve_health`: same pattern with the reserve methods
     - `investment_recommendations`: `investmentPlanningService.buildPromptForSchema(schemaName)`
  3. Insert a `shadow_runs` row with `prompt_messages` stored as JSONB
  4. Get the production config from env vars and the alternate configs from the DB
  5. Insert 1-3 `shadow_run_results` rows as 'pending' (production + active alternates)
  6. Return `{ runId }` immediately
  7. Fire-and-forget: call all models in parallel using `callOpenAICompatible()`
     - Per feature: operating/reserve use temperature 0.1, max_tokens 2048; investment uses temperature 0.3, max_tokens 4096
  8. Update each result row as it completes (success/error, parsed response, timing)
  9. Update the run status when all calls complete
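The parallel fan-out in steps 7-9 can be sketched as below. This is a sketch under stated assumptions: `executeShadowRun`, `onResult` (standing in for the per-row DB update), and the target shape are illustrative names, not the service's actual signatures.

```typescript
type Outcome =
  | { status: 'success'; value: unknown }
  | { status: 'error'; error: string };

// Call every configured model in parallel, report each outcome as it
// settles, and derive the run's final status from the success count.
export async function executeShadowRun(
  targets: Array<{ role: string; call: () => Promise<unknown> }>,
  onResult: (role: string, outcome: Outcome) => void,
): Promise<'completed' | 'partial' | 'failed'> {
  const flags = await Promise.all(
    targets.map(async ({ role, call }) => {
      try {
        onResult(role, { status: 'success', value: await call() });
        return true;
      } catch (e) {
        onResult(role, { status: 'error', error: e instanceof Error ? e.message : String(e) });
        return false;
      }
    }),
  );
  const ok = flags.filter(Boolean).length;
  return ok === targets.length ? 'completed' : ok > 0 ? 'partial' : 'failed';
}
```

One failed model should not fail the run, which is why each call catches its own error before `Promise.all` sees it; that maps directly onto the 'partial' run status.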
**History:**
- `getRunHistory(page, limit, tenantFilter?, featureFilter?)` — Paginated list with tenant name JOIN
- `getRunDetail(runId)` — Full run + all results
### `shadow-ai.controller.ts`
All endpoints use `@UseGuards(JwtAuthGuard)` + `requireSuperadmin(req)` pattern from `admin.controller.ts`.
| Method | Path | Body/Params |
|--------|------|-------------|
| GET | `/admin/shadow-ai/models` | — |
| PUT | `/admin/shadow-ai/models/:slot` | `{ name, apiUrl, apiKey, modelName, isActive }` |
| DELETE | `/admin/shadow-ai/models/:slot` | — |
| POST | `/admin/shadow-ai/runs` | `{ tenantId, feature }` |
| GET | `/admin/shadow-ai/runs` | `?page&limit&tenantId&feature` |
| GET | `/admin/shadow-ai/runs/:id` | — |
### `shadow-ai.module.ts`
```typescript
@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule,
    InvestmentPlanningModule,
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
```
### Register in `backend/src/app.module.ts`
- Add `import { ShadowAiModule }` and include in the `imports` array
## Phase 5: Frontend — Admin Shadow AI Page
### New file: `frontend/src/pages/admin/AdminShadowAiPage.tsx`
**Layout**: Mantine `Tabs` with 3 tabs
#### Tab 1: "Model Configuration"
- Three `Card` components in a `SimpleGrid cols={3}`:
  - **Production** (read-only): shows the model name and API URL, either from a dedicated endpoint or as a hardcoded "From environment config" label
  - **Alternate A**: form with `TextInput` (name, API URL, model name), `PasswordInput` (API key), `Switch` (active), and Save/Delete buttons
  - **Alternate B**: the same form
- Fetches via `GET /api/admin/shadow-ai/models`
- Saves via `PUT /api/admin/shadow-ai/models/A` or `/B`
#### Tab 2: "Run Comparison"
- `Select` dropdown for tenant (reuse `GET /api/admin/organizations` already used by AdminPage)
- `Select` for feature type (Operating Health / Reserve Health / Investment Recommendations)
- `Button` "Run Shadow Comparison"
- On trigger: `POST /api/admin/shadow-ai/runs` → get `runId`
- Poll `GET /api/admin/shadow-ai/runs/:id` every 3s via `refetchInterval` until status !== 'running'
- Show per-model progress indicators during run
- Once complete, render results using shared comparison component (below)
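The wait-for-completion logic above can be expressed framework-agnostically. In the page itself this is expected to be handled by the query library's `refetchInterval`; the standalone loop below (illustrative names throughout) only makes the stop condition explicit: keep fetching until the run leaves 'running'.

```typescript
// Poll a run endpoint until its status is no longer 'running'.
export async function pollUntilDone<T extends { status: string }>(
  fetchRun: () => Promise<T>,
  intervalMs = 3000,
  maxAttempts = 200, // give up after roughly 10 minutes at the default interval
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchRun();
    if (run.status !== 'running') return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Shadow run polling timed out');
}
```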
#### Tab 3: "History"
- `Table` with columns: Date, Tenant, Feature, Status (Badge), Duration
- Filter controls: tenant Select, feature Select
- Click row → expand detail or modal showing full comparison
- Pagination
#### Shared Component: Side-by-Side Results Display
- `SimpleGrid cols={3}` (or fewer columns if only some models were configured)
- Each column:
  - Header: model name + response-time `Badge`
  - **For health scores**: score with `RingProgress`, label `Badge`, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
  - **For investment**: overall assessment text, recommendation cards with type/priority badges, risk notes
  - Collapsible raw JSON via `Accordion`
- **Diff highlighting**: Where parsed values differ across models, apply subtle background highlight (e.g., `yellow.0` in Mantine theme). Simple recursive comparison of JSON keys/values.
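The "simple recursive comparison of JSON keys/values" can be sketched as a pure function that returns the paths at which the parsed outputs disagree; the UI can then highlight any value whose path is in that set. `diffPaths` is an illustrative name, not the component's actual API.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively compare N parsed outputs and collect the dot-paths where
// they differ. A missing key is treated as null, so a value present in
// only some outputs is flagged as a difference.
export function diffPaths(outputs: Json[], prefix = ''): string[] {
  const here = prefix || '$';
  const first = outputs[0];
  const isObj = (v: Json) => v !== null && typeof v === 'object' && !Array.isArray(v);
  // Mixed shapes (object vs. array vs. scalar) count as a difference here.
  if (!outputs.every((o) => isObj(o) === isObj(first) && Array.isArray(o) === Array.isArray(first))) {
    return [here];
  }
  if (Array.isArray(first)) {
    const len = Math.max(0, ...outputs.map((o) => (o as Json[]).length));
    const paths: string[] = [];
    for (let i = 0; i < len; i++) {
      paths.push(...diffPaths(outputs.map((o) => (o as Json[])[i] ?? null), `${here}[${i}]`));
    }
    return paths;
  }
  if (isObj(first)) {
    const keys = new Set(outputs.flatMap((o) => Object.keys(o as object)));
    const paths: string[] = [];
    for (const k of keys) {
      paths.push(
        ...diffPaths(
          outputs.map((o) => (o as Record<string, Json>)[k] ?? null),
          prefix ? `${prefix}.${k}` : k,
        ),
      );
    }
    return paths;
  }
  return outputs.every((o) => o === first) ? [] : [here];
}
```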
### Route addition: `frontend/src/App.tsx`
Within the `/admin` route group (after `<Route index element={<AdminPage />} />`):
```tsx
<Route path="shadow-ai" element={<AdminShadowAiPage />} />
```
### Sidebar nav: `frontend/src/components/layout/Sidebar.tsx`
In the `isAdminOnly` section (after the "Admin Panel" NavLink, around line 134):
```tsx
<NavLink
  label="AI Benchmarking"
  leftSection={<IconScale size={18} />}
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>
```
## Implementation Order
1. **`ai-caller.ts`** — Shared utility (no dependencies)
2. **Health scores + investment planning** — Make methods public, add exports, add `buildPromptForSchema`
3. **Entities** — 3 TypeORM entity files
4. **Service + Controller + Module** — Shadow AI backend
5. **Register module** in `app.module.ts`
6. **Frontend page** — `AdminShadowAiPage.tsx`
7. **Route + Sidebar** — Wire up navigation
## Verification
1. **Backend**: Start server, confirm no TypeORM errors for new entities
2. **Model config**: Use admin UI to save/load/delete alternate model configs
3. **Run comparison**: Select a tenant, trigger a run, verify all 3 models are called with identical prompts
4. **Results display**: Confirm side-by-side output renders correctly for all 3 feature types
5. **History**: Verify past runs are persisted and browsable
6. **Auth**: Confirm non-superadmin users get 403 on all shadow-ai endpoints
7. **Production safety**: Verify no changes to production AI behavior — shadow runs are completely isolated
## Key Files to Modify
- `backend/src/modules/health-scores/health-scores.service.ts` — Make 5 methods public
- `backend/src/modules/health-scores/health-scores.module.ts` — Add exports
- `backend/src/modules/investment-planning/investment-planning.service.ts` — Add `buildPromptForSchema()`, make `buildPromptMessages()` public
- `backend/src/modules/investment-planning/investment-planning.module.ts` — Add exports
- `backend/src/app.module.ts` — Register ShadowAiModule
- `frontend/src/App.tsx` — Add route
- `frontend/src/components/layout/Sidebar.tsx` — Add nav item
## New Files
- `backend/src/common/utils/ai-caller.ts`
- `backend/src/modules/shadow-ai/shadow-ai.module.ts`
- `backend/src/modules/shadow-ai/shadow-ai.service.ts`
- `backend/src/modules/shadow-ai/shadow-ai.controller.ts`
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`
- `frontend/src/pages/admin/AdminShadowAiPage.tsx`