Add a new admin-only feature that allows the platform owner to benchmark the production AI model against up to 2 alternate models (any OpenAI-compatible API) using real tenant data, without impacting users.

Backend:
- Shared AI caller utility (ai-caller.ts) for OpenAI-compatible endpoints
- Shadow AI module with service, controller, and 3 entities
- 6 admin API endpoints for model config CRUD, run trigger, and history
- Auto-creates shadow_ai_models, shadow_runs, shadow_run_results tables
- Exposes health-scores and investment-planning prompt builders for reuse

Frontend:
- New admin page at /admin/shadow-ai with 3 tabs:
  - Model Configuration (production + 2 alternate slots)
  - Run Comparison (tenant select, feature select, side-by-side results)
  - History (filterable run log with detail drill-down)
- Full side-by-side output display with diff highlighting
- Sidebar navigation link for AI Benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Shadow AI Benchmarking Feature

## Context

The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data, without impacting users. This enables informed model migration decisions by comparing outputs side-by-side.

## Architecture Overview

- **New admin page** at `/admin/shadow-ai` with model configuration, run trigger, and history
- **New backend module** `shadow-ai` with controller, service, and 3 entities
- **3 new DB tables** in the `shared` schema for model configs, runs, and results
- **Shared AI caller utility** to avoid duplicating HTTP logic
- **Minimal changes** to existing services: make prompt-building methods public and export modules

---

## Phase 1: Shared AI Caller Utility

### New file: `backend/src/common/utils/ai-caller.ts`

Extract the HTTP POST logic (currently duplicated in both `callAI()` methods) into a reusable function:

```typescript
// Signature only; `declare` keeps this valid TypeScript until the body is written.
export declare function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000
}): Promise<{
  content: string; // cleaned JSON string (fences + <think> stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>;
```

Handles: HTTPS POST to `{apiUrl}/chat/completions`, timeout, markdown fence stripping, `<think>` block removal, timing.

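
The responsibilities listed for the caller utility can be sketched as follows. This is a minimal illustration under stated assumptions, not the existing `callAI()` code: it uses the global `fetch` and `AbortController` from Node 18+ in place of whatever HTTP client the services use today, and the names `AiCallParams`, `AiCallResult`, and `cleanModelOutput` are hypothetical.

```typescript
// Sketch only. Assumes Node 18+ (global fetch / AbortController).
// AiCallParams, AiCallResult and cleanModelOutput are illustrative names.
export interface AiCallParams {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number;
}

export interface AiCallResult {
  content: string;
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}

// Built at runtime so the source never contains a literal markdown fence.
const FENCE = '`'.repeat(3);
const FENCE_RE = new RegExp('^' + FENCE + '[a-zA-Z]*\\s*([\\s\\S]*?)\\s*' + FENCE + '$');

/** Strip <think>...</think> blocks, then one surrounding markdown code fence. */
export function cleanModelOutput(raw: string): string {
  const withoutThink = raw.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();
  const fenced = withoutThink.match(FENCE_RE);
  return (fenced ? fenced[1] : withoutThink).trim();
}

export async function callOpenAICompatible(params: AiCallParams): Promise<AiCallResult> {
  const { apiUrl, apiKey, model, messages, temperature, maxTokens, timeoutMs = 600_000 } = params;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs); // enforce the timeout
  const startedAt = Date.now();
  try {
    const res = await fetch(apiUrl.replace(/\/+$/, '') + '/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: 'Bearer ' + apiKey },
      body: JSON.stringify({ model, messages, temperature, max_tokens: maxTokens }),
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error('AI endpoint returned HTTP ' + res.status);
    }
    const body = (await res.json()) as {
      choices?: Array<{ message?: { content?: string } }>;
      usage?: AiCallResult['usage'];
    };
    const raw = body.choices?.[0]?.message?.content ?? '';
    return {
      content: cleanModelOutput(raw),
      usage: body.usage,
      responseTimeMs: Date.now() - startedAt,
    };
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

Keeping the cleanup in a pure helper makes the fence and `<think>` handling testable without a network call.
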
## Phase 2: Expose Existing Prompt Builders

### `backend/src/modules/health-scores/health-scores.service.ts`
- Change `private` → `public` on:
  - `gatherOperatingData(qr)` (line 252)
  - `gatherReserveData(qr)` (line 523)
  - `buildOperatingPrompt(data)` (line 790)
  - `buildReservePrompt(data)` (line 930)
  - `checkDataReadiness(qr, scoreType)` (used to validate data exists)

### `backend/src/modules/health-scores/health-scores.module.ts`
- Add `exports: [HealthScoresService]`

### `backend/src/modules/investment-planning/investment-planning.service.ts`
- Add new public method `buildPromptForSchema(schemaName: string)` that:
  1. Creates a query runner, sets `search_path` to the tenant schema
  2. Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) using the query runner directly (bypassing request-scoped `TenantService`)
  3. Calls the existing `buildPromptMessages()` with gathered data
  4. Returns `Array<{ role: string; content: string }>`
- Change `buildPromptMessages()` from `private` → `public` (line 880)

### `backend/src/modules/investment-planning/investment-planning.module.ts`
- Add `exports: [InvestmentPlanningService]`

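
The "create a query runner, set `search_path`, run the queries" step used by `buildPromptForSchema` could be wrapped in one helper, sketched below. The helper name `withTenantSchema`, the `QueryRunnerLike` interface, and the schema-name pattern are all assumptions for illustration; the interface only mirrors the `connect`, `query`, and `release` methods that a TypeORM `QueryRunner` provides.

```typescript
// Sketch only. QueryRunnerLike is a hypothetical minimal shape; a TypeORM
// QueryRunner offers connect(), query() and release() with compatible types.
export interface QueryRunnerLike {
  connect(): Promise<unknown>;
  query(sql: string): Promise<unknown>;
  release(): Promise<void>;
}

// Assumption: tenant schema names are lowercase identifiers.
const SCHEMA_NAME_RE = /^[a-z_][a-z0-9_]*$/;

/**
 * Run `work` with search_path pointed at one tenant schema, and always
 * release the runner afterwards, even if `work` throws.
 */
export async function withTenantSchema<T>(
  qr: QueryRunnerLike,
  schemaName: string,
  work: (qr: QueryRunnerLike) => Promise<T>,
): Promise<T> {
  // search_path cannot be bound as a query parameter, so validate before interpolating.
  if (!SCHEMA_NAME_RE.test(schemaName)) {
    throw new Error('Refusing unsafe schema name: ' + schemaName);
  }
  await qr.connect();
  try {
    await qr.query('SET search_path TO "' + schemaName + '"');
    return await work(qr);
  } finally {
    await qr.release();
  }
}
```

The same wrapper would fit the health-score path in Phase 4, where the gather methods already take a query runner argument.
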
## Phase 3: Database Tables & Entities

### 3 new tables in `shared` schema

**`shared.shadow_ai_models`**: Alternate model configurations (slots A and B)

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| slot | VARCHAR(10) | CHECK IN ('A', 'B'), UNIQUE |
| name | VARCHAR(100) | Display label |
| api_url | VARCHAR(500) | OpenAI-compatible endpoint |
| api_key | VARCHAR(500) | Bearer token |
| model_name | VARCHAR(200) | Model identifier |
| is_active | BOOLEAN | Default true |
| created_at | TIMESTAMPTZ | |
| updated_at | TIMESTAMPTZ | |

**`shared.shadow_runs`**: One row per comparison execution

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| tenant_id | UUID FK | → shared.organizations |
| feature | VARCHAR(30) | CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations') |
| status | VARCHAR(20) | CHECK IN ('running', 'completed', 'partial', 'failed') |
| triggered_by | UUID FK | → shared.users |
| prompt_messages | JSONB | Exact messages sent to all models (proof of identical input) |
| started_at | TIMESTAMPTZ | |
| completed_at | TIMESTAMPTZ | |
| created_at | TIMESTAMPTZ | |

**`shared.shadow_run_results`**: One row per model per run (up to 3 per run)

| Column | Type | Notes |
|--------|------|-------|
| id | UUID PK | |
| run_id | UUID FK | → shadow_runs ON DELETE CASCADE |
| model_role | VARCHAR(20) | CHECK IN ('production', 'alternate_a', 'alternate_b'), UNIQUE(run_id, model_role) |
| model_name | VARCHAR(200) | Snapshot of model used |
| api_url | VARCHAR(500) | Snapshot of endpoint used |
| raw_response | TEXT | Unprocessed AI response |
| parsed_response | JSONB | Validated structured output |
| response_time_ms | INTEGER | |
| token_usage | JSONB | { prompt_tokens, completion_tokens, total_tokens } |
| status | VARCHAR(20) | CHECK IN ('pending', 'running', 'success', 'error') |
| error_message | TEXT | |
| created_at | TIMESTAMPTZ | |

### Entity files
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`

All use `@Entity({ schema: 'shared', name: '...' })` pattern.

## Phase 4: Shadow AI Backend Module

### New directory: `backend/src/modules/shadow-ai/`

### `shadow-ai.service.ts`

**Model CRUD:**
- `getModels()`: Return both slots, mask API keys (show last 4 chars)
- `upsertModel(slot, dto)`: INSERT/UPDATE config for slot A or B
- `deleteModel(slot)`: Remove model config

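
The key masking described for `getModels()` can be illustrated with a small helper. This is a sketch: the name `maskApiKey` and the fixed `****` prefix are assumptions, chosen so that neither the full key nor its length is revealed.

```typescript
// Sketch only: maskApiKey is a hypothetical helper name.
/** Show only the last 4 characters of a key; very short keys are fully hidden. */
export function maskApiKey(apiKey: string): string {
  const VISIBLE = 4;
  if (apiKey.length <= VISIBLE) {
    return '****'; // too short to reveal any part safely
  }
  return '****' + apiKey.slice(-VISIBLE);
}
```

Because the masked value is what the form receives back, `upsertModel` would need a way to tell "key unchanged" apart from "new key entered"; how that is handled is left open here.
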

**Run Execution:**
- `triggerRun(tenantId, feature, userId)`:
  1. Look up tenant `schema_name` from `shared.organizations`
  2. Build prompt messages by calling the appropriate exposed method:
     - `operating_health`: Create query runner → set search_path → `healthScoresService.gatherOperatingData(qr)` → `healthScoresService.buildOperatingPrompt(data)`
     - `reserve_health`: Same pattern with reserve methods
     - `investment_recommendations`: `investmentPlanningService.buildPromptForSchema(schemaName)`
  3. Insert `shadow_runs` row with `prompt_messages` stored as JSONB
  4. Get production config from env vars, alternate configs from DB
  5. Insert 1-3 `shadow_run_results` rows as 'pending' (production + active alternates)
  6. Return `{ runId }` immediately
  7. Fire-and-forget: call all models in parallel using `callOpenAICompatible()`
     - Per feature: operating/reserve use temp 0.1, max_tokens 2048; investment uses temp 0.3, max_tokens 4096
  8. Update each result row as it completes (success/error, parsed response, timing)
  9. Update run status when all complete

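
Steps 7 to 9 can be sketched as below. The per-feature settings come from the list above; everything else is an assumption for illustration: the names `FEATURE_PARAMS`, `ModelTarget`, `runAllModels`, and `deriveRunStatus`, and in particular the rule that a run is `partial` when some but not all models succeed, which the plan does not spell out.

```typescript
// Sketch only. Type and function names are illustrative, not existing code.
type Feature = 'operating_health' | 'reserve_health' | 'investment_recommendations';
type ResultStatus = 'pending' | 'running' | 'success' | 'error';
type RunStatus = 'running' | 'completed' | 'partial' | 'failed';

// Sampling settings per feature, as listed in step 7.
export const FEATURE_PARAMS: Record<Feature, { temperature: number; maxTokens: number }> = {
  operating_health: { temperature: 0.1, maxTokens: 2048 },
  reserve_health: { temperature: 0.1, maxTokens: 2048 },
  investment_recommendations: { temperature: 0.3, maxTokens: 4096 },
};

/** Assumed rule: all success = completed, none = failed, a mix = partial. */
export function deriveRunStatus(statuses: ResultStatus[]): RunStatus {
  if (statuses.some((s) => s === 'pending' || s === 'running')) return 'running';
  const ok = statuses.filter((s) => s === 'success').length;
  if (ok > 0 && ok === statuses.length) return 'completed';
  return ok === 0 ? 'failed' : 'partial';
}

export interface ModelTarget {
  role: string; // 'production' | 'alternate_a' | 'alternate_b'
  call: () => Promise<string>; // e.g. wraps callOpenAICompatible()
}

/**
 * Call every model in parallel, report each result as soon as it settles,
 * and return the overall run status once all have finished.
 */
export async function runAllModels(
  targets: ModelTarget[],
  onResult: (role: string, status: ResultStatus, payload: string) => Promise<void>,
): Promise<RunStatus> {
  const statuses = await Promise.all(
    targets.map(async (t): Promise<ResultStatus> => {
      try {
        const content = await t.call();
        await onResult(t.role, 'success', content);
        return 'success';
      } catch (err) {
        // One model failing must not abort the others.
        await onResult(t.role, 'error', err instanceof Error ? err.message : String(err));
        return 'error';
      }
    }),
  );
  return deriveRunStatus(statuses);
}
```

For the fire-and-forget behaviour, `triggerRun` would start `runAllModels(...)` without awaiting it and attach a `.catch()` that logs, so an unexpected rejection cannot crash the process after `{ runId }` has been returned.
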

**History:**
- `getRunHistory(page, limit, tenantFilter?, featureFilter?)`: Paginated list with tenant name JOIN
- `getRunDetail(runId)`: Full run + all results

### `shadow-ai.controller.ts`

All endpoints use `@UseGuards(JwtAuthGuard)` + `requireSuperadmin(req)` pattern from `admin.controller.ts`.

| Method | Path | Body/Params |
|--------|------|-------------|
| GET | `/admin/shadow-ai/models` | (none) |
| PUT | `/admin/shadow-ai/models/:slot` | `{ name, apiUrl, apiKey, modelName, isActive }` |
| DELETE | `/admin/shadow-ai/models/:slot` | (none) |
| POST | `/admin/shadow-ai/runs` | `{ tenantId, feature }` |
| GET | `/admin/shadow-ai/runs` | `?page&limit&tenantId&feature` |
| GET | `/admin/shadow-ai/runs/:id` | (none) |

### `shadow-ai.module.ts`

```typescript
import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
// Local imports (entities, feature modules, controller, service) omitted here.

@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule,
    InvestmentPlanningModule,
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
export class ShadowAiModule {}
```

### Register in `backend/src/app.module.ts`
- Add `import { ShadowAiModule }` and include in the `imports` array

## Phase 5: Frontend: Admin Shadow AI Page

### New file: `frontend/src/pages/admin/AdminShadowAiPage.tsx`

**Layout**: Mantine `Tabs` with 3 tabs

#### Tab 1: "Model Configuration"
- Three `Card` components in a `SimpleGrid cols={3}`:
  - **Production** (read-only): Shows model name, API URL from a dedicated endpoint or hardcoded label "From environment config"
  - **Alternate A**: Form with `TextInput` (name, API URL, model name), `PasswordInput` (API key), `Switch` (active), Save/Delete buttons
  - **Alternate B**: Same form
- Fetches via `GET /api/admin/shadow-ai/models`
- Saves via `PUT /api/admin/shadow-ai/models/A` or `/B`

#### Tab 2: "Run Comparison"
- `Select` dropdown for tenant (reuse `GET /api/admin/organizations` already used by AdminPage)
- `Select` for feature type (Operating Health / Reserve Health / Investment Recommendations)
- `Button` "Run Shadow Comparison"
- On trigger: `POST /api/admin/shadow-ai/runs` → get `runId`
- Poll `GET /api/admin/shadow-ai/runs/:id` every 3s via `refetchInterval` until status !== 'running'
- Show per-model progress indicators during run
- Once complete, render results using shared comparison component (below)

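
The stop condition for polling can be kept in one pure function, sketched here. The names `nextPollInterval` and `POLL_INTERVAL_MS` are hypothetical. The option name `refetchInterval` suggests TanStack Query, which is an assumption; that library accepts a callback returning a number or `false` for this option, but the callback's arguments differ between major versions, so the wiring is deliberately left out.

```typescript
// Sketch only: helper and constant names are illustrative.
type RunStatus = 'running' | 'completed' | 'partial' | 'failed';

export const POLL_INTERVAL_MS = 3000;

/**
 * Keep polling every 3s while the run is still in progress (or its status
 * has not loaded yet); return false to stop once it reaches a final state.
 */
export function nextPollInterval(status: RunStatus | undefined): number | false {
  return status === undefined || status === 'running' ? POLL_INTERVAL_MS : false;
}
```

Treating `partial` and `failed` as final states matters here: polling only for `completed` would never stop when one model errors.
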
#### Tab 3: "History"
- `Table` with columns: Date, Tenant, Feature, Status (Badge), Duration
- Filter controls: tenant Select, feature Select
- Click row → expand detail or modal showing full comparison
- Pagination

#### Shared Component: Side-by-Side Results Display
- `SimpleGrid cols={3}` (or fewer columns if only some models were configured)
- Each column:
  - Header: model name + response time `Badge`
  - **For health scores**: Score with `RingProgress`, label `Badge`, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
  - **For investment**: Overall assessment text, recommendation cards with type/priority badges, risk notes
  - Collapsible raw JSON via `Accordion`
- **Diff highlighting**: Where parsed values differ across models, apply subtle background highlight (e.g., `yellow.0` in Mantine theme). Simple recursive comparison of JSON keys/values.

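
The recursive comparison behind the diff highlighting could look like the sketch below. `diffPaths` and the `Json` type are hypothetical names. The function returns dotted paths whose values are not identical across all model outputs; the UI could then highlight any field whose path is in that list. As a simplifying assumption, arrays are compared as whole values rather than element by element.

```typescript
// Sketch only: diffPaths and Json are illustrative names.
export type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

/** Collect dotted paths whose values differ between the given outputs. */
export function diffPaths(outputs: Array<Json | undefined>, prefix = ''): string[] {
  if (outputs.length > 0 && outputs.every(isObject)) {
    const objs = outputs as Array<{ [key: string]: Json }>;
    // Union of keys, so a key missing from one model counts as a difference.
    const keys = new Set<string>();
    for (const o of objs) {
      Object.keys(o).forEach((k) => keys.add(k));
    }
    const paths: string[] = [];
    for (const k of Array.from(keys).sort()) {
      const childPrefix = prefix ? prefix + '.' + k : k;
      paths.push(...diffPaths(objs.map((o) => o[k]), childPrefix));
    }
    return paths;
  }
  // Leaves (and arrays) are compared by their serialized form.
  const first = JSON.stringify(outputs[0]);
  const same = outputs.every((o) => JSON.stringify(o) === first);
  return same ? [] : [prefix || '(root)'];
}
```

For example, comparing `{ score: 82, label: 'Good' }` with `{ score: 75, label: 'Good' }` yields only the path `score`, so the label would stay unhighlighted.
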
### Route addition: `frontend/src/App.tsx`
Within the `/admin` route group (after `<Route index element={<AdminPage />} />`):
```tsx
<Route path="shadow-ai" element={<AdminShadowAiPage />} />
```

### Sidebar nav: `frontend/src/components/layout/Sidebar.tsx`
In the `isAdminOnly` section (after the "Admin Panel" NavLink, around line 134):
```tsx
<NavLink
  label="AI Benchmarking"
  leftSection={<IconScale size={18} />}
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>
```

## Implementation Order

1. **`ai-caller.ts`**: Shared utility (no dependencies)
2. **Health scores + investment planning**: Make methods public, add exports, add `buildPromptForSchema`
3. **Entities**: 3 TypeORM entity files
4. **Service + Controller + Module**: Shadow AI backend
5. **Register module** in `app.module.ts`
6. **Frontend page**: `AdminShadowAiPage.tsx`
7. **Route + Sidebar**: Wire up navigation

## Verification

1. **Backend**: Start server, confirm no TypeORM errors for new entities
2. **Model config**: Use admin UI to save/load/delete alternate model configs
3. **Run comparison**: Select a tenant, trigger a run, verify every configured model (up to 3) is called with identical prompts
4. **Results display**: Confirm side-by-side output renders correctly for all 3 feature types
5. **History**: Verify past runs are persisted and browsable
6. **Auth**: Confirm non-superadmin users get 403 on all shadow-ai endpoints
7. **Production safety**: Verify no changes to production AI behavior; shadow runs are completely isolated

## Key Files to Modify

- `backend/src/modules/health-scores/health-scores.service.ts`: Make 5 methods public
- `backend/src/modules/health-scores/health-scores.module.ts`: Add exports
- `backend/src/modules/investment-planning/investment-planning.service.ts`: Add `buildPromptForSchema()`, make `buildPromptMessages()` public
- `backend/src/modules/investment-planning/investment-planning.module.ts`: Add exports
- `backend/src/app.module.ts`: Register ShadowAiModule
- `frontend/src/App.tsx`: Add route
- `frontend/src/components/layout/Sidebar.tsx`: Add nav item

## New Files

- `backend/src/common/utils/ai-caller.ts`
- `backend/src/modules/shadow-ai/shadow-ai.module.ts`
- `backend/src/modules/shadow-ai/shadow-ai.service.ts`
- `backend/src/modules/shadow-ai/shadow-ai.controller.ts`
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`
- `frontend/src/pages/admin/AdminShadowAiPage.tsx`