Add a new admin-only feature that allows the platform owner to benchmark the production AI model against up to 2 alternate models (any OpenAI-compatible API) using real tenant data, without impacting users.

Backend:
- Shared AI caller utility (`ai-caller.ts`) for OpenAI-compatible endpoints
- Shadow AI module with service, controller, and 3 entities
- 6 admin API endpoints for model config CRUD, run trigger, and history
- Auto-creates `shadow_ai_models`, `shadow_runs`, `shadow_run_results` tables
- Exposes health-scores and investment-planning prompt builders for reuse

Frontend:
- New admin page at `/admin/shadow-ai` with 3 tabs:
  - Model Configuration (production + 2 alternate slots)
  - Run Comparison (tenant select, feature select, side-by-side results)
  - History (filterable run log with detail drill-down)
- Full side-by-side output display with diff highlighting
- Sidebar navigation link for AI Benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Shadow AI Benchmarking Feature

## Context
The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data — without impacting users. This enables informed model migration decisions by comparing outputs side-by-side.
## Architecture Overview

- New admin page at `/admin/shadow-ai` with model configuration, run trigger, and history
- New backend module `shadow-ai` with controller, service, and 3 entities
- 3 new DB tables in the `shared` schema for model configs, runs, and results
- Shared AI caller utility to avoid duplicating HTTP logic
- Minimal changes to existing services: make prompt-building methods public and export modules
## Phase 1: Shared AI Caller Utility

New file: `backend/src/common/utils/ai-caller.ts`

Extract the HTTP POST logic (currently duplicated in both `callAI()` methods) into a reusable function:
```typescript
export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 minutes)
}): Promise<{
  content: string; // cleaned JSON string (fences + <think> stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>;
```
Handles: HTTPS POST to `{apiUrl}/chat/completions`, request timeout, markdown fence stripping, `<think>` block removal, and response timing.
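A minimal sketch of what this utility could look like, assuming Node 18+'s global `fetch` (the real code may use a different HTTP client). The response cleaning is factored into a pure helper so it can be tested in isolation:

```typescript
// Strip markdown code fences and <think>…</think> reasoning blocks from a
// raw model response, leaving (ideally) a bare JSON string.
export function stripModelNoise(raw: string): string {
  return raw
    .replace(/<think>[\s\S]*?<\/think>/g, "") // drop chain-of-thought blocks
    .replace(/^\s*```(?:json)?\s*/i, "")      // opening fence
    .replace(/\s*```\s*$/, "")                // closing fence
    .trim();
}

export interface AiCallResult {
  content: string;
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}

// Sketch of the shared caller against an OpenAI-compatible endpoint.
export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number;
}): Promise<AiCallResult> {
  const started = Date.now();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), params.timeoutMs ?? 600_000);
  try {
    const res = await fetch(`${params.apiUrl}/chat/completions`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${params.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: params.model,
        messages: params.messages,
        temperature: params.temperature,
        max_tokens: params.maxTokens,
      }),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`AI call failed: HTTP ${res.status}`);
    const body = await res.json();
    return {
      content: stripModelNoise(body.choices[0].message.content),
      usage: body.usage,
      responseTimeMs: Date.now() - started,
    };
  } finally {
    clearTimeout(timer);
  }
}
```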
## Phase 2: Expose Existing Prompt Builders

### `backend/src/modules/health-scores/health-scores.service.ts`

- Change `private` → `public` on:
  - `gatherOperatingData(qr)` (line 252)
  - `gatherReserveData(qr)` (line 523)
  - `buildOperatingPrompt(data)` (line 790)
  - `buildReservePrompt(data)` (line 930)
  - `checkDataReadiness(qr, scoreType)` (used to validate data exists)
### `backend/src/modules/health-scores/health-scores.module.ts`

- Add `exports: [HealthScoresService]`
### `backend/src/modules/investment-planning/investment-planning.service.ts`

- Add a new public method `buildPromptForSchema(schemaName: string)` that:
  - Creates a query runner and sets `search_path` to the tenant schema
  - Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) using the query runner directly (bypassing the request-scoped `TenantService`)
  - Calls the existing `buildPromptMessages()` with the gathered data
  - Returns `Array<{ role: string; content: string }>`
- Change `buildPromptMessages()` from `private` → `public` (line 880)
### `backend/src/modules/investment-planning/investment-planning.module.ts`

- Add `exports: [InvestmentPlanningService]`
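A rough sketch of `buildPromptForSchema`, with the query runner and prompt builder injected so the shape is testable. The table names (`financial_snapshot`, `market_rates`) and the builder's argument shape are illustrative placeholders, not the service's real queries:

```typescript
// Structural stand-in for TypeORM's QueryRunner: only what this sketch uses.
interface MinimalQueryRunner {
  query(sql: string, params?: unknown[]): Promise<any[]>;
  release(): Promise<void>;
}

type ChatMessage = { role: string; content: string };

// Build the investment-planning prompt for an arbitrary tenant schema,
// bypassing the request-scoped TenantService.
export async function buildPromptForSchema(
  qr: MinimalQueryRunner,
  schemaName: string,
  buildPromptMessages: (data: Record<string, unknown>) => ChatMessage[],
): Promise<ChatMessage[]> {
  try {
    // Pin this session to the tenant schema instead of the request context.
    await qr.query(`SET search_path TO "${schemaName}", shared`);
    // Placeholder queries; the real method reuses the service's gatherers.
    const [snapshot] = await qr.query("SELECT * FROM financial_snapshot LIMIT 1");
    const rates = await qr.query("SELECT * FROM market_rates");
    return buildPromptMessages({ snapshot, rates });
  } finally {
    await qr.release(); // always return the connection to the pool
  }
}
```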
## Phase 3: Database Tables & Entities

### 3 new tables in the `shared` schema

**`shared.shadow_ai_models`** — Alternate model configurations (slots A and B)
| Column | Type | Notes |
|---|---|---|
| id | UUID PK | |
| slot | VARCHAR(10) | CHECK IN ('A', 'B'), UNIQUE |
| name | VARCHAR(100) | Display label |
| api_url | VARCHAR(500) | OpenAI-compatible endpoint |
| api_key | VARCHAR(500) | Bearer token |
| model_name | VARCHAR(200) | Model identifier |
| is_active | BOOLEAN | Default true |
| created_at | TIMESTAMPTZ | |
| updated_at | TIMESTAMPTZ | |

**`shared.shadow_runs`** — One row per comparison execution
| Column | Type | Notes |
|---|---|---|
| id | UUID PK | |
| tenant_id | UUID FK | → shared.organizations |
| feature | VARCHAR(30) | CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations') |
| status | VARCHAR(20) | CHECK IN ('running', 'completed', 'partial', 'failed') |
| triggered_by | UUID FK | → shared.users |
| prompt_messages | JSONB | Exact messages sent to all models (proof of identical input) |
| started_at | TIMESTAMPTZ | |
| completed_at | TIMESTAMPTZ | |
| created_at | TIMESTAMPTZ | |

**`shared.shadow_run_results`** — One row per model per run (up to 3 per run)
| Column | Type | Notes |
|---|---|---|
| id | UUID PK | |
| run_id | UUID FK | → shadow_runs ON DELETE CASCADE |
| model_role | VARCHAR(20) | CHECK IN ('production', 'alternate_a', 'alternate_b'), UNIQUE(run_id, model_role) |
| model_name | VARCHAR(200) | Snapshot of model used |
| api_url | VARCHAR(500) | Snapshot of endpoint used |
| raw_response | TEXT | Unprocessed AI response |
| parsed_response | JSONB | Validated structured output |
| response_time_ms | INTEGER | |
| token_usage | JSONB | { prompt_tokens, completion_tokens, total_tokens } |
| status | VARCHAR(20) | CHECK IN ('pending', 'running', 'success', 'error') |
| error_message | TEXT | |
| created_at | TIMESTAMPTZ | |

### Entity files

- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`

All use the `@Entity({ schema: 'shared', name: '...' })` pattern.
## Phase 4: Shadow AI Backend Module

New directory: `backend/src/modules/shadow-ai/`

### `shadow-ai.service.ts`

**Model CRUD:**

- `getModels()` — Return both slots, mask API keys (show only the last 4 chars)
- `upsertModel(slot, dto)` — INSERT/UPDATE the config for slot A or B
- `deleteModel(slot)` — Remove a model config
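The key-masking rule in `getModels()` can be a tiny pure helper. The name `maskApiKey` and the `****` prefix are illustrative choices, not the existing implementation:

```typescript
// Mask a stored API key for display: reveal only the last 4 characters,
// and nothing at all for keys too short to mask safely.
export function maskApiKey(key: string): string {
  if (key.length <= 4) return "****";
  return "****" + key.slice(-4);
}
```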
**Run Execution:**

`triggerRun(tenantId, feature, userId)`:

1. Look up the tenant `schema_name` from `shared.organizations`
2. Build prompt messages by calling the appropriate exposed method:
   - `operating_health`: create query runner → set `search_path` → `healthScoresService.gatherOperatingData(qr)` → `healthScoresService.buildOperatingPrompt(data)`
   - `reserve_health`: same pattern with the reserve methods
   - `investment_recommendations`: `investmentPlanningService.buildPromptForSchema(schemaName)`
3. Insert a `shadow_runs` row with `prompt_messages` stored as JSONB
4. Get the production config from env vars and the alternate configs from the DB
5. Insert 1–3 `shadow_run_results` rows as 'pending' (production + active alternates)
6. Return `{ runId }` immediately
7. Fire-and-forget: call all models in parallel using `callOpenAICompatible()`
   - Per feature: operating/reserve use temperature 0.1 and max_tokens 2048; investment uses temperature 0.3 and max_tokens 4096
8. Update each result row as it completes (success/error, parsed response, timing)
9. Update the run status when all calls complete
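The fire-and-forget fan-out above could be sketched as follows, with the model caller and row-update step injected. `executeShadowRun`, `ModelConfig`, and the callback shapes are assumptions for illustration; the key point is that one model failing must not abort the others:

```typescript
type ModelConfig = { role: "production" | "alternate_a" | "alternate_b" };
type ModelOutcome =
  | { role: string; status: "success"; parsed: unknown; responseTimeMs: number }
  | { role: string; status: "error"; errorMessage: string };

// Call every configured model in parallel, catching each failure
// individually, then derive the overall run status.
export async function executeShadowRun(
  configs: ModelConfig[],
  callModel: (cfg: ModelConfig) => Promise<{ parsed: unknown; responseTimeMs: number }>,
  saveResult: (outcome: ModelOutcome) => Promise<void>,
): Promise<"completed" | "partial" | "failed"> {
  const outcomes = await Promise.all(
    configs.map(async (cfg): Promise<ModelOutcome> => {
      try {
        const r = await callModel(cfg);
        return { role: cfg.role, status: "success", ...r };
      } catch (err) {
        return { role: cfg.role, status: "error", errorMessage: String(err) };
      }
    }),
  );
  // Persist each shadow_run_results row, then roll up the run status.
  for (const o of outcomes) await saveResult(o);
  const ok = outcomes.filter((o) => o.status === "success").length;
  return ok === outcomes.length ? "completed" : ok > 0 ? "partial" : "failed";
}
```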
**History:**

- `getRunHistory(page, limit, tenantFilter?, featureFilter?)` — Paginated list with a tenant name JOIN
- `getRunDetail(runId)` — Full run + all results
### `shadow-ai.controller.ts`

All endpoints use the `@UseGuards(JwtAuthGuard)` + `requireSuperadmin(req)` pattern from `admin.controller.ts`.
| Method | Path | Body/Params |
|---|---|---|
| GET | `/admin/shadow-ai/models` | — |
| PUT | `/admin/shadow-ai/models/:slot` | `{ name, apiUrl, apiKey, modelName, isActive }` |
| DELETE | `/admin/shadow-ai/models/:slot` | — |
| POST | `/admin/shadow-ai/runs` | `{ tenantId, feature }` |
| GET | `/admin/shadow-ai/runs` | `?page&limit&tenantId&feature` |
| GET | `/admin/shadow-ai/runs/:id` | — |
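For reference, the `requireSuperadmin(req)` pattern might look roughly like this. The claim shape (`req.user.role === 'superadmin'`) is an assumption based on the doc's auth description, and a plain `Error` stands in here for Nest's `ForbiddenException` (which produces the HTTP 403):

```typescript
interface AuthedRequest { user?: { role?: string } }

// Reject any caller whose JWT does not carry the superadmin role.
// The real controller would throw ForbiddenException from @nestjs/common.
export function requireSuperadmin(req: AuthedRequest): void {
  if (req.user?.role !== "superadmin") {
    throw new Error("Forbidden: superadmin access required");
  }
}
```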
### `shadow-ai.module.ts`

```typescript
@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule,
    InvestmentPlanningModule,
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
export class ShadowAiModule {}
```
### Register in `backend/src/app.module.ts`

- Add `import { ShadowAiModule }` and include it in the `imports` array
## Phase 5: Frontend — Admin Shadow AI Page

New file: `frontend/src/pages/admin/AdminShadowAiPage.tsx`

Layout: Mantine `Tabs` with 3 tabs.
### Tab 1: "Model Configuration"

- Three `Card` components in a `SimpleGrid cols={3}`:
  - Production (read-only): shows the model name and API URL from a dedicated endpoint, or a hardcoded label "From environment config"
  - Alternate A: form with `TextInput` (name, API URL, model name), `PasswordInput` (API key), `Switch` (active), and Save/Delete buttons
  - Alternate B: same form
- Fetches via `GET /api/admin/shadow-ai/models`
- Saves via `PUT /api/admin/shadow-ai/models/A` or `/B`
### Tab 2: "Run Comparison"

- `Select` dropdown for tenant (reuse `GET /api/admin/organizations`, already used by AdminPage)
- `Select` for feature type (Operating Health / Reserve Health / Investment Recommendations)
- `Button` "Run Shadow Comparison"
- On trigger: `POST /api/admin/shadow-ai/runs` → get `runId`
- Poll `GET /api/admin/shadow-ai/runs/:id` every 3 s via `refetchInterval` until status !== 'running'
- Show per-model progress indicators during the run
- Once complete, render results using the shared comparison component (below)
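The poll-until-done behavior reduces to a small pure function that can be wired into React Query's function-form `refetchInterval` (the helper name and the exact wiring are illustrative; adapt to the query object your React Query version passes):

```typescript
type RunStatus = "running" | "completed" | "partial" | "failed";

// Poll every 3 s while the shadow run is in flight; stop once it settles.
// Returning false tells React Query to stop refetching.
export function runPollInterval(
  run: { status: RunStatus } | undefined,
): number | false {
  return run?.status === "running" ? 3000 : false;
}
```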
### Tab 3: "History"

- `Table` with columns: Date, Tenant, Feature, Status (`Badge`), Duration
- Filter controls: tenant `Select`, feature `Select`
- Click a row → expand detail or open a modal showing the full comparison
- Pagination
### Shared Component: Side-by-Side Results Display

- `SimpleGrid cols={3}` (or fewer columns if only some models were configured)
- Each column:
  - Header: model name + response time `Badge`
  - For health scores: score with `RingProgress`, label `Badge`, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
  - For investment: overall assessment text, recommendation cards with type/priority badges, risk notes
  - Collapsible raw JSON via `Accordion`
- Diff highlighting: where parsed values differ across models, apply a subtle background highlight (e.g., `yellow.0` in the Mantine theme). Simple recursive comparison of JSON keys/values.
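The recursive comparison mentioned above could be a small pure helper that returns the set of dot-paths where two parsed responses disagree; the UI then highlights any field whose path appears in the result (function name and path format are illustrative):

```typescript
// Recursively collect dot-paths at which two parsed model outputs differ.
export function collectDiffPaths(a: unknown, b: unknown, prefix = ""): string[] {
  if (a === b) return [];
  const bothObjects =
    typeof a === "object" && a !== null && typeof b === "object" && b !== null;
  // Leaf-level mismatch (or type mismatch): report this path.
  if (!bothObjects) return [prefix || "(root)"];
  // Union of keys so additions and removals both count as differences.
  const keys = new Set([...Object.keys(a as object), ...Object.keys(b as object)]);
  const diffs: string[] = [];
  for (const key of keys) {
    const path = prefix ? `${prefix}.${key}` : key;
    diffs.push(...collectDiffPaths((a as any)[key], (b as any)[key], path));
  }
  return diffs;
}
```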
### Route addition: `frontend/src/App.tsx`

Within the `/admin` route group (after `<Route index element={<AdminPage />} />`):

```tsx
<Route path="shadow-ai" element={<AdminShadowAiPage />} />
```
### Sidebar nav: `frontend/src/components/layout/Sidebar.tsx`

In the `isAdminOnly` section (after the "Admin Panel" NavLink, around line 134):

```tsx
<NavLink
  label="AI Benchmarking"
  leftSection={<IconScale size={18} />}
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>
```
## Implementation Order

1. `ai-caller.ts` — Shared utility (no dependencies)
2. Health scores + investment planning — Make methods public, add exports, add `buildPromptForSchema`
3. Entities — 3 TypeORM entity files
4. Service + Controller + Module — Shadow AI backend
5. Register the module in `app.module.ts`
6. Frontend page — `AdminShadowAiPage.tsx`
7. Route + Sidebar — Wire up navigation
## Verification
- Backend: Start server, confirm no TypeORM errors for new entities
- Model config: Use admin UI to save/load/delete alternate model configs
- Run comparison: Select a tenant, trigger a run, verify all 3 models are called with identical prompts
- Results display: Confirm side-by-side output renders correctly for all 3 feature types
- History: Verify past runs are persisted and browsable
- Auth: Confirm non-superadmin users get 403 on all shadow-ai endpoints
- Production safety: Verify no changes to production AI behavior — shadow runs are completely isolated
## Key Files to Modify

- `backend/src/modules/health-scores/health-scores.service.ts` — Make 5 methods public
- `backend/src/modules/health-scores/health-scores.module.ts` — Add exports
- `backend/src/modules/investment-planning/investment-planning.service.ts` — Add `buildPromptForSchema()`, make `buildPromptMessages()` public
- `backend/src/modules/investment-planning/investment-planning.module.ts` — Add exports
- `backend/src/app.module.ts` — Register ShadowAiModule
- `frontend/src/App.tsx` — Add route
- `frontend/src/components/layout/Sidebar.tsx` — Add nav item
## New Files

- `backend/src/common/utils/ai-caller.ts`
- `backend/src/modules/shadow-ai/shadow-ai.module.ts`
- `backend/src/modules/shadow-ai/shadow-ai.service.ts`
- `backend/src/modules/shadow-ai/shadow-ai.controller.ts`
- `backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run.entity.ts`
- `backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts`
- `frontend/src/pages/admin/AdminShadowAiPage.tsx`