HOA_Financial_Platform/docs/shadow-ai-benchmarking-plan.md
JoeBot 4797669591 feat: add shadow AI benchmarking for admin model comparison
Add a new admin-only feature that allows the platform owner to benchmark
the production AI model against up to 2 alternate models (any OpenAI-compatible
API) using real tenant data, without impacting users.

Backend:
- Shared AI caller utility (ai-caller.ts) for OpenAI-compatible endpoints
- Shadow AI module with service, controller, and 3 entities
- 6 admin API endpoints for model config CRUD, run trigger, and history
- Auto-creates shadow_ai_models, shadow_runs, shadow_run_results tables
- Exposes health-scores and investment-planning prompt builders for reuse

Frontend:
- New admin page at /admin/shadow-ai with 3 tabs:
  - Model Configuration (production + 2 alternate slots)
  - Run Comparison (tenant select, feature select, side-by-side results)
  - History (filterable run log with detail drill-down)
- Full side-by-side output display with diff highlighting
- Sidebar navigation link for AI Benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 07:50:59 -04:00


Shadow AI Benchmarking Feature

Context

The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data — without impacting users. This enables informed model migration decisions by comparing outputs side-by-side.

Architecture Overview

  • New admin page at /admin/shadow-ai with model configuration, run trigger, and history
  • New backend module shadow-ai with controller, service, and 3 entities
  • 3 new DB tables in the shared schema for model configs, runs, and results
  • Shared AI caller utility to avoid duplicating HTTP logic
  • Minimal changes to existing services: make prompt-building methods public and export modules

Phase 1: Shared AI Caller Utility

New file: backend/src/common/utils/ai-caller.ts

Extract the HTTP POST logic (currently duplicated in both callAI() methods) into a reusable function:

export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000
}): Promise<{
  content: string;        // cleaned JSON string (fences + <think> stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>

Handles: HTTPS POST to {apiUrl}/chat/completions, timeout, markdown fence stripping, <think> block removal, timing.
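A minimal sketch of what this utility could look like, assuming Node 18+ (built-in fetch and AbortController). The helper name cleanModelOutput and the exact regexes are assumptions; the real implementation may differ:

```typescript
// Sketch only: mirrors the signature above, details are assumptions.
export interface AiCallParams {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 min)
}

// Strip markdown code fences and <think>...</think> blocks from model output.
export function cleanModelOutput(raw: string): string {
  return raw
    .replace(/<think>[\s\S]*?<\/think>/g, '')
    .replace(/```(?:json)?\s*/g, '')
    .trim();
}

export async function callOpenAICompatible(params: AiCallParams) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), params.timeoutMs ?? 600_000);
  const start = Date.now();
  try {
    const res = await fetch(`${params.apiUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${params.apiKey}`,
      },
      body: JSON.stringify({
        model: params.model,
        messages: params.messages,
        temperature: params.temperature,
        max_tokens: params.maxTokens,
      }),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`AI call failed: HTTP ${res.status}`);
    const body = await res.json();
    return {
      content: cleanModelOutput(body.choices[0].message.content),
      usage: body.usage,
      responseTimeMs: Date.now() - start,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Keeping the cleanup logic in a separate pure function makes the fence/think stripping unit-testable without a live endpoint.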

Phase 2: Expose Existing Prompt Builders

backend/src/modules/health-scores/health-scores.service.ts

  • Change private → public on:
    • gatherOperatingData(qr) (line 252)
    • gatherReserveData(qr) (line 523)
    • buildOperatingPrompt(data) (line 790)
    • buildReservePrompt(data) (line 930)
    • checkDataReadiness(qr, scoreType) (used to validate data exists)

backend/src/modules/health-scores/health-scores.module.ts

  • Add exports: [HealthScoresService]

backend/src/modules/investment-planning/investment-planning.service.ts

  • Add new public method buildPromptForSchema(schemaName: string) that:
    1. Creates a query runner, sets search_path to the tenant schema
    2. Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) using the query runner directly (bypassing request-scoped TenantService)
    3. Calls the existing buildPromptMessages() with gathered data
    4. Returns Array<{ role: string; content: string }>
  • Change buildPromptMessages() from private → public (line 880)

backend/src/modules/investment-planning/investment-planning.module.ts

  • Add exports: [InvestmentPlanningService]

Phase 3: Database Tables & Entities

3 new tables in shared schema

shared.shadow_ai_models — Alternate model configurations (slots A and B)

Column Type Notes
id UUID PK
slot VARCHAR(10) CHECK IN ('A', 'B'), UNIQUE
name VARCHAR(100) Display label
api_url VARCHAR(500) OpenAI-compatible endpoint
api_key VARCHAR(500) Bearer token
model_name VARCHAR(200) Model identifier
is_active BOOLEAN Default true
created_at TIMESTAMPTZ
updated_at TIMESTAMPTZ

shared.shadow_runs — One row per comparison execution

Column Type Notes
id UUID PK
tenant_id UUID FK → shared.organizations
feature VARCHAR(30) CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations')
status VARCHAR(20) CHECK IN ('running', 'completed', 'partial', 'failed')
triggered_by UUID FK → shared.users
prompt_messages JSONB Exact messages sent to all models (proof of identical input)
started_at TIMESTAMPTZ
completed_at TIMESTAMPTZ
created_at TIMESTAMPTZ

shared.shadow_run_results — One row per model per run (up to 3 per run)

Column Type Notes
id UUID PK
run_id UUID FK → shadow_runs ON DELETE CASCADE
model_role VARCHAR(20) CHECK IN ('production', 'alternate_a', 'alternate_b'), UNIQUE(run_id, model_role)
model_name VARCHAR(200) Snapshot of model used
api_url VARCHAR(500) Snapshot of endpoint used
raw_response TEXT Unprocessed AI response
parsed_response JSONB Validated structured output
response_time_ms INTEGER
token_usage JSONB { prompt_tokens, completion_tokens, total_tokens }
status VARCHAR(20) CHECK IN ('pending', 'running', 'success', 'error')
error_message TEXT
created_at TIMESTAMPTZ

Entity files

  • backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts

All use @Entity({ schema: 'shared', name: '...' }) pattern.

Phase 4: Shadow AI Backend Module

New directory: backend/src/modules/shadow-ai/

shadow-ai.service.ts

Model CRUD:

  • getModels() — Return both slots, mask API keys (show last 4 chars)
  • upsertModel(slot, dto) — INSERT/UPDATE config for slot A or B
  • deleteModel(slot) — Remove model config
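The key-masking rule in getModels() could be as simple as the following sketch (the helper name maskApiKey is an assumption):

```typescript
// Never return the stored key to the client; show only the last 4 chars
// so the admin can confirm which key is configured.
export function maskApiKey(key: string): string {
  if (key.length <= 4) return '****';
  return '****' + key.slice(-4);
}
```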

Run Execution:

  • triggerRun(tenantId, feature, userId):
    1. Look up tenant schema_name from shared.organizations
    2. Build prompt messages by calling the appropriate exposed method:
      • operating_health: Create query runner → set search_path → healthScoresService.gatherOperatingData(qr) → healthScoresService.buildOperatingPrompt(data)
      • reserve_health: Same pattern with reserve methods
      • investment_recommendations: investmentPlanningService.buildPromptForSchema(schemaName)
    3. Insert shadow_runs row with prompt_messages stored as JSONB
    4. Get production config from env vars, alternate configs from DB
    5. Insert 1-3 shadow_run_results rows as 'pending' (production + active alternates)
    6. Return { runId } immediately
    7. Fire-and-forget: call all models in parallel using callOpenAICompatible()
      • Per feature: operating/reserve use temp 0.1, max_tokens 2048; investment uses temp 0.3, max_tokens 4096
    8. Update each result row as it completes (success/error, parsed response, timing)
    9. Update run status when all complete
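Steps 7-9 can be sketched as a small orchestration helper: fire all configured models in parallel and derive the run status from the per-model outcomes. The status names mirror the plan; the function name and shape are assumptions:

```typescript
// Sketch: run every model call concurrently; a single model failing must
// not abort the others, so use Promise.allSettled rather than Promise.all.
type ResultStatus = 'success' | 'error';

export async function executeInParallel(
  callers: Array<() => Promise<unknown>>,
): Promise<{ runStatus: 'completed' | 'partial' | 'failed'; results: ResultStatus[] }> {
  const settled = await Promise.allSettled(callers.map((c) => c()));
  const results = settled.map<ResultStatus>((s) =>
    s.status === 'fulfilled' ? 'success' : 'error',
  );
  const ok = results.filter((r) => r === 'success').length;
  const runStatus =
    ok === results.length ? 'completed' : ok > 0 ? 'partial' : 'failed';
  return { runStatus, results };
}
```

In the real service, each caller would also update its shadow_run_results row as it settles; the final runStatus maps onto the shadow_runs status column.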

History:

  • getRunHistory(page, limit, tenantFilter?, featureFilter?) — Paginated list with tenant name JOIN
  • getRunDetail(runId) — Full run + all results

shadow-ai.controller.ts

All endpoints use @UseGuards(JwtAuthGuard) + requireSuperadmin(req) pattern from admin.controller.ts.

Method Path Body/Params
GET /admin/shadow-ai/models
PUT /admin/shadow-ai/models/:slot { name, apiUrl, apiKey, modelName, isActive }
DELETE /admin/shadow-ai/models/:slot
POST /admin/shadow-ai/runs { tenantId, feature }
GET /admin/shadow-ai/runs ?page&limit&tenantId&feature
GET /admin/shadow-ai/runs/:id
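The plan reuses requireSuperadmin(req) from admin.controller.ts; its exact body is not shown there, but the gist is a check like the following sketch (in Nest the throw would be a ForbiddenException, producing a 403):

```typescript
// Hypothetical sketch of the existing guard pattern; the real helper's
// signature and user shape are assumptions.
export function requireSuperadmin(req: { user?: { isSuperadmin?: boolean } }): void {
  if (!req.user?.isSuperadmin) {
    throw new Error('Forbidden'); // Nest: throw new ForbiddenException()
  }
}
```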

shadow-ai.module.ts

@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule,
    InvestmentPlanningModule,
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
export class ShadowAiModule {}

Register in backend/src/app.module.ts

  • Add import { ShadowAiModule } and include in the imports array

Phase 5: Frontend — Admin Shadow AI Page

New file: frontend/src/pages/admin/AdminShadowAiPage.tsx

Layout: Mantine Tabs with 3 tabs

Tab 1: "Model Configuration"

  • Three Card components in a SimpleGrid cols={3}:
    • Production (read-only): Shows the model name and API URL, sourced either from a dedicated endpoint or displayed as a hardcoded "From environment config" label
    • Alternate A: Form with TextInput (name, API URL, model name), PasswordInput (API key), Switch (active), Save/Delete buttons
    • Alternate B: Same form
  • Fetches via GET /api/admin/shadow-ai/models
  • Saves via PUT /api/admin/shadow-ai/models/A or /B

Tab 2: "Run Comparison"

  • Select dropdown for tenant (reuse GET /api/admin/organizations already used by AdminPage)
  • Select for feature type (Operating Health / Reserve Health / Investment Recommendations)
  • Button "Run Shadow Comparison"
  • On trigger: POST /api/admin/shadow-ai/runs → get runId
  • Poll GET /api/admin/shadow-ai/runs/:id every 3s via refetchInterval until status !== 'running'
  • Show per-model progress indicators during run
  • Once complete, render results using shared comparison component (below)
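The polling rule can live in a small pure function passed to refetchInterval (TanStack Query accepts a function there; this helper keeps the logic testable independent of the exact query-client version):

```typescript
// Poll every 3s while the run is in progress (or before the first fetch
// returns); stop once it reaches a terminal status.
type RunStatus = 'running' | 'completed' | 'partial' | 'failed';

export function pollInterval(
  run: { status: RunStatus } | undefined,
): number | false {
  if (!run || run.status === 'running') return 3000;
  return false;
}
```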

Tab 3: "History"

  • Table with columns: Date, Tenant, Feature, Status (Badge), Duration
  • Filter controls: tenant Select, feature Select
  • Click row → expand detail or modal showing full comparison
  • Pagination

Shared Component: Side-by-Side Results Display

  • SimpleGrid cols={3} (or fewer columns if only some models were configured)
  • Each column:
    • Header: model name + response time Badge
    • For health scores: Score with RingProgress, label Badge, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
    • For investment: Overall assessment text, recommendation cards with type/priority badges, risk notes
    • Collapsible raw JSON via Accordion
  • Diff highlighting: Where parsed values differ across models, apply subtle background highlight (e.g., yellow.0 in Mantine theme). Simple recursive comparison of JSON keys/values.
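The recursive comparison described above could look like this sketch (the function name diffPaths is an assumption): it returns the set of dot-paths whose values differ across the parsed outputs, and the UI highlights any field whose path appears in the set.

```typescript
// Compare N parsed outputs; collect dot-paths ('score', 'factors.0.impact')
// where at least two values disagree. Arrays and primitives are compared
// as leaves via JSON.stringify -- simple, and good enough for highlighting.
export function diffPaths(values: unknown[], prefix = ''): Set<string> {
  const out = new Set<string>();
  const allPlainObjects = values.every(
    (v) => typeof v === 'object' && v !== null && !Array.isArray(v),
  );
  if (allPlainObjects) {
    const keys = new Set<string>();
    for (const v of values) Object.keys(v as object).forEach((k) => keys.add(k));
    for (const k of keys) {
      diffPaths(
        values.map((v) => (v as Record<string, unknown>)[k]),
        prefix ? `${prefix}.${k}` : k,
      ).forEach((p) => out.add(p));
    }
    return out;
  }
  // Leaf (or mixed types): flag the path if any value differs.
  const first = JSON.stringify(values[0]);
  if (values.some((v) => JSON.stringify(v) !== first)) out.add(prefix);
  return out;
}
```

A missing key in one model's output serializes as undefined and is flagged as a difference, which is the desired behavior for side-by-side review.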

Route addition: frontend/src/App.tsx

Within the /admin route group (after <Route index element={<AdminPage />} />):

<Route path="shadow-ai" element={<AdminShadowAiPage />} />

Sidebar nav: frontend/src/components/layout/Sidebar.tsx

In the isAdminOnly section (after the "Admin Panel" NavLink, around line 134):

<NavLink
  label="AI Benchmarking"
  leftSection={<IconScale size={18} />}
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>

Implementation Order

  1. ai-caller.ts — Shared utility (no dependencies)
  2. Health scores + investment planning — Make methods public, add exports, add buildPromptForSchema
  3. Entities — 3 TypeORM entity files
  4. Service + Controller + Module — Shadow AI backend
  5. Register module in app.module.ts
  6. AdminShadowAiPage.tsx — Frontend page
  7. Route + Sidebar — Wire up navigation

Verification

  1. Backend: Start server, confirm no TypeORM errors for new entities
  2. Model config: Use admin UI to save/load/delete alternate model configs
  3. Run comparison: Select a tenant, trigger a run, verify all 3 models are called with identical prompts
  4. Results display: Confirm side-by-side output renders correctly for all 3 feature types
  5. History: Verify past runs are persisted and browsable
  6. Auth: Confirm non-superadmin users get 403 on all shadow-ai endpoints
  7. Production safety: Verify no changes to production AI behavior — shadow runs are completely isolated

Key Files to Modify

  • backend/src/modules/health-scores/health-scores.service.ts — Make 5 methods public
  • backend/src/modules/health-scores/health-scores.module.ts — Add exports
  • backend/src/modules/investment-planning/investment-planning.service.ts — Add buildPromptForSchema(), make buildPromptMessages() public
  • backend/src/modules/investment-planning/investment-planning.module.ts — Add exports
  • backend/src/app.module.ts — Register ShadowAiModule
  • frontend/src/App.tsx — Add route
  • frontend/src/components/layout/Sidebar.tsx — Add nav item

New Files

  • backend/src/common/utils/ai-caller.ts
  • backend/src/modules/shadow-ai/shadow-ai.module.ts
  • backend/src/modules/shadow-ai/shadow-ai.service.ts
  • backend/src/modules/shadow-ai/shadow-ai.controller.ts
  • backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts
  • frontend/src/pages/admin/AdminShadowAiPage.tsx