HOA_Financial_Platform/docs/shadow-ai-benchmarking-plan.md
JoeBot 4797669591 feat: add shadow AI benchmarking for admin model comparison
Add a new admin-only feature that allows the platform owner to benchmark
the production AI model against up to 2 alternate models (any OpenAI-compatible
API) using real tenant data, without impacting users.

Backend:
- Shared AI caller utility (ai-caller.ts) for OpenAI-compatible endpoints
- Shadow AI module with service, controller, and 3 entities
- 6 admin API endpoints for model config CRUD, run trigger, and history
- Auto-creates shadow_ai_models, shadow_runs, shadow_run_results tables
- Exposes health-scores and investment-planning prompt builders for reuse

Frontend:
- New admin page at /admin/shadow-ai with 3 tabs:
  - Model Configuration (production + 2 alternate slots)
  - Run Comparison (tenant select, feature select, side-by-side results)
  - History (filterable run log with detail drill-down)
- Full side-by-side output display with diff highlighting
- Sidebar navigation link for AI Benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 07:50:59 -04:00


Shadow AI Benchmarking Feature

Context

The platform uses a single AI model (Qwen 3.5 via NVIDIA NIM) for three features: Operating Health Score, Reserve Health Score, and Investment Recommendations. The platform owner needs a way to evaluate alternate models (different providers, different versions) against the production model using real tenant data — without impacting users. This enables informed model migration decisions by comparing outputs side-by-side.

Architecture Overview

  • New admin page at /admin/shadow-ai with model configuration, run trigger, and history
  • New backend module shadow-ai with controller, service, and 3 entities
  • 3 new DB tables in the shared schema for model configs, runs, and results
  • Shared AI caller utility to avoid duplicating HTTP logic
  • Minimal changes to existing services: make prompt-building methods public and export modules

Phase 1: Shared AI Caller Utility

New file: backend/src/common/utils/ai-caller.ts

Extract the HTTP POST logic (currently duplicated in both callAI() methods) into a reusable function:

export async function callOpenAICompatible(params: {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000
}): Promise<{
  content: string;        // cleaned JSON string (fences + <think> stripped)
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  responseTimeMs: number;
}>

Handles: HTTPS POST to {apiUrl}/chat/completions, timeout, markdown fence stripping, <think> block removal, timing.
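A minimal sketch of what this utility could look like, assuming Node 18+ (built-in fetch and AbortController). The helper name cleanModelOutput and the exact regexes are assumptions; the real implementation may differ:

```typescript
// Sketch only: mirrors the signature above, details are assumptions.
export interface AiCallParams {
  apiUrl: string;
  apiKey: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature: number;
  maxTokens: number;
  timeoutMs?: number; // default 600000 (10 min)
}

// Strip markdown code fences and <think>...</think> blocks from model output.
export function cleanModelOutput(raw: string): string {
  return raw
    .replace(/<think>[\s\S]*?<\/think>/g, '')
    .replace(/```(?:json)?\s*/g, '')
    .trim();
}

export async function callOpenAICompatible(params: AiCallParams) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), params.timeoutMs ?? 600_000);
  const start = Date.now();
  try {
    const res = await fetch(`${params.apiUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${params.apiKey}`,
      },
      body: JSON.stringify({
        model: params.model,
        messages: params.messages,
        temperature: params.temperature,
        max_tokens: params.maxTokens,
      }),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`AI call failed: HTTP ${res.status}`);
    const body = await res.json();
    return {
      content: cleanModelOutput(body.choices[0].message.content),
      usage: body.usage,
      responseTimeMs: Date.now() - start,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Keeping the cleanup logic in a separate pure function makes the fence/think stripping unit-testable without a live endpoint.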

Phase 2: Expose Existing Prompt Builders

backend/src/modules/health-scores/health-scores.service.ts

  • Change private → public on:
    • gatherOperatingData(qr) (line 252)
    • gatherReserveData(qr) (line 523)
    • buildOperatingPrompt(data) (line 790)
    • buildReservePrompt(data) (line 930)
    • checkDataReadiness(qr, scoreType) (used to validate data exists)

backend/src/modules/health-scores/health-scores.module.ts

  • Add exports: [HealthScoresService]

backend/src/modules/investment-planning/investment-planning.service.ts

  • Add new public method buildPromptForSchema(schemaName: string) that:
    1. Creates a query runner, sets search_path to the tenant schema
    2. Runs the same data-gathering queries (financial snapshot, market rates, monthly forecast) using the query runner directly (bypassing request-scoped TenantService)
    3. Calls the existing buildPromptMessages() with gathered data
    4. Returns Array<{ role: string; content: string }>
  • Change buildPromptMessages() from private → public (line 880)

backend/src/modules/investment-planning/investment-planning.module.ts

  • Add exports: [InvestmentPlanningService]

Phase 3: Database Tables & Entities

3 new tables in shared schema

shared.shadow_ai_models — Alternate model configurations (slots A and B)

Column Type Notes
id UUID PK
slot VARCHAR(10) CHECK IN ('A', 'B'), UNIQUE
name VARCHAR(100) Display label
api_url VARCHAR(500) OpenAI-compatible endpoint
api_key VARCHAR(500) Bearer token
model_name VARCHAR(200) Model identifier
is_active BOOLEAN Default true
created_at TIMESTAMPTZ
updated_at TIMESTAMPTZ

shared.shadow_runs — One row per comparison execution

Column Type Notes
id UUID PK
tenant_id UUID FK → shared.organizations
feature VARCHAR(30) CHECK IN ('operating_health', 'reserve_health', 'investment_recommendations')
status VARCHAR(20) CHECK IN ('running', 'completed', 'partial', 'failed')
triggered_by UUID FK → shared.users
prompt_messages JSONB Exact messages sent to all models (proof of identical input)
started_at TIMESTAMPTZ
completed_at TIMESTAMPTZ
created_at TIMESTAMPTZ

shared.shadow_run_results — One row per model per run (up to 3 per run)

Column Type Notes
id UUID PK
run_id UUID FK → shadow_runs ON DELETE CASCADE
model_role VARCHAR(20) CHECK IN ('production', 'alternate_a', 'alternate_b'), UNIQUE(run_id, model_role)
model_name VARCHAR(200) Snapshot of model used
api_url VARCHAR(500) Snapshot of endpoint used
raw_response TEXT Unprocessed AI response
parsed_response JSONB Validated structured output
response_time_ms INTEGER
token_usage JSONB { prompt_tokens, completion_tokens, total_tokens }
status VARCHAR(20) CHECK IN ('pending', 'running', 'success', 'error')
error_message TEXT
created_at TIMESTAMPTZ

Entity files

  • backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts

All use @Entity({ schema: 'shared', name: '...' }) pattern.

Phase 4: Shadow AI Backend Module

New directory: backend/src/modules/shadow-ai/

shadow-ai.service.ts

Model CRUD:

  • getModels() — Return both slots, mask API keys (show last 4 chars)
  • upsertModel(slot, dto) — INSERT/UPDATE config for slot A or B
  • deleteModel(slot) — Remove model config
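The key-masking rule in getModels() could be as simple as the following sketch (the helper name maskApiKey is an assumption):

```typescript
// Never return the stored key to the client; show only the last 4 chars
// so the admin can confirm which key is configured.
export function maskApiKey(key: string): string {
  if (key.length <= 4) return '****';
  return '****' + key.slice(-4);
}
```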

Run Execution:

  • triggerRun(tenantId, feature, userId):
    1. Look up tenant schema_name from shared.organizations
    2. Build prompt messages by calling the appropriate exposed method:
      • operating_health: Create query runner → set search_path → healthScoresService.gatherOperatingData(qr) → healthScoresService.buildOperatingPrompt(data)
      • reserve_health: Same pattern with reserve methods
      • investment_recommendations: investmentPlanningService.buildPromptForSchema(schemaName)
    3. Insert shadow_runs row with prompt_messages stored as JSONB
    4. Get production config from env vars, alternate configs from DB
    5. Insert 1-3 shadow_run_results rows as 'pending' (production + active alternates)
    6. Return { runId } immediately
    7. Fire-and-forget: call all models in parallel using callOpenAICompatible()
      • Per feature: operating/reserve use temp 0.1, max_tokens 2048; investment uses temp 0.3, max_tokens 4096
    8. Update each result row as it completes (success/error, parsed response, timing)
    9. Update run status when all complete
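Steps 7-9 can be sketched as a small orchestration helper: fire all configured models in parallel and derive the run status from the per-model outcomes. The status names mirror the plan; the function name and shape are assumptions:

```typescript
// Sketch: run every model call concurrently; a single model failing must
// not abort the others, so use Promise.allSettled rather than Promise.all.
type ResultStatus = 'success' | 'error';

export async function executeInParallel(
  callers: Array<() => Promise<unknown>>,
): Promise<{ runStatus: 'completed' | 'partial' | 'failed'; results: ResultStatus[] }> {
  const settled = await Promise.allSettled(callers.map((c) => c()));
  const results = settled.map<ResultStatus>((s) =>
    s.status === 'fulfilled' ? 'success' : 'error',
  );
  const ok = results.filter((r) => r === 'success').length;
  const runStatus =
    ok === results.length ? 'completed' : ok > 0 ? 'partial' : 'failed';
  return { runStatus, results };
}
```

In the real service, each caller would also update its shadow_run_results row as it settles; the final runStatus maps onto the shadow_runs status column.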

History:

  • getRunHistory(page, limit, tenantFilter?, featureFilter?) — Paginated list with tenant name JOIN
  • getRunDetail(runId) — Full run + all results

shadow-ai.controller.ts

All endpoints use @UseGuards(JwtAuthGuard) + requireSuperadmin(req) pattern from admin.controller.ts.

Method Path Body/Params
GET /admin/shadow-ai/models
PUT /admin/shadow-ai/models/:slot { name, apiUrl, apiKey, modelName, isActive }
DELETE /admin/shadow-ai/models/:slot
POST /admin/shadow-ai/runs { tenantId, feature }
GET /admin/shadow-ai/runs ?page&limit&tenantId&feature
GET /admin/shadow-ai/runs/:id
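The plan reuses requireSuperadmin(req) from admin.controller.ts; its exact body is not shown there, but the gist is a check like the following sketch (in Nest the throw would be a ForbiddenException, producing a 403):

```typescript
// Hypothetical sketch of the existing guard pattern; the real helper's
// signature and user shape are assumptions.
export function requireSuperadmin(req: { user?: { isSuperadmin?: boolean } }): void {
  if (!req.user?.isSuperadmin) {
    throw new Error('Forbidden'); // Nest: throw new ForbiddenException()
  }
}
```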

shadow-ai.module.ts

@Module({
  imports: [
    TypeOrmModule.forFeature([ShadowAiModel, ShadowRun, ShadowRunResult]),
    HealthScoresModule,
    InvestmentPlanningModule,
    UsersModule,
  ],
  controllers: [ShadowAiController],
  providers: [ShadowAiService],
})
export class ShadowAiModule {}

Register in backend/src/app.module.ts

  • Add import { ShadowAiModule } and include in the imports array

Phase 5: Frontend — Admin Shadow AI Page

New file: frontend/src/pages/admin/AdminShadowAiPage.tsx

Layout: Mantine Tabs with 3 tabs

Tab 1: "Model Configuration"

  • Three Card components in a SimpleGrid cols={3}:
    • Production (read-only): Shows the model name and API URL, sourced either from a dedicated endpoint or displayed as a hardcoded "From environment config" label
    • Alternate A: Form with TextInput (name, API URL, model name), PasswordInput (API key), Switch (active), Save/Delete buttons
    • Alternate B: Same form
  • Fetches via GET /api/admin/shadow-ai/models
  • Saves via PUT /api/admin/shadow-ai/models/A or /B

Tab 2: "Run Comparison"

  • Select dropdown for tenant (reuse GET /api/admin/organizations already used by AdminPage)
  • Select for feature type (Operating Health / Reserve Health / Investment Recommendations)
  • Button "Run Shadow Comparison"
  • On trigger: POST /api/admin/shadow-ai/runs → get runId
  • Poll GET /api/admin/shadow-ai/runs/:id every 3s via refetchInterval until status !== 'running'
  • Show per-model progress indicators during run
  • Once complete, render results using shared comparison component (below)
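The polling rule can live in a small pure function passed to refetchInterval (TanStack Query accepts a function there; this helper keeps the logic testable independent of the exact query-client version):

```typescript
// Poll every 3s while the run is in progress (or before the first fetch
// returns); stop once it reaches a terminal status.
type RunStatus = 'running' | 'completed' | 'partial' | 'failed';

export function pollInterval(
  run: { status: RunStatus } | undefined,
): number | false {
  if (!run || run.status === 'running') return 3000;
  return false;
}
```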

Tab 3: "History"

  • Table with columns: Date, Tenant, Feature, Status (Badge), Duration
  • Filter controls: tenant Select, feature Select
  • Click row → expand detail or modal showing full comparison
  • Pagination

Shared Component: Side-by-Side Results Display

  • SimpleGrid cols={3} (or fewer columns if only some models were configured)
  • Each column:
    • Header: model name + response time Badge
    • For health scores: Score with RingProgress, label Badge, summary text, factors list (color-coded by impact), recommendations list (color-coded by priority)
    • For investment: Overall assessment text, recommendation cards with type/priority badges, risk notes
    • Collapsible raw JSON via Accordion
  • Diff highlighting: Where parsed values differ across models, apply subtle background highlight (e.g., yellow.0 in Mantine theme). Simple recursive comparison of JSON keys/values.
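The recursive comparison described above could look like this sketch (the function name diffPaths is an assumption): it returns the set of dot-paths whose values differ across the parsed outputs, and the UI highlights any field whose path appears in the set.

```typescript
// Compare N parsed outputs; collect dot-paths ('score', 'factors.0.impact')
// where at least two values disagree. Arrays and primitives are compared
// as leaves via JSON.stringify -- simple, and good enough for highlighting.
export function diffPaths(values: unknown[], prefix = ''): Set<string> {
  const out = new Set<string>();
  const allPlainObjects = values.every(
    (v) => typeof v === 'object' && v !== null && !Array.isArray(v),
  );
  if (allPlainObjects) {
    const keys = new Set<string>();
    for (const v of values) Object.keys(v as object).forEach((k) => keys.add(k));
    for (const k of keys) {
      diffPaths(
        values.map((v) => (v as Record<string, unknown>)[k]),
        prefix ? `${prefix}.${k}` : k,
      ).forEach((p) => out.add(p));
    }
    return out;
  }
  // Leaf (or mixed types): flag the path if any value differs.
  const first = JSON.stringify(values[0]);
  if (values.some((v) => JSON.stringify(v) !== first)) out.add(prefix);
  return out;
}
```

A missing key in one model's output serializes as undefined and is flagged as a difference, which is the desired behavior for side-by-side review.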

Route addition: frontend/src/App.tsx

Within the /admin route group (after <Route index element={<AdminPage />} />):

<Route path="shadow-ai" element={<AdminShadowAiPage />} />

Sidebar nav: frontend/src/components/layout/Sidebar.tsx

In the isAdminOnly section (after the "Admin Panel" NavLink, around line 134):

<NavLink
  label="AI Benchmarking"
  leftSection={<IconScale size={18} />}
  active={location.pathname === '/admin/shadow-ai'}
  onClick={() => go('/admin/shadow-ai')}
  color="violet"
/>

Implementation Order

  1. ai-caller.ts — Shared utility (no dependencies)
  2. Health scores + investment planning — Make methods public, add exports, add buildPromptForSchema
  3. Entities — 3 TypeORM entity files
  4. Service + Controller + Module — Shadow AI backend
  5. Register module in app.module.ts
  6. AdminShadowAiPage.tsx — Frontend page
  7. Route + Sidebar — Wire up navigation

Verification

  1. Backend: Start server, confirm no TypeORM errors for new entities
  2. Model config: Use admin UI to save/load/delete alternate model configs
  3. Run comparison: Select a tenant, trigger a run, verify all 3 models are called with identical prompts
  4. Results display: Confirm side-by-side output renders correctly for all 3 feature types
  5. History: Verify past runs are persisted and browsable
  6. Auth: Confirm non-superadmin users get 403 on all shadow-ai endpoints
  7. Production safety: Verify no changes to production AI behavior — shadow runs are completely isolated

Key Files to Modify

  • backend/src/modules/health-scores/health-scores.service.ts — Make 5 methods public
  • backend/src/modules/health-scores/health-scores.module.ts — Add exports
  • backend/src/modules/investment-planning/investment-planning.service.ts — Add buildPromptForSchema(), make buildPromptMessages() public
  • backend/src/modules/investment-planning/investment-planning.module.ts — Add exports
  • backend/src/app.module.ts — Register ShadowAiModule
  • frontend/src/App.tsx — Add route
  • frontend/src/components/layout/Sidebar.tsx — Add nav item

New Files

  • backend/src/common/utils/ai-caller.ts
  • backend/src/modules/shadow-ai/shadow-ai.module.ts
  • backend/src/modules/shadow-ai/shadow-ai.service.ts
  • backend/src/modules/shadow-ai/shadow-ai.controller.ts
  • backend/src/modules/shadow-ai/entities/shadow-ai-model.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run.entity.ts
  • backend/src/modules/shadow-ai/entities/shadow-run-result.entity.ts
  • frontend/src/pages/admin/AdminShadowAiPage.tsx