From 3790a3bd9e74c806443415d1b0ba6ee233db6687 Mon Sep 17 00:00:00 2001
From: olsch01
Date: Tue, 3 Mar 2026 15:06:22 -0500
Subject: [PATCH] docs: add scaling guide for production infrastructure

Covers vertical tuning, managed service offloading, horizontal scaling
with replicas, and multi-node strategies. Includes resource budgets for
the current 4-core/24GB VM, monitoring thresholds for New Relic alerts,
PostgreSQL/Redis tuning values, and a scaling decision tree.

Co-Authored-By: Claude Opus 4.6
---
 docs/SCALING.md | 532 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 532 insertions(+)
 create mode 100644 docs/SCALING.md

diff --git a/docs/SCALING.md b/docs/SCALING.md
new file mode 100644
index 0000000..b0014dd
--- /dev/null
+++ b/docs/SCALING.md
@@ -0,0 +1,532 @@
# HOA LedgerIQ — Scaling Guide

**Version:** 2026.3.2 (beta)
**Last updated:** 2026-03-03
**Current infrastructure:** 4 ARM cores, 24 GB RAM, single VM

---

## Table of Contents

1. [Current Architecture Baseline](#current-architecture-baseline)
2. [Resource Budget — Where Your 24 GB Goes](#resource-budget--where-your-24-gb-goes)
3. [Scaling Signals — When to Act](#scaling-signals--when-to-act)
4. [Phase 1: Vertical Tuning (Same VM)](#phase-1-vertical-tuning-same-vm)
5. [Phase 2: Offload Services (Managed DB + Cache)](#phase-2-offload-services-managed-db--cache)
6. [Phase 3: Horizontal Scaling (Multiple Backend Instances)](#phase-3-horizontal-scaling-multiple-backend-instances)
7. [Phase 4: Full Horizontal (Multi-Node)](#phase-4-full-horizontal-multi-node)
8. [Component-by-Component Scaling Reference](#component-by-component-scaling-reference)
9. [Docker Daemon Tuning](#docker-daemon-tuning)
10. 
[Monitoring with New Relic](#monitoring-with-new-relic)

---

## Current Architecture Baseline

```
                     Internet
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│  Host VM (4 ARM cores, 24 GB RAM)                       │
│                                                         │
│   ┌──────────────────────────────────┐                  │
│   │ Host nginx :80/:443 (SSL)        │                  │
│   │  /api/* → 127.0.0.1:3000         │                  │
│   │  /*     → 127.0.0.1:3001         │                  │
│   └──────────┬───────────┬───────────┘                  │
│              ▼           ▼                              │
│   ┌──────────────┐  ┌──────────────┐   Docker (hoanet)  │
│   │ backend :3000│  │frontend :3001│                    │
│   │  4 workers   │  │ static nginx │                    │
│   │  1024 MB cap │  │  ~5 MB used  │                    │
│   └──────┬───────┘  └──────────────┘                    │
│     ┌────┴────────┐                                     │
│     ▼             ▼                                     │
│  ┌────────────┐ ┌───────────┐                          │
│  │ postgres   │ │ redis     │                          │
│  │ 1024 MB cap│ │ 256 MB cap│                          │
│  └────────────┘ └───────────┘                          │
└─────────────────────────────────────────────────────────┘
```

**How requests flow:**

1. Browser hits host nginx (SSL termination, rate limiting)
2. API requests proxy to the NestJS backend (4 clustered workers)
3. Static asset requests proxy to the frontend nginx container
4. Backend queries PostgreSQL and Redis over the Docker bridge network
5. All inter-container traffic stays on the `hoanet` bridge (kernel-routed, no userland proxy)

**Key configuration facts:**

| Component | Current config | Bottleneck at scale |
|-----------|---------------|---------------------|
| Backend | 4 Node.js workers (1 per core) | CPU-bound under heavy API load |
| PostgreSQL | 200 max connections, 256 MB shared_buffers | Connection count, then memory |
| Redis | 256 MB maxmemory, LRU eviction | Memory, then network |
| Frontend | Static nginx, ~5 MB memory | Effectively unlimited for static serving |
| Host nginx | Rate limit: 10 req/s per IP, burst 30 | File descriptors, worker connections |

---

## Resource Budget — Where Your 24 GB Goes

| Component | Memory limit | Typical usage | Notes |
|-----------|-------------|---------------|-------|
| Backend | 1024 MB | 250–400 MB | 4 workers share one container limit |
| PostgreSQL | 1024 MB | 50–300 MB | Grows with active queries and shared_buffers |
| Redis | 256 MB | 3–10 MB | Very low until caching is heavily used |
| Frontend | None set | ~5 MB | Static nginx, negligible |
| Host nginx | N/A (host) | ~10 MB | Runs on the host, not in Docker |
| New Relic agent | (inside backend) | ~30–50 MB | Included in backend memory |
| **Total reserved** | **~2.3 GB** | **~500 MB idle** | **~21.7 GB available for growth** |

You have significant headroom. The current configuration is conservative and can handle considerably more load before any changes are needed.
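A quick way to sanity-check the budget table is to sum the container limits and compare against host RAM. A small sketch in shell (limit values taken from the table above; `docker stats --no-stream` shows live usage if you want actuals):

```shell
# Per-container memory limits from the budget table, in MB
backend=1024; postgres=1024; redis=256

reserved=$((backend + postgres + redis))   # sum of hard caps
total=$((24 * 1024))                       # 24 GB host, in MB

echo "reserved: ${reserved} MB"            # 2304 MB, i.e. ~2.3 GB
echo "headroom: $((total - reserved)) MB"  # 22272 MB, i.e. ~21.7 GB before OS overhead
```

The host nginx and OS are outside these caps, so treat the headroom figure as an upper bound, not a promise.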
---

## Scaling Signals — When to Act

Use these thresholds from New Relic and system metrics to decide when to scale:

### Immediate action required

| Signal | Threshold | Likely bottleneck |
|--------|-----------|-------------------|
| API response time (p95) | > 2 seconds | Backend CPU or DB queries |
| Error rate | > 1% of requests | Backend memory, DB connections, or bugs |
| PostgreSQL connection wait time | > 100 ms | Connection pool exhaustion |
| Container OOM kills | Any occurrence | Memory limit too low |

### Plan scaling within 2–4 weeks

| Signal | Threshold | Likely bottleneck |
|--------|-----------|-------------------|
| API response time (p95) | > 500 ms sustained | Backend approaching CPU saturation |
| Backend CPU (container) | > 80% sustained | Need more workers or replicas |
| PostgreSQL CPU | > 70% sustained | Query optimization or read replicas |
| PostgreSQL connections | > 150 of 200 | Pool size or connection leaks |
| Redis memory | > 200 MB of 256 MB | Increase limit or review eviction |
| Host disk usage | > 80% | Postgres WAL or Docker image bloat |

### No action needed

| Signal | Range | Meaning |
|--------|-------|---------|
| Backend CPU | < 50% | Normal headroom |
| API response time (p95) | < 200 ms | Healthy |
| PostgreSQL connections | < 100 | Plenty of capacity |
| Memory usage (all containers) | < 60% of limits | Well-sized |

---

## Phase 1: Vertical Tuning (Same VM)

**When:** 50–200 concurrent users, response times starting to climb.
**Cost:** Free — just configuration changes.

### 1.1 Increase backend memory limit

The backend runs 4 workers in a 1024 MB container. Each Node.js worker uses 60–100 MB at baseline. Under load with New Relic active, they can reach 150 MB each (600 MB total). Raise the limit to give headroom:

```yaml
# docker-compose.prod.yml
backend:
  deploy:
    resources:
      limits:
        memory: 2048M        # was 1024M
      reservations:
        memory: 512M         # was 256M
```

### 1.2 Tune PostgreSQL for available RAM

With 24 GB on the host, PostgreSQL can use significantly more memory. These settings assume PostgreSQL is the only memory-heavy workload besides the backend. Note that `#` comments cannot go inside the folded `command:` scalar — YAML would pass them to postgres as literal arguments — so the previous values are listed above the block instead:

```yaml
# docker-compose.prod.yml
# Previous values: shared_buffers 256MB (25% of 4GB rule of thumb),
# effective_cache_size 512MB (OS page cache estimate),
# work_mem 4MB (per-sort memory), maintenance_work_mem 64MB
# (VACUUM, CREATE INDEX), wal_buffers 16MB
postgres:
  command: >
    postgres
    -c max_connections=200
    -c shared_buffers=1GB
    -c effective_cache_size=4GB
    -c work_mem=16MB
    -c maintenance_work_mem=256MB
    -c checkpoint_completion_target=0.9
    -c wal_buffers=64MB
    -c random_page_cost=1.1
  deploy:
    resources:
      limits:
        memory: 4096M        # was 1024M
      reservations:
        memory: 1024M        # was 512M
```

### 1.3 Increase Redis memory

If you start using Redis for session storage or response caching:

```yaml
# docker-compose.prod.yml
redis:
  command: redis-server --appendonly yes --maxmemory 1gb --maxmemory-policy allkeys-lru
  deploy:
    resources:
      limits:
        memory: 1536M        # keep the container cap above maxmemory
```

Raise the container limit together with `maxmemory`: a 1 GB Redis inside the original 256 MB container would be OOM-killed.

### 1.4 Tune host nginx worker connections

```nginx
# /etc/nginx/nginx.conf (host)
worker_processes auto;        # matches CPU cores (4)
events {
    worker_connections 2048;  # default is often 768
    multi_accept on;
}
```

### Phase 1 capacity estimate

| Metric | Estimate |
|--------|----------|
| Concurrent users | 200–500 |
| API requests/sec | 400–800 |
| Tenants | 50–100 |

---

## Phase 2: Offload Services (Managed DB + Cache)

**When:** 500+ concurrent users, or you need high availability / automated backups.
**Cost:** $50–200/month depending on provider and tier.
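The cutovers in this phase mostly come down to swapping connection strings in `.env`, so it is worth confirming a new `DATABASE_URL` is well-formed before restarting anything. A minimal sketch in shell; the URL below is a placeholder, not a real instance:

```shell
# Hypothetical managed-Postgres URL: host, user, and password are placeholders
DATABASE_URL="postgresql://app:secret@db.example.com:5432/hoaledgeriq?sslmode=require"

# Extract host:port with parameter expansion (no external tools needed)
hostport="${DATABASE_URL#*@}"   # drop scheme and credentials
hostport="${hostport%%/*}"      # drop database name and query string
echo "will connect to: ${hostport}"

# With the postgres client tools installed, a live check would be:
#   pg_isready -d "$DATABASE_URL"
```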
### 2.1 Move PostgreSQL to a managed service

Replace the Docker PostgreSQL container with a managed instance:

- **AWS:** RDS for PostgreSQL (db.t4g.medium — 2 vCPU, 4 GB, ~$70/mo)
- **GCP:** Cloud SQL for PostgreSQL (db-custom-2-4096, ~$65/mo)
- **DigitalOcean:** Managed Databases ($60/mo for 2 vCPU / 4 GB)

**Changes required:**

1. Update `.env` to point `DATABASE_URL` at the managed instance
2. In `docker-compose.prod.yml`, disable the postgres container:
   ```yaml
   postgres:
     deploy:
       replicas: 0
   ```
3. Remove the `depends_on: postgres` from the backend service
4. Ensure the managed DB allows connections from your VM's IP

**Benefits:** Automated backups, point-in-time recovery, read replicas, automatic failover, no memory/CPU contention with the application.

### 2.2 Move Redis to a managed service

Replace the Docker Redis container similarly:

- **AWS:** ElastiCache (cache.t4g.micro, ~$15/mo)
- **DigitalOcean:** Managed Redis ($15/mo)

Update `REDIS_URL` in `.env` and disable the container.

### Phase 2 resource reclaim

Offloading the DB and cache frees roughly 5 GB of reserved memory on the VM (at the Phase 1 limits), leaving nearly the full 24 GB available for backend scaling (Phase 3).

---

## Phase 3: Horizontal Scaling (Multiple Backend Instances)

**When:** Single backend container hits its CPU ceiling (4 workers maxed), or you need zero-downtime deployments.

### 3.1 Run multiple backend replicas with Docker Compose

```yaml
# docker-compose.prod.yml
backend:
  deploy:
    replicas: 2              # 2 containers × 4 workers = 8 workers
    resources:
      limits:
        memory: 2048M
      reservations:
        memory: 512M
```

**Important:** With replicas > 1 you cannot bind a single fixed `ports:` mapping, and host nginx runs outside Docker, so it cannot resolve Docker's internal DNS. Instead, point its upstream at one published host port per replica:

```nginx
# /etc/nginx/sites-available/your-site
upstream backend {
    # One entry per replica; the ports come from the
    # Compose port-range mapping shown below.
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;   # second replica
}
```

To publish one host port per replica, use a Docker Compose port range:

```yaml
backend:
  ports:
    - "127.0.0.1:3000-3009:3000"
  deploy:
    replicas: 2
```

### 3.2 Connection pool considerations

Each backend container runs up to 4 workers, each with its own connection pool. With the default pool size of 30:

| Replicas | Workers | Max DB connections |
|----------|---------|-------------------|
| 1 | 4 | 120 |
| 2 | 8 | 240 |
| 3 | 12 | 360 |

If using managed PostgreSQL, ensure `max_connections` on the DB is high enough. For > 2 replicas, consider adding **PgBouncer** as a connection pooler (transaction-mode pooling) to multiplex connections:

```
Backend workers (12) → PgBouncer (50 server connections) → PostgreSQL
```

### 3.3 Session and state considerations

The application currently uses **stateless JWT authentication** — no server-side sessions. This means backend replicas can handle any request without sticky sessions. Redis is used for caching only. This architecture is already horizontal-ready.

### Phase 3 capacity estimate

| Replicas | Concurrent users | API req/sec |
|----------|-----------------|-------------|
| 2 | 500–1,000 | 800–1,500 |
| 3 | 1,000–2,000 | 1,500–2,500 |

---

## Phase 4: Full Horizontal (Multi-Node)

**When:** Single VM resources exhausted, or you need geographic distribution and high availability.

### 4.1 Docker Swarm (simplest multi-node)

Docker Swarm is the easiest migration from Docker Compose.
The compose files are already compatible:

```bash
# On the manager node
docker swarm init

# On worker nodes (the token and manager IP are printed by `swarm init`)
docker swarm join --token <worker-token> <manager-ip>:2377

# Deploy the stack
docker stack deploy -c docker-compose.yml -c docker-compose.prod.yml hoaledgeriq
```

Scale the backend across nodes:

```bash
docker service scale hoaledgeriq_backend=4
```

Swarm handles load balancing across nodes via its built-in ingress network.

### 4.2 Kubernetes (full orchestration)

For larger deployments, migrate to Kubernetes:

- **Backend:** Deployment with HPA (Horizontal Pod Autoscaler) on CPU
- **Frontend:** Deployment with 2+ replicas behind a Service
- **PostgreSQL:** External managed service (not in the cluster)
- **Redis:** External managed service or StatefulSet
- **Ingress:** nginx-ingress or cloud load balancer

This is a significant migration but provides auto-scaling, self-healing, rolling deployments, and multi-region capability.

### 4.3 CDN for static assets

At any point in the scaling journey, a CDN provides the biggest return on investment for frontend performance:

- **Cloudflare** (free tier works): Proxy DNS, caches static assets at edge
- **AWS CloudFront** or **GCP Cloud CDN**: More control, ~$0.085/GB

This eliminates nearly all load on the frontend nginx container and reduces latency for geographically distributed users. Static assets (JS, CSS, images) are served from edge nodes instead of your VM.
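A CDN can only offload what the origin marks as cacheable. A sketch of the relevant headers for the frontend nginx; the location patterns are illustrative and assume the build emits content-hashed asset filenames:

```nginx
# Long-lived caching for hashed bundles (safe: filenames change on each deploy)
location ~* \.(js|css|woff2|png|svg)$ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}

# Keep HTML revalidating so new deploys propagate immediately
location = /index.html {
    add_header Cache-Control "no-cache";
}
```

With these in place, edge nodes serve repeat asset requests without touching the VM, while `index.html` stays fresh.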
---

## Component-by-Component Scaling Reference

### Backend (NestJS)

| Approach | When | How |
|----------|------|-----|
| Tune worker count | CPU underused | Set `WORKERS` env var or modify `main.ts` cap |
| Increase memory limit | OOM or >80% usage | Raise `deploy.resources.limits.memory` |
| Add replicas | CPU maxed at 4 workers | `deploy.replicas: N` in compose |
| Move to separate VM | VM resources exhausted | Run backend on dedicated compute |

**Current clustering logic** (from `backend/src/main.ts`):

- Production: `Math.min(os.cpus().length, 4)` workers
- Development: 1 worker
- To allow more than 4 workers, change the cap in `main.ts`

### PostgreSQL

| Approach | When | How |
|----------|------|-----|
| Increase shared_buffers | Cache hit ratio < 99% | Tune postgres command args |
| Increase max_connections | Pool exhaustion errors | Increase in postgres command + add PgBouncer |
| Add read replica | Read-heavy workload | Managed DB feature or streaming replication |
| Vertical scale | Query latency high | Larger managed DB instance |

**Key queries to monitor:**

```sql
-- Connection usage
SELECT count(*) AS active, max_conn
FROM pg_stat_activity,
     (SELECT setting::int AS max_conn FROM pg_settings WHERE name = 'max_connections') s
GROUP BY max_conn;

-- Cache hit ratio (should be > 0.99; cast to avoid integer division always returning 0)
SELECT
  sum(heap_blks_hit)::float / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;

-- Slow queries (if pg_stat_statements is enabled)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

### Redis

| Approach | When | How |
|----------|------|-----|
| Increase maxmemory | Evictions happening frequently | Change `--maxmemory` in compose command |
| Move to managed | Need persistence guarantees | AWS ElastiCache / DigitalOcean Managed Redis |
| Add replica | Read-heavy caching | Managed service with read replicas |

### Host Nginx

| Approach | When | How |
|----------|------|-----|
| Tune worker_connections | Connection refused errors | Increase in `/etc/nginx/nginx.conf` |
| Add upstream servers | Multiple backend replicas | upstream block with multiple servers |
| Move to load balancer | Multi-node deployment | Cloud LB (ALB, GCP LB) or HAProxy |
| Add CDN | Static asset latency | Cloudflare, CloudFront, etc. |

---

## Docker Daemon Tuning

These settings are applied on the host in `/etc/docker/daemon.json`:

```json
{
  "userland-proxy": false,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  }
}
```

| Setting | Purpose |
|---------|---------|
| `userland-proxy: false` | Kernel-level port forwarding instead of userspace Go proxy (already applied) |
| `log-opts` | Prevents Docker container logs from filling the disk |
| `default-ulimits.nofile` | Raises file descriptor limit for containers handling many connections |

After changing, restart Docker: `sudo systemctl restart docker`

---

## Monitoring with New Relic

New Relic is deployed on the backend via the conditional preload (`NEW_RELIC_ENABLED=true` in `.env`). Key dashboards to set up:

### Alerts to configure

| Alert | Condition | Priority |
|-------|-----------|----------|
| High error rate | > 1% for 5 minutes | Critical |
| Slow transactions | p95 > 2s for 5 minutes | Critical |
| Apdex score drop | < 0.7 for 10 minutes | Warning |
| Memory usage | > 80% of container limit for 10 minutes | Warning |
| Transaction throughput drop | > 50% decrease vs. baseline | Warning |

### Key transactions to monitor

| Endpoint | Why |
|----------|-----|
| `POST /api/auth/login` | Authentication performance, first thing every user hits |
| `GET /api/journal-entries` | Heaviest read query (double-entry bookkeeping with lines) |
| `POST /api/investment-planning/recommendations` | AI endpoint, 30–180s response time, external dependency |
| `GET /api/reports/*` | Financial reports with aggregate queries |
| `GET /api/projects` | Includes real-time funding computation across all reserve projects |

### Infrastructure metrics to export

If you later add the New Relic Infrastructure agent to the host VM, you can correlate application performance with system metrics:

```bash
# Install on the host (not in Docker); substitute your own license details
curl -Ls https://download.newrelic.com/install/newrelic-cli/scripts/install.sh | bash
sudo NEW_RELIC_API_KEY=<your-api-key> NEW_RELIC_ACCOUNT_ID=<your-account-id> \
  /usr/local/bin/newrelic install -n infrastructure-agent-installer
```

This provides host-level CPU, memory, disk, and network metrics alongside your application telemetry.

---

## Quick Reference — Scaling Decision Tree

```
Is API response time (p95) > 500ms?
├── Yes → Is backend CPU > 80%?
│         ├── Yes → Already at 4 workers?
│         │         ├── Yes → Phase 3: Add backend replicas
│         │         └── No  → Phase 1: Raise worker cap in main.ts
│         └── No  → Is PostgreSQL slow?
│                   ├── Yes → Phase 1: Tune PG memory, or Phase 2: Managed DB
│                   └── No  → Profile the slow endpoints in New Relic
└── No → Is memory > 80% on any container?
         ├── Yes → Phase 1: Raise memory limits (you have 21+ GB free)
         └── No  → Is disk > 80%?
                   ├── Yes → Clean Docker images, tune PG WAL retention, add log rotation
                   └── No  → No scaling needed
```
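The tree above can be condensed into a small triage helper for on-call use. A rough sketch in shell: the thresholds mirror the tables in this guide, the inputs are values you read off New Relic and `df`, and the function name is made up:

```shell
# triage <p95_ms> <backend_cpu_pct> <mem_pct> <disk_pct>
# Prints the suggested next step from the decision tree.
triage() {
  p95=$1; cpu=$2; mem=$3; disk=$4
  if [ "$p95" -gt 500 ]; then
    if [ "$cpu" -gt 80 ]; then
      echo "Phase 3: add backend replicas (or raise the worker cap)"
    else
      echo "Check PostgreSQL; profile slow endpoints in New Relic"
    fi
  elif [ "$mem" -gt 80 ]; then
    echo "Phase 1: raise container memory limits"
  elif [ "$disk" -gt 80 ]; then
    echo "Clean Docker images, tune PG WAL retention, rotate logs"
  else
    echo "No scaling needed"
  fi
}

triage 650 85 40 30   # CPU-bound backend: add replicas
triage 120 30 40 30   # healthy: no action
```

It deliberately collapses the "already at 4 workers?" sub-question into one branch; consult the full tree before acting on its output.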