docs: add scaling guide for production infrastructure

Covers vertical tuning, managed service offloading, horizontal scaling
with replicas, and multi-node strategies. Includes resource budgets for
the current 4-core/24GB VM, monitoring thresholds for New Relic alerts,
PostgreSQL/Redis tuning values, and a scaling decision tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:06:22 -05:00
parent 0a07c61ca3
commit 3790a3bd9e

docs/SCALING.md

@@ -0,0 +1,532 @@
# HOA LedgerIQ — Scaling Guide
**Version:** 2026.3.2 (beta)
**Last updated:** 2026-03-03
**Current infrastructure:** 4 ARM cores, 24 GB RAM, single VM
---
## Table of Contents
1. [Current Architecture Baseline](#current-architecture-baseline)
2. [Resource Budget — Where Your 24 GB Goes](#resource-budget--where-your-24-gb-goes)
3. [Scaling Signals — When to Act](#scaling-signals--when-to-act)
4. [Phase 1: Vertical Tuning (Same VM)](#phase-1-vertical-tuning-same-vm)
5. [Phase 2: Offload Services (Managed DB + Cache)](#phase-2-offload-services-managed-db--cache)
6. [Phase 3: Horizontal Scaling (Multiple Backend Instances)](#phase-3-horizontal-scaling-multiple-backend-instances)
7. [Phase 4: Full Horizontal (Multi-Node)](#phase-4-full-horizontal-multi-node)
8. [Component-by-Component Scaling Reference](#component-by-component-scaling-reference)
9. [Docker Daemon Tuning](#docker-daemon-tuning)
10. [Monitoring with New Relic](#monitoring-with-new-relic)
---
## Current Architecture Baseline
```
Internet
┌─────────────────────────────────────────────────────────┐
│ Host VM (4 ARM cores, 24 GB RAM) │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Host nginx :80/:443 (SSL) │ │
│ │ /api/* → 127.0.0.1:3000 │ │
│ │ /* → 127.0.0.1:3001 │ │
│ └──────────┬───────────┬──────────┘ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ Docker (hoanet) │
│ │ backend :3000│ │frontend :3001│ │
│ │ 4 workers │ │ static nginx │ │
│ │ 1024 MB cap │ │ ~5 MB used │ │
│ └──────┬───────┘ └──────────────┘ │
│ ┌────┴────┐ │
│ ▼ ▼ │
│ ┌────────────┐ ┌───────────┐ │
│ │postgres │ │redis │ │
│ │ 1024 MB cap│ │ 256 MB cap│ │
│ └────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────┘
```
**How requests flow:**
1. Browser hits host nginx (SSL termination, rate limiting)
2. API requests proxy to the NestJS backend (4 clustered workers)
3. Static asset requests proxy to the frontend nginx container
4. Backend queries PostgreSQL and Redis over the Docker bridge network
5. All inter-container traffic stays on the `hoanet` bridge (kernel-routed, no userland proxy)
**Key configuration facts:**
| Component | Current config | Bottleneck at scale |
|-----------|---------------|---------------------|
| Backend | 4 Node.js workers (1 per core) | CPU-bound under heavy API load |
| PostgreSQL | 200 max connections, 256 MB shared_buffers | Connection count, then memory |
| Redis | 256 MB maxmemory, LRU eviction | Memory, then network |
| Frontend | Static nginx, ~5 MB memory | Effectively unlimited for static serving |
| Host nginx | Rate limit: 10 req/s per IP, burst 30 | File descriptors, worker connections |
---
## Resource Budget — Where Your 24 GB Goes
| Component | Memory limit | Typical usage | Notes |
|-----------|-------------|---------------|-------|
| Backend | 1024 MB | 250–400 MB | 4 workers share one container limit |
| PostgreSQL | 1024 MB | 50–300 MB | Grows with active queries and shared_buffers |
| Redis | 256 MB | 3–10 MB | Very low until caching is heavily used |
| Frontend | None set | ~5 MB | Static nginx, negligible |
| Host nginx | N/A (host) | ~10 MB | Runs on the host, not in Docker |
| New Relic agent | (inside backend) | ~30–50 MB | Included in backend memory |
| **Total reserved** | **~2.3 GB** | **~500 MB idle** | **~21.5 GB available for growth** |
You have significant headroom. The current configuration is conservative and can handle considerably more load before any changes are needed.
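The reserved total in the table is simply the sum of the configured container limits; a quick check of the arithmetic:

```shell
# Sum the memory limits from docker-compose.prod.yml (values in MB)
backend=1024
postgres=1024
redis=256
total_mb=$((backend + postgres + redis))
echo "${total_mb} MB reserved"   # 2304 MB, i.e. ~2.3 GB
```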
---
## Scaling Signals — When to Act
Use these thresholds from New Relic and system metrics to decide when to scale:
### Immediate action required
| Signal | Threshold | Likely bottleneck |
|--------|-----------|-------------------|
| API response time (p95) | > 2 seconds | Backend CPU or DB queries |
| Error rate | > 1% of requests | Backend memory, DB connections, or bugs |
| PostgreSQL connection wait time | > 100 ms | Connection pool exhaustion |
| Container OOM kills | Any occurrence | Memory limit too low |
### Plan scaling within 2–4 weeks
| Signal | Threshold | Likely bottleneck |
|--------|-----------|-------------------|
| API response time (p95) | > 500 ms sustained | Backend approaching CPU saturation |
| Backend CPU (container) | > 80% sustained | Need more workers or replicas |
| PostgreSQL CPU | > 70% sustained | Query optimization or read replicas |
| PostgreSQL connections | > 150 of 200 | Pool size or connection leaks |
| Redis memory | > 200 MB of 256 MB | Increase limit or review eviction |
| Host disk usage | > 80% | Postgres WAL or Docker image bloat |
### No action needed
| Signal | Range | Meaning |
|--------|-------|---------|
| Backend CPU | < 50% | Normal headroom |
| API response time (p95) | < 200 ms | Healthy |
| PostgreSQL connections | < 100 | Plenty of capacity |
| Memory usage (all containers) | < 60% of limits | Well-sized |
---
## Phase 1: Vertical Tuning (Same VM)
**When:** 50–200 concurrent users, response times starting to climb.
**Cost:** Free; only configuration changes.
### 1.1 Increase backend memory limit
The backend runs 4 workers in a 1024 MB container. Each Node.js worker uses
60–100 MB at baseline. Under load with New Relic active, they can reach
150 MB each (600 MB total). Raise the limit to give headroom:
```yaml
# docker-compose.prod.yml
backend:
deploy:
resources:
limits:
memory: 2048M # was 1024M
reservations:
memory: 512M # was 256M
```
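To sanity-check the new limit against the worst case described above (the per-worker figure is the estimate from this section, not a measurement):

```shell
workers=4
peak_per_worker_mb=150   # under load, with the New Relic agent active
limit_mb=2048
peak_mb=$((workers * peak_per_worker_mb))
echo "worst-case usage: ${peak_mb} MB of ${limit_mb} MB"   # 600 MB, under a third of the limit
```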
### 1.2 Tune PostgreSQL for available RAM
With 24 GB on the host, PostgreSQL can use significantly more memory. These
settings assume PostgreSQL is the only memory-heavy workload besides the
backend:
```yaml
# docker-compose.prod.yml
postgres:
command: >
postgres
-c max_connections=200
-c shared_buffers=1GB # was 256MB (25% of 4GB rule of thumb)
-c effective_cache_size=4GB # was 512MB (OS page cache estimate)
-c work_mem=16MB # was 4MB (per-sort memory)
-c maintenance_work_mem=256MB # was 64MB (VACUUM, CREATE INDEX)
-c checkpoint_completion_target=0.9
-c wal_buffers=64MB # was 16MB
-c random_page_cost=1.1
deploy:
resources:
limits:
memory: 4096M # was 1024M
reservations:
memory: 1024M # was 512M
```
### 1.3 Increase Redis memory
If you start using Redis for session storage or response caching:
```yaml
# docker-compose.prod.yml
redis:
command: redis-server --appendonly yes --maxmemory 1gb --maxmemory-policy allkeys-lru
```
### 1.4 Tune host nginx worker connections
```nginx
# /etc/nginx/nginx.conf (host)
worker_processes auto; # matches CPU cores (4)
events {
worker_connections 2048; # default is often 768
multi_accept on;
}
```
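As a rough capacity check: nginx can hold at most `worker_processes × worker_connections` open connections, and a proxied request consumes roughly two of them (client side plus upstream side). The arithmetic:

```shell
worker_processes=4
worker_connections=2048
max_connections=$((worker_processes * worker_connections))
max_proxied=$((max_connections / 2))   # each proxied request holds ~2 connections
echo "connection ceiling: ${max_connections}"
echo "proxied-request ceiling: ~${max_proxied}"
```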
### Phase 1 capacity estimate
| Metric | Estimate |
|--------|----------|
| Concurrent users | 200–500 |
| API requests/sec | 400–800 |
| Tenants | 50–100 |
---
## Phase 2: Offload Services (Managed DB + Cache)
**When:** 500+ concurrent users, or you need high availability / automated backups.
**Cost:** $50–200/month depending on provider and tier.
### 2.1 Move PostgreSQL to a managed service
Replace the Docker PostgreSQL container with a managed instance:
- **AWS:** RDS for PostgreSQL (db.t4g.medium: 2 vCPU, 4 GB, ~$70/mo)
- **GCP:** Cloud SQL for PostgreSQL (db-custom-2-4096, ~$65/mo)
- **DigitalOcean:** Managed Databases ($60/mo for 2 vCPU / 4 GB)
**Changes required:**
1. Update `.env` to point `DATABASE_URL` at the managed instance
2. In `docker-compose.prod.yml`, disable the postgres container:
```yaml
postgres:
deploy:
replicas: 0
```
3. Remove the `depends_on: postgres` from the backend service
4. Ensure the managed DB allows connections from your VM's IP
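For step 1, the `.env` change is a single connection-string swap. The hostname, port, and credentials below are placeholders, not real values:

```bash
# .env — point DATABASE_URL at the managed instance instead of the container.
# sslmode=require is typical for managed PostgreSQL offerings.
DATABASE_URL=postgresql://app_user:<password>@<managed-db-host>:5432/hoaledgeriq?sslmode=require
```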
**Benefits:** Automated backups, point-in-time recovery, read replicas,
automatic failover, no memory/CPU contention with the application.
### 2.2 Move Redis to a managed service
Replace the Docker Redis container similarly:
- **AWS:** ElastiCache (cache.t4g.micro, ~$15/mo)
- **DigitalOcean:** Managed Redis ($15/mo)
Update `REDIS_URL` in `.env` and disable the container.
### Phase 2 resource reclaim
Offloading DB and cache frees ~5 GB of reserved memory on the VM,
leaving the full 24 GB available for backend scaling (Phase 3).
---
## Phase 3: Horizontal Scaling (Multiple Backend Instances)
**When:** Single backend container hits CPU ceiling (4 workers maxed),
or you need zero-downtime deployments.
### 3.1 Run multiple backend replicas with Docker Compose
```yaml
# docker-compose.prod.yml
backend:
deploy:
replicas: 2 # 2 containers × 4 workers = 8 workers
resources:
limits:
memory: 2048M
reservations:
memory: 512M
```
**Important:** With replicas > 1 you cannot use `ports:` directly.
Switch the host nginx upstream to use Docker's internal DNS:
```nginx
# /etc/nginx/sites-available/your-site
upstream backend {
# Each replica must publish its own host port (see the port-range
# mapping alternative below); list one server entry per replica.
server 127.0.0.1:3000;
server 127.0.0.1:3010; # second replica on different host port
}
```
Alternatively, use Docker Compose port ranges:
```yaml
backend:
ports:
- "127.0.0.1:3000-3009:3000"
deploy:
replicas: 2
```
### 3.2 Connection pool considerations
Each backend container runs up to 4 workers, each with its own connection
pool. With the default pool size of 30:
| Replicas | Workers | Max DB connections |
|----------|---------|-------------------|
| 1 | 4 | 120 |
| 2 | 8 | 240 |
| 3 | 12 | 360 |
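The table follows directly from `replicas × workers_per_replica × pool_size`:

```shell
replicas=3
workers_per_replica=4
pool_size=30
echo $((replicas * workers_per_replica * pool_size))   # 360, matching the 3-replica row
```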
If using managed PostgreSQL, ensure `max_connections` on the DB is high
enough. For > 2 replicas, consider adding **PgBouncer** as a connection
pooler (transaction-mode pooling) to multiplex connections:
```
Backend workers (12) → PgBouncer (50 server connections) → PostgreSQL
```
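A minimal PgBouncer sketch for this topology, assuming transaction-mode pooling in front of a managed instance. Host, database name, and auth details are placeholders:

```ini
; /etc/pgbouncer/pgbouncer.ini
[databases]
hoaledgeriq = host=<managed-db-host> port=5432 dbname=hoaledgeriq

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction        ; multiplex many client connections over few server ones
default_pool_size = 50         ; server connections per database/user pair
max_client_conn = 400          ; covers 12 workers x 30-connection client pools
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

The backend's `DATABASE_URL` then points at `127.0.0.1:6432` instead of the database host. Note that transaction-mode pooling breaks session-level features such as `LISTEN/NOTIFY`, session advisory locks, and prepared statements held across transactions; verify the ORM's settings are compatible before enabling it.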
### 3.3 Session and state considerations
The application currently uses **stateless JWT authentication** — no
server-side sessions. This means backend replicas can handle any request
without sticky sessions. Redis is used for caching only. This architecture
is already horizontal-ready.
### Phase 3 capacity estimate
| Replicas | Concurrent users | API req/sec |
|----------|-----------------|-------------|
| 2 | 500–1,000 | 800–1,500 |
| 3 | 1,000–2,000 | 1,500–2,500 |
---
## Phase 4: Full Horizontal (Multi-Node)
**When:** Single VM resources exhausted, or you need geographic distribution
and high availability.
### 4.1 Docker Swarm (simplest multi-node)
Docker Swarm is the easiest migration from Docker Compose. The compose
files are already compatible:
```bash
# On the manager node
docker swarm init
# On worker nodes
docker swarm join --token <token> <manager-ip>:2377
# Deploy the stack
docker stack deploy -c docker-compose.yml -c docker-compose.prod.yml hoaledgeriq
```
Scale the backend across nodes:
```bash
docker service scale hoaledgeriq_backend=4
```
Swarm handles load balancing across nodes via its built-in ingress network.
### 4.2 Kubernetes (full orchestration)
For larger deployments, migrate to Kubernetes:
- **Backend:** Deployment with HPA (Horizontal Pod Autoscaler) on CPU
- **Frontend:** Deployment with 2+ replicas behind a Service
- **PostgreSQL:** External managed service (not in the cluster)
- **Redis:** External managed service or StatefulSet
- **Ingress:** nginx-ingress or cloud load balancer
This is a significant migration but provides auto-scaling, self-healing,
rolling deployments, and multi-region capability.
### 4.3 CDN for static assets
At any point in the scaling journey, a CDN provides the biggest return on
investment for frontend performance:
- **Cloudflare** (free tier works): Proxy DNS, caches static assets at edge
- **AWS CloudFront** or **GCP Cloud CDN**: More control, ~$0.085/GB
This eliminates nearly all load on the frontend nginx container and reduces
latency for geographically distributed users. Static assets (JS, CSS,
images) are served from edge nodes instead of your VM.
---
## Component-by-Component Scaling Reference
### Backend (NestJS)
| Approach | When | How |
|----------|------|-----|
| Tune worker count | CPU underused | Set `WORKERS` env var or modify `main.ts` cap |
| Increase memory limit | OOM or >80% usage | Raise `deploy.resources.limits.memory` |
| Add replicas | CPU maxed at 4 workers | `deploy.replicas: N` in compose |
| Move to separate VM | VM resources exhausted | Run backend on dedicated compute |
**Current clustering logic** (from `backend/src/main.ts`):
- Production: `Math.min(os.cpus().length, 4)` workers
- Development: 1 worker
- To allow more than 4 workers, change the cap in `main.ts`
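If the `WORKERS` variable from the table above is wired through, the compose-side override might look like this. Whether `main.ts` currently reads `WORKERS` is an assumption; the hard cap of 4 must still be raised there either way:

```yaml
# docker-compose.prod.yml
backend:
  environment:
    WORKERS: "6"   # takes effect only once the Math.min(..., 4) cap in main.ts is raised
```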
### PostgreSQL
| Approach | When | How |
|----------|------|-----|
| Increase shared_buffers | Cache hit ratio < 99% | Tune postgres command args |
| Increase max_connections | Pool exhaustion errors | Increase in postgres command + add PgBouncer |
| Add read replica | Read-heavy workload | Managed DB feature or streaming replication |
| Vertical scale | Query latency high | Larger managed DB instance |
**Key queries to monitor:**
```sql
-- Connection usage
SELECT count(*) AS active, max_conn FROM pg_stat_activity,
(SELECT setting::int AS max_conn FROM pg_settings WHERE name='max_connections') s
GROUP BY max_conn;
-- Cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;
-- Slow queries (if pg_stat_statements is enabled)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
### Redis
| Approach | When | How |
|----------|------|-----|
| Increase maxmemory | Evictions happening frequently | Change `--maxmemory` in compose command |
| Move to managed | Need persistence guarantees | AWS ElastiCache / DigitalOcean Managed Redis |
| Add replica | Read-heavy caching | Managed service with read replicas |
### Host Nginx
| Approach | When | How |
|----------|------|-----|
| Tune worker_connections | Connection refused errors | Increase in `/etc/nginx/nginx.conf` |
| Add upstream servers | Multiple backend replicas | upstream block with multiple servers |
| Move to load balancer | Multi-node deployment | Cloud LB (ALB, GCP LB) or HAProxy |
| Add CDN | Static asset latency | Cloudflare, CloudFront, etc. |
---
## Docker Daemon Tuning
These settings are applied on the host in `/etc/docker/daemon.json`:
```json
{
"userland-proxy": false,
"log-driver": "json-file",
"log-opts": {
"max-size": "50m",
"max-file": "3"
},
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 65536,
"Soft": 65536
}
}
}
```
| Setting | Purpose |
|---------|---------|
| `userland-proxy: false` | Kernel-level port forwarding instead of userspace Go proxy (already applied) |
| `log-opts` | Prevents Docker container logs from filling the disk |
| `default-ulimits.nofile` | Raises file descriptor limit for containers handling many connections |
After changing, restart Docker: `sudo systemctl restart docker`
---
## Monitoring with New Relic
New Relic is deployed on the backend via the conditional preload
(`NEW_RELIC_ENABLED=true` in `.env`). Key dashboards to set up:
### Alerts to configure
| Alert | Condition | Priority |
|-------|-----------|----------|
| High error rate | > 1% for 5 minutes | Critical |
| Slow transactions | p95 > 2s for 5 minutes | Critical |
| Apdex score drop | < 0.7 for 10 minutes | Warning |
| Memory usage | > 80% of container limit for 10 minutes | Warning |
| Transaction throughput drop | > 50% decrease vs. baseline | Warning |
### Key transactions to monitor
| Endpoint | Why |
|----------|-----|
| `POST /api/auth/login` | Authentication performance, first thing every user hits |
| `GET /api/journal-entries` | Heaviest read query (double-entry bookkeeping with lines) |
| `POST /api/investment-planning/recommendations` | AI endpoint, 30–180 s response time, external dependency |
| `GET /api/reports/*` | Financial reports with aggregate queries |
| `GET /api/projects` | Includes real-time funding computation across all reserve projects |
### Infrastructure metrics to export
If you later add the New Relic Infrastructure agent to the host VM,
you can correlate application performance with system metrics:
```bash
# Install on the host (not in Docker)
curl -Ls https://download.newrelic.com/install/newrelic-cli/scripts/install.sh | bash
sudo NEW_RELIC_API_KEY=<your-key> NEW_RELIC_ACCOUNT_ID=<your-id> \
/usr/local/bin/newrelic install -n infrastructure-agent-installer
```
This provides host-level CPU, memory, disk, and network metrics alongside
your application telemetry.
---
## Quick Reference — Scaling Decision Tree
```
Is API response time (p95) > 500ms?
├── Yes → Is backend CPU > 80%?
│ ├── Yes → Phase 1: Already at 4 workers?
│ │ ├── Yes → Phase 3: Add backend replicas
│ │ └── No → Raise worker cap in main.ts
│ └── No → Is PostgreSQL slow?
│ ├── Yes → Phase 1: Tune PG memory, or Phase 2: Managed DB
│ └── No → Profile the slow endpoints in New Relic
├── No → Is memory > 80% on any container?
│ ├── Yes → Phase 1: Raise memory limits (you have 21+ GB free)
│ └── No → Is disk > 80%?
│ ├── Yes → Clean Docker images, tune PG WAL retention, add log rotation
│ └── No → No scaling needed
```