fix: health check now probes HTTP directly with 3-min timeout

The previous approach relied on Docker's container health status, but Docker's healthcheck (start_period:30s + 3x15s retries = ~75s) marks the container "unhealthy" before NestJS finishes cold-starting after a fresh image build (New Relic + TypeORM + Redis + BullMQ init can take 2-3 minutes). Changes: - Primary check is now direct wget to localhost:3000/api from the host - Docker health status used only for informational logging - Total timeout increased from 130s to 190s (~3 min) for cold starts - Early exit if container has stopped/exited (no point waiting) - More backend log lines (30 vs 20) shown on failure for diagnostics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs: add Gitea Actions runner setup guide for production server
2026-04-09 09:42:49 -04:00 · 2026-04-09 09:35:49 -04:00 · 2026-04-09 09:22:28 -04:00 · 2026-04-09 09:15:54 -04:00 · 2026-04-09 09:09:24 -04:00 · 2026-04-09 09:05:45 -04:00
4 changed files with 741 additions and 5 deletions
--- a/.gitea/workflows/deploy.yml
+++ b/.gitea/workflows/deploy.yml
@@ -0,0 +1,65 @@
+# ---------------------------------------------------------------------------
+# Production Deployment Workflow for HOA LedgerIQ
+#
+# Trigger: Manual only (workflow_dispatch) — production deploys are intentional.
+# Runner:  Self-hosted on the production server at /opt/hoa-ledgeriq.
+#
+# This workflow does NOT use actions/checkout. The runner operates directly
+# on the production directory. The deploy script itself handles git pull.
+# ---------------------------------------------------------------------------
+
+name: Deploy to Production
+
+on:
+  workflow_dispatch:
+    inputs:
+      seed_existing:
+        description: "Mark existing migrations as applied without running them (first deployment only)"
+        required: false
+        default: "false"
+        type: boolean
+
+jobs:
+  deploy:
+    name: Deploy
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: /opt/hoa-ledgeriq
+
+    steps:
+      - name: Pre-deploy info
+        run: |
+          echo "## Pre-Deploy Info" >> $GITHUB_STEP_SUMMARY
+          echo "- **Server:** $(hostname)" >> $GITHUB_STEP_SUMMARY
+          echo "- **Directory:** $(pwd)" >> $GITHUB_STEP_SUMMARY
+          echo "- **Current commit:** $(git rev-parse --short HEAD)" >> $GITHUB_STEP_SUMMARY
+          echo "- **Branch:** $(git branch --show-current || echo 'detached')" >> $GITHUB_STEP_SUMMARY
+          echo "- **Triggered by:** ${{ github.actor }}" >> $GITHUB_STEP_SUMMARY
+          echo "- **Seed existing:** ${{ inputs.seed_existing }}" >> $GITHUB_STEP_SUMMARY
+          echo "- **Started at:** $(date -Iseconds)" >> $GITHUB_STEP_SUMMARY
+
+      - name: Run deployment
+        run: |
+          DEPLOY_FLAGS=""
+          if [ "${{ inputs.seed_existing }}" = "true" ]; then
+            DEPLOY_FLAGS="--seed-existing"
+          fi
+          bash scripts/deploy-prod.sh $DEPLOY_FLAGS
+        env:
+          TERM: xterm
+
+      - name: Deployment result
+        if: always()
+        run: |
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "## Deployment Result" >> $GITHUB_STEP_SUMMARY
+          if [ "${{ job.status }}" = "success" ]; then
+            echo "- **Status:** Successful" >> $GITHUB_STEP_SUMMARY
+            echo "- **Commit:** $(git rev-parse --short HEAD)" >> $GITHUB_STEP_SUMMARY
+          else
+            echo "- **Status:** FAILED (auto-rollback triggered)" >> $GITHUB_STEP_SUMMARY
+            echo "- **Commit (after rollback):** $(git rev-parse --short HEAD)" >> $GITHUB_STEP_SUMMARY
+            echo "- Check the deploy log on the server for details" >> $GITHUB_STEP_SUMMARY
+          fi
+          echo "- **Completed at:** $(date -Iseconds)" >> $GITHUB_STEP_SUMMARY
--- a/docs/gitea-runner-setup.md
+++ b/docs/gitea-runner-setup.md
@@ -0,0 +1,230 @@
+# Gitea Actions Runner Setup — HOALedgerIQ Production Server
+
+This guide walks through setting up a self-hosted Gitea Actions runner on the production server so the deployment workflow (`.gitea/workflows/deploy.yml`) can execute automatically.
+
+The runner uses **host execution mode** — jobs run directly on the server (not inside Docker containers) so the deploy script has access to Docker, the git repo, and the local filesystem.
+
+---
+
+## Prerequisites
+
+- Ubuntu Linux production server
+- Gitea instance (e.g., `https://git.sensetostyle.com`)
+- Docker and Docker Compose installed on the server
+- The HOALedgerIQ repo cloned at `/opt/hoa-ledgeriq`
+
+---
+
+## Step 1: Enable Actions in Gitea
+
+Ensure Actions are enabled in your Gitea configuration (`/etc/gitea/app.ini`):
+
+```ini
+[actions]
+ENABLED = true
+```
+
+Restart Gitea after making changes:
+
+```bash
+sudo systemctl restart gitea
+```
+
+---
+
+## Step 2: Get a Registration Token
+
+1. Log into your Gitea instance
+2. Navigate to **Site Administration** → **Actions** → **Runners**
+3. Copy the **Registration Token**
+
+> **Tip:** For tighter security, you can get a repo-scoped token instead:
+> Repo → **Settings** → **Actions** → **Runners** → copy the token shown there.
+> This limits the runner to only execute workflows from that specific repository.
+
+---
+
+## Step 3: Install the Act Runner Binary
+
+```bash
+# Download the latest act_runner for x86_64 Linux
+wget https://dl.gitea.com/act_runner/latest/act_runner-linux-amd64
+
+# Make executable and install to system path
+chmod +x act_runner-linux-amd64
+sudo mv act_runner-linux-amd64 /usr/local/bin/act_runner
+
+# Verify installation
+act_runner --version
+```
+
+> For ARM64 servers, use `act_runner-linux-arm64` instead.
+
+---
+
+## Step 4: Generate and Edit the Configuration
+
+```bash
+sudo mkdir -p /etc/act_runner
+act_runner generate-config > /tmp/config.yaml
+```
+
+Edit `/tmp/config.yaml` and set the **labels to use host execution mode**:
+
+```yaml
+runner:
+  labels:
+    - "ubuntu-latest:host"
+    - "ubuntu-22.04:host"
+```
+
+The `:host` suffix tells the runner to execute jobs directly on the server rather than spinning up Docker containers. This is required because the deploy script needs access to:
+
+- The Docker socket (to run `docker compose`)
+- The git repository at `/opt/hoa-ledgeriq`
+- The backup scripts and database
+
+Move the config into place and lock down permissions:
+
+```bash
+sudo mv /tmp/config.yaml /etc/act_runner/config.yaml
+sudo chmod 600 /etc/act_runner/config.yaml
+```
+
+---
+
+## Step 5: Register the Runner
+
+```bash
+act_runner register \
+  --no-interactive \
+  --instance "https://git.sensetostyle.com" \
+  --token "YOUR_REGISTRATION_TOKEN_HERE" \
+  --name "hoaledgeriq-prod" \
+  --labels "ubuntu-latest:host,ubuntu-22.04:host" \
+  --config /etc/act_runner/config.yaml
+```
+
+This creates a `.runner` file in the current directory containing the registration state.
+
+> **Interactive alternative:** Run `act_runner register --config /etc/act_runner/config.yaml` and follow the prompts.
+
+---
+
+## Step 6: Set Up as a Systemd Service
+
+Create the service file at `/etc/systemd/system/act_runner.service`:
+
+```ini
+[Unit]
+Description=Gitea Actions Runner (HOALedgerIQ Prod)
+Documentation=https://docs.gitea.com/usage/actions/act-runner
+After=docker.service network-online.target
+
+[Service]
+Type=simple
+User=root
+WorkingDirectory=/opt/hoa-ledgeriq
+ExecStart=/usr/local/bin/act_runner daemon --config /etc/act_runner/config.yaml
+Restart=always
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=multi-user.target
+```
+
+> **Security note on `User=root`:** The deploy script needs to run `docker compose`, `git reset --hard`, etc. If you have a dedicated deploy user in the `docker` group with write access to `/opt/hoa-ledgeriq`, use that instead. Running as root is the simplest option but grants maximum privileges.
+
+Enable and start the service:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable act_runner
+sudo systemctl start act_runner
+```
+
+---
+
+## Step 7: Verify the Runner Is Online
+
+Check the service is running:
+
+```bash
+sudo systemctl status act_runner
+```
+
+View logs:
+
+```bash
+sudo journalctl -u act_runner -f
+```
+
+Then confirm in Gitea:
+
+1. Go to **Site Administration** → **Actions** → **Runners**
+2. You should see **"hoaledgeriq-prod"** listed with status **Online**
+
+---
+
+## Step 8: Test the Workflow
+
+1. Go to your repo on Gitea → **Actions** tab
+2. Select the **"Deploy to Production"** workflow
+3. Click **Run Workflow**
+4. If this is the first deployment against an existing database, check the **"Mark existing migrations as applied"** box
+5. Monitor the run in the Actions tab
+
+---
+
+## Troubleshooting
+
+### Runner shows as Offline
+
+```bash
+# Check service status and logs
+sudo systemctl status act_runner
+sudo journalctl -u act_runner -n 50
+
+# Verify the instance URL is reachable from the server
+wget -qO- https://git.sensetostyle.com/api/v1/version
+```
+
+### Workflow stuck on "Waiting for runner"
+
+- Verify the runner labels match what the workflow expects. The workflow uses `runs-on: ubuntu-latest` which must match the `ubuntu-latest:host` label.
+- Check the runner is registered at the correct scope (instance-wide, org-level, or repo-level).
+
+### Permission denied errors during deploy
+
+- Ensure the systemd service `User` has Docker access (`usermod -aG docker <user>`)
+- Ensure the user has write access to `/opt/hoa-ledgeriq`
+
+### Re-registering after token expiry
+
+```bash
+sudo systemctl stop act_runner
+# Get a new token from Gitea admin panel, then:
+act_runner register \
+  --no-interactive \
+  --instance "https://git.sensetostyle.com" \
+  --token "NEW_TOKEN_HERE" \
+  --name "hoaledgeriq-prod" \
+  --labels "ubuntu-latest:host,ubuntu-22.04:host" \
+  --config /etc/act_runner/config.yaml
+sudo systemctl start act_runner
+```
+
+---
+
+## Security Best Practices
+
+| Concern | Recommendation |
+|---------|----------------|
+| Runner user | Use a dedicated user with `docker` group access rather than `root` when possible |
+| Registration token | Rotate periodically in the Gitea admin panel |
+| Config file | Keep `/etc/act_runner/config.yaml` at mode `600` (owner-read only) |
+| Runner scope | Register at the **repo level** instead of instance-wide so only this repo can trigger deployments |
+| Workflow triggers | The deploy workflow uses `workflow_dispatch` (manual only) — no automatic triggers on push |
+| Network | Ensure Gitea is accessed over HTTPS with valid SSL certificates |
--- a/scripts/db-backup.sh
+++ b/scripts/db-backup.sh
@@ -21,6 +21,7 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 PROJECT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
 BACKUP_DIR="$PROJECT_DIR/backups"
 KEEP_DAYS=0          # 0 = keep forever
+FORCE_YES=false      # skip interactive confirmations (for automation)
 DB_USER="${POSTGRES_USER:-hoafinance}"
 DB_NAME="${POSTGRES_DB:-hoafinance}"
 COMPOSE_CMD="docker compose"
@@ -51,9 +52,9 @@ ensure_postgres_running() {

 format_size() {
  local bytes=$1
-  if   (( bytes >= 1073741824 )); then printf "%.1f GB" "$(echo "$bytes / 1073741824" | bc -l)"
-  elif (( bytes >= 1048576 ));    then printf "%.1f MB" "$(echo "$bytes / 1048576" | bc -l)"
-  elif (( bytes >= 1024 ));       then printf "%.1f KB" "$(echo "$bytes / 1024" | bc -l)"
+  if   (( bytes >= 1073741824 )); then printf "%d.%d GB" $((bytes / 1073741824)) $(( (bytes % 1073741824) * 10 / 1073741824 ))
+  elif (( bytes >= 1048576 ));    then printf "%d.%d MB" $((bytes / 1048576))    $(( (bytes % 1048576)    * 10 / 1048576 ))
+  elif (( bytes >= 1024 ));       then printf "%d.%d KB" $((bytes / 1024))       $(( (bytes % 1024)       * 10 / 1024 ))
  else printf "%d B" "$bytes"
  fi
 }
@@ -121,8 +122,12 @@ do_restore() {
  warn "This will DESTROY the current '${DB_NAME}' database and replace it"
  warn "with the contents of: $(basename "$file")"
  echo ""
-  read -rp "Type 'yes' to continue: " confirm
-  [ "$confirm" = "yes" ] || { info "Aborted."; exit 0; }
+  if [ "$FORCE_YES" = true ]; then
+    info "Skipping confirmation (--yes flag set)"
+  else
+    read -rp "Type 'yes' to continue: " confirm
+    [ "$confirm" = "yes" ] || { info "Aborted."; exit 0; }
+  fi

  echo ""
  info "Step 1/4 — Terminating active connections ..."
@@ -229,6 +234,7 @@ Usage:
 Options:
  --dir DIR      Backup directory (default: ./backups)
  --keep DAYS    Auto-delete backups older than DAYS (default: keep all)
+  --yes, -y      Skip interactive confirmation prompts (for automation)

 Supported restore formats:
  .dump.gz       Custom-format pg_dump, gzipped (default backup format)
@@ -255,6 +261,7 @@ while [ $# -gt 0 ]; do
  case "$1" in
    --dir)   BACKUP_DIR="$2"; shift 2 ;;
    --keep)  KEEP_DAYS="$2";  shift 2 ;;
+    --yes|-y) FORCE_YES=true; shift ;;
    --help)  usage ;;
    *)
      if [ "$COMMAND" = "restore" ] && [ -z "$RESTORE_FILE" ]; then
--- a/scripts/deploy-prod.sh
+++ b/scripts/deploy-prod.sh
@@ -0,0 +1,434 @@
+#!/usr/bin/env bash
+# ---------------------------------------------------------------------------
+# deploy-prod.sh — Production deployment script for HOA LedgerIQ
+#
+# Usage:
+#   ./scripts/deploy-prod.sh [--seed-existing]
+#
+# This script performs a full production deployment:
+#   1. Takes a pre-upgrade database backup
+#   2. Pulls the latest code from the main branch
+#   3. Rebuilds and restarts Docker containers
+#   4. Runs any pending database migrations (tracked in shared.schema_migrations)
+#   5. Verifies the application is healthy
+#   6. Takes a post-upgrade database backup
+#
+# On failure (migration error or health check), the script automatically:
+#   - Restores the pre-upgrade database backup
+#   - Reverts the code to the previous commit
+#   - Rebuilds containers from the reverted code
+#
+# Flags:
+#   --seed-existing   Mark all existing migration files as applied without
+#                     executing them. Use this ONLY on the first deployment
+#                     against an existing database where migrations were
+#                     previously applied manually.
+#
+# Environment:
+#   PROJECT_DIR        Override the project directory (default: /opt/hoa-ledgeriq)
+#   POSTGRES_USER      Database user (default: hoafinance)
+#   POSTGRES_DB        Database name (default: hoafinance)
+# ---------------------------------------------------------------------------
+
+set -euo pipefail
+
+# ---- Defaults ----
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+PROJECT_DIR="${PROJECT_DIR:-/opt/hoa-ledgeriq}"
+COMPOSE_CMD="docker compose -f $PROJECT_DIR/docker-compose.yml -f $PROJECT_DIR/docker-compose.prod.yml"
+DB_USER="${POSTGRES_USER:-hoafinance}"
+DB_NAME="${POSTGRES_DB:-hoafinance}"
+MIGRATION_DIR="$PROJECT_DIR/db/migrations"
+HEALTH_URL="http://localhost:3000/api"
+HEALTH_RETRIES=36
+HEALTH_INTERVAL=5
+HEALTH_START_WAIT=10
+LOG_DIR="$PROJECT_DIR/logs"
+LOG_FILE="$LOG_DIR/deploy-$(date +%Y%m%d_%H%M%S).log"
+
+# State tracking
+SEED_EXISTING=false
+PREV_COMMIT=""
+BACKUP_FILE=""
+ROLLBACK_NEEDED=false
+DEPLOY_SUCCESS=false
+DEPLOY_START_TIME=""
+
+# ---- Colors ----
+RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; CYAN='\033[0;36m'; NC='\033[0m'
+
+# ---- Logging ----
+log()  { echo -e "$(date -Iseconds) ${CYAN}[DEPLOY]${NC} $*"; }
+ok()   { echo -e "$(date -Iseconds) ${GREEN}[OK]${NC}     $*"; }
+warn() { echo -e "$(date -Iseconds) ${YELLOW}[WARN]${NC}   $*"; }
+err()  { echo -e "$(date -Iseconds) ${RED}[ERROR]${NC}  $*" >&2; }
+die()  { err "$@"; exit 1; }
+
+# ---- Parse flags ----
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --seed-existing) SEED_EXISTING=true; shift ;;
+    --help|-h)
+      head -35 "$0" | tail -33
+      exit 0
+      ;;
+    *) die "Unknown argument: $1" ;;
+  esac
+done
+
+# ---- Setup logging ----
+mkdir -p "$LOG_DIR"
+exec > >(tee -a "$LOG_FILE") 2>&1
+
+# ---- Cleanup / Rollback trap ----
+cleanup() {
+  if [ "$DEPLOY_SUCCESS" = true ]; then
+    return 0
+  fi
+
+  if [ "$ROLLBACK_NEEDED" = true ] && [ -n "$BACKUP_FILE" ]; then
+    echo ""
+    err "=========================================="
+    err "  DEPLOYMENT FAILED — STARTING ROLLBACK"
+    err "=========================================="
+    echo ""
+
+    # Step 1: Restore the pre-upgrade database backup
+    log "Restoring database from pre-upgrade backup: $(basename "$BACKUP_FILE")"
+    if "$SCRIPT_DIR/db-backup.sh" restore --yes "$BACKUP_FILE"; then
+      ok "Database restored successfully"
+    else
+      err "DATABASE RESTORE FAILED — manual intervention required!"
+      err "Backup file: $BACKUP_FILE"
+      exit 1
+    fi
+
+    # Step 2: Revert code to previous commit
+    if [ -n "$PREV_COMMIT" ]; then
+      log "Reverting code to previous commit: $PREV_COMMIT"
+      cd "$PROJECT_DIR"
+      git reset --hard "$PREV_COMMIT"
+      ok "Code reverted to $PREV_COMMIT"
+    fi
+
+    # Step 3: Rebuild containers from old code
+    log "Rebuilding containers from reverted code ..."
+    cd "$PROJECT_DIR"
+    $COMPOSE_CMD up -d --build
+    ok "Containers rebuilt from previous version"
+
+    echo ""
+    err "Rollback complete. The system is restored to the pre-deployment state."
+    err "Review the deploy log for details: $LOG_FILE"
+    exit 1
+  elif [ "$ROLLBACK_NEEDED" = true ]; then
+    err "Rollback needed but no backup file available — manual intervention required!"
+    exit 1
+  fi
+}
+
+trap cleanup EXIT
+
+# ====================================================================
+#  STEP 1: Pre-flight checks
+# ====================================================================
+log "============================================"
+log "  HOA LedgerIQ — Production Deployment"
+log "============================================"
+log "Project directory: $PROJECT_DIR"
+log "Timestamp: $(date -Iseconds)"
+DEPLOY_START_TIME=$(date +%s)
+echo ""
+
+cd "$PROJECT_DIR"
+
+# Verify prerequisites
+command -v git >/dev/null 2>&1 || die "git is not installed"
+command -v docker >/dev/null 2>&1 || die "docker is not installed"
+docker compose version >/dev/null 2>&1 || die "docker compose is not available"
+
+# Verify we're in a git repo
+[ -d ".git" ] || die "$PROJECT_DIR is not a git repository"
+
+# Verify postgres is running
+if ! $COMPOSE_CMD ps postgres 2>/dev/null | grep -q "running\|Up"; then
+  die "PostgreSQL container is not running. Start it with: $COMPOSE_CMD up -d postgres"
+fi
+
+# Store current commit for rollback
+PREV_COMMIT=$(git rev-parse HEAD)
+log "Current commit: $PREV_COMMIT"
+
+# ====================================================================
+#  STEP 2: Pre-upgrade database backup
+# ====================================================================
+echo ""
+log "--- Step 1/6: Pre-upgrade database backup ---"
+
+BACKUP_OUTPUT=$("$SCRIPT_DIR/db-backup.sh" backup 2>&1)
+echo "$BACKUP_OUTPUT"
+
+# Extract the backup file path from the output (strip ANSI color codes first)
+BACKUP_FILE=$(echo "$BACKUP_OUTPUT" | sed 's/\x1b\[[0-9;]*m//g' | grep -oP 'Backup complete: \K\S+' || true)
+
+if [ -z "$BACKUP_FILE" ]; then
+  die "Failed to capture backup file path from db-backup.sh output"
+fi
+
+if [ ! -f "$BACKUP_FILE" ]; then
+  die "Backup file does not exist: $BACKUP_FILE"
+fi
+
+ok "Pre-upgrade backup saved: $(basename "$BACKUP_FILE")"
+
+# From this point forward, rollback is possible
+ROLLBACK_NEEDED=true
+
+# ====================================================================
+#  STEP 3: Pull latest code
+# ====================================================================
+echo ""
+log "--- Step 2/6: Pulling latest code from main ---"
+
+git fetch origin main
+git reset --hard origin/main
+
+NEW_COMMIT=$(git rev-parse HEAD)
+log "Updated to commit: $NEW_COMMIT"
+
+if [ "$PREV_COMMIT" = "$NEW_COMMIT" ]; then
+  warn "No new commits — continuing anyway (migrations or rebuilds may still be needed)"
+fi
+
+# ====================================================================
+#  STEP 4: Rebuild and restart containers
+# ====================================================================
+echo ""
+log "--- Step 3/6: Rebuilding and restarting containers ---"
+
+$COMPOSE_CMD up -d --build
+
+# Wait for postgres to be healthy before running migrations
+log "Waiting for PostgreSQL to be healthy ..."
+PG_RETRIES=30
+PG_COUNT=0
+while [ $PG_COUNT -lt $PG_RETRIES ]; do
+  if $COMPOSE_CMD exec -T postgres pg_isready -U "$DB_USER" -d "$DB_NAME" >/dev/null 2>&1; then
+    ok "PostgreSQL is ready"
+    break
+  fi
+  ((PG_COUNT++))
+  if [ $PG_COUNT -eq $PG_RETRIES ]; then
+    die "PostgreSQL did not become healthy after $((PG_RETRIES * 2))s"
+  fi
+  sleep 2
+done
+
+# ====================================================================
+#  STEP 5: Run database migrations
+# ====================================================================
+echo ""
+log "--- Step 4/6: Running database migrations ---"
+
+# Helper: run SQL via psql in the postgres container
+run_sql() {
+  $COMPOSE_CMD exec -T postgres psql -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 --quiet "$@"
+}
+
+# Step 5a: Ensure the migration tracking table exists
+log "Ensuring shared.schema_migrations table exists ..."
+run_sql <<'SQL'
+CREATE SCHEMA IF NOT EXISTS shared;
+CREATE TABLE IF NOT EXISTS shared.schema_migrations (
+    id SERIAL PRIMARY KEY,
+    filename TEXT NOT NULL UNIQUE,
+    applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    checksum TEXT
+);
+SQL
+ok "Migration tracking table ready"
+
+# Helper: check if a migration has been applied (safe with set -u)
+is_applied() {
+  local key="$1"
+  # Use a subshell test to avoid unbound variable with set -u on empty associative arrays
+  [[ -n "${APPLIED_MIGRATIONS[$key]:-}" ]]
+}
+
+# Step 5b: Get list of already-applied migrations
+declare -A APPLIED_MIGRATIONS=()
+while IFS= read -r fname; do
+  fname=$(echo "$fname" | xargs)  # trim whitespace
+  [ -n "$fname" ] && APPLIED_MIGRATIONS["$fname"]=1
+done < <(run_sql -t -c "SELECT filename FROM shared.schema_migrations ORDER BY filename;" 2>/dev/null || true)
+
+APPLIED_COUNT=${#APPLIED_MIGRATIONS[@]}
+log "Previously applied migrations: $APPLIED_COUNT"
+
+# Step 5c: Scan migration directory for .sql files
+MIGRATION_FILES=()
+if [ -d "$MIGRATION_DIR" ]; then
+  while IFS= read -r f; do
+    MIGRATION_FILES+=("$(basename "$f")")
+  done < <(find "$MIGRATION_DIR" -name "*.sql" -type f | sort)
+fi
+
+TOTAL_MIGRATIONS=${#MIGRATION_FILES[@]}
+log "Total migration files found: $TOTAL_MIGRATIONS"
+
+# Step 5d: Handle --seed-existing (first deployment only)
+if [ "$SEED_EXISTING" = true ]; then
+  if [ "$APPLIED_COUNT" -gt 0 ]; then
+    warn "--seed-existing flag set but $APPLIED_COUNT migrations are already tracked. Skipping seed."
+  else
+    log "Seeding migration tracking table with ${TOTAL_MIGRATIONS} existing migration files ..."
+    for filename in "${MIGRATION_FILES[@]}"; do
+      checksum=$(md5sum "$MIGRATION_DIR/$filename" | awk '{print $1}')
+      run_sql -c "INSERT INTO shared.schema_migrations (filename, checksum) VALUES ('$filename', '$checksum') ON CONFLICT (filename) DO NOTHING;"
+      log "  Seeded: $filename"
+    done
+    ok "All existing migrations marked as applied (not executed)"
+    # Refresh the applied list
+    APPLIED_COUNT=$TOTAL_MIGRATIONS
+    for filename in "${MIGRATION_FILES[@]}"; do
+      APPLIED_MIGRATIONS["$filename"]=1
+    done
+  fi
+fi
+
+# Step 5e: Detect first-run without --seed-existing
+if [ "$APPLIED_COUNT" -eq 0 ] && [ "$TOTAL_MIGRATIONS" -gt 0 ] && [ "$SEED_EXISTING" = false ]; then
+  warn "The migration tracking table is empty but $TOTAL_MIGRATIONS migration files exist."
+  warn "If these migrations were previously applied manually, re-run with --seed-existing"
+  warn "to register them without re-executing. Otherwise, all migrations will be applied."
+  warn ""
+  warn "Continuing in 10 seconds ... (Ctrl+C to abort)"
+  sleep 10
+fi
+
+# Step 5f: Apply pending migrations
+PENDING_COUNT=0
+APPLIED_THIS_RUN=0
+
+for filename in "${MIGRATION_FILES[@]}"; do
+  if is_applied "$filename"; then
+    continue
+  fi
+  ((PENDING_COUNT++))
+done
+
+if [ "$PENDING_COUNT" -eq 0 ]; then
+  ok "No pending migrations to apply"
+else
+  log "$PENDING_COUNT pending migration(s) to apply"
+  echo ""
+
+  for filename in "${MIGRATION_FILES[@]}"; do
+    if is_applied "$filename"; then
+      continue
+    fi
+
+    checksum=$(md5sum "$MIGRATION_DIR/$filename" | awk '{print $1}')
+    log "  Applying: $filename ..."
+
+    # Run the migration in a single transaction with error stopping
+    if cat "$MIGRATION_DIR/$filename" | $COMPOSE_CMD exec -T postgres psql \
+        -U "$DB_USER" \
+        -d "$DB_NAME" \
+        -v ON_ERROR_STOP=1 \
+        --single-transaction \
+        --quiet 2>&1; then
+
+      # Record successful migration
+      run_sql -c "INSERT INTO shared.schema_migrations (filename, checksum) VALUES ('$filename', '$checksum');"
+      ok "  Applied: $filename"
+      ((APPLIED_THIS_RUN++))
+    else
+      err "Migration FAILED: $filename"
+      err "Triggering automatic rollback ..."
+      exit 1  # trap will handle rollback
+    fi
+  done
+
+  echo ""
+  ok "Successfully applied $APPLIED_THIS_RUN migration(s)"
+fi
+
+# ====================================================================
+#  STEP 6: Health check
+# ====================================================================
+echo ""
+log "--- Step 5/6: Verifying application health ---"
+
+# After a fresh image build, NestJS cold-start can take 2-3 minutes:
+#   New Relic init → TypeORM connections → Redis → BullMQ → NestJS bootstrap
+# Docker's own healthcheck (start_period:30s + 3×15s retries = ~75s) is too
+# aggressive and will mark the container "unhealthy" before the app finishes
+# booting. So we do NOT rely on Docker's health status — we probe the HTTP
+# endpoint directly from the host and give it up to ~3 minutes total.
+TOTAL_WAIT=$((HEALTH_START_WAIT + HEALTH_RETRIES * HEALTH_INTERVAL))
+log "Will wait up to ${TOTAL_WAIT}s for backend to respond at $HEALTH_URL ..."
+sleep "$HEALTH_START_WAIT"
+
+HEALTHY=false
+for ((i=1; i<=HEALTH_RETRIES; i++)); do
+  # Direct HTTP check from the host using wget (available on Ubuntu)
+  if wget -qO- --timeout=5 "$HEALTH_URL" >/dev/null 2>&1; then
+    HEALTHY=true
+    break
+  fi
+
+  # Also check Docker's container health for informational logging
+  CONTAINER_HEALTH=$($COMPOSE_CMD ps backend --format '{{.Health}}' 2>/dev/null || echo "unknown")
+
+  # If the container exited or was removed, fail immediately — no point waiting
+  CONTAINER_STATUS=$($COMPOSE_CMD ps backend --format '{{.Status}}' 2>/dev/null || echo "unknown")
+  if echo "$CONTAINER_STATUS" | grep -qi "exit\|dead\|removed"; then
+    err "Backend container has stopped unexpectedly: $CONTAINER_STATUS"
+    break
+  fi
+
+  log "  Health check attempt $i/$HEALTH_RETRIES — docker: ${CONTAINER_HEALTH}, retrying in ${HEALTH_INTERVAL}s ..."
+  sleep "$HEALTH_INTERVAL"
+done
+
+if [ "$HEALTHY" = true ]; then
+  ok "Backend is healthy and responding at $HEALTH_URL"
+else
+  # Log diagnostics before triggering rollback
+  err "Backend failed to respond after ${TOTAL_WAIT}s"
+  warn "Container status: $($COMPOSE_CMD ps backend 2>/dev/null || echo 'unknown')"
+  warn "Recent backend logs:"
+  $COMPOSE_CMD logs --tail=30 backend 2>/dev/null || true
+  err "Triggering automatic rollback ..."
+  exit 1  # trap will handle rollback
+fi
+
+# ====================================================================
+#  STEP 7: Post-upgrade database backup
+# ====================================================================
+echo ""
+log "--- Step 6/6: Post-upgrade database backup ---"
+
+"$SCRIPT_DIR/db-backup.sh" backup
+
+# ====================================================================
+#  Deployment complete
+# ====================================================================
+DEPLOY_SUCCESS=true
+ROLLBACK_NEEDED=false
+
+DEPLOY_END_TIME=$(date +%s)
+DEPLOY_DURATION=$((DEPLOY_END_TIME - DEPLOY_START_TIME))
+
+echo ""
+log "============================================"
+ok "  DEPLOYMENT COMPLETE"
+log "============================================"
+log "  Previous commit : $PREV_COMMIT"
+log "  Current commit  : $NEW_COMMIT"
+log "  Migrations run  : $APPLIED_THIS_RUN"
+log "  Duration        : ${DEPLOY_DURATION}s"
+log "  Log file        : $LOG_FILE"
+log "============================================"
+echo ""
Author	SHA1	Message	Date
olsch01	a025c9e979	fix: health check now probes HTTP directly with 3-min timeout The previous approach relied on Docker's container health status, but Docker's healthcheck (start_period:30s + 3x15s retries = ~75s) marks the container "unhealthy" before NestJS finishes cold-starting after a fresh image build (New Relic + TypeORM + Redis + BullMQ init can take 2-3 minutes). Changes: - Primary check is now direct wget to localhost:3000/api from the host - Docker health status used only for informational logging - Total timeout increased from 130s to 190s (~3 min) for cold starts - Early exit if container has stopped/exited (no point waiting) - More backend log lines (30 vs 20) shown on failure for diagnostics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 09:42:49 -04:00
olsch01	19bd19b0c4	docs: add Gitea Actions runner setup guide for production server Step-by-step guide covering act_runner installation, registration, host execution mode configuration, systemd service setup, and troubleshooting for the HOALedgerIQ production deployment workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 09:35:49 -04:00
olsch01	3e7463cf46	fix: replace curl with Docker health status and wget for health check The health check used curl which is not installed on the prod server. Replace with a dual approach: 1. Primary: check Docker's own container health status (already running via docker-compose.prod.yml healthcheck with wget inside container) 2. Secondary: wget from host as fallback signal Also add diagnostic logging (container status + recent backend logs) before triggering rollback on health check failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 09:22:28 -04:00
olsch01	2aad137bd7	fix: resolve unbound variable error in deploy script migration check The APPLIED_MIGRATIONS associative array triggered "unbound variable" under set -u when empty (first run / seed-existing). Fix by initializing with =() and using a safe helper function with ${:-} default syntax. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 09:15:54 -04:00
olsch01	e06ca74d1d	fix: remove bc dependency from db-backup.sh format_size function Replace bc-based floating point division with pure bash integer arithmetic so the script works on minimal Ubuntu servers without bc installed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 09:09:24 -04:00
olsch01	95c83a57b6	feat: add production deploy script with auto-rollback and Gitea Actions workflow Add automated production deployment pipeline: - scripts/deploy-prod.sh: Full deployment script with pre/post DB backups, migration tracking via shared.schema_migrations table, health checks, and automatic rollback on failure (restores DB, reverts code, rebuilds) - .gitea/workflows/deploy.yml: Manual-trigger Gitea Actions workflow for intentional production deployments with optional --seed-existing flag - scripts/db-backup.sh: Add --yes/-y flag to skip interactive confirmation prompts, enabling automated restore during rollback Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 09:05:45 -04:00