Merge pull request 'fix: remove bc dependency from db-backup.sh format_size function' (#16) from feature-deploy-script into main

Reviewed-on: #16
Merge pull request 'feat: add production deploy script with auto-rollback and Gitea Actions workflow' (#15) from feature-deploy-script into main
2026-04-09 09:10:00 -04:00 · 2026-04-09 09:06:54 -04:00
2 changed files with 17 additions and 271 deletions
--- a/docs/gitea-runner-setup.md
+++ b/docs/gitea-runner-setup.md
@@ -1,230 +0,0 @@
 # Gitea Actions Runner Setup — HOALedgerIQ Production Server
 This guide walks through setting up a self-hosted Gitea Actions runner on the production server so the deployment workflow (`.gitea/workflows/deploy.yml`) can execute automatically.
 The runner uses **host execution mode** — jobs run directly on the server (not inside Docker containers) so the deploy script has access to Docker, the git repo, and the local filesystem.
 ---
 ## Prerequisites
 - Ubuntu Linux production server
 - Gitea instance (e.g., `https://git.sensetostyle.com`)
 - Docker and Docker Compose installed on the server
 - The HOALedgerIQ repo cloned at `/opt/hoa-ledgeriq`
 ---
 ## Step 1: Enable Actions in Gitea
 Ensure Actions are enabled in your Gitea configuration (`/etc/gitea/app.ini`):
 ```ini
 [actions]
 ENABLED = true
 ```
 Restart Gitea after making changes:
 ```bash
 sudo systemctl restart gitea
 ```
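Before restarting, you can confirm the setting is actually present in the file. This is a minimal sketch, not part of Gitea. `actions_enabled` is a hypothetical helper that assumes the plain `KEY = value` layout shown above. It may be stricter than Gitea's own parser: it only recognises the value `true`, ignoring letter case.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the [actions] section of a Gitea app.ini sets ENABLED = true.
# actions_enabled is a hypothetical helper; it assumes plain `KEY = value` lines.
actions_enabled() {
  local ini="${1:-/etc/gitea/app.ini}"
  awk '
    # A section header switches tracking on for [actions] and off for anything else.
    /^[ \t]*\[/ { in_actions = ($0 ~ /^[ \t]*\[actions\][ \t]*$/); next }
    in_actions && /^[ \t]*ENABLED[ \t]*=/ {
      val = $0
      sub(/^[^=]*=[ \t]*/, "", val)   # keep only the text after "="
      sub(/[ \t\r]*$/, "", val)       # drop trailing whitespace or CR
      found = (tolower(val) == "true")
    }
    END { exit (found ? 0 : 1) }
  ' "$ini"
}
```

For example, `actions_enabled /etc/gitea/app.ini && echo enabled` prints `enabled` only when the `[actions]` section contains `ENABLED = true`.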
 ---
 ## Step 2: Get a Registration Token
 1. Log into your Gitea instance
 2. Navigate to **Site Administration** → **Actions** → **Runners**
 3. Copy the **Registration Token**
 > **Tip:** For tighter security, you can get a repo-scoped token instead:
 > Repo → **Settings** → **Actions** → **Runners** → copy the token shown there.
 > This limits the runner to only execute workflows from that specific repository.
 ---
 ## Step 3: Install the Act Runner Binary
 ```bash
 # Download the latest act_runner for x86_64 Linux
 wget https://dl.gitea.com/act_runner/latest/act_runner-linux-amd64
 # Make executable and install to system path
 chmod +x act_runner-linux-amd64
 sudo mv act_runner-linux-amd64 /usr/local/bin/act_runner
 # Verify installation
 act_runner --version
 ```
 > For ARM64 servers, use `act_runner-linux-arm64` instead.
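If you provision servers of both architectures, the asset name can be derived from `uname -m`. This is a sketch. `runner_asset` is a hypothetical helper, and it only knows the two asset names mentioned in this step.

```bash
#!/usr/bin/env bash
# Sketch: pick the act_runner download asset for a machine type.
# runner_asset is a hypothetical helper; it knows only the two assets named above.
runner_asset() {
  local machine="${1:-$(uname -m)}"   # defaults to the current machine
  case "$machine" in
    x86_64|amd64)  echo "act_runner-linux-amd64" ;;
    aarch64|arm64) echo "act_runner-linux-arm64" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}
```

With it, `wget "https://dl.gitea.com/act_runner/latest/$(runner_asset)"` downloads the matching binary from the same URL used above.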
 ---
 ## Step 4: Generate and Edit the Configuration
 ```bash
 sudo mkdir -p /etc/act_runner
 act_runner generate-config > /tmp/config.yaml
 ```
 Edit `/tmp/config.yaml` and set the **labels to use host execution mode**:
 ```yaml
 runner:
   labels:
     - "ubuntu-latest:host"
     - "ubuntu-22.04:host"
 ```
 The `:host` suffix tells the runner to execute jobs directly on the server rather than spinning up Docker containers. This is required because the deploy script needs access to:
 - The Docker socket (to run `docker compose`)
 - The git repository at `/opt/hoa-ledgeriq`
 - The backup scripts and database
 Move the config into place and lock down permissions:
 ```bash
 sudo mv /tmp/config.yaml /etc/act_runner/config.yaml
 sudo chmod 600 /etc/act_runner/config.yaml
 ```
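To confirm the permissions stuck, compare the file's octal mode against `600`. The following minimal sketch uses GNU `stat -c '%a'`, which is the Ubuntu syntax. BSD and macOS `stat` use different flags. `check_mode` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: verify a file has the expected octal permission bits (GNU stat syntax).
# check_mode is a hypothetical helper, not part of act_runner.
check_mode() {
  local file="$1" want="$2" have
  have=$(stat -c '%a' "$file") || return 2   # e.g. "600"
  if [ "$have" != "$want" ]; then
    echo "$file has mode $have, expected $want" >&2
    return 1
  fi
}
```

`check_mode /etc/act_runner/config.yaml 600` is silent on success and reports the actual mode otherwise.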
 ---
 ## Step 5: Register the Runner
 ```bash
 act_runner register \
  --no-interactive \
  --instance "https://git.sensetostyle.com" \
  --token "YOUR_REGISTRATION_TOKEN_HERE" \
  --name "hoaledgeriq-prod" \
  --labels "ubuntu-latest:host,ubuntu-22.04:host" \
  --config /etc/act_runner/config.yaml
 ```
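 This writes the registration state to a `.runner` file in the current working directory. With the config's default `runner.file: .runner`, the daemon resolves that path against its own working directory (`WorkingDirectory=/opt/hoa-ledgeriq` in Step 6), so run the registration from `/opt/hoa-ledgeriq` or move `.runner` there afterwards.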
 > **Interactive alternative:** Run `act_runner register --config /etc/act_runner/config.yaml` and follow the prompts.
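A quick way to catch a misplaced registration file before starting the service is sketched below. It assumes the generated config keeps the default `runner.file: .runner`, which is resolved against the daemon's working directory (`/opt/hoa-ledgeriq` in the Step 6 unit). `runner_registered` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check that the registration file sits in the daemon's working directory.
# Assumes runner.file keeps its default relative value ".runner";
# runner_registered is a hypothetical helper.
runner_registered() {
  local workdir="${1:-/opt/hoa-ledgeriq}"
  if [ -s "$workdir/.runner" ]; then
    echo "found $workdir/.runner"
  else
    echo "no .runner in $workdir; register from that directory first" >&2
    return 1
  fi
}
```

Running `runner_registered` with no argument checks `/opt/hoa-ledgeriq`.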
 ---
 ## Step 6: Set Up as a Systemd Service
 Create the service file at `/etc/systemd/system/act_runner.service`:
 ```ini
 [Unit]
 Description=Gitea Actions Runner (HOALedgerIQ Prod)
 Documentation=https://docs.gitea.com/usage/actions/act-runner
 Wants=network-online.target
 After=docker.service network-online.target
 [Service]
 Type=simple
 User=root
 WorkingDirectory=/opt/hoa-ledgeriq
 ExecStart=/usr/local/bin/act_runner daemon --config /etc/act_runner/config.yaml
 Restart=always
 RestartSec=10
 StandardOutput=journal
 StandardError=journal
 [Install]
 WantedBy=multi-user.target
 ```
 > **Security note on `User=root`:** The deploy script needs to run `docker compose`, `git reset --hard`, etc. If you have a dedicated deploy user in the `docker` group with write access to `/opt/hoa-ledgeriq`, use that instead. Running as root is the simplest option but grants maximum privileges.
 Enable and start the service:
 ```bash
 sudo systemctl daemon-reload
 sudo systemctl enable act_runner
 sudo systemctl start act_runner
 ```
 ---
 ## Step 7: Verify the Runner Is Online
 Check the service is running:
 ```bash
 sudo systemctl status act_runner
 ```
 View logs:
 ```bash
 sudo journalctl -u act_runner -f
 ```
 Then confirm in Gitea:
 1. Go to **Site Administration** → **Actions** → **Runners**
 2. You should see **"hoaledgeriq-prod"** listed with status **Online**
 ---
 ## Step 8: Test the Workflow
 1. Go to your repo on Gitea → **Actions** tab
 2. Select the **"Deploy to Production"** workflow
 3. Click **Run Workflow**
 4. If this is the first deployment against an existing database, check the **"Mark existing migrations as applied"** box
 5. Monitor the run in the Actions tab
 ---
 ## Troubleshooting
 ### Runner shows as Offline
 ```bash
 # Check service status and logs
 sudo systemctl status act_runner
 sudo journalctl -u act_runner -n 50
 # Verify the instance URL is reachable from the server
 wget -qO- https://git.sensetostyle.com/api/v1/version
 ```
 ### Workflow stuck on "Waiting for runner"
 - Verify the runner labels match what the workflow expects. The workflow uses `runs-on: ubuntu-latest` which must match the `ubuntu-latest:host` label.
 - Check the runner is registered at the correct scope (instance-wide, org-level, or repo-level).
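The matching rule can be expressed as a small check. A job is picked up only when its `runs-on` value equals the name part (before the first `:`) of one of the runner's labels. This sketch assumes the comma-separated, space-free format passed to `--labels` above. `label_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: does a workflow's runs-on value match any registered runner label?
# label_matches is a hypothetical helper; labels use the "name:mode" form,
# and only the part before the first ":" is compared.
label_matches() {
  local runs_on="$1" label
  local -a labels
  IFS=',' read -r -a labels <<< "$2"
  for label in "${labels[@]}"; do
    [ "${label%%:*}" = "$runs_on" ] && return 0
  done
  return 1
}
```

For example, `label_matches ubuntu-latest "ubuntu-latest:host,ubuntu-22.04:host"` succeeds, while `ubuntu-20.04` would not match.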
 ### Permission denied errors during deploy
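 - Ensure the systemd service `User` has Docker access (`sudo usermod -aG docker <user>`), then restart the runner (`sudo systemctl restart act_runner`) so the service picks up the new group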
 - Ensure the user has write access to `/opt/hoa-ledgeriq`
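To check group membership before or after running `usermod`, you can use a sketch like the one below. `in_group` is a hypothetical helper. `id -nG <user>` reads the group database, while a running process keeps the groups it started with, so the service only gains a new group after it restarts.

```bash
#!/usr/bin/env bash
# Sketch: check whether a user belongs to a group (e.g. "docker").
# in_group is a hypothetical helper; it reads the group database via `id -nG`.
in_group() {
  local user="$1" group="$2" g
  for g in $(id -nG "$user"); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}
```

For example, `in_group deploy docker || sudo usermod -aG docker deploy` adds the group only when it is missing.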
 ### Re-registering after token expiry
 ```bash
 sudo systemctl stop act_runner
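 # Get a new token from the Gitea admin panel, then register from the service's
 # WorkingDirectory so the new .runner file replaces the one the daemon reads:
 cd /opt/hoa-ledgeriq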
 act_runner register \
  --no-interactive \
  --instance "https://git.sensetostyle.com" \
  --token "NEW_TOKEN_HERE" \
  --name "hoaledgeriq-prod" \
  --labels "ubuntu-latest:host,ubuntu-22.04:host" \
  --config /etc/act_runner/config.yaml
 sudo systemctl start act_runner
 ```
 ---
 ## Security Best Practices
 | Concern | Recommendation |
 |---------|----------------|
 | Runner user | Use a dedicated user with `docker` group access rather than `root` when possible |
 | Registration token | Rotate periodically in the Gitea admin panel |
 | Config file | Keep `/etc/act_runner/config.yaml` at mode `600` (read/write for the owner only) |
 | Runner scope | Register at the **repo level** instead of instance-wide so only this repo can trigger deployments |
 | Workflow triggers | The deploy workflow uses `workflow_dispatch` (manual only) — no automatic triggers on push |
 | Network | Ensure Gitea is accessed over HTTPS with valid SSL certificates |
--- a/scripts/deploy-prod.sh
+++ b/scripts/deploy-prod.sh
@@ -40,9 +40,9 @@ DB_USER="${POSTGRES_USER:-hoafinance}"
 DB_NAME="${POSTGRES_DB:-hoafinance}"
 MIGRATION_DIR="$PROJECT_DIR/db/migrations"
 HEALTH_URL="http://localhost:3000/api"
-HEALTH_RETRIES=36
+HEALTH_RETRIES=20
 HEALTH_INTERVAL=5
-HEALTH_START_WAIT=10
+HEALTH_START_WAIT=30
 LOG_DIR="$PROJECT_DIR/logs"
 LOG_FILE="$LOG_DIR/deploy-$(date +%Y%m%d_%H%M%S).log"
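These three settings determine how long Step 5 sleeps before triggering a rollback. The total is `HEALTH_START_WAIT + HEALTH_RETRIES * HEALTH_INTERVAL` seconds, plus however long each probe itself takes. With the removed (`-`) values that is 10 + 36 * 5 = 190 s. With the added (`+`) values it is 30 + 20 * 5 = 130 s, which is shorter than both the "~3 minutes" and the 2-3 minute cold start mentioned in the Step 5 comment. A sketch of the calculation, using a hypothetical `total_wait` helper:

```bash
#!/usr/bin/env bash
# Sketch: worst-case sleep budget before rollback, mirroring TOTAL_WAIT.
# total_wait is a hypothetical helper; probe durations are not included.
total_wait() {
  local start_wait="$1" retries="$2" interval="$3"
  echo $((start_wait + retries * interval))
}

echo "before: $(total_wait 10 36 5)s"   # HEALTH_START_WAIT=10, HEALTH_RETRIES=36
echo "after:  $(total_wait 30 20 5)s"   # HEALTH_START_WAIT=30, HEALTH_RETRIES=20
```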
@@ -248,19 +248,12 @@ CREATE TABLE IF NOT EXISTS shared.schema_migrations (
 SQL
 ok "Migration tracking table ready"
 # Helper: check if a migration has been applied (safe with set -u)
 is_applied() {
  local key="$1"
  # ${var:-} expands a missing key to an empty string, so set -u does not abort on empty associative arrays
  [[ -n "${APPLIED_MIGRATIONS[$key]:-}" ]]
 }
 # Step 5b: Get list of already-applied migrations
-declare -A APPLIED_MIGRATIONS=()
+declare -A APPLIED_MIGRATIONS
 while IFS= read -r fname; do
  fname=$(echo "$fname" | xargs)  # trim whitespace
  [ -n "$fname" ] && APPLIED_MIGRATIONS["$fname"]=1
-done < <(run_sql -t -c "SELECT filename FROM shared.schema_migrations ORDER BY filename;" 2>/dev/null || true)
+done < <(run_sql -t -c "SELECT filename FROM shared.schema_migrations ORDER BY filename;")
 APPLIED_COUNT=${#APPLIED_MIGRATIONS[@]}
 log "Previously applied migrations: $APPLIED_COUNT"
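The diff switches membership tests from the `is_applied` helper to `${APPLIED_MIGRATIONS[$filename]+x}`. Below is a standalone sketch of the same idea. It assumes bash 4+ for associative arrays, initialises the array with `=()` so it exists even when no rows come back, and trims each `psql -t` line with parameter expansion instead of the diff's per-line `echo | xargs`. `APPLIED`, `load_applied` and `migration_applied` are illustrative names, not the script's.

```bash
#!/usr/bin/env bash
# Sketch: load applied migration names and test membership safely under `set -u`.
# Requires bash 4+ for associative arrays. All names here are illustrative.
set -u
declare -A APPLIED=()   # explicitly initialised, so it exists even with no rows

# Read one filename per line (e.g. `psql -t` output) and trim surrounding
# whitespace with parameter expansion instead of spawning `xargs` per line.
load_applied() {
  local line
  while IFS= read -r line; do
    line="${line#"${line%%[![:space:]]*}"}"   # drop leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # drop trailing whitespace
    [ -n "$line" ] && APPLIED["$line"]=1
  done
  return 0   # the loop's status would otherwise come from the last `[ -n ]` test
}

# "${arr[key]+x}" expands to "x" only when the key exists, so a missing key
# never triggers an unbound-variable error.
migration_applied() {
  [ -n "${APPLIED[$1]+x}" ]
}

load_applied < <(printf ' 001_init.sql\n002_accounts.sql  \n\n')
echo "applied: ${#APPLIED[@]}"
```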
@@ -311,7 +304,7 @@ PENDING_COUNT=0
 APPLIED_THIS_RUN=0
 for filename in "${MIGRATION_FILES[@]}"; do
-  if is_applied "$filename"; then
+  if [ -n "${APPLIED_MIGRATIONS[$filename]+x}" ]; then
    continue
  fi
  ((PENDING_COUNT++))
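The context line `((PENDING_COUNT++))` deserves a note. The diff does not show whether `deploy-prod.sh` enables `set -e`. If it does, the post-increment returns exit status 1 when the counter is 0, because the expression evaluates to the old value. That would abort the script on the first pending migration. Below is a sketch of a safe alternative with hypothetical migration names.

```bash
#!/usr/bin/env bash
# Sketch: count pending migrations safely when the script runs under `set -e`.
# Migration names are hypothetical placeholders.
set -euo pipefail

declare -A APPLIED=([001_init.sql]=1)
FILES=(001_init.sql 002_accounts.sql 003_ledger.sql)

PENDING=0
for f in "${FILES[@]}"; do
  if [ -n "${APPLIED[$f]+x}" ]; then
    continue
  fi
  # ((PENDING++)) evaluates to the old value; when that is 0 the command
  # returns status 1 and `set -e` would exit. A plain assignment always succeeds.
  PENDING=$((PENDING + 1))
done

echo "pending: $PENDING"
```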
@@ -324,7 +317,7 @@ else
  echo ""
  for filename in "${MIGRATION_FILES[@]}"; do
-    if is_applied "$filename"; then
+    if [ -n "${APPLIED_MIGRATIONS[$filename]+x}" ]; then
      continue
    fi
@@ -359,51 +352,34 @@ fi
 # ====================================================================
 echo ""
 log "--- Step 5/6: Verifying application health ---"
-
+log "Waiting ${HEALTH_START_WAIT}s for backend to initialize ..."
 # After a fresh image build, NestJS cold-start can take 2-3 minutes:
 #   New Relic init → TypeORM connections → Redis → BullMQ → NestJS bootstrap
 # Docker's own healthcheck (start_period:30s + 3×15s retries = ~75s) is too
 # aggressive and will mark the container "unhealthy" before the app finishes
 # booting. So we do NOT rely on Docker's health status — we probe the HTTP
 # endpoint directly from the host and give it up to ~3 minutes total.
 TOTAL_WAIT=$((HEALTH_START_WAIT + HEALTH_RETRIES * HEALTH_INTERVAL))
 log "Will wait up to ${TOTAL_WAIT}s for backend to respond at $HEALTH_URL ..."
 sleep "$HEALTH_START_WAIT"
 HEALTHY=false
 for ((i=1; i<=HEALTH_RETRIES; i++)); do
-  # Direct HTTP check from the host using wget (available on Ubuntu)
+  if curl -sf "$HEALTH_URL" >/dev/null 2>&1; then
  if wget -qO- --timeout=5 "$HEALTH_URL" >/dev/null 2>&1; then
    HEALTHY=true
    break
  fi
-
+  log "  Health check attempt $i/$HEALTH_RETRIES failed, retrying in ${HEALTH_INTERVAL}s ..."
  # Also check Docker's container health for informational logging
  CONTAINER_HEALTH=$($COMPOSE_CMD ps backend --format '{{.Health}}' 2>/dev/null || echo "unknown")
  # If the container exited or was removed, fail immediately — no point waiting
  CONTAINER_STATUS=$($COMPOSE_CMD ps backend --format '{{.Status}}' 2>/dev/null || echo "unknown")
  if echo "$CONTAINER_STATUS" | grep -qi "exit\|dead\|removed"; then
    err "Backend container has stopped unexpectedly: $CONTAINER_STATUS"
    break
  fi
  log "  Health check attempt $i/$HEALTH_RETRIES — docker: ${CONTAINER_HEALTH}, retrying in ${HEALTH_INTERVAL}s ..."
  sleep "$HEALTH_INTERVAL"
 done
 if [ "$HEALTHY" = true ]; then
  ok "Backend is healthy and responding at $HEALTH_URL"
 else
-  # Log diagnostics before triggering rollback
+  err "Backend failed to respond after $((HEALTH_START_WAIT + HEALTH_RETRIES * HEALTH_INTERVAL))s"
  err "Backend failed to respond after ${TOTAL_WAIT}s"
  warn "Container status: $($COMPOSE_CMD ps backend 2>/dev/null || echo 'unknown')"
  warn "Recent backend logs:"
  $COMPOSE_CMD logs --tail=30 backend 2>/dev/null || true
  err "Triggering automatic rollback ..."
  exit 1  # trap will handle rollback
 fi
 # Also verify the container reports healthy via Docker
 if $COMPOSE_CMD ps backend 2>/dev/null | grep -q "(healthy)"; then
  ok "Backend container health check: healthy"
 else
  warn "Backend container health status is not 'healthy' yet (may still be within start_period)"
 fi
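The health-check logic above can be split into small, testable helpers. This is a sketch, not the script's code. `poll_until`, `container_stopped` and `docker_health_is_healthy` are hypothetical names.

Two design choices differ from the diff:
- The status match uses `grep -E`, because `\|` alternation in a basic regex is a GNU extension.
- Empty status output is treated as stopped, because Compose v2's `ps` may list only running containers unless `--all` is given.

The exact string comparison also avoids the substring pitfall in which matching "healthy" accepts "unhealthy".

```bash
#!/usr/bin/env bash
# Sketch of the Step 5 health-check pieces as small, testable helpers.
# poll_until, container_stopped and docker_health_is_healthy are hypothetical names.

# Run a probe command up to $1 times, sleeping $2 seconds after each failure
# (like the diff's loop, which also sleeps after the final failed attempt).
poll_until() {
  local retries="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= retries; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$retries failed, retrying in ${interval}s" >&2
    sleep "$interval"
  done
  return 1
}

# Empty output counts as stopped, since `docker compose ps` may omit exited
# containers; otherwise match the status text with a portable extended regex.
container_stopped() {
  local status="$1"
  [ -z "$status" ] || printf '%s\n' "$status" | grep -Eqi 'exit|dead|removed'
}

# Exact comparison of the {{.Health}} field: a substring match on "healthy"
# would also accept "unhealthy".
docker_health_is_healthy() {
  [ "$1" = "healthy" ]
}
```

For example, `poll_until "$HEALTH_RETRIES" "$HEALTH_INTERVAL" curl -sf --max-time 5 "$HEALTH_URL"` mirrors the diff's probe. It also bounds each request with curl's `--max-time`.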
 # ====================================================================
 #  STEP 7: Post-upgrade database backup
 # ====================================================================
Author	SHA1	Message	Date
JoeBot	f5bea7cdc2	Merge pull request 'fix: remove bc dependency from db-backup.sh format_size function' (#16) from feature-deploy-script into main Reviewed-on: #16	2026-04-09 09:10:00 -04:00
JoeBot	5144da4680	Merge pull request 'feat: add production deploy script with auto-rollback and Gitea Actions workflow' (#15) from feature-deploy-script into main Reviewed-on: #15	2026-04-09 09:06:54 -04:00