Courier MFT

Deployment & Infrastructure

Deploy Courier with Docker, Aspire, or bare metal. CI/CD pipeline configuration.

Courier is deployed to Azure Container Apps across four environments. Docker images are built and pushed via GitHub Actions. Local development uses .NET Aspire for service orchestration.

14.1 Environments

| Environment | Purpose | Infrastructure | Database | Key Vault | Deployment Trigger |
| --- | --- | --- | --- | --- | --- |
| Local | Developer workstation | .NET Aspire + Docker Compose | PostgreSQL container (Testcontainers or Aspire) | Azure CLI credentials to shared dev Key Vault | Manual (dotnet run / aspire run) |
| Dev | Integration testing, feature branch validation | Azure Container Apps (single replica each) | Azure PG Flex (Burstable tier, 1 vCore) | Shared dev Key Vault | Push to main or manually triggered |
| Staging | Pre-production validation, QA, performance testing | Azure Container Apps (mirrors prod replica count) | Azure PG Flex (General Purpose, mirrors prod tier) | Staging Key Vault (separate from dev/prod) | Promotion from Dev (manual approval gate) |
| Production | Live system | Azure Container Apps (scaled per 14.4) | Azure PG Flex (General Purpose, HA enabled) | Production Key Vault (FIPS 140-2 Level 2+) | Promotion from Staging (manual approval gate) |

Environment isolation: Each deployed environment (Dev, Staging, Production) has its own Azure resource group, Container Apps Environment, PostgreSQL instance, and Key Vault. Staging and Production share no runtime infrastructure; the only cross-environment resource is the Azure Container Registry (see 14.11).

14.2 Docker Images

Three Docker images are built from the repository:

14.2.1 API Host

# Courier.Api.Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src

# Copy solution and restore
COPY Courier.sln .
COPY src/Courier.Api/*.csproj src/Courier.Api/
COPY src/Courier.Features/*.csproj src/Courier.Features/
COPY src/Courier.Infrastructure/*.csproj src/Courier.Infrastructure/
COPY src/Courier.Domain/*.csproj src/Courier.Domain/
RUN dotnet restore src/Courier.Api/Courier.Api.csproj

# Copy source and publish
COPY src/ src/
RUN dotnet publish src/Courier.Api/Courier.Api.csproj \
    -c Release -o /app --no-restore

# Runtime image
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
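WORKDIR /app

# curl is required by the HEALTHCHECK below; the aspnet base image does not ship it
RUN apt-get update && apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
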
# FIPS: Base image must include OpenSSL 3.x with FIPS provider (fips.so)
# installed and self-tested. See Section 12.10.2 for full requirements.
# This config file activates the provider — but the module must already exist.
COPY infra/docker/openssl-fips.cnf /etc/ssl/openssl.cnf
ENV OPENSSL_CONF=/etc/ssl/openssl.cnf
# Build-time validation (non-blocking — logged as warning if unavailable)
RUN openssl list -providers 2>/dev/null | grep -q "FIPS" \
    && echo "FIPS provider: available" \
    || echo "WARNING: FIPS provider not found in base image — see Section 12.10.2"

COPY --from=build /app .
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
    CMD curl -f http://localhost:8080/health || exit 1
ENTRYPOINT ["dotnet", "Courier.Api.dll"]

14.2.2 Worker Host

# Courier.Worker.Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src

COPY Courier.sln .
COPY src/Courier.Worker/*.csproj src/Courier.Worker/
COPY src/Courier.Features/*.csproj src/Courier.Features/
COPY src/Courier.Infrastructure/*.csproj src/Courier.Infrastructure/
COPY src/Courier.Domain/*.csproj src/Courier.Domain/
RUN dotnet restore src/Courier.Worker/Courier.Worker.csproj

COPY src/ src/
RUN dotnet publish src/Courier.Worker/Courier.Worker.csproj \
    -c Release -o /app --no-restore

FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
WORKDIR /app

# 7z CLI for archive operations
RUN apt-get update && apt-get install -y --no-install-recommends p7zip-full curl && \
    rm -rf /var/lib/apt/lists/*

# FIPS: Same requirements as API host — see Section 12.10.2
COPY infra/docker/openssl-fips.cnf /etc/ssl/openssl.cnf
ENV OPENSSL_CONF=/etc/ssl/openssl.cnf
RUN openssl list -providers 2>/dev/null | grep -q "FIPS" \
    && echo "FIPS provider: available" \
    || echo "WARNING: FIPS provider not found in base image — see Section 12.10.2"

# Temp directory for job execution (mount volume in production)
RUN mkdir -p /data/courier/temp && chown -R app:app /data/courier
VOLUME ["/data/courier/temp"]

COPY --from=build /app .
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
    CMD curl -f http://localhost:8081/health || exit 1
ENTRYPOINT ["dotnet", "Courier.Worker.dll"]

14.2.3 Frontend

Updated 2026-03-15: Changed from static export + nginx to standalone Node.js server. See Section 11.2 for rationale.

# Courier.Frontend.Dockerfile — Next.js Standalone
FROM node:22-alpine AS deps
WORKDIR /app
COPY src/Courier.Frontend/package.json src/Courier.Frontend/package-lock.json ./
RUN npm ci

FROM node:22-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY src/Courier.Frontend/ .

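# Build-time public configuration. Names match the --build-arg flags passed in deploy.yml (14.5.3).
ARG NEXT_PUBLIC_API_BASE_URL=http://localhost:5000/api/v1
ARG NEXT_PUBLIC_ENTRA_CLIENT_ID
ARG NEXT_PUBLIC_ENTRA_TENANT_ID
ARG NEXT_PUBLIC_REDIRECT_URI
ENV NEXT_PUBLIC_API_BASE_URL=${NEXT_PUBLIC_API_BASE_URL} \
    NEXT_PUBLIC_ENTRA_CLIENT_ID=${NEXT_PUBLIC_ENTRA_CLIENT_ID} \
    NEXT_PUBLIC_ENTRA_TENANT_ID=${NEXT_PUBLIC_ENTRA_TENANT_ID} \
    NEXT_PUBLIC_REDIRECT_URI=${NEXT_PUBLIC_REDIRECT_URI}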

RUN npm run build    # next build → standalone server in .next/standalone/

FROM node:22-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production PORT=3000 HOSTNAME=0.0.0.0

COPY --from=build /app/.next/standalone ./
COPY --from=build /app/.next/static ./.next/static
COPY --from=build /app/public ./public

EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD wget -q --spider http://localhost:3000 || exit 1
CMD ["node", "server.js"]

The standalone output produces a self-contained Node.js server (~150MB image) that handles all routing natively, so no nginx or SPA fallback configuration is required. NEXT_PUBLIC_* values (the API URL and the Entra ID settings) are inlined into the JS bundle at build time, so the frontend image must be rebuilt for each environment.

14.3 Azure Container Apps Configuration

Each environment deploys three Container Apps within a shared Container Apps Environment connected to a VNet.

14.3.1 Container App Definitions

# infra/container-apps/api.yaml
name: courier-api
properties:
  configuration:
    activeRevisionsMode: Single
    ingress:
      external: false          # Internal only — fronted by API gateway / Front Door
      targetPort: 8080
      transport: http
    secrets:
      - name: db-connection-string
        keyVaultUrl: https://{vault}.vault.azure.net/secrets/db-connection-string
        identity: system
      - name: appinsights-connection-string
        keyVaultUrl: https://{vault}.vault.azure.net/secrets/appinsights-connection-string
        identity: system
  template:
    containers:
      - name: courier-api
        image: couriercr.azurecr.io/courier-api:{tag}
        resources:
          cpu: 1.0
          memory: 2Gi
        env:
          - name: ConnectionStrings__CourierDb
            secretRef: db-connection-string
          - name: KeyVault__Uri
            value: https://{vault}.vault.azure.net
          - name: ApplicationInsights__ConnectionString
            secretRef: appinsights-connection-string
          - name: ASPNETCORE_ENVIRONMENT
            value: Production
        probes:
          - type: Liveness
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          - type: Readiness
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
    scale:
      minReplicas: 2
      maxReplicas: 6
      rules:
        - name: http-scaling
          http:
            metadata:
              concurrentRequests: "50"

# infra/container-apps/worker.yaml
name: courier-worker
properties:
  configuration:
    activeRevisionsMode: Single
    # No ingress — worker has no external HTTP traffic
    secrets:
      - name: db-connection-string
        keyVaultUrl: https://{vault}.vault.azure.net/secrets/db-connection-string
        identity: system
      - name: appinsights-connection-string
        keyVaultUrl: https://{vault}.vault.azure.net/secrets/appinsights-connection-string
        identity: system
  template:
    containers:
      - name: courier-worker
        image: couriercr.azurecr.io/courier-worker:{tag}
        resources:
          cpu: 2.0
          memory: 4Gi
        env:
          - name: ConnectionStrings__CourierDb
            secretRef: db-connection-string
          - name: KeyVault__Uri
            value: https://{vault}.vault.azure.net
          - name: ApplicationInsights__ConnectionString
            secretRef: appinsights-connection-string
          - name: DOTNET_ENVIRONMENT
            value: Production
        volumeMounts:
          - volumeName: temp-storage
            mountPath: /data/courier/temp
        probes:
          - type: Liveness
            httpGet:
              path: /health
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 30
    volumes:
      - name: temp-storage
        storageType: AzureFile
        storageName: courier-temp
    scale:
      minReplicas: 1
      maxReplicas: 1          # Single instance in V1 (see Section 15)

# infra/container-apps/frontend.yaml
name: courier-frontend
properties:
  configuration:
    activeRevisionsMode: Single
    ingress:
      external: true          # User-facing
      targetPort: 3000        # Next.js standalone server port (see 14.2.3)
      transport: http
  template:
    containers:
      - name: courier-frontend
        image: couriercr.azurecr.io/courier-frontend:{tag}
        resources:
          cpu: 0.25
          memory: 0.5Gi
    scale:
      minReplicas: 2
      maxReplicas: 4
      rules:
        - name: http-scaling
          http:
            metadata:
              concurrentRequests: "100"

14.3.2 Networking

┌──────────────────────────────────────────────────────────────┐
│                    Azure Container Apps Environment           │
│                    (VNet integrated)                          │
│                                                              │
│  ┌─────────────────┐                                         │
│  │  courier-frontend│◄──── Azure Front Door / CDN (HTTPS)    │
│  │  (external)      │      TLS termination, WAF              │
│  └────────┬────────┘                                         │
│           │ internal                                         │
│  ┌────────▼────────┐                                         │
│  │  courier-api     │◄──── Frontend calls via internal FQDN  │
│  │  (internal)      │      http://courier-api.internal.{env}  │
│  └────────┬────────┘                                         │
│           │                                                  │
│  ┌────────┴────────┐                                         │
│  │  courier-worker  │      No ingress — outbound only        │
│  │  (no ingress)    │                                         │
│  └─────────────────┘                                         │
│                                                              │
└──────────┬───────────────────┬───────────────────┬──────────┘
           │                   │                   │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │  PostgreSQL  │    │  Key Vault   │    │  Partner    │
    │  Flex Server │    │  (private    │    │  SFTP/FTP   │
    │  (private    │    │   endpoint)  │    │  servers    │
    │   endpoint)  │    │              │    │  (outbound  │
    │              │    │              │    │   NSG rules) │
    └─────────────┘    └─────────────┘    └─────────────┘

  • PostgreSQL: Accessible only via private endpoint within the VNet. No public access.
  • Key Vault: Accessible via private endpoint or service endpoint within the VNet.
  • Partner servers: Outbound connections are allowed through NSG rules with per-partner destination IP allowlists.
  • Container Apps internal: The API host has internal-only ingress. The frontend proxies to the API via the Container Apps Environment internal DNS.
  • External access: Only the frontend has external ingress, fronted by Azure Front Door for TLS termination, WAF, and DDoS protection.

14.3.3 Managed Identity

All three Container Apps use system-assigned managed identities for Azure resource access:

| Resource | Access Method |
| --- | --- |
| Azure Key Vault | Managed Identity with Key Vault Secrets User + Key Vault Crypto User roles |
| Azure Container Registry | Managed Identity with AcrPull role |
| Azure Blob Storage (archives) | Managed Identity with Storage Blob Data Contributor role |
| Application Insights | Connection string from Key Vault (no identity needed) |

No service principal passwords or connection strings on disk. DefaultAzureCredential in the .NET applications resolves to managed identity automatically in Container Apps.
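
The role assignments in the table above could be scripted roughly as follows. This is a minimal sketch, not the project's provisioning code: the function names are illustrative, the principal ID and resource IDs are supplied by the caller, and the script only prints the `az` commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: grant one Container App's managed identity the roles listed in the
# table above. Prints the az commands by default; set DRY_RUN=0 to run them.

# run <command...>: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# grant_role <principal-id> <role-name> <scope-resource-id>
grant_role() {
  run az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "$2" \
    --scope "$3"
}

# grant_courier_roles <principal-id> <vault-id> <registry-id> <storage-id>
# Resource IDs come from the caller; nothing is hard-coded here.
grant_courier_roles() {
  local principal="$1" vault="$2" registry="$3" storage="$4"
  grant_role "$principal" "Key Vault Secrets User" "$vault"
  grant_role "$principal" "Key Vault Crypto User" "$vault"
  grant_role "$principal" "AcrPull" "$registry"
  grant_role "$principal" "Storage Blob Data Contributor" "$storage"
}
```

Running the dry-run output through review before executing it keeps role grants auditable. A frontend app that reads no secrets would likely need only the AcrPull assignment.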

14.4 Scaling Strategy

| Container App | Min Replicas | Max Replicas | Scaling Rule | Rationale |
| --- | --- | --- | --- | --- |
| courier-api | 2 | 6 | 50 concurrent requests per replica | Stateless; scales horizontally. Two minimum for availability. |
| courier-worker | 1 | 1 | None (fixed) | V1 single-instance. Quartz AdoJobStore supports clustered mode for V2. |
| courier-frontend | 2 | 4 | 100 concurrent requests per replica | Lightweight Next.js standalone server. Two minimum for availability. |

Worker single-instance constraint: The Worker host runs Quartz.NET, file monitors, and maintenance jobs. In V1, a single instance simplifies work claiming. FOR UPDATE SKIP LOCKED (Section 5.8) already prevents duplicate pickup, but running one instance also avoids edge cases around monitor deduplication and partition maintenance concurrency. See Section 2.7 for the throughput ceiling this implies. In V2, Quartz's AdoJobStore clustered mode combined with the event-driven architecture (Section 15) is intended to enable horizontal Worker scaling.
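
As a rough worked example of the HTTP scaling rules in the table above, the target replica count can be approximated as the concurrent request count divided by the per-replica target, rounded up and clamped to the min/max limits. The sketch below is a simplification: the real scaler also applies polling intervals and cool-down periods, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# desired_replicas <concurrent-requests> <target-per-replica> <min> <max>
# Approximates the replica count an HTTP concurrency rule converges on.
desired_replicas() {
  local load="$1" target="$2" min="$3" max="$4"
  # Integer ceiling of load / target.
  local want=$(( (load + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

desired_replicas 120 50 2 6    # courier-api at 120 concurrent requests
desired_replicas 450 100 2 4   # courier-frontend at 450 concurrent requests
```

With the limits from the table, 120 concurrent API requests yield 3 replicas, and 450 concurrent frontend requests are capped at the maximum of 4.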

14.5 CI/CD Pipeline (GitHub Actions)

14.5.1 Pipeline Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   PR Check   │────►│  Build &    │────►│  Deploy to   │────►│  Deploy to   │
│              │     │  Push Images │     │  Staging     │     │  Production  │
│  • Build     │     │              │     │              │     │              │
│  • Unit tests│     │  Trigger:    │     │  Trigger:    │     │  Trigger:    │
│  • Lint      │     │  push to     │     │  manual      │     │  manual      │
│  • Arch tests│     │  main        │     │  approval    │     │  approval    │
│              │     │              │     │              │     │              │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

14.5.2 PR Check Workflow

Runs on every pull request targeting main:

# .github/workflows/pr-check.yml
name: PR Check
on:
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: courier_test
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: "10.0.x"

      - name: Restore
        run: dotnet restore

      - name: Build
        run: dotnet build --no-restore -c Release

      - name: Unit Tests
        run: dotnet test tests/Courier.Tests.Unit --no-build -c Release

      - name: Architecture Tests
        run: dotnet test tests/Courier.Tests.Architecture --no-build -c Release

      - name: Integration Tests
        run: dotnet test tests/Courier.Tests.Integration --no-build -c Release
        env:
          ConnectionStrings__CourierDb: "Host=localhost;Database=courier_test;Username=postgres;Password=test"

      - name: Vulnerability Scan
        run: dotnet list package --vulnerable --include-transitive 2>&1 | tee vuln-report.txt
        continue-on-error: true

      - name: Frontend Lint & Type Check
        working-directory: src/Courier.Frontend
        run: |
          npm ci
          npm run lint
          npm run type-check

      - name: Frontend Build
        working-directory: src/Courier.Frontend
        run: npm run build
        env:
          NEXT_PUBLIC_API_BASE_URL: http://localhost:5000/api/v1
          NEXT_PUBLIC_ENTRA_CLIENT_ID: test-client-id
          NEXT_PUBLIC_ENTRA_TENANT_ID: test-tenant-id
          NEXT_PUBLIC_REDIRECT_URI: http://localhost:3000

14.5.3 Build & Deploy Workflow

Runs on push to main. Builds Docker images, pushes to Azure Container Registry, deploys to Dev automatically, then promotes to Staging and Production with manual approval gates:

# .github/workflows/deploy.yml
name: Build & Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:          # Manual trigger

env:
  REGISTRY: couriercr.azurecr.io
  TAG: ${{ github.sha }}

jobs:
  build-images:
    runs-on: ubuntu-latest
    permissions:
      id-token: write         # OIDC for Azure login
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Log in to ACR (reuses the OIDC session from azure/login, no registry password)
        run: az acr login --name couriercr

      - name: Build & Push API
        run: |
          docker build -f infra/docker/Courier.Api.Dockerfile -t $REGISTRY/courier-api:$TAG .
          docker push $REGISTRY/courier-api:$TAG

      - name: Build & Push Worker
        run: |
          docker build -f infra/docker/Courier.Worker.Dockerfile -t $REGISTRY/courier-worker:$TAG .
          docker push $REGISTRY/courier-worker:$TAG

      - name: Build & Push Frontend
        run: |
          docker build -f infra/docker/Courier.Frontend.Dockerfile \
            --build-arg NEXT_PUBLIC_API_BASE_URL=${{ vars.DEV_API_URL }} \
            --build-arg NEXT_PUBLIC_ENTRA_CLIENT_ID=${{ vars.DEV_ENTRA_CLIENT_ID }} \
            --build-arg NEXT_PUBLIC_ENTRA_TENANT_ID=${{ vars.ENTRA_TENANT_ID }} \
            --build-arg NEXT_PUBLIC_REDIRECT_URI=${{ vars.DEV_REDIRECT_URI }} \
            -t $REGISTRY/courier-frontend:$TAG-dev .
          docker push $REGISTRY/courier-frontend:$TAG-dev

  deploy-dev:
    needs: build-images
    runs-on: ubuntu-latest
    environment: dev
    permissions:
      id-token: write         # OIDC for Azure login (permissions are set per job)
      contents: read
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Deploy API to Dev
        uses: azure/container-apps-deploy-action@v2
        with:
          containerAppName: courier-api
          resourceGroup: courier-dev-rg
          imageToDeploy: ${{ env.REGISTRY }}/courier-api:${{ env.TAG }}

      - name: Deploy Worker to Dev
        uses: azure/container-apps-deploy-action@v2
        with:
          containerAppName: courier-worker
          resourceGroup: courier-dev-rg
          imageToDeploy: ${{ env.REGISTRY }}/courier-worker:${{ env.TAG }}

      - name: Deploy Frontend to Dev
        uses: azure/container-apps-deploy-action@v2
        with:
          containerAppName: courier-frontend
          resourceGroup: courier-dev-rg
          imageToDeploy: ${{ env.REGISTRY }}/courier-frontend:${{ env.TAG }}-dev

  deploy-staging:
    needs: deploy-dev
    runs-on: ubuntu-latest
    environment: staging       # Requires manual approval in GitHub
    steps:
      # Rebuild frontend with staging env vars, deploy all three apps
      # Same pattern as deploy-dev with staging resource group and vars
      - run: echo "Deploy to staging — same pattern with staging config"

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production    # Requires manual approval in GitHub
    steps:
      # Rebuild frontend with production env vars, deploy all three apps
      - run: echo "Deploy to production — same pattern with production config"

Frontend rebuilds per environment: Because NEXT_PUBLIC_* vars are baked in at build time, the frontend image is rebuilt for each environment with the correct API URL and Entra ID config. API and Worker images are identical across environments — only runtime env vars differ.
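
The tagging convention this implies ({sha} for API and Worker, {sha}-{env} for the frontend) can be captured in a small helper. This is a sketch only; `image_ref` is a hypothetical name, and the registry value is taken from the workflow above.

```shell
#!/usr/bin/env bash
# Sketch: compute image references following the tagging convention used in
# deploy.yml. The function name is illustrative, not part of the pipeline.
REGISTRY="couriercr.azurecr.io"

# image_ref <app> <git-sha> [env]
# API and Worker images are identical across environments, so they carry only
# the commit SHA. The frontend inlines NEXT_PUBLIC_* values at build time, so
# its tag also carries the environment name.
image_ref() {
  local app="$1" sha="$2" env="${3:-}"
  if [ "$app" = "courier-frontend" ]; then
    if [ -z "$env" ]; then
      echo "image_ref: courier-frontend requires an environment" >&2
      return 1
    fi
    printf '%s/%s:%s-%s\n' "$REGISTRY" "$app" "$sha" "$env"
  else
    printf '%s/%s:%s\n' "$REGISTRY" "$app" "$sha"
  fi
}
```

Failing fast when the environment is missing guards against deploying a frontend image that was built with another environment's API URL.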

14.6 Database Migrations in CI/CD

DbUp migrations run automatically on API host startup only. The Worker does not run migrations — it validates the schema version on startup and refuses to start if the database is behind its expected version (see Section 13.1.1 for the full safety model).

The first API container to start acquires a PostgreSQL advisory lock, executes pending migrations, then releases the lock. If multiple API replicas start simultaneously (rolling deployment), the second replica blocks on the advisory lock until the first completes, then discovers all scripts are already applied and starts normally.

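// Courier.Infrastructure/Migrations/MigrationRunner.cs
using DbUp;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;
using Npgsql;

public class MigrationRunner : IHostedService
{
    // Arbitrary application-wide key; every replica must use the same value.
    private const long MigrationLockKey = 12345;
    private readonly string _connectionString;

    // Assumed wiring: the connection string is read from ConnectionStrings:CourierDb.
    public MigrationRunner(IConfiguration configuration) =>
        _connectionString = configuration.GetConnectionString("CourierDb")
            ?? throw new InvalidOperationException("ConnectionStrings:CourierDb is not configured");

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        await using var conn = new NpgsqlConnection(_connectionString);
        await conn.OpenAsync(cancellationToken);

        // Session-level advisory lock: blocks until no other replica holds it,
        // which prevents concurrent migration runs across replicas.
        await using var cmd = new NpgsqlCommand(
            $"SELECT pg_advisory_lock({MigrationLockKey})", conn);
        await cmd.ExecuteNonQueryAsync(cancellationToken);

        try
        {
            var upgrader = DeployChanges.To
                .PostgresqlDatabase(_connectionString)
                .WithScriptsEmbeddedInAssembly(typeof(MigrationRunner).Assembly)
                .WithTransactionPerScript()
                .LogToConsole()
                .Build();

            // DbUp runs synchronously and skips scripts already recorded as applied.
            var result = upgrader.PerformUpgrade();
            if (!result.Successful)
                throw new InvalidOperationException("Migration failed", result.Error);
        }
        finally
        {
            // Release the lock even if startup was cancelled. PostgreSQL also
            // drops session locks automatically on disconnect.
            await using var unlock = new NpgsqlCommand(
                $"SELECT pg_advisory_unlock({MigrationLockKey})", conn);
            await unlock.ExecuteNonQueryAsync(CancellationToken.None);
        }
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}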

Deployment order: The CI/CD pipeline deploys API hosts first, then Worker hosts. This ensures the schema is migrated before the Worker validates it. If the order is reversed (Worker deployed before API), the Worker's SchemaVersionValidator detects the schema mismatch and enters a retry loop until the API migrates the database.

# In the GitHub Actions deploy job:
steps:
  - name: Deploy API (runs migrations on startup)
    uses: azure/container-apps-deploy-action@v2
    with:
      containerAppName: courier-api
      # API starts → acquires advisory lock → runs migrations → releases lock

  - name: Wait for API health check
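    run: |
      # The API has internal-only ingress (14.3.1), so this probe only works from
      # a runner with network access to the Container Apps VNet.
      for i in {1..30}; do
        if curl -sf https://courier-api.dev/health; then exit 0; fi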
        sleep 5
      done
      exit 1

  - name: Deploy Worker (validates schema version on startup)
    uses: azure/container-apps-deploy-action@v2
    with:
      containerAppName: courier-worker
      # Worker starts → checks schema_versions → starts if compatible
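
The health gate between the two deploy steps is a plain retry loop, which could be factored into a reusable helper along these lines. This is a sketch under the assumption that the gate runs in bash; `wait_until` and `API_HEALTH_URL` are hypothetical names.

```shell
#!/usr/bin/env bash
# wait_until <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are exhausted.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "wait_until: gave up after $attempts attempts: $*" >&2
  return 1
}

# Example gate with the same 30 x 5s budget as the deploy job.
# API_HEALTH_URL is a placeholder supplied by the caller:
#   wait_until 30 5 curl -sf "$API_HEALTH_URL"
```

Because the helper returns non-zero on timeout, a failed API rollout stops the job before the Worker step runs, which preserves the API-first ordering described above.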

Failure behavior: If a migration script fails, WithTransactionPerScript() rolls back that individual script. The API host crashes (refuses to start), the health check fails, and the deployment is halted before the Worker is deployed. The advisory lock is released via finally block (and PostgreSQL auto-releases session locks on disconnect). See Section 13.1.1 for the full failure recovery procedure.

Destructive migration safety: Migrations that drop columns or tables follow a two-release deprecation cycle. Release N marks the column as unused (application code stops reading/writing). Release N+1 drops the column. This is enforced by code review — DbUp does not have a built-in guard.
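
Since the rule is enforced only by review, a CI step could at least surface candidate scripts for reviewers. The sketch below is one possible approach and not part of the existing pipeline: the function name is hypothetical, the migrations directory is passed in by the caller, and the match is purely textual, so it also flags commented-out statements.

```shell
#!/usr/bin/env bash
# find_destructive_migrations <directory>
# Prints the .sql files under the directory that contain DROP COLUMN or
# DROP TABLE (case-insensitive), one per line. Returns 1 if any were found,
# so a CI step can flag the change for extra review.
find_destructive_migrations() {
  local dir="$1" matches
  matches="$({ grep -rliE --include='*.sql' 'drop[[:space:]]+(column|table)' "$dir" || true; } | sort)"
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

A non-zero result does not mean the migration is wrong. It only means a reviewer should confirm that the column or table was already retired in the previous release.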

14.7 Local Development (.NET Aspire)

Local development uses .NET Aspire to orchestrate all services with a single command:

// Courier.AppHost/Program.cs (Aspire orchestrator)
var builder = DistributedApplication.CreateBuilder(args);

// PostgreSQL with persistent volume
var postgres = builder.AddPostgres("courier-db")
    .WithDataVolume("courier-pgdata")
    .AddDatabase("CourierDb");

// Seq for local structured logging
var seq = builder.AddContainer("seq", "datalust/seq")
    .WithEndpoint(port: 5341, targetPort: 80, name: "seq-ui")
    .WithEnvironment("ACCEPT_EULA", "Y");

// API Host
var api = builder.AddProject<Projects.Courier_Api>("courier-api")
    .WithReference(postgres)
    .WithEnvironment("KeyVault__Uri", "https://courier-dev.vault.azure.net")
    .WithEnvironment("Serilog__WriteTo__0__Args__serverUrl", "http://localhost:5341");

// Worker Host
var worker = builder.AddProject<Projects.Courier_Worker>("courier-worker")
    .WithReference(postgres)
    .WithEnvironment("KeyVault__Uri", "https://courier-dev.vault.azure.net")
    .WithEnvironment("Serilog__WriteTo__0__Args__serverUrl", "http://localhost:5341");

// Frontend (npm dev server)
builder.AddNpmApp("courier-frontend", "../Courier.Frontend", "dev")
    .WithReference(api)
    .WithEndpoint(port: 3000, scheme: "http");

builder.Build().Run();

Local dev flow:

# Start everything
cd src/Courier.AppHost
dotnet run

# Aspire dashboard at https://localhost:15888
# API at http://localhost:5000
# Frontend at http://localhost:3000
# Seq at http://localhost:5341
# PostgreSQL at localhost:5432

14.8 Health Checks

Both API and Worker hosts expose health check endpoints used by Container Apps liveness and readiness probes.

API Host (/health and /health/ready):

builder.Services.AddHealthChecks()
    .AddNpgSql(connectionString, name: "postgresql",
        failureStatus: HealthStatus.Unhealthy)
    .AddAzureKeyVault(new Uri(builder.Configuration["KeyVault:Uri"]!),
        new DefaultAzureCredential(),
        options => { options.AddSecret("db-connection-string"); },
        name: "keyvault")
    .AddCheck("self", () => HealthCheckResult.Healthy());

app.MapHealthChecks("/health", new HealthCheckOptions
{
    Predicate = check => check.Name == "self"    // Liveness — am I running?
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ => true    // Readiness — can I serve requests?
});
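
From the caller's side, the two endpoints can be exercised with a short script. By default, ASP.NET Core health checks return 200 for Healthy and Degraded and 503 for Unhealthy, and Container Apps HTTP probes treat any status from 200 to 399 as success. The helper names below are illustrative and `BASE_URL` is a placeholder.

```shell
#!/usr/bin/env bash
# probe_passes <http-status>
# Mirrors the HTTP probe success rule: any status in 200..399 passes.
probe_passes() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# http_status <url>
# Prints the response status code, or 000 if the connection fails.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# check_probe <label> <url>
check_probe() {
  local label="$1" url="$2" status
  status="$(http_status "$url" || true)"
  if probe_passes "$status"; then
    echo "$label: pass ($status)"
  else
    echo "$label: fail ($status)"
    return 1
  fi
}

# Usage (BASE_URL is a placeholder for wherever the API is reachable):
#   check_probe liveness  "$BASE_URL/health"
#   check_probe readiness "$BASE_URL/health/ready"
```

With the predicates above, a Key Vault or database outage should fail only the readiness check, so the replica is taken out of rotation without being restarted.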

Worker Host (/health):

Checks PostgreSQL, Key Vault, Quartz scheduler status, and disk space on the temp volume:

builder.Services.AddHealthChecks()
    .AddNpgSql(connectionString, name: "postgresql")
    .AddAzureKeyVault(vaultUri, credential, options => { }, name: "keyvault")
    .AddCheck<QuartzHealthCheck>("quartz")
    .AddDiskStorageHealthCheck(options =>
        options.AddDrive("/data/courier/temp", 1024));  // Fail if < 1 GB free

14.9 Observability

| Signal | Tool | Coverage |
| --- | --- | --- |
| Structured logs | Serilog → Seq (local) / Application Insights (deployed) | All services |
| Distributed traces | OpenTelemetry → Application Insights | API requests, DB queries, Key Vault calls, HTTP outbound |
| Metrics | Application Insights + Container Apps built-in metrics | CPU, memory, request rate, response time, error rate |
| Dashboards | Azure Portal + Application Insights workbooks | Execution success rate, latency percentiles, active monitors, key expiry |
| Alerts | Application Insights alert rules | Job failure rate > threshold, Worker unhealthy, database connection failures, key expiry within 30 days |

OpenTelemetry configuration:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddNpgsql()
               .AddSource("Courier.JobEngine")
               .AddSource("Courier.FileMonitor")
               .AddAzureMonitorTraceExporter(options =>
                   options.ConnectionString = appInsightsConnectionString);
    });

14.10 Backup & Disaster Recovery

| Component | Backup Strategy | RPO | RTO |
| --- | --- | --- | --- |
| PostgreSQL | Azure PG Flex automated backups (daily full + continuous WAL) | < 5 minutes (point-in-time restore) | < 1 hour |
| Key Vault | Azure-managed soft delete (90-day retention) + purge protection | 0 (Azure-managed replication) | < 15 minutes |
| Container images | Azure Container Registry with geo-replication | 0 (immutable tags) | < 5 minutes (redeploy) |
| Archived partitions | Azure Blob Storage with LRS (locally redundant) | 0 (written once, never modified) | N/A (cold storage) |
| Application code | GitHub repository | 0 (Git history) | < 30 minutes (rebuild + deploy) |

Database disaster recovery: Azure PG Flex supports point-in-time restore to any second within the backup retention window (default: 7 days, configurable to 35). For cross-region DR, a read replica in a secondary region can be promoted.
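
Before requesting a restore, it can help to check that the target time falls inside the retention window. The sketch below does only that arithmetic; the function name is hypothetical, and the actual earliest restorable point reported by Azure may differ slightly from this estimate.

```shell
#!/usr/bin/env bash
# restore_point_in_window <restore-epoch> <now-epoch> <retention-days>
# Succeeds when the requested restore point is not in the future and not
# older than the backup retention window. Times are Unix epoch seconds.
restore_point_in_window() {
  local restore="$1" now="$2" days="$3"
  local window=$(( days * 86400 ))
  [ "$restore" -le "$now" ] && [ $(( now - restore )) -le "$window" ]
}

# The restore itself creates a new server from the source server's backups.
# All values in angle brackets are placeholders:
#   az postgres flexible-server restore \
#     --resource-group <resource-group> \
#     --name <new-server-name> \
#     --source-server <source-server-name> \
#     --restore-time <UTC timestamp, ISO 8601>
```

With the default 7-day retention, a restore point 8 days in the past is rejected by this check, while the same point passes once retention is raised to 35 days.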

14.11 Infrastructure Summary

┌──────────────────────────────────────────────────────────────┐
│                    AZURE RESOURCE GROUP                       │
│                    (per environment)                          │
│                                                              │
│  Container Apps Environment (VNet)                           │
│  ├── courier-api         (2–6 replicas, internal ingress)    │
│  ├── courier-worker      (1 replica, no ingress)             │
│  └── courier-frontend    (2–4 replicas, external ingress)    │
│                                                              │
│  Azure Database for PostgreSQL Flexible Server               │
│  ├── courier database                                        │
│  ├── Private endpoint in VNet                                │
│  └── Automated backups (7-day retention)                     │
│                                                              │
│  Azure Key Vault                                             │
│  ├── Master encryption key (KEK)                             │
│  ├── Application secrets                                     │
│  └── Private endpoint in VNet                                │
│                                                              │
│  Azure Container Registry (shared across environments)       │
│  ├── courier-api:{sha}                                       │
│  ├── courier-worker:{sha}                                    │
│  └── courier-frontend:{sha}-{env}                            │
│                                                              │
│  Azure Blob Storage (archive)                                │
│  └── courier-archives container                              │
│                                                              │
│  Azure Front Door (production only)                          │
│  ├── TLS termination                                         │
│  ├── WAF rules                                               │
│  └── Routes to courier-frontend                              │
│                                                              │
│  Application Insights                                        │
│  └── Logs, traces, metrics, alerts                           │
│                                                              │
└──────────────────────────────────────────────────────────────┘