Deployment & Infrastructure
Deploying Courier with Docker on Azure Container Apps, local development with .NET Aspire, and CI/CD pipeline configuration.
Courier runs in four environments: Dev, Staging, and Production on Azure Container Apps, plus local development, which uses .NET Aspire for service orchestration. Docker images are built and pushed via GitHub Actions.
14.1 Environments
| Environment | Purpose | Infrastructure | Database | Key Vault | Deployment Trigger |
|---|---|---|---|---|---|
| Local | Developer workstation | .NET Aspire + Docker Compose | PostgreSQL container (Testcontainers or Aspire) | Azure CLI credentials to shared dev Key Vault | Manual (dotnet run / aspire run) |
| Dev | Integration testing, feature branch validation | Azure Container Apps (single replica each) | Azure PG Flex (Basic tier, 1 vCore) | Shared dev Key Vault | Push to main or manually triggered |
| Staging | Pre-production validation, QA, performance testing | Azure Container Apps (mirrors prod replica count) | Azure PG Flex (General Purpose, mirrors prod tier) | Staging Key Vault (separate from dev/prod) | Promotion from Dev (manual approval gate) |
| Production | Live system | Azure Container Apps (scaled per 14.4) | Azure PG Flex (General Purpose, HA enabled) | Production Key Vault (FIPS 140-2 Level 2+) | Promotion from Staging (manual approval gate) |
Environment isolation: Each environment (Dev, Staging, Production) has its own Azure resource group, Container Apps Environment, PostgreSQL instance, and Key Vault. No shared infrastructure between Staging and Production.
14.2 Docker Images
Three Docker images are built from the repository:
14.2.1 API Host
# Courier.Api.Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
# Copy solution and restore
COPY Courier.sln .
COPY src/Courier.Api/*.csproj src/Courier.Api/
COPY src/Courier.Features/*.csproj src/Courier.Features/
COPY src/Courier.Infrastructure/*.csproj src/Courier.Infrastructure/
COPY src/Courier.Domain/*.csproj src/Courier.Domain/
RUN dotnet restore src/Courier.Api/Courier.Api.csproj
# Copy source and publish
COPY src/ src/
RUN dotnet publish src/Courier.Api/Courier.Api.csproj \
-c Release -o /app --no-restore
# Runtime image
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
WORKDIR /app
# FIPS: Base image must include OpenSSL 3.x with FIPS provider (fips.so)
# installed and self-tested. See Section 12.10.2 for full requirements.
# This config file activates the provider — but the module must already exist.
COPY infra/docker/openssl-fips.cnf /etc/ssl/openssl.cnf
ENV OPENSSL_CONF=/etc/ssl/openssl.cnf
# Build-time validation (non-blocking — logged as warning if unavailable)
RUN openssl list -providers 2>/dev/null | grep -q "FIPS" \
&& echo "FIPS provider: available" \
|| echo "WARNING: FIPS provider not found in base image — see Section 12.10.2"
# curl for the container health check (not included in the aspnet base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
COPY --from=build /app .
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
    CMD curl -f http://localhost:8080/health || exit 1
ENTRYPOINT ["dotnet", "Courier.Api.dll"]
14.2.2 Worker Host
# Courier.Worker.Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY Courier.sln .
COPY src/Courier.Worker/*.csproj src/Courier.Worker/
COPY src/Courier.Features/*.csproj src/Courier.Features/
COPY src/Courier.Infrastructure/*.csproj src/Courier.Infrastructure/
COPY src/Courier.Domain/*.csproj src/Courier.Domain/
RUN dotnet restore src/Courier.Worker/Courier.Worker.csproj
COPY src/ src/
RUN dotnet publish src/Courier.Worker/Courier.Worker.csproj \
-c Release -o /app --no-restore
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
WORKDIR /app
# 7z CLI for archive operations
RUN apt-get update && apt-get install -y --no-install-recommends p7zip-full curl && \
rm -rf /var/lib/apt/lists/*
# FIPS: Same requirements as API host — see Section 12.10.2
COPY infra/docker/openssl-fips.cnf /etc/ssl/openssl.cnf
ENV OPENSSL_CONF=/etc/ssl/openssl.cnf
RUN openssl list -providers 2>/dev/null | grep -q "FIPS" \
&& echo "FIPS provider: available" \
|| echo "WARNING: FIPS provider not found in base image — see Section 12.10.2"
# Temp directory for job execution (mount volume in production)
RUN mkdir -p /data/courier/temp && chown -R app:app /data/courier
VOLUME ["/data/courier/temp"]
COPY --from=build /app .
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
CMD curl -f http://localhost:8081/health || exit 1
ENTRYPOINT ["dotnet", "Courier.Worker.dll"]
14.2.3 Frontend
Updated 2026-03-15: Changed from static export + nginx to standalone Node.js server. See Section 11.2 for rationale.
# Courier.Frontend.Dockerfile — Next.js Standalone
FROM node:22-alpine AS deps
WORKDIR /app
COPY src/Courier.Frontend/package.json src/Courier.Frontend/package-lock.json ./
RUN npm ci
FROM node:22-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY src/Courier.Frontend/ .
# Build-time config: names must match the --build-arg values passed in CI (14.5.3)
ARG NEXT_PUBLIC_API_BASE_URL=http://localhost:5000/api/v1
ARG NEXT_PUBLIC_ENTRA_CLIENT_ID
ARG NEXT_PUBLIC_ENTRA_TENANT_ID
ARG NEXT_PUBLIC_REDIRECT_URI
ENV NEXT_PUBLIC_API_BASE_URL=${NEXT_PUBLIC_API_BASE_URL} \
    NEXT_PUBLIC_ENTRA_CLIENT_ID=${NEXT_PUBLIC_ENTRA_CLIENT_ID} \
    NEXT_PUBLIC_ENTRA_TENANT_ID=${NEXT_PUBLIC_ENTRA_TENANT_ID} \
    NEXT_PUBLIC_REDIRECT_URI=${NEXT_PUBLIC_REDIRECT_URI}
RUN npm run build # next build → standalone server in .next/standalone/
FROM node:22-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production PORT=3000 HOSTNAME=0.0.0.0
COPY --from=build /app/.next/standalone ./
COPY --from=build /app/.next/static ./.next/static
COPY --from=build /app/public ./public
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD wget -q --spider http://localhost:3000 || exit 1
CMD ["node", "server.js"]
The standalone output produces a self-contained Node.js server (~150 MB image) that handles all routing natively; no nginx or SPA fallback configuration is required. NEXT_PUBLIC_* values (API base URL, Entra ID client/tenant IDs, redirect URI) are baked into the JS bundle at build time, so the frontend image must be rebuilt per environment.
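The build-time baking behaviour can be illustrated with a small sketch (conceptual, not the real Next.js bundler): a `process.env.NEXT_PUBLIC_*` reference is replaced with its literal value during the build, so changing the variable when the container starts has no effect on the bundle.

```python
# Sketch of Next.js build-time env inlining (conceptual; not the real bundler).
# NEXT_PUBLIC_* references are substituted with literal values at build time.

def build_bundle(source: str, build_env: dict) -> str:
    """Replace process.env.NEXT_PUBLIC_* references with their literal values."""
    out = source
    for key, value in build_env.items():
        if key.startswith("NEXT_PUBLIC_"):
            out = out.replace(f"process.env.{key}", f'"{value}"')
    return out

source = 'fetch(process.env.NEXT_PUBLIC_API_BASE_URL + "/jobs")'

# Built against the Dev API URL: the value is now a string literal in the bundle
bundle = build_bundle(source, {"NEXT_PUBLIC_API_BASE_URL": "https://api.dev.example"})
assert bundle == 'fetch("https://api.dev.example" + "/jobs")'

# Setting the variable at container start changes nothing; the bundle already
# contains the Dev URL, which is why the image is rebuilt per environment.
assert "api.dev.example" in bundle
```

This is why the pipeline in 14.5.3 builds a separate `-dev`, staging, and production frontend image from the same source.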
14.3 Azure Container Apps Configuration
Each environment deploys three Container Apps within a shared Container Apps Environment connected to a VNet.
14.3.1 Container App Definitions
# infra/container-apps/api.yaml
name: courier-api
properties:
configuration:
activeRevisionsMode: Single
ingress:
external: false # Internal only — fronted by API gateway / Front Door
targetPort: 8080
transport: http
secrets:
- name: db-connection-string
keyVaultUrl: https://{vault}.vault.azure.net/secrets/db-connection-string
identity: system
- name: appinsights-connection-string
keyVaultUrl: https://{vault}.vault.azure.net/secrets/appinsights-connection-string
identity: system
template:
containers:
- name: courier-api
image: couriercr.azurecr.io/courier-api:{tag}
resources:
cpu: 1.0
memory: 2Gi
env:
- name: ConnectionStrings__CourierDb
secretRef: db-connection-string
- name: KeyVault__Uri
value: https://{vault}.vault.azure.net
- name: ApplicationInsights__ConnectionString
secretRef: appinsights-connection-string
- name: ASPNETCORE_ENVIRONMENT
value: Production
probes:
- type: Liveness
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 30
- type: Readiness
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
scale:
minReplicas: 2
maxReplicas: 6
rules:
- name: http-scaling
http:
metadata:
concurrentRequests: "50"
# infra/container-apps/worker.yaml
name: courier-worker
properties:
configuration:
activeRevisionsMode: Single
# No ingress — worker has no external HTTP traffic
secrets:
- name: db-connection-string
keyVaultUrl: https://{vault}.vault.azure.net/secrets/db-connection-string
identity: system
- name: appinsights-connection-string
keyVaultUrl: https://{vault}.vault.azure.net/secrets/appinsights-connection-string
identity: system
template:
containers:
- name: courier-worker
image: couriercr.azurecr.io/courier-worker:{tag}
resources:
cpu: 2.0
memory: 4Gi
env:
- name: ConnectionStrings__CourierDb
secretRef: db-connection-string
- name: KeyVault__Uri
value: https://{vault}.vault.azure.net
- name: ApplicationInsights__ConnectionString
secretRef: appinsights-connection-string
- name: DOTNET_ENVIRONMENT
value: Production
volumeMounts:
- volumeName: temp-storage
mountPath: /data/courier/temp
probes:
- type: Liveness
httpGet:
path: /health
port: 8081
initialDelaySeconds: 15
periodSeconds: 30
volumes:
- name: temp-storage
storageType: AzureFile
storageName: courier-temp
scale:
minReplicas: 1
maxReplicas: 1 # Single instance in V1 (see Section 15)
# infra/container-apps/frontend.yaml
name: courier-frontend
properties:
configuration:
activeRevisionsMode: Single
ingress:
external: true # User-facing
        targetPort: 3000 # Next.js standalone server listens on 3000 (see 14.2.3)
transport: http
template:
containers:
- name: courier-frontend
image: couriercr.azurecr.io/courier-frontend:{tag}
resources:
cpu: 0.25
memory: 0.5Gi
scale:
minReplicas: 2
maxReplicas: 4
rules:
- name: http-scaling
http:
metadata:
concurrentRequests: "100"
14.3.2 Networking
┌──────────────────────────────────────────────────────────────┐
│ Azure Container Apps Environment │
│ (VNet integrated) │
│ │
│ ┌─────────────────┐ │
│ │ courier-frontend│◄──── Azure Front Door / CDN (HTTPS) │
│ │ (external) │ TLS termination, WAF │
│ └────────┬────────┘ │
│ │ internal │
│ ┌────────▼────────┐ │
│ │ courier-api │◄──── Frontend calls via internal FQDN │
│ │ (internal) │ http://courier-api.internal.{env} │
│ └────────┬────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ courier-worker │ No ingress — outbound only │
│ │ (no ingress) │ │
│ └─────────────────┘ │
│ │
└──────────┬───────────────────┬───────────────────┬──────────┘
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ PostgreSQL │ │ Key Vault │ │ Partner │
│ Flex Server │ │ (private │ │ SFTP/FTP │
│ (private │ │ endpoint) │ │ servers │
│ endpoint) │ │ │ │ (outbound │
│ │ │ │ │ NSG rules) │
└─────────────┘ └─────────────┘ └─────────────┘
- PostgreSQL: Accessible only via private endpoint within the VNet. No public access.
- Key Vault: Accessible via private endpoint or service endpoint within the VNet.
- Partner servers: Outbound connections allowed through NSG rules with destination IP allowlists per partner.
- Container Apps internal: API host is internal-only ingress. Frontend proxies to API via the Container Apps Environment internal DNS.
- External access: Only the frontend has external ingress, fronted by Azure Front Door for TLS termination, WAF, and DDoS protection.
14.3.3 Managed Identity
All three Container Apps use system-assigned managed identities for Azure resource access:
| Resource | Access Method |
|---|---|
| Azure Key Vault | Managed Identity with Key Vault Secrets User + Key Vault Crypto User roles |
| Azure Container Registry | Managed Identity with AcrPull role |
| Azure Blob Storage (archives) | Managed Identity with Storage Blob Data Contributor role |
| Application Insights | Connection string from Key Vault (no identity needed) |
No service principal passwords or connection strings on disk. DefaultAzureCredential in the .NET applications resolves to managed identity automatically in Container Apps.
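Conceptually, DefaultAzureCredential walks an ordered chain of credential sources and returns the first one that yields a token; inside Container Apps the managed-identity source succeeds. A simplified model (this is a sketch of the chain-of-responsibility idea, not the Azure SDK; function names and token values are illustrative):

```python
# Simplified model of a credential chain (conceptual; not the Azure SDK).
# Each source either returns a token or raises; the chain takes the first success.

class CredentialUnavailable(Exception):
    pass

def env_credential():
    # No service principal env vars are set in Container Apps
    raise CredentialUnavailable("no service principal env vars")

def managed_identity_credential():
    # Succeeds inside Container Apps, where the platform issues tokens for
    # the system-assigned identity. Token value is illustrative.
    return "token-from-managed-identity"

def azure_cli_credential():
    # What a developer workstation would fall back to (az login)
    return "token-from-azure-cli"

def get_token(chain):
    """Return the first token a source in the chain can produce."""
    for source in chain:
        try:
            return source()
        except CredentialUnavailable:
            continue
    raise CredentialUnavailable("no credential source succeeded")

# In Container Apps: env source is unavailable, managed identity wins.
token = get_token([env_credential, managed_identity_credential, azure_cli_credential])
assert token == "token-from-managed-identity"
```

The same application code therefore authenticates via Azure CLI credentials locally (per the Local row in 14.1) and via managed identity when deployed, with no configuration switch.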
14.4 Scaling Strategy
| Container App | Min Replicas | Max Replicas | Scaling Rule | Rationale |
|---|---|---|---|---|
| courier-api | 2 | 6 | 50 concurrent requests per replica | Stateless — scales horizontally. Two minimum for availability. |
| courier-worker | 1 | 1 | None (fixed) | V1 single-instance. Quartz AdoJobStore supports clustered mode for V2. |
| courier-frontend | 2 | 4 | 100 concurrent requests per replica | Node.js standalone server — lightweight. Two minimum for availability. |
Worker single-instance constraint: The Worker host runs Quartz.NET, file monitors, and maintenance jobs. In V1, a single instance simplifies work claiming — although FOR UPDATE SKIP LOCKED (Section 5.8) prevents duplicate pickup, single-instance avoids edge cases around monitor deduplication and partition maintenance concurrency. See Section 2.7 for the throughput ceiling this implies. Quartz's AdoJobStore clustered mode plus the V2 event-driven architecture (Section 15) enables horizontal Worker scaling.
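The duplicate-pickup guarantee referenced above can be modelled in miniature: a claim atomically takes the first unclaimed row, so two workers never receive the same job. A Python sketch standing in for the SELECT ... FOR UPDATE SKIP LOCKED query (the real implementation is SQL, per Section 5.8; names here are illustrative):

```python
import threading

# Miniature model of FOR UPDATE SKIP LOCKED claiming (conceptual; the real
# implementation is a SQL query). A claim atomically takes the first
# unclaimed job; rows another worker already holds are skipped.

jobs = [{"id": i, "claimed_by": None} for i in range(10)]
table_lock = threading.Lock()  # stands in for row-level locks

def claim_next(worker):
    with table_lock:
        for job in jobs:
            if job["claimed_by"] is None:
                job["claimed_by"] = worker  # a locked row is "skipped" by others
                return job["id"]
    return None  # nothing left to claim

claims = {"worker-a": [], "worker-b": []}

def run(worker):
    while (job_id := claim_next(worker)) is not None:
        claims[worker].append(job_id)

threads = [threading.Thread(target=run, args=(w,)) for w in claims]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every job is claimed exactly once: no duplicate pickup across workers.
assert sorted(claims["worker-a"] + claims["worker-b"]) == list(range(10))
```

As the text notes, duplicate pickup is already prevented at the claim level; the single-instance constraint in V1 exists for the surrounding concerns (monitor deduplication, partition maintenance), not for job claiming itself.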
14.5 CI/CD Pipeline (GitHub Actions)
14.5.1 Pipeline Overview
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ PR Check │────►│ Build & │────►│ Deploy to │────►│ Deploy to │
│ │ │ Push Images │ │ Staging │ │ Production │
│ • Build │ │ │ │ │ │ │
│ • Unit tests│ │ Trigger: │ │ Trigger: │ │ Trigger: │
│ • Lint │ │ push to │ │ manual │ │ manual │
│ • Arch tests│ │ main │ │ approval │ │ approval │
│ │ │ │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
14.5.2 PR Check Workflow
Runs on every pull request targeting main:
# .github/workflows/pr-check.yml
name: PR Check
on:
pull_request:
branches: [main]
jobs:
build-and-test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_PASSWORD: test
POSTGRES_DB: courier_test
ports: ["5432:5432"]
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: "10.0.x"
- name: Restore
run: dotnet restore
- name: Build
run: dotnet build --no-restore -c Release
- name: Unit Tests
run: dotnet test tests/Courier.Tests.Unit --no-build -c Release
- name: Architecture Tests
run: dotnet test tests/Courier.Tests.Architecture --no-build -c Release
- name: Integration Tests
run: dotnet test tests/Courier.Tests.Integration --no-build -c Release
env:
ConnectionStrings__CourierDb: "Host=localhost;Database=courier_test;Username=postgres;Password=test"
      - name: Vulnerability Scan
        run: |
          dotnet list package --vulnerable --include-transitive 2>&1 | tee vuln-report.txt
          # dotnet list exits 0 even when vulnerabilities are found, so fail explicitly
          ! grep -q "has the following vulnerable packages" vuln-report.txt
        continue-on-error: true # Non-blocking: findings are reviewed, not gating
- name: Frontend Lint & Type Check
working-directory: src/Courier.Frontend
run: |
npm ci
npm run lint
npm run type-check
- name: Frontend Build
working-directory: src/Courier.Frontend
run: npm run build
env:
NEXT_PUBLIC_API_BASE_URL: http://localhost:5000/api/v1
NEXT_PUBLIC_ENTRA_CLIENT_ID: test-client-id
NEXT_PUBLIC_ENTRA_TENANT_ID: test-tenant-id
NEXT_PUBLIC_REDIRECT_URI: http://localhost:3000
14.5.3 Build & Deploy Workflow
Runs on push to main. Builds Docker images, pushes to Azure Container Registry, deploys to Dev automatically, then promotes to Staging and Production with manual approval gates:
# .github/workflows/deploy.yml
name: Build & Deploy
on:
push:
branches: [main]
workflow_dispatch: # Manual trigger
env:
REGISTRY: couriercr.azurecr.io
TAG: ${{ github.sha }}
jobs:
build-images:
runs-on: ubuntu-latest
permissions:
id-token: write # OIDC for Azure login
contents: read
steps:
- uses: actions/checkout@v4
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Log in to ACR
        # Uses the OIDC session from azure/login; no registry password needed
        run: az acr login --name couriercr
- name: Build & Push API
run: |
docker build -f infra/docker/Courier.Api.Dockerfile -t $REGISTRY/courier-api:$TAG .
docker push $REGISTRY/courier-api:$TAG
- name: Build & Push Worker
run: |
docker build -f infra/docker/Courier.Worker.Dockerfile -t $REGISTRY/courier-worker:$TAG .
docker push $REGISTRY/courier-worker:$TAG
- name: Build & Push Frontend
run: |
docker build -f infra/docker/Courier.Frontend.Dockerfile \
--build-arg NEXT_PUBLIC_API_BASE_URL=${{ vars.DEV_API_URL }} \
--build-arg NEXT_PUBLIC_ENTRA_CLIENT_ID=${{ vars.DEV_ENTRA_CLIENT_ID }} \
--build-arg NEXT_PUBLIC_ENTRA_TENANT_ID=${{ vars.ENTRA_TENANT_ID }} \
--build-arg NEXT_PUBLIC_REDIRECT_URI=${{ vars.DEV_REDIRECT_URI }} \
-t $REGISTRY/courier-frontend:$TAG-dev .
docker push $REGISTRY/courier-frontend:$TAG-dev
deploy-dev:
needs: build-images
runs-on: ubuntu-latest
environment: dev
steps:
- uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy API to Dev
uses: azure/container-apps-deploy-action@v2
with:
containerAppName: courier-api
resourceGroup: courier-dev-rg
imageToDeploy: ${{ env.REGISTRY }}/courier-api:${{ env.TAG }}
- name: Deploy Worker to Dev
uses: azure/container-apps-deploy-action@v2
with:
containerAppName: courier-worker
resourceGroup: courier-dev-rg
imageToDeploy: ${{ env.REGISTRY }}/courier-worker:${{ env.TAG }}
- name: Deploy Frontend to Dev
uses: azure/container-apps-deploy-action@v2
with:
containerAppName: courier-frontend
resourceGroup: courier-dev-rg
imageToDeploy: ${{ env.REGISTRY }}/courier-frontend:${{ env.TAG }}-dev
deploy-staging:
needs: deploy-dev
runs-on: ubuntu-latest
environment: staging # Requires manual approval in GitHub
steps:
# Rebuild frontend with staging env vars, deploy all three apps
# Same pattern as deploy-dev with staging resource group and vars
- run: echo "Deploy to staging — same pattern with staging config"
deploy-production:
needs: deploy-staging
runs-on: ubuntu-latest
environment: production # Requires manual approval in GitHub
steps:
# Rebuild frontend with production env vars, deploy all three apps
- run: echo "Deploy to production — same pattern with production config"
Frontend rebuilds per environment: Because NEXT_PUBLIC_* vars are baked in at build time, the frontend image is rebuilt for each environment with the correct API URL and Entra ID config. API and Worker images are identical across environments — only runtime env vars differ.
14.6 Database Migrations in CI/CD
DbUp migrations run automatically on API host startup only. The Worker does not run migrations — it validates the schema version on startup and refuses to start if the database is behind its expected version (see Section 13.1.1 for the full safety model).
The first API container to start acquires a PostgreSQL advisory lock, executes pending migrations, then releases the lock. If multiple API replicas start simultaneously (rolling deployment), the second replica blocks on the advisory lock until the first completes, then discovers all scripts are already applied and starts normally.
// Courier.Infrastructure/Migrations/MigrationRunner.cs
public class MigrationRunner : IHostedService
{
    private readonly string _connectionString;

    public MigrationRunner(string connectionString)
        => _connectionString = connectionString;

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        await using var conn = new NpgsqlConnection(_connectionString);
        await conn.OpenAsync(cancellationToken);
        // Session-level advisory lock prevents concurrent migration runs across
        // replicas; PostgreSQL releases it automatically if the session drops.
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_advisory_lock(12345)", conn);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
try
{
var upgrader = DeployChanges.To
.PostgresqlDatabase(_connectionString)
.WithScriptsEmbeddedInAssembly(typeof(MigrationRunner).Assembly)
.WithTransactionPerScript()
.LogToConsole()
.Build();
var result = upgrader.PerformUpgrade();
if (!result.Successful)
throw new Exception($"Migration failed: {result.Error}");
}
finally
{
await using var unlock = new NpgsqlCommand(
"SELECT pg_advisory_unlock(12345)", conn);
await unlock.ExecuteNonQueryAsync(cancellationToken);
}
}
public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
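The rolling-deployment behaviour described above can be simulated: two replicas race, one acquires the lock and migrates, and the other blocks until the lock is released, then finds nothing left to apply. A sketch using an in-process lock in place of pg_advisory_lock (names illustrative):

```python
import threading

# Simulation of advisory-lock-gated migrations across two API replicas
# (an in-process lock stands in for pg_advisory_lock(12345)).

advisory_lock = threading.Lock()
applied = []                                 # stands in for DbUp's journal table
pending = ["001_init.sql", "002_add_jobs.sql"]
migration_runs = []                          # which replicas actually migrated

def start_replica(name):
    with advisory_lock:                      # blocks while another replica migrates
        to_apply = [s for s in pending if s not in applied]
        if to_apply:
            migration_runs.append(name)      # this replica runs the migrations
            applied.extend(to_apply)
        # else: all scripts already applied; start normally without migrating

replicas = [threading.Thread(target=start_replica, args=(f"api-{i}",)) for i in (1, 2)]
for t in replicas:
    t.start()
for t in replicas:
    t.join()

assert applied == pending          # every script applied exactly once
assert len(migration_runs) == 1    # only one replica performed the migration
```

Whichever replica loses the race simply observes a fully migrated journal, which is exactly the "discovers all scripts are already applied and starts normally" path in the text.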
Deployment order: The CI/CD pipeline deploys API hosts first, then Worker hosts. This ensures the schema is migrated before the Worker validates it. If the order is reversed (Worker deployed before API), the Worker's SchemaVersionValidator detects the schema mismatch and enters a retry loop until the API migrates the database.
# In the GitHub Actions deploy job:
steps:
- name: Deploy API (runs migrations on startup)
uses: azure/container-apps-deploy-action@v2
with:
containerAppName: courier-api
# API starts → acquires advisory lock → runs migrations → releases lock
  - name: Wait for API health check
    run: |
      # Note: the API has internal-only ingress (14.3.1), so this check must run
      # from inside the VNet (e.g. a self-hosted runner); the FQDN is a placeholder.
      for i in {1..30}; do
        if curl -sf https://courier-api.dev/health; then exit 0; fi
        sleep 5
      done
      exit 1
- name: Deploy Worker (validates schema version on startup)
uses: azure/container-apps-deploy-action@v2
with:
containerAppName: courier-worker
# Worker starts → checks schema_versions → starts if compatible
Failure behavior: If a migration script fails, WithTransactionPerScript() rolls back that individual script. The API host crashes (refuses to start), the health check fails, and the deployment is halted before the Worker is deployed. The advisory lock is released via finally block (and PostgreSQL auto-releases session locks on disconnect). See Section 13.1.1 for the full failure recovery procedure.
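The per-script transaction boundary means a failure leaves earlier scripts committed and only the failing script rolled back, so a redeploy after the fix resumes from the failed script. A small simulation of that semantics (illustrative names; the real journal is DbUp's schema versions table):

```python
# Simulation of WithTransactionPerScript() semantics: each script commits or
# rolls back independently, and a failure halts the run after the last good script.

def run_migrations(scripts, applied):
    """Apply pending scripts in order; stop at the first failure."""
    for name, script in scripts:
        if name in applied:
            continue                  # journal says already applied: skip
        try:
            script()                  # the "transaction" for this script alone
            applied.append(name)      # commit: recorded in the journal
        except Exception as error:
            # rollback of this script only; earlier commits are untouched
            return False, f"Migration failed: {name}: {error}"
    return True, None

applied = []
def ok(): pass
def boom(): raise RuntimeError("column already exists")

scripts = [("001_init.sql", ok), ("002_bad.sql", boom), ("003_later.sql", ok)]

success, err = run_migrations(scripts, applied)
assert not success
assert applied == ["001_init.sql"]    # 001 committed, 002 rolled back, 003 never ran

# After fixing 002, a redeploy resumes from the failed script.
scripts[1] = ("002_bad.sql", ok)
success, err = run_migrations(scripts, applied)
assert success
assert applied == ["001_init.sql", "002_bad.sql", "003_later.sql"]
```

In the real pipeline the failed first run also crashes the API host, which is what halts the deployment before the Worker stage.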
Destructive migration safety: Migrations that drop columns or tables follow a two-release deprecation cycle. Release N marks the column as unused (application code stops reading/writing). Release N+1 drops the column. This is enforced by code review — DbUp does not have a built-in guard.
14.7 Local Development (.NET Aspire)
Local development uses .NET Aspire to orchestrate all services with a single command:
// Courier.AppHost/Program.cs (Aspire orchestrator)
var builder = DistributedApplication.CreateBuilder(args);
// PostgreSQL with persistent volume
var postgres = builder.AddPostgres("courier-db")
.WithDataVolume("courier-pgdata")
.AddDatabase("CourierDb");
// Seq for local structured logging
var seq = builder.AddContainer("seq", "datalust/seq")
.WithEndpoint(port: 5341, targetPort: 80, name: "seq-ui")
.WithEnvironment("ACCEPT_EULA", "Y");
// API Host
var api = builder.AddProject<Projects.Courier_Api>("courier-api")
.WithReference(postgres)
.WithEnvironment("KeyVault__Uri", "https://courier-dev.vault.azure.net")
.WithEnvironment("Serilog__WriteTo__0__Args__serverUrl", "http://localhost:5341");
// Worker Host
var worker = builder.AddProject<Projects.Courier_Worker>("courier-worker")
.WithReference(postgres)
.WithEnvironment("KeyVault__Uri", "https://courier-dev.vault.azure.net")
.WithEnvironment("Serilog__WriteTo__0__Args__serverUrl", "http://localhost:5341");
// Frontend (npm dev server)
builder.AddNpmApp("courier-frontend", "../Courier.Frontend", "dev")
.WithReference(api)
.WithEndpoint(port: 3000, scheme: "http");
builder.Build().Run();
Local dev flow:
# Start everything
cd src/Courier.AppHost
dotnet run
# Aspire dashboard at https://localhost:15888
# API at http://localhost:5000
# Frontend at http://localhost:3000
# Seq at http://localhost:5341
# PostgreSQL at localhost:5432
14.8 Health Checks
Both API and Worker hosts expose health check endpoints used by Container Apps liveness and readiness probes.
API Host (/health and /health/ready):
builder.Services.AddHealthChecks()
.AddNpgSql(connectionString, name: "postgresql",
failureStatus: HealthStatus.Unhealthy)
.AddAzureKeyVault(new Uri(builder.Configuration["KeyVault:Uri"]!),
new DefaultAzureCredential(),
options => { options.AddSecret("db-connection-string"); },
name: "keyvault")
.AddCheck("self", () => HealthCheckResult.Healthy());
app.MapHealthChecks("/health", new HealthCheckOptions
{
Predicate = check => check.Name == "self" // Liveness — am I running?
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
Predicate = _ => true // Readiness — can I serve requests?
});
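The two endpoints differ only in which registered checks they evaluate: liveness runs the self check alone, while readiness runs everything. A sketch of that predicate filtering (conceptual; it mirrors the HealthCheckOptions.Predicate behaviour above, with hypothetical check results):

```python
# Sketch of health-check predicate filtering (conceptual model of the
# HealthCheckOptions.Predicate behaviour; check results are hypothetical).

checks = {
    "self": lambda: True,          # the process is running
    "postgresql": lambda: True,    # database reachable
    "keyvault": lambda: False,     # e.g. a transient Key Vault outage
}

def evaluate(predicate):
    """Run only the checks the predicate selects; healthy iff all of them pass."""
    selected = {name: check for name, check in checks.items() if predicate(name)}
    return all(check() for check in selected.values())

# Liveness: only "self". Stays healthy through the Key Vault outage,
# so Container Apps does not restart the replica.
assert evaluate(lambda name: name == "self") is True

# Readiness: all checks. Fails, so the replica is pulled from load-balancer
# rotation until Key Vault recovers.
assert evaluate(lambda name: True) is False
```

This split is the reason a dependency outage degrades traffic routing without triggering restart loops.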
Worker Host (/health):
Checks PostgreSQL, Key Vault, Quartz scheduler status, and disk space on the temp volume:
builder.Services.AddHealthChecks()
.AddNpgSql(connectionString, name: "postgresql")
.AddAzureKeyVault(vaultUri, credential, options => { }, name: "keyvault")
.AddCheck<QuartzHealthCheck>("quartz")
.AddDiskStorageHealthCheck(options =>
options.AddDrive("/data/courier/temp", 1024)); // Fail if < 1 GB free
14.9 Observability
| Signal | Tool | Coverage |
|---|---|---|
| Structured logs | Serilog → Seq (local) / Application Insights (deployed) | All services |
| Distributed traces | OpenTelemetry → Application Insights | API requests, DB queries, Key Vault calls, HTTP outbound |
| Metrics | Application Insights + Container Apps built-in metrics | CPU, memory, request rate, response time, error rate |
| Dashboards | Azure Portal + Application Insights workbooks | Execution success rate, latency percentiles, active monitors, key expiry |
| Alerts | Application Insights alert rules | Job failure rate > threshold, Worker unhealthy, database connection failures, key expiry within 30 days |
OpenTelemetry configuration:
builder.Services.AddOpenTelemetry()
.WithTracing(tracing =>
{
tracing.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddNpgsql()
.AddSource("Courier.JobEngine")
.AddSource("Courier.FileMonitor")
.AddAzureMonitorTraceExporter(options =>
options.ConnectionString = appInsightsConnectionString);
});
14.10 Backup & Disaster Recovery
| Component | Backup Strategy | RPO | RTO |
|---|---|---|---|
| PostgreSQL | Azure PG Flex automated backups (daily full + continuous WAL) | < 5 minutes (point-in-time restore) | < 1 hour |
| Key Vault | Azure-managed soft delete (90-day retention) + purge protection | 0 (Azure-managed replication) | < 15 minutes |
| Container images | Azure Container Registry with geo-replication | 0 (immutable tags) | < 5 minutes (redeploy) |
| Archived partitions | Azure Blob Storage with LRS (locally redundant) | 0 (written once, never modified) | N/A (cold storage) |
| Application code | GitHub repository | 0 (Git history) | < 30 minutes (rebuild + deploy) |
Database disaster recovery: Azure PG Flex supports point-in-time restore to any second within the backup retention window (default: 7 days, configurable to 35). For cross-region DR, a read replica in a secondary region can be promoted.
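Point-in-time restore only succeeds for timestamps inside the retention window; a helper like the following (hypothetical, for illustrating the boundary) makes the window arithmetic explicit:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: is a requested restore point inside the PG Flex
# backup retention window (default 7 days, configurable up to 35)?

def restorable(restore_point, retention_days=7, now=None):
    """True if restore_point falls within [now - retention, now]."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=retention_days) <= restore_point <= now

now = datetime(2026, 3, 20, 12, 0, tzinfo=timezone.utc)

# 3 days ago: inside the default 7-day window
assert restorable(now - timedelta(days=3), now=now)
# 10 days ago: outside the default window...
assert not restorable(now - timedelta(days=10), now=now)
# ...but inside an extended 35-day window
assert restorable(now - timedelta(days=10), retention_days=35, now=now)
```

Anything older than the configured window is only recoverable from the archived partitions in Blob Storage, not from database backups.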
14.11 Infrastructure Summary
┌──────────────────────────────────────────────────────────────┐
│ AZURE RESOURCE GROUP │
│ (per environment) │
│ │
│ Container Apps Environment (VNet) │
│ ├── courier-api (2–6 replicas, internal ingress) │
│ ├── courier-worker (1 replica, no ingress) │
│ └── courier-frontend (2–4 replicas, external ingress) │
│ │
│ Azure Database for PostgreSQL Flexible Server │
│ ├── courier database │
│ ├── Private endpoint in VNet │
│ └── Automated backups (7-day retention) │
│ │
│ Azure Key Vault │
│ ├── Master encryption key (KEK) │
│ ├── Application secrets │
│ └── Private endpoint in VNet │
│ │
│ Azure Container Registry (shared across environments) │
│ ├── courier-api:{sha} │
│ ├── courier-worker:{sha} │
│ └── courier-frontend:{sha}-{env} │
│ │
│ Azure Blob Storage (archive) │
│ └── courier-archives container │
│ │
│ Azure Front Door (production only) │
│ ├── TLS termination │
│ ├── WAF rules │
│ └── Routes to courier-frontend │
│ │
│ Application Insights │
│ └── Logs, traces, metrics, alerts │
│ │
└──────────────────────────────────────────────────────────────┘