
Infrastructure & Monitoring

GeoLens exposes Prometheus-compatible metrics at /metrics and a connectivity health endpoint at /health. The admin web UI surfaces both at /admin/overview along with catalog statistics. This page covers the operator-facing monitoring surface; service-level diagnostics (Docker logs, database size queries) are at the bottom.

Replace https://geolens.example.com with your GeoLens instance’s URL in every example below.

The /metrics endpoint serves Prometheus-format metrics, gzipped and response-buffered, with the scrape path itself excluded from the request histograms so that monitoring traffic does not skew latency data. The endpoint is unauthenticated by default; restrict it with a reverse-proxy IP allowlist or basic auth in production.
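With nginx as the reverse proxy, such an allowlist can be as small as the following sketch (the CIDR and upstream name are placeholders, not GeoLens defaults):

```nginx
location /metrics {
    # Allow only the monitoring network (placeholder CIDR); deny everyone else.
    allow 10.0.0.0/8;
    deny all;
    proxy_pass http://geolens_api;
}
```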

Metric                          Type       Labels                    Description
http_requests_total             counter    method, status, handler   Total HTTP requests served
http_request_duration_seconds   histogram  method, status, handler   Request latency distribution
http_requests_inprogress        gauge      method, handler           In-flight requests
geolens_jobs_queue_depth        gauge      queue                     Pending jobs (Procrastinate status=todo)
geolens_jobs_active             gauge      queue                     Running jobs (Procrastinate status=doing)
geolens_jobs_completed_total    counter    queue                     Completed jobs (since process start)
geolens_jobs_failed_total       counter    queue                     Failed jobs (since process start)
geolens_db_pool_checkedout      gauge      (none)                    Connections currently checked out
geolens_db_pool_checkedin       gauge      (none)                    Connections currently available in pool
geolens_db_pool_overflow        gauge      (none)                    Overflow connections currently open
geolens_db_pool_size            gauge      (none)                    Configured pool size
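These series compose into the usual PromQL dashboard queries. For example (a sketch; the `_bucket` suffix follows the standard Prometheus histogram convention):

```promql
# p95 request latency per handler, over a 5-minute window
histogram_quantile(0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m])))

# Per-queue job failure rate
sum by (queue) (rate(geolens_jobs_failed_total[5m]))
```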

Sample Prometheus scrape configuration:

scrape_configs:
  - job_name: geolens
    metrics_path: /metrics
    static_configs:
      - targets: ['geolens.example.com']

For Grafana dashboards, the geolens_jobs_queue_depth and geolens_db_pool_checkedout series are the most actionable — sustained queue depth above 50 indicates worker undersizing; sustained pool checkout near pool_size indicates DB-connection contention.
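Those thresholds translate directly into Prometheus alerting rules; a sketch follows (the alert names and `for` durations are choices, not GeoLens conventions):

```yaml
groups:
  - name: geolens
    rules:
      - alert: GeoLensQueueBacklog
        expr: geolens_jobs_queue_depth > 50
        for: 10m
        annotations:
          summary: "Sustained job queue depth above 50; workers may be undersized"
      - alert: GeoLensDbPoolExhausted
        expr: geolens_db_pool_checkedout >= geolens_db_pool_size
        for: 5m
        annotations:
          summary: "DB pool fully checked out; connection contention likely"
```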

GET /health returns 200 (healthy) or 503 (degraded), with a JSON body covering each provider:

{
  "status": "healthy",
  "providers": {
    "database": { "status": "ok", "latency_ms": 12.3 },
    "storage": { "status": "ok", "latency_ms": 45.2 },
    "cache": { "status": "ok", "latency_ms": 1.1 }
  }
}
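In a monitoring script, jq can pick this payload apart. The sketch below runs a filter against the sample body; in practice the heredoc would be replaced by a live `curl -s https://geolens.example.com/health`:

```shell
# List providers whose status is not "ok"; prints nothing when all are healthy.
jq -r '.providers | to_entries[] | select(.value.status != "ok") | .key' <<'EOF'
{
  "status": "healthy",
  "providers": {
    "database": { "status": "ok", "latency_ms": 12.3 },
    "storage": { "status": "ok", "latency_ms": 45.2 },
    "cache": { "status": "ok", "latency_ms": 1.1 }
  }
}
EOF
```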

The probes:

  • database — exercises a live SELECT to_regclass('catalog.datasets') (catches hung DB, broken search_path).
  • storage — calls the configured storage provider’s health_check() (S3 HeadBucket or local writability test).
  • cache — calls Valkey/Redis PING.

Use this endpoint as the upstream health check for load balancers and Kubernetes liveness/readiness probes. The 503 response is intentional — it signals “do not route traffic here” without 5xx-class application errors that would page on-call.
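In Kubernetes, that maps onto probes such as the following sketch (port 8001 assumes the probes target the internal API port directly; adjust to your Service wiring):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8001
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8001
  periodSeconds: 10
```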

Terminal window
# Basic check
curl -fsS https://geolens.example.com/health || echo "unhealthy"
# Detailed JSON with latency breakdown
curl -s https://geolens.example.com/health | jq

For internal/private endpoints, the FastAPI process exposes the same health check directly at /health on the API port (default :8001 inside the Docker network). Use this when nginx/the frontend container is itself the failure point.

OIDC connectivity validation runs separately from the standard /health probe — IdP discovery URLs are checked on demand rather than on every health poll, since cold-cache IdP fetches add 200–500 ms latency.

Trigger validation via the admin UI:

  1. Navigate to Admin → Settings → Authentication.
  2. Click Validate Connectivity in the toolbar.
  3. The panel reports per-provider latency, status, and any error details (e.g., DNS failure, expired discovery cache, certificate mismatch).

Or via the API:

Terminal window
curl -X POST https://geolens.example.com/api/admin/validate-connectivity \
  -H "Authorization: Bearer $TOKEN"

The response includes one entry per enabled OIDC provider:

{
  "providers": [
    { "slug": "google", "status": "ok", "latency_ms": 142.7 },
    { "slug": "keycloak", "status": "error", "error": "Connection refused" }
  ]
}

Run validation after any of: (1) adding a new OAuth provider, (2) rotating client secrets, (3) network changes affecting outbound HTTPS to IdP endpoints, (4) certificate renewals on self-hosted IdPs.

/admin/overview shows real-time health badges for database, storage, cache, and each enabled OIDC provider, alongside catalog statistics:

  • Total datasets, total bytes, total feature count
  • Recent additions (last 30 days)
  • By-geometry-type breakdown (Point, LineString, Polygon, Raster, etc.)
  • By-visibility breakdown (public, private, restricted)
  • Most-active users (top 10 by upload count, last 30 days)

The health badges poll /health every 30 seconds; the catalog statistics are computed from a materialized view refreshed hourly. For real-time queue/worker metrics, use Prometheus + Grafana (the badges are intentionally coarse-grained).

For programmatic access to the same statistics:

Terminal window
curl https://geolens.example.com/api/admin/stats \
  -H "Authorization: Bearer $TOKEN"

Returns total datasets, recent additions (30 days), total storage bytes, datasets by geometry type, and datasets by visibility.

For database-level size queries:

Terminal window
docker compose exec db psql -U geolens -d geolens -c "
SELECT pg_size_pretty(pg_database_size('geolens')) AS db_size;
"

Per-table sizes (largest datasets first):

Terminal window
docker compose exec db psql -U geolens -d geolens -c "
  SELECT table_name,
         pg_size_pretty(pg_total_relation_size('data.' || table_name)) AS size
  FROM catalog.datasets
  ORDER BY pg_total_relation_size('data.' || table_name) DESC;
"

Every admin action is recorded in the audit log table. Inspect via the UI at Admin → Audit Log (filterable by action, user, resource, date range; supports CSV/JSON export) or via the API:

Terminal window
# All audit logs
curl https://geolens.example.com/api/admin/audit-logs \
  -H "Authorization: Bearer $TOKEN"
# Filter by action
curl "https://geolens.example.com/api/admin/audit-logs?action=dataset.export" \
  -H "Authorization: Bearer $TOKEN"
# Filter by user and date range
curl "https://geolens.example.com/api/admin/audit-logs?user_id={user_id}&date_from=2024-01-01" \
  -H "Authorization: Bearer $TOKEN"

Available audit actions cover:

  • Datasets: dataset.view, dataset.export, metadata.edit
  • Collections: collection.create, collection.update, collection.delete
  • Maps: map.create, map.share, map.revoke_share
  • Features: feature.insert, feature.update, feature.delete
  • Embed tokens: embed_token.create, embed_token.revoke
  • OAuth providers: oauth_provider.create, oauth_provider.update
  • Config operations: config_import, update, reset, probe_service

For long-term retention, archive audit logs to S3 nightly — there is no built-in retention/archival policy.
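A minimal sketch of such an archival job as a cron entry (the token file, bucket name, GNU `date`, and an installed AWS CLI are all assumptions; note that `%` must be escaped in crontab):

```cron
# /etc/cron.d/geolens-audit-archive: nightly at 02:00, export audit logs
# since the previous day via the API and stream them to S3.
0 2 * * * root curl -s "https://geolens.example.com/api/admin/audit-logs?date_from=$(date -d yesterday +\%Y-\%m-\%d)" -H "Authorization: Bearer $(cat /etc/geolens/archive-token)" | aws s3 cp - "s3://example-audit-archive/geolens/$(date -d yesterday +\%Y-\%m-\%d).json"
```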

For service-level debugging beyond the metrics endpoint, use Docker Compose log streaming:

Terminal window
# Follow all logs
docker compose logs -f
# Follow specific service logs
docker compose logs -f api
docker compose logs -f db
docker compose logs -f worker
# Last 100 lines
docker compose logs --tail=100 api

For service health (Docker-level, not application-level):

Terminal window
# View all service statuses
docker compose ps
# Check specific service
docker compose ps db
docker compose ps api

Service health here reflects container restart status and entrypoint health checks — it does not exercise the application’s own provider probes. Use /health for application-level connectivity checks; use docker compose ps for “is the container running.”