From cc8c1f7d31424a69c2e27076e4c05155877ca2f6 Mon Sep 17 00:00:00 2001
From: baarkerlounger
Date: Mon, 27 Jun 2022 17:18:49 +0100
Subject: [PATCH] Add monitoring and logging

---
 docs/monitoring.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index e69de29bb..e646a773f 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -0,0 +1,13 @@
+# Infrastructure Metric monitoring
+
+We use self-hosted Prometheus and Grafana to monitor infrastructure metrics. Both run in a dedicated Gov PaaS space called "monitoring" and are deployed as Docker images using GitHub Actions pipelines. The repository for these, along with more information, is here: [dluhc-data-collection-monitoring](https://github.com/communitiesuk/dluhc-data-collection-monitoring).
+
+# Application & Performance monitoring & alerting
+
+For application error and performance monitoring we use managed [Sentry](https://sentry.io/organizations/dluhc-core). You will need to be added to the DLUHC account to access this. It sends Slack notifications to the #team-data-collection-alerts channel for all application errors in staging and production, and for any controller endpoint whose P95 transaction duration exceeds 250ms over a 24-hour period.
+
+# Logs
+
+For log persistence we use a managed ELK (Elasticsearch, Logstash, Kibana) stack provided by [Logit](https://logit.io/). You will need to be added to the DLUHC account to access this. Logs are retained for 14 days with a daily limit of 2GB.
+
+Logs are also available directly from Gov PaaS via the CLI: `cf logs --recent`.
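
To illustrate the performance alert condition described in the patch (P95 transaction duration above 250ms over a 24-hour window), here is a minimal Python sketch. It is not how Sentry computes the metric: the nearest-rank percentile method and the function names `p95` and `breaches_threshold` are assumptions made for illustration only.

```python
# Hypothetical sketch of the "P95 > 250ms" alert condition.
# Assumption: nearest-rank percentile; Sentry's own method may differ.

P95_THRESHOLD_MS = 250  # threshold quoted in docs/monitoring.md


def p95(durations_ms):
    """Return the nearest-rank 95th percentile of a non-empty list."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # 1-based nearest rank, ceil(0.95 * n), computed with integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def breaches_threshold(durations_ms, threshold_ms=P95_THRESHOLD_MS):
    """True when the window's P95 duration exceeds the threshold."""
    return p95(durations_ms) > threshold_ms
```

With 20 transactions in the window, a single slow request does not trigger the condition under this method, but two do, because the 19th-ranked duration is the one compared against the threshold.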