DataDog monitoring

Derek Fitchett edited this page Oct 22, 2024 · 17 revisions

How to get access

The Enterprise Command Center (ECC) group is responsible for Datadog requests, such as adding or modifying user roles; creating or changing enterprise monitoring, dashboards, or alerts for an Application Service; and adding or removing services, network devices, or other equipment to or from monitoring.
Note: the DOTS team is no longer responsible for Datadog and will not be resolving Datadog requests.

To request an account, submit a ticket on VA's Enterprise Service Desk ServiceNow Portal at yourit.va.gov (you must be on the VA network). For instructions on filling out a ticket for Datadog access, see Datadog: Datadog Access.

Sample business justification:

As a member of the VA Virtual Regional Office (VRO) team, I am requesting access to existing Datadog dashboards on Lighthouse Delivery Infrastructure. An example of a dashboard I need to access: [url to dashboard].

To check the status of your tickets: My Tickets.

This is a revised version of an LHDI announcement in Dec 2023

DataDog

VRO Maintained Dashboards

Our deprecated DataDog account:

Partner Team Maintained Dashboards

LHDI

VA.gov

Custom Metrics

We've received conflicting feedback about the use of custom metrics and Datadog's REST API. To clarify: yes, the REST API is available for judicious use, provided you are aware of the considerations described on this page. The clarified use case info is below:

  1. Get the Datadog API and APP key environment variables:
  • VRO tenants are encouraged to use the shared global Helm template that has been populated in each LHDI deployment of VRO. To reference this shared global template (_datadog.tpl) in your project's Helm chart, add the following to the "env" section of your deployment.yaml: {{- include "vro.datadog.envVars" . | nindent 12 }}
  • Use the EP Merge app as an example.
  • Please note that the environment variables as expected by the Datadog Python SDK are as follows:
  2. For more relevant documentation and additional API example code, please access the following docs:
Example call:
## Dynamic Points
# Post time-series data that can be graphed on Datadog's dashboards.
# Curl command
curl -X POST "https://api.ddog-gov.com/api/v2/series" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-d @- << EOF
{
  "series": [
    {
      "metric": "system.load.1",
      "type": 0,
      "points": [
        {
          "timestamp": 1703868203,
          "value": 0.6
        }
      ],
      "resources": [
        {
          "name": "dummyhost",
          "type": "host"
        }
      ]
    }
  ]
}
EOF
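
For teams that prefer Python over curl, the same request can be sketched with only the standard library. This is a minimal illustration, not the Datadog Python SDK: the helper names `build_series_payload` and `submit_series` are hypothetical, while the URL, headers, and JSON body are copied from the curl example above. The timestamp in the curl example is from December 2023, and Datadog's metrics intake generally rejects points that are too far in the past, so this sketch defaults to the current time.

```python
import json
import time
import urllib.request

# URL and header names are copied from the curl example above.
SERIES_URL = "https://api.ddog-gov.com/api/v2/series"


def build_series_payload(metric, value, host, timestamp=None):
    """Build the same JSON body as the curl example (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())  # current POSIX time in seconds
    return {
        "series": [
            {
                "metric": metric,
                "type": 0,  # same value as the curl example
                "points": [{"timestamp": timestamp, "value": value}],
                "resources": [{"name": host, "type": "host"}],
            }
        ]
    }


def submit_series(payload, api_key):
    """POST the payload and return the HTTP status code (hypothetical helper)."""
    request = urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A call would look like `submit_series(build_series_payload("system.load.1", 0.6, "dummyhost"), api_key)`, where `api_key` is read from the same environment variable the curl example uses (`DD_API_KEY`). Keeping payload construction separate from the network call makes it possible to inspect or unit-test the body without sending anything to Datadog.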

Be Mindful - When Using Datadog Custom Metrics

If used incorrectly, custom metrics can become prohibitively expensive in Datadog.

The main issue arises when custom metrics are combined with highly variable tags (such as an ICN), which can greatly increase the cost. This is because we are charged for all the metric and tag combinations used during a billing period. For example, if we had a single failure metric tagged with ICN, and there were failures in a month for 1000 different users, we would be charged for 1000 metric/tag combinations. So, in general, we just need to be mindful not to add unnecessary tags to any metrics we create.
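
The arithmetic in the ICN example can be illustrated with a short Python sketch. It is a simplified model that only counts distinct metric name and tag set combinations; it is not Datadog's actual billing calculation. The metric name `vro.claim.failure`, the `reason` tag, and the helper `count_billable_series` are made up for illustration.

```python
def count_billable_series(submissions):
    """Count distinct (metric name, tag set) combinations.

    Simplified model for illustration only, not Datadog's billing logic.
    """
    return len({(metric, frozenset(tags)) for metric, tags in submissions})


# 1000 failures, each tagged with a different (made-up) ICN value.
per_user = [("vro.claim.failure", [f"icn:{i}"]) for i in range(1000)]

# The same 1000 failures tagged only with a low-cardinality reason.
per_reason = [
    ("vro.claim.failure", ["reason:timeout" if i % 2 else "reason:validation"])
    for i in range(1000)
]

high = count_billable_series(per_user)    # 1000 combinations
low = count_billable_series(per_reason)   # 2 combinations
print(high, low)
```

Under this model, the same 1000 failure events produce 1000 combinations when tagged per user but only 2 when tagged by a small, fixed set of reasons, which is why low-cardinality tags are the safer default.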

Postgres RDS Metrics

LHDI now supports RDS metrics for Postgres. Once enabled, you can see Postgres metrics in Datadog using the metrics explorer.
