How to instrument a Node.js application to emit process metrics

Node.js Metrics Exporter

To collect metrics from a Node.js application and expose them to Prometheus, we can use the prom-client npm library.

What is prom-client?

prom-client is a Prometheus client library for Node.js. It provides metric primitives (counters, gauges, histograms, and summaries) for instrumenting code, along with a registry that holds those metrics.
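
To make the idea of a metric primitive and a registry concrete, here is a toy sketch in plain JavaScript. It is not prom-client's implementation or API. The names `ToyCounter`, `ToyRegistry`, and `toy_requests_total` are made up for illustration; the only thing borrowed from Prometheus is the `# HELP` / `# TYPE` text layout.

```javascript
// Toy illustration only: NOT prom-client's implementation or API.
class ToyCounter {
  constructor(name, help) {
    this.name = name;
    this.help = help;
    this.value = 0;
  }

  // Counters only ever go up.
  inc(amount = 1) {
    if (amount < 0) throw new Error('counters cannot decrease');
    this.value += amount;
  }
}

class ToyRegistry {
  constructor() {
    this.metricsByName = new Map();
  }

  // Keep one metric per name so the output never contains duplicates.
  register(metric) {
    if (this.metricsByName.has(metric.name)) {
      throw new Error(`metric ${metric.name} is already registered`);
    }
    this.metricsByName.set(metric.name, metric);
    return metric;
  }

  // Render every registered metric in the Prometheus text layout.
  render() {
    const lines = [];
    for (const m of this.metricsByName.values()) {
      lines.push(`# HELP ${m.name} ${m.help}`);
      lines.push(`# TYPE ${m.name} counter`);
      lines.push(`${m.name} ${m.value}`);
    }
    return lines.join('\n') + '\n';
  }
}

const toyRegistry = new ToyRegistry();
const requests = toyRegistry.register(
  new ToyCounter('toy_requests_total', 'Total requests handled.')
);
requests.inc();
requests.inc(2);

const rendered = toyRegistry.render();
console.log(rendered);
```

In a real application the library plays both roles: you create metrics, the registry tracks them, and an HTTP endpoint serves the rendered text.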

Basic Version

In the following example, we will expose the process metrics of a Node.js application.

First, we need to set up an Express project with prom-client.

$ npm init && npm install prom-client express
$ mkdir -p src
$ cd src && touch server.js
$ tree .
├── package-lock.json
├── package.json
├── readme.md
└── src
    └── server.js

Use your favourite editor and follow the steps below.

Initialize the Express server

const express = require('express');
const app = express();
const port = 8765;

Initialize prom-client
```
const client = require('prom-client');
```
Declare a prefix. This can be your app name
```
const prefix = 'basic_example_'
```
Get a reference to prom-client's collectDefaultMetrics function, which collects the process metrics. You can read more about these metrics in the prom-client documentation.
```
const collectDefaultMetrics = client.collectDefaultMetrics;
```
Set up a registry. This creates a new registry, separate from prom-client's global default one, and all metrics in this example will be added to it.
```
const Registry = client.Registry;
const register = new Registry();
```
Register the default process metrics with the registry, along with the prefix
```
collectDefaultMetrics({register, prefix});
```
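
To get a feel for the kind of data behind these default metrics, the sketch below reads a few raw values straight from Node.js built-ins (`process.cpuUsage()`, `process.memoryUsage()`, and `perf_hooks.monitorEventLoopDelay()`). It is only an illustration of where such numbers can come from, not a copy of what prom-client does internally; the helper name `readProcessStats` and the field names are made up.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Hypothetical helper: reads a few raw process stats from Node.js built-ins.
function readProcessStats() {
  const cpu = process.cpuUsage();       // user/system CPU time in microseconds
  const memory = process.memoryUsage(); // sizes in bytes
  return {
    cpuUserSeconds: cpu.user / 1e6,
    cpuSystemSeconds: cpu.system / 1e6,
    residentMemoryBytes: memory.rss,
    // Approximate start time: "now" minus how long the process has been up.
    startTimeSeconds: Math.round(Date.now() / 1000 - process.uptime()),
  };
}

const stats = readProcessStats();
console.log(stats);

// Event loop delay is sampled into a histogram; values are in nanoseconds.
const loopDelay = monitorEventLoopDelay({ resolution: 10 });
loopDelay.enable();
setTimeout(() => {
  loopDelay.disable();
  console.log('mean event loop delay (s):', loopDelay.mean / 1e9);
}, 100);
```

Comparing these values with the `process_*` and `nodejs_eventloop_lag_*` lines served by the /metrics endpoint is a quick sanity check that the exporter reports plausible numbers.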

Declare a /metrics endpoint that returns the metrics from the registry

app.get('/metrics', async (req, res) => {
    try {
        res.set('Content-Type', register.contentType);
        res.end(await register.metrics());
    } catch (ex) {
        // res.end() accepts a string or Buffer, not an Error object
        res.status(500).end(String(ex));
    }
});

Fire up the server 🚀

app.listen(port, () => {
    console.log(`Example app listening at http://localhost:${port}, metrics exposed on /metrics endpoint`);
});

Explore the Node.js process metrics 🧐

$ curl http://localhost:8765/metrics

# HELP basic_example_process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE basic_example_process_cpu_user_seconds_total counter
basic_example_process_cpu_user_seconds_total 0.019112

# HELP basic_example_process_cpu_system_seconds_total Total system CPU time spent in seconds.
# TYPE basic_example_process_cpu_system_seconds_total counter
basic_example_process_cpu_system_seconds_total 0.006514

# HELP basic_example_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE basic_example_process_cpu_seconds_total counter
basic_example_process_cpu_seconds_total 0.025626

# HELP basic_example_process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE basic_example_process_start_time_seconds gauge
basic_example_process_start_time_seconds 1670403812

# HELP basic_example_process_resident_memory_bytes Resident memory size in bytes.
# TYPE basic_example_process_resident_memory_bytes gauge
basic_example_process_resident_memory_bytes 51560448

# HELP basic_example_nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE basic_example_nodejs_eventloop_lag_seconds gauge
basic_example_nodejs_eventloop_lag_seconds 0

# HELP basic_example_nodejs_eventloop_lag_min_seconds The minimum recorded event loop delay.
# TYPE basic_example_nodejs_eventloop_lag_min_seconds gauge
basic_example_nodejs_eventloop_lag_min_seconds 0.00946176

# HELP basic_example_nodejs_eventloop_lag_max_seconds The maximum recorded event loop delay.
# TYPE basic_example_nodejs_eventloop_lag_max_seconds gauge
basic_example_nodejs_eventloop_lag_max_seconds 0.011362303

# HELP basic_example_nodejs_eventloop_lag_mean_seconds The mean of the recorded event loop delays.
# TYPE basic_example_nodejs_eventloop_lag_mean_seconds gauge
basic_example_nodejs_eventloop_lag_mean_seconds 0.010913208630303031

# HELP basic_example_nodejs_eventloop_lag_stddev_seconds The standard deviation of the recorded event loop delays.
# TYPE basic_example_nodejs_eventloop_lag_stddev_seconds gauge
basic_example_nodejs_eventloop_lag_stddev_seconds 0.0003142510682862047

# HELP basic_example_nodejs_eventloop_lag_p50_seconds The 50th percentile of the recorded event loop delays.
# TYPE basic_example_nodejs_eventloop_lag_p50_seconds gauge
basic_example_nodejs_eventloop_lag_p50_seconds 0.011059199

# HELP basic_example_nodejs_eventloop_lag_p90_seconds The 90th percentile of the recorded event loop delays.
# TYPE basic_example_nodejs_eventloop_lag_p90_seconds gauge
basic_example_nodejs_eventloop_lag_p90_seconds 0.011091967
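
If you want to inspect this output from a script instead of reading it by eye, a minimal sketch like the one below can turn the text into a plain object. The helper `parseSamples` is hypothetical and deliberately simple: it assumes samples without labels, as in the output above, and would need more work for labelled metrics.

```javascript
// Hypothetical helper: parse un-labelled samples from Prometheus text output.
function parseSamples(text) {
  const samples = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === '' || line.startsWith('#')) continue;
    const [name, value] = line.split(/\s+/);
    samples[name] = Number(value);
  }
  return samples;
}

// Two metrics copied from the curl output above.
const sampleOutput = `
# HELP basic_example_process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE basic_example_process_cpu_user_seconds_total counter
basic_example_process_cpu_user_seconds_total 0.019112

# HELP basic_example_process_resident_memory_bytes Resident memory size in bytes.
# TYPE basic_example_process_resident_memory_bytes gauge
basic_example_process_resident_memory_bytes 51560448
`;

const samples = parseSamples(sampleOutput);
const residentMiB =
  samples.basic_example_process_resident_memory_bytes / (1024 * 1024);

console.log(samples);
console.log(`resident memory: ${residentMiB.toFixed(1)} MiB`);
```

This is only meant for quick local checks; in production, Prometheus itself parses and stores the scraped output.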

This is how we can start instrumenting a Node.js application. Depending on how you run it in production (VMs or containers), you can configure Prometheus to scrape this endpoint and send the metric data to your Levitate cluster.