
Application Performance Metrics with Spring Boot

Introduction

In this article, we will look at what metrics are, why we need them, and how to add them to a Spring Boot application. The assumption is that you have basic knowledge of Spring Boot as well as Spring Boot Actuator. To get started with Spring Boot Actuator, have a look at the Spring Boot Actuator part 1 article, and at part 2 for more depth.

What are application performance metrics?

Application performance metrics are the data collected from an application under usage that reveal the overall health of the application. The measurements usually include CPU usage, average response time, error rates, and throughput.

Why do we need application performance metrics?

Application performance metrics help us understand our applications better and consequently help us make informed decisions. Some of the things the data can reveal include:

  • CPU usage
  • RAM usage
  • Disk usage
  • Response times
  • Peak traffic hours
  • Error rates

Knowing your CPU, RAM, and disk usage, for example, can help you avoid system failure or even reduce costs.

Metric Types

We will have a look at common metric types and their uses.

Counter

A counter represents the count of an occurrence. Its value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of times an endpoint was invoked.

Gauge

A gauge is similar to a counter in the sense that it is backed by a mutable numerical value. The backing value of a gauge, however, can be incremented or decremented. Gauges can be used to measure things like the real-time sizes of collections, temperatures, memory usage, and thread pool sizes.

Timer

A timer measures the total time an event takes as well as the frequency of the event. It is important to note that the data from a timer is sent to Prometheus partly as a summary or histogram and partly as a gauge.
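Later in this article we will create timers with the @Timed annotation. Purely as a hedged sketch of the programmatic Micrometer API (the class name CheckoutService and the meter name checkout_latency are made up for illustration), a timer could also be registered and used directly:

@Component
public class CheckoutService {

    // Timer and MeterRegistry come from io.micrometer.core.instrument
    private final Timer checkoutTimer;

    public CheckoutService(MeterRegistry registry) {
        checkoutTimer = Timer.builder("checkout_latency")
                             .description("Measures checkout processing time")
                             .register(registry);
    }

    public void checkout() {
        // record() times the enclosed work and updates the timer's count, total time and max
        checkoutTimer.record(() -> {
            // ... the work being timed ...
        });
    }
}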

Histogram

A histogram provides a distribution of measurements, from which percentile data can be derived. Data is distributed into configurable intervals or “buckets”.

Summary

A summary is similar to a histogram and is preferred when an accurate latency value is desired without configuring histogram buckets (summaries are useful when the buckets of a metric are not known beforehand). Summary percentiles are calculated at the application level, so metrics from multiple instances of the same process cannot be aggregated. For this reason, histograms are generally the preferred type, since they are more flexible and allow for aggregated percentiles.
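As a hedged sketch of how such a distribution could be recorded with the Micrometer API (the meter name payload_size and the recorded value are illustrative), a DistributionSummary can be registered and optionally configured to publish histogram buckets:

// DistributionSummary and MeterRegistry come from io.micrometer.core.instrument;
// registry is assumed to be an injected MeterRegistry
DistributionSummary payloadSize = DistributionSummary.builder("payload_size")
        .baseUnit("bytes")
        .description("Size of request payloads")
        .publishPercentileHistogram() // emit histogram buckets in addition to count and sum
        .register(registry);

// record one observation
payloadSize.record(1024);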

Setup

We will use the Micrometer API to create the metrics and configure our app so that Prometheus can pull the data. Micrometer supports a wide range of monitoring systems; for this tutorial, we will be using Prometheus. We will need the following dependencies:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
  <scope>runtime</scope>
</dependency>

Then add the following to your application.properties file:

management.metrics.export.prometheus.enabled=true
management.endpoints.web.exposure.include=*

Above, the export of Prometheus metrics is enabled and all actuator endpoints are exposed. Prometheus metrics can be accessed at the /actuator/prometheus endpoint. Now we can start creating and exposing metrics.

Creating Metrics with Spring Boot

If you are using Spring MVC, metrics are automatically created for all the endpoints of your controllers. By default each endpoint gets a summary (the latency sum and the request count) and a gauge (holding the maximum observed latency of the endpoint); it can also be configured to emit a histogram (holding the latency buckets, the request count, and the latency sum), as shown below. However, if an endpoint that calls several services, or several methods of a service, has a latency issue, having metrics on the individual service methods can help pinpoint the method that is the bottleneck.
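As a hedged example (assuming a recent Spring Boot version), the built-in http.server.requests metric can be switched to a histogram by enabling percentile histograms in application.properties:

management.metrics.distribution.percentiles-histogram.http.server.requests=true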

Creating a counter

We will create a UserService class, add metrics to it and then invoke its methods from a controller. Below is the demonstration of the creation of a counter:

@Component
public class UserService {
    private final Random random = new Random();
    private final Counter counter1;
    private final Counter counter2;

    public UserService(MeterRegistry registry) {
        counter1 = Counter.builder("user_cnt")
                          .tags("method", "get_score")
                          .description("Measures request counts")
                          .register(registry);
        counter2 = Metrics.counter("user_cnt", "method", "get_bonus");
    }

    public double getUserScore() {
        counter1.increment();
        return counter1.count();
    }

    public double getUserBonus() {
        counter2.increment();
        return counter2.count();
    }
}

In the code above, Spring Boot injects the PrometheusMeterRegistry bean into the constructor. We then create two counters using two different approaches. counter1 is created using the Counter builder; this style of counter creation is the way to go when an instance of type MeterRegistry can be injected. counter2 is created using the static Metrics class, which is the approach to use when dependency injection is not possible.

We can now invoke the service methods as shown in our controller below.

@RestController
@RequestMapping("/v1/user")
public class UserResource {
    private final UserService userService;
    private final ProductService productService;

    public UserResource(UserService userService, ProductService productService) {
        this.userService = userService;
        this.productService = productService;
    }

    @GetMapping("/increment")
    public double incrementCounter() {
        return userService.getUserScore();
    }

    @GetMapping("/increment2")
    public double incre() {
        return userService.getUserBonus();
    }
}

Invoking the /increment endpoint and the /increment2 endpoint will emit metrics that can be seen in raw form at the /actuator/prometheus endpoint. The metrics below are what is emitted for Prometheus:

user_cnt_total{method="get_bonus"} 4.0
user_cnt_total{method="get_score"} 6.0

The above metrics indicate that the getUserBonus method of the UserService was invoked 4 times and its getUserScore method 6 times.

As already mentioned, Spring creates metrics for the invoked controller endpoints. Below are the request count metrics created for both endpoints.

http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/v1/user/increment2"} 4
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/v1/user/increment"} 6

We see that the controller metrics correspond to the service metrics since both endpoints just delegate to the UserService.

The third way to create a counter is by using annotations. In order to use the annotations, we need to add AspectJ as a dependency and enable the annotations in our application.properties file:

<dependency>
  <!-- needed for micrometer annotations -->
  <groupId>org.aspectj</groupId>
  <artifactId>aspectjweaver</artifactId>
  <version>1.9.22.1</version>
  <scope>runtime</scope>
</dependency>

Below is the line to be added in the application.properties file:

micrometer.observations.annotations.enabled=true
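Depending on the Spring Boot version, the Micrometer annotation aspects may not be registered automatically. As a hedged sketch (the class name MetricsAspectConfig is made up), they can be declared explicitly as beans:

// TimedAspect and CountedAspect come from io.micrometer.core.aop
@Configuration
public class MetricsAspectConfig {

    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry); // enables @Timed on Spring beans
    }

    @Bean
    public CountedAspect countedAspect(MeterRegistry registry) {
        return new CountedAspect(registry); // enables @Counted on Spring beans
    }
}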

Let's now create another counter using an annotation:

    @Counted(value = "usr_score", extraTags = {"user", "some_user"})
    public double getUserScore() {
        counter1.increment();
        return counter1.count();
    }

Above, a counter with the name “usr_score” and the tag “user” is created. The metric below is emitted.

usr_score_total{class="com.example.boot_metrics.user.UserService",exception="none",method="getUserScore",result="success",user="some_user"} 3.0

Creating a Gauge

We will create a gauge in the UserService class, then expose a method (refreshLoggedUsers) that modifies the collection backing the gauge, so that we can watch the gauge value change as the collection size changes.

Below, the Gauge builder is used to create the gauge; the Metrics class can also be used, as sketched after the code.

    private final ConcurrentLinkedQueue<String> loggedUsers = new ConcurrentLinkedQueue<>();

    public UserService(MeterRegistry registry) {
        counter1 = Counter.builder("user_cnt")
                          .tags("method", "get_score")
                          .description("Measures request counts")
                          .register(registry);
        counter2 = Metrics.counter("user_cnt", "method", "get_bonus");
        Gauge.builder("logged_users", loggedUsers::size).register(registry);
    }

    public long refreshLoggedUsers() {
        final int newSize = random.nextInt(10);
        final byte[] bytes = new byte[7];
        loggedUsers.clear();
        String user;
        for (int i = 0; i < newSize; i++) {
            random.nextBytes(bytes);
            user = new String(bytes, Charset.forName("UTF-8"));
            loggedUsers.add(user);
        }
        return loggedUsers.size();
    }
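As mentioned above, the static Metrics class can also register a gauge when a MeterRegistry cannot be injected. A minimal sketch (the meter name logged_users_static and the field name are illustrative):

    // Metrics.gauge registers against the global registry and returns the object, so a
    // strong reference can be kept (gauges only hold weak references to their source)
    private final ConcurrentLinkedQueue<String> staticallyGaugedUsers =
            Metrics.gauge("logged_users_static", new ConcurrentLinkedQueue<String>(), ConcurrentLinkedQueue::size);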


We also create the endpoint in the controller.

    @GetMapping("/online")
    public long onlineUsers() {
        return userService.refreshLoggedUsers();
    }

Each time we access the endpoint, the loggedUsers queue is resized and the gauge records the change. Below is a sample metric we get from the Prometheus endpoint.

logged_users 8.0

Spring also creates a gauge for the /online endpoint. The value of this gauge is the maximum latency observed for the /online endpoint.

http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/v1/user/online"} 0.079862932

Creating a Histogram or a Summary

We will create a new service (ProductService) with a method (updateProduct) that has a summary to measure the method's latency. Likewise, we will create a method in the UserService that also has a summary measuring its latency. We will then create an endpoint in our controller that invokes both methods.

@Component
public class ProductService {

    private final Random random = new Random();

    @Timed(value="product", extraTags = {"node", "default"}) 
    public long updateProduct() throws InterruptedException {
        final long millis = System.currentTimeMillis();
        TimeUnit.of(ChronoUnit.MILLIS).sleep(random.nextInt(1000));
        return System.currentTimeMillis() - millis;
    }
}

Above, we use the @Timed annotation, which creates a summary by default. Adding histogram = true to the annotation will cause a histogram metric to be emitted.
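For instance, a hedged variant of the annotation above that also publishes a histogram and the 95th percentile (the values are illustrative) could look like this:

    // histogram = true emits latency buckets; percentiles adds a client-side 95th percentile
    @Timed(value = "product", extraTags = {"node", "default"}, histogram = true, percentiles = {0.95})
    public long updateProduct() throws InterruptedException {
        final long millis = System.currentTimeMillis();
        TimeUnit.of(ChronoUnit.MILLIS).sleep(random.nextInt(1000));
        return System.currentTimeMillis() - millis;
    }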

Our UserService has an identical method:

    @Timed(value="usr_search", extraTags = {"node", "default"}) 
    public long fetchUser() throws InterruptedException {
        final long millis = System.currentTimeMillis();
        TimeUnit.of(ChronoUnit.MILLIS).sleep(random.nextInt(1000));
        return System.currentTimeMillis() - millis;
    }

Now, let's invoke both methods from our controller:

    @GetMapping("/latency") 
    public long latency() throws InterruptedException {
        final long fetchUserLatency = userService.fetchUser();
        return productService.updateProduct() + fetchUserLatency;
    }

We can now invoke the /latency endpoint in order to get its metrics emitted. The summary metrics emitted are shown below.

product_seconds_count{class="com.example.boot_metrics.user.ProductService",exception="none",method="updateProduct",node="default"} 6
product_seconds_sum{class="com.example.boot_metrics.user.ProductService",exception="none",method="updateProduct",node="default"} 2.159236843
usr_search_seconds_count{class="com.example.boot_metrics.user.UserService",exception="none",method="fetchUser",node="default"} 6
usr_search_seconds_sum{class="com.example.boot_metrics.user.UserService",exception="none",method="fetchUser",node="default"} 3.090635951

Spring will also create the following summary for the /latency endpoint.

http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/v1/user/latency"} 6
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/v1/user/latency"} 5.309029031

The sum of the latency of both service methods is roughly equal to the latency of the /latency endpoint since those are the only two methods invoked.

The summary metric does not provide details of the latency distribution, only a summary. A histogram provides better insight. To change the emitted metric type of the /latency endpoint from a summary to a histogram, add the following configuration to the application.properties file:

management.metrics.distribution.slo.http.server.requests=100ms,250ms,500ms,750ms,1s

The above configuration will also limit the buckets to 100ms, 250ms, 500ms, 750ms, and 1s.

Also change the @Timed annotation to have histogram = true. Unfortunately, the buckets cannot be limited in the same way (Spring intentionally discourages it) without possible side effects.

Conclusion

We have seen how to enable, create, and emit metrics using Spring Boot. This tutorial also has an accompanying repository that can be accessed.
