Overhead of Service Mesh–measuring and comparing Istio, Linkerd, Kuma and Consul

A Service Mesh can provide many facilities for managing traffic between microservices – from simple routing, load balancing and producing telemetry to applying complex routing logic and security policies. This layer of logic, applied on top of what the services themselves do, is not free: it adds latency and consumes additional resources (notably memory and CPU).

This article is an introduction and pointer to another article that I just came across. That article – Service Mesh Performance Evaluation — Istio, Linkerd, Kuma and Consul by Dahn Youssefi and Florent Martin – investigates, in detail and in a rigorous and fully documented manner, the overhead of four popular service mesh products running on top of a Kubernetes cluster, and the effect of that overhead on both response times and resource usage. It is an excellent article and I suggest you read it for yourself.

I will “borrow” the authors’ main conclusions: the added latency is found to be quite small – a few ms in most cases. Only when OPA (Open Policy Agent) was introduced did the latency increase somewhat more; activating other features had no noticeable effect. There were cases where the mean response time decreased after adding the service mesh – presumably because it did better load balancing than Kubernetes’ round robin algorithm at high volumes, reducing the duration of the longest running requests at peak loads.
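To make the last point concrete, here is a minimal sketch of how latency overhead could be compared between a run without and a run with a service mesh. It is not the authors' method: the function name `latency_overhead` and the sample values are my own assumptions, and the numbers are made up purely for illustration. It shows how the typical request can get slower while the mean still drops, because the slowest request becomes shorter.

```python
from statistics import mean, median, quantiles


def latency_overhead(baseline_ms, mesh_ms):
    """Return mesh-minus-baseline differences in mean, median and p99 latency (ms).

    Hypothetical helper for illustration; both arguments are lists of
    response times in milliseconds.
    """
    def p99(samples):
        # quantiles(n=100) yields 99 cut points; the last is the 99th percentile.
        return quantiles(samples, n=100, method="inclusive")[-1]

    return {
        "mean_overhead_ms": mean(mesh_ms) - mean(baseline_ms),
        "median_overhead_ms": median(mesh_ms) - median(baseline_ms),
        "p99_overhead_ms": p99(mesh_ms) - p99(baseline_ms),
    }


# Made-up response times, NOT taken from the article:
# with the mesh every normal request is 2 ms slower,
# but the single slow outlier is shorter (40 ms instead of 50 ms).
baseline = [10.0, 11.0, 12.0, 13.0, 50.0]
with_mesh = [12.0, 13.0, 14.0, 15.0, 40.0]

result = latency_overhead(baseline, with_mesh)
print(result)
```

With these invented samples the median goes up by 2 ms, while the mean ends up about 0.4 ms lower and the 99th percentile drops as well. That is the pattern the authors attribute to better load balancing at peak load.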

The authors suggest that RAM and CPU usage stay low compared to the service under test, a Java Spring Boot service that is of course not super lightweight itself. Importantly, the resources used by the service mesh sidecars did not increase substantially with increased load. The authors did point out that Istio used quite a bit more CPU than the other service meshes. The most lightweight service mesh turned out to be Linkerd. At higher loads, the authors suggest that Consul lagged a bit behind the other service meshes.

The article is worth a read – because of the rigor of the approach and the balanced evaluation of the findings. And of course because of the actual conclusions.
