Comparing JVM performance: Zulu OpenJDK, OpenJDK, Oracle JDK, GraalVM CE

Maarten Smeets, November 23, 2018 | Performance and tuning, Java | 7 Comments

There are many JVMs to choose from for your Java application. Which one is best? That depends on several factors, performance being an important one. Solid performance research, however, is difficult. In this blog I’ll describe a setup I created to test different JVMs at the same time. I also looked at the effect of resource isolation (assigning specific CPUs and memory to a process); this effect was negligible. My test application was a reactive (non-blocking) Spring Boot REST application. I used Prometheus to poll the JVMs and Grafana for visualization. The setup is shown in the image below. Everything except SoapUI ran in Docker containers.

Isolated measurements

How can you be sure nothing is interfering with your measurements? You can’t be absolutely sure, but you can try to isolate the resources assigned to processes, for example by assigning a dedicated CPU and a fixed amount of memory. I also ran several tests that put resource constraints on the load-generating, monitoring and visualization software (assigning different CPUs and memory to those processes). Assigning specific resources to the processes (using the docker-compose v2 cpuset and memory parameters) did not noticeably influence the measured per-process load and response times. I also compared startup, under-load and no-load situations; the findings were the same in each.

Assigning a specific CPU and memory to a process

Configuring a specific CPU for a process with docker-compose is challenging. The version 3 docker-compose file format does not support pinning a process to a specific CPU. In addition, when you run a version 3 file with docker-compose, resource constraints are not applied at all. This seems to be because the Docker developers want to move away from docker-compose (a separately maintained Python wrapper around Docker commands) in favor of docker stack deploy, which uses Docker Swarm and possibly Kubernetes in the future. You can imagine that assigning a specific CPU in a potentially multi-host environment is not trivial. I therefore migrated my docker-compose file back to the version 2 format, which does allow assigning specific CPUs. Using the taskset command, I pinned the load-generating and monitoring software to CPUs that were not shared with the JVMs processing the load.
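A quick way to check that cpuset, a memory limit or taskset actually reached a JVM is to ask the JVM itself. The following is a minimal sketch that was not part of the original setup; the class name ContainerResources is hypothetical. It uses only standard JDK 8 APIs. On recent Linux JDK 8 updates availableProcessors() honors the CPU affinity mask, while maxMemory() reports the maximum heap. That heap value follows the container memory limit only on updates with container support (backported in 8u191), so an earlier update such as the Oracle JDK 8u181 used here may size its default heap from the host’s memory.

```java
/** Sketch: reports the CPUs and maximum heap this JVM believes it has. */
final class ContainerResources {

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        // On recent Linux JDK 8 updates this honors the CPU affinity mask (cpuset / taskset).
        int cpus = rt.availableProcessors();
        // Maximum heap: -Xmx if set, otherwise an ergonomic default derived from detected memory.
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        return "cpus=" + cpus + " maxHeapMb=" + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it in each container with the same resource settings as the application shows whether the JVMs really saw the constraints.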

Measurements under the same circumstances

How can you be sure that all measurements are taken under exactly the same circumstances? If I run a test against a JVM today and run the same scenario again tomorrow, the results will differ. This can have various causes: different CPUs might pick up the workload while also being busy with other things, or different background processes might be running in the host or guest OS. Even testing one JVM and, after that test, another JVM will not give comparable results, since you cannot rule out that something changed in between. For example, I used Prometheus to gather metrics. During the second run the Prometheus database might contain more data, which could make adding new data slower and thereby influence the second JVM’s measurements. That example might be rather far-fetched, but you can think of other reasons why measurements taken at different times can differ. That’s why I chose to perform all measurements simultaneously.

Setup

My setup consisted of a docker-compose file which allowed me to easily start four instances of a reactive Spring Boot application, each running on a different JVM. In front of the four JVMs I put an HAProxy instance to load balance requests. Why? To rule out time-related differences between tests that I could not account for: all JVMs were put under the same load at the same time.
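The load-balancing policy matters for a fair comparison. With round robin (the policy named in the summary), each JVM receives the same number of requests regardless of how quickly it answers. The sketch below only illustrates that policy in Java; it is not HAProxy’s implementation, and the class name and backend names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of round-robin selection: request n goes to backend n mod (number of backends). */
final class RoundRobin {
    private final String[] backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(String... backends) {
        this.backends = backends.clone();
    }

    /** Index of the backend for the next request; safe to call from many threads. */
    int nextIndex() {
        // floorMod keeps the index non-negative after the counter wraps around.
        return Math.floorMod(counter.getAndIncrement(), backends.length);
    }

    String pick() {
        return backends[nextIndex()];
    }

    /** Counts how many of the given number of requests each backend would receive. */
    static int[] distribute(int requests, String... backends) {
        RoundRobin rr = new RoundRobin(backends);
        int[] counts = new int[backends.length];
        for (int r = 0; r < requests; r++) {
            counts[rr.nextIndex()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical backend names for the four JVM containers.
        int[] counts = distribute(1000, "graalvm", "openjdk", "oracle", "zulu");
        System.out.println(Arrays.toString(counts)); // [250, 250, 250, 250]
    }
}
```

A least-connections policy (HAProxy’s balance leastconn) would instead send fewer new requests to a JVM that is slow to respond. That would make the per-JVM load figures harder to compare.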

To monitor the results I used Micrometer to provide an endpoint from which Prometheus reads the JVM metrics. I used Grafana to visualize the data with the following dashboard: https://grafana.com/dashboards/4701
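Prometheus scrapes a plain-text page in which each sample is a line of the form name{label="value",...} value. The sketch below renders JVM gauges in that format using only the JDK’s management API, to show roughly what such an endpoint exposes. It is not Micrometer; the class name PrometheusText and the metric names are illustrative assumptions modeled on Micrometer’s naming.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: formats JVM gauges in the Prometheus text exposition format (not Micrometer). */
final class PrometheusText {

    /** Formats one sample line: name{k="v",...} value, with labels sorted by key. */
    static String gauge(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                // Label values must escape backslash, double quote and newline.
                String v = e.getValue()
                        .replace("\\", "\\\\")
                        .replace("\"", "\\\"")
                        .replace("\n", "\\n");
                sb.append(e.getKey()).append("=\"").append(v).append('"');
                first = false;
            }
            sb.append('}');
        }
        return sb.append(' ').append(value).toString();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Metric names are illustrative, modeled on Micrometer's naming.
        System.out.println("# TYPE jvm_memory_used_bytes gauge");
        System.out.println(gauge("jvm_memory_used_bytes",
                Collections.singletonMap("area", "heap"), heap.getUsed()));
        System.out.println("# TYPE jvm_classes_loaded gauge");
        System.out.println(gauge("jvm_classes_loaded",
                Collections.<String, String>emptyMap(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount()));
    }
}
```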

Since GraalVM is currently only available as a JDK 8 version, I used JDK 8 versions of the other JVMs as well. When the containers are running, the JVM version can be checked at the actuator URL localhost:8080/actuator/env. Since localhost:8080 is the HAProxy endpoint, successive requests will likely be answered by different JVMs.

or by running java -version in one of the images, for example:

docker run --rm store/oracle/serverjre:8 java -version
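If you prefer to check from inside the application, the same information is available from standard system properties, which are also among the property sources /actuator/env lists. A minimal sketch with the hypothetical class name JvmInfo:

```java
/** Sketch: prints the system properties that identify the JVM implementation, vendor and version. */
final class JvmInfo {

    static String summary() {
        return System.getProperty("java.vm.name") + " | "
                + System.getProperty("java.vendor") + " | "
                + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```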

I’ve used the following versions:

GraalVM CE rc9 (8u192)
OpenJDK 8u191
Zulu 8u192
Oracle JDK 8u181

Why the different versions? These were the versions available on hub.docker.com when I wrote this blog.

Getting started

You can download the code here; the setup is in the complete folder. You can run it with:

sh ./buildjdkcontainers.sh
docker-compose -f docker-compose-jdks.yml up

Next you can access:

HAProxy (which routes requests to the different JVMs) at localhost:8080
Prometheus at localhost:9090
Grafana at localhost:3000

You need to configure Grafana to use Prometheus as a data source.

Next you need to import the dashboard into Grafana.

Next you can run a load test against http://localhost:8080/hello (HTTP GET) and see the results in the Grafana dashboard. Prometheus can also feed its own metrics to Grafana, and so can HAProxy by using an exporter. I did not configure this in my setup.
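I used SoapUI for the load test. If you just want to put some load on the endpoint without SoapUI, a small closed-loop generator is enough for a first look. The sketch below is a stand-in under stated assumptions, not the tool behind the results in this blog: the thread and request counts are arbitrary, and the class name HelloLoad is hypothetical. It sticks to JDK 8 APIs (HttpURLConnection) and reports nearest-rank percentiles.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Sketch of a closed-loop GET load generator: each thread waits for a response before the next request. */
final class HelloLoad {

    /** Nearest-rank percentile of an ascending, non-empty list; p in (0, 100]. */
    static long percentile(List<Long> sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    /** Sends one GET, reads the whole body and returns the elapsed time in microseconds. */
    static long timeGet(URL url) throws IOException {
        long start = System.nanoTime();
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // Draining and closing the stream lets HttpURLConnection reuse the keep-alive connection.
        try (InputStream in = con.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard the response body
            }
        }
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(args.length > 0 ? args[0] : "http://localhost:8080/hello");
        int threads = 8;        // arbitrary concurrency
        int perThread = 500;    // arbitrary number of requests per thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<List<Long>>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            futures.add(pool.submit(() -> {
                List<Long> latencies = new ArrayList<>(perThread);
                for (int i = 0; i < perThread; i++) {
                    latencies.add(timeGet(url));
                }
                return latencies;
            }));
        }
        List<Long> all = new ArrayList<>();
        for (Future<List<Long>> f : futures) {
            all.addAll(f.get());
        }
        pool.shutdown();
        Collections.sort(all);
        System.out.printf("requests=%d p50=%dus p99=%dus max=%dus%n",
                all.size(), percentile(all, 50), percentile(all, 99), all.get(all.size() - 1));
    }
}
```

Because the requests go through HAProxy, these latencies mix all four JVMs; the per-JVM figures still come from the Micrometer metrics in Grafana. In a closed loop a slower JVM also lowers the offered load, which tools that send at a fixed rate avoid.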

Different OSs

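One difference between the Docker images was the OS inside the image. The OS can be determined with the command below (the glob is passed to a shell inside the container so that it is not expanded on the host):

docker run --rm store/oracle/serverjre:8 sh -c 'cat /etc/*-release'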

azul/zulu-openjdk:8 used Ubuntu 18.04
oracle/graalvm-ce:1.0.0-rc9 used Oracle Linux Server 7.5
openjdk:8 used Debian GNU/Linux 9
store/oracle/serverjre:8 used Oracle Linux Server 7.5

I don’t think this would have had much effect on the JVMs running inside (with Alpine I would have expected an effect). At least Oracle JDK and GraalVM use the same OS.
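The OS can also be read from within a JVM. On these distributions /etc/os-release, one of the files matched by /etc/*-release, contains a PRETTY_NAME entry. A small sketch with the hypothetical class name OsRelease, falling back to the os.name property when the file is absent:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Sketch: extracts PRETTY_NAME from the lines of an os-release file. */
final class OsRelease {

    static String prettyName(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("PRETTY_NAME=")) {
                String v = line.substring("PRETTY_NAME=".length());
                // Values are usually wrapped in double quotes.
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);
                }
                return v;
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/etc/os-release");
        System.out.println(Files.exists(p)
                ? prettyName(Files.readAllLines(p, StandardCharsets.UTF_8))
                : System.getProperty("os.name"));
    }
}
```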

Results

The JVM Micrometer dashboard made it easy to spot specific areas of difference to investigate further.

CPU usage

GraalVM had the highest overall CPU usage during the test; Oracle JDK had the lowest.

Response times

Overall, GraalVM had the worst response times and OpenJDK the best, followed closely by Oracle JDK and Zulu. On average, the difference between OpenJDK and GraalVM was about 30%.

Garbage collection

Interestingly, GraalVM loads considerably more classes than the other JDKs, and OpenJDK loads the fewest. The difference between GraalVM and OpenJDK is about 25%. I have not yet determined whether GraalVM adds a fixed number of extra classes, or whether the overhead scales with the number of classes the application uses (a fixed percentage).
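One way to find out whether the extra classes are a fixed or a proportional overhead is to record class-loading counters in each JVM, for example right after startup and again after the application has handled load, and compare the differences. The standard ClassLoadingMXBean exposes these counters; the sketch below (hypothetical class name ClassStats) prints them. Starting a JVM with -verbose:class lists the individual classes as they are loaded. According to Thomas Wuerthinger’s comment below, GraalVM on OpenJDK also loads the Graal compiler’s own classes, which would contribute to the count.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: reports class-loading counters via the standard ClassLoadingMXBean. */
final class ClassStats {

    static long[] counts() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            cl.getLoadedClassCount(),       // currently loaded
            cl.getTotalLoadedClassCount(),  // loaded since JVM start
            cl.getUnloadedClassCount()      // unloaded since JVM start
        };
    }

    public static void main(String[] args) {
        long[] c = counts();
        System.out.printf("loaded=%d totalLoaded=%d unloaded=%d%n", c[0], c[1], c[2]);
    }
}
```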

These additional classes could cause delays during garbage collection (although correlation does not necessarily imply causation). Longer GC pause times for GraalVM are indeed what we see below.

Below is a graph of the summed GC pause times. The longest pause times (the top line) are GraalVM’s pauses caused by allocation failures.
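To see which collectors produced these pauses, the standard GarbageCollectorMXBean reports each collector’s name, collection count and accumulated collection time in milliseconds. The sketch below (hypothetical class name GcStats) prints them and sums the times. Two caveats apply. The accumulated time equals pause time only for stop-the-world collectors. JDK 8 also chooses its default collector based on the number of CPUs and the amount of memory it detects, so the printed collector names are worth checking when containers get different resources.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: per-collector GC counts and accumulated collection time. */
final class GcStats {

    /** Sum of accumulated collection time over all collectors, in milliseconds. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // e.g. "PS Scavenge" / "PS MarkSweep" for the Parallel collector
            System.out.printf("%s: count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("totalMs=" + totalCollectionTimeMs());
    }
}
```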

Memory usage

JVM memory usage is interesting to look at. As you can see in the graph above, the OpenJDK JVM uses the most memory. The garbage collection behavior of GraalVM and Zulu appears to be similar, but GraalVM has a higher base memory usage. Oracle JDK appears to collect garbage less often. Looking at averages, the OpenJDK JVM uses the most memory while Zulu uses the least. In a zoomed-out graph over a longer period, the memory usage of Oracle JDK and OpenJDK seems erratic and can spike to relatively high values, while Zulu and GraalVM seem more stable.
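A heap graph can show either the used or the committed heap. As Andrew Dinn explains in the comments, OpenJDK by default keeps heap it has committed, so committed can stay well above used. The sketch below (hypothetical class name HeapStats) prints both, plus the maximum, from the standard MemoryMXBean. Committed heap is still not total process memory: as Thomas Wuerthinger points out in the comments, natively allocated memory (including that of the compilers) is not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch: shows used vs committed vs max heap; a "greedy" heap keeps committed well above used. */
final class HeapStats {

    static String format(MemoryUsage u) {
        long mb = 1024 * 1024;
        // getMax() returns -1 when no maximum is defined.
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / mb) + "MB";
        return "used=" + (u.getUsed() / mb) + "MB committed=" + (u.getCommitted() / mb) + "MB max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(format(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage()));
    }
}
```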

Summary

I conducted a load test using SoapUI with a reactive Spring Boot REST application running on four different JVMs behind a round-robin HAProxy load balancer. I used Prometheus to poll the JVM instances (which used Micrometer to produce data) every 5 seconds and used Grafana and Prometheus to visualize the data. The results suggest that GraalVM is not a suitable drop-in replacement for, for example, OpenJDK, since it performs worse, uses more resources, loads more classes and spends more time in garbage collection.

GraalVM loads more classes for the same application
GraalVM gives the slowest response times for the application
GraalVM uses the most CPU (while achieving the slowest response times)
GraalVM spends the most time on garbage collection
Zulu OpenJDK uses the least memory of the compared JVMs. Zulu OpenJDK and GraalVM have more stable memory usage than Oracle JDK and OpenJDK.

Since GraalVM is relatively new, it could be that the metrics provided by Micrometer do not correctly reflect actual throughput and resource usage. It could also be that my setup has flaws which cause these differences. I tried to rule out the latter by looking at the metrics in different situations.

If you want to use the polyglot features of GraalVM, the other JVMs obviously do not provide a suitable alternative. GraalVM also provides a native compilation option which I did not test (I ran the test on the same JAR for all JVMs). This feature could potentially increase performance considerably.

Further research

Native executables?

GraalVM allows code to be compiled to a native executable. I have not looked at the performance of these native executables, but they could make GraalVM a lot more interesting. It would also be interesting to see how the Prometheus metrics behave in a native executable, since in that case there is no real JVM anymore.

Blocking calls

The application used was simple: a reactive Spring Boot REST service. The behavior under load might differ with more complex applications or, for example, when using blocking calls in Spring Boot.

Tweaking the JVM parameters

I did not tune the JVMs; everything ran with out-of-the-box settings. I have not looked at parameter defaults or at parameters specific to certain JVMs. Tuned parameters might give very different results.
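Because all JVMs ran with their defaults, it helps to record which options each one actually received. RuntimeMXBean.getInputArguments() returns the arguments passed to the JVM (not those passed to the main method); the sketch below (hypothetical class name JvmFlags) prints them. To see the ergonomic defaults a JVM chose, such as MaxHeapSize, java -XX:+PrintFlagsFinal -version prints the final value of every flag.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: prints the options this JVM was launched with (JVM options only, not main() arguments). */
final class JvmFlags {

    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        List<String> flags = inputArguments();
        System.out.println(flags.isEmpty() ? "(no explicit JVM options)" : String.join(" ", flags));
    }
}
```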

GraalVM EE and Java 11 (or 12 or …)

It would be interesting to look at GraalVM EE, since it is based on Oracle JDK instead of OpenJDK. I have not yet found a Docker image for it. Comparing Java 11 with Java 8 would also be interesting. More to come!

Tags: GraalVM, grafana, haproxy, jvm, openjdk, prometheus, zulu

About The Author

Maarten Smeets

Maarten is a Software Architect at AMIS Conclusion. Over the past years he has worked for numerous customers in the Netherlands in developer, analyst and architect roles on topics like software delivery, performance, security and other integration related challenges. Maarten is passionate about his job and likes to share his knowledge through publications, frequent blogging and presentations.

7 Comments

Bk July 27, 2019
Please include Corretto in your next test, I’d love to see how it compares. Thank you for the analysis above.
- Maarten Smeets July 27, 2019
  I’ve looked at various OpenJDK distributions and did not find solid reproducible differences. That is why I decided not to focus on differences between OpenJDK distributions, but instead to look at a single OpenJDK distribution and compare it to other JVM implementations. See for example http://bit.ly/JSpring2019PerformanceSlides.
Carmine November 25, 2018
I’m surprised to see Zulu perform so much better than OpenJDK in terms of memory consumption. From what I can tell, Zulu is essentially a build of OpenJDK with various support guarantees, i.e. no technical differences. Here is the list of Zulu key features: https://www.azul.com/products/zulu-enterprise/key-features-benefits/
- Andrew Dinn December 3, 2018
  This is not rocket science if you understand a few important details of how the OpenJDK GCs work. Many ‘like-for-like’ benchmarks can be highly misleading because of small but important details, and that is the case here. The default settings for OpenJDK do not configure the GC to reclaim or minimize use of allocated heap. By default OpenJDK uses all the heap you allow it to (you get what you ask for), and if you don’t specify a heap limit it assumes it can use a large slice of available memory.
  What is the consequence of using the OpenJDK defaults? As your program creates more and more objects OpenJDK simply maps in more heap memory in preference to performing a GC. It continues to do so right up to the configured -Xmx limit (heap max). It also retains all that memory after GC, even if the working set will easily fit in a much smaller amount of heap. Why? Well, the larger the heap the less often OpenJDK has to perform GC. So, retaining memory avoids work. Also, it takes work to deallocate and reallocate memory so not bothering to release memory means more work saved. Of course, the cost of GC is extremely low but why add work when the configuration says that there is memory available?
  The result is that memory performance is entirely determined by the size of the -Xmx setting. When a smaller -Xmx is configured OpenJDK lives within that smaller budget by GCing more often. Alternatively, if you need memory elasticity to cope with peaks and troughs in working set size you can configure a larger heap limit and switch on heap reclamation. In that case OpenJDK will try to keep heap size down by GCing more frequently. However, with the right configuration it will expand the heap to cope with a growing working set and shrink it again when the working set size declines. That may cost slightly more time in GC and memory mapping/unmapping but the cost is still marginal.
  So, the memory use shown here for OpenJDK is not an indication of what OpenJDK is capable of but rather of its default behaviour (i.e. greedy mode). That is evidenced by the fact that Zulu, essentially the same software, achieves very different memory performance. One can only assume that Zulu has changed the GC defaults (or, perhaps, tweaked the GC algorithms) so as to i) auto-select a smaller -Xmx default or ii) switch on heap reclamation by default.
  Note that the OpenJDK defaults mean that an unscrupulous tester can make OpenJDK look arbitrarily bad against alternative implementations simply by picking a large enough -Xmx setting (or running a benchmark with a small enough working set size). I am not suggesting that this was the intention in the experiment reported here. However, do be aware that ‘like-for-like’ benchmarks often ignore important details, and be careful to look into the details to ensure that someone is not pulling the wool over your eyes.
  - Gil Tene January 18, 2019
    > “…One can only assume that Zulu has changed the GC defaults (or, perhaps, tweaked the GC algorithms)…”
    Nope. Zulu doesn’t change anything in this regard.
    But 8u192 did.
    I think the reason is simply https://bugs.openjdk.java.net/browse/JDK-8196595 (which may be wrongly labeled for the backfix version, and the backport to 8 appeared in the 8u192 SPU build, and not in the 8u191 CPU). The other (non-Graal) OpenJDKs compared were 8u191.
    Cheers.
Thomas Wuerthinger November 24, 2018
GraalVM, when running on OpenJDK in its current configuration, uses the application heap for the compiler’s data structures and also loads the compiler’s classes. Your measurement methodology hides the memory used by the native compilers, because it only shows the memory occupied in the heap; natively allocated memory is not counted at all. We are currently working on a solution to AOT compile the Graal compiler to make it look more like a native compiler, but generally total process memory usage would be a more appropriate metric.
One other aspect: The important metric for long-running services is performance of the system after reaching a steady state. Therefore one would usually have a warmup period and only then begin the measurements.
GraalVM native images provide significantly lower process memory usage and instant startup. Not sure whether they are already applicable for your demo though.
- Maarten Smeets November 25, 2018
  Thank you for the feedback! I will perform additional measurements in order to compare different runs using a single JVM at a time. I did look at startup, short term and longer term though, and the conclusions are valid for those 3 situations. In a future blog post I’ll look at process memory instead of heap since, as you indicated, it is a better measure of total memory used. I will also update the setup to allow exporting the Prometheus data (probably by using InfluxDB or by exporting from Grafana) and make the raw data available. This will take some time though. I will get back to it!