There are many JVMs to choose from for your Java application. Which one is best to use depends on various factors, performance being an important one. Solid performance research, however, is difficult. In this blog I’ll describe a setup I created to test different JVMs at the same time. I also looked at the effect of resource isolation (assigning specific CPUs and memory to the processes); this effect turned out to be negligible. My test application was a reactive (non-blocking) Spring Boot REST application, and I used Prometheus to poll the JVMs and Grafana for visualization. Below is an image of the setup. Everything ran in Docker containers except SoapUI.
How can you be sure nothing is interfering with your measurements? Of course you can’t be absolutely sure, but you can try to isolate the resources assigned to processes, for example by assigning a dedicated CPU and a fixed amount of memory. I also ran several tests with resource constraints on the load-generating, monitoring and visualization software (assigning different CPUs and memory to those processes). Assigning specific resources to the processes (using the docker-compose v2 cpuset and memory parameters) did not noticeably influence the measured process load and response times. I also compared startup, under-load and idle situations. The findings did not change under these different circumstances.
Assigning a specific CPU and memory to a process
Using docker-compose to pin a process to a specific CPU is challenging. The version 3 docker-compose format does not support assigning a specific CPU to a process; in fact, version 3 does not support resource constraints at all when you run it with docker-compose. This is because the people working on Docker appear to want to phase out docker-compose (a separately maintained Python wrapper around Docker commands) in favor of docker stack deploy, which uses Docker Swarm and maybe Kubernetes in the future. You can imagine that assigning a specific CPU in a potentially multi-host environment is not trivial. I therefore migrated my docker-compose file back to the version 2 format, which does allow assigning specific CPUs. The software generating the load and monitoring the JVMs I pinned (with the taskset command) to CPUs not shared with the JVMs processing the load.
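As a sketch of what this looks like (the service and image names here are hypothetical, not the ones from my repository), a version 2 compose file can pin a service to a CPU and cap its memory like this:

```yaml
version: "2"
services:
  openjdk-app:
    image: my/springboot-openjdk:8   # hypothetical image name
    ports:
      - "8081:8080"
    cpuset: "2"        # pin this container to CPU core 2
    mem_limit: 1g      # cap container memory at 1 GB
```

Note that cpuset and mem_limit are only honored in the v2 format; in v3 resource limits moved under deploy.resources, which plain docker-compose ignores.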
Measures under the same circumstances
How can you be sure that all measurements are conducted under exactly the same circumstances? When I run a test against a JVM today and run the same scenario again tomorrow, my results will differ. This can have various causes: different CPUs pick up the workload and those CPUs are also busy with other things, or different background processes are running in my host or guest OS. Even when testing one JVM first and another one right after, the results are not comparable, since you cannot rule out that something has changed in between. For example, I’m using Prometheus to gather measurements; during the second run the Prometheus database is filled with more data, which might make adding new data slower and could influence the second JVM’s performance measurements. This example might be rather far-fetched, but you can think of other reasons why measurements taken at different times can differ. That’s why I chose to perform all measurements simultaneously.
My setup consisted of a docker-compose file which allowed me to easily start the same reactive Spring Boot application four times, each on a different JVM. In front of the four JDKs I put an HAProxy instance to load balance requests. Why did I do this? To make sure the tests did not differ due to time-related effects I did not account for: all JVMs were put under the same load at the same time.
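A minimal HAProxy configuration for such a round-robin setup could look as follows; this is a sketch, and the backend host names are assumed compose service names, not necessarily the ones from my repository:

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http-in
    bind *:8080
    default_backend jvms

backend jvms
    balance roundrobin              # spread requests evenly over the four JVMs
    server openjdk   openjdk:8080   check
    server oraclejdk oraclejdk:8080 check
    server zulu      zulu:8080      check
    server graalvm   graalvm:8080   check
```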
In order to monitor results I used Micrometer to provide an endpoint from which Prometheus can read JVM metrics. I used Grafana to visualize the data using the following dashboard: https://grafana.com/dashboards/4701
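For reference, exposing the Prometheus endpoint in a Spring Boot 2 application requires the micrometer-registry-prometheus dependency plus an application.properties entry along these lines (a sketch; the exact list of endpoints exposed in my application may differ):

```properties
# expose the actuator endpoints used in this setup
management.endpoints.web.exposure.include=health,env,prometheus
```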
Since GraalVM is currently only available as a JDK 8 version, I used JDK 8 versions of the other JVMs as well. When a container is running, the JVM version can be checked by accessing the actuator endpoint: localhost:8080/actuator/env
or with, for example:
docker run --rm store/oracle/serverjre:8 java -version
I’ve used the following versions:
- GraalVM CE rc9 (8u192)
- OpenJDK 8u191
- Zulu 8u192
- Oracle JDK 8u181
Why the difference in versions? These were the versions available on hub.docker.com at the time of writing this blog.
You can download the code here from the complete folder. You can run the setup with:
sh ./buildjdkcontainers.sh
docker-compose -f docker-compose-jdks.yml up
Next you can access
- the HAProxy (which routes to the different JVMs) at localhost:8080
- Prometheus at localhost:9090
- Grafana at localhost:3000
You need to configure Grafana to access Prometheus as a data source.
Next you need to import the dashboard in Grafana.
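The data source step can also be automated with Grafana's provisioning mechanism; below is a minimal sketch, assuming the Prometheus container is reachable under the host name prometheus:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # assumed compose service name and default port
    isDefault: true
```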
Next you can run a load test against http://localhost:8080/hello (HTTP GET) and see the results in the Grafana dashboard. Prometheus itself can also feed information to Grafana, and so can HAProxy by using an exporter; I did not configure this in my setup.
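A Prometheus scrape configuration for a setup like this could look as follows; this is a sketch, the target host names are assumed compose service names, and the 5 second interval matches the polling interval used in this test:

```yaml
global:
  scrape_interval: 5s                     # poll the JVMs every 5 seconds
scrape_configs:
  - job_name: 'spring-boot-jvms'
    metrics_path: '/actuator/prometheus'  # Micrometer's Prometheus endpoint
    static_configs:
      - targets: ['openjdk:8080', 'oraclejdk:8080', 'zulu:8080', 'graalvm:8080']
```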
A difference between the different Docker images was the OS used within the image. The OS can be determined with:
docker run --rm store/oracle/serverjre:8 sh -c 'cat /etc/*-release'
- azul/zulu-openjdk:8 used Ubuntu 18.04
- oracle/graalvm-ce:1.0.0-rc9 used Oracle Linux Server 7.5
- openjdk:8 used Debian GNU/Linux 9
- store/oracle/serverjre:8 used Oracle Linux Server 7.5
I don’t think this had much effect on the JVMs running inside (with Alpine I would have expected an effect). At least Oracle JDK and GraalVM used the same OS.
Using the JVM micrometer dashboard, it was easy to distinguish specific areas of difference in order to investigate them further.
GraalVM had the highest overall CPU usage during the test; Oracle JDK the lowest.
Overall GraalVM had the worst response times and OpenJDK the best, followed closely by Oracle JDK and Zulu. On average the difference between OpenJDK and GraalVM was about 30%.
Interesting to see is that GraalVM loads far more classes than the other JDKs; OpenJDK loads the fewest. The difference between GraalVM and OpenJDK is about 25%. I have not yet determined whether this is a fixed overhead of additional classes for GraalVM or whether it scales with the number of classes used as a fixed percentage.
Of course these additional classes could cause delays during garbage collection (although this correlation is not necessarily causation). Longer GC pause times for GraalVM are exactly what we see below, though.
Below is a graph of the sum of the GC pause times. The longest pause times (the line on top) are GC pauses caused by allocation failures in GraalVM.
JVM memory usage is interesting to look at. As you can see in the above graph, the OpenJDK JVM uses the most memory. The garbage collection behavior of GraalVM and Zulu appears to be similar, but GraalVM has a higher base memory usage. Oracle JDK appears to do garbage collection less often. Looking at averages, the OpenJDK JVM uses the most memory while Zulu uses the least. Looking at a zoomed-out graph over a longer period, the behavior of Oracle JDK and OpenJDK seems erratic, spiking to relatively high values, while Zulu and GraalVM seem more stable.
Overview
I’ve conducted a load test using SoapUI with a reactive Spring Boot REST application running on 4 different JVMs behind a round-robin HAProxy load balancer. I used Prometheus to poll the JVM instances (which used Micrometer to produce data) every 5 seconds and Grafana to visualize the data. The results suggest GraalVM is not a suitable drop-in replacement JVM for, for example, OpenJDK, since it performs worse, uses more resources, loads more classes and spends more time in garbage collection.
- GraalVM loads more classes for the same application
- GraalVM causes the slowest response times for the application
- GraalVM uses the most CPU (while achieving the slowest response times)
- GraalVM spends the most time on garbage collection
- Zulu OpenJDK uses the least memory of the compared JVMs. Zulu OpenJDK and GraalVM are more stable in their memory usage compared to Oracle JDK and OpenJDK.
Of course, since GraalVM is relatively new, it could be that the metrics provided by Micrometer do not give a correct indication of actual throughput and resource usage. It could also be that my setup has flaws which cause this difference. I tried to rule out the latter, though, by looking at the metrics in different situations.
If you want to use the polyglot features of GraalVM, the other JVMs of course do not provide a suitable alternative. GraalVM also provides a native compilation option which I did not test (I performed the test on the same JAR for all JVMs). This feature can potentially greatly increase performance.
GraalVM allows code to be compiled to a native executable. I’ve not looked at the performance of these native executables, but potentially this could make GraalVM a lot more interesting. It would also be interesting to see how the Prometheus metrics behave in a native executable, since there is no real JVM anymore in that case.
The application used was simple: a reactive Spring Boot REST service. The behavior under load might differ with more complex applications or, for example, when using blocking calls in Spring Boot.
Tweaking the JVM parameters
I’ve not specifically tuned JVM performance; everything ran out of the box without specific flags. I’ve not looked at default parameter values or parameters specific to certain JVMs. Tuned parameters might give very different results.
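If you do want to experiment with flags, a compose v2 service could pass them through an environment variable, assuming the image's entrypoint appends JAVA_OPTS to the java command (both the image name and that entrypoint behavior are assumptions here, not part of my setup):

```yaml
version: "2"
services:
  openjdk-app:
    image: my/springboot-openjdk:8            # hypothetical image name
    environment:
      # example flags: fixed heap size and the G1 collector
      - JAVA_OPTS=-Xms512m -Xmx512m -XX:+UseG1GC
```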
GraalVM EE and Java 11 (or 12 or …)
It would be interesting to check out GraalVM EE, since it is based on Oracle JDK instead of OpenJDK; I’ve not found a Docker image of it yet. Comparing Java 11 with Java 8 would also be interesting. More to come!