Monitor Oracle SOA Suite service response times with Splunk


Service performance can be measured in various ways. In this blog article I will describe a method of measuring Oracle SOA service response times with Splunk, a popular monitoring tool. To monitor service response times, Splunk first needs a source for its data. In this example I use the WebLogic HTTP access log, which I extend with a time-taken field. Disclaimer: my experience with Splunk amounts to about 2 hours. This may also indicate what can quickly be achieved with Splunk with little prior knowledge.


Making service response times available

log_policy

At first I considered using the OWSM policy oracle/log_policy. This policy can be applied to an endpoint and logs request and response messages. However, without altering the policy, there is no way to correlate a request with its response message (see the image below). An ECID is logged, but the ECID can differ between a request and its response, and it can also be the same for different requests in one call chain. Several HTTP or SOAP headers could potentially be used for correlation, but they have to be present in both the request and the response. This can require some work, especially if you need to add them to the request, since all consumers would then have to send these headers.


Correlation of request and response messages is not possible OOTB with the log_policy
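To illustrate why a shared identifier in both messages is needed, here is a minimal Python sketch of request/response correlation. It assumes each logged message carries a hypothetical correlation "id" (for example the value of a custom header present in both request and response), a "kind" of request or response, and a timestamp "ts" in seconds. None of these field names come from log_policy itself; this is an illustration, not a feature of the policy.

```python
def correlate(messages):
    """Pair request and response log entries that share a correlation id.

    Assumption: each message is a dict with a hypothetical "id" (e.g. a custom
    header value present in both request and response), a "kind" of
    "request" or "response", and a "ts" timestamp in seconds.
    Returns {id: response_time_seconds}; unmatched entries are ignored.
    """
    open_requests = {}
    durations = {}
    for msg in sorted(messages, key=lambda m: m["ts"]):
        if msg["kind"] == "request":
            # A second request with the same id overwrites the first one.
            open_requests[msg["id"]] = msg["ts"]
        elif msg["kind"] == "response" and msg["id"] in open_requests:
            durations[msg["id"]] = msg["ts"] - open_requests.pop(msg["id"])
    return durations


messages = [
    {"id": "a", "kind": "request", "ts": 10.0},
    {"id": "b", "kind": "request", "ts": 11.0},
    {"id": "a", "kind": "response", "ts": 10.5},
    {"id": "c", "kind": "response", "ts": 12.0},
]
print(correlate(messages))  # {'a': 0.5}
```

If the id were an ECID shared by several requests in one call chain, a later request would overwrite an earlier one in open_requests and the response would be paired with the wrong request, which is the ambiguity described above.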

access.log

By default, the access.log file stores some data on each HTTP request, such as the timestamp, the HTTP status code, the URL and the size of the response. This can easily be extended with the time required to process a request. You can configure this in the WebLogic Console: go to your server, Logging, HTTP, and click the Advanced button. There, with the log format set to extended, you can add time-taken as a field to log in the Extended Logging Format Fields.

Configure the WebLogic HTTP log


Add the time-taken field


Please note that the access.log file is buffered. See here for how to change the buffering behavior. As a result, your data will not be available in Splunk immediately: a single request will not show up until the buffered data exceeds the buffer size and is written to access.log.
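With extended logging enabled, access.log follows the W3C extended log format: a #Fields: directive names the columns, and each data line holds the values in that order. The minimal Python sketch below parses such lines into one dict per request. The field list and the sample line are illustrative assumptions (your #Fields: line depends on what you configured), and the sketch assumes no value contains whitespace.

```python
def parse_extended_log(lines):
    """Parse W3C extended log format lines into one dict per request.

    Column names come from the most recent "#Fields:" directive.
    Assumes values contain no whitespace, which holds for fields such as
    date, time, cs-method, cs-uri, sc-status and time-taken.
    """
    fields = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives, e.g. "#Version: 1.0"
        else:
            values = line.split()
            # Skip data lines that do not match the current header.
            if fields and len(values) == len(fields):
                records.append(dict(zip(fields, values)))
    return records


# Hypothetical sample; the actual columns depend on your configuration.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri sc-status time-taken",
    "2015-12-19 08:30:01 POST /soa-infra/services/default/HelloWorld/"
    "helloworldprocess_client_ep 200 0.153",
]
for record in parse_extended_log(sample):
    print(record["cs-uri"], float(record["time-taken"]))
```

The W3C extended log format specifies time-taken in seconds, so values are converted with float() when needed rather than at parse time.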

Processing the response times in Splunk

To make service response times available in Splunk, you first have to tell it where to obtain the data. Splunk supports many different data sources. The TCP/UDP and HTTP inputs open a port on the local machine, which Splunk then listens on, so that port has to be available. Monitoring the WebLogic port is therefore not an option, since WebLogic has already claimed it.

Many sources of data can be used in Splunk

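Before configuring a TCP or HTTP input, you could check whether the port you have in mind is still free. A minimal Python sketch, assuming a TCP port on the local machine; port_is_free is a hypothetical helper, not part of Splunk, and it does not cover UDP.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port.

    A False result means the bind failed, typically because another
    process (for example WebLogic) already holds the port, so a Splunk
    TCP or HTTP input could not listen on it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


# 7001 is WebLogic's default listen port; adjust to your own setup.
print(port_is_free(7001))
```

On a machine where WebLogic listens on its default port, this would typically print False.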

In this case I used the Files & Directories input and pointed it at the location of the access.log file, in my case <DOMAINDIR>\servers\<SERVERDIR>\logs\access.log.

Browsing a log file

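Conceptually, a file input such as Files & Directories remembers how far it has read and only picks up newly appended data. The Python sketch below illustrates that idea with a hypothetical read_new_lines helper; it is not how Splunk is implemented, and it assumes the log is UTF-8 text.

```python
def read_new_lines(path, offset):
    """Return complete lines appended to path since byte offset, plus the new offset.

    A trailing partial line (no newline yet) is left for the next call, so
    only fully written lines are returned.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset  # nothing complete has been written yet
    complete = data[: end + 1]
    # splitlines() also handles Windows "\r\n" line endings.
    return complete.decode("utf-8").splitlines(), offset + end + 1
```

Calling it repeatedly with the returned offset yields only the lines written since the previous call. Entries still sitting in WebLogic's buffer are simply not in the file yet, which is why they do not show up in Splunk.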

After you have selected your source, you can query the file. In order to allow querying on specific fields, it is useful to extract them from the log line. Splunk makes it easy to create regular expressions for this by simply selecting the fields.
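Splunk's field extractor generates the regular expression for you. To show what such an extraction amounts to, here is a Python sketch with named groups url and time_taken, the field names used in the queries that follow. The column order and the pattern are assumptions; the regular expression Splunk generates will look different.

```python
import re

# Assumed column order: date time cs-method cs-uri sc-status time-taken.
# Adjust the pattern if your #Fields: directive lists other columns.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+"
    r"(?P<status>\d{3})\s+"
    r"(?P<time_taken>\d+(?:\.\d+)?)$"
)


def extract_fields(line):
    """Return the named fields of one access.log data line, or None.

    Directive lines such as "#Fields: ..." do not match and yield None.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["time_taken"] = float(fields["time_taken"])
    return fields


# Hypothetical sample line.
line = ("2015-12-19 08:30:01 POST "
        "/soa-infra/services/default/HelloWorld/helloworldprocess_client_ep "
        "200 1.250")
print(extract_fields(line))
```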

After you have done this, you can use a query like:

| eval series=url | search url="/soa-infra/*_ep" | xyseries _time series time_taken | makecontinuous _time

To create a useful visualization, I first create a series based on the url. Next, I keep only the HTTP requests belonging to the services of interest (the SOA endpoints ending in _ep). I then use xyseries to build a table with _time on the x-axis, one column per series and time_taken as the values, and use makecontinuous to make the time axis continuous. The result is visible below:

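To make the query's steps concrete, the Python sketch below reproduces the idea on in-memory events: filter on the url wildcard, pivot time_taken into one column per url, and fill gaps in the time axis the way makecontinuous adds empty buckets. It assumes events are dicts with _time as integer epoch seconds, url and time_taken; the sample events are hypothetical, and this is an illustration rather than Splunk's implementation.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def xyseries(events, url_pattern="/soa-infra/*_ep"):
    """Pivot events into {_time: {url: time_taken}} for urls matching the pattern.

    Roughly mirrors: search url="/soa-infra/*_ep" | xyseries _time series time_taken
    If two events share the same _time and url, the last one wins in this sketch.
    """
    table = defaultdict(dict)
    for event in events:
        if fnmatchcase(event["url"], url_pattern):
            table[event["_time"]][event["url"]] = event["time_taken"]
    return dict(sorted(table.items()))


def make_continuous(table, span=1):
    """Add empty rows so every _time from first to last (step span) is present.

    Assumes _time values are integer epoch seconds.
    """
    if not table:
        return {}
    start, end = min(table), max(table)
    return {t: table.get(t, {}) for t in range(start, end + 1, span)}


HELLO = "/soa-infra/services/default/HelloWorld/helloworldprocess_client_ep"
OTHER = "/soa-infra/services/default/Other/other_ep"  # hypothetical endpoint
events = [
    {"_time": 100, "url": HELLO, "time_taken": 0.2},
    {"_time": 101, "url": "/console/login", "time_taken": 0.1},
    {"_time": 102, "url": OTHER, "time_taken": 1.5},
]
for t, row in make_continuous(xyseries(events)).items():
    print(t, row)
```

The /console/login event is dropped by the wildcard filter, and second 101 reappears as an empty row, which is what keeps the chart's time axis evenly spaced.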

Based on these results you can, of course, create alerts to check against SLAs. For example, a query to trigger an alert could be:

| search time_taken > 1 | search url="/soa-infra/services/default/HelloWorld/helloworldprocess_client_ep"
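The alert condition boils down to a simple filter. The Python sketch below expresses it on in-memory events, with a hypothetical min_breaches parameter standing in for an alert trigger condition such as "number of results is greater than 0". The event structure and sample values are assumptions, and time_taken is assumed to be in seconds.

```python
def sla_breaches(events, url, threshold_seconds=1.0):
    """Return events for url whose time_taken exceeds the SLA threshold.

    Mirrors: search time_taken > 1 | search url="<url>"
    """
    return [e for e in events
            if e["url"] == url and e["time_taken"] > threshold_seconds]


def should_alert(events, url, threshold_seconds=1.0, min_breaches=1):
    """Trigger when at least min_breaches requests exceeded the threshold."""
    return len(sla_breaches(events, url, threshold_seconds)) >= min_breaches


HELLO = "/soa-infra/services/default/HelloWorld/helloworldprocess_client_ep"
events = [  # hypothetical sample data
    {"url": HELLO, "time_taken": 0.4},
    {"url": HELLO, "time_taken": 1.7},
    {"url": "/soa-infra/services/default/Other/other_ep", "time_taken": 3.0},
]
print(should_alert(events, HELLO))  # True
```

Only the 1.7 second HelloWorld request counts; the slow call to the other endpoint is ignored because the alert is scoped to a single url.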


A query that triggers alerts can run continuously (in real time) or periodically on a schedule.

Finally

The method described here does not allow you to measure the response times of services that use protocols other than HTTP(S). Direct invocations, locally optimized calls and SOA-Direct requests, for example, are not logged, and neither are JMS request/reply interactions. Alternatives (described below) such as IWS and the DMS Spy servlet do allow you to monitor these services. Calls within the same host do appear in the access.log, however, if you tell a composite not to use local invocation on a binding. This can be done by setting two properties in the reference binding in the composite.xml file. These properties are also useful when running on a cluster, to allow load balancing of requests.


Alternatives

There are, of course, many alternatives for monitoring performance and alerting. For (performance) monitoring of service response times you can look at the DMS Spy servlet or use Integration Workload Statistics (IWS). Other Oracle options are RUEI and BTM. For performance testing there are many tools available, such as SmartBear SoapUI or Apache JMeter. For alerting, there are also several options. You can use the Service Bus default dashboards or, for example, configure alerts in Enterprise Manager. Which solution fits best depends on your requirements. A more recent solution provided by Oracle is the Application Performance Monitoring Cloud Service, which is definitely worth taking a look at.

About Author

Maarten is a software architect and Oracle ACE. Over the past years he has worked for numerous customers in the Netherlands in developer, analyst and architect roles on topics like software delivery, performance, security and other integration related challenges. Maarten is passionate about his job and likes to share his knowledge through publications, frequent blogging and presentations.
