IT operations is without a doubt the most important aspect of IT. Operations is where all analysis, planning, design and implementation should come to fruition. That is where the real action is, where the real business value is achieved. The DevOps movement strives to bring the design time and the runtime of IT closer together, making it a joint responsibility. Achieving truly agile IT seems realistic only when DevOps principles are applied. These include a high degree of automation for many aspects of IT development and operations – from build, test and deploy to provision, monitor, configure, patch and scale. Additionally, operational responsibility includes trend analysis and capacity planning: given historical performance, what proactive measures are required to achieve continued or improved performance?
The Oracle Management Cloud is an umbrella over a series of services that all support IT operations through monitoring and analysis at various levels in the stack and on various aspects – performance, infrastructure health, compliance and security – across a wide spectrum of technologies, spanning on-premises and cloud-based systems.
Through local agents and public REST APIs, metrics and log files are pushed to Oracle Management Cloud (OMC) from a potentially large number of systems – including applications, virtual machines, operating systems and hardware. All data is centrally stored and analyzed. OMC is a great example of the ‘eat your own dogfood’ doctrine at Oracle. It leverages machine learning and predictive analytics, text mining, visualization and data discovery facilities from various Oracle PaaS offerings to offer insights, alerts and recommendations based on the runtime metrics that describe the IT operational state of affairs.
In addition to Application Performance Monitoring (see below for some details), IT Analytics and Log Analytics, which were launched earlier, Oracle announced several new OMC offerings:
- Infrastructure Monitoring – monitors the status and health of IT infrastructure – on-premises or in the cloud – from a single platform. Proactive monitoring across tiers enables administrators to be alerted to issues and to troubleshoot and resolve them before they impact end users. It is similar to Application Performance Monitoring, but focuses on the infrastructure components and their performance and behavior.
- Compliance – ongoing configuration assessments of cloud and on-premises deployments against industry standards (for example STIG), best practices and custom standards. It monitors whether system patch levels conform to the guidelines and SLAs and whether configuration drift occurs across systems. When deviations are found, remediation tasks are created; compliance results are retained and scores are reported over time.
In short: rules, policies and SLAs are defined and linked to the metrics that are collected and from which compliance with these rules is assessed. Then the channels for collecting the required data are set up and the gathering of metrics can start in earnest. From the collected data about actual performance, behavior and configuration, an assessment is made against the rules and policies. Scores are calculated and for deviations there will be alerts, recommendations and even automated actions.
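To make that flow a little more concrete, here is a minimal sketch in Python of how rules could be linked to collected metrics and turned into a score plus a list of deviations. It is an illustration only, not how OMC implements compliance: the `Rule` class, the `assess` function and all metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    metric: str                      # key of the collected metric this rule is linked to
    check: Callable[[object], bool]  # returns True when the observed value complies


def assess(rules, observed):
    """Return (score_percent, deviations) for one system's collected metrics."""
    deviations = []
    for rule in rules:
        value = observed.get(rule.metric)
        # A missing metric counts as a deviation: compliance cannot be shown.
        if value is None or not rule.check(value):
            deviations.append(rule.name)
    passed = len(rules) - len(deviations)
    score = 100.0 * passed / len(rules) if rules else 100.0
    return score, deviations


# Hypothetical rules and metrics, invented for this example.
rules = [
    Rule("patch level current", "days_since_last_patch", lambda d: d <= 30),
    Rule("ssh root login disabled", "ssh_permit_root_login", lambda v: v == "no"),
    Rule("password min length", "password_min_length", lambda n: n >= 12),
]
observed = {
    "days_since_last_patch": 45,
    "ssh_permit_root_login": "no",
    "password_min_length": 14,
}

score, deviations = assess(rules, observed)
print(f"score={score:.1f}% deviations={deviations}")
```

In this toy data set one of the three rules fails, so the deviation list names the patch-level rule; a real service would turn each deviation into an alert or a remediation task.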
The following screenshot shows some of the results from compliance evaluation being surfaced:
- Security Monitoring & Analytics (SMA) – protects modern enterprises by enabling early detection of threats across on-premises and cloud assets.
By analyzing activities and their context, SMA can identify questionable situations and outright security breaches.
SMA provides rapid forensics with cyber attack chain discovery and visualization, proactive network, host, or cloud service layer remediation, across cloud environments and on premises assets, in a world of shadow IT, unmanaged devices, and a fading network perimeter. SMA offers SIEM (Security Information and Event Management) and UEBA (User and Entity Behavior Analytics) based on metrics collected and analyzed centrally from a wide range of systems across clouds and on premises. Here is an ‘attack chain’ visualized by SMA:
Some questions SMA will proactively answer: “Has this user (across identities) taken other anomalous actions?”, “What vulnerabilities is a server exposed to / not patched for?” and “What category of sites pose the most risk given user browser behavior?”. Note: read my previous article “Your security will be breached” if you need to be convinced of the value of Security Monitoring.
- Orchestration – a very new addition in OMC. It is positioned to execute tasks across the infrastructure, platforms and applications monitored by OMC. It is used to run scheduled jobs or impromptu proactive or corrective actions that arise from monitoring and analysis of operational metrics in the other OMC services.
Orchestration can call REST, scripts, or 3rd party automation frameworks, on both on-premises (through an agent) and cloud infrastructure, and will keep track of job progress and report on it. Orchestration is seen by Oracle as “Cloud CRON” and also as a central engine for provisioning environments and deployment jobs. How orchestration hangs together with services such as Developer Cloud Service, Oracle Functions, Enterprise Scheduling Services in JCS and SOA CS, Scheduled integrations in ICS and whether it is positioned by Oracle as the one and only central scheduling engine on the Oracle PaaS Cloud is not yet clear.
Application Performance Monitoring
Functional monitoring, focused on business value, is available on the Oracle Public Cloud through CIECS (Customer Insight and Engagement) and with Integration Analytics (Real-Time Business Integration Insight and Business Activity Monitoring). Non-functional monitoring – for aspects like performance, availability and technical errors – is offered through Application Performance Monitoring (APM) under the Oracle Management Cloud umbrella.
APM collects data on the actual experience of the end user – in the browser and on mobile devices – and also metrics for all user-request-related activities on the server. Through lightweight agents on the JVM and in web applications, data is collected and fed to the APM cloud – from on-premises applications or cloud-based applications in either Oracle Public Cloud or a 3rd party cloud. APM has REST APIs that allow metrics to be forwarded from any application anywhere, as long as it can speak REST.
APM was built from the ground up as the successor to products such as BTM (Business Transaction Management) and RUEI (Real User Experience Insight). APM leverages rich visualizations and predictive analytics (aka machine learning) to present findings in a meaningful way and guide the user to root causes and recommendations. Metrics are collected per HTTP request and correlated throughout the stack – client, services, microservices and database – to give the end-to-end picture. This next screenshot shows how a request is visualized – across services, database calls and other actions:
These metrics can be aggregated and analyzed along various dimensions such as user location, time of day [or week or month], client browser or device, server host. Aggregated metrics can be used for trend analysis and capacity planning.
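As a rough illustration of what aggregating along a dimension means, the Python sketch below groups per-request measurements by an arbitrary field and computes a few summary figures. The record layout (`location`, `browser`, `hour`, `response_ms`) and the `aggregate` function are assumptions made for this example and do not reflect the APM data model.

```python
from collections import defaultdict
from statistics import mean


def aggregate(requests, dimension):
    """Group response times by the given dimension and summarize each group."""
    groups = defaultdict(list)
    for request in requests:
        groups[request[dimension]].append(request["response_ms"])
    return {
        key: {"count": len(values), "avg_ms": mean(values), "max_ms": max(values)}
        for key, values in groups.items()
    }


# Hypothetical per-request measurements.
requests = [
    {"location": "NL", "browser": "Chrome", "hour": 9, "response_ms": 120},
    {"location": "NL", "browser": "Firefox", "hour": 9, "response_ms": 180},
    {"location": "US", "browser": "Chrome", "hour": 14, "response_ms": 300},
    {"location": "US", "browser": "Chrome", "hour": 14, "response_ms": 340},
]

by_location = aggregate(requests, "location")
by_browser = aggregate(requests, "browser")
print(by_location)
print(by_browser)
```

The same function serves every dimension, which is the point: once requests carry their context, slicing by location, time of day or browser is just a different grouping key.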
APM compares actual measured performance to predefined KPIs and thresholds, as well as current performance to past performance to spot changes and exceptions. It will publish alerts, notify operators and recommend actions. Operators can quickly drill down on the visual reports, finding details on a failed transaction – with all steps that make up that transaction – or a problematic application component – with all its actions over time and their metrics.
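The two comparisons described here, against a fixed KPI threshold and against past performance, can be sketched in a few lines of Python. This is a deliberately simple stand-in: the `evaluate` function, the alert labels and the three-standard-deviations rule are assumptions for illustration, and APM's actual analytics are presumably far more sophisticated.

```python
from statistics import mean, stdev


def evaluate(current_ms, history_ms, kpi_threshold_ms, sigmas=3.0):
    """Return the list of alerts raised for one measurement."""
    alerts = []
    # Check 1: fixed, predefined KPI threshold.
    if current_ms > kpi_threshold_ms:
        alerts.append("kpi_breach")
    # Check 2: deviation from the historical baseline (needs at least 2 samples).
    if len(history_ms) >= 2:
        baseline = mean(history_ms)
        spread = stdev(history_ms)
        if spread > 0 and abs(current_ms - baseline) > sigmas * spread:
            alerts.append("deviates_from_baseline")
    return alerts


# Hypothetical response times (in ms) from earlier measurement periods.
history = [200, 210, 190, 205, 195]

print(evaluate(205, history, kpi_threshold_ms=500))
print(evaluate(260, history, kpi_threshold_ms=500))
print(evaluate(600, history, kpi_threshold_ms=500))
```

Note that the second call stays well below the KPI threshold yet is still flagged, because it is far outside what the history suggests is normal. That is exactly the kind of change a threshold-only check would miss.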
APM also provides synthetic tests: periodically running automated tests from specified locations to verify the health of application functions – both continued availability and required performance – even if currently no users are accessing those functions from those locations.
Monitoring End to End flows across Microservices
Even today it is sometimes very difficult to trace requests – either to APIs or services or to web applications – all the way across the application, platform and infrastructure landscape. If a request is slow or fails altogether, it can be quite difficult to find out where the problem really is. At least in most of today’s environments, the dependencies between components are fairly clear and the application components typically share the same platform – which makes tracing, logging and problem analysis a lot simpler.
As we are moving to a world of microservices, which are by definition very independent – also at platform and potentially even infrastructure level – end-to-end tracing of failing requests is getting even harder. And this is where APM can be of great significance. If all microservices – no matter where they are located – send their metrics to APM and we have a common way of identifying a request or flow instance across all microservices, then APM (and Log Analytics) will allow us to perform that cross-microservice tracing, monitoring and problem analysis. The microservices are as independent as can be – but when push comes to shove we can still see them for the logical collective they are.
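The idea of a common request identifier can be sketched as follows: every microservice reports its events with the same correlation id, and a central store groups them back into one end-to-end trace. The event fields, the function names and the "deepest failure" heuristic below are all assumptions made for this Python example, not the APM or Log Analytics data model.

```python
from collections import defaultdict


def build_traces(events):
    """Group events by correlation id and order each trace by start time."""
    traces = defaultdict(list)
    for event in events:
        traces[event["correlation_id"]].append(event)
    for steps in traces.values():
        steps.sort(key=lambda e: e["start_ms"])
    return dict(traces)


def trace_path(trace):
    """Services touched by the request, in the order they were entered."""
    return [step["service"] for step in trace]


def deepest_failure(trace):
    """Heuristic: the failing step that started last is the likeliest origin."""
    failing = [step for step in trace if step["status"] != "ok"]
    return failing[-1]["service"] if failing else None


# Hypothetical events, arriving out of order from independent microservices.
events = [
    {"correlation_id": "req-1", "service": "inventory", "start_ms": 40, "status": "error"},
    {"correlation_id": "req-1", "service": "web", "start_ms": 0, "status": "error"},
    {"correlation_id": "req-2", "service": "web", "start_ms": 5, "status": "ok"},
    {"correlation_id": "req-1", "service": "orders", "start_ms": 10, "status": "error"},
]

traces = build_traces(events)
print(trace_path(traces["req-1"]), deepest_failure(traces["req-1"]))
print(trace_path(traces["req-2"]), deepest_failure(traces["req-2"]))
```

The services never need to know about each other for this to work; the only shared contract is that each one passes the correlation id along and includes it in whatever it reports.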
Download the AMIS OOW16 Highlights for an overview of OOW16.