Reflections after Oracle OpenWorld 2015: IT Operations & Management

This article gives an overview of some of the most eye-catching stories from Oracle OpenWorld 2015 around IT Ops and management. It discusses Enterprise Manager, the ‘single pane of glass’ across on premises and cloud, and introduces the new Oracle Management Cloud Service.

IT Operations (IT Ops for short) obviously plays a crucial role in providing any kind of IT service to any organization. IT Ops is about provisioning the environment to start with, as well as keeping that environment up and running according to the agreed-upon specifications (usually laid down in an SLA or Service Level Agreement).

IT Ops deals with facilities at various levels of the stack – with different specialists involved at different levels. IT Ops could be said to start at a basic, physical level with dry, level floors, power supply and temperature control, and to move up the stack through infrastructure (servers, storage and network) and platform (operating system, LDAP, database, application server, content management, search engine, memory grid). The functional aspects of applications are not IT Ops’ cup of tea, but ensuring the applications are deployed and configured and keep on running usually is.

Over recent years, a lot has changed around IT Ops. Virtualization has been and still is a major change driver – first of servers and increasingly of networks (SDN or software defined networking) as well. Consolidation has been an important theme too, first at the infrastructure level (engineered systems) and the operating system level (hypervisors and more recently Linux containers) and increasingly within the platform as well: in the database (pluggable database architecture) and the application server (WebLogic partitions). The cry for agility has had a huge impact on IT Ops: the ability to quickly and flexibly ramp up systems, provision environments and adjust configurations is desirable if not required. Scalability in compute capacity, memory and storage – up and down – is expected. Rapid, repeatable deployment of custom developed artifacts (web applications, integration flows, automated business processes) on various platform components is the current holy grail, along with the ideal of self-service provisioning of new environments: in a self-service portal that presents a catalog of services, users click together an order of the desired services in the appropriate shapes, which are then provisioned by the IT Ops team.

Automation of as many aspects of IT Ops as possible is necessary to meet the demands – to achieve the required response time as well as the repeatability, and also to deal with the limited availability of skilled staff [and our unlimited capacity for human error] and the associated cost of human labor. Many tools for automating the provisioning of environments (platform components such as virtual machines with operating system, database, application server, enterprise service bus, BPM engine etc) have blossomed. Likewise, facilities for automated software delivery on top of these environments are rapidly evolving.

IT Ops takes care of initially provisioning environments as well as their subsequent patching. In addition, IT Ops has a responsibility for keeping these environments running at the required levels of non-functional characteristics such as performance and availability. To that end, real-time monitoring of all environments – at various levels in the stack – is essential, including streaming analysis of the information gathered during monitoring, both to find trends in system behavior that may lead to violation of SLA-based KPIs and to learn about faults, violations and other exceptional situations that have already arisen.

Monitoring is required at generic infrastructure level for things like remaining free storage, memory capacity, network traffic and CPU load. Rising through the stack, monitoring will also encompass platform characteristics specific to databases (query execution, password validity, tablespace usage) and different middleware components (for example web service response time, unavailable end points, failed jobs). Ultimately, IT Ops will monitor almost through the eyes of business users or owners of business applications, for end user response times, end-to-end process duration, impaired functionality (‘simply does not work’, ‘nothing happens when I press this button’, ‘I get strange error messages’) and so on. Monitoring at these vastly different levels requires versatile tooling and a wide array of skills for interpretation.

A number of predefined situations detected through monitoring should be dealt with in an automated fashion. Examples are dynamic scaling of clusters by starting or stopping nodes, throttling incoming requests and failing over to a healthy node. Other findings will result in notifications that alert human staff to the need for further inspection and subsequent action.
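To make that split between automated handling and human follow-up a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and not taken from any Oracle product: the finding kinds, the action names and the `handle` function are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str    # hypothetical finding type, e.g. "high_cpu" or "node_down"
    target: str  # name of the cluster or node the finding applies to


# Predefined situations that may be handled without human intervention.
# Both the keys and the action names are assumptions made for this sketch.
AUTOMATED_ACTIONS = {
    "high_cpu": "start_node",              # scale the cluster up
    "low_cpu": "stop_node",                # scale the cluster down
    "request_flood": "throttle_requests",  # throttle incoming requests
    "node_down": "fail_over",              # fail over to a healthy node
}


def handle(finding: Finding) -> str:
    """Return the automated action for a known finding, else a notification."""
    action = AUTOMATED_ACTIONS.get(finding.kind)
    if action is None:
        # Anything not predefined goes to human staff for inspection.
        return f"notify_operator:{finding.target}:{finding.kind}"
    return f"{action}:{finding.target}"


if __name__ == "__main__":
    print(handle(Finding("high_cpu", "wls-cluster-1")))
    print(handle(Finding("strange_error_messages", "hr-app")))
```

In a real setup the returned strings would of course be calls into provisioning or load balancing tooling; the point here is only the lookup-with-fallback structure.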

In addition to real time processing, IT operations typically have a need for periodic or incident driven analysis of system behavior – again, at various levels of stack components. Such analysis will help uncover longer term trends in system activity and performance – and the ensuing planning that these call for. Root cause analysis of incidents to find out and resolve the underlying cause of a run time incident is periodically required; such analysis involves mining and analyzing log files and other records of system and application behavior.

The adoption of the public cloud is the next big factor influencing IT Ops. First, operations will stretch across a hybrid environment that is composed of private entities – which we call on premises but that may actually be in a 3rd party data center where some level of operations support is provided – and public cloud services – perhaps from multiple cloud vendors. Provisioning environments and platform components that may stretch across this landscape, managing them and monitoring them, can obviously be quite a challenge. Additionally, depending on the [level of] services consumed from the cloud, the 3rd party cloud vendor may take on a number of operations responsibilities, typically at the lower end of the stack. This means that the IT Ops department has to carefully design procedures that make clear where its own responsibilities start and end and how it interacts with the 3rd parties and their tooling and APIs as well as their reporting and communication infrastructure. Usually, an enterprise’s IT Operations department retains the full responsibility for all operations aspects towards the organization, even though some or much of the execution has been delegated to a cloud vendor.

IT Ops also has to be able to move applications around across the hybrid landscape – lifting and shifting workloads from on premises systems to cloud platforms and vice versa. It has to monitor across that hybrid world and collect logging from at least all environments and components it has a direct responsibility for – and do so in part in real time.

The cloud can benefit the IT Ops department as well. Web scale cloud vendors can offer certain operations with service levels and price points well beyond anything the department can ever aspire to provide – due to economies of scale, level of experience and automation etc. Additionally, cloud based facilities come into existence that help with monitoring run time analytics produced across the stack (and across the landscape) and with analyzing trends and root causes, for example with big data approaches. These facilities are quite rich in functionality and are easy to hook into and to make use of – and would be quite challenging for most organizations to implement themselves, at competing service levels and cost of ownership.

Oracle has several major stories when it comes to IT Operations:

· Enterprise Manager 12c – and now for the first time 13c as well – which Oracle has dubbed the single pane of glass across cloud and on premises: the unified window through which all monitoring and administration can be performed across the entire stack, now with support for hardware as well.

· Management Cloud Service – a brand new cloud service that provides real time Application Performance Monitoring and analytics based on logging and other metrics

· Dynamic scalability and portability at platform level (‘lift & shift of workloads’) with container support (Docker and Application Container Cloud Service), pluggable databases (PDBs) and WebLogic partitions

Note: Oracle, as an upcoming cloud vendor, has a major challenge itself when it comes to IT Operations. In just a few years’ time, Oracle has to scale from IT operations for its own 120k employees to IT Ops for tens of millions of users. The facilities it requires for dealing with this challenge are similar to or beyond what most enterprises need. The strategic importance of these facilities to Oracle is reassuring to any organization contemplating investing in them.

Enterprise Manager 13c – The Single Pane of Glass

Enterprise Manager runs on premises and connects to multiple on premises (and private cloud) systems as well as to the Oracle Public Cloud. It is the Single Pane of Glass across all systems.

The new release of Enterprise Manager – already demonstrated extensively during Oracle OpenWorld 2015 – is release 13c, the first product from Oracle with the number 13 in its release label.

The new release of Oracle Enterprise Manager introduces real-time monitoring, a scalable job system that can integrate with the industry’s leading automation frameworks like Chef, and a comprehensive reporting infrastructure. It offers command line interfaces and REST APIs that help further automate operations.

EM combines infrastructure management (for virtualization, servers, storage, and network) with the existing software management, making it a provider of comprehensive app-to-disk systems management. It provides support for monitoring and management of Oracle’s SPARC and x86 physical and virtualized environments. It also unleashes a wide range of capabilities for managing Oracle engineered systems like Oracle SuperCluster (X, T, and M series), Oracle Virtual Machines and Oracle Exadata, all in the context of an application.

In release 13c, EM will include an upgraded Java Workload Explorer for in-depth diagnostics and problem resolution around workloads on the JVM.

This includes dynamic instrumentation, improved memory diagnostics and an improved UI. The WebLogic Administration Console will be absorbed into Enterprise Manager Cloud Control (aka EMC2), which will offer centralized WLS administration of multiple domains – on premises and in the public cloud. This includes scheduling and tracking process control operations and WLST script execution through predefined jobs, integrated credential management and a complete System MBean explorer.

The next figure shows a screenshot from the 13c release – best practices are included in EM to help administrators perform actions according to proven ways of working.

image

Release 13c also introduces the ‘gold agent’ for “cloud scale deployment of agents”.

This allows you to have a gold image of an OEM Agent and then roll out that gold image to all agents. You can then perform drift analysis and see in a single dashboard how many agents are out of step with your gold image. Any patches and customizations should be included in the gold configuration. In the event that a Management Agent is lost, it should be reinstalled from the reference gold image.
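The idea behind drift analysis is simple enough to sketch in a few lines of Python. To be clear, this is not how Enterprise Manager implements it; the configuration keys, host names and patch labels below are hypothetical and only serve to illustrate comparing each agent against one reference configuration.

```python
def drift(gold: dict, agent: dict) -> dict:
    """Return {setting: (gold_value, agent_value)} for every differing setting."""
    keys = sorted(set(gold) | set(agent))
    return {
        key: (gold.get(key), agent.get(key))
        for key in keys
        if gold.get(key) != agent.get(key)
    }


def drift_report(gold: dict, agents: dict):
    """Compare all agents to the gold configuration.

    Returns the sorted names of agents that are out of step, plus the
    per-agent differences (an empty dict means 'in line with gold').
    """
    report = {name: drift(gold, config) for name, config in agents.items()}
    out_of_step = sorted(name for name, diff in report.items() if diff)
    return out_of_step, report


if __name__ == "__main__":
    # Illustrative data only; keys and patch labels are made up.
    gold = {"version": "13.1.0.0.0", "patches": ("patch-A", "patch-B")}
    agents = {
        "host-a": {"version": "13.1.0.0.0", "patches": ("patch-A", "patch-B")},
        "host-b": {"version": "12.1.0.5.0", "patches": ("patch-A",)},
    }
    out_of_step, report = drift_report(gold, agents)
    print(out_of_step)
    print(report["host-b"])
```

The dashboard number mentioned above would then simply be the length of the `out_of_step` list.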

EM 13c has 24 x 365 monitoring for Enterprise Manager targets (and in leap years even more I presume), regardless of planned or unplanned downtime. When targets are down – for example because of planned maintenance – monitoring continues but no alerts are sent (brownouts vs blackouts). Event compression and aggregation reduce the ‘noise’ without loss of meaningful information.
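As a rough illustration of these two ideas, suppressing alerts during planned maintenance while the events themselves are still collected, and compressing repeated events into a single alert, consider the following Python sketch. The `Event` type, the window format and the function name are my own assumptions; EM’s actual rules are considerably richer.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    target: str
    message: str
    timestamp: int  # seconds on some shared clock


def alerts_to_send(events, maintenance_windows):
    """Turn collected events into a compressed list of alerts.

    events: every event that monitoring collected (nothing is dropped upstream).
    maintenance_windows: {target: [(start, end), ...]} with end exclusive.
    Events inside a window are kept in 'events' but raise no alert; the
    remaining events are aggregated per (target, message) with a count.
    """
    def in_maintenance(event):
        windows = maintenance_windows.get(event.target, [])
        return any(start <= event.timestamp < end for start, end in windows)

    counts = Counter(
        (event.target, event.message)
        for event in events
        if not in_maintenance(event)
    )
    return [
        f"{target}: {message} (x{count})"
        for (target, message), count in sorted(counts.items())
    ]


if __name__ == "__main__":
    collected = [
        Event("db01", "tablespace almost full", 10),
        Event("db01", "tablespace almost full", 20),
        Event("wls01", "managed server down", 15),
    ]
    print(alerts_to_send(collected, {"wls01": [(0, 100)]}))
```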

EM has out of the box support for a substantial number of core Oracle products. Additionally, there is a vast collection of over 150 extensions that connect EM to different products – from MySQL and MongoDB to IBM DB2 and MS SQL Server. You will find these extensions on the EM 12c Extensibility Exchange, along with resources for creating your own extensions. One example is MySQL Plugin version 3, which was released at OOW 2015; it follows, according to the MySQL team, “a conscious effort to improve the monitoring capabilities for MySQL via OEM”. With this new version of the plugin, around 500 potential metric items are collected, about 100 of which are enabled by default and 30 of which have some kind of ‘threshold’.

The new Enterprise Manager even has a graphical representation of the hardware – which is used to indicate the health of components and provide alerts in context when problems have been diagnosed.

Note: in addition to all new features and functionality, the look and feel of EM 13c is also much enhanced. Oracle JET components have been used to create an appealing and smooth user experience.

Oracle Management Cloud Service – Oracle’s next-generation monitoring and analytics solution

The Management Cloud Service is a complete suite of integrated IT operations management solutions, born in the cloud and built for the cloud. This Cloud Service gathers real time data from applications and systems on premises, in the Oracle Public Cloud or on any 3rd party cloud. This service is read-only: it provides insight, some of it in real time. It does not actually perform actions, apart from raising alerts.

All data is collected in large volumes at line speed in a unified data platform that stores all types of machine data, and automatically correlates the data – similar to what for example is done in many organizations using Splunk. Local agents are installed with the systems that are monitored; these agents send data to the Management CS, for monitoring and analysis in near real time – across technical and business events. APIs are provided to design custom stakeholder dashboards for DevOps, LoB Executive

Oracle Management Cloud provides three services: application performance monitoring, which helps to monitor applications and rapidly diagnose issues by integrating and using log data; log analytics, which uses topology-aware search and exploration to troubleshoot problems across applications and infrastructure; and IT analytics, which helps optimize capacity and maximize performance for database and application infrastructure.

The Log Analytics cloud service can retrieve log files from many different, configurable locations – across Oracle products and 3rd party technology. When Enterprise Manager is used, there is automated configuration of log locations and log types for added EM Targets and out-of-the-box configuration, collection, rules, and saved searches on known target types. Topology information in Enterprise Manager is used for application aware log search. Log searches are scoped to the components running a specific application and Topology Elasticity is automatically factored in.
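What ‘application aware’ log search boils down to can be sketched as follows. This is only my own illustration in Python of the principle, with a hypothetical topology map and made-up log lines; it says nothing about how the Log Analytics service is actually built.

```python
def search_logs(application, pattern, topology, logs):
    """Search only the logs of components that run the given application.

    topology: {application: [component, ...]}, looked up at search time, so
              components added or removed since the last search are picked
              up automatically (a crude stand-in for topology elasticity).
    logs:     {component: [log line, ...]}
    Returns a list of (component, line) tuples in topology order.
    """
    hits = []
    for component in topology.get(application, []):
        for line in logs.get(component, []):
            if pattern in line:
                hits.append((component, line))
    return hits


if __name__ == "__main__":
    # Hypothetical topology and log content, for illustration only.
    topology = {"webshop": ["wls-1", "db-1"], "hr": ["wls-2"]}
    logs = {
        "wls-1": ["ERROR timeout calling db-1", "INFO server started"],
        "wls-2": ["ERROR out of memory"],
        "db-1": ["ERROR ORA-01555: snapshot too old"],
    }
    for component, line in search_logs("webshop", "ERROR", topology, logs):
        print(component, line)
```

Note that the error on `wls-2` is not returned for the webshop search, because that component does not run the application.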

From the Application Performance Management section in the Management CS, there is a drill-down to Log Analytics, where APM is able to locate the right log file based on the collected topology information.

IT Analytics Cloud Service is designed to help DBAs and middleware administrators to plan for capacity based on real workloads and identify and remediate common problems across their database and application servers.

EM Repository data is loaded into the Management CS data store on an ongoing basis, including metrics, metric extensions, configuration, Target Model, Topology, Association, Events, Availability etc. Data can be loaded from multiple EM instances. The longer retention period for large data sets in the cloud allows meaningful analytics. The Analytics Engine provides capabilities like forecasting, trending and capacity planning.
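To give a feel for what trending and capacity forecasting mean at their very simplest, here is a small Python sketch that fits a straight line through storage usage samples and estimates when capacity runs out. The least-squares line is my own choice for the illustration; I do not know which models the Analytics Engine actually uses, and the function names and numbers are made up.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_full(days, used_gb, capacity_gb):
    """Estimate days remaining, counted from the last sample, until full.

    Returns None when usage is flat or shrinking (no exhaustion forecast).
    """
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - days[-1])


if __name__ == "__main__":
    # Hypothetical tablespace usage, sampled once a day.
    days = [0, 1, 2, 3]
    used_gb = [100, 110, 120, 130]
    print(days_until_full(days, used_gb, capacity_gb=200))
```

The longer the retention period of the underlying metrics, the more samples such a fit can draw on, which is exactly why keeping large data sets in the cloud helps here.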

If some of this sounds a little familiar from other offerings from Oracle, that would make sense. Although this service was built from the ground up, it has been inspired by Enterprise Manager options such as Real User Experience Insight and Business Transaction Management; some of the brains behind those products are closely involved with the design and realization of the Management Cloud Service.

To me, this service makes a lot of sense. Especially in a distributed world, there is no logical physical location for a central, end-to-end monitoring facility as data is collected from all over the place. This cloud service can easily scale to large data volumes and can easily offer advanced analytical capabilities – based on big data technologies including machine learning – and a rapid evolution of functionality. The Management Cloud should be generally available in early 2016 – and is one of the very nice finds for me from OOW 2015.