Cloud Control Agent 12c managing many objects

An Oracle Enterprise Manager Agent 12c that has to manage hundreds of objects needs extra tweaking. Without it, the agent will not start, or it will die soon after starting, leaving you with an unmanaged infrastructure.

Situation

The customer runs over 200 databases on a single host: an Intel-based machine with 384GB RAM and over 4TB of storage, running Linux. Works like a charm.
Enterprise Manager 12c Release 4, Agent Version 12.1.0.4.0.

The databases should be managed with Oracle Cloud Control 12c. Installing an agent on the database host was done as a routine job. The next step would be to (auto)discover the databases and add them to the repository. Based on some previous experiences I decided to add them in random chunks of about 20 at a time, wait until all databases in the chunk were managed (the green arrow pointing upward) and then proceed with the next random chunk.

This went very well for the first 100-ish databases but then suddenly the newly added databases did not appear in the main databases screen as managed. Even worse: all databases eventually became unmanaged.

Investigation

It looked like the agent was not running, so I logged on to the database host to investigate further. The agent was indeed down. So, I tried to start it again. After waiting and waiting and seeing the number of dotted lines grow to five or six (!), the startup eventually failed.

Time for deeper investigation. I’ll save you the failed attempts (and my frustration about them) and skip to the point where I completely removed the agent installation from the host and from the Enterprise Manager repository and redid it. That went as smoothly as ever. I started adding databases to the repository and in the meantime frantically checked the status of the agent.

[oracle@s-xxxx-db-11 oracle]$ /u01/app/oracle/Agent12c/agent_inst/bin/emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.4.0
OMS Version : 12.1.0.4.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oracle/Agent12c/agent_inst
Agent Log Directory : /u01/app/oracle/Agent12c/agent_inst/sysman/log
Agent Binaries : /u01/app/oracle/Agent12c/core/12.1.0.4.0
Agent Process ID : 41118
Parent Process ID : 41039
Agent URL : https://s-xxxx-db-11:3872/emd/main/
Local Agent URL in NAT : https://s-xxxx-db-11:3872/emd/main/
Repository URL : https://s-xxxx-em-01.xxx.com:4903/empbs/upload
Started at : 2014-11-03 16:07:51
Started by user : oracle
Operating System : Linux version 2.6.32-504.el6.x86_64 (amd64)
Last Reload : (none)
Last successful upload : 2014-11-03 16:49:59
Last attempted upload : 2014-11-03 16:49:59
Total Megabytes of XML files uploaded so far : 10
Number of XML files pending upload : 280
Size of XML files pending upload(MB) : 0.25
Available disk space on upload filesystem : 32.81%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2014-11-03 16:48:58
Last successful heartbeat to OMS : 2014-11-03 16:48:58
Next scheduled heartbeat to OMS : 2014-11-03 16:49:59
Receivelet Interaction Manager Current Activity: Outstanding receivelet event tasks
----------------------------------
TargetID = oracle_database.VALIDD.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:49 CET
TargetID = oracle_database.PSP.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:54 CET
TargetID = oracle_database.VACQD.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:45 CET
TargetID = oracle_database.TSM14SP.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:37 CET
TargetID = oracle_database.ZOND.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:54 CET
TargetID = oracle_database.TSM13RT.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:34 CET
TargetID = oracle_database.skbc.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:51 CET
TargetID = oracle_database.TSM17RT.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:48:50 CET
TargetID = oracle_database.MGRMP.xxx.com - EventType - TARGET_EVENT for operation SAVE_TARGET submitted at 2014-11-03 16:46:10 CET

Target Manager Current Activity : Compute Dynamic Properties (total operations: 37, active: 9, finished: 28)

Current target operations in progress
-------------------------------------
oracle_database.ZOND.xxx.com - ADD_TARGET running for 261 seconds
oracle_database.TSM14SP.xxx.com - ADD_TARGET running for 261 seconds
oracle_database.PSP.xxx.com - ADD_TARGET running for 261 seconds
oracle_database.skbc.xxx.com - ADD_TARGET running for 261 seconds
oracle_database.VALIDD.xxx.com - ADD_TARGET running for 262 seconds
oracle_database.MGRMP.xxx.com - ADD_TARGET running for 262 seconds
oracle_database.TSM13RT.xxx.com - ADD_TARGET running for 262 seconds
oracle_database.VACQD.xxx.com - ADD_TARGET running for 262 seconds
oracle_database.TSM17RT.xxx.com - ADD_TARGET running for 262 seconds

Dynamic property executor tasks running
------------------------------
---------------------------------------------------------------
Agent is Running and Ready

This was new to me. And although the agent claimed to be running and ready, in reality it was useless.
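When checking the status over and over, it helps to boil the output down to the few fields that matter. The following is a minimal sketch, assuming the output format shown above; the function name summarize_agent_status is made up for this example and is not part of emctl.

```shell
# Hypothetical helper: condense `emctl status agent` output to three facts.
# It reads the status text on stdin, so it also works on a saved copy of the output.
summarize_agent_status() {
    awk '
        # Keep only the value after the colon, e.g. "280"
        /^Number of XML files pending upload/ { sub(/^[^:]*:[ \t]*/, ""); pending = $0 }
        # Count targets whose ADD_TARGET operation is still in progress
        /ADD_TARGET running for/ { adding++ }
        # Final verdict line, e.g. "Agent is Running and Ready"
        /^Agent is / { state = $0 }
        END {
            printf "pending_uploads=%s add_target_in_progress=%d state=%s\n", pending, adding, state
        }
    '
}

# Usage, with the emctl path from the session above:
#   /u01/app/oracle/Agent12c/agent_inst/bin/emctl status agent | summarize_agent_status
```

A pending upload count that keeps growing while the same ADD_TARGET operations stay in progress is the pattern visible in the output above.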
Eventually we raised an SR at Oracle. Their response was to the point but offered no solution; I’ll quote the important part here:

Agent will be configured to start with minimum memory allocation.
Whenever the memory configured is not enough for the agent operations, agent will restart and during restart auto tune and increase the default memory allocated to higher value so as to fulfil the requirement.
The memory allocated will not be sufficient when the number of targets monitored are high and hence need to set more memory for agent .

Also when starting the Agent, agent will collect and load metadata of all its targets and then only report the status as RUNNING and READY. When the targets are high, this may take few minutes and hence if the status is checked immediately after start up, it report ‘Agent is Running but not ready’. It will report status as Ready after the collection is completed and this is not an issue.

Perform the steps below to increase the Agent memory settings

1.Stop the Agent

2.take a backup and edit /sysman/config/emd.properties file
Change
agentJavaDefines=-Xmx673M -XX:MaxPermSize=96M
to
agentJavaDefines=-Xmx1024M -XX:MaxPermSize=96M

Save the file

3.Start the Agent and monitor its status
/bin>./emctl start agent
after 5- 10 minutes
/bin>./emctl status agent
/bin>./emctl upload

This can be found in the manual and I had already tried it. I even went as far as Xmx10240M, in small steps, but there was no noticeable difference.
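For completeness, step 2 of the quoted procedure can be scripted. This is only a sketch: the function name set_agent_heap is made up, and the path to emd.properties is passed in as an argument because the quoted reply does not give it in full. Stop the agent before running it, as step 1 says.

```shell
# Hypothetical helper: change the -Xmx value on the agentJavaDefines line.
# Usage: set_agent_heap FILE SIZE     (for example SIZE=1024M)
set_agent_heap() {
    file=$1
    heap=$2
    # Keep a backup next to the original, as the quoted reply advises
    cp -p "$file" "$file.bak" || return 1
    # Only the -Xmx option on the agentJavaDefines line is rewritten;
    # other options such as -XX:MaxPermSize stay untouched
    sed "/^agentJavaDefines=/s/-Xmx[0-9]*[A-Za-z]*/-Xmx$heap/" "$file.bak" > "$file"
}
```

The original file is preserved as FILE.bak, so going back to the old value is a single copy.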
Apart from the SR I managed to contact Kellyn Pot’Vin, the author of an interesting Oracle blog. She seemed very knowledgeable when it comes to Enterprise Manager related stuff. Her first suggestion was to use a new option in Enterprise Manager to view the collections graph and its performance analysis, to see what might be backlogging collections and hurting performance. Sadly I just got an empty graph.

Solution

Upon studying my logfiles she came up with a simple question:

Could you send me the results of:

ulimit -Su
ulimit -Hu

Okay:

[oracle@s-xxxx-db-11 bin]$ ulimit -Su
8192
[oracle@s-xxxx-db-11 bin]$ ulimit -Hu
3100271
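Some background on these numbers: ulimit -u is the maximum number of user processes, and on Linux every thread counts toward it. An agent that manages hundreds of targets may plausibly come close to a soft limit of 8192 together with everything else the oracle user runs, although the article does not include measurements of that. The sketch below compares the two; the function names are made up for this example.

```shell
# Hypothetical helpers: how close is a user to the soft nproc limit?
count_user_threads() {
    # ps -L prints one line per thread; -o lwp= prints only the thread id, without a header
    ps -L -u "$1" -o lwp= | wc -l | tr -d ' '
}

nproc_headroom() {
    # $1 = soft limit as printed by `ulimit -Su`, $2 = current thread count
    if [ "$1" = "unlimited" ]; then
        echo "unlimited"
    else
        echo $(( $1 - $2 ))
    fi
}

# Usage, run as the agent owner:
#   nproc_headroom "$(ulimit -Su)" "$(count_user_threads oracle)"
# The limits of an already running agent can be read from /proc/<agent pid>/limits
# (look for the "Max processes" line).
```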

Her next request was da bomb:

Could you set both of these to unlimited and restart the agent?

Bingo! The agent started in a jiffy, not even half a line of dots. Apart from that, it was running smoothly and I could easily add another 100 databases to the repository. Problem solved!
Of course, afterward you need to have your system administrator find the right setting for those ulimits, which in our case turned out to be only slightly higher than the values shown above.
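On Linux systems that use pam_limits, a permanent setting usually ends up as two lines in /etc/security/limits.conf or in a file under /etc/security/limits.d/. The sketch below only prints such lines; the function name is made up, and the values in the usage comment are placeholders, not the values we ended up with.

```shell
# Hypothetical helper: print pam_limits entries for the nproc (max user processes) limit.
# Arguments: user, soft limit, hard limit.
nproc_limit_lines() {
    printf '%s soft nproc %s\n' "$1" "$2"
    printf '%s hard nproc %s\n' "$1" "$3"
}

# Usage (placeholder values, to be reviewed by the system administrator):
#   nproc_limit_lines oracle 16384 16384
```

New limits only apply to sessions started after the change, so the agent has to be restarted from a fresh login before it picks them up.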

Not only did the agent start fast, it kept running as it’s supposed to do, for weeks now.
This information was not in the manual. Maybe it eventually will be. Many thanks to Kellyn!