EBS advanced topologies: Sharing APPL_TOP, ORACLE_HOMEs, err… heck, why don't we share it all?

Implementing a shared APPL_TOP with
E-Business Suite is common practice these days. Since 11.5.10, there
is also support for a shared technology stack (8.0.6 and iAS
ORACLE_HOMEs).
But what if you want to share the whole lot, including the ORACLE_HOME for the database and the ORACLE_HOME for Clusterware? Would you
want to do this, or is it better to keep the ORACLE_HOMEs for the
database dedicated per node, on local storage?
Oracle has supported sharing the ORACLE_HOME
for the database since 9i release 2 (I am not sure about earlier
releases).
I will try to explain a bit about this
feature in this article.

Sharing the APPL_TOP and Technology
Stack

Sharing the APPL_TOP and APPS
Technology Stack is possible because Oracle has designed both
file systems in such a way that all node-specific configuration
files are stored under unique directories (I will call this
‘context-sensitive’). For example, the SQL*Net configuration files
reside under different directories within the same ORACLE_HOME:
$ORACLE_HOME/network/admin/<context_name>. This is what makes it
possible to differentiate the configuration between
the nodes.
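
To make the ‘context-sensitive’ layout concrete, here is a minimal sketch of how two nodes end up with separate SQL*Net directories inside one shared ORACLE_HOME. The function name, the ORACLE_HOME path and the context names PROD_apps1 and PROD_apps2 are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Print the node-specific SQL*Net directory inside a shared ORACLE_HOME.
# All paths and context names below are hypothetical example values.
context_tns_admin() {
  # $1 = ORACLE_HOME, $2 = context name
  printf '%s/network/admin/%s\n' "$1" "$2"
}

# Two nodes, one shared home, two separate configuration directories.
for ctx in PROD_apps1 PROD_apps2; do
  context_tns_admin /shared/prod/8.0.6 "$ctx"
done
```

Each node sets TNS_ADMIN to its own directory, so the nodes never read each other's listener.ora or tnsnames.ora even though the binaries are shared.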
You can find lots of documents on
Metalink about implementing and/or configuring shared APPL_TOPs and
Technology Stacks for E-Business Suite.
This feature is also implemented into
Rapid Install, from 11.5.10 onwards.
Note: The
configuration built from a natively installed shared Technology Stack
(i.e. from Rapid Install) differs from that of a migrated shared Technology
Stack. When installing a shared TechStack, the TNS_ADMIN directory
will be located under
$COMMON_TOP/conf/<context_name>/8.0.6|iAS/network/admin/<context_name>,
while the documentation for migrating to a shared APPL_TOP will
advise you to put these directories on a local file system.

Sharing the RDBMS ORACLE_HOME
Things are
different with the RDBMS ORACLE_HOME. Oracle puts a specific
directory into the ORACLE_HOME: appsutil. In this directory,
configuration and shell-scripts are stored specific to Oracle
E-Business Suite. In many cases, these scripts and configuration
files are stored in context-sensitive directories. But not always.
And it is specifically for this reason that I have my doubts whether
it is possible to share the ORACLE_HOME, let alone the question
whether you would or should want to.

I couldn’t find
documentation on the subject, so I asked Oracle for a support
statement.

According to the
provided support statement, there are a number of issues that we have
to deal with:

  1. There is no
    documentation available at Oracle which describes such an
    architecture

  2. Oracle
    assumes (no, unfortunately I cannot make anything more of this…)
    that the feature of a shared ORACLE_HOME should
    be certified with E-Business Suite, since Oracle Server 10G release
    2 – as a product – is certified with E-Business Suite and Oracle
    Server 10G supports a shared ORACLE_HOME.

  3. So, Oracle
    states, because the RDBMS team supports the shared ORACLE_HOME, we
    may consider this possible to try with E-Business Suite (yes, this
    is what Oracle states…literally).

  4. AutoConfig
    will behave differently. When you convert a single-instance
    database to RAC using Rapid Clone, running adcfgclone on the
    second database tier will clean up the configuration
    files related to previous nodes. Though this is expected
    behaviour, I wouldn’t want AutoConfig to do this in a shared
    ORACLE_HOME. So, this requires some workarounds, which are still
    to be determined (Oracle didn’t specify them).

  5. Support
    doesn’t have a nice cookbook to look at, so requesting support can
    be a lengthy process, and there is a good chance that Oracle Support
    will tell you that it isn’t even supported (though it is! See Bullet
    1).

Personally, I would
feel quite excited to play around with a configuration like this and
to gain some experience in this apparently uncultivated area, simply
because it is fun.
I would not feel so comfortable having to implement this for a
(pre)production environment right now, without that experience.
Yet this is exactly what I did.

Having said all
this, I have tried to implement the above described architecture.
We took an Oracle
E-Business Suite 11.5.10.2 environment, single instance, multi-tier,
which needed to be converted to a two-node RAC with ASM and two Application
Servers.
By the way, when I
said “share the whole lot” I really meant “share the whole lot”.
Everything possible on the two database nodes is shared (except, of course, the
OS): besides the Oracle Clusterware, Oracle ASM and Oracle Database ORACLE_HOMEs, even the users’ home
directories are shared.

Sharing the Clusterware ORACLE_HOME
One of the things
that we encountered, was that the implementation of Clusterware
didn’t run as it should. This ORACLE_HOME was also installed in a
shared location.
The first issue
that we encountered: the Oracle Notification Service (ONS) was running on
one node, but reported as down on the other (using crs_stat -t).
We noticed strange behaviour: the ons process went down almost
immediately after it was spawned, and after the process died, it was
respawned. The result: running “onsctl ping” intermittently
reported ONS as running and not running, even though “crs_stat
-t” reported ONS to be running.

In the end we used
a workaround as described in Metalink note 304767.1. We moved ORACLE_HOME/opmn/conf and ORACLE_HOME/opmn/logs to a local
directory, and linked them back into ORACLE_HOME/opmn. Only in
this way were we able to get a stable Oracle Notification
Service. Apparently, the configuration and log files for two
concurrently running nodes cannot be located in the same directory. Problem
worked around.
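
The move-and-link step can be sketched roughly as follows. This is not the literal procedure from the Metalink note, only an illustration of the idea; the function name and the demo paths are hypothetical, and the demo runs against a throwaway temporary directory rather than a real Clusterware home.

```shell
#!/bin/sh
# Move a directory out of a shared ORACLE_HOME to node-local storage and
# leave a symbolic link behind in the shared home.
relocate_to_local() {
  # $1 = directory inside the shared home, $2 = node-local parent directory
  shared_dir=$1
  local_dir=$2/$(basename "$1")
  if [ -L "$shared_dir" ]; then
    # Another node already created the link; only the local target is needed.
    mkdir -p "$local_dir"
    return
  fi
  if [ -e "$local_dir" ]; then
    echo "$local_dir already exists" >&2
    return 1
  fi
  mkdir -p "$2" || return 1
  mv "$shared_dir" "$local_dir" || return 1
  ln -s "$local_dir" "$shared_dir"
}

# Demo against a temporary directory standing in for the shared home.
demo=$(mktemp -d)
mkdir -p "$demo/shared_home/opmn/conf" "$demo/shared_home/opmn/logs"
for d in conf logs; do
  relocate_to_local "$demo/shared_home/opmn/$d" "$demo/local/opmn"
done
ls -l "$demo/shared_home/opmn"
rm -rf "$demo"
```

Because the link itself lives in the shared home, it only has to be created once. On every other node the same local path must exist, and its conf directory needs a copy of the configuration files from the first node.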

Managing the database from a shared ORACLE_HOME
We also wanted to
manage the database using the Server Control Utility srvctl.
So, we registered
the database in the cluster registry:
$ srvctl add database -d <DB_NAME> -o <ORACLE_HOME> -p <location of spfile in ASM>
After this, we
registered the database instances:
$ srvctl add instance -d <DB_NAME> -i <INSTANCE_NAME> -n
<NODE_NAME>
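
For a two-node setup, the registration commands can be generated in one go. The sketch below is a dry run that only prints the srvctl commands shown above; the function name, the ORACLE_HOME path and the spfile location are hypothetical example values.

```shell
#!/bin/sh
# Print the srvctl commands that register a RAC database and its instances.
# Dry run only: the commands are echoed, not executed.
print_registration() {
  # $1 = database name, $2 = ORACLE_HOME, $3 = spfile location,
  # remaining arguments = instance:node pairs
  db=$1; home=$2; spfile=$3
  shift 3
  echo "srvctl add database -d $db -o $home -p $spfile"
  for pair in "$@"; do
    # Text before the colon is the instance, text after it is the node.
    echo "srvctl add instance -d $db -i ${pair%%:*} -n ${pair#*:}"
  done
}

# Hypothetical values for a database PROD on node1 and node2.
print_registration PROD /shared/oracle/db +DATA/PROD/spfilePROD.ora \
  PROD1:node1 PROD2:node2
```

Reviewing the printed commands before running them is cheap insurance, since a wrong ORACLE_HOME in the cluster registry affects every node at once.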
However, under
these circumstances a typical E-Business Suite database cannot be
started using srvctl. Srvctl will expect the SQLnet configuration
files (listener.ora and tnsnames.ora) to reside under
$ORACLE_HOME/network/admin. However, when you are dealing with an
E-Business Suite database these files are typically stored in another
location:

Assuming that my
EBS database is called PROD and my database server nodes are called
node1 and node2, the TNS_ADMIN directory will be defined as follows:

Node name    TNS_ADMIN location
node1        $ORACLE_HOME/network/admin/PROD1_node1
node2        $ORACLE_HOME/network/admin/PROD2_node2

In order to fix
this, according to Metalink note 362135.1 (Configuring Oracle
Applications Release 11i with 10gR2 RAC and ASM, Chapter 3.8, third
bullet) we have to edit $ORACLE_HOME/bin/racgwrap and set TNS_ADMIN
to the right location in this script (TNS_ADMIN is not set at all in
this script, so we had to add this setting ourselves).
Now, here we hit an
issue with the shared ORACLE_HOME.
When confronted
with a shared racgwrap file, we need some additional scripting in
order to determine the TNS_ADMIN location.
This is what we
added in racgwrap:

# Shared racgwrap: pick the node-specific TNS_ADMIN based on the host name
if [ "$(hostname)" = "node1" ]
then
  TNS_ADMIN=$ORACLE_HOME/network/admin/PROD1_node1
fi
if [ "$(hostname)" = "node2" ]
then
  TNS_ADMIN=$ORACLE_HOME/network/admin/PROD2_node2
fi
export TNS_ADMIN
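
A slightly more defensive variant of the same idea is sketched below. It is not what we actually put in racgwrap: the function name is hypothetical, and it assumes that hostname may return a fully qualified name and that an unknown host should produce an error rather than a silently unset TNS_ADMIN.

```shell
#!/bin/sh
# Map a host name to its TNS_ADMIN directory in a shared ORACLE_HOME.
# The node and context names are the example values used in this article.
tns_admin_for_host() {
  # $1 = ORACLE_HOME, $2 = host name (possibly fully qualified)
  short_host=${2%%.*}            # strip any domain suffix
  case "$short_host" in
    node1) context=PROD1_node1 ;;
    node2) context=PROD2_node2 ;;
    *)     echo "no TNS_ADMIN mapping for host $short_host" >&2
           return 1 ;;
  esac
  printf '%s/network/admin/%s\n' "$1" "$context"
}

# In racgwrap this could be used roughly as:
#   TNS_ADMIN=$(tns_admin_for_host "$ORACLE_HOME" "$(hostname)") && export TNS_ADMIN
# The ORACLE_HOME path and domain below are hypothetical.
tns_admin_for_host /shared/oracle/db node2.example.com
```

The explicit mapping avoids relying on ORACLE_SID, which may not be set in the environment that srvctl uses to run the script.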

Now we were able to
start up the database using srvctl.
This is how far we have come until today. We were able to start up the whole environment, and E-Business Suite seems to be working fine for the moment, but I am curious whether it will hold.

The Practical Questions
OK, we have "kind of" proven that you can install Oracle E-Business Suite with 10gR2 RAC and ASM with a shared APPL_TOP, CRS ORACLE_HOME and RDBMS ORACLE_HOME. Now, the practical question: "Would or should you want to have this?"
My heart would probably say yes, because of my technical background. My rational answer would probably be no. One of the primary reasons is that if something bad happens to your file systems, you will lose your entire application.
I know what you will say now: "If you have done your design well, you will have your disks mirrored!" Right! But most cases of corruption are the result of human error: people accidentally deleting files, modifications to configurations that do not turn out as expected, etcetera. These corruptions will happily be replicated by the mirroring mechanism in place. Disk mirroring only prevents data loss, and only in case of technical failures.
Why would you implement RAC? Probably to provide high availability. In other cases, scalability. The first reason, high availability, cannot be achieved when you share everything there is to share. All of your shared resources become single points of failure, thereby increasing the risk of downtime and at the same time decreasing the availability that you want to achieve.

But, you will respond, in such a case RAC in itself is not a high availability tool. From a certain perspective this is true. However, there is a difference between having only the database shared (with a proper backup) and having everything shared. The more you share, the higher the chances that a failure will cause your entire system to become unavailable.

4 Comments

  1. jason smith on

    Have you been successful in using rapid clone to convert Oracle E-Business Suite 11.5.10.2 environment, single instance, multi-tier, which needed to be converted to a two-node RAC? Could you elaborate on the detailed steps that you followed- I have been trying to do the same with little luck.
    Thanks

  2. Nigel,
    I am almost sure this won’t do (setting TNS_ADMIN the way you propose). As far as I know, racgwrap is only used by srvctl, which initiates a secure shell connection to whichever machine, even the local machine. Regardless, what if you would start instance PROD2 from node1? This will start the database instance on node2, but you didn’t log in to node2, you are on node1. Under the surface, a secure shell (ssh) is initiated that immediately runs the scripts to start the instance. This ssh connection will not source the environment, so ORACLE_SID will not be set. I am not even convinced of this when it comes down to starting instance PROD1 from node1. Srvctl will initiate ssh even for the local node, bypassing the sourcing of the environment file. Adding the environment file to your .profile or equivalent will, I believe, not do either, because .profile is bypassed when you run a command using ssh. Additionally, what if you have multiple databases running on your nodes (PROD, TEST, etc.)? How to determine the ORACLE_SID? You might have dedicated users per environment, but you cannot have multiple Clusterware (used by srvctl) installations per node.
    About your “bigger” question: My opinion is that with an application like EBS, downtime should be minimized as much as possible. That implies minimizing the risks for (unplanned) downtime as well. Especially with upgrades or patches (as already stated in the post), when having shared everything, the risks of downtime due to damage being done to the code is significant. When you would dedicate the code to the node it is running on, there is the possibility to upgrade the code locally, test it out, and if necessary fall back to the not yet patched node.
    There is one major issue here (as is always the case when you are using EBS). When patching the EBS software (APPL_TOP, etc.), most of the time changes are made to the database as well. So it is always necessary to have a proper backup of the database (DUH?).

  3. Arnaud

    Small point; you should be able to simplify your setting of TNS_ADMIN to a single line:

    TNS_ADMIN=$ORACLE_HOME/network/admin/${ORACLE_SID}_`hostname`

    I’m assuming of course that you’ve already set ORACLE_SID in a node-specific way. Bigger question: what happens when you upgrade (Oracle, Apps, CRS, etc) in this scenario? that’s maybe as big an issue as human errors (don’t most human errors happen during upgrades?)

    Regards Nigel