Continuous Delivery and the Oracle database (III) clip image0022

Continuous Delivery and the Oracle database (III)

In this series of blogs about Continuous Delivery and the Oracle database, I describe how to automate deployments (installations). In the previous two Blogs I have described the tools and techniques used to create and install migration scripts.

In this Blog I will describe the ‘(un)happy flow’ for a database deployment: what to do if the deployment (installation) is correct (the ‘happy flow’) and what to do if it goes wrong (the ‘unhappy flow’)?

Introduction
One might ask what could be the problem if the deployment succeeds. But under normal circumstances you are not the only one on the database. Other sessions must be able to continue during a deployment. For instance it may be impossible to deploy packages that are used by another session. The installation process will have to wait on the running session and that may cause a time-out and thus a failed installation, even though the previous deployment in test went well. And even if you’re lucky and you can compile a package without problems, other sessions may encounter this very well known error:

ORA-04068: existing state of packages has been discarded

A solution is of course not to use ‘package state’ but I do not think management is very happy with such a statement because it will take too much development and test effort to remove all package state. Later in this Blog I will show you how to solve this problem without recoding.

And what if the deployment just fails? Restoring a backup is very drastic and not always a solution: other sessions may have changed data and/or code that should not be restored at all. And Oracle Flashback technology is not sufficient because it can only restore tables (data). It cannot restore code.

What we actually need is to go back to a previous version of the code. The data should generally stay as it is. Since Oracle 11g release 2 there is a nice solution: Edition Based Redefinition (EBR). Tom Kyte called it already a killer app.

Edition Based Redefinition
I quote Lucas Jellema from his Blog Quick introduction of what and why of Oracle Database Edition Based Redefinition:

“For the 11g release of the database Oracle went beyond the reduction of downtime and the ability to redefine database objects with a minimum of unavailability. It implemented in the database a mechanism that is very similar to WebLogic’s production redeployment: the ability to have two versions of the application (or in this case: a coherent set of database objects) live at the same time, the old version servicing longer running sessions that started before the new version went live. This mechanism is called Edition Based Redefinition. It is illustrated in the next figure: the two versions 1 and 2 of Application X – temporarily running in parallel on WebLogic just after production redeployment – have need different versions of the database objects. Version 1 relies on the base release of the database objects while version 2 of the application uses the versions of database objects shipped in Release 2, complemented with the objects from version 1 that were not changed for version 2.

EBR

The notion of a ‘release of database objects’ was introduced in Oracle Database 11gR2 and is called ‘an edition’. Editions are almost parallel universes in which the same database objects – such as views, packages, types – from the same schema can exist in with different definitions. For example: package SALARY_MANAGEMENT in Schema SCOTT can exist multiple times – for example once in the Base Edition, once in Edition Release 2 and once in Edition Release 4.”

Tom Kyte has written several articles about EBR that I can recommend very much:

  1. A Closer Look at the New Edition, Oracle magazine, January 2010
  2. Edition-Based Redefinition, Part 2, Oracle magazine, March 2010
  3. Looking at Edition-Based Redefinition, Part 3, Oracle magazine, May 2010

And this is the Oracle White Paper about EBR:
http://www.oracle.com/technetwork/articles/grid/edition-based-redefinition-1-133045.pdf

This Oracle White Paper shows that ‘package state’ is linked to a session in combination with an edition. So if you compile a package in a new edition, old sessions do not suffer from this error anymore:

ORA-04068: existing state of packages has been discarded

Requirements
The following requirements should be sufficient for a reliable deployment based on EBR:

  1. A deployment must be regarded as one transaction; either everything goes well and the (last) new edition becomes active (i.e. the new database default edition) or there is a problem and the database default edition remains the same, i.e. no difference with before.
  2. A deployment that is aborted may not influence other simultaneous deployments.
  3. It is not necessary to be able to undo a deployment after it has succeeded. Why would you want to undo a successful deployment anyway? And given the fact that database code and/or data may have changed after the deployment I do not see a simple solution. So undoing a deployment is only possible during or just after a failing deployment.

Design
Like I already described in my previous Blogs, the deployment is executed by Flyway that runs a series of migration scripts in the correct order (and only if they have never been executed before on that database).

Flyway administration
Every successful migration step (one script) is registered by Flyway in a metadata table called “schema_version”. If a step fails, it is registered as such (the step is flagged as an error) and the deployment is aborted. Now you can solve the error and restart Flyway. Flyway first has to repair its metadata table (i.e. remove the faulty step, see http://flywaydb.org/documentation/command/repair.html) and then you can continue with the failing step. So a migration script that caused an error will be re-executed later on.

Flyway call-backs
Flyway has functionality that is essential in combination with EBR: SQL call-backs (see http://flywaydb.org/documentation/callbacks.html). These SQL call-backs are SQL scripts that are executed before/after individual migration steps and the complete migration (deployment). The programmer can add code to support or check the deployment.

Happy flow
Using Flyway this will become the ‘happy flow’:

  1. Flyway executes the migration scripts (DDL and/or DML) for one schema.
  2. After every migration script you can verify the deployment, for instance by recompiling the schema and checking that all schema objects are valid.

In case of deployment errors this will not suffice.

Possible errors
What kind of errors can occur and how can Flyway help?

  • The execution of a migration script fails with an (Oracle) error.
    Flyway will issue a rollback but this will only rollback DML that has not been committed. Please note that Oracle implicitly commits a DDL statement. This fact means that it you should first execute all your DDL migration scripts and only then just one DML migration script (since Flyway commits after every script). Now when something goes wrong during the last DML step, a rollback will restore the data (provided you do not commit in the DML script yourself).
  • A verification step fails.
    Maybe you have installed an invalid package. Or maybe the code quality is not as it should be. You can use the SQL call-backs afterEachMigrate (after every step) or afterMigrate (at the end) to verify.

(Un)happy flow
How can you undo a failed deployment? Or to be more exact, how is it possible that the new version does not become active after a failure. In the introduction I already said that EBR could help. But how can it help?

Let us investigate some examples to determine the best strategy.
The target database schema S1 is empty and we want to let Flyway install release R1 containing scripts V1.sql and V2.sql creating views V1 and V2 respectively. Script V2.sql contains an error.

Example 1: one edition per migration and second script fails
At the start Flyway creates a new edition E0 using the beforeMigration call-back. Next Flyway runs the scripts V1.sql and V2.sql. Now when no script would fail, edition E0 will become the new edition (set in call-back afterMigration). But, unfortunately the second script fails. Well, we can just drop edition E0 and restart Flyway after correcting the error, isn’t it? No luck, Flyway has already recorded that V1.sql was okay, so it will never rerun that script. Hence a rerun will now create edition E0 again however it will contain just view V2. View V1 has disappeared due to dropping edition E0 and the cleverness of Flyway.

Example 2: one edition per migration and second script fails
Now Flyway will create an edition before each migration step (using call-back beforeEachMigrate), E1 before step 1 and E2 before step 2. Now when the second script fails, you just drop edition E2 and correct the error and rerun Flyway. That will do the trick. The administration of Flyway is in sync with the administration of the database. Please note that for other sessions the database still looks the same when the migration fails: the default database edition they use has not changed because it is only changed if the whole migration succeeds.

Example 3: as example 2 but now another release is installed after the failure
The deployment to S1 fails and someone else installs release R2 to schema S2 using the same approach as in example 2. That release contains scripts V3.sql and V4.sql also creating views. These scripts contain no errors so the installation succeeds. Flyway has now created editions E1, E2 (containing an error), E3 and E4 and we started with the Oracle default ORA$BASE.

This is the list: ORA$BASE -> E1 -> E2 -> E3 -> E4

Since you can only drop editions at the head or the tail, edition E2 can not be dropped. So release R1 can never be corrected.

In order to overcome all these problems we implement the following rules:

  1. before every migration step a new edition is created (if the edition already exists due to a previous error it will be dropped first);
  2. if the migration fails, the error must be corrected and the same migration must restart (Flyway will restart it at the point where it failed);
  3. there may be only one migration active in order to keep the editions created linked to each other;
  4. only when a migration succeeds:
    • the default database edition changes to the last edition created;
    • another migration may commence.

This list of detailed actions describes how to handle the ‘unhappy flow’:

Note: the developer has to ensure that DDL is executed first and only then migration with DML. In case of an error Flyway will never have committed the data.

# Action Phase Remark
1 Make sure the deployment schema becomes ’editions enabled’. Once in a lifetime You cannot undo this!
2 Log on using the deployment schema. Logon Only the deployment user needs the right to change his objects.
3 Ensure there is only one deployment session active (per database). Before deployment (using Flyway call-back beforeMigrate) Use an exclusive installation lock (via DBMS_LOCK). This is necessary to maintain integrity. If another deployment session has the lock, wait for a minute before a time-out.
4 Check whether the last deployment went well or whether you continue for the same schema. If not, abort. Idem If you do not correct a faulty deployment for schema A and you start a deployment for another schema B that succeeds, you end up with an incomplete deployment for schema A (because the latest edition becomes active). This is wrong.
 
5 Create an edition and grant it to the deployment schema. For every migration (using Flyway call-back beforeEachMigrate) The edition name depends on the schema name and installation sequence. For example APPL_RELATIE$100 for migration number 100 in schema APPL_RELATIE.
6 If the previous step fails because the edition already exists, you have to drop the edition and repeat the previous step. Idem Apparently the previous deployment failed and it is corrected now. Please note that when a deployment session fails, it cannot always recover itself (think of a database shutdown abort). So you have to do it later.
7 Activate the edition for this deployment session. Idem Now you can create ‘editionable’ objects in the deployment schema (if the schema is ’editions enabled’).
8 Execute migration scripts. Each migration (using Flyway) Objects are created in the new edition (they become actual in that edition). EBR will ensure that package X becomes actual if package Y is installed and package X depends on Y.
9 Check the deployment. After each migration (using Flyway call-back afterEachMigrate)
  1. Compile all invalid schema objects.
  2. Raise an exception if there is any invalid object that depends on this schema.

Invalid objects are not allowed, so abort the deployment.

 
10 Grant the current session edition to PUBLIC. After deployment (using Flyway call-back afterMigrate)
11 Make the current edition the database default. Idem The deployment has succeeded.
12 Release the installation lock. Idem


So only when everything goes well, the latest edition (see view ALL_EDITIONS) is equal to the database default edition (see view DATABASE_PROPERTIES).
If anything goes wrong (even when the deployment session is aborted unexpectedly), that will not be the case and you can recover by executing the same deployment that – thanks to Flyway – will continue with the failed step.

Please note once more that it is not possible to deploy to another schema after a faulty deployment: you have to correct the error first.

Conclusion
The combination of Oracle Edition Based Redefinition and the open source tool Flyway enables us to execute a deployment without affecting other users. And as a big bonus it enables us to undo the deployment in case of errors. That really sounds like 24×7, 🙂