We all know partitioning for performance reasons or for storing historical data. But partitioning can also be used for functional purposes, such as creating a savepoint in a batch run. Such a savepoint can be used to undo a specific part of a batch without the use of home-made software.
Let's say that we have a payroll system for multiple companies:
Company
-----------
Company_id    number(9)
Name          varchar2(100)
Address       varchar2(100)
Place         varchar2(100)
Last_date     date
Total_salary  number(13)

Employee
------------
Employee_id   number(9)
Emp_comp_id   number(9)
Name          varchar2(100)
Address       varchar2(100)
Place         varchar2(100)
Salary        number(13)
Last_date     date
The table ‘Company’ contains several companies and each company has several entries in the table ‘Employee’. Now we have to calculate and pay the salaries for all the employees of all the companies. No problem: we just run a script that calculates the sum of the salaries per company, we update the last_date field in both tables as well as the total_salary field in the ‘Company’ table, and we are done.
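A minimal sketch of such a run, assuming SYSDATE is the run date (the correlated update is illustrative; the real payroll script may of course differ):

UPDATE Company c
   SET c.Total_salary = (SELECT NVL(SUM(e.Salary), 0)
                           FROM Employee e
                          WHERE e.Emp_comp_id = c.Company_id)
     , c.Last_date    = SYSDATE;

UPDATE Employee
   SET Last_date = SYSDATE;

COMMIT;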
But what if there were mutations for one or more companies and we have to roll back the run for those companies? We don't want to roll back all the companies, so normally we would build our own software to make such a rollback possible. The disadvantage of that solution is that you have to maintain that software, and a change in the database will almost certainly result in a change in the software. Which means extra testing, and so on…
With partitioning it is possible to use RMAN or Oracle Data Pump (or imp/exp in a pre-10g situation) for this purpose. First we list-partition the table ‘Company’ on ‘Company_id’ and the table ‘Employee’ on ‘Emp_comp_id’. We put the partitions with corresponding company_ids in the same tablespace:
CREATE TABLE Company
( Company_id    number(9)
, Name          varchar2(100)
, Address       varchar2(100)
, Place         varchar2(100)
, Last_date     date
, Total_salary  number(13)
)
PARTITION BY LIST (Company_id)
( PARTITION p1     VALUES (1) TABLESPACE t1
, PARTITION p2     VALUES (2) TABLESPACE t2
  ...
, PARTITION pn     VALUES (n) TABLESPACE tn
, PARTITION others VALUES (DEFAULT) TABLESPACE users
);

CREATE TABLE Employee
( Employee_id   number(9)
, Emp_comp_id   number(9)
, Name          varchar2(100)
, Address       varchar2(100)
, Place         varchar2(100)
, Salary        number(13)
, Last_date     date
)
PARTITION BY LIST (Emp_comp_id)
( PARTITION p1     VALUES (1) TABLESPACE t1
, PARTITION p2     VALUES (2) TABLESPACE t2
  ...
, PARTITION pn     VALUES (n) TABLESPACE tn
, PARTITION others VALUES (DEFAULT) TABLESPACE users
);
Before we start the batch run we make a backup of all the tablespaces. If, for some reason, we have to do a rerun, we can simply restore the required tablespace.
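In RMAN terms that could look roughly like this. Restoring one tablespace to a point before the run, while the rest of the database stays current, is tablespace point-in-time recovery (TSPITR); the tablespace name, timestamp and auxiliary destination below are illustrative assumptions:

RMAN> BACKUP TABLESPACE t1;

# later, to undo the batch run for company 1 only:
RMAN> RECOVER TABLESPACE t1
        UNTIL TIME "TO_DATE('2006-01-15 06:00', 'YYYY-MM-DD HH24:MI')"
        AUXILIARY DESTINATION '/u01/aux';

# TSPITR leaves the tablespace offline; back it up and bring it online again:
RMAN> BACKUP TABLESPACE t1;
RMAN> SQL 'ALTER TABLESPACE t1 ONLINE';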
If we have to add or remove tables or alter columns, we only have to make sure that we respect the partitioning scheme. Each new table with a direct (1:n) relation to ‘Company’ has to be partitioned in the same way as ‘Employee’. And for each new company we have to add a new partition as well as a new tablespace:
CREATE TABLESPACE tx ...;

-- the tables have a DEFAULT partition, so we split it rather than
-- use ADD PARTITION (which would raise ORA-14323):
ALTER TABLE Company  SPLIT PARTITION others VALUES (x)
  INTO (PARTITION px TABLESPACE tx, PARTITION others);
ALTER TABLE Employee SPLIT PARTITION others VALUES (x)
  INTO (PARTITION px TABLESPACE tx, PARTITION others);
In real life the data model will be much more complex, of course. So it can happen that we don't want to roll back all the data of a company but keep some of it for auditing purposes. In that case a tablespace restore is too coarse, since it would overwrite all the partitions in the tablespace. That's where we can use Data Pump. With Data Pump we can export an entire partition but import only a few tables from that export, so we can keep the audit data from the last run and reset the rest of the data. If we keep the names of the tables in our database and mark them as ‘Always import’ or ‘Audit data’, we can simply generate parfiles for both kinds of imports, making maintenance a bit easier.
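A rough sketch of what such generated parfiles could look like, assuming a directory object dp_dir exists, company 1 lives in partition p1, and ‘Employee’ is marked ‘Always import’ while the audit tables are left alone (all names are illustrative):

# export.par -- taken before the batch run, as the savepoint
DIRECTORY=dp_dir
DUMPFILE=company1.dmp
TABLES=Company:p1,Employee:p1

$ expdp payroll/secret PARFILE=export.par

-- to reset only the ‘Employee’ data for company 1 after the run:
SQL> ALTER TABLE Employee TRUNCATE PARTITION p1;

# import.par -- load the saved rows back into the emptied partition
DIRECTORY=dp_dir
DUMPFILE=company1.dmp
TABLES=Employee:p1
TABLE_EXISTS_ACTION=APPEND

$ impdp payroll/secret PARFILE=import.par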
By the way, I pointed out your solution to Mark Rittman as an item in the discussion on “Column Based Databases” (http://www.rittman.net/archives/001368.html).
Sometimes a tradeoff isn't always a bad thing 😉
Bert-Jan, thanks for sharing, great idea, but I couldn't help myself… I think there is a limitation built in here.
After reading stuff on Mark Rittman's blog about Temporal databases and Column Based databases, and lessons learned from the Steve Adams seminar this week (including some stuff on Oracle internals I learned a long while ago from Anjo Kolk), I was wondering…
Correct me if I am wrong. The way I read it is -> company : tablespace = 1 : 1.
This means that the maximum number of tablespaces equals the maximum number of companies you can declare. The maximum number of tablespaces you can declare is bounded by the maximum number of datafiles you can declare. The number of datafiles is set at database creation by the DB_FILES parameter and the MAXDATAFILES clause (which in the end is OS specific).
From the manuals (10g):
MAXDATAFILES Clause
Specify the initial sizing of the datafiles section of the control file at CREATE DATABASE or CREATE CONTROLFILE time. An attempt to add a file whose number is greater than MAXDATAFILES, but less than or equal to DB_FILES, causes the Oracle Database control file to expand automatically so that the datafiles section can accommodate more files.
The number of datafiles accessible to your instance is also limited by the initialization parameter DB_FILES.
DB_FILES specifies the maximum number of database files that can be opened for this database. The maximum valid value is the maximum number of files, subject to operating system constraint, that will ever be specified for the database, including files to be added by ADD DATAFILE statements.
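For what it's worth, you can see where a given database stands with a quick query (a sketch against the 10g dynamic views):

-- current soft limit on open datafiles
SELECT value FROM v$parameter WHERE name = 'db_files';

-- datafile record capacity as sized in the control file (MAXDATAFILES)
SELECT records_total
  FROM v$controlfile_record_section
 WHERE type = 'DATAFILE';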
I am not really worried by the restriction; it could suffice, and otherwise you go to the overflow partition. What I am worried about is that you will increase the control file, AND as we know (again), that will introduce extra IO once the control file grows beyond the max bytes per single read action of the OS (is it still 64K?). So if we are unlucky, to implement this solution we have introduced a double read and/or write action for every change in the control file (e.g. updates of SCN numbers…).
In other words, say the contents of the control file fitted in 2 clusters; by specifying a lot of datafiles the control file will grow to 3 clusters, and this will cause more IO per control file update action.