Model driven Design of ETL functions

Analysis and functional design for regular application development has been standardised over the years. Now working on Business Intelligence (BI) projects for the last one and a half years, it seems to me that these kind of standards do not yet apply to Data Warehouse environments and more specific to ETL-processes (Extract Transform Load). Last week I read an interesting article about “Model driven Design of ETL functions” by Mark Zwijsen in Database Magazine. It interested me because I both know the author and the Data Warehouse environment he uses as an example. His thesis is that by structuring the functional design of the ETL process, it should be possible to automatically generate about 80% of the actual ETL code. The underlying model shows a possible structure.

ETL Designmodel3

This model is a usefull structure for functional design documents of ETL functions. Most of the model is self explanatory. I had some difficulty with the difference between Enrichment Rules and Coupling Rules.

Enrichment Rules add extra related data from the source model, that are necessary for further processing. For instance, if you would process Customers of a Telecom Network Operator and these Customers could be either Customer of an Internal Service Provider or an External Service Provider, then you would have to Enrich the Customer data with data of the related Service Provider in order to Select the Customers of the internal Service Provider.

Coupling Rules are necessary to relate the Source data to Target data. If for instance the Source contains a Source specific Service Provider code and the Target contains a source independent Dimension for Service Provider, then you have to couple the Source specific Service Provider code to a source independent Service Provider code in order to be able to relate the Customer to the Service Provider Dimension of the Target Model.

To conclude with, the History Processing is defined as the process to handle Slowly Changing Dimensions (SCD) in case of the SCD-2 mechanism. If you work in the field of BI and Data Warehousing, this should be enough information to interpret the above Model.

5 Comments

  1. Lucas Jellema March 26, 2006
  2. Manikantha varma March 26, 2006
  3. Mark Zwijsen March 2, 2006
  4. kherat November 23, 2005
  5. wanis March 23, 2005