‘Starship Nasa16 enters orbit around the planet Kepler-452b. The captain orders his crew to gather as much intelligence as possible on the population of this planet. The engineers manage to establish a link to one of the main data centers of Kepler-452b. Now all focus shifts to the BI team, who have to extract and interpret the data. While the whole crew waits anxiously, the stress level within the BI team keeps rising.
The captain listens to the desperate utterances of the BI team:
Oh no, they are using Oracle and we have Microsoft.
Something went wrong in the daily job.
The job hangs because of a few strange records in our source data.
The BI team has no clue what the domain looks like or which entities to look for, so they constantly make assumptions based on short fragments of data. Several times they have to start over from scratch because their assumptions were wrong, which causes a lot of frustration because the work has to be thrown away. After a few weeks the captain decides to go back to Earth and return when the BI team has learned enough about the alien population.’
I wrote this anecdote to illustrate that, in my view, we are still living in the Stone Age of business intelligence. Every time we build a new data warehouse to gain knowledge of a particular domain, we start from scratch building extract, transform and load (ETL) code. When we encounter a problem, we build a solution, but these solutions are rarely reused because they contain domain-specific code. In short: there is too little reuse of ETL code.
In an ideal BI environment, BI developers work only on interpreting the domain and configuring the ETL building blocks that extract, transform and load the source data into an enterprise data model. They should not be troubled by writing code to build historic tables, look up foreign keys, read source files, de-normalize tables, handle errors, write logs, and so on. When a new situation is encountered, the BI developer should think about designing a generic, reusable building block instead of only fixing the situation at hand.
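To make the idea of a configurable building block a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a finished design: the names `LookupConfig`, `make_lookup` and `run_pipeline`, and the sample customer and order data, are all hypothetical. The point is that the foreign key lookup logic is written once, while the domain-specific part shrinks to a few lines of configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], List[Row]]


@dataclass(frozen=True)
class LookupConfig:
    """Domain-specific configuration (hypothetical): all a BI developer supplies."""
    source_column: str      # business key column in the source row
    target_column: str      # surrogate key column to add to the row
    unknown_key: int = -1   # fallback for business keys that are not found


def make_lookup(config: LookupConfig, dimension: Dict[object, int]) -> Step:
    """Generic building block: resolve a business key to a surrogate key.

    Unknown keys get the fallback value instead of stopping the job.
    """
    def step(rows: Iterable[Row]) -> List[Row]:
        out = []
        for row in rows:
            business_key = row.get(config.source_column)
            surrogate = dimension.get(business_key, config.unknown_key)
            out.append({**row, config.target_column: surrogate})
        return out
    return step


def run_pipeline(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply the configured building blocks in order."""
    for step in steps:
        rows = step(rows)
    return list(rows)


# Hypothetical sample data: a customer dimension and two source records,
# one of which refers to a customer that does not exist.
customers = {"C001": 1, "C002": 2}
orders = [
    {"order": 10, "customer_code": "C001"},
    {"order": 11, "customer_code": "C999"},
]

lookup_customer = make_lookup(LookupConfig("customer_code", "customer_id"), customers)
result = run_pipeline(orders, [lookup_customer])
```

In this sketch the strange record (`C999`) does not hang the job: it simply receives the fallback key, and the same block can be reused for any other lookup by passing a different `LookupConfig`.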
In this article I will discuss these generic reusable building blocks, or design patterns. I also welcome your feedback, so that we can improve them or identify them as best practices.