Multiple schemas (MS)

Suppose we have 2 source systems A and B. Both systems contain information regarding employees. System A contains login_name and email address, System B contains department.

In our datawarehouse we like to merge this information. My preferred way of doing is is the following.

System A has a schema called A in My_DWH.

System B has a schema called B in My_DWH.

So in this example My_DWH contains the table A.user and B.user.
We can also create a table or view called dbo.user in My_DWH that contains a merged view of the two.

It’s important to have A.user and B.user for lineage. You need to be able to go back to the source of records in dbo.user.

* It’s better to use this pattern also when you only have System A in your data warehouse (i.e. there is only 1 user entity). Otherwise you will have a lot to do when System B is added at a later time.

When using this pattern in Staging, we avoid issues when different sources have the same table names.

[MS_src_schema_id]

When in the above example records from schema A and B are merged in My_DWH.dbo.user, we might get uniqueness issues of natural keys. To solve these issues we always add a column named src_schema_id to the natural key.

e.g. dbo.user
sur_key , src_schema_id          , natural_key ,   synonym_id 
1              , A                                   , ‘bvdberg’        , null
2              , B                                   , ‘bvdberg’        , 1

The combination of src_schema_id and natural_key is unique and we can use the [Syn] pattern to de-duplicate these records in DWH. Note that we can choose to only show sur_key 1 in the datamart. (surrogate foreign keys will not point to entities that have synonyms).

 

Leave a Reply

Your email address will not be published. Required fields are marked *