Suppose we have 2 source systems A and B. Both systems contain information regarding employees. System A contains login_name and email address, System B contains department.
In our datawarehouse we like to merge this information. My preferred way of doing is is the following.
System A has a schema called A in My_DWH.
System B has a schema called B in My_DWH.
So in this example My_DWH contains the table A.user and B.user.
We can also create a table or view called dbo.user in My_DWH that contains a merged view of the two.
It’s important to have A.user and B.user for lineage. You need to be able to go back to the source of records in dbo.user.
* It’s better to use this pattern also when you only have System A in your data warehouse (i.e. there is only 1 user entity). Otherwise you will have a lot to do when System B is added at a later time.
When using this pattern in Staging, we avoid issues when different sources have the same table names.
When in the above example records from schema A and B are merged in My_DWH.dbo.user, we might get uniqueness issues of natural keys. To solve these issues we always add a column named src_schema_id to the natural key.
sur_key , src_schema_id , natural_key , synonym_id
1 , A , ‘bvdberg’ , null
2 , B , ‘bvdberg’ , 1
The combination of src_schema_id and natural_key is unique and we can use the [Syn] pattern to de-duplicate these records in DWH. Note that we can choose to only show sur_key 1 in the datamart. (surrogate foreign keys will not point to entities that have synonyms).