Source systems store facts that are valid at the moment you look. Right after you retrieved a dataset, it might already be outdated because some process performed an update. So we have to keep it mind that we take snapshots of a dataset that are valid at snapshot time T.
According to a snapshot taken at march 11 09:11 we might calculcate an active customer base at end of period (EOP) Februari of 10.320 customers.
The next day we calculcate the exact same metric (base EOP februari), but now based on a snapshot taken at march 12 09:03.
This time the active base at EOP februari is 10319. How can it be possible that changes between march 11 and 12 have impact on the base of februari?
Various reasons can be hold responsible:
- updates with backwards effect: A cancellation was made undone at march 11. A customer called explaining that he did not want to cancel in the first place, so the order status was incorrect. Using some dirty trick the user made sure that the status was changed from aborted into successfully delivered. When looking at the snapshot of march 12 it looks like this customer never aborted in the first place
- Late arriving facts. At March 11 the supplier sends the information to the source system that some user was delivered at Februari 29.
2 concepts of time:
- Effective time ( synonyms: transaction time, recording time, snapshot time) is the time period during which a fact stored in the database is considered to be true. [https://en.wikipedia.org/wiki/Temporal_database]
Using above example we can say that the fact that base EOP februari = 10320 was true from march 11 09:11 till march 12 09:03
- Valid time is the time period during which a fact is true with respect to the real world. [https://en.wikipedia.org/wiki/Temporal_database]
The valid time for the above scenario would be never for EOP februari = 10320 and always for EOP februari = 10319. However this was known to us only at march 12 09:03.
Usïng two notions of time we can answer questions like: why has the active base EOP februari changed since the last time I looked.
Implementation[table “” not found /]
Using historization for an object or not should not influence the way it is used. When we don’t apply historization dbo.<object> just points to a table with no history. In this case there is no his.<object> table.
99 % of the cases we just want to query the latest view. But in case we want to answer questions like above we can query the his.[object] table. It will show the changes that happened on March 11. Usually this is enough to answer the question.
[His-1] Datamarts only contain latest data.
Depending on your domain history might not be required in the Datamart and might only slow things down and make querying more complex. That’s why we choose not to include history in the datamart.
[His-2] Delete detection
We want to identify deleted records. Instead of deleting these records in his.[object] we choose to set the delete_dt record. The latest view filters out these records. But when performing an investigation we can look into his.[object] to find deleted records.