Should we not treat dimension tables in a similar fashion? We must know exactly what a row in a dimension table represents, in business terms. Once the business definition is clear, the dimensional keys used in the fact become obvious. Further, while the grain may be equivalent to the primary key of the fact table, the grain is properly declared in business terms first. He writes that the most common error he sees is to not declare the grain at the beginning of the design process. Kimball makes a strong point that one must declare the grain of the fact table, stating precisely what a fact table record represents. That leads directly to user confusion and incorrect results.Ĭonsider how we treat fact tables. Storing Type 2 history in a dimension table fundamentally changes what that dimension contains. Since no structural changes are required, it should be able to drop right in, but this is not the case. A designer might get the sense that he could present the first version of the warehouse using SCD Type 1 overwrites and add Type 2 SCD history in a later version. Type 2 SCD is usually presented as one of several choices as to how history is stored – a somewhat technical distinction that can be hidden from the business users. Usually, dimensions containing Type 2 history have effective and expiration dates, as well as a current indicator, which must be maintained as Type 2 SCD rows are inserted. Historical fact rows are linked through the surrogate key to the version of the dimension row that was current when the fact was recorded. It is made possible by the use of a surrogate key on the dimension rather than the natural key. The workhorse of dimension history is, therefore, SCD Type 2. SCD Type 1 is a simple overwrite, and SCD Type 3 is somewhat special-purpose and limited. For each column in the dimension table, a determination should be made to 1) overwrite the old value, 2) insert a new row in the dimension with a new dimension key to record the new value, preserving the old, or 3) copy the old value to a previous value column in the row. Kimball’s general answer is to choose between the standard slowly changing dimension (SCD) Types 1, 2, and 3. When dimensions change, how should it be handled? Not every dimension change needs to be recorded as history, but many do. The values of Dimensions are either static (date and time, limited code sets) or change slowly. Warehouse facts are inherently historical since transactions happen on a transaction date, balances are kept on a balance date, and so on. As we will see, clearly distinguishing between current and past dimension values pays off in clarity of design, flexibility of presentation, and ease of ETL maintenance. That said, can the mainstay Type 2 slowly changing dimension be improved? I here present the concept of historical dimensions as a way to solve some issues with the basic Type 2 slowly changing dimension promoted by Kimball. Kimball’s practical approach focuses squarely on clarity and ease of use for the business users of the warehouse. Business users can understand and query these warehouses directly and gain valuable insights into the business. His practical warehouse design and conformed-dimension bus architecture are the industry standard. We owe a lot to Ralph Kimball and friends.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |