<aside> 💡 In SCD Type 1, if an item is updated, the existing record is overwritten. If an item is added, the existing record is added
</aside>
SCD type 1 ensures that there are no duplicate records in the table and that the data reflects the most recent current dimension.
Initial Table
| Product | Sales | Revenue |
|---|---|---|
| Widget A | 1,250 | $12,500 |
| Widget B | 875 | $8,750 |
| Widget C | 1,500 | $15,000 |
| Widget D | 2,100 | $21,000 |
| Widget E | 1,375 | $13,750 |
Transactions
After Upserting
| Product | Sales | Revenue |
|---|---|---|
| Widget A | 1,250 | $12,500 |
| Widget B | 900 | $10,000 |
| Widget C | 1,500 | $15,000 |
| Widget D | 2,100 | $21,000 |
| Widget E | 1,375 | $13,750 |
| Widget H | 250 | $2500 |
Implementation
This can be done efficiently in Databricks using the Merge Command
MERGE INTO old_sales
USING new_sales
ON old_sales.Product = new_sales.Product
WHEN MATCHED THEN
UPDATE SET
old_sales.Sales = new_sales.Sales,
old_sales.Revenue = new_sales.Revenue
WHEN NOT MATCHED THEN
INSERT (old_sales.Product,
old_sales.Sales,
old_sales.Revenue)
VALUES (
new_sales.Product,
new_sales.Sales,
new_sales.Revenue
)