Recently I had to find a problem in one of my constantly running SSIS packages. This package is transforming OLTP order-related data from a near real-time DataMart into an OLTP DataWarehouse ready for querying star schema dimensional data (see image below).
As you can see, I’m trying to convert OrderCreateDateTime value using LKP CreateDateKey and OrderCompleteDateTime value using LKP CompleteDateKey to 4 values from DimDate and DimTime dimension tables to generate two DateKeys and two TimeKeys respectively (not completed orders are fine and will be autocorrected via a CDC Update). While DateKey signifies the order date created/completed, TimeKey signifies the exact time to a minute level of order time created/completed. The problem is that while full-cache on DimDate that occurring twice for both order creation and order completion was running smoothly, DimTime was erroring out (generating TimeKey-=1) once in a while for the order time completed. Just to clarify, while not completed orders setting both DateKey and TimeKey to an unknown (aka -1) are totally fine, TimeKey can’t be unknown while DateKey has a value.
After a very in-depth analysis of both LKPCreateTimeKey and LKPCompeteTimeKey lookups, I came to the conclusion that I don’t know enough about full-cache (who knew right?). I don’t remember the last time that I did a lookup on the same table twice inside the same DataFlow, but apparently, it might generate a wrong value.
Yep, if you do a full-cache lookup for both order-related CreatedDateTime and CompletedDateTime columns, generation of CompletedTimeKey for some records might contain data that is non-cached yet as it was caching the values for CreateTimeKey!
Once I replaced full-cache lookup option with a no-cache option, the amount of TimeKey=-1 with DateKey!=-1 dropped to zero. What that means is every incoming order time created/completed data runs with no-cache and evaluated every time.
It’s important to keep in mind that lookup with no-cache works fairly well with a relatively small set of data, but most likely will fail to perform with larger datasets.
Here is a great resource (written by Tim Mitchell) on SSIS Lookup that helped me to slowly but quickly digest the caching mechanism – https://blog.sqlauthority.com/2014/05/15/sql-server-ssis-look-up-component-cache-mode-notes-from-the-field-028/)
This package is based on Reza Rad’s Incremental Load: Change Data Capture in SSIS article found here – https://radacad.com/incremental-load-change-data-capture-in-ssis.