ssis-logo.jpg

Introduction

Recently I had to find a problem in one of my constantly running SSIS packages. This package is transforming OLTP order-related data from a near real-time DataMart into an OLTP DataWarehouse ready for querying star schema dimensional data (see image below).

Annotation 2019-06-02 194258

Problem

As you can see, I’m trying to convert OrderCreateDateTime value using LKP CreateDateKey and OrderCompleteDateTime value using LKP CompleteDateKey to 4 values from DimDate and DimTime dimension tables to generate two DateKeys and two TimeKeys respectively (not completed orders are fine and will be autocorrected via a CDC Update). While DateKey signifies the order date created/completed, TimeKey signifies the exact time to a minute level of order time created/completed. The problem is that while full-cache on DimDate that occurring twice for both order creation and order completion was running smoothly, DimTime was erroring out (generating TimeKey-=1) once in a while for the order time completed. Just to clarify, while not completed orders setting both DateKey and TimeKey to an unknown (aka -1) are totally fine, TimeKey can’t be unknown while DateKey has a value.

After a very in-depth analysis of both LKPCreateTimeKey and LKPCompeteTimeKey lookups, I came to the conclusion that I don’t know enough about full-cache (who knew right?). I don’t remember the last time that I did a lookup on the same table twice inside the same DataFlow, but apparently, it might generate a wrong value.

Yep, if you do a full-cache lookup for both order-related CreatedDateTime and CompletedDateTime columns, generation of CompletedTimeKey for some records might contain data that is non-cached yet as it was caching the values for CreateTimeKey!

Solution

Once I replaced full-cache lookup option with a no-cache option, the amount of TimeKey=-1 with DateKey!=-1 dropped to zero. What that means is every incoming order time created/completed data runs with no-cache and evaluated every time.

Disclaimer

It’s important to keep in mind that lookup with no-cache works fairly well with a relatively small set of data, but most likely will fail to perform with larger datasets.

Additional information

Here is a great resource (written by Tim Mitchell) on SSIS Lookup that helped me to slowly but quickly digest the caching mechanism – https://blog.sqlauthority.com/2014/05/15/sql-server-ssis-look-up-component-cache-mode-notes-from-the-field-028/)

This package is based on Reza Rad’s Incremental Load: Change Data Capture in SSIS article found here – https://radacad.com/incremental-load-change-data-capture-in-ssis.