12 April 2012

SSIS Derived Column - US to SQL date

CSV is great but painful sometimes. In one of my SSIS work-flows I had to import data from a CSV file that used US date format without leading zeros:
It is not the most effective and best method but I used Derived Column component to replace the column. The [date filed] is imported as string [DT_STR]. Then I used the long formula to do some string operations.
First is to add leading zeros to month:
FINDSTRING([date field],“/”,1) < 3 ? “0”:“”
If the first slash is on third position then the month is greater than 9 - no need for zero. Otherwise the zero will be added to the string representing the month.
For adding the zero to day:
FINDSTRING([date field] ,“/”,2) - FINDSTRING([date field] ,“/”,1) < 3 ? “0” : “”
It’s just counting the characters between both slashes.
The rest is just repositioning the date parts. Since the date string is not of fixed length I had to use the FINDSTRING to determine where the date parts start and their length.
SUBSTRING function does the job of extracting the parts:
SUBSTRING([date field], where to start, how many chars)  
Year: SUBSTRING([date field], 5, 4)
But the year may start at 5, 6 or 7th position. That’s why it has to be replaced by the function:
FINDSTRING([date field] ,“/”,2)+1which is just finding the position of the second slash and adding 1. So now it becomes:
SUBSTRING([date field],  FINDSTRING([date field] ,“/”,2)+1 , 4)
Next is the month: SUBSTRING([date field], 1, 1)
But we don’t know if it’s just one or two characters. So we replace the second 1 with the formula:
FINDSTRING([date field],“/”,1) - 1which gives us the position prior to the first slash. We end up with the formula for month:
SUBSTRING([date field], 1,  FINDSTRING([date field],“/”,1) - 1 )
The day is most complex as it lies in the middle and has flexible length. So we need to find both, the start and the length:
SUBSTRING([date field], FINDSTRING([date field],“/”,1) + 1FINDSTRING([date field ],“/”,2) - (FINDSTRING([date field],“/”,1) + 1)
Finally it’s just adding the strings, slashes, leading zeros and casting the result as date (and adding check in case the value is null):
TRIM([date field]) == “” ? NULL(DT_DBDATE) : (DT_DBDATE)(SUBSTRING([date field],FINDSTRING([date field],“/”,2) + 1,4) + “/” + (FINDSTRING([date field],“/”,1) < 3 ? “0” : “”) + SUBSTRING([date field],1,FINDSTRING([date field],“/”,1) - 1) + “/” + (FINDSTRING([date field],“/”,2) - FINDSTRING([date field],“/”,1) < 3 ? “0” : “”) + SUBSTRING([date field],FINDSTRING([date field],“/”,1) + 1,FINDSTRING([date field],“/”,2) - (FINDSTRING([date field],“/”,1) + 1)))
Complex? Sure but it works!

Note from MSDN:
FINDSTRING works only with the DT_WSTR data type. character_expression and searchstring arguments that are string literals or data columns with the DT_STR data type are implicitly cast to the DT_WSTR data type before FINDSTRING performs its operation. Other data types must be explicitly cast to the DT_WSTR data type