Dealing with Bad Data
By: Zack Turnbow
Introduction
If you haven’t been following along this introductory series on SQL Server Integration Services (SSIS), I strongly encourage you to go back and take a look at what has been covered so far. Most of the basics have been covered so far. One technique that comes into play most often is checking the data being imported for incorrect format and data types. We partly covered correcting incorrectly formatted data in the last article so now let us take a look at checking for bad data.
Server Intellect assists companies of all sizes with their hosting needs by offering fully configured server solutions coupled with proactive server management services. Server Intellect specializes in providing complete internet-ready server solutions backed by their expert 24/365 proactive support team.
Scenario
In most cases in database design, there are columns that are of a certain type and character length. Continuing with the previous examples, credit card numbers are always having a fixed length of 16 characters. SSIS can check to see if the incoming data meets the requirement of having the fixed length of 16 characters. If the data does not meet the requirement, then the “bad” data can be sent to an alternate location to be fixed later. For our purposes, another report file will be created to handle the “bad” data so it can be reconciled with the vendor.
Implementation
First things first, go ahead and create a new package in the existing SSIS project. Give it the name BadData. As before, copy the Data Flow component from the Control Flow work area in the Script package and paste in the Control Flow work area in the new package. Re-create the connections for the Vendor A, Vendor B, and Daily Report Flat File Connections. Re-configure the corresponding components to use the connections that were just created. Next, create the database connection where the good data will go and re-configure the OLE Database Destination Component to use that database connection.
[Click to enlarge]
The component that is going to be used to decide which data is “bad” data and which is going to be imported is the Conditional Split component. Find it in the toolbox and drag into the work area and place it under the Script component. Connect the output of the Script component to the Condition Split component.
If you're looking for a really good web host, try Server Intellect - we found the setup procedure and control panel, very easy to adapt to and their IT team is awesome!
[Click to enlarge]
Open up the Conditional Split component to configure it. A dialog box will open up and look similar to the Derived Column component. Click down into the table area. You will see the order column. Just like the Multicast component, the Conditional Split can have many outputs so the order column indicates in what order the conditions will be applied to the data. Tab over to the Output Name Column and give it the name Good Credit Card Number. This will help identify which output is being connected to downstream components later. Tab over to the Condition column. This is where the conditional logic is applied. Expand the Columns tree in the upper left corner and click and drag the CreditCardNumber into the Condition field in the table below. The character length of this column needs to be verified so expand the String Functions tree in the upper right box and find the LEN function. You can either type the function around the CreditCardNumber column or you can drag the function into the field then move the column inside the parenthesis. This will give the length of the field. Next it needs to be compared to the length requirement, which is 16. So use the double equal (==) operator and type the number 16. For the default output name, type in Bad Credit Card Number so that for any data that doesn’t meet the length requirement can be flagged and save to the bad data file.
[Click to enlarge]
We used over 10 web hosting companies before we found Server Intellect. Their dedicated servers and add-ons were setup swiftly, in less than 24 hours. We were able to confirm our order over the phone. They respond to our inquiries within an hour. Server Intellect's customer support and assistance are the best we've ever experienced.
Click OK. Connect the output of the Conditional Split component to the Multicast component. A dialog box will open us to ask which output needs to be used. Select the Good Credit Card Number output.
[Click to enlarge]
Since there is a need to capture the “bad” data before importing, a new file will be created called BadCreditCardNumbers in the same folder as the Daily Transactions report file. Create a new Flat File connection in the Connection Managers area to access this file. Drag a Flat File Destination component into the work area and configure it to use the new flat file connection. Connect the remaining output of the Conditional Split to the newly created Flat File Destination component.
[Click to enlarge]
Testing the Package
If you're ever in the market for some great Windows web hosting, try Server Intellect. We have been very pleased with their services and most importantly, technical support.
Now that both the good and bad data has a place to be captured, the package is now ready to be tested. Run the package using the new data files found in the attached source code. In the file called 07072008Transactions, the first row has an extra number added to the CreditCardNumber column.
[Click to enlarge]
As the package runs it can be seen that the one row with the bad credit card number is filter into the new file to be reconciled later. Also note that the there is a warning on the OLE DB Destination component. The reason is that the database is expecting 16 characters for the credit card number and inside the package; the character limit is not set on the OLE DB Destination component. Since the Conditional Split is verifying the credit card number, this warning can be ignored.
What have we learned?
How to configure a Conditional Split component.
How to use the LEN function to test the length of a field.
How to use the Conditional Split component to verify the data being processed.
Attachments
|