Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

DTS - Automatic Data Transformation Mapping

Status
Not open for further replies.

JinjaNinjaUK

Programmer
Nov 29, 2006
4
GB
Hi All,

I wonder if anyone can assist me. I'm trying to create a DTS that will import bespoke XLS files into SQL2K5. I've used the Dynamic Properties Task to control the source data and to point it at the file in question, and I've added an ActiveX task to ensure that the import uses the IMEX=1 flag to avoid the issues surrounding mixed-format columns being imported into SQL, both of which work fine.

My problem now is that the files that the users will be uploading could contain any number of columns from 2 to 20. Because of this, I've realised that I'm going to struggle to transform the data, as there seems to be no way of "redoing the auto-mapping" programmatically.

I've found the following piece of code that looks like it might begin to solve the problem, but I can't seem to find a way of identifying the number of active columns included in the XLS (prior to transform) to allow me to run through a loop that sets up the required number of transforms. (I'm aware the code below is far from working, but I wanted to include as it does appear to address the problem.)
Code:
=======================
 'Add transformations for DataPump task
  Set tran = cusData.Transformations.New("DTS.DataPumpTransformScript.1")
  tran.Name = "DTSTransformation__1"

    'Add Source columns for transformation
    Set col = tran.SourceColumns.New("au_id", 1)
    col.Name = "au_id"
    col.Ordinal = 1
    tran.SourceColumns.Add col
    Set col = Nothing

    'Add Destination columns for transformation
    Set col = tran.DestinationColumns.New("AuthorID", 1)
    col.Name = "AuthorID"
    col.Ordinal = 1
    tran.DestinationColumns.Add col
    Set col = Nothing
=======================
I'm trying to avoid installing Excel on the webserver, and want to try and make this as painless as possible for the individuals uploading the files.

Anybody got any ideas? They'd be much appreciated! Is there any way I can combine my pre-transformed XLS file into a view so that it always has, say, 50 columns? Something like this may allow me to import into a 50 column table, from which I can trim out excess columns, and move to a table of it's own.

In case it helps, the system is intended to allow external staff to upload an XLS file, in any format, which they then identify as something like "Contacts","Pricelists",etc. Following the import of the file, they then identify the fields contained, and match them against a vocabulary of attributes for the relevant type. (e.g Contacts would have "First Name","Surname","Full Name","Initial","Salutation" etc) and they then match any or all of the uploaded fields to the fields specified in the vocabulary. Obviously, the intention is to then update a table of data elsewhere with this information, based on the translation specified. To deliver this how I would like to, I would need the information within SQL prior to requesting field information, so putting this behind the field selection to ascertain the number of fields is not really an option.

Any and all help would be much appreciated.
 
Hi,

I think it can be done but you say you are importing into a SQL 2005 database? I don't know too much about 2005 except that I thought DTS had been superceded by SSIS?

Unfortunately, I can only give you a few pointers as your project could get pretty big...

In any case if you wanted to do this using DTS you have two choices:

1) ActiveX
2) A mixture of DTS Tasks (including ActiveX)

If you were going to go the all ActiveX route then you might consider looking at VB6 or VBScript as I don't know what benefit using DTS would be.

If you were going to go for the mixture, then what you could do is something along the lines of:

Determine each of the Spreadsheet formats that you are going to work with and set up a data pump task between a sample spreadsheet and the target table (it is possible to set connection properties dynamically)

Assuming you have a separate DTS per data pump task you could then have an overall DTS that checks the Spreadsheet type and then determines which DTS (data pump) should be run.

I did a bit of sample code for that bit as below:

Function Main()
Dim oSheet

Dim oPackage
Dim oConn

Dim appexcel
Dim newbook
Dim newsheet

Set appexcel = CreateObject("Excel.Application")
Set newbook = appexcel.Workbooks.Open("D:\DTSTest.xls")
Set newsheet = newbook.ActiveSheet

'Specify the column name in the Excel worksheet

msgbox newsheet.Cells(1, 1)

Select Case newsheet.Cells(1, 1)
Case "Contacts"
'Set global variable to "Contacts"
Case "PriceLists"
'Set global variable to "Contacts"
Case "etc"
'Set global variable to "Etc"
End Select

Set newsheet = Nothing
Set newbook = Nothing
Set appexcel = Nothing

Main = DTSTaskExecResult_Success
End Function


This code would tell you which sub DTS to run

Best of luck,
Tom

 
Hi Tom,

Thanks for your comments, it's definately an option, but I would be a little concerned by the sheer amount of combinations that would be possible, and thus the amount of DTS' I would need to create and maintain.

When originally looking at the problem, I decided that using the sp_addlinkedserver to create a Jet Engine connection to the Excel file would be the best solution. (This would allow me to query the XLS file using SQL, and thus the number of fields didn't actually matter as I could do a straight SQL INSERT INTO, which would reproduce the original structure)

Unfortunately, the mixed datattype issue became apparent when I tried to set this up, as the query would bring in NULL values where the field contained a mixture of strings and integers. Having begun to try and resolve this, I came up with some ActiveX code that I could use in a DTS to get around this problem, but this then highlighted the "auto-remapping" problem that I advised originally.

I decided to go back and take a look at the sp_addlinkedserver solution, and lo, I managed to utilise the IMEX=1 extended property to get this working!

Code:
EXEC	sp_addlinkedserver	EXCEL,
	'Jet 4.0',
	'Microsoft.Jet.OLEDB.4.0',
	'C:\DataTest.xls',
	NULL,
	'Excel 5.0;HDR=No;IMEX=1'

EXEC	sp_tables_ex	EXCEL

SELECT	*
FROM	Excel..Sheet1$

EXEC	sp_dropserver	EXCEL

From the few intial test I have run, this seems to work fine, (I must remember to test it with 8 blank records at the top of the XLS file though, just to make sure it continues to function)

While I would still be interested to see if there is any way of programmatically handling the mapping of data transforms, this alternative solution looks to be a better fit for what I am trying to do, and I'll see how I get on with it.

Thanks again for your comments!
 
Hi,

Well done. Best of luck with your project.

Regards,
Tom

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top