parse textfiles


Machiaveli (Programmer) - Dec 16, 2003
Hello,

I've posted this question before, but haven't really found a solution yet.

I have some text files which I would like to import. One of them contains over 1,024 columns when separated, which exceeds SQL Server's maximum number of columns per table. Let's call this the result file.
The other text file is a description file for the result file. It contains the column names of the result file (see the example below). The problem with the result file is that it doesn't contain any column names itself, but its columns appear in the order given by the description file.

The tricky part is that there are groups of columns which really represent one column (they have the same name, but with an index number). Those columns need to be parsed into one, but some of them don't need to be parsed.

For example (never mind the values):
field1_1 | field1_2 | field2 | field3
(empty)  | yes      | 1      | 2003

Above should become:
field1 | field2 | field3
yes    | 1      | 2003

Description file example:

formid  fieldnr  fieldname  idx  length
1       1        field0     0    2
1       2        field1     1    1
1       3        field1     2    1
1       4        field2     0    2
1       5        field3     0    4

Now there should be a way to transform the data so that the indexed columns (including the ones where no value is specified) are parsed into a single column.

Can someone help me with this? It has been busting my brain for weeks.
 
I believe that you can do this through an ActiveX script in a DTS package. Don't ask me how; I've never used an ActiveX script in a package. But I understand that it can handle this kind of stuff easily.

Denny

--Anything is possible. All it takes is a little research. (Me)

 
Machiaveli,

As far as importing the text file goes, create an intermediate import table which has one (or more, if necessary) long varchar columns in it, long enough to hold one of the text file records. In the Transform Data Task, define the text file as having fixed-length columns matching the length of the varchar columns in the import table (don't worry about the actual text file columns at this time).

Once you have the text file imported into SQL Server, you can manipulate and parse the raw data into columned SQL tables any way your heart desires by using a stored procedure called from an Execute SQL Task.

As far as having more than 1,024 columns goes, I don't see any way around that limitation except to group the data and parse it (after you have imported it) into two or more tables which each have fewer than 1,024 columns.
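For example, here is a minimal sketch of what that staging table could look like (the table and column names are just my own, nothing official):

-- Staging table: one long varchar column holds one raw text file record.
-- varchar in SQL Server 2000 tops out at 8000 characters, so add more
-- columns (RawLine2, RawLine3, ...) if a record can be longer than that.
CREATE TABLE dbo.ImportRaw
(
    RawLineID int IDENTITY(1,1) NOT NULL,
    RawLine   varchar(8000)     NOT NULL
)

-- After parsing, spread the result data over two or more destination tables,
-- each with fewer than 1,024 columns.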
 
Machiaveli,

Remember, you can't do anything with the data if it's sitting in a text file. Just getting the data into SQL Server should be the most important and first step, even if it's still in raw, uncolumned format.

Once the data is in SQL Server it should then be easy to manipulate it any way you need to.
 
Hi EdwinGene,

I'll try to do what you suggest. Can I come back to you if I run into problems?
 
Hi Edwin,

I tried to do what you suggested, but I'm having some trouble with it.

I normalized some columns into another table; the rest of the columns need to be normalized as well (by grouping them, like you said). In the description file, each column name also has an index. I was thinking about using that index to identify the columns that need to be grouped.

The text file with the records is still not in SQL Server, but it could be transformed using a data transformation with VBScript. Can you help me with this?
 
First things first.

Keep things simple to start with. If you're still having trouble importing the text file, then don't do any complicated transformations with a VBScript while importing. Just do a straightforward import into an intermediate import table. The import table should be structured with only one or more varchar columns, with enough total length to accommodate the longest text file record. Use only one varchar column if possible, more if necessary. The simplest thing would be for the end of each varchar column to correspond to the end of one of the virtual text file columns; it will make processing after the import a little simpler.

Now, just plop the text file straight into this import table. You can worry about the more detailed transformations after the text file has been imported.

In the Properties window of the "Text File (Source)" DTS Connection, click the "Properties" button for the text file.

Click "Fixed Field" and then "Next >". Now define the column positions in your text file format to correspond to the length of the varchar columns in the intermediate import file.

I think that's about as complicated as it should get. The SQL Server connection and the Data Pump task should be pretty straightforward from this point.
 
Hi Edwin,

I did what you suggested. I have two tables into which I import the raw data.

The first table is structured and contains the column names. The second table is the raw data table with only two columns: one with the form_id (see my first message) and one with the raw data.

What should I do next? In the raw data there is a separator, but like I said before, some columns should be parsed together. In the first table there is also a column named index. If there is an index for a column name (e.g. 1 to 4), then the raw data for those columns should be parsed into one column.

Is this easy to transform?
 
Now that you have the data in SQL Server you can manipulate it using a stored procedure or procedures, and Transact SQL commands.

Stored procedures can be called from the DTS package using the Execute SQL Task. If your stored procedure has no parameters, the syntax for calling the stored procedure is exactly the same as calling a stored procedure from Query Analyzer or from another stored procedure.

To find the column delimiter use the CharIndex or PatIndex function. To parse, use the SubString function.
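For example, here is a rough sketch of cutting the first value off one raw line (the table and column names are the ones I guessed at earlier, and I'm assuming a pipe separator; adjust to whatever your real delimiter is):

DECLARE @line  varchar(8000)
DECLARE @sep   char(1)
DECLARE @pos   int
DECLARE @value varchar(255)

SET @sep = '|'
SELECT @line = RawLine FROM dbo.ImportRaw WHERE RawLineID = 1

-- Position of the first separator (0 means no separator was found,
-- so a real procedure would check that before using @pos).
SET @pos = CHARINDEX(@sep, @line)

-- Everything before the first separator is the first value.
SET @value = LTRIM(RTRIM(SUBSTRING(@line, 1, @pos - 1)))

-- Chop off the first value plus its separator, then repeat for the next value.
SET @line = SUBSTRING(@line, @pos + 1, LEN(@line))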

If you feel you have to process the data one record at a time you can use a cursor. Although cursors are very slow, and with a lot of records can cause a HUUUUUUUGE time increase, the programming can be easier to visualize and implement in case you are not yet fluent in Transact SQL.

If you are fluent in Transact SQL, you can do the transformations with an Insert and Select. This is more complicated than using a Cursor, but much faster.
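Just as an illustration of the set-based idea (the destination table and the start positions here are invented; in practice you would derive them from your description table, or combine SUBSTRING with CHARINDEX as above):

-- One set-based statement parses every staged row at once.
INSERT INTO dbo.ResultPart1 (field2, field3)
SELECT
    LTRIM(RTRIM(SUBSTRING(RawLine, 21, 2))),  -- field2: start position/length are example values
    LTRIM(RTRIM(SUBSTRING(RawLine, 24, 4)))   -- field3: example values only
FROM dbo.ImportRaw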

I can't help you with much detail on how you are going to manipulate your data - that's the fun part for you to work out. But at least now you have the data inside SQL Server where you can get to it, and that's half the battle, if not more.
 
Hi Edwin,

I tried to do what you said, but I'm having a little bit of trouble.

I need to use the columns with the index number from the first table to identify the columns in the second, and then parse them.

Can you help me?
 
I have some questions for you.

1. Is this import going to happen on a recurring basis, or is it a one-time thing?

2. If it is going to be a recurring event and you have a description file, does that mean the record structure of the text file can and will be different every time the import occurs?

3. If the record structure of the file is going to be the same for every import, or if this is a one-time-only event, then why do you need to import the Description File at all? By examining the Description File in its text format you can set up your final destination table and will know all of the field lengths in advance.

4. If the record structure of the source data is going to be different for every import, how can you have anything in place to actually use the data after it is imported? The only practical way to process the data after it is imported is if you have something set up to process it BEFORE it is imported, which means you have to know what the data is and where it will be located.
 
Hi Edwin,

Thanks for replying.

To answer your questions in order:

1. Yes, it's going to occur on a daily basis.
2. No, because the records will be normalised into the two tables which I've created, e.g.:

first table:
reference | scandate | scantime | resultnr

second table:
resultnr | columnname (spec. in description file) | result

The first table I could manage to fill; it's the second table I'm having trouble with. That's why I need to parse the indexed columns into one, so each row ends up as one column name and one result. If a group of indexed columns has more than one result, the values should be combined and separated with spaces (e.g. values '1 2 3') in that case.

So if the data can be stored that way, I don't have to worry about record structure changes (see the rough table sketch further down this post).

3. True, but the description file will be made only once, and I only need it to set the column names in the second table; otherwise I have to rename them manually. If there are changes, a new description file will be made (version management).

4. By doing points 2 and 3, that is solved.
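
Here is a rough sketch of how I picture those two tables (the data types and names are just my working guesses):

-- Header table: one row per scanned form
CREATE TABLE dbo.ScanResult
(
    resultnr  int         NOT NULL PRIMARY KEY,
    reference varchar(50) NOT NULL,
    scandate  datetime    NOT NULL,
    scantime  varchar(8)  NOT NULL
)

-- Detail table: one row per column name from the description file;
-- indexed columns collapse into one row, values space-separated (e.g. '1 2 3')
CREATE TABLE dbo.ScanResultValue
(
    resultnr   int          NOT NULL,
    columnname varchar(128) NOT NULL,
    result     varchar(255) NULL
)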

I really hope you can help me with this.
 