Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

HELP with TAB problem!!!

Status
Not open for further replies.

Seelen

Programmer
Dec 7, 2006
1
0
0
ZA
I am writing an app in VB.Net and have encountered a problem. I have a text file with data extracted from a database. I need to remove the TAB chars and replace them with spaces. The problem is that when I open the file in a text editor (such as Programmer's File Editor), I can see the length of a TAB (in terms of characters) - I can see the column number where the TAB begins and and where it ends, allowing me to calculate it's length in number of characters. The problem I am having is determining the actual number of characters that a TAB contains in VB.Net (as the text file contains TABs with variable character lengths - some TABs are 5 characters long, some are 7, others 8, etc.) and replacing it with spaces of the same length. For example,

If I find a TAB that is 7 characters long, I must replace it with an empty string that is 7 characters long.

I know how to do the location, removal, and replacement of the TAB but just to calculate the exact amount of characters that the TAB is, is the problem. I have tried using the Length function in VB.Net but it takes the TAB character as 1 character space.

The TABs are in a standard ASCII text file so there is only one font being used (MS Sans Serif, I think). I need to remove the TABs because each line in the text file represents a record in which specific data starts at certain "columns" in the line (eg. data at "column 1 to column 15" may represent their social security number, then "column 16 to column 25" may represent their first name, etc.)

I know a better way would be to use a file of records to put the data in but the company I am working for gets the data from their clients in this format (their clients extract the data from their database send it to them. This is not going to change anytime soon so I just have to find a solution to this text file problem). The text file must be free of TABs so that the data it contains can be loaded into another database but some lines have 1 or more TABs where as other lines do not have any TABs.

When I remove the TABs using CHR(9) or vbTabs, only one character space is replaced and the alignment in the text file goes completely off.

To repeat, when I open the text file in a text editor such as Programmer's File Editor I can see at which "column" the TAB starts at and which "column" the TAB ends, thus allowing me to calculate it's "size/length". I have found TABs of differing sizes/lengths such as 4 characters, 7 characters, 8 characters, etc.

In an example file that I have, when I open up the file in Programmer's File Editor, I can see that the TAB in the example file starts at "column" 19 and ends at "column" 25 giving it a size/length of 6 "characters".

Now if I use the CHR(9) to replace the TAB, the TAB will be replaced with a single character whereas I need to replace the TAB with the same amount of spaces as it's size/length in order to maintain formatting.

Any help would be greatfully appreciated.
 
First, the type of file you are working with is called "tab-delimited," which means that each field in the file is separated (delimited) by a tab. You are trying to convert it to a "fixed-width" file, which means that each field takes up the same length on each line.

Perhaps a better tactic to approach this would be to extract the data betweent the tabs, pad it to the proper length with spaces and then write it back out to a text file with the spaces instead of the tabs.

You say you can do location, removal and replacement of the tabs, so you already are mostly there. What you need to do is locate the tabs, but instead of removing them use their locations to pull out the text in between the tabs. For example, say the first tab comes at position 10 in VB .NET, and the tab is 4 characters long in your text editor. This means that the first field is 14 characters wide - 10 characters for the data and 4 for the tab (the character position in a string is zero-based). So you need to extract all of the characters from 0 to 9 then pad this text with 4 characters to the right, which is done like so:

Dim sr As New System.IO.StreamReader("Path\to\text\file.txt")

Dim ThisLine As String
Dim Field1 As String

Do
ThisLine = sr.ReadLine()

Field1 = ThisLine.SubString(0, 9)

Field1 = Field1.PadRight(14)

'read other fields and write back out to another text file here

Loop Until ThisField Is Nothing


Unfortunately, this method means you need to go through your text file and determine the exact starting position and field lengths for each field, and then use these starting positions and lengths in the code above to get each field, pad it with the appropriate number of characters and write it out to another file.

The only concern I have with this is that you say this: "...some lines have 1 or more TABs where as other lines do not have any TABs.". This leads me to believe that either this is no a "traditional" tab-delimited text file, of that you have word-wrap or saomething similar in your text editor that is making one line look like more than one line.

Anyway, I hope my ideas help. Post back with any more questions if you need to.

I used to rock and roll every night and party every day. Then it was every other day. Now I'm lucky if I can find 30 minutes a week in which to get funky. - Homer Simpson

Arrrr, mateys! Ye needs ta be preparin' yerselves fer Talk Like a Pirate Day! Ye has a choice: talk like a pira
 
The split command usually works for me with delimited customer data.

Dim read As New IO.StreamReader("C:\test")
Dim strLine As String
Dim strData() As String

'loop to the end of the data file
Do While Not read.EndOfStream
'read a line of the data file
strLine = read.ReadLine

strData = strLine.Split(vbTab)
'you could either put this in a loop or hard code everything. sometimes for the amount of if statements required in a loop it's easier to hardcode each data assignment....it all depends on the situation.
yourColumn1 = strData(0) 'pad it if needed
yourColumn2 = strData(1) 'pad it if needed
yourColumn3 = strData(2) 'pad it if needed
yourColumn4 = strData(3) 'pad it if needed
yourColumn5 = strData(4) 'pad it if needed
. .
. .
. .

Loop
read.close
 
I think you might be combining/confusing two data file formats. It's either tab delimited or it's fixed width, it can't be both (not that I've seen anyway) so you don't have to worry about how many spaces a tab is - just locate the tabs. Sounds like you already know how to do that or look at my code above.

For an example, if you have Access, try to export a table, it asks if you want it to be delimited (comma or tab) or Fixed Width...not both.
 
I think it's wrong to assume that this data is tab delimited. From the description provided it sounds like it could be simple text with embedded tabs with an assumed tab width of 8 characters.

These are simple to expand as follows:

Find the first or next tab.

Using its position in the line add enough spaces to the line to bring the next character to a position that is the next multiple of 8 plus 1

Using the newly extended line repeat until no more tabs are found

The code would look something like this:
Code:
Dim lLine as string ' contains the current line from the file
.
.
.
Dim n, p as integer

p = lLine.IndexOf(vbTab)
Do while p <> -1
    ' remove the tab character
    lLine = lLine.Remove(p,1)
    n = 8 - (p mod 8) ' the number of spaces required
    ' insert the required number of spaces
    lLine = lLine.Insert(p, New String(" ", n))
    ' rescan the line
    p = lLine.IndexOf(vbTab)
Loop
' process the expanded line as required
.
.

Bob Boffin
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top