Merging 2 lines in one line, TEXT FILE 1

Status
Not open for further replies.

Huslayer

IS-IT--Management
Jan 4, 2010
15
US
Hello VBS Gurus,

I've an easy challenge for you :)

I'm working on a text file where the data for each record comes in 2 lines, but some lines contain a form feed character. The code below is meant to replace the form feed with a new line
and then split the contents so I can merge each pair of lines.

The result:
It removes the form feed and adds a new line, but the output is not merged correctly; it is still in 2 lines.

Any idea what I'm doing wrong?

Sample data in .txt is here


and here's my mighty nighty script..

--------------------------------------------
Const ForReading = 1
Const ForWriting = 2

strSourceTxtFile = "PXDRAD January 2010 thru May 2010.txt"
strTargetTxtFile = "PXDRAD January 2010 thru May 2010 - One Line - " & ShortDateTime(Now) & ".txt"



'Reads the TXT file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strSourceTxtFile, ForReading)

'Skip header data
For x = 0 to 19
objFile.ReadLine
Next

strAllLines = objFile.ReadAll
objFile.Close



arrAllLines = Split(strAllLines, vbFormFeed)

For i = 0 TO UBound(arrAllLines) - 1 Step 2
strNewText = strNewText & arrAllLines(i) & arrAllLines(i + 1) & VbCrLf
Next

Set objFile = objFSO.OpenTextFile(strTargetTxtFile, ForWriting, True)
objFile.WriteLine(strNewText)
objFile.Close


Function ShortDateTime(dtmTime)
strYear = Year(dtmTime)
strMonth = Right("0" & Month(dtmTime), 2)
strDay = Right("0" & Day(dtmTime), 2)
strHour = Right("0" & Hour(dtmTime), 2)
strMinute = Right("0" & Minute(dtmTime), 2)
strSecond = Right("0" & Second(dtmTime), 2)
ShortDateTime = strYear & strMonth & strDay & "-" & strHour & strMinute & strSecond
End Function
-------------------------------------------------

Thanks
 
A starting point:
Code:
...
strAllLines = Replace(objFile.ReadAll, vbFormFeed, vbCrLf)  ' changed: form feeds become CRLF
objFile.Close
arrAllLines = Split(strAllLines, vbCrLf)  ' changed: split on CRLF instead of vbFormFeed
For i = 0 TO UBound(arrAllLines) - 1 Step 2
  strNewText  = strNewText & arrAllLines(i) & arrAllLines(i + 1) & vbCrLf
Next
...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hey PHV,
merci beaucoup for your help. I've tested it on the original file, but WOW, it has been running for the last 30 minutes and still hasn't finished!

So I gave it a try on the sample data file, and it didn't work: there is only one line in the output.

Here's the script that I used:

------------------------------------
Const ForReading = 1
Const ForWriting = 2

strSourceTxtFile = "PXDRAD January 2010 thru May 2010.txt"
strTargetTxtFile = "PXDRAD January 2010 thru May 2010 - One Line - " & ShortDateTime(Now) & ".txt"



'Reads the TXT file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strSourceTxtFile, ForReading)

'Skip header data
For x = 0 to 19
objFile.ReadLine
Next



strAllLines = Replace(objFile.ReadAll, vbFormFeed, vbCrLf)
objFile.Close

arrAllLines = Split(strAllLines, vbCrLf)

For i = 0 TO UBound(arrAllLines) - 1 Step 2
strNewText = strNewText & arrAllLines(i) & arrAllLines(i + 1) & VbCrLf
Next

Set objFile = objFSO.OpenTextFile(strTargetTxtFile, ForWriting, True)
objFile.WriteLine(strNewText)
objFile.Close


Function ShortDateTime(dtmTime)
strYear = Year(dtmTime)
strMonth = Right("0" & Month(dtmTime), 2)
strDay = Right("0" & Day(dtmTime), 2)
strHour = Right("0" & Hour(dtmTime), 2)
strMinute = Right("0" & Minute(dtmTime), 2)
strSecond = Right("0" & Second(dtmTime), 2)
ShortDateTime = strYear & strMonth & strDay & "-" & strHour & strMinute & strSecond
End Function
--------------------------------------

Thank you so much again, please advise :(
 
[0] The only form feed (FF, vbFormFeed, 0x0c) in the sample occurs at "line 10", and it is a double FF (0x0c 0x0c).

[1] There are some 0xb3 characters, which you may or may not want to replace in order to get back to 7-bit characters.

[2] The tool for debugging this is a hex editor: inspect the file with one.
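
For readers without a hex editor at hand, a rough script-only check is also possible. The sketch below is not from the thread: ReportFormFeeds is a hypothetical helper name, and it assumes the file content has already been read into a string with CRLF line endings. It only shows which lines contain a form feed, which is the information point [0] relies on.

```vbscript
' Returns one report line per input line (1-based) that contains a form feed,
' Chr(12), together with the column of the first form feed on that line.
' strText is assumed to hold the whole file content with CRLF line endings.
Function ReportFormFeeds(strText)
    Dim arrLines, i, intPos, strReport
    arrLines = Split(strText, vbCrLf)
    strReport = ""
    For i = 0 To UBound(arrLines)
        intPos = InStr(arrLines(i), vbFormFeed)
        If intPos > 0 Then
            strReport = strReport & "line " & (i + 1) & ", column " & intPos & vbCrLf
        End If
    Next
    ReportFormFeeds = strReport
End Function

' Demo on an in-memory string; in real use the string would come from ReadAll.
' Echoes: line 2, column 2
WScript.Echo ReportFormFeeds("a" & vbCrLf & "b" & vbFormFeed & vbFormFeed & "c")
```

Running it against the full data file before skipping any header lines shows whether the form feeds sit where the rest of the script expects them.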
 
tsuji,
good morning...
Thanks for joining, and God bless you for sharing the knowledge :)
But I really don't understand what you're talking about. I'm a DBA and just need to clean the data files before processing.
Can you clarify, please? Code? Example?

Thanks
 
[0.1] What I tried to convey is that in sample.txt there is only one place where a FormFeed character (vbFormFeed) appears, and it is a double FormFeed. It appears in line #10 (base one) and reads like this:
[tt]
225454.37[highlight][0x0c 0x0c][/highlight]1247307005/14/201005/24/2010DEtestlAN, DAISY DEtestlAN, DAISY P01 MEDICAID 012P001T1 GMD118831 8[highlight][0xb3][/highlight]8831 8TE
[/tt]
The highlighted [0x0c 0x0c] (two in succession) is where the FormFeed characters occur. The other highlighted spot, [0xb3], is another 8-bit character, not relevant to this discussion.

[0.2] Since it appears in line #10 and you read past 20 lines, ignoring them, there is no vbFormFeed character left to split on from line #21 to line #23.

[3] Hence, this part:
[tt]
arrAllLines = Split(strAllLines, [red]vbFormFeed[/red])

For i = 0 TO UBound(arrAllLines) - 1 Step 2
strNewText = strNewText & arrAllLines(i) & arrAllLines(i + 1) & VbCrLf
Next
[/tt]

would produce an empty result, as UBound(arrAllLines) is zero (0), so the loop never gets a chance to execute.

[3.1] Instead, there are three lines in strAllLines. Hence, if you split on vbCrLf, you would have UBound()=2; the loop would execute once, joining the first two lines, and the third line would be ignored.
[tt]
arrAllLines = Split(strAllLines, [red]vbCrLf[/red])
[blue]strNewText=""[/blue]
For i = 0 TO UBound(arrAllLines) - 1 Step 2
strNewText = strNewText & arrAllLines(i) & arrAllLines(i + 1) & VbCrLf
Next
[/tt]
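
The off-by-one behaviour described in [3.1] (an odd number of lines silently drops the last one) can be made visible with a small variation of the loop. This is only a sketch: MergePairs is a hypothetical name, and keeping the unpaired last line is an assumption about what is wanted, not something the thread settles.

```vbscript
' Joins lines (i, i + 1) into one output line.
' An unpaired last line is kept on its own instead of being dropped.
Function MergePairs(arrLines)
    Dim i, strOut
    strOut = ""
    For i = 0 To UBound(arrLines) Step 2
        If i + 1 <= UBound(arrLines) Then
            strOut = strOut & arrLines(i) & arrLines(i + 1) & vbCrLf
        Else
            ' Odd line count: no partner line exists for the last element.
            strOut = strOut & arrLines(i) & vbCrLf
        End If
    Next
    MergePairs = strOut
End Function

' Five elements: two pairs plus one leftover.
' Echoes three lines: A1A2, B1B2, C1
WScript.Echo MergePairs(Split("A1,A2,B1,B2,C1", ","))
```

With the original loop bound of UBound(arrAllLines) - 1, the same input would yield only A1A2 and B1B2.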
 
Oh no: "[0.2] Since it appears in line #10 and you read past 20 lines, ignoring them, there is no vbFormFeed character left to split on from line #21 to line #23."

Sorry, my code is written for the whole data file, not the sample; I test against the whole data file.

Ignore that skip on the sample file when you test it.

But thank you so much, I understand your point.
 
[4]
>Ignore that skip on the sample file when you test it
Sure, I can. But you also said
>So I gave it a try on the sample data file, and it didn't work: there is only one line in the output.
That is exactly the anticipated output when you skip 20 lines of sample.txt and then split on vbCrLf from line 21 onward.

[5] I processed the whole sample file with either
[tt] strAllLines=replace(strAllLines,[red]vbFormFeed[/red],vbcrlf)[/tt]
or, replacing only the double form feed (if that is how it generally appears),
[tt] strAllLines=replace(strAllLines,[red]vbFormFeed & vbFormFeed[/red],vbcrlf)[/tt]
and the result is what I anticipated. There is nothing abnormal there. But...

[5.1] The constant is vbFormFeed not vbFeedForm!
 
[5.1-ignored] Ignore this. I thought I had seen vbFeedForm somewhere in your script. It isn't there. So that note is a double mistake.
 
Tsuji,
Thank you so much for your help, you're the best. I ended up doing this:

----------------------------------
Const ForReading = 1
Const ForWriting = 2
Const UniCode = -1

strSourceTxtFile = "PXDRAD January 2010 thru May 2010.txt"
strTargetTxtFile = "PXDRAD January 2010 thru May 2010 - One Line - " & ShortDateTime(Now) & ".txt"

'Reads the TXT file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInputFile = objFSO.OpenTextFile(strSourceTxtFile, ForReading, UniCode)


Set objOutputFile = objFSO.CreateTextFile(strTargetTxtFile, ForWriting, UniCode)
intLineIndex = 0
While Not objInputFile.AtEndOfStream
strLine = objInputFile.ReadLine
If InStr(strLine, Chr(12) & Chr(12)) > 0 Then intLineIndex = 0
If intLineIndex = 0 Then
objOutputFile.Write Replace(strLine, Chr(12) & Chr(12), VbCrLf)
intLineIndex = 1
Else
objOutputFile.WriteLine Replace(strLine, Chr(12) & Chr(12), VbCrLf)
intLineIndex = 0
End If
Wend

objOutputFile.Close
objInputFile.Close


Function ShortDateTime(dtmTime)
strYear = Year(dtmTime)
strMonth = Right("0" & Month(dtmTime), 2)
strDay = Right("0" & Day(dtmTime), 2)
strHour = Right("0" & Hour(dtmTime), 2)
strMinute = Right("0" & Minute(dtmTime), 2)
strSecond = Right("0" & Second(dtmTime), 2)
ShortDateTime = strYear & strMonth & strDay & "-" & strHour & strMinute & strSecond
End Function
----------------------------------

Everything works fine until it hits lines where the form feed is at the beginning of the line, and then everything gets messed up again!

----------------
Attached is an example of the stupid form feed!

Oh, by the way, which hex editor tool are you using?
 
 http://www.box.net/shared/gltpmhelvk
[6] >which hex editor tool are you using?
Nothing grandiose or special, just plain and basic. One I use with pleasure is XVI32 (freeware) by Christian Maas.

[7] >Everything works fine until it hits lines where the form feed is at the beginning of the line, and then everything gets messed up again!
I have not scrutinized the intention of your new approach. But my guess is that you do not like the empty lines that result from replacing and splitting.

[7.1] If that is the case, I think a regexp would be the proper solution. For illustration I use the names from your previous script, namely strAllLines and arrAllLines.
[tt]
'... etc etc

strAllLines = objFile.ReadAll
objFile.Close

dim rx
set rx=new regexp
with rx
.pattern="(\x0c|\x0d\x0a)+"
.global=true
end with

strAllLines=rx.replace(strAllLines,vbcrlf)
arrAllLines=split(strAllLines,vbcrlf)

strNewText=""
For i = 0 TO UBound(arrAllLines) - 1 Step 2
strNewText = strNewText & arrAllLines(i) & arrAllLines(i + 1) & VbCrLf
Next

'etc etc...
[/tt]
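
One plausible reason for the 30-minute run reported earlier in the thread is that strNewText grows by concatenation on every iteration, which tends to get slow on large files. The sketch below keeps the regexp idea from [7.1] but writes each merged pair straight to the output file. NormaliseBreaks, WritePairs and the file name merged-demo.txt are hypothetical names introduced for illustration; the demo works on an in-memory string, whereas the real script would pass in the result of objFile.ReadAll.

```vbscript
' Collapses every run of form feeds and CRLFs into a single CRLF, so a form
' feed at the start of a line does not leave an empty line behind.
Function NormaliseBreaks(strText)
    Dim rx, strOut
    Set rx = New RegExp
    rx.Pattern = "(\x0c|\x0d\x0a)+"
    rx.Global = True
    strOut = rx.Replace(strText, vbCrLf)
    ' A break at the very start would yield an empty first element and
    ' shift every pair by one, so it is stripped here.
    If Left(strOut, 2) = vbCrLf Then strOut = Mid(strOut, 3)
    NormaliseBreaks = strOut
End Function

' Writes lines (i, i + 1) as one line each, directly to an open TextStream,
' instead of accumulating one large string first.
Sub WritePairs(arrLines, objOut)
    Dim i
    For i = 0 To UBound(arrLines) - 1 Step 2
        objOut.WriteLine arrLines(i) & arrLines(i + 1)
    Next
End Sub

Dim objFSO, objOut, strAll
' Stand-in for objFile.ReadAll: two records, the first split by a double form feed.
strAll = "A1" & vbCrLf & vbFormFeed & vbFormFeed & "A2" & vbCrLf & "B1" & vbCrLf & "B2" & vbCrLf
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOut = objFSO.CreateTextFile("merged-demo.txt", True)
WritePairs Split(NormaliseBreaks(strAll), vbCrLf), objOut
objOut.Close
' merged-demo.txt now holds two lines: A1A2 and B1B2
```

The trailing CRLF in the input produces an empty last array element, which the loop bound of UBound(arrLines) - 1 skips when the line count is even.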
[7.2] Your new sample is not in Unicode. This is just a side note, since your new script uses the OpenTextFile method with a Unicode text file. I leave it to you to take care of this.
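
On the Unicode point in [7.2], the argument order of the two FileSystemObject methods is worth a look. OpenTextFile takes (filename, iomode, create, format), so a -1 passed as the third argument lands in the create slot rather than the format slot, while CreateTextFile takes (filename, overwrite, unicode). The sketch below spells the arguments out; format-demo.txt is a hypothetical file name, and the choice of ASCII is an assumption based on the note that the sample is not Unicode.

```vbscript
Const ForReading = 1
Const TristateTrue = -1    ' format value: open as Unicode
Const TristateFalse = 0    ' format value: open as ASCII

Dim objFSO, objOut, objIn
Set objFSO = CreateObject("Scripting.FileSystemObject")

' CreateTextFile(filename, overwrite, unicode): False writes an ASCII file.
Set objOut = objFSO.CreateTextFile("format-demo.txt", True, False)
objOut.WriteLine "hello"
objOut.Close

' OpenTextFile(filename, iomode, create, format): format is the FOURTH argument.
' Use TristateTrue here instead if the source file really is Unicode.
Set objIn = objFSO.OpenTextFile("format-demo.txt", ForReading, False, TristateFalse)
WScript.Echo objIn.ReadLine    ' echoes: hello
objIn.Close
```

Reading and writing with the same format avoids ending up with an ASCII input and a Unicode output, which is easy to do by accident when the arguments shift by one position.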
 