
text replacement for CR/LF characters

Status
Not open for further replies.

dataslinger

Programmer
Nov 13, 2013
12
US
Hello,
I need to programmatically read in text files that are occasionally sprinkled with misplaced single carriage returns (hex 0D/dec 13) or line feeds (hex 0A/dec 10), and then eventually import them into a table. When the CR/LF characters appear together as a proper pair, I would like to keep them; otherwise, replace them with a space. I have tried various combinations of CHRTRAN, CHRTRANC, STRTRAN, and STRCONV with hex and decimal notation, but I cannot get VFP to replace the carriage returns no matter what I do.

There are a few different replacement sequences that would work, but something like this should be straightforward:

Code:
cFile = "mytestfile.txt"
cString = FILETOSTR(cFile)
cString = CHRTRANC(cString,CHR(13),"|")  && CR with a pipe, this replacement fails for some reason
cString = CHRTRANC(cString,CHR(10),"|")  && LF with a pipe, this replacement works fine
cString = STRTRAN(cString,”||”,CHR(13)+CHR(10))  && double pipes to CR/LF pair
cString = CHRTRANC(cString,"|",CHR(32))  && remaining pipes to spaces
STRTOFILE(cString,"myoutput.txt")

Any ideas?

I would like to work natively with text in stock VFP 9, no third-party tools please. The files are potentially large, so if there's something better than FILETOSTR, advice on that is also appreciated.

Thanks in advance
 
This is a place I might be useful for a change...


You're using CHRTRANC() instead of CHRTRAN(). The "C" version is for double-byte characters (like Chinese Kanji, Arabic, etc.).
Try changing to CHRTRAN() and see if that works.


Best Regards,
Scott
ATS, CDCE, CTIA, CTDC

"Everything should be made as simple as possible, and no simpler."[hammer]
 
Thanks, but no dice. I mentioned that I had tried that before, but I tried again here and it made no difference. My thought was that handling CR as double-byte might help. Regardless, per the VFP documentation,
if the expressions contain only single-byte characters, CHRTRANC( ) is equivalent to CHRTRAN( ).
 
That's very odd.

I did this at command line:

lcText1 = CHR(10)
MESSAGEBOX(lcText1) -> shows empty message box.
lcText2 = CHRTRAN(lcText1,CHR(10),"|")
MESSAGEBOX(lcText2) -> |
lcText2 = CHRTRANC(lcText1,CHR(10),"|")
MESSAGEBOX(lcText2) -> shows empty message box.

Try it at your command line.
I can tell you, though, that if your string is not double-byte, CHRTRANC is only ever going to return the string unconverted, no matter what.

Best Regards,
Scott
 
Scott,
I understand this works, and I appreciate that you want to help, but as stated in my post, this is for a file. Please try this with a few lines of data in a file with isolated CRs and LFs here and there (not only as a pair); the CRs will essentially go missing.
Thanks
 
Well, this is troubleshooting, one step at a time. My first point here was to make sure that you're using the right function, which you are not.

Why don't you try running it on a couple of lines instead of a whole file, with SET STEP ON, and see if you can determine where it's failing...
For what it's worth, it's 3:30am where I am and I'm also trying to get my own broken code to run...

Best Regards,
Scott
 
Scott,
As stated in my post, I already tried CHRTRAN, but I did it again for you here and it didn't work. Per the VFP documentation, CHRTRAN and CHRTRANC are equivalent for single-byte characters, so it's not wrong and not the problem... and I don't have a preference for either one, so let's just use CHRTRAN and move past that.

SET STEP ON will not add anything. I had already tried a lot of things before posting, including going one line at a time and isolating this to a single character at a time, as you did. The problem arises when the single CRs are within the file and not at the end.

I understand you are busy; please allow someone who has the time to read the post to show how to get a file as described to work.
Thanks,
 
This should do it:

[pre]lcString = filetostr('yourfile.csv')
lcString = strtran(lcString, chr(13) + chr(10), chr(250)) && Replace chr(250) with another dummy character if you must
lcString = chrtran(lcString, chr(13) + chr(10),' ')
lcString = strtran(lcString, chr(250), chr(13) + chr(10) ) && Replace chr(250) with another dummy character if you must
[/pre]
 
OK, correction: it didn't quite work. I agree that "chr(13) + chr(10)" should work, since you can list multiple characters with CHRTRAN, but in practice it doesn't. The stray 0A does exactly what it did for me: it disappears, and there is no replacement character (space) in the output. However, by just splitting that one line from tbleken into two lines like this, it then worked:

Code:
lcString = chrtran(lcString, chr(13),' ')
lcString = chrtran(lcString, chr(10),' ')


 
When I read in your file, OCCURS shows 1 CR (13) and 4 LFs (10), and 0 CRLF combinations.

You're chasing ghosts. Most probably, characters other than CR and LF are stray in your CSV files, and single LFs mark the line ends. CHRTRAN does not depend on any setting, so if you never get double pipes, the CRLF combination does not exist in your files; that's the simple truth. You might have a CSV with only LFs from a Linux server; think about how different OSes handle line breaks: Linux/Unix/OS X use LF only, the old Macintosh OS used CR only, and Windows uses CRLF.

And if your "stray" single LFs are within quotes, they are correct and not end-of-record marks. VFP's CSV processing does not follow this rule, though, so you'd better use something else for the data import. Your CSV preprocessing with CHRTRAN will fail anyway, as you don't have what you expect.
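To check this kind of diagnosis on a given file, the three counts can be taken directly with OCCURS( ). This is a minimal sketch that assumes the file name from the original question. Every CRLF pair contains exactly one CR and one LF, so subtracting the pair count from the single-character counts leaves only the lone characters.

```foxpro
* Sketch: count proper CRLF pairs and lone CR / LF characters in a file.
* "mytestfile.txt" is the placeholder file name from the original question.
LOCAL lcString, lnCRLF, lnCR, lnLF
lcString = FILETOSTR("mytestfile.txt")
lnCRLF = OCCURS(CHR(13) + CHR(10), lcString)   && proper Windows line breaks
lnCR   = OCCURS(CHR(13), lcString) - lnCRLF    && CRs not followed by an LF
lnLF   = OCCURS(CHR(10), lcString) - lnCRLF    && LFs not preceded by a CR
? "CRLF pairs:", lnCRLF
? "Lone CR:", lnCR
? "Lone LF:", lnLF
```

If the pair count comes back as 0, the pipe-based approach from the question can never produce "||", which matches what is described above.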

Bye, Olaf.
 
Hi Olaf,
Thanks. Did you download the last uploaded file (from post 8 Sep 15 20:09)? The first file, which I could not delete from the thread, did not have the pairs, but the latest one does. If you look at the second file in a hex editor, the CR/LF pairs are visible and OCCURS will be > 0. There are no other stray characters apart from CR or LF (I hand-created the sample so it would be small enough to inspect visually).

You are correct that other OSes use different record delimiters, not necessarily the CR/LF pair, and also that when framed in double quotes, a stray CR or LF would be OK for a CSV (though I do not want those).

I am basically in good shape now anyway, though I'm still curious why the one line from tbleken doesn't work, because I think it should work on that file:
Code:
lcString = strtran(lcString, chr(13) + chr(10), chr(250)) && Replace chr(250)
 
Indeed, the second file contains three line breaks, and Tore Bleken's code works and temporarily creates chr(250).

The whole code temporarily replaces CRLF with CHR(250) and puts CRLF back after all single CRs and LFs have been CHRTRANed to spaces. I see nothing that could keep this from working with this file.

What is your SET("COLLATE")? It shouldn't influence any search for CR, LF, or any combination of them, but it's one of the things that might make a difference; for example, there was a bug with indexes under certain collations in earlier VFP versions.

The other thing that might make a difference is your current code page. What is it?
? CPCURRENT()
? CPCURRENT(1)
? CPCURRENT(2)

Bye, Olaf.
 
Thanks. I'm surprised the code works completely for you; for me, where it takes out the lone LF, the expected space is missing. It's mostly right, and by no means terrible.

Assuming I found this correctly, they are:
[ul]
[li]version: VFP 9, sp2, w/hotfix[/li]
[li]code page: "1252"[/li]
[li]collate: "machine"[/li]
[/ul]

However, the code uses FILETOSTR, does the replacements, then STRTOFILE, so the data is never in an actual table; I'm not sure whether some of those settings are even relevant then.
 
OK, try lcString = chrtran(lcString, chr(13) + chr(10), space(2)): each original single character needs its own replacement character. If you do chrtran() with a single space only, only chr(13) is replaced with a space; chr(10) is removed instead of being replaced with a space.

Anyway, your original code also works, so your initial problem is still unresolved. I think we'll never know what the real cause was.
The only thing not working in the code as you posted it is the use of ” instead of ". But that shows up as an obvious syntax error, and I assume you simply didn't really copy and paste that code.
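Combining Tore's placeholder approach with the one-to-one replacement gives roughly the following. This is a sketch rather than a tested routine: the file names are the placeholders from the question, and it assumes CHR(250) never occurs in the input (substitute another unused character if it might).

```foxpro
* Sketch: keep CRLF pairs, turn every lone CR or lone LF into a space.
* Assumes CHR(250) does not occur in the input; file names are placeholders.
LOCAL lcString, lcMarker
lcMarker = CHR(250)
lcString = FILETOSTR("mytestfile.txt")

* 1. Protect the proper CRLF pairs by collapsing each one into the marker.
lcString = STRTRAN(lcString, CHR(13) + CHR(10), lcMarker)

* 2. One replacement character per search character:
*    CHR(13) -> space and CHR(10) -> space.
lcString = CHRTRAN(lcString, CHR(13) + CHR(10), SPACE(2))

* 3. Restore the protected pairs.
lcString = STRTRAN(lcString, lcMarker, CHR(13) + CHR(10))

STRTOFILE(lcString, "myoutput.txt")
```

After step 3, every remaining CR is part of a CRLF pair, so the counts of CR, LF and CRLF in the result should all be equal.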

Bye, Olaf.
 
Olaf,
You solved it. The issue has been that one space would be missing: lone CRs got the expected space, while lone LFs were simply dropped, even though I thought both used the same replacement. As stated in the first paragraph of your last post, CHRTRAN needs a 1:1 match between the characters sought and their replacements. I was seeing the missing space, which is what caused me to post; I had done the same as tbleken with CHRTRAN in some of my many attempts to get this to work before posting. I caused confusion by using/attaching the wrong text file at one point, and my initially posted code would have worked had it not been for that. Still, the missing space was the main reason I posted, thinking the CR replacement was failing; it was just that I couldn't use two search characters with one replacement character in CHRTRAN.

There is no "instead of" in my code, either in the first post or in the code I'm using from tbleken. In any case, problem solved! Many thanks!
 
>There is no "instead of" in my code in the first post

You're not taking this literally enough: I was talking about the double quote characters in the form of " (the normal double quote) and ” (the quote character that normally ends a typographic quotation).
Aside from using the wrong double quote character, your initial code works with the second file, as you do two CHRTRANs for CR and LF separately and a STRTRAN of the double pipes. It's more complicated, but it works.
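For reference, this is the original pipe-based approach from the question with straight double quotes. It is a sketch with two caveats: it assumes the input contains no "|" characters of its own, and any two adjacent stray characters (for example two lone CRs in a row) also become "||" and are therefore turned into a CRLF pair.

```foxpro
* Sketch: the question's pipe-based approach, using straight double quotes.
* Assumes the input itself never contains "|"; file names as in the question.
LOCAL cFile, cString
cFile = "mytestfile.txt"
cString = FILETOSTR(cFile)
cString = CHRTRAN(cString, CHR(13), "|")             && every CR becomes a pipe
cString = CHRTRAN(cString, CHR(10), "|")             && every LF becomes a pipe
cString = STRTRAN(cString, "||", CHR(13) + CHR(10))  && pipe pairs (former CRLF) back to CRLF
cString = CHRTRAN(cString, "|", CHR(32))             && leftover single pipes become spaces
STRTOFILE(cString, "myoutput.txt")
```

The placeholder version with CHR(250) avoids the adjacency caveat, because it protects the real CRLF pairs before any single characters are touched.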

And to be precise about CHRTRAN: the replacement expression is allowed to have fewer characters than the search expression. The search characters without a counterpart are then not replaced but removed, i.e. replaced with an empty string.
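A small command-line check of that rule, as a sketch; the sample string is made up for illustration. CHRTRAN( ) pairs search and replacement characters by position, so the second search character, CHR(10), has no partner when the replacement is a single space.

```foxpro
* Sketch: how CHRTRAN( ) pairs search and replacement characters by position.
LOCAL lcText
lcText = "a" + CHR(13) + "b" + CHR(10) + "c"    && a lone CR, then a lone LF

? CHRTRAN(lcText, CHR(13) + CHR(10), " ")       && "a bc": CR -> space, LF has no partner and is removed
? CHRTRAN(lcText, CHR(13) + CHR(10), SPACE(2))  && "a b c": both get a space
? CHRTRAN(lcText, CHR(13) + CHR(10), "")        && "abc": both are removed
```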

Bye, Olaf.
 