Reading line from txt with special character problem

Status
Not open for further replies.

vipinkrshamra

Programmer
Jul 20, 2005
18
US
Hi!,

I am trying to read a text file from a .NET app using the following code.

*****code****
Dim myReader As New System.IO.StreamReader(FileName, System.Text.Encoding.Default)
While myReader.Peek() <> -1
    ' ReadLine returns the next line without its line terminator
    mystring = myReader.ReadLine()
End While
*****code****

This text file is created by a third-party application from a database. One of the fields written to the file may contain a line feed character.

If this file is opened in Notepad, a line containing that line feed character appears fine as a single line, with the special character shown as a small square box.

But if I read this text file using StreamReader, that line breaks into two lines, which breaks my logic. Is there any way to ignore a special character while reading a text file using StreamReader? The file may have thousands of lines, and each line needs to be processed separately. Because of this special character, StreamReader reads some lines as multiple lines instead of a single line. Since Notepad is able to display that line as a single line, I am assuming that I should be able to ignore that character with StreamReader as well. I tried using System.Text.Encoding.Default, but that did not help.

Any suggestion/help would be greatly appreciated.

Thanks,
Vipin
 
ReadLine will read up to where it sees a linefeed character combination:
MSDN said:
A line is defined as a sequence of characters followed by a line feed ("\n") or a carriage return immediately followed by a line feed ("\r\n").
It sounds like you need to read it as a series of bytes, and do your own end-of-line detection.

Chip H.


 
Thx a lot chiph for the help.

I do understand what you are saying, but I am kind of stuck finding the solution. As I mentioned, I need to process each line separately, and this line feed character is creating an extra line with ReadLine.

Since Notepad is able to ignore these characters and show that line as a single line, I am also trying to figure out how to ignore these characters while reading a line.

Some more info:

The field in the database, which is part of the line in question, may have multiple lines, and that is causing the problem. Since this text file is generated by third-party software, I don't have control over that. I need to ignore the line feed characters while maintaining the right line numbers in the text file, which is very crucial for my application.

If you could provide me with a specific example of how to handle it, that would be great.




 
Since the field may have multiple lines, the linefeed character would probably be used to separate these.

If you read the file as a series of bytes as chiph suggested, you could then look for ASCII 13 followed by ASCII 10, which is the Carriage Return Linefeed sequence, and build your record based on this.


Hope this helps.
 
Thx earthandfire,

I have two concerns with your suggestion:

#1 How will I differentiate between that special character and an actual line feed? As I mentioned, there are thousands of lines in the text file, only one or two lines may have that special character, and I need to maintain the original line sequence as I see it in Notepad.

#2 If I check character by character, it will be very slow. My file would be approximately 5 to 7 MB in size.
 
OK, the character which is causing the problem is ASCII(10).

Now, is there any stream that can open a text file and ignore the ASCII(10) character?

I guess it should be possible, since Notepad and many other text editors like EditPlus are able to ignore this character and display the complete line as a single line instead of breaking it at the character with the ASCII(10) value.

Looking forward to any valuable suggestions.
 
From help:

A line is defined as a sequence of characters followed by a carriage return (0x000d), a line feed (0x000a), Environment.NewLine, or the end of stream marker. The string that is returned does not contain the terminating carriage return and/or line feed. The returned value is a null reference (Nothing in Visual Basic) if the end of the input stream has been reached.


One solution might be to read in the whole file, then use Split to break the file into lines at Environment.NewLine. Then each item in the array will be a complete line/record.
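
A minimal sketch of that idea, assuming the real record terminator is the two-character CR LF sequence and that the whole file fits comfortably in memory. The names SplitOnCrLfSketch and SplitRecords are made up for illustration, and the sketch splits on a literal CR LF rather than Environment.NewLine so that the result does not depend on the platform:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitOnCrLfSketch
    ' Hypothetical helper: splits text into records at CR LF only,
    ' so a lone LF stays inside the record it belongs to.
    Function SplitRecords(ByVal text As String) As String()
        Dim records As String() = text.Split(New String() {vbCrLf}, StringSplitOptions.None)
        ' Text that ends with CR LF yields one trailing empty element; drop it.
        If records.Length > 0 AndAlso records(records.Length - 1).Length = 0 Then
            Array.Resize(records, records.Length - 1)
        End If
        Return records
    End Function

    Sub Main()
        ' In-memory sample standing in for the file contents (an assumption);
        ' the second record carries an embedded line feed.
        Dim sample As String = "1,first" & vbCrLf & "2,multi" & vbLf & "line" & vbCrLf & "3,last" & vbCrLf
        For Each record As String In SplitRecords(sample)
            ' Show the embedded LF as a space so each record prints on one line.
            Console.WriteLine(record.Replace(vbLf, " "))
        Next
    End Sub
End Module
```

For a real file, the text could come from System.IO.File.ReadAllText(FileName, System.Text.Encoding.Default). The catch is that both the full text and the resulting array are held in memory at the same time.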


Hope this helps.
 
Yes, I also had this solution in mind, but my text file could be 10 MB or sometimes even bigger. This solution could have memory issues and may also be a little slower.

I was expecting that, while opening the file using StreamReader, there must be a way to ignore special characters like ASCII(10).

As I mentioned earlier, if I open the file in Notepad, the line in question appears as a single line, with the ASCII(10) character shown as a square character. I wish something like that were possible with StreamReader.

But anyway, thx a lot earthandfire for your suggestion. Let me know whether I have been able to explain what I am trying to do; if you have any ideas, they would be greatly appreciated.
 
The only other solution that I can see is to do as chiph suggested and process the file byte by byte.

Basically, use a StringBuilder variable, clear it, and keep reading into it, handling these cases as you go:
ASCII 10 - store a space in the string and carry on reading
ASCII 13 - don't add this character; read and throw away the next character, which should be ASCII 10, and stop
End of File - stop

You should now have a complete line. Do what you need with this line, then clear the StringBuilder and repeat the process until End of File.
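
The steps above could be sketched roughly like this. It is only an illustration: it reads characters through a TextReader (a StreamReader over the file would do) rather than raw bytes, it peeks before discarding the character after ASCII 13 instead of discarding it blindly, and the names CharByCharSketch and ReadRecords are made up:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module CharByCharSketch
    ' Hypothetical helper: ASCII 13 ends a record, a lone ASCII 10 becomes a space.
    Function ReadRecords(ByVal reader As TextReader) As List(Of String)
        Dim records As New List(Of String)()
        Dim current As New StringBuilder()
        Dim code As Integer = reader.Read()
        While code <> -1
            Select Case code
                Case 13
                    ' CR: do not store it; discard the LF that should follow.
                    If reader.Peek() = 10 Then reader.Read()
                    records.Add(current.ToString())
                    current.Length = 0
                Case 10
                    ' Lone LF inside a field: keep the record on one line.
                    current.Append(" "c)
                Case Else
                    current.Append(ChrW(code))
            End Select
            code = reader.Read()
        End While
        ' End of file: keep a final record that had no CR LF after it.
        If current.Length > 0 Then records.Add(current.ToString())
        Return records
    End Function

    Sub Main()
        ' In-memory sample standing in for the file (an assumption).
        Dim sample As String = "1,first" & vbCrLf & "2,multi" & vbLf & "line" & vbCrLf & "3,last"
        For Each record As String In ReadRecords(New StringReader(sample))
            Console.WriteLine(record)
        Next
    End Sub
End Module
```

Because each CR LF produces exactly one entry, the position of a record in the list matches the line number Notepad shows.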

(By the way there are a number of buffer manipulation algorithms - probably some even posted here on TT - that would enable you to read the file a block at a time.)
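
One possible shape for a block-at-a-time version, as a sketch only and not any particular algorithm posted on TT. It fills a Char buffer with TextReader.Read and remembers whether the previous character was ASCII 13, so a CR LF pair that straddles two blocks is still treated as one terminator. BlockReadSketch, ReadRecords and the bufferSize parameter are made-up names:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module BlockReadSketch
    ' Hypothetical helper: same rules as the character-by-character loop,
    ' but the reader is asked for bufferSize characters at a time.
    Function ReadRecords(ByVal reader As TextReader, ByVal bufferSize As Integer) As List(Of String)
        Dim records As New List(Of String)()
        Dim current As New StringBuilder()
        Dim buffer(bufferSize - 1) As Char
        Dim skipLf As Boolean = False ' True right after a CR, even across blocks
        Dim count As Integer = reader.Read(buffer, 0, buffer.Length)
        While count > 0
            For i As Integer = 0 To count - 1
                Dim ch As Char = buffer(i)
                Dim afterCr As Boolean = skipLf
                skipLf = False
                If ch = ControlChars.Cr Then
                    ' CR ends the record; the LF that follows is discarded.
                    records.Add(current.ToString())
                    current.Length = 0
                    skipLf = True
                ElseIf ch = ControlChars.Lf Then
                    ' A lone LF becomes a space; an LF right after CR is dropped.
                    If Not afterCr Then current.Append(" "c)
                Else
                    current.Append(ch)
                End If
            Next
            count = reader.Read(buffer, 0, buffer.Length)
        End While
        ' End of file: keep a final record that had no CR LF after it.
        If current.Length > 0 Then records.Add(current.ToString())
        Return records
    End Function

    Sub Main()
        ' A tiny buffer forces the CR LF pairs to fall on block boundaries.
        Dim sample As String = "abc" & vbCrLf & "def" & vbLf & "ghi" & vbCrLf
        For Each record As String In ReadRecords(New StringReader(sample), 4)
            Console.WriteLine(record)
        Next
    End Sub
End Module
```

StreamReader already buffers internally, so whether this beats the simple loop on a 5 to 10 MB file is something to measure rather than assume.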

Hope this helps.
 
This is assembler but may give you some ideas:

thread272-1026257


Hope this helps.
 
Thx a lot earthandfire for keeping this thread alive.

I can use any of these methods. One last question: in your opinion, which would be the faster and more efficient way of tackling this problem, and what are the possible pros and cons of each technique?
 
With the right algorithm, I would guess that the buffer method would be a lot faster and more efficient - equally a bad algorithm would have the opposite effect.

Personally, I would probably go for the byte by byte method - the code is easier to write, and if I had to refer back to it in six months or a year I would understand it. Using a clever algorithm, I would need to carefully document each step and would still probably have to carefully analyse the code each time I needed to refer to it.

Not really a very good answer - but you did say in your opinion [smile]

Hope this helps.
 