Read big unicode file, massage data, write unicode

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
I am trying to read a 25 MB file containing UTF-8 text, massage the data, and then write it back out while preserving the UTF-8 encoding. Is there a way to read in one EOL-terminated line at a time and preserve the UTF-8?

The binary-safe fread() and fwrite() recognize EOF but not EOL.
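For what it's worth, fread() can still be used if you do the line splitting yourself. In UTF-8, every byte of a multibyte character is 0x80 or higher, so the byte 0x0A can only ever be a real line feed, and splitting on it never cuts a character in half. Below is a minimal sketch of that idea; the generator name linesFromChunks() and the 8192-byte default chunk size are illustrative choices, not anything from this thread.

```php
<?php
// Yield lines ("\n" included) from an open stream using only fread().
// Splitting on the byte "\n" is safe for UTF-8: every byte of a multibyte
// character is >= 0x80, so 0x0A can only be a real line feed.
function linesFromChunks($fh, int $chunkSize = 8192): Generator
{
    $buffer = '';
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false) {
            break; // read error
        }
        $buffer .= $chunk;
        // Emit every complete line currently held in the buffer.
        while (($pos = strpos($buffer, "\n")) !== false) {
            yield substr($buffer, 0, $pos + 1);
            $buffer = (string) substr($buffer, $pos + 1);
        }
    }
    if ($buffer !== '') {
        yield $buffer; // final line with no trailing "\n"
    }
}
```

A chunk may well end in the middle of a character, but bytes are only handed out once a whole line has been assembled in the buffer, so no character is ever split.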
 
Have you tried fgets() with the auto_detect_line_endings setting?
 
Code:
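// Only needed for old Mac "\r" line endings; deprecated as of PHP 8.1.
ini_set('auto_detect_line_endings', true);
$fh = fopen($filename, 'rb') or die("Cannot open $filename");
// Reads at most 4095 bytes per call; a longer line arrives in pieces,
// and a piece boundary may fall inside a multibyte UTF-8 character.
while (false !== ($data = fgets($fh, 4096))):
   //do something with $data
endwhile;
fclose($fh);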
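To cover the write-back half of the question, here is a minimal sketch that reads with fgets(), transforms each line, and writes it with fwrite(). Neither function alters the bytes, so the UTF-8 passes through untouched. The names massage() and processFile(), and the whitespace-collapsing transformation, are hypothetical placeholders for whatever massaging is actually needed. The /u modifier makes PCRE treat the line as UTF-8 characters and makes preg_replace() return null on invalid UTF-8.

```php
<?php
// Hypothetical transformation: collapse runs of spaces/tabs to one space.
// The /u modifier makes PCRE work on UTF-8 characters rather than bytes.
function massage(string $line): string
{
    $result = preg_replace('/[ \t]+/u', ' ', $line);
    if ($result === null) {
        // preg_replace() returns null if $line is not valid UTF-8.
        throw new UnexpectedValueException('Line is not valid UTF-8');
    }
    return $result;
}

// Copy $in to $out line by line, massaging each line; returns the line count.
function processFile(string $in, string $out): int
{
    $src = fopen($in, 'rb');
    if ($src === false) {
        throw new RuntimeException("Cannot open $in");
    }
    $dst = fopen($out, 'wb');
    if ($dst === false) {
        fclose($src);
        throw new RuntimeException("Cannot open $out");
    }
    $count = 0;
    try {
        // No length argument: fgets() returns the whole line, "\n" included.
        while (($line = fgets($src)) !== false) {
            fwrite($dst, massage($line));
            $count++;
        }
    } finally {
        fclose($src);
        fclose($dst);
    }
    return $count;
}
```

For example, processFile('words.txt', 'words-out.txt') would write the massaged copy; both file names are placeholders.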

The fgets() approach will not fix the problem if the line endings are themselves encoded as multibyte Unicode characters rather than plain ASCII bytes. However, that would be unusual, since reputable sources advise against doing so.
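One common case where a line ending is not a single byte is a UTF-16 file, where "\n" is stored as two bytes. Assuming that is the situation, a workable approach is to convert the file to UTF-8 first and split it afterwards. The sketch below assumes the file starts with a UTF-16 byte order mark (anything else is treated as already being UTF-8), relies on the iconv extension, and uses a hypothetical function name.

```php
<?php
// Hypothetical helper: read a file that may be UTF-16 (with BOM) and
// return its lines as UTF-8 strings, without line terminators.
function readLinesAsUtf8(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Assumption: UTF-16 files carry a byte order mark; strip it and convert.
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        $raw = iconv('UTF-16LE', 'UTF-8', substr($raw, 2));
    } elseif ($bom === "\xFE\xFF") {
        $raw = iconv('UTF-16BE', 'UTF-8', substr($raw, 2));
    }
    if ($raw === false) {
        throw new UnexpectedValueException("Invalid UTF-16 in $path");
    }
    // Now a line feed is a single byte again, so splitting is safe.
    $lines = preg_split('/\r\n|\n|\r/', $raw);
    if (end($lines) === '') {
        array_pop($lines); // drop the empty piece after a final newline
    }
    return $lines;
}
```

Converting the whole file first keeps the splitting logic identical for both encodings; the trade-off is holding the file in memory.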

Even if the line endings are encoded in Unicode, the fgets() loop will still digest the file in chunks of at most 4095 bytes, which should be memory efficient. That said, 25 MB does not seem too big to process in one chunk anyway.
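Assuming memory_limit is at its usual default of 128M, a 25 MB file and its transformed copy fit in memory, so the whole-file variant is short. This is a minimal sketch with a hypothetical function name and a placeholder transformation that collapses runs of spaces and tabs. file_get_contents() and file_put_contents() are binary safe, so the UTF-8 bytes are preserved.

```php
<?php
// Read the whole file, transform it, and write it back in one go.
function massageWholeFile(string $in, string $out): void
{
    $text = file_get_contents($in);
    if ($text === false) {
        throw new RuntimeException("Cannot read $in");
    }
    // Placeholder transformation: collapse runs of spaces/tabs.
    // With /u, preg_replace() returns null if $text is not valid UTF-8.
    $result = preg_replace('/[ \t]+/u', ' ', $text);
    if ($result === null) {
        throw new UnexpectedValueException("$in is not valid UTF-8");
    }
    if (file_put_contents($out, $result) === false) {
        throw new RuntimeException("Cannot write $out");
    }
}
```

The /u check doubles as a cheap validation step: a file that is not really UTF-8 fails loudly instead of being silently mangled.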
 