explode and preg_split eat all memory

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
Why do both the explode and preg_split functions eat up the available 32 MB of memory on my server, resulting in a fatal error? Kind of crazy that they do. (I un-commented the explode and got the same error.) The data has a lot of non-English Unicode characters.

Code:
$len = mb_strlen($FileContents);
	echo '<br> len  of filecontents=' . $len; 

	// $SignificantPlaceNames = explode("\n", $FileContents);   // stupid explode kills memory
	$SignificantPlaceNames = preg_split("/[\s,]+/", $FileContents);

Output:

len of filecontents=2767569
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 16 bytes) in /var/ on line 57

 
ALL I WANT TO DO IS READ 2.7 MB WORTH OF UNICODE WORDS INTO AN ARRAY (1 word per element) FROM A FILE THAT HAS ONE WORD PER LINE. Why is this so difficult? I read 2.7 MB of data into $FileContents using fread. After reading the file in with fread and displaying $FileContents with var_dump, the words appear to be separated by spaces.

feherke, thanks, but mb_split just doesn't work. I've played with the function for a long time and it always puts the entire string into a single array element. I read somewhere that mb_split is 'experimental'.

Code:
$array = mb_split('/\s/', $FileContents);
The above line of code produces a 1 element array. I've tried other regexes and the lowercase 'u' modifier but to no avail.
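
For comparison, a minimal sketch of the same call with and without delimiters. mb_split() takes a bare pattern, so the slashes in '/\s/' are matched as literal characters, which would explain a 1-element result. The sample string is hypothetical stand-in data, and the sketch assumes the mbstring extension is loaded with its regex functions enabled.

```php
<?php
// Hypothetical sample data: three UTF-8 words, one per line
// (written with byte escapes so the source encoding does not matter).
$FileContents = "Z\xC3\xBCrich\nK\xC3\xB8benhavn\nOslo";

mb_regex_encoding('UTF-8');

// With PCRE-style delimiters: the pattern means "slash, whitespace, slash",
// which never occurs in the data, so nothing is split.
$withDelimiters = mb_split('/\s/', $FileContents);

// Bare pattern: splits on each whitespace character.
$bare = mb_split('\s', $FileContents);

echo count($withDelimiters), "\n"; // 1
echo count($bare), "\n";           // 3
```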

explode and str_word_count cause a fatal memory error, with my script suddenly asking for more than 33 MB of memory. Using fgets in a loop to read one line from the file at a time also causes the fatal memory error.

The PHP script uses about 2.8 MB of memory (per memory_get_usage) until it hits the fatal memory error, and then suddenly it needs 33 MB.

Crazy! There must be 1) some bug OR 2) explode and str_word_count are barfing on the Unicode. But then PHP never was much good with Unicode.

 
what version of unicode are you using?
are you using utf new line sequences or traditional \n?

 
The data file is UTF-8. I assume it uses the traditional newline. My editor says the data file is UTF-8.
 
My apologies. I had meant to ask what version of php.
 
Anyway, if standard line terminators are used and if your requirement is, as you say, to read the file into an array of lines, and each line has only a single word, then this should be fine:

Code:
$file  = '/path/to/file.txt';
$lines  = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
 
Any attempt using the functions 'file', fgets (in a loop), explode, preg_split or str_word_count (after fread to a string) results in the fatal error below. The script uses less than 3 MB of memory, and then when it tries one of the aforementioned functions it suddenly needs 33 MB and gets the memory error. 33 MB to read a 2.7 MB file? Could this be a bug? Could PHP 5 be barfing on Unicode since it was never meant to process Unicode? Should I say a little prayer before running the script? :) Running "PHP Version 5.2.4-2ubuntu5.12".

Code:
echo '<br> begin DEBUG: BEGIN MEMORY=' . memory_get_usage();
echo '<br> begin DEBUG: MEMORY PEAK=' . memory_get_peak_usage();
echo '<br> hello in GetSignificantPlaceNames B';
$SignificantPlaceNames = file($FTfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

OUTPUT:

begin DEBUG: BEGIN MEMORY=64488
begin DEBUG: MEMORY PEAK=74928
hello in GetSignificantPlaceNames B
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 7 bytes) in /var/ on line 46
 
If you are using a packaged version of PHP from the apt repos, then try downloading the source and compiling it yourself.

However, 32 MB is a woefully small amount of memory to allow PHP. Is there any reason you cannot allocate more?
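
As a rough sketch of what allocating more could look like from inside the script: ini_set() can raise memory_limit for the current request only. The value '128M' is an arbitrary example, not a recommendation, and this assumes the host allows the setting to be changed at runtime.

```php
<?php
// Raise the limit for this script only; '128M' is an arbitrary example value.
// ini_set() returns the previous value, or false if the change was refused.
$previous = ini_set('memory_limit', '128M');

if ($previous === false) {
    echo "memory_limit could not be changed at runtime\n";
} else {
    echo 'memory_limit was ' . $previous . ', now ' . ini_get('memory_limit') . "\n";
}
```

The same setting can also be changed in php.ini if a permanent increase is wanted.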
 
I am trying to rule out a corrupt PHP install. If you build from scratch, then you are using something that you yourself can be confident in and have made choices about. If you use an apt repo, you are relying on others to make choices for you (for example, as to libraries, methods of building, etc.).
 
Good suggestion. I'll try it out on another server with a different build.
 
I tried the same thing on a different LAMP server with a lot more memory. I do not get the fatal error, but I do get the same behavior: reading the 2.7 MB file into an array suddenly eats up 30 MB.
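
Since the behavior reproduces on a second build, one thing worth measuring is how much memory each array element costs. PHP stores every element as its own value plus bookkeeping, so an array of many short strings can take several times the size of the original string, independent of Unicode. The sketch below uses generated stand-in data (not the real file) and only reports what it measures; the actual ratio depends on the PHP version and build.

```php
<?php
// Hypothetical stand-in for the real data: many short words, one per line.
$words = array();
for ($i = 0; $i < 100000; $i++) {
    $words[] = 'word' . $i;
}
$FileContents = implode("\n", $words);
unset($words);

// Measure how much memory the split itself adds.
$before = memory_get_usage();
$SignificantPlaceNames = explode("\n", $FileContents);
$after = memory_get_usage();

$count = count($SignificantPlaceNames);
echo 'elements:          ' . $count . "\n";
echo 'string bytes:      ' . strlen($FileContents) . "\n";
echo 'array bytes:       ' . ($after - $before) . "\n";
printf("bytes per element: %.1f\n", ($after - $before) / $count);
```

If the per-element figure multiplied by the number of words in the real file lands near 30 MB, the jump is ordinary array overhead rather than a bug.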
 
[0] If file(), fgets() and fread() all blow the memory limit, can you show how you prepare $FileContents for further processing?

[1] How about trying something basic, like splitting byte-by-byte, and checking the sizeof() of the resulting array?
[tt]
$a=preg_split('//', $FileContents);
echo "<br />" . sizeof($a);
[/tt]
Does it blow up? Does it agree with the size of the string, allowing 0 to 3 bytes for the BOM?
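
A slightly fuller sketch of that check, on hypothetical sample data with a UTF-8 BOM prepended. It compares the byte count, the code-point count, and whether a BOM is present. The PREG_SPLIT_NO_EMPTY flag is an addition here; it drops the empty first and last elements that a plain preg_split('//', ...) returns.

```php
<?php
// Hypothetical sample: UTF-8 BOM followed by two words, one per line
// (byte escapes keep the example independent of the source encoding).
$FileContents = "\xEF\xBB\xBF" . "Z\xC3\xBCrich\nK\xC3\xB8benhavn\n";

// Without the 'u' modifier the subject is treated as bytes.
$bytes = preg_split('//', $FileContents, -1, PREG_SPLIT_NO_EMPTY);

// With the 'u' modifier the subject is treated as UTF-8 code points.
$chars = preg_split('//u', $FileContents, -1, PREG_SPLIT_NO_EMPTY);

// A UTF-8 BOM is the three bytes EF BB BF at the start of the data.
$hasBom = substr($FileContents, 0, 3) === "\xEF\xBB\xBF";

echo 'strlen:      ' . strlen($FileContents) . "\n";
echo 'byte split:  ' . sizeof($bytes) . "\n";
echo 'char split:  ' . sizeof($chars) . "\n";
echo 'has BOM:     ' . ($hasBom ? 'yes' : 'no') . "\n";
```

If the 'u' variant returns false on the real file, the data is not valid UTF-8, which would be worth knowing before blaming explode.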

[1.1] One possibility is that the regex fails to compile, either because the PCRE library in your build is outdated or for some other build-related reason. So simplify to a basic pattern and see how it behaves.
 
I've opened many other smaller files, some with Unicode, and they never caused the fatal memory error. I think this is a bug. There's just no reason why reading 2.7 MB of Unicode data into an array (with whatever method) should cause the script to suddenly eat up 33 MB of memory, causing a fatal error. Opinions? Submit this as a bug to bugs.php.net?
 
[0.1] Show the lines leading up to $FileContents before grieving... You have not answered that part. This is your statement:
>Any attempt using the functions 'file', fgets (in a loop), explode, preg_split or str_word_count (after fread to a string) --all result in the fatal error below.
How do you get to $FileContents?
 
tsuji,

I really did not do anything special to $FileContents, just this:

$FileContents = fread($thefile, filesize($FTfile));

I tried opening it with both "r" and "rb".
 
Can we assume it looks like this, or is it substantially different?
[tt]
$thefile=fopen($FTfile, "r");
$FileContents=fread($thefile, filesize($FTfile));
[/tt]
Can the file that $FTfile points to be uploaded somewhere so we can look at it?
 