Tek-Tips is the largest IT community on the Internet today!


explode and preg_split eat all Memory 2

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
Why do both the explode and preg_split functions eat up all 32MB of available memory on my server, resulting in a fatal error? Kinda crazy that they do. (I un-commented the explode and got the same error.) The data has a lot of Unicode non-English characters.

Code:
// $FileContents holds the whole file, loaded earlier in the script (not shown)
$len = mb_strlen($FileContents);
echo '<br> len of filecontents=' . $len;

// $SignificantPlaceNames = explode("\n", $FileContents);   // explode exhausts memory too
$SignificantPlaceNames = preg_split("/[\s,]+/", $FileContents);

Output:

len of filecontents=2767569
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 16 bytes) in /var/ on line 57

 
There has been quite a bit of discussion of this, going back to 2007. It would be interesting to rerun the tests on a 32-bit architecture; I would expect the figures to be lower there, since pointers (and so the array's internal structures) are half the size.

But basically, the PHP array is an ordered hash table that supports so many functions that I suspect the per-element overhead is quite large. Also remember that PHP is an interpreted language, and the developers are open about the fact that they treat memory optimization as no more than 'nice to have'.
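A rough way to see that per-element overhead for yourself is to compare memory_get_usage() before and after building an array of short strings. This is only a sketch: the generated strings are made up, bytesPerElement() is a hypothetical helper, and the exact byte counts depend on the PHP version and on 32- vs 64-bit builds.

```php
<?php
// Estimate how many bytes one short string costs once stored in a PHP array.
// Figures vary with PHP version and build; treat the result as a rough guide.
function bytesPerElement(int $count): float
{
    $before = memory_get_usage();
    $words = [];
    for ($i = 0; $i < $count; $i++) {
        // Build distinct strings at runtime so each element gets its own copy.
        $words[] = 'place' . $i;
    }
    $after = memory_get_usage();
    $perElement = ($after - $before) / $count;
    unset($words);
    return $perElement;
}

$perElement = bytesPerElement(100000);
printf("~%.1f bytes per element for strings of 6-10 characters\n", $perElement);
```

Multiplying the figure by the number of lines in the file gives a ballpark for the array's footprint, which will typically be several times the raw text size.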
 
#1 fgets
before DEBUG: BEGIN MEMORY=338728
before DEBUG: MEMORY PEAK=357088
after DEBUG: BEGIN MEMORY=32719904
after DEBUG: MEMORY PEAK=32733896

#2 explode
before DEBUG: BEGIN MEMORY=32719984
before DEBUG: MEMORY PEAK=32733896
after DEBUG: BEGIN MEMORY=35086688
after DEBUG: MEMORY PEAK=67469112

#3 the file function
before DEBUG: BEGIN MEMORY=35086688
before DEBUG: MEMORY PEAK=67469112
after DEBUG: BEGIN MEMORY=35086624
after DEBUG: MEMORY PEAK=69843920

#4 preg_split
before DEBUG: BEGIN MEMORY=35086624
before DEBUG: MEMORY PEAK=69843920
after DEBUG: BEGIN MEMORY=35269928
after DEBUG: MEMORY PEAK=69843920

#5 mb_split
before DEBUG: BEGIN MEMORY=35269928
before DEBUG: MEMORY PEAK=69843920
after DEBUG: BEGIN MEMORY=5873064
after DEBUG: MEMORY PEAK=69843920
This is under Windows XP, PHP version 5.3.5.
I don't have PHP (or a web server) on the machine I'm using, so I just downloaded the latest version, ran it from the command prompt and captured the output. It looks to use more memory than on *nix, and I hope that's not down to my editing!
I got the new witterings about timezones, which I expected, but it also told me that you are using an undeclared variable ($i) on line 14 and $FileContents on line 18. Not a biggie, I just haven't seen PHP be this picky before.
One thing I was going to ask: have you tried running from the command line? I was thinking perhaps some memory issues might only be apparent in web server runs.
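The DEBUG lines in the results above look like the output of PHP's memory_get_usage() and memory_get_peak_usage(), but the harness itself isn't shown. The helper below (debugMemory is a hypothetical name, and the repeated "Paris" string is stand-in data) is only a guess at how such numbers can be produced.

```php
<?php
// Hypothetical helper printing lines in the same shape as the DEBUG output above.
function debugMemory(string $label): void
{
    // Memory currently allocated through PHP's own allocator (not the OS view).
    echo $label, ' DEBUG: BEGIN MEMORY=', memory_get_usage(), "\n";
    // High-water mark for the whole script run so far.
    echo $label, ' DEBUG: MEMORY PEAK=', memory_get_peak_usage(), "\n";
}

$data = str_repeat("Paris\n", 100000);   // stand-in for the real file contents

debugMemory('before');
$parts = explode("\n", $data);
debugMemory('after');
```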
 
have you tried running from the command line
My tests were run both from the command line and via a web browser.

Not a biggie, just haven't seen PHP be this picky before.
It just raises an E_NOTICE for undeclared variables. This has long been the case, but most installs have error reporting set to ignore notices.

I am not sure, however, that your (Ingresman's) tests hold much water. It appears that you ran all the scriptlets sequentially in a single run of the script. That isn't a fair test: each test's starting memory includes whatever the earlier tests left allocated, and the peak figure is a high-water mark for the whole run, so once an earlier test has pushed it up, later tests cannot report a lower one (note that #4 and #5 both show the same peak as #3). You should run each test separately to get meaningful results.
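The point about the peak figure is easy to demonstrate: freeing one test's data lowers memory_get_usage() but not memory_get_peak_usage(). A minimal sketch with made-up data:

```php
<?php
// Test A: allocate a large array, record the peak, then free it.
$a = range(1, 200000);
$peakAfterA = memory_get_peak_usage();
unset($a);

// Test B: a much smaller allocation in the same script run.
$b = range(1, 1000);
$usageDuringB = memory_get_usage();
$peakDuringB = memory_get_peak_usage();

// Current usage has dropped back, but the peak still reflects test A,
// so test B cannot report a lower peak than test A did.
printf("peak after A: %d, usage during B: %d, peak during B: %d\n",
    $peakAfterA, $usageDuringB, $peakDuringB);
```

Running each test as its own php invocation avoids this; on PHP 8.2 and later, memory_reset_peak_usage() between tests is another option.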

As with my tests, the mb_split variant that you ran shows a marked decrease in memory usage. Although that looks positive, I believe you will find that you are left with a single-element array rather than what was anticipated. That would indicate the source data does not use standard line terminators.
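If the file's line terminator differs from the one the split looks for (for example old Mac-style \r when only \n is matched), the whole file collapses into one element. A quick check with a made-up three-line string, comparing explode() with preg_split() on PCRE's \R escape, which matches any newline sequence (\r\n, \n, \r and a few others):

```php
<?php
// Three place names separated by bare carriage returns (old Mac-style line endings).
$contents = "Paris\rLyon\rNice";

// Splitting on "\n" finds no separator, so the whole string stays in one element.
$byNewline = explode("\n", $contents);

// \R matches \r\n, \n or \r, so this copes with all three line-ending styles.
$byAnyNewline = preg_split('/\R/', $contents);

printf("explode: %d element(s), preg_split: %d element(s)\n",
    count($byNewline), count($byAnyNewline));
```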
 
The only solution for the affected versions of PHP is simply to give these functions the memory they want. Thanks for all the help!
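Giving the script more memory can be done per script with ini_set() rather than raising the limit for the whole server in php.ini. A sketch only: the 128M value is an arbitrary example, and some hosts forbid changing the limit.

```php
<?php
// Raise the limit for this script only; ini_set() returns the old value, or false on failure.
$old = ini_set('memory_limit', '128M');
if ($old === false) {
    trigger_error('Could not change memory_limit', E_USER_WARNING);
}
echo 'memory_limit was ', var_export($old, true), ', now ', ini_get('memory_limit'), "\n";
```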
 
I am not so sure. If you would care to share more of your business intent, we may be able to come up with a solution. As a starter, to my mind, storing this kind of data in a flat file is a bad idea. I would store it in an indexed database table, and maybe even add a full-text (FT) index too. If you don't have the memory headroom for MySQL, then this is probably the perfect application for SQLite.

Then the heavy work can all be done inside the database engine rather than in an array structure held in memory by an interpreted language (which is far from optimal for either memory usage or speed).
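A minimal sketch of the SQLite route through PHP's PDO layer, assuming the pdo_sqlite extension is installed. The table and column names are made up, and an in-memory database stands in for a file on disk. The PRIMARY KEY gives the column an index, so each lookup is an index search inside SQLite rather than work on a PHP array.

```php
<?php
// In-memory database for the demo; a real setup would use a file, e.g. new PDO('sqlite:/path/to/places.db').
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: one row per place name, indexed through the primary key.
$db->exec('CREATE TABLE places (name TEXT PRIMARY KEY)');

$insert = $db->prepare('INSERT OR IGNORE INTO places (name) VALUES (?)');
$db->beginTransaction();   // a single transaction makes bulk inserts much faster in SQLite
foreach (['Paris', 'Lyon', 'Zürich', 'København'] as $name) {
    $insert->execute([$name]);
}
$db->commit();

// Exact-match lookup; fetchColumn() returns false when no row matches.
function isPlaceName(PDO $db, string $word): bool
{
    $stmt = $db->prepare('SELECT 1 FROM places WHERE name = ?');
    $stmt->execute([$word]);
    return $stmt->fetchColumn() !== false;
}

var_dump(isPlaceName($db, 'Zürich'), isPlaceName($db, 'Atlantis'));
```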
 
I'm creating a simple function to determine whether a word is a place name; that's all it does. (Aspell does not include most place names.) Using 2.7 MB of memory for an array seems like it would have a much smaller overall footprint than even SQLite, but I could be wrong.
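If the list does stay in a PHP array, how it is keyed matters for the lookup: in_array() scans every element, whereas using the names as keys (via array_flip()) lets isset() do a hash lookup. A sketch assuming one place name per line in a UTF-8 text file; loadPlaceNames(), isKnownPlace() and the temporary demo file are hypothetical.

```php
<?php
// Load one name per line into an array keyed by name, for constant-time average lookups.
function loadPlaceNames(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Strip trailing whitespace, including any \r left over from Windows line endings.
    $lines = array_map('rtrim', $lines);
    return array_flip($lines);
}

function isKnownPlace(array $places, string $word): bool
{
    return isset($places[$word]);
}

// Demo with a throwaway file.
$path = tempnam(sys_get_temp_dir(), 'places');
file_put_contents($path, "Paris\r\nLyon\r\nZürich\r\n");
$places = loadPlaceNames($path);
unlink($path);

var_dump(isKnownPlace($places, 'Zürich'), isKnownPlace($places, 'Atlantis'));
```

This speeds up lookups but does not shrink the array; the memory cost measured earlier in the thread still applies.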
 
Consider also storing a Levenshtein value or a Soundex key to allow fuzzy matching of misspellings.
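PHP has soundex(), metaphone() and levenshtein() built in. Because Levenshtein distance is computed between pairs of words, one common pattern is to store a phonetic key per name and apply levenshtein() only to the candidates sharing the misspelled word's key. The sketch below assumes an ASCII word list. soundex() ignores non-ASCII letters and levenshtein() counts bytes, so the Unicode-heavy data in this thread would need a transliteration step first. The function names are hypothetical.

```php
<?php
// Group names by Soundex key so only similar-sounding candidates are compared.
function buildSoundexIndex(array $names): array
{
    $index = [];
    foreach ($names as $name) {
        $index[soundex($name)][] = $name;
    }
    return $index;
}

// Return the closest indexed name within $maxDistance edits, or null if none.
function suggestPlace(array $index, string $word, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($index[soundex($word)] ?? [] as $candidate) {
        // Case-insensitive comparison; strtolower() only folds ASCII letters.
        $distance = levenshtein(strtolower($word), strtolower($candidate));
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

$index = buildSoundexIndex(['Paris', 'Perth', 'Berlin', 'Boston']);
var_dump(suggestPlace($index, 'Pariss'), suggestPlace($index, 'Tokyo'));
```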
 