
Faster string handling

Status
Not open for further replies.

Tilen

Programmer
Apr 2, 2005
75
SI
Hi

I have a large process that handles data from a file. My benchmark is a 130 MB file with 3 million lines.

I use a lot of Delete, Copy, Length and Pos calls to handle strings.

Does anybody have functions that do the same thing as Delphi's, just faster?

The biggest file is 1 GB with 35 million lines, so any improvement in speed would be great. At the moment it processes 130 MB in 100s and 1 GB in 15 minutes... I have already optimized the code a lot, and I think I could get it even faster with functions that use assembler code.

Please advise
Thanx Tilen
 
What are you wanting to get done with your string handling? It might be able to benefit from different algorithms than simple string searches and copies.

Have you tried isolating the code that does this and profiling it to see where most of your time is being spent? If you don't have a profiler, go download the demo of ProDelphi (which will profile up to a maximum of 20 procedures/functions).
If you don't do this, you won't really be sure where your speed problems lie. It may be that the standard functions are good enough and the real issue is elsewhere. It's hard to guess without something to tell you for sure.
 
Another thought I just had... how are you performing your I/O on these files? Often a lot of speed can be gained by changing how that is done. What kind of file format do the files you are processing have?
 
Well, the thing is, I have around 10K of code that processes the data from the file. Lots of it is almost repeated, but it has to be separate because of the object info I get from the file.

I know I should have done it in a more generic way from the beginning, but I didn't, and it would take me at least two weeks to change it. I don't have that much time.

So, I'm tweaking it where I can. I already use ProDelphi and I optimized everything I could. But I use lots of these simple functions (Pos, Delete, Copy, Length), at least 10 times per line.

With optimization I got it down from 6 minutes to 100s, which is pretty good, but 15 minutes for 1 GB is still too much.

I tried CopyStr from FastStrings, but it's no faster than Copy for my simple cases. It offers more flexibility and is only faster on more complex Copy operations.

I use a simple Readln from a TextFile and it takes about 40s to read a 1 GB file... all the rest is processing. I know I could make it faster with TStream, I just don't have time to implement all the changes.

Thanx
 
"I use a simple Readln from a TextFile and it takes about 40s to read a 1 GB file... all the rest is processing."

You can use SetTextBuf() to change the buffer size of your text file. The default is usually 128 bytes, so it sounds like it would definitely pay off to use this function in your app, if you do not already. 128 bytes is ludicrously small, and for files of the size you are handling I would consider going to at least 64KB and probably 256KB, because the memory is there and most drives today are capable of reading much more in a single pass. You will have to test to determine which value is best.

The catch, though, is that string handling is an inherently slow operation, so you likely won't find anything much more efficient than what Delphi already offers. It will probably come down to looking at the algorithms you are using.

But I can always look in my stash of stuff and see if I can find something that might be different/better/whatever than what is available..
 
Yes, I know I could read into a buffer. Yes, it is much faster.

But the problem is, all my code is based on reading line by line, and it also processes the data that way. Each object is on its own line, and some have additional properties on the following lines.

The thing is, I know I could read 64K into a buffer, but then I would need to parse that into lines (perhaps with TStringList) and feed each line to the processing procedure. Then I would have to concatenate the last, incomplete line with the next buffer. And if the last line holds an object that needs more info from the next line, every procedure that reads a next line would need the logic to read a buffer again and get just that line... Too much code change.

I will do it, maybe in a month or two, but right now, I just can't afford it.

My structure is something like:

OBJECT1="NAME1",PROP_A=1,PROP_B=232,PROP_C="YES",PROP_D={SUB_PROP_A=1,SUB_PROP_B="NEXT",SUB_PROP_C={
SUB_SUB_PROP_A="MAYBE"}}

And to parse each property I need to check whether it's text ("..") or just a number that ends with ',' or '}'... So, lots of Pos and Delete and Copy...
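A rough sketch of one alternative to repeated Pos/Delete/Copy: walk the line once with an index and do a single Copy per token, so the remainder of the string is never shifted by Delete. This is not the poster's code. The function NextPair and its parameters are hypothetical names, and the handling of nested '{' blocks is deliberately simplified.

```pascal
program ScanDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: reads one NAME=VALUE pair starting at index P
// and advances P past it. Returns False at the end of the line.
function NextPair(const S: string; var P: Integer;
  out Name, Value: string): Boolean;
var
  Start, Len: Integer;
begin
  Len := Length(S);
  // Skip separators and closing braces left over from the previous pair.
  while (P <= Len) and ((S[P] = ',') or (S[P] = '}')) do
    Inc(P);
  Result := P <= Len;
  if not Result then
    Exit;

  // The name runs up to the '=' sign.
  Start := P;
  while (P <= Len) and (S[P] <> '=') do
    Inc(P);
  Name := Copy(S, Start, P - Start);
  Inc(P); // skip '='

  if (P <= Len) and (S[P] = '"') then
  begin
    // Quoted text: take everything up to the closing quote.
    Inc(P);
    Start := P;
    while (P <= Len) and (S[P] <> '"') do
      Inc(P);
    Value := Copy(S, Start, P - Start);
    Inc(P); // skip the closing quote
  end
  else if (P <= Len) and (S[P] = '{') then
  begin
    // Nested block: report it; the sub-properties are returned by later calls.
    Value := '{';
    Inc(P);
  end
  else
  begin
    // Number: ends at ',' or '}' or the end of the line.
    Start := P;
    while (P <= Len) and (S[P] <> ',') and (S[P] <> '}') do
      Inc(P);
    Value := Copy(S, Start, P - Start);
  end;
end;

var
  Line, Name, Value: string;
  P: Integer;
begin
  Line := 'OBJECT1="NAME1",PROP_A=1,PROP_C="YES",PROP_D={SUB_PROP_A=1}';
  P := 1;
  while NextPair(Line, P, Name, Value) do
    Writeln(Name, ' -> ', Value);
end.
```

Whether this beats the existing code would have to be measured with the profiler; the idea is only that each Delete moves the rest of the string in memory, while an index does not.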

So, I thought maybe with some assembler code for these functions I could make it a bit faster.

If not, no problems.


Thanx anyway
 
SetTextBuf is not something that would require you to perform any additional processing. It just takes the memory you supply and plugs it into the code behind the "Text" file type. You still Readln and Writeln your data the same way.

It's a two-line code addition.
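A minimal sketch of what that addition might look like, assuming a plain TextFile read loop. The buffer size and the use of ParamStr(1) for the file name are illustrative choices, not taken from the poster's program.

```pascal
program SetTextBufDemo;
{$APPTYPE CONSOLE}
var
  F: TextFile;
  // Line 1 of the addition: a 64 KB buffer. It is global so that it
  // stays valid for as long as the file is open.
  Buf: array[0..65535] of Byte;
  Line: string;
  Count: Integer;
begin
  AssignFile(F, ParamStr(1)); // file name taken from the command line
  SetTextBuf(F, Buf);         // line 2 of the addition: call before Reset
  Reset(F);
  try
    Count := 0;
    while not Eof(F) do
    begin
      Readln(F, Line);        // unchanged: still one line per call
      Inc(Count);
    end;
  finally
    CloseFile(F);
  end;
  Writeln(Count, ' lines read');
end.
```

If the buffer were declared as a local variable instead, it would have to stay in scope until CloseFile is called.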
 
OK, now I see what you meant. I'm not sure how SetTextBuf would work...

For example I have 4 line file:

OBJECT1=...
OBJECT2=...
PROP_F={..}
OBJECT3=...

This is how I do it now:

...
Readln(file,line);
ProcessObjects(line)
{
ProcessLine;
If LastCharInLine='{' then Readln(file,next_file);
ProcessNextLine;
}

...

So, if I read more than 1 line at a time, I would need to change how this code:

If LastCharInLine='{' then Readln(file,next_file);

works, since the next line could be 1000 lines from the current one if I change the buffer to read more than one line.


If I understand what SetTextBuf does... it reads more than 1 line per Readln, depending on the size.

Or am I wrong?
 
No, Readln still works line by line. SetTextBuf just reduces how often the disk or other I/O device is accessed.
IF is the slowest comparison operation. Try Case instead.

Steven
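To illustrate the point about Readln, here is a small self-contained sketch that writes a four-line sample file and reads it back through a 16 KB buffer, using the same "read the next line when the current one ends with '{'" pattern. The file name sample.txt and the sample contents are made up for the demonstration.

```pascal
program LookaheadDemo;
{$APPTYPE CONSOLE}
var
  F: TextFile;
  Buf: array[0..16383] of Byte; // 16 KB read buffer
  Line, NextLine: string;
begin
  // Create a small sample file so the sketch can run on its own.
  AssignFile(F, 'sample.txt');
  Rewrite(F);
  Writeln(F, 'OBJECT1=1');
  Writeln(F, 'OBJECT2={');
  Writeln(F, 'PROP_F=2}');
  Writeln(F, 'OBJECT3=3');
  CloseFile(F);

  // Read it back with the larger buffer installed.
  AssignFile(F, 'sample.txt');
  SetTextBuf(F, Buf);
  Reset(F);
  try
    while not Eof(F) do
    begin
      Readln(F, Line);
      if (Line <> '') and (Line[Length(Line)] = '{') and not Eof(F) then
      begin
        // Even though the whole file is already in the buffer,
        // this returns exactly the next line and nothing more.
        Readln(F, NextLine);
        Writeln(Line, ' + ', NextLine);
      end
      else
        Writeln(Line);
    end;
  finally
    CloseFile(F);
  end;
end.
```

The buffer only changes how much is fetched from disk at once, so the existing per-line logic should not need to change.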
 
I think I get it... I'll try with SetTextBuf and let you know how it goes.

Hm... I didn't know I could use Case for non-numeric expressions:

but this works:

Case line[Length(Line)] of
'{':...;
'}':...;
end;

So Case is faster than IF??? I'm sure I have 1000 If statements...
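A slightly fuller sketch of that Case on the last character, with a guard for empty lines, since indexing Line[Length(Line)] on an empty string is invalid. The function name Classify and its return strings are hypothetical.

```pascal
program CaseDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: describes a line by its last character.
function Classify(const Line: string): string;
begin
  Result := 'plain';
  if Line = '' then
    Exit; // nothing to index in an empty string
  case Line[Length(Line)] of
    '{': Result := 'opens block';
    '}': Result := 'closes block';
  end;
end;

begin
  Writeln(Classify('PROP_F={'));
  Writeln(Classify('OBJECT1=1'));
  Writeln(Classify(''));
end.
```

With only two or three branches, any speed difference between Case and a chain of If statements is likely to be small, so it is worth confirming with the profiler before rewriting many of them.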

Thanx
 
"I think I get it... I'll try with SetTextBuf and let you know how it goes."

I haven't gotten the chance to look at the stuff I have for string-handling code, but I will point out that there was an example in the link I posted above.
 
Yes, it is faster with a 16K buffer, but not by much.

At the moment I can't restart my PC, so reading 1 GB with Readln takes almost 120s, and with the 16K buffer almost 80s. A bit faster, but not by much.

Processing the 1 GB file went down from 15 minutes to 14 minutes.

I'm still looking for faster functions, and I found something for the Pos function:


I haven't had a chance to test it yet.
 
Perhaps if you post some of your string handling code we can point out any tips to speed that section up. You can then apply our recommendations to the rest of your string handling code.
 
As for string handling, I now use the PosShaAsm4 function from the FastCodeProject. It is faster than Delphi's Pos function.

But that's it. I tried to find faster Delete and Copy functions, with no luck.

Then I found out about FastMM, and it makes a huge difference, since I do a lot of memory handling.

So, my string handling functions are not so much of an issue any more, since my application runs 50% faster with FastMM.
FastMM only helps if you do a lot of memory processing; otherwise it has no effect on performance.
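For readers who have not used a replacement memory manager: a sketch of how it is typically enabled, assuming the FastMM4 release, whose unit is named FastMM4 and is expected to be the first unit in the project's uses clause. The project name is hypothetical, and the poster's actual setup is not shown in the thread.

```pascal
program MyApp; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  FastMM4,   // assumed FastMM4 unit; listed first so it is installed
             // before any other unit allocates memory
  SysUtils;

begin
  // Existing application code stays unchanged. String and object
  // allocations now go through the replacement memory manager.
  Writeln('Running with a replacement memory manager');
end.
```

This only compiles with the FastMM4 source on the project's search path.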
 
Great find. The only catch with using third-party stuff is that you're subject to the license agreement...

Is that something you want? Or, if you are doing this for hire, is it something the company wants? If you work for a company, is the company aware of this, and is its legal department prepared to handle the use of this code?

See the license on FastMM.
 
The company I'm working for has all the info about the license agreements for everything I use in development, from VCL components to FastMM.

The only thing is, all license fees come out of their budget for my development. So, effectively, out of my pocket.
 