
URGENT HELP Required regarding the use of FORK


VZAKUMAR

Programmer
Oct 5, 2001
2
US
I wanted one of the UNIX Gurus to help me resolve my problem.

I have a file with around 5 million records (50 lakhs). My original process was taking around 30 hours to read the complete file, process each record, and write the results to another file. We do a lot of calculations for each and every record, which is why it takes that long.

Now I plan to implement PARALLEL processing in my program. I am dividing the complete input file into 5 chunks (1 million records each) and giving each chunk to its own child process. Each child process will process its chunk and write the results to its own temporary file. Finally, in the parent process, I plan to merge all the temporary files together. I believe this will save a lot of processing time.
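
Roughly, the structure I have in mind looks like this (a simplified sketch only: the chunk file names are made up and process_chunk() is just a placeholder for my real per-record calculations):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NCHILD 5

/* Placeholder for the real per-record calculations: read one chunk
 * file and write the processed records to a temporary output file. */
static void process_chunk(const char *in, const char *out)
{
    FILE *fi = fopen(in, "r");
    FILE *fo = fopen(out, "w");
    char line[1024];

    if (fi == NULL || fo == NULL)
        _exit(1);
    while (fgets(line, sizeof(line), fi) != NULL)
        fputs(line, fo);              /* real calculations would go here */
    fclose(fi);
    fclose(fo);
}

int main(void)
{
    char in[64], out[64], buf[8192];
    size_t n;
    FILE *part, *final;
    pid_t pid;
    int i;

    /* fork one child per chunk; each child handles only its own chunk */
    for (i = 0; i < NCHILD; i++) {
        pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {                       /* child process */
            sprintf(in,  "chunk%d.dat",  i);
            sprintf(out, "result%d.tmp", i);
            process_chunk(in, out);
            _exit(0);
        }
    }

    /* parent: wait for all the children to finish */
    while (wait(NULL) > 0)
        ;

    /* parent: append each temporary file to the final output in order */
    final = fopen("final.out", "w");
    if (final == NULL) {
        perror("fopen");
        exit(1);
    }
    for (i = 0; i < NCHILD; i++) {
        sprintf(out, "result%d.tmp", i);
        part = fopen(out, "r");
        if (part == NULL)
            continue;
        while ((n = fread(buf, 1, sizeof(buf), part)) > 0)
            fwrite(buf, 1, n, final);
        fclose(part);
    }
    fclose(final);
    return 0;
}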

What I basically want to know is: what are the side effects of using fork in C programs? Are there any system-level impacts of using fork? Is there any system call to merge multiple files into ONE?

Can someone briefly advise how I should proceed with this logic? I have already written it, but I want to cross-check whether there is something I am missing.

Thanks,
 
Hi,
What system are you using? Some systems support threads, so you don't need to use fork at all: you can create threads instead (pthread_create on POSIX systems, CreateThread on Windows). Threads behave like separate processes but share the same data space, so task switching between threads is easier on the system than switching between processes, which is an expensive operation.
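
As a very rough sketch, assuming POSIX threads are available on your system (the worker function and the chunk numbering here are only placeholders for the real per-record work):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define NTHREADS 5

/* Placeholder worker: in the real program this would read one chunk
 * of the input and do the per-record calculations on it. */
static void *worker(void *arg)
{
    int chunk = *(int *)arg;

    printf("processing chunk %d\n", chunk);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, worker, &ids[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }

    /* wait for every worker thread to finish before merging results */
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    return 0;
}

Because the threads share one address space, you need to make sure the per-record work is thread-safe, but you could also collect the results in shared data structures instead of temporary files.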

 
There's got to be a point at which it will be more efficient (faster) for you to use a database to store your processed records. I suspect that, with 5 million rows, you passed that point a while ago.

I'm sure that you'll get a performance benefit from processing the input file in parallel - but you'll lose quite a bit of it if you have to merge the separate output files afterwards.

Have you considered using one of the free databases (MySQL springs to mind - simple and fast; PostgreSQL is also well thought of) to store the processed rows? You could process the input file in parallel (as you're planning) and store the results in a single table in the database - you could then extract the data from that table in whatever order you wanted (there's a rough sketch of the idea below).

Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
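
PS - just to give a flavour of it, pushing each processed record straight into MySQL from C might look roughly like this (the connection details, table and column names are all made up for illustration - see the MySQL C API documentation for the real details):

#include <stdio.h>
#include <mysql.h>

/* One INSERT per processed record; "processed", "id" and "value" are
 * invented names, not anything from your schema. */
static int store_record(MYSQL *conn, long id, double value)
{
    char query[256];

    sprintf(query,
            "INSERT INTO processed (id, value) VALUES (%ld, %f)",
            id, value);
    return mysql_query(conn, query);   /* returns 0 on success */
}

int main(void)
{
    MYSQL *conn = mysql_init(NULL);

    if (mysql_real_connect(conn, "localhost", "user", "password",
                           "mydb", 0, NULL, 0) == NULL) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        return 1;
    }
    store_record(conn, 1L, 3.14);      /* called once per processed record */
    mysql_close(conn);
    return 0;
}

You'd link against the MySQL client library (something like -lmysqlclient), and each child process could hold its own connection, so they can all insert into the same table at the same time.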
 
If you don't have SMP (more than one processor), you won't benefit from partitioning!
I hope it works...
Unix was made by and for smart people.
 