
check-point 1

Status
Not open for further replies.

rangr1 (Technical User, US), Jun 18, 2009
I am having trouble creating a checkpoint, i.e. restarting a program from where it left off if it is suddenly killed.

The program writes its output to a file called "xyz.out", which holds x y z values (double-precision reals). At the end of each output "data block" (5 xyz lines), I write a check line of the form 65e10 65e10 1000 (say), where 65e10 is a large marker number and 1000 is the current iteration index.

Each time I write a "data block" to xyz.out, I also write a file called "run.out", which holds a namelist with the iteration number and other details needed to restart the program. On restart I compare the iteration number in run.out with the one in xyz.out. If they are the same, I can restart my program.

To illustrate,
xyz.out has,
.
.
.
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
650000000000 65000000000 15.000000000

and run.out has

Iteration=15.000000

Now the problem is that when I suddenly kill my program, it is usually in the middle of writing either run.out or xyz.out. Also, the iteration number in run.out is far ahead of the one in xyz.out, even though xyz.out is written first in the program sequence. I need both iteration numbers to match in order to restart my program successfully.

That is, what happens is:

xyz.out has,
.
.
650000000000 65000000000 14.000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000
1.0000000000 2.000000000 3.0000000000

and run.out has

Iteration=15.000000

I can backspace up to 14.000000000 and start the program from there, but the file run.out has "Iteration=15.000000", which creates a conflict.

I tried using flush, but to no avail. It would be a great help to me if there is a simple solution to this problem.

I use the gfortran compiler and my code is written in F90.
 
Try writing the iteration number in xyz.out, and write run.out only after you have written xyz.out.

1) If run.out matches xyz.out, you can restart: it is a complete iteration.
2) If run.out is less than xyz.out, you need to backspace xyz.out to where run.out is.
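
The two cases above can be sketched in Fortran as follows. This is only an illustration under stated assumptions: the file name xyz_demo.out, the helper write_block, and the hard-coded run_iter (standing in for the value that would be read from run.out) are all made up for the demo, and the marker value 65e10 is taken from the original post. The sketch scans for the last check line whose iteration does not exceed run_iter and then truncates everything after it with endfile.

```fortran
program find_restart_point
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: marker = 65.0e10_dp  ! check value from the original post
  integer, parameter :: run_iter = 15         ! assumed: would be read from run.out
  real(dp) :: x, y, z
  integer :: u, ios, i, nlines, keep_lines, last_iter

  ! Demo input only: two complete blocks, then one cut off before its check line.
  open(newunit=u, file='xyz_demo.out', status='replace', action='write')
  call write_block(u, 14, 5, .true.)
  call write_block(u, 15, 5, .true.)
  call write_block(u, 16, 3, .false.)
  close(u)

  ! Pass 1: find the last check line whose iteration is <= run_iter.
  nlines = 0
  keep_lines = 0
  last_iter = 0
  open(newunit=u, file='xyz_demo.out', status='old', action='read')
  do
     read(u, *, iostat=ios) x, y, z
     if (ios /= 0) exit
     nlines = nlines + 1
     if (abs(x - marker) < 1.0_dp .and. abs(y - marker) < 1.0_dp) then
        if (nint(z) <= run_iter) then
           keep_lines = nlines
           last_iter = nint(z)
        end if
     end if
  end do
  close(u)

  ! Pass 2: skip the lines to keep, then truncate the rest of the file.
  open(newunit=u, file='xyz_demo.out', status='old', action='readwrite')
  do i = 1, keep_lines
     read(u, '(a)')
  end do
  endfile(u)
  close(u)

  print '(a,i0)', 'restart from iteration ', last_iter
  print '(a,i0)', 'lines kept in xyz_demo.out: ', keep_lines

contains

  ! Hypothetical helper: writes npts data lines and, optionally, the check line.
  subroutine write_block(unit, iter, npts, complete)
    integer, intent(in) :: unit, iter, npts
    logical, intent(in) :: complete
    integer :: k
    do k = 1, npts
       write(unit, '(3es24.15)') 1.0_dp, 2.0_dp, 3.0_dp
    end do
    if (complete) write(unit, '(3es24.15)') marker, marker, real(iter, dp)
  end subroutine write_block

end program find_restart_point
```

With the demo data this keeps the first 12 lines and reports iteration 15. After truncation the file could be reopened with position='append' to continue writing. A line cut off in the middle of a number is not handled here and would need extra checking.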
 
Exactly. That's the logic I followed, and my program already does both of these things.

My problem is that the iteration number in run.out is ahead of the number in xyz.out.

This is because xyz.out is buffered (and the buffered output is lost when I kill the program suddenly), whereas run.out is opened and closed each time, overwriting the previous contents with the new ones. Although the program writes xyz.out first and then run.out, run.out in effect reaches the disk sooner than xyz.out.

Often the program gets terminated in the middle of writing run.out.

xwb, thanks for understanding my problem.
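
One possible way to attack both symptoms described above is sketched here; it is not something proposed in the thread. The idea is to flush xyz.out after each complete block, then write the namelist to a temporary file and rename it over run.out, so that run.out is never seen half-written. The file names, the integer variable iteration, and the namelist group name restart are assumptions made for the demo. rename is a GNU extension in gfortran, not standard Fortran, and flush only hands data to the operating system, so this is a sketch rather than a guaranteed fix (the original poster reports that flush alone did not help).

```fortran
program checkpoint_write
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: marker = 65.0e10_dp  ! check value from the original post
  integer :: uxyz, urun, iter, i, stat
  integer :: iteration                         ! assumed namelist content
  namelist /restart/ iteration

  open(newunit=uxyz, file='xyz_demo.out', status='replace', action='write')

  do iter = 1, 3
     ! Write one data block followed by its check line.
     do i = 1, 5
        write(uxyz, '(3es24.15)') 1.0_dp, 2.0_dp, 3.0_dp
     end do
     write(uxyz, '(3es24.15)') marker, marker, real(iter, dp)

     ! Push the finished block out of the Fortran buffer before touching run.out.
     flush(uxyz)

     ! Write the restart namelist to a temporary file first ...
     iteration = iter
     open(newunit=urun, file='run_demo.tmp', status='replace', action='write')
     write(urun, nml=restart)
     close(urun)

     ! ... then move it into place in one step (GNU extension in gfortran).
     call rename('run_demo.tmp', 'run_demo.out', stat)
     if (stat /= 0) stop 'rename failed'
  end do

  close(uxyz)
  print '(a,i0)', 'last checkpoint written for iteration ', iteration
end program checkpoint_write
```

Because run_demo.out is only ever replaced by a fully written file, a kill during the namelist write would leave the previous checkpoint in place, and that checkpoint refers to a block that was already flushed.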
 
I don't know gfortran that well; I know IVF better. Does gfortran have an equivalent of kbhit? It checks whether the user has entered anything on the keyboard. I couldn't find anything like that in the online manual.

If there is such a routine, then what you could do is
Code:
! kbhit() is a placeholder: it is not a gfortran intrinsic
do while (.not. kbhit())
   ! do calcs
   ! write out files
end do
! close files
stop
Only check whether the user has asked the program to abort after all the files have been written out. This allows the program to terminate cleanly instead of aborting.
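
Since no kbhit equivalent was found for gfortran, one substitute that uses only standard Fortran is to poll for a "stop file" at the end of each iteration. This is a different technique from kbhit, offered as a sketch: the file name STOP_demo is made up, and the demo creates the file itself at iteration 3 so that the program terminates on its own. In real use the file would be created from another shell instead of killing the program.

```fortran
program stop_file_loop
  implicit none
  character(len=*), parameter :: stop_file = 'STOP_demo'  ! hypothetical name
  integer :: iter, u
  logical :: stop_requested

  iter = 0
  do
     iter = iter + 1

     ! ... do calcs and write out all files for this iteration here ...

     ! Demo only: pretend the user asked for a stop during iteration 3.
     if (iter == 3) then
        open(newunit=u, file=stop_file, status='replace', action='write')
        close(u)
     end if

     ! Check for the stop request only after the files have been written.
     inquire(file=stop_file, exist=stop_requested)
     if (stop_requested) exit
  end do

  ! Remove the stop file so that the next run is not stopped immediately.
  open(newunit=u, file=stop_file, status='old')
  close(u, status='delete')

  print '(a,i0)', 'stopped cleanly after iteration ', iter
end program stop_file_loop
```

The check sits at the same place in the loop as the kbhit test above, so the program only exits between complete iterations.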
 