Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Another efficiency issue 2

Status
Not open for further replies.

JEichhorn

Programmer
Oct 21, 2014
10
DE
Is it possible to re-code the following in a more efficient way? The code performs a so-called red-black SOR to solve a Poisson equation, dividing an iterative step into two half-steps with the second using updated neighbored values of the first.

Code:
DO ijswap = 0,1
      DO k = 2,n-2
      DO j = 2,m-2
      DO i = 2,l-2
         IF (MOD((i+j+k),2) .EQ. ijswap) THEN ! 
            compute some stuff depending on i,j,k
         END IF 
      END DO
      END DO
      END DO
   END DO ! ijswap

I'm know that best would be to switch to a state-of-the-art solver, but i need to stay with this one for maintenance reasons. I'm concerned about the IF block again, since this is called within a 3d loop which is performed a several 1000 times during each run of my program. The solver consumes about 2/3 of the code's CPU time.
 
The mod operation uses division. Try IAND(i+j+k,1). Should be a lot cheaper.
 
Thanks for the hint. I've tried it out, but didn't experience a performance gain. Maybe the compiler has already optimized this?
 
I've just run 2 little test cases with a 2000³ loop, once computing MOD((i+j+k),2), once IAND((i+j+k),1). The former took 20.8 sec, tha latter 20.7. So there is no time to save in this place :)
 
Presently, the MOD operation and IF are being evaluated i*j*k times; but, if you take them out of the inner-most loop, it could reduce the number of MOD-calls and IF-evaluations down to j*k.

What I am thinking is that the stuff inside the inner-most loop is already only being done every other value of i...but once you know j and k before entering the i loop, you KNOW which values of i will yield an even or odd number and, so, I am thinking of something along the following lines:
Code:
DO ijswap = 0,1
    DO k = 2,n-2
    DO j = 2,m-2
        if (MOD((j+k),2) .EQ. ijswap) then
            istart = 3
        else
            istart = 2
        end if
        DO i = istart,l-2,2
            compute some stuff depending on i,j,k
        END DO
    END DO
    END DO
END DO ! ijswap
I just wrote this here and have not compiled it or made sure it does "some stuff" for the correct indexes or not...so, you check it and make sure it works and/or that it is not backwards.

Hope this helps.

 
If the above alone does not do the job, maybe you can split the ijswap loop into two...one for 0 and one for 1; or, you can add an outer " IF (ijswap.eq.0) " to the one proposed above, to correctly set istart for both cases.
 
Thanks for your help, splitting the IF helped a lot. My test case took 599 sec CPU time before i started to tune the code. Replacing most divisions by multiplications reduced that to ~570 sec. Following your suggestion gave a further massive improvement down to 528 sec.
 
Oh, good.

Say, I read the original post and it mentions something about the second iteration using updated neighboring values...what exactly is this all about? are you talking about actual array entries? I mean, do the values being updated depend only on values that are NOT being updated during the same iteration? If so, you may be able to further "parallelize" or at least open the door for parallel processing with a FORALL or an openmp directive.

 
The routine is part of a numerical flow model, it solves a Poisson equation for pressure. The whole code was designed back in the 90s and still is used by most of my clients on single processor machines (Desktop PC).

The stuff inside the innermost loop looks like this:

Code:
pneu     = (- (p(i+1,j,k)*xpd(i,j,k) + p(i-1,j,k)*xpd(i-1,j,k))*dxi(i)  &
            - (p(i,j+1,k)*ypd(i,j,k) + p(i,j-1,k)*ypd(i,j-1,k))*dyi(j)  &
            - (p(i,j,k+1)*zpd(i,j,k) + p(i,j,k-1)*zpd(i,j,k-1))*dzi(k)  &
            + rs(i,j,k)) &
            * quot(i,j,k)
palt     = p(i,j,k)
p(i,j,k) = ijkxyz(i,j,k)*pneu*omega + palt*(1.e0-omega)

As you guessed, the new values p(i,j,k) depend on their neighbors only. xpd, ypd, zpd are not modified during the iteration, also rs, quot and ijkxyz are fixed arrays.
 
Say, I have not had much opportunity to use FORALL, so, I was wondering if you could give it a try and post some performance feedback.

For the case of your three nested do-loops, it would look something like this:

Code:
do ijswap=0,1
    forall(k=2:n-2, j=2:m-2, i=2:l-2, mod(i+j+k,2).eq.ijswap) p(i,j,k) = something
end do

...pretty elegant and rather concise, if you ask me. What is your opinion about this construct? do you think of it as clear or obfuscated?

Here it is after adding your assignment:

Code:
do ijswap=0,1
    forall(k=2:n-2, j=2:m-2, i=2:l-2, mod(i+j+k,2).eq.ijswap) &
        p(i,j,k) = (1.e0-omega)*p(i,j,k) + &
                   omega*ijkxyz(i,j,k)*(- (p(i+1,j,k)*xpd(i,j,k) + p(i-1,j,k)*xpd(i-1,j,k))*dxi(i)  &
                                        - (p(i,j+1,k)*ypd(i,j,k) + p(i,j-1,k)*ypd(i,j-1,k))*dyi(j)  &
                                        - (p(i,j,k+1)*zpd(i,j,k) + p(i,j,k-1)*zpd(i,j,k-1))*dzi(k)  &
                                        + rs(i,j,k))
end do

Granted, this puts the call to the MOD function back into being evaluated (i*j*k) times; but I wonder if there is an alternate benefit to using FORALL. As it often happens, payback does not kick in until a problem is large enough; by the way, in your problem, what are the values for l,m,n?

If you would care to give it a try, I would like to hear your thoughts and test (timing) results.

Cheers,

gsal
 
The forall construct appears to consume ~5% more CPU time than the 'splitted IF'. For this case, (l,m,n) = (105,105,43). Next i will compare both versions for a configuration with (l,m,n) = (750,500,23) which rewuired ~7 hours CPU time without any tuning of the program.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top