How to parallelize a counter?

Status
Not open for further replies.

lesaadmi

Technical User
Aug 29, 2012
11
MX
Hello everybody,

I have the following piece of fortran code:

cont=0
do kk=1,ncz
do jj=1,ncy
do ii=1,ncx
cont=cont+1
grdtgrvmsft(cont)=grdtgrvmsft(cont)+(derdensi(jj,ii,kk)*(1.0d0/(gravstd**2.0d0))*misfit)
enddo
enddo
enddo

I have been trying to parallelize it but I have not been able to. So far I have done as follows but without good results:

cont=0
!$OMP PARALLEL PRIVATE(ii,jj,kk,cont), &
!$OMP SHARED(derdensi)
!$OMP DO REDUCTION (+:grdtgrvmsft)
do kk=1,ncz
do jj=1,ncy
do ii=1,ncx
!$OMP ATOMIC UPDATE
cont=cont+1
grdtgrvmsft(cont)=grdtgrvmsft(cont)+(derdensi(jj,ii,kk)*(1.0d0/(gravstd)**2.0d0)*misfit)
enddo
enddo
enddo
!$OMP END DO
!$OMP END PARALLEL

Does anyone have advice on how to correct this?
 
I think what you need to do is calculate the "counter" as a function of the loop variables, so that it can be computed at any time, independent of the order in which the loops are carried out (in parallel). Something like:

cont = ii + ncx*(jj-1) + ncx*ncy*(kk-1)
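To make that concrete, here is a minimal sketch of how the loop from the question might look with the computed index. The array sizes, the test values for gravstd and misfit, and the helper variable factor are assumptions made up for illustration; they are not from the original code. Because every iteration writes a different element of grdtgrvmsft, the sketch uses neither REDUCTION nor ATOMIC, and cont is simply private.

```fortran
program parallel_counter
  implicit none
  ! Sizes and constants below are made-up test values (assumptions).
  integer, parameter :: ncx = 4, ncy = 3, ncz = 2
  double precision, parameter :: gravstd = 2.0d0, misfit = 0.5d0
  double precision :: derdensi(ncy, ncx, ncz)
  double precision :: grdtgrvmsft(ncx*ncy*ncz), expected(ncx*ncy*ncz)
  double precision :: factor   ! hypothetical helper: loop-invariant scale
  integer :: ii, jj, kk, cont

  call random_number(derdensi)
  grdtgrvmsft = 0.0d0
  expected = 0.0d0
  factor = misfit/gravstd**2

  ! Serial reference using the running counter from the question.
  cont = 0
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = cont + 1
        expected(cont) = expected(cont) + derdensi(jj, ii, kk)*factor
      end do
    end do
  end do

  ! Parallel version: cont depends only on the loop indices, so the
  ! iterations are independent and may run in any order.
  !$omp parallel do collapse(3) private(cont) &
  !$omp shared(derdensi, grdtgrvmsft, factor)
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = ii + ncx*(jj - 1) + ncx*ncy*(kk - 1)
        grdtgrvmsft(cont) = grdtgrvmsft(cont) + derdensi(jj, ii, kk)*factor
      end do
    end do
  end do
  !$omp end parallel do

  ! Stop with an error if the two versions disagree.
  if (maxval(abs(grdtgrvmsft - expected)) > 1.0d-12) error stop "mismatch"
  print *, "parallel result matches the serial loop"
end program parallel_counter
```

With gfortran this would be built with -fopenmp; without that flag the directives are treated as comments and the program runs serially, which is a handy way to check the indexing on its own.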

 
Sigh - this would be so easy in APL.

How about an implied do loop array constructor?

(/(i, i=1, n)/)

Then RESHAPE it...

If the compiler is good enough, it would optimize that with an array operation.

I'll let you figure out the details.

It's even possible that the most modern FORTRANS already have a single function that does it. I haven't kept up.

Do not use loops if you want to parallelize it - the compiler might not be that smart.
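For what it is worth, here is one way those details might be filled in. This is only a sketch: the sizes, the test constants, and the names factor, by_implied_do and by_reshape are invented for the example. It shows two loop-free forms, a nested implied do inside an array constructor and a RESHAPE with ORDER= that swaps the first two subscripts, and checks both against the original counter loop. Whether a given compiler parallelizes or vectorizes either form is not something the code can promise.

```fortran
program array_counter
  implicit none
  ! Sizes and constants below are made-up test values (assumptions).
  integer, parameter :: ncx = 4, ncy = 3, ncz = 2
  integer, parameter :: n = ncx*ncy*ncz
  double precision, parameter :: gravstd = 2.0d0, misfit = 0.5d0
  double precision :: derdensi(ncy, ncx, ncz)
  double precision :: expected(n), by_implied_do(n), by_reshape(n)
  double precision :: factor
  integer :: ii, jj, kk, cont

  call random_number(derdensi)
  expected = 0.0d0
  by_implied_do = 0.0d0
  by_reshape = 0.0d0
  factor = misfit/gravstd**2

  ! Serial reference using the running counter from the question.
  cont = 0
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = cont + 1
        expected(cont) = expected(cont) + derdensi(jj, ii, kk)*factor
      end do
    end do
  end do

  ! Form 1: nested implied do; the innermost index (ii) varies fastest,
  ! which reproduces the order in which cont was incremented.
  by_implied_do = by_implied_do + factor* &
      [(((derdensi(jj, ii, kk), ii = 1, ncx), jj = 1, ncy), kk = 1, ncz)]

  ! Form 2: the inner RESHAPE swaps the first two subscripts via ORDER=,
  ! the outer one flattens the result in storage order.
  by_reshape = by_reshape + factor*reshape( &
      reshape(derdensi, [ncx, ncy, ncz], order=[2, 1, 3]), [n])

  if (maxval(abs(by_implied_do - expected)) > 1.0d-12) error stop "implied do"
  if (maxval(abs(by_reshape - expected)) > 1.0d-12) error stop "reshape"
  print *, "both array forms match the counter loop"
end program array_counter
```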
 
One minor note: if you can, you may want to reorder the subscripts of your arrays so that the first two are reversed, because otherwise the implied do loop would put things in the wrong order for what you want. In particular, storage order increments the first index first (i.e., incrementing the first index takes you to the next location in memory, which is where that implied do loop would take the next element), then the second subscript, then the third.

In addition, memory caching works best if you work in storage order, i.e. where possible the innermost loop should increment the first index and the outermost loop should increment the last index. Many CPUs can also execute multiple operations at once if you work in storage order. So that improves efficiency even without parallel execution, though you could have gotten that just by switching the do loop order...
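As a small illustration of the storage-order point, the sketch below assumes a hypothetical array derdensi_t that holds the same data with the first two subscripts swapped, i.e. dimensioned (ncx,ncy,ncz). With that layout the flat counter order is exactly the storage order, so a plain RESHAPE with no ORDER= argument is enough. The sizes and the names derdensi_t and flat are made up for the example.

```fortran
program storage_order
  implicit none
  ! Sizes below are made-up test values (assumptions).
  integer, parameter :: ncx = 4, ncy = 3, ncz = 2
  double precision :: derdensi(ncy, ncx, ncz)    ! layout from the question
  double precision :: derdensi_t(ncx, ncy, ncz)  ! hypothetical reordered layout
  double precision :: flat(ncx*ncy*ncz)
  integer :: ii, jj, kk, cont

  call random_number(derdensi)

  ! Build the reordered copy; jj is innermost so derdensi is read
  ! in storage order.
  do kk = 1, ncz
    do ii = 1, ncx
      do jj = 1, ncy
        derdensi_t(ii, jj, kk) = derdensi(jj, ii, kk)
      end do
    end do
  end do

  ! Storage order of derdensi_t is ii fastest, then jj, then kk,
  ! which is the order the counter cont runs through.
  flat = reshape(derdensi_t, [ncx*ncy*ncz])

  ! Check against the index formula for cont.
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = ii + ncx*(jj - 1) + ncx*ncy*(kk - 1)
        if (flat(cont) /= derdensi(jj, ii, kk)) error stop "order mismatch"
      end do
    end do
  end do
  print *, "flat storage order matches the counter order"
end program storage_order
```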

Hope that helps.
 