How to parallelize a counter?

lesaadmi · Apr 13, 2017

Hello everybody,

I have the following piece of Fortran code:

cont=0
do kk=1,ncz
do jj=1,ncy
do ii=1,ncx
cont=cont+1
grdtgrvmsft(cont)=grdtgrvmsft(cont)+(derdensi(jj,ii,kk)*(1.0d0/(gravstd**2.0d0))*misfit)
enddo
enddo
enddo

I have been trying to parallelize it but I have not been able to. So far I have done as follows but without good results:

cont=0
!$OMP PARALLEL PRIVATE(ii,jj,kk,cont), &
!$OMP SHARED(derdensi)
!$OMP DO REDUCTION (+:grdtgrvmsft)
do kk=1,ncz
do jj=1,ncy
do ii=1,ncx
!$OMP ATOMIC UPDATE
cont=cont+1
grdtgrvmsft(cont)=grdtgrvmsft(cont)+(derdensi(jj,ii,kk)*(1.0d0/(gravstd**2.0d0))*misfit)
enddo
enddo
enddo
!$OMP END DO
!$OMP END PARALLEL

Does anyone have advice on how to correct this?

salgerman · Apr 13, 2017

I think what you need to do is calculate the "counter" as a function of the loop variables, so that it can be computed at any time, independent of the order in which the loops are carried out (in parallel). Something like:

cont = ii + ncx*(jj-1) + ncx*ncy*(kk-1)
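
To make the suggestion concrete, here is a minimal sketch of the loop with the counter computed from the indices. The array sizes and the values of gravstd and misfit are placeholders chosen for illustration, not taken from the original program. Because each (ii,jj,kk) combination maps to a different cont, every iteration writes to its own element of grdtgrvmsft, so under these assumptions neither ATOMIC nor REDUCTION should be needed. cont must be private, and it is assigned inside the loop rather than incremented.

```fortran
program parallel_counter
  implicit none
  ! Hypothetical sizes and constants, used only for this sketch.
  integer, parameter :: ncx = 4, ncy = 3, ncz = 2
  double precision, parameter :: gravstd = 2.0d0, misfit = 0.5d0
  double precision :: derdensi(ncy, ncx, ncz)
  double precision :: grdtgrvmsft(ncx*ncy*ncz), reference(ncx*ncy*ncz)
  integer :: ii, jj, kk, cont

  call random_number(derdensi)
  grdtgrvmsft = 0.0d0
  reference = 0.0d0

  ! Serial version with the running counter, kept as a reference result.
  cont = 0
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = cont + 1
        reference(cont) = reference(cont) + derdensi(jj,ii,kk)*(1.0d0/gravstd**2.0d0)*misfit
      end do
    end do
  end do

  ! Parallel version: cont is private and recomputed from the loop indices,
  ! so no iteration depends on any other iteration.
  !$omp parallel do collapse(3) default(shared) private(ii, jj, kk, cont)
  do kk = 1, ncz
    do jj = 1, ncy
      do ii = 1, ncx
        cont = ii + ncx*(jj-1) + ncx*ncy*(kk-1)
        grdtgrvmsft(cont) = grdtgrvmsft(cont) + derdensi(jj,ii,kk)*(1.0d0/gravstd**2.0d0)*misfit
      end do
    end do
  end do
  !$omp end parallel do

  ! Compare against the serial reference with a small tolerance.
  if (maxval(abs(grdtgrvmsft - reference)) < 1.0d-12) then
    print *, 'parallel and serial results match'
  else
    print *, 'MISMATCH between parallel and serial results'
  end if
end program parallel_counter
```

With gfortran this can be built with the -fopenmp flag. Without that flag the directives are treated as comments and the program simply runs serially.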

mitch grunes · May 21, 2017

Sigh - this would be so easy in APL.

How about an implied do loop array?

(/(i, i=1, n)/)

Then RESHAPE it...

If the compiler is good enough, it would optimize that with an array operation.

I'll let you figure out the details.

It's even possible that the most modern Fortran standards already have a single function that does it. I haven't kept up.

Do not use loops if you want to parallelize it - the compiler might not be that smart.
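
As a rough sketch of the implied do and RESHAPE idea, applied to the arrays from the original question: the sizes and constants below are placeholders, and the names flat_do and flat_reshape are made up for this example. Both forms flatten derdensi(jj,ii,kk) in the same order as the running counter (ii fastest, then jj, then kk), after which the update becomes a single whole-array statement. Whether a given compiler parallelizes or vectorizes that statement is not guaranteed.

```fortran
program flatten_without_loops
  implicit none
  ! Hypothetical sizes and constants, used only for this sketch.
  integer, parameter :: ncx = 4, ncy = 3, ncz = 2
  integer, parameter :: n = ncx*ncy*ncz
  double precision, parameter :: gravstd = 2.0d0, misfit = 0.5d0
  double precision :: derdensi(ncy, ncx, ncz)
  double precision :: flat_do(n), flat_reshape(n), grdtgrvmsft(n)
  integer :: ii, jj, kk

  call random_number(derdensi)
  grdtgrvmsft = 0.0d0

  ! Nested implied do: ii varies fastest, then jj, then kk,
  ! which is the order in which the original counter advances.
  flat_do = [(((derdensi(jj,ii,kk), ii = 1, ncx), jj = 1, ncy), kk = 1, ncz)]

  ! The inner RESHAPE with ORDER swaps the first two subscripts,
  ! the outer RESHAPE flattens the result in storage order.
  flat_reshape = reshape(reshape(derdensi, [ncx, ncy, ncz], order=[2, 1, 3]), [n])

  ! Whole-array update: no explicit loop and no counter.
  grdtgrvmsft = grdtgrvmsft + flat_reshape*(1.0d0/gravstd**2.0d0)*misfit

  print *, 'implied do and RESHAPE agree: ', all(flat_do == flat_reshape)
end program flatten_without_loops
```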

mitch grunes · May 21, 2017

One minor note: if you can, you may want to reorder the subscripts of your arrays so that the first two subscripts are reversed, because the implied do loop would otherwise put things in the wrong order for what you want. In particular, storage order increments the first index first (i.e., incrementing the first index takes you to the next location in memory, which is where that implied do loop would take the next element), then the second subscript, then the third.

In addition, memory caching works best if you work in storage order - i.e. where possible the innermost loop should increment the first index, and the outermost loop should increment the last index. And many CPUs can execute multiple operations at once if you work in storage order. So that improves efficiency even without parallel execution - though you could have gotten that just by switching the do loop order...
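
A small illustration of the storage-order point, using a made-up array a and made-up sizes: with the first subscript in the innermost loop, consecutive iterations touch consecutive memory locations, so a running counter and the flattened array line up directly.

```fortran
program storage_order
  implicit none
  ! Hypothetical array and sizes, used only for this sketch.
  integer, parameter :: n1 = 3, n2 = 4, n3 = 2
  double precision :: a(n1, n2, n3)
  integer :: i1, i2, i3, pos, k

  ! Outermost loop over the last subscript, innermost over the first:
  ! each iteration writes the next location in memory.
  pos = 0
  do i3 = 1, n3
    do i2 = 1, n2
      do i1 = 1, n1
        pos = pos + 1
        a(i1, i2, i3) = dble(pos)
      end do
    end do
  end do

  ! Flattening in storage order is compared with the sequence 1, 2, ..., n1*n2*n3.
  print *, 'loop order matches storage order: ', &
           all(reshape(a, [n1*n2*n3]) == [(dble(k), k = 1, n1*n2*n3)])
end program storage_order
```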

Hope that helps.
