Increasing double loop efficiency

Vahid9 · May 22, 2023

I have the following double loop:

Code:

DO i=1,100
  DO j=1,100
    S(i,j) = S(i,j) + ALPHA*exp((real(i)/b1)**2)*exp((real(j)/b2)**2)
  ENDDO
ENDDO

Here, S is a 100x100 symmetric REAL array, and ALPHA, b1, and b2 are REAL constants.

I am hoping to use dgemm (a BLAS routine, distributed with LAPACK) or some other library function to speed up this calculation. However, dgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC) needs arrays A and B as input. I could fill A with the first exponential in one DO loop and B with the second exponential in another DO loop, but this approach may be less efficient.

What would be the most efficient way to calculate the above double loop in Fortran?

Thanks,
Vahid

mikrom · May 22, 2023

In your case you compute
[pre]S = 1*S + ALPHA*A*B[/pre]
so the call
[pre]DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )[/pre]
seems to be
[pre]DGEMM('N', 'N', 100, 100, 100, ALPHA, A, 100, B, 100, 1.0D0, S, 100)[/pre]

where
S is your 100x100 matrix

A is a 100x100 matrix of the following form, in which only the first column is non-zero:
[pre]
| exp(( 1/b1)**2) 0 0 0 0 ... 0 |
| exp(( 2/b1)**2) 0 0 0 0 ... 0 |
| ..................................... |
| exp((100/b1)**2) 0 0 0 0 ... 0 |
[/pre]

B is a 100x100 matrix of the following form, in which only the first row is non-zero:
[pre]
| exp((1/b2)**2) exp((2/b2)**2) ... exp((100/b2)**2) |
| 0 0 ... 0 |
| ....................................................|
| 0 0 ... 0 |
[/pre]

Vahid9 · May 22, 2023

Thanks mikrom for your response. Out of curiosity, wouldn't it be more efficient to define A as a column matrix, B as a row matrix and set K=1? Wouldn't this involve fewer calculations?

Vahid

mikrom · May 22, 2023

Yes, of course (I didn't even think of that).

A is a 100x1 matrix of this form:
[pre]
| exp(( 1/b1)**2) |
| exp(( 2/b1)**2) |
| .................|
| exp((100/b1)**2) |
[/pre]

B is a 1x100 matrix of this form:
[pre]
| exp((1/b2)**2) exp((2/b2)**2) ... exp((100/b2)**2) |
[/pre]
Then the call seems to be
[pre]DGEMM('N', 'N', 100, 100, 1, ALPHA, A, 100, B, 1, 1.0D0, S, 100)[/pre]
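
A minimal sketch of how this K=1 call could be wrapped, under a few assumptions: S is declared DOUBLE PRECISION (DGEMM is the double precision routine; for default REAL arrays the single precision counterpart is SGEMM), and a BLAS library is linked (for example with -lblas). The module and subroutine names are made up for illustration.

```fortran
module outer_update_dgemm
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! S := S + alpha*A*B with A(i,1) = exp((i/b1)**2) and B(1,j) = exp((j/b2)**2)
  subroutine add_outer_dgemm(s, alpha, b1, b2)
    real(dp), intent(inout) :: s(:,:)
    real(dp), intent(in)    :: alpha, b1, b2
    real(dp) :: a(size(s, 1), 1)   ! column matrix, M x 1
    real(dp) :: b(1, size(s, 2))   ! row matrix,    1 x N
    integer :: i, j, m, n
    external :: dgemm              ! provided by the linked BLAS library

    m = size(s, 1)
    n = size(s, 2)
    do i = 1, m
      a(i,1) = exp((real(i, dp)/b1)**2)
    end do
    do j = 1, n
      b(1,j) = exp((real(j, dp)/b2)**2)
    end do

    ! K = 1, LDA = m, LDB = 1, BETA = 1.0 so that the old S is kept and added to
    call dgemm('N', 'N', m, n, 1, alpha, a, m, b, 1, 1.0_dp, s, m)
  end subroutine add_outer_dgemm
end module outer_update_dgemm
```

With this in place, the double loop would be replaced by a single call such as call add_outer_dgemm(S, ALPHA, b1, b2). Note that BETA is passed as a double precision value; an integer literal 1 would not match what DGEMM expects.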

mikrom · May 22, 2023

I was thinking about how to use DGEMM for your case.
But now that I think about it again, I doubt that DGEMM is faster for your simple case than your two simple loops.
Look at the source of DGEMM to see how many loops it has:

https://netlib.org/lapack/explore-html/d7/d2b/dgemm_8f_source.html

Maybe DGEMM is efficient for more complicated cases ...
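
For comparison, the plain loops can also be tightened without any library. For a 100x100 array the original loop evaluates exp 20,000 times, while only 200 distinct values are needed, so part of any gain may simply come from computing the exponentials once. A minimal sketch, assuming double precision data; the module and subroutine names are hypothetical:

```fortran
module outer_update_loops
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! S(i,j) := S(i,j) + alpha*exp((i/b1)**2)*exp((j/b2)**2), with each
  ! exponential evaluated once per index instead of once per element.
  subroutine add_outer_loops(s, alpha, b1, b2)
    real(dp), intent(inout) :: s(:,:)
    real(dp), intent(in)    :: alpha, b1, b2
    real(dp) :: ea(size(s, 1)), eb(size(s, 2))
    integer :: i, j

    do i = 1, size(s, 1)
      ea(i) = exp((real(i, dp)/b1)**2)
    end do
    do j = 1, size(s, 2)
      eb(j) = alpha*exp((real(j, dp)/b2)**2)   ! alpha folded into one factor
    end do

    ! Fortran arrays are column-major, so the inner loop runs over i
    do j = 1, size(s, 2)
      do i = 1, size(s, 1)
        s(i,j) = s(i,j) + ea(i)*eb(j)
      end do
    end do
  end subroutine add_outer_loops
end module outer_update_loops
```

The inner loop is then one multiply and one add per element, which a compiler can usually vectorize. Results may differ from the original loop in the last bits because the multiplications are grouped differently.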

Vahid9 · May 22, 2023

I will try both cases, 1) the two DO loops and 2) DGEMM, to see which is faster. It may well be that they are similar in speed.

Thanks,
Vahid
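
One way to run such a comparison in isolation is a small wall-clock harness built on the intrinsic SYSTEM_CLOCK. This is only a sketch under assumptions: double precision data, hypothetical names throughout, and update_loops standing in for the double loop from the first post.

```fortran
module timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Common interface for any routine that performs the S update
  abstract interface
    subroutine update_iface(s, alpha, b1, b2)
      import :: dp
      real(dp), intent(inout) :: s(:,:)
      real(dp), intent(in)    :: alpha, b1, b2
    end subroutine update_iface
  end interface
contains
  ! The double loop as posted: two exp calls per element
  subroutine update_loops(s, alpha, b1, b2)
    real(dp), intent(inout) :: s(:,:)
    real(dp), intent(in)    :: alpha, b1, b2
    integer :: i, j
    do i = 1, size(s, 1)
      do j = 1, size(s, 2)
        s(i,j) = s(i,j) + alpha*exp((real(i, dp)/b1)**2)*exp((real(j, dp)/b2)**2)
      end do
    end do
  end subroutine update_loops

  ! Wall-clock seconds spent in nrep calls of the routine passed as update
  function seconds_for(update, s, alpha, b1, b2, nrep) result(t)
    procedure(update_iface) :: update
    real(dp), intent(inout) :: s(:,:)
    real(dp), intent(in)    :: alpha, b1, b2
    integer, intent(in)     :: nrep
    real(dp) :: t
    integer(int64) :: c0, c1, rate
    integer :: k

    call system_clock(c0, rate)
    do k = 1, nrep
      call update(s, alpha, b1, b2)
    end do
    call system_clock(c1)
    t = real(c1 - c0, dp)/real(rate, dp)
  end function seconds_for
end module timing_sketch
```

A call such as t = seconds_for(update_loops, S, ALPHA, b1, b2, 1000) times the loops; any other routine with the same interface, for example a wrapper around the DGEMM call, can be passed in the same way. A standalone timing like this will not necessarily reflect the behavior inside a larger parallel code.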

mikrom · May 23, 2023

Yes, try it and let me know.

Vahid9 · May 23, 2023

I replaced the double loop with DGEMM in my code. The DGEMM call runs inside three other loops, and the whole code runs in parallel across 4 nodes of 64 cores each.

With the double DO loop, the total run time is 7h19m. With DGEMM replacing the double loop, the run time is 3h34m, a significant speedup.

Thanks mikrom for all your help.

Cheers,
Vahid

mikrom · May 24, 2023

Vahid, good job, that is an amazing result!
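
One more variant that might be worth benchmarking: a column vector times a row vector added to a matrix is a rank-1 update, and BLAS has a dedicated Level 2 routine for it, DGER, which computes A := alpha*x*y**T + A. The sketch below assumes double precision data and a linked BLAS library; the module and subroutine names are hypothetical, and whether it beats DGEMM with K=1 depends on the BLAS implementation and has not been measured here.

```fortran
module outer_update_dger
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! S := S + alpha*x*y**T with x(i) = exp((i/b1)**2) and y(j) = exp((j/b2)**2)
  subroutine add_outer_dger(s, alpha, b1, b2)
    real(dp), intent(inout) :: s(:,:)
    real(dp), intent(in)    :: alpha, b1, b2
    real(dp) :: x(size(s, 1)), y(size(s, 2))
    integer :: i, j, m, n
    external :: dger               ! provided by the linked BLAS library

    m = size(s, 1)
    n = size(s, 2)
    do i = 1, m
      x(i) = exp((real(i, dp)/b1)**2)
    end do
    do j = 1, n
      y(j) = exp((real(j, dp)/b2)**2)
    end do

    ! Arguments: M, N, ALPHA, X, INCX, Y, INCY, A, LDA
    call dger(m, n, alpha, x, 1, y, 1, s, m)
  end subroutine add_outer_dger
end module outer_update_dger
```

Usage would be call add_outer_dger(S, ALPHA, b1, b2). Unlike DGEMM, DGER takes plain vectors with strides, so no 100x1 or 1x100 matrices are needed.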
