
Run time error 2


armeros

Programmer
Aug 26, 2014
23
AU
Hi guys,

Perhaps I need some help again. My program compiles, but it generates a run-time error.
The message is

"C:\Console1\Console1\Debug\Console1.exe is not a valid Win32 application"

When I reduce the number of elements in the arrays, it says

[attached image: image-E57A_540184DF.jpg — screenshot of the second run-time error message]


What bug should I be looking for?

Thanks a lot.
 
Check this out... go to the pages from the various bullets. First, you need to learn to compile for OpenMP: you need compiler flags, and you need to add a "use omp_lib" statement inside the Fortran source. THEN, of course, add the directives surrounding the do loop to be parallelized.
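A minimal sketch of those three pieces (the -fopenmp flag, the omp_lib module, and the directives) could look like this with gfortran; the loop body here is just a placeholder:

```fortran
! Compile with: gfortran hello_omp.f95 -o hello_omp -fopenmp
program hello_omp
  use omp_lib          ! provides omp_get_thread_num() and friends
  implicit none
  integer :: i

  !$omp parallel do
  do i = 1, 4
    write(*,*) 'iteration', i, 'handled by thread', omp_get_thread_num()
  end do
  !$omp end parallel do
end program hello_omp
```

Without -fopenmp, the !$omp lines are treated as plain comments and the program still runs serially, which is the whole point of putting the directives behind comment markers.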
 
salgerman, thank you very much for the information about OpenMP.
I have never tried such a thing before, but now I'm interested.

I have gfortran 4.8.1 installed with MinGW+MSYS on Windows.
First, when I tried the switch -fopenmp, I got an error. Then I added the package mingw32-pthreads-w32 to my MinGW+MSYS installation, and now it compiles.

example: armeros.f95
Code:
[COLOR=#0000ff]! Compile with OpenMP:[/color]
[COLOR=#0000ff]! gfortran armeros.f95 -o armeros -fopenmp[/color]
[COLOR=#a020f0]program[/color] test
[COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
[COLOR=#2e8b57][b]integer[/b][/color] :: i,j,ii,k,l,threads,thread_id
[COLOR=#2e8b57][b]integer[/b][/color], [COLOR=#2e8b57][b]external[/b][/color] :: omp_get_num_threads, omp_get_thread_num
[COLOR=#804040][b]do[/b][/color] i[COLOR=#804040][b]=[/b][/color][COLOR=#ff00ff]1[/color],[COLOR=#ff00ff]250[/color]
  [COLOR=#804040][b]do[/b][/color] j [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color],[COLOR=#ff00ff]250[/color]
    [COLOR=#0000ff]!$OMP PARALLEL[/color]
    threads [COLOR=#804040][b]=[/b][/color] omp_get_num_threads()
    thread_id [COLOR=#804040][b]=[/b][/color] omp_get_thread_num() [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
    [COLOR=#804040][b]do[/b][/color] ii[COLOR=#804040][b]=[/b][/color][COLOR=#ff00ff]1[/color],[COLOR=#ff00ff]2[/color]
      [COLOR=#804040][b]do[/b][/color] k [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color],[COLOR=#ff00ff]250[/color]
        [COLOR=#804040][b]do[/b][/color] l [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color],[COLOR=#ff00ff]250[/color]
          [COLOR=#008080]call[/color] do_something(i,j,ii,k,l,thread_id,threads)
        [COLOR=#804040][b]end do[/b][/color]
      [COLOR=#804040][b]end do[/b][/color]
    [COLOR=#804040][b]end do[/b][/color]
    [COLOR=#0000ff]!$OMP END PARALLEL[/color]
  [COLOR=#804040][b]end do[/b][/color]
[COLOR=#804040][b]end do[/b][/color]
[COLOR=#a020f0]end program[/color] test

[COLOR=#a020f0]subroutine[/color] do_something(i,j,ii,k,l,thread_id,threads)
  [COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
  [COLOR=#2e8b57][b]integer[/b][/color] :: i,j,ii,k,l,thread_id,threads

  [COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#ff00ff]10[/color]) thread_id, threads,i,j,ii,k,l 
  [COLOR=#6a5acd]10[/color] [COLOR=#804040][b]format[/b][/color](i2,[COLOR=#ff00ff]'/'[/color],i2,[COLOR=#ff00ff]': (i,j,ii,k,l) = ('[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]')'[/color]) 
[COLOR=#a020f0]end subroutine[/color] do_something

Compilation & Running:
Code:
$ gfortran armeros.f95 -o armeros -fopenmp
$ armeros
 2/ 2: (i,j,ii,k,l) = (  1,  1,  1,  1,  1)
 2/ 2: (i,j,ii,k,l) = (  1,  1,  1,  1,  2)
 2/ 2: (i,j,ii,k,l) = (  1,  1,  1,  1,  3)
 ...
 ...

I have realized that the number of threads returned by the function omp_get_num_threads() depends on the machine. At my work machine I got 8 threads, and here (see above) at my home machine only 2.
The usage of threads seems to be very different too: at my work machine I saw 2 threads used (1 and 8), and here only one thread (i.e. 2) seems to be used.

Now, the question is how the task of armeros could be parallelized properly.
:)
 
The usage of threads seems to be very different too: At my work machine I saw 2 threads (1 and 8) and here only one thread (i.e.2) seems to be used.

Wrong! After a while, now at my home machine, I see that thread 1 is used too:
Code:
...
 1/ 2: (i,j,ii,k,l) = (  1,  2,  2, 12, 72)
 1/ 2: (i,j,ii,k,l) = (  1,  2,  2, 21,196)
 1/ 2: (i,j,ii,k,l) = (  1,  2,  2, 12, 73)
 1/ 2: (i,j,ii,k,l) = (  1,  2,  2, 21,197)
...
 
mikrom: say, on your office computer, can you try specifying one more argument to the PARALLEL directive? Let's purposely increase the number of threads being used.

Code:
    !$omp parallel do NUM_THREADS(4)
    .
    .
    .
    !$omp end parallel do

Additionally, you may need to specify which variables are private to each thread (mostly loop indices) and which variables are shared among threads (for example, a variable to which all the threads are contributing).
Code:
    !$omp parallel do NUM_THREADS(4) &
    !$omp     private(ii,k,l) &
    !$omp     shared(A)
    .
    .
    .
    !$omp end parallel do
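A complete sketch of that idea: the loop indices go in the private clause, and a counter that every thread updates is safest declared as a reduction rather than a plain shared variable (a bare shared n = n + 1 is a data race). The variable names here are just for illustration:

```fortran
! Sketch: private loop indices plus a reduction for the shared total.
! Compile with: gfortran reduction_demo.f95 -o reduction_demo -fopenmp
program reduction_demo
  use omp_lib
  implicit none
  integer :: i, j, n

  n = 0
  !$omp parallel do num_threads(4) private(j) reduction(+:n)
  do i = 1, 100          ! i is the parallelized index, private automatically
    do j = 1, 100
      n = n + 1          ! each thread accumulates its own copy of n
    end do
  end do
  !$omp end parallel do  ! the private copies of n are summed here

  write(*,*) 'n =', n    ! 10000 regardless of the thread count
end program reduction_demo
```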
 
Do we need more than one computer (or more than one processor) to do this?
 

I don't know about multiple computers; what I have done before with these few directives is take advantage of more than one core on a single computer. I know my Linux box at the office has 4 cores; this laptop seems to have two dual-core processors. You can easily inspect your computer by opening the "My Computer" properties, or by opening Task Manager and going to the Performance tab.

 
I took this simple example with 3 loops
Code:
[COLOR=#0000ff]! NON PARALLEL example[/color]
[COLOR=#0000ff]! Compile:[/color]
[COLOR=#0000ff]! gfortran armeros_n.f95 -o armeros_n[/color]
[COLOR=#a020f0]program[/color] test
[COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
[COLOR=#2e8b57][b]integer[/b][/color] :: i,j, k, thread_id, threads, n
[COLOR=#2e8b57][b]real[/b][/color] :: start, finish

[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](start)
thread_id [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color]
threads [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color]
n [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]0[/color]
[COLOR=#804040][b]do[/b][/color] i[COLOR=#804040][b]=[/b][/color][COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]4[/color]
  [COLOR=#804040][b]do[/b][/color] j [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
    [COLOR=#804040][b]do[/b][/color] k [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
      [COLOR=#008080]call[/color] do_something(i, j, k, thread_id, threads)
      n [COLOR=#804040][b]=[/b][/color] n [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
    [COLOR=#804040][b]end do[/b][/color]
  [COLOR=#804040][b]end do[/b][/color]
[COLOR=#804040][b]end do[/b][/color]
[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](finish)

[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color])
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Count = '[/color], n
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Time in seconds ='[/color], finish[COLOR=#804040][b]-[/b][/color]start

[COLOR=#a020f0]end program[/color] test

[COLOR=#a020f0]subroutine[/color] do_something(i, j, k, thread_id,threads)
  [COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
  [COLOR=#2e8b57][b]integer[/b][/color] :: i, j, k, thread_id, threads

  [COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#ff00ff]10[/color]) thread_id, threads, i, j, k 
  [COLOR=#6a5acd]10[/color] [COLOR=#804040][b]format[/b][/color](i2,[COLOR=#ff00ff]'/'[/color],i2,[COLOR=#ff00ff]': (i,j,k) = ('[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]')'[/color]) 
[COLOR=#a020f0]end subroutine[/color] do_something

and tried to parallelize the outer loop
Code:
do i=1, 4
 ...
end do

using 4 threads - so:
Code:
[COLOR=#0000ff]! PARALLEL example[/color]
[COLOR=#0000ff]! Compile with OpenMP:[/color]
[COLOR=#0000ff]! gfortran armeros_p.f95 -o armeros_p -fopenmp[/color]
[COLOR=#a020f0]program[/color] test
[COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
[COLOR=#2e8b57][b]integer[/b][/color] :: i,j,k, threads,thread_id, n
[COLOR=#2e8b57][b]integer[/b][/color], [COLOR=#2e8b57][b]external[/b][/color] :: omp_get_num_threads, omp_get_thread_num
[COLOR=#2e8b57][b]real[/b][/color] :: start, finish

[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](start)
n [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]0[/color]
[COLOR=#0000ff]!$omp parallel NUM_THREADS(4) PRIVATE(thread_id, i, j, k) [/color]
threads [COLOR=#804040][b]=[/b][/color] omp_get_num_threads()
thread_id [COLOR=#804040][b]=[/b][/color] omp_get_thread_num() [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
[COLOR=#0000ff]!$omp parallel do[/color]
[COLOR=#804040][b]do[/b][/color] j [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
  [COLOR=#804040][b]do[/b][/color] k [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
    i [COLOR=#804040][b]=[/b][/color] thread_id
    [COLOR=#008080]call[/color] do_something(i, j, k, thread_id, threads)
    n [COLOR=#804040][b]=[/b][/color] n [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
  [COLOR=#804040][b]end do[/b][/color]
[COLOR=#804040][b]end do[/b][/color]
[COLOR=#0000ff]!$omp end parallel do[/color]
[COLOR=#0000ff]!$omp end parallel[/color]
[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](finish)

[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color])
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Count = '[/color], n
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Time in seconds ='[/color], finish[COLOR=#804040][b]-[/b][/color]start

[COLOR=#a020f0]end program[/color] test

[COLOR=#a020f0]subroutine[/color] do_something(i, j, k, thread_id,threads)
  [COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
  [COLOR=#2e8b57][b]integer[/b][/color] :: i, j, k, thread_id, threads

  [COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#ff00ff]10[/color]) thread_id, threads, i, j, k 
  [COLOR=#6a5acd]10[/color] [COLOR=#804040][b]format[/b][/color](i2,[COLOR=#ff00ff]'/'[/color],i2,[COLOR=#ff00ff]': (i,j,k) = ('[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]')'[/color]) 
[COLOR=#a020f0]end subroutine[/color] do_something

Output:
Code:
$ gfortran armeros_p.f95 -o armeros_p -fopenmp
$ armeros_p
 2/ 4: (i,j,k) = (  2,  1,  1)
 1/ 4: (i,j,k) = (  1,  1,  1)
 3/ 4: (i,j,k) = (  3,  1,  1)
 4/ 4: (i,j,k) = (  4,  1,  1)
...
...
2/ 4: (i,j,k) = (  2,  5,  5)
1/ 4: (i,j,k) = (  1,  5,  5)
3/ 4: (i,j,k) = (  3,  5,  5)
4/ 4: (i,j,k) = (  4,  5,  5)

Count =          100
Time in seconds =   0.00000000
It seems to work, but I don't see any CPU-time improvement between the normal and the parallelized version for bigger values of J and K.

For example, for max J,K = 250, I got a better CPU time with the normal version than with the parallelized version:
Code:
...
 1/ 1: (i,j,k) = (  4,250,249)
 1/ 1: (i,j,k) = (  4,250,250)

 Count =       250000
 Time in seconds =  0.936005950
and with the parallelized version:
Code:
...
 3/ 4: (i,j,k) = (  3,250,250)
 2/ 4: (i,j,k) = (  2,250,250)
 1/ 4: (i,j,k) = (  1,250,249)
 4/ 4: (i,j,k) = (  4,250,250)
 1/ 4: (i,j,k) = (  1,250,250)

 Count =       250000
 Time in seconds =   1.68481004
What is wrong ?
 
Nothing is wrong... there is some overhead in creating and managing threads, so the payback is not immediate. You would need to do away with your write statements and give your do_something subroutine something costly to do, like calculating the sine of an angle or the logarithm of something. Then keep increasing the number of times it is done; eventually, you will notice the ratio of the time it takes to solve on 1 CPU vs. X CPUs starts to increase. Here is a sample picture.
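For example, a do_something along these lines (a hypothetical sketch, not the actual subroutine from this thread) replaces the I/O with arithmetic that is costly enough to amortize the thread overhead:

```fortran
! Sketch: real arithmetic instead of write statements, so each call
! does enough work for the threading overhead to pay off.
subroutine do_something(i, j, k, result)
  implicit none
  integer, intent(in)  :: i, j, k
  real,    intent(out) :: result
  integer :: m

  result = 0.0
  do m = 1, 1000
    result = result + sin(real(i*m)) + log(real(j + k + m))
  end do
end subroutine do_something
```

Console I/O is also serialized by the runtime, so the write statements themselves fight against parallel execution.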

Having said that, I think you are using the "$omp parallel do" directive incorrectly. Consider the following:

[ul]
[li]A program's structure is not supposed to change when adding OpenMP directives... you changed it; you took away the outer-most loop.[/li]
[li]A program with OpenMP directives (hidden behind in-line comment markers) is supposed to work the same whether or not the compiler understood the directives.[/li]
[li]In short, you do NOT put a "$omp parallel do" INSTEAD of an actual loop; you use the directive to wrap the loop that you want to parallelize.[/li]
[/ul]

In your case, you took away the outer loop and the associated loop variable i. What if the value of i was actually being used in some formula and affecting the calculated values? If the program runs in one thread, i will always be 1; if it runs in 4 threads, the values will be 1, 2, 3, 4... but that is beside the point. What if the i-loop was supposed to go from 1 to 5000? You are not going to get 5000 threads and assign i the value of the current thread. So, you leave the loop alone, simply surround it with omp directives, and let OpenMP break the range of the loop variable down into as many threads as it can. The values of the loop variable are assigned from its range, not from whatever thread it is in; those are two different things.



 
I tried the example without changing outer loop - only by adding OpenMP directives (on a machine with CPU Intel i7):

Code:
[COLOR=#0000ff]! PARALLEL example[/color]
[COLOR=#0000ff]! Compile with OpenMP:[/color]
[COLOR=#0000ff]! gfortran armeros_p.f95 -o armeros_p -fopenmp[/color]
[COLOR=#a020f0]program[/color] test
[COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
[COLOR=#2e8b57][b]integer[/b][/color] :: i,j,k, thread_id, threads, n
[COLOR=#2e8b57][b]integer[/b][/color], [COLOR=#2e8b57][b]external[/b][/color] :: omp_get_num_threads, omp_get_thread_num
[COLOR=#2e8b57][b]real[/b][/color] :: start, finish

[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](start)
n [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]0[/color]
[COLOR=#0000ff]!$omp parallel do NUM_THREADS(4) PRIVATE(i, j, k, thread_id, threads)[/color]
[COLOR=#804040][b]do[/b][/color] i[COLOR=#804040][b]=[/b][/color][COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]4[/color]
  [COLOR=#804040][b]do[/b][/color] j [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
    [COLOR=#804040][b]do[/b][/color] k [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]1[/color], [COLOR=#ff00ff]5[/color]
      threads [COLOR=#804040][b]=[/b][/color] omp_get_num_threads()
      thread_id [COLOR=#804040][b]=[/b][/color] omp_get_thread_num() [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
      [COLOR=#008080]call[/color] do_something(i, j, k, thread_id, threads)
      n [COLOR=#804040][b]=[/b][/color] n [COLOR=#804040][b]+[/b][/color] [COLOR=#ff00ff]1[/color]
    [COLOR=#804040][b]end do[/b][/color]
  [COLOR=#804040][b]end do[/b][/color]
[COLOR=#804040][b]end do[/b][/color]
[COLOR=#0000ff]!$omp end parallel do[/color]
[COLOR=#008080]call[/color] [COLOR=#008080]cpu_time[/color](finish)

[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color])
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Count = '[/color], n
[COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#804040][b]*[/b][/color]) [COLOR=#ff00ff]'Time in seconds ='[/color], finish[COLOR=#804040][b]-[/b][/color]start

[COLOR=#a020f0]end program[/color] test

[COLOR=#a020f0]subroutine[/color] do_something(i, j, k, thread_id,threads)
  [COLOR=#2e8b57][b]implicit[/b][/color] [COLOR=#2e8b57][b]none[/b][/color]
  [COLOR=#2e8b57][b]integer[/b][/color] :: i, j, k, thread_id, threads

  [COLOR=#804040][b]write[/b][/color]([COLOR=#804040][b]*[/b][/color],[COLOR=#ff00ff]10[/color]) thread_id, threads, i, j, k 
  [COLOR=#6a5acd]10[/color] [COLOR=#804040][b]format[/b][/color](i2,[COLOR=#ff00ff]'/'[/color],i2,[COLOR=#ff00ff]': (i,j,k) = ('[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]','[/color],i3,[COLOR=#ff00ff]')'[/color]) 
[COLOR=#a020f0]end subroutine[/color] do_something

It seems to process the outer loop in 4 threads:

Code:
$ gfortran armeros_p.f95 -o armeros_p -fopenmp

$ armeros_p
 2/ 4: (i,j,k) = (  2,  1,  1)
 3/ 4: (i,j,k) = (  3,  1,  1)
 1/ 4: (i,j,k) = (  1,  1,  1)
 4/ 4: (i,j,k) = (  4,  1,  1)
 2/ 4: (i,j,k) = (  2,  1,  2)
 ...
 ...
 4/ 4: (i,j,k) = (  4,  5,  4)
 2/ 4: (i,j,k) = (  2,  5,  5)
 3/ 4: (i,j,k) = (  3,  5,  5)
 1/ 4: (i,j,k) = (  1,  5,  5)
 4/ 4: (i,j,k) = (  4,  5,  5)

 Count =          100
 Time in seconds =   0.00000000

When I delete the NUM_THREADS(4) clause, the function omp_get_num_threads() returns 8 threads, but only 4 seem to be used in processing:

Code:
$ armeros_p
 2/ 8: (i,j,k) = (  2,  1,  1)
 3/ 8: (i,j,k) = (  3,  1,  1)
 1/ 8: (i,j,k) = (  1,  1,  1)
 4/ 8: (i,j,k) = (  4,  1,  1)
 2/ 8: (i,j,k) = (  2,  1,  2)
 ...
 ...
 4/ 8: (i,j,k) = (  4,  5,  4)
 2/ 8: (i,j,k) = (  2,  5,  5)
 3/ 8: (i,j,k) = (  3,  5,  5)
 1/ 8: (i,j,k) = (  1,  5,  5)
 4/ 8: (i,j,k) = (  4,  5,  5)
 ...

When I changed the outer loop to go up to i = 8, all 8 threads were used, so it seems that OpenMP chooses the number of threads to use automatically.

Further, I replaced the write statement in the subroutine with a simple computation and removed the calls to the OMP functions omp_get_num_threads() and omp_get_thread_num(). Then I tried to compare the performance for bigger loops (i=1..8; j=1..50000; k=1..50000), but I didn't notice any performance improvement in the parallelized version.
But I believe there is a performance improvement for specific big cases. My goal was only to try out how to use OpenMP.
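One thing worth checking when comparing those timings: cpu_time generally reports CPU time summed over all threads, so a 4-thread run can report the same or even more "time" than the serial run even when the wall-clock time actually drops, which would explain the 1.68 s vs. 0.94 s result above. A sketch using OpenMP's wall-clock timer omp_get_wtime instead:

```fortran
! Sketch: time the parallel region with wall-clock time, not CPU time.
! Compile with: gfortran wtime_demo.f95 -o wtime_demo -fopenmp
program wtime_demo
  use omp_lib
  implicit none
  integer :: i
  real :: s
  double precision :: t0, t1

  s = 0.0
  t0 = omp_get_wtime()          ! wall-clock time, in seconds
  !$omp parallel do reduction(+:s)
  do i = 1, 10000000
    s = s + sin(real(i))        ! arbitrary costly work
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()

  write(*,*) 'Sum =', s
  write(*,*) 'Wall time in seconds =', t1 - t0
end program wtime_demo
```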
 
Cool.

I wonder how performance improves if, instead, you actually increase the range of the outer-most loop

Code:
!$omp parallel do NUM_THREADS(4) PRIVATE(i, j, k, thread_id, threads)
do i=1, 100000
  do j = 1, 50
    do k = 1, 50
      threads = omp_get_num_threads()
      thread_id = omp_get_thread_num() + 1
      call do_something(i, j, k, thread_id, threads)
      n = n + 1
    end do
  end do
end do
!$omp end parallel do

We may need to read up more details about the "parallel" directives, how exactly they work and how to better enhance program performance...I think things depend on where you place the directive. For example:

Code:
!$omp parallel do NUM_THREADS(4) PRIVATE(i, j, k, thread_id, threads)
do i = 1, 8
  do j = 1, 50000
    do k = 1, 50000
      threads = omp_get_num_threads()
      thread_id = omp_get_thread_num() + 1
      call do_something(i, j, k, thread_id, threads)
      n = n + 1
    end do
  end do
end do
!$omp end parallel do

versus

Code:
do i=1, 8
  do j = 1, 50000
!$omp parallel do NUM_THREADS(4) PRIVATE(i, j, k, thread_id, threads)
    do k = 1, 50000
      threads = omp_get_num_threads()
      thread_id = omp_get_thread_num() + 1
      call do_something(i, j, k, thread_id, threads)
      n = n + 1
    end do
!$omp end parallel do
  end do
end do

Anyway, mikrom, it sounds like we (mostly you) are done with this thread. Thanks for all the reporting.



 
mikrom...two more things

The one time I started using OpenMP, I found that I had to limit the number of threads; if I didn't, the program would tend to use all the processors and make the machine unresponsive until the program was done. So, it may be best to limit it. I think you can request the number of processors available, take away one or two, and use the rest.
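That suggestion can be sketched with the standard OpenMP runtime calls omp_get_num_procs and omp_set_num_threads:

```fortran
! Sketch: leave one processor free so the machine stays responsive.
! Compile with: gfortran limit_threads.f95 -o limit_threads -fopenmp
program limit_threads
  use omp_lib
  implicit none
  integer :: nprocs, nthreads

  nprocs   = omp_get_num_procs()        ! processors available
  nthreads = max(1, nprocs - 1)         ! hold one back for the system
  call omp_set_num_threads(nthreads)    ! applies to subsequent parallel regions

  write(*,*) 'Using', nthreads, 'of', nprocs, 'processors'
end program limit_threads
```

Setting the OMP_NUM_THREADS environment variable before running is an alternative that needs no code change.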

Lastly... how do you make the code show up syntax-highlighted?
 
Hi salgerman,

I guess that using OpenMP efficiently and profitably will require intensive study of this matter.
But as I wrote above, my goal was just to look at this thing a little more closely.
I see that this discussion is beyond the topic of this thread.

... how do you make the code to show up syntax-highlighted?

1. Before posting the source, I use "Convert to HTML" from the Syntax menu of my editor, vim.
It creates a new file containing the HTML equivalent of the syntax-highlighted source.

2. Then I run this script to convert the HTML into TGML
 