There are two methods for intra-node parallelization (shared-memory parallelization): "Automatic Parallelization" and "OpenMP".

 

Automatic Parallelization

    The compiler analyzes the source code and automatically generates intra-node parallel code suited to the structure of the program. Parallelization can be done easily by specifying only a compile option.

     

    Basic use

      Basically, you only need to compile with the appropriate option to use it.
       

      How to compile

      You can use the Intel compiler on SQUID. Please note that the command differs for each programming language.

      $ module load BaseCPU

      $ ifort -parallel [options] source_file (Fortran)
      $ icc -parallel [options] source_file (C)
      $ icpc -parallel [options] source_file (C++)

      If you want to output the parallelization report, please add the "-qopt-report=[n] -qopt-report-phase=par" options. You can specify a message level from 0 to 5 for [n].
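      As an illustration (a sketch, not taken from the original page), a loop whose iterations are independent, like the one below, is a typical candidate that "-parallel" can parallelize automatically. The program and array names are arbitrary examples.

```fortran
! Sketch: a loop the compiler can auto-parallelize with -parallel.
! Each iteration writes a different a(i), so iterations are
! independent and can be split across threads by the compiler.
      program axpy
      implicit none
      integer :: i
      real(8) :: a(1000000), b(1000000)
      b = 1.0d0
      do i = 1, 1000000
         a(i) = 2.0d0*b(i) + 1.0d0
      end do
      print *, a(1), a(1000000)
      end program axpy
```

      Compiling this with "ifort -parallel -qopt-report=2 -qopt-report-phase=par axpy.f90" reports which loops were parallelized.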
       

      Writing a job script

      At runtime, the number of parallel threads is specified in the environment variable OMP_NUM_THREADS. The following is an example of a script that runs a 76-thread computation on one node with an elapsed time limit of 1 hour.
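      The script example itself did not survive on this page. The following is a sketch assuming SQUID's NQSV-style "#PBS" directives; the queue name, group ID, and executable name a.out are placeholders to replace with your own.

```shell
#!/bin/bash
#PBS -q SQUID                   # queue name (placeholder)
#PBS --group=G12345             # your group ID (placeholder)
#PBS -l elapstim_req=01:00:00   # elapsed time limit: 1 hour

module load BaseCPU             # same environment used for compiling
cd $PBS_O_WORKDIR               # move to the submission directory
export OMP_NUM_THREADS=76       # run with 76 parallel threads
./a.out
```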

       

      Note

      Please do not forget to specify "OMP_NUM_THREADS" in the execution script. If you do not specify "OMP_NUM_THREADS", or specify a wrong value, the program will run with an unintended number of threads.

     

    Reference

OpenMP

    This is a method of intra-node parallelization (shared-memory parallelization) in which directive lines for the compiler are inserted into the program. The notation differs slightly between C and Fortran. A simple way to use it is described below.
     

    Basic use

      The following OpenMP program, written in Fortran, is explained as a reference.
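      The program listing itself did not survive on this page; the following is a reconstruction consistent with the line numbers and output discussed below (the program name "hello" and the exact print format are assumptions).

```fortran
      program hello                ! line 1
      implicit none                ! line 2
      include "omp_lib.h"          ! line 3
!$omp parallel                     ! line 4
      print *, 'HELLO WORLD mythread=', omp_get_thread_num(), ' (', omp_get_num_threads(), ' threads)'
!$omp end parallel                 ! line 6
      end program hello            ! line 7
```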

       

      The highlighted lines (lines 3-6) are the parts related to OpenMP. include "omp_lib.h" (line 3), omp_get_thread_num() (line 5), and omp_get_num_threads() (line 5) are inserted only to print information about the OpenMP processing. Even if you omit them, the parallel behavior does not change.

       

      Line 3: include "omp_lib.h"
      specifies the include file for the OpenMP library routines. This is required to use the library routines described below.

       

      Line 4: !$omp parallel
      and
      Line 6: !$omp end parallel
      are OpenMP directives. The parallel directive line indicates that the block up to the end parallel directive line is to be parallelized. In this case, line 5 is parallelized. The number of parallel threads is specified by the environment variable OMP_NUM_THREADS.
       

      Line 5: omp_get_thread_num and omp_get_num_threads are OpenMP library routines.
      If you want to use these routines, you need the include file specified on line 3. omp_get_num_threads gets the total number of parallel threads. omp_get_thread_num gets the number assigned to each parallel thread; the numbering starts from 0.

       

      The result of running the above program in 4 parallel is shown below.

      HELLO WORLD mythread= 3 ( 4 threads)
      HELLO WORLD mythread= 1 ( 4 threads)
      HELLO WORLD mythread= 0 ( 4 threads)
      HELLO WORLD mythread= 2 ( 4 threads)

       

      About directive lines

      Unlike automatic parallelization, users themselves need to insert "directive lines" to instruct the compiler to parallelize. The form of a directive line differs depending on the language:

      !$omp directive-name (in the case of Fortran)
      #pragma omp directive-name (in the case of C/C++)

      For details of the directives, please see the reference material.
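      As a sketch (not taken from the original page), the commonly used "parallel do" directive distributes the iterations of the following loop across threads; the program and variable names are arbitrary examples.

```fortran
      program directive_example
      implicit none
      integer :: i
      real(8) :: a(100), b(100)
      b = 1.0d0
! The parallel do directive splits the loop iterations among threads.
!$omp parallel do
      do i = 1, 100
         a(i) = 2.0d0*b(i)
      end do
!$omp end parallel do
      print *, a(1), a(100)
      end program directive_example
```

      The C/C++ equivalent is "#pragma omp parallel for" placed immediately before the for loop.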

       

      How to compile

      $ module load BaseCPU

      $ ifort -qopenmp [options] source_file (Fortran)
      $ icc -qopenmp [options] source_file (C)
      $ icpc -qopenmp [options] source_file (C++)

      If you want to output the parallelization report, please add the "-qopt-report=[n] -qopt-report-phase=openmp" options. You can specify a message level from 0 to 2 for [n].
       

      Writing a job script

      At runtime, the number of parallel threads is specified in the environment variable OMP_NUM_THREADS. The following is an example of a script that runs a 76-thread computation on one node with an elapsed time limit of 1 hour.
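      The script example itself did not survive on this page. The following is a sketch assuming SQUID's NQSV-style "#PBS" directives; the queue name, group ID, and executable name a.out are placeholders to replace with your own.

```shell
#!/bin/bash
#PBS -q SQUID                   # queue name (placeholder)
#PBS --group=G12345             # your group ID (placeholder)
#PBS -l elapstim_req=01:00:00   # elapsed time limit: 1 hour

module load BaseCPU             # same environment used for compiling
cd $PBS_O_WORKDIR               # move to the submission directory
export OMP_NUM_THREADS=76       # run with 76 parallel threads
./a.out                         # binary built with -qopenmp
```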

       

      Note

      Please do not forget to specify "OMP_NUM_THREADS" in the execution script. If you do not specify "OMP_NUM_THREADS", or specify a wrong value, the program will run with an unintended number of threads.

     

    Reference