There are two ways to perform intra-node (shared-memory) parallel processing: "Automatic parallelization" and "OpenMP".

 

Automatic parallelization

    The compiler analyzes the source code and automatically creates modules for intra-node parallel execution that are suitable for the structure of the program. Parallelization can be done easily by specifying only compile options.
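
    As a rough illustration of what the compiler's analysis looks for, the sketch below contrasts a loop whose iterations are independent with one whose iterations depend on each other. The function names daxpy and prefix_sum are hypothetical and chosen only for this example. Whether a particular compiler actually parallelizes a given loop depends on its own analysis, so treat this as a general guide rather than a statement about the NEC compiler.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]
 * Each iteration reads and writes a different element, so no iteration
 * depends on another. Loops of this shape are typical candidates for
 * automatic parallelization. "restrict" tells the compiler that x and y
 * do not overlap, which makes the independence easier to prove. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Running sum stored in place.
 * Iteration i reads the value written by iteration i - 1, so the loop
 * carries a dependence and cannot simply be split across threads as is. */
void prefix_sum(size_t n, double *v)
{
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

    Compiling a file like this with the diagnostic output enabled is one way to see which loops the compiler decided to parallelize.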

     

    Basic use

      To use automatic parallelization, you only need to compile your program with the parallelization option shown below.
       

      Compile

      You can use the NEC compiler on SQUID vector nodes. Please note that the command differs for each programming language.

      $ module load BaseVEC

      $ nfort -mparallel [options] source_file (Fortran)
      $ ncc -mparallel [options] source_file (C)
      $ nc++ -mparallel [options] source_file (C++)

      With the "-report-all" option, the compiler outputs the compilation messages, code generation list, diagnostic list, format list, inline list, and option list.

       

      Job script

      You have to specify the number of parallel threads with the environment variable "OMP_NUM_THREADS" in your job script. The following is an example of a job script that runs 10 parallel threads on a vector engine (SX-Aurora TSUBASA) with an elapsed time limit of 1 hour.

       

      Note

      If "OMP_NUM_THREADS" is not specified, or if a wrong value is specified, the program will run with an unintended number of threads, which may affect the calculations of other users. Please be careful.

     

    Reference

OpenMP

    This is a method of intra-node parallelization (shared-memory parallel processing) in which you insert directive lines, which are instructions to the compiler, into the program. The syntax differs slightly between C and Fortran. Basic usage is described below.
     

    Basic use

      This section explains OpenMP with reference to the following program written in Fortran.

       

      The highlighted lines (lines 3-6) are the OpenMP-related parts.
      The include "omp_lib.h" statement [line 3] and the calls to omp_get_thread_num() and omp_get_num_threads() [line 5] were added only to print information about the OpenMP threads. The program is parallelized in the same way without them.
       
      Line 3, include "omp_lib.h", specifies the include file for the OpenMP library routines.
       

      !$omp parallel [line 4] and !$omp end parallel [line 6] are OpenMP directives.
      The parallel directive indicates that the block from that line up to the end parallel directive is to be executed in parallel.
      In this case, line 5 is processed in parallel. The number of parallel threads is specified by the environment variable OMP_NUM_THREADS.

       

      On line 5, omp_get_thread_num and omp_get_num_threads are OpenMP library routines. To use these routines, you need the include file specified on line 3. omp_get_thread_num returns the number of the calling thread; the numbering starts from 0. omp_get_num_threads returns the total number of parallel threads.

       

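      The result of running the above program with 4 parallel threads is shown below. Because the threads run concurrently, the order of the output lines can differ from run to run.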

      HELLO WORLD mythread= 3 ( 4 threads)
      HELLO WORLD mythread= 1 ( 4 threads)
      HELLO WORLD mythread= 0 ( 4 threads)
      HELLO WORLD mythread= 2 ( 4 threads)
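
      For C users, a rough counterpart of the Fortran program described above might look like the following sketch. The function name hello_threads is hypothetical, and the _OPENMP guards are an addition so that the file also builds serially when OpenMP is not enabled. It is an illustration only, not the program used to produce the output above.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* declarations of the OpenMP library routines */
#endif

/* Prints one line per thread and returns the number of threads
 * that took part in the parallel region. */
int hello_threads(void)
{
    int team_size = 1;

#pragma omp parallel
    {
#ifdef _OPENMP
        int mythread = omp_get_thread_num();   /* this thread's number, from 0 */
        int nthreads = omp_get_num_threads();  /* total number of threads */
#else
        int mythread = 0;                      /* serial build: a single thread */
        int nthreads = 1;
#endif
        printf("HELLO WORLD mythread= %d ( %d threads)\n", mythread, nthreads);

        /* Exactly one thread records the team size for the caller. */
#pragma omp single
        team_size = nthreads;
    }
    return team_size;
}
```

      Calling hello_threads() from main and running with OMP_NUM_THREADS set to 4 should print four lines, one per thread, in no particular order.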

       

      Directive

      Unlike automatic parallelization, users themselves must insert "directive lines" to instruct the compiler to parallelize. The form of the directive line depends on the language:

      !$omp directive name (in the case of Fortran)
      #pragma omp directive name (in the case of C/C++)

      For details on the individual directives, please see the reference material.
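
      As a small illustration of a directive line in C, the sketch below parallelizes a dot product with the standard parallel for directive and a reduction clause. The function name dot is hypothetical. The loop index is a signed integer because older OpenMP versions require that.

```c
/* Dot product of x and y over n elements.
 * "parallel for" splits the loop iterations among the threads;
 * "reduction(+:sum)" gives each thread a private partial sum and
 * adds the partial sums together at the end of the loop. */
double dot(long n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;

#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

      Without the reduction clause, all threads would update sum at the same time and the result would be unreliable. When OpenMP is not enabled at compile time, the directive is ignored and the loop runs serially.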
       

      Compile

      $ module load BaseVEC

      $ nfort -fopenmp [options] source_file (Fortran)
      $ ncc -fopenmp [options] source_file (C)
      $ nc++ -fopenmp [options] source_file (C++)

      With the "-report-all" option, the compiler outputs the compilation messages, code generation list, diagnostic list, format list, inline list, and option list.

       

      Job script

      You have to specify the number of parallel threads with the environment variable "OMP_NUM_THREADS" in your job script. The following is an example of a job script that runs 10 parallel threads on a vector engine (SX-Aurora TSUBASA) with an elapsed time limit of 1 hour.

      Note

      If "OMP_NUM_THREADS" is not specified, or if a wrong value is specified, the program will run with an unintended number of threads, which may affect the calculations of other users. Please be careful.
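
      One way to guard against this is to have the program check the setting at startup. The sketch below is a minimal example under that assumption. The function names requested_threads and runtime_threads are hypothetical, and the check deliberately handles only a single positive integer, not the comma-separated list form that OMP_NUM_THREADS also allows.

```c
#include <limits.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the value of OMP_NUM_THREADS as an int, or -1 if the variable
 * is unset, empty, or not a single positive integer. */
int requested_threads(void)
{
    const char *s = getenv("OMP_NUM_THREADS");
    char *end;
    long v;

    if (s == NULL || *s == '\0') {
        return -1;
    }
    v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > INT_MAX) {
        return -1;   /* trailing characters, zero, negative, or too large */
    }
    return (int)v;
}

/* Returns the number of threads the OpenMP runtime would use for the
 * next parallel region, or 1 when built without OpenMP. */
int runtime_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

      A program could print both values at startup, or stop early when requested_threads() returns -1, so that a missing setting is noticed before the main computation begins.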

     

    Reference