There are two methods for intra-node parallelization (shared-memory parallelization): "Automatic Parallelization" and "OpenMP".

 

Automatic Parallelization

    The compiler analyzes the source code and automatically generates intra-node parallel code suited to the structure of the program. Parallelization can be done easily by specifying only a compile option.

     

    Basic use

      Basically, you only need to compile with the appropriate option to use it.
       

      How to compile

      You can use the Intel compiler on SQUID. Please note that the command differs for each programming language.

      $ module load BaseCPU

      $ ifort -parallel [options] source_file (Fortran)
      $ icc -parallel [options] source_file (C)
      $ icpc -parallel [options] source_file (C++)

      If you want to output the parallelization report, please add the "-qopt-report=[n] -qopt-report-phase=par" options. You can specify a message level from 0 to 5 for [n].
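      As an illustration (a sketch, not taken from the original page), a loop whose iterations are independent, like the one below, is a typical candidate that "-parallel" can parallelize automatically. The program and array names are arbitrary examples.

```fortran
! Sketch: a loop the compiler can auto-parallelize with -parallel.
! Each iteration writes a different a(i), so iterations are
! independent and can be split across threads by the compiler.
      program axpy
      implicit none
      integer :: i
      real(8) :: a(1000000), b(1000000)
      b = 1.0d0
      do i = 1, 1000000
         a(i) = 2.0d0*b(i) + 1.0d0
      end do
      print *, a(1), a(1000000)
      end program axpy
```

      Compiling this with "ifort -parallel -qopt-report=2 -qopt-report-phase=par axpy.f90" reports which loops were parallelized.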
       

      Writing a job script

      At runtime, the number of parallel threads is specified in the environment variable OMP_NUM_THREADS. The following is an example of a script that runs a 76-thread computation on one node with an elapsed time limit of 1 hour.
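      The script example itself did not survive on this page. The following is a sketch assuming SQUID's NQSV-style "#PBS" directives; the queue name, group ID, and executable name a.out are placeholders to replace with your own.

```shell
#!/bin/bash
#PBS -q SQUID                   # queue name (placeholder)
#PBS --group=G12345             # your group ID (placeholder)
#PBS -l elapstim_req=01:00:00   # elapsed time limit: 1 hour

module load BaseCPU             # same environment used for compiling
cd $PBS_O_WORKDIR               # move to the submission directory
export OMP_NUM_THREADS=76       # run with 76 parallel threads
./a.out
```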

       

      Note

      Please do not forget to specify "OMP_NUM_THREADS" in the execution script. If you do not specify "OMP_NUM_THREADS", or specify a wrong value, the program will run with an unintended number of threads.

     

    Reference

OpenMP

    This is a method of intra-node parallelization (shared-memory parallelization) in which directive lines for the compiler are inserted into the program. The notation differs slightly between C and Fortran. A simple way to use it is described below.
     

    Basic use

      The following OpenMP program, written in Fortran, is explained as a reference.
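      The program listing itself did not survive on this page; the following is a reconstruction consistent with the line numbers and output discussed below (the program name "hello" and the exact print format are assumptions).

```fortran
      program hello                ! line 1
      implicit none                ! line 2
      include "omp_lib.h"          ! line 3
!$omp parallel                     ! line 4
      print *, 'HELLO WORLD mythread=', omp_get_thread_num(), ' (', omp_get_num_threads(), ' threads)'
!$omp end parallel                 ! line 6
      end program hello            ! line 7
```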

       

      The highlighted lines (lines 3-6) are the parts related to OpenMP. include "omp_lib.h" (line 3), omp_get_thread_num() (line 5), and omp_get_num_threads() (line 5) are inserted only to print information about the OpenMP processing. Even if you omit them, the parallel behavior does not change.

       

      Line 3: include "omp_lib.h"
      specifies the include file for the OpenMP library routines. This is required to use the library routines described below.

       

      Line 4: !$omp parallel
      and
      Line 6: !$omp end parallel
      are OpenMP directives. The parallel directive line indicates that the block up to the end parallel directive line is to be parallelized. In this case, line 5 is parallelized. The number of parallel threads is specified by the environment variable OMP_NUM_THREADS.
       

      Line 5: omp_get_thread_num and omp_get_num_threads are OpenMP library routines.
      If you want to use these routines, you need the include file specified on line 3. omp_get_num_threads gets the total number of parallel threads. omp_get_thread_num gets the number assigned to each parallel thread; the numbering starts from 0.

       

      The result of running the above program in 4 parallel is shown below.

      HELLO WORLD mythread= 3 ( 4 threads)
      HELLO WORLD mythread= 1 ( 4 threads)
      HELLO WORLD mythread= 0 ( 4 threads)
      HELLO WORLD mythread= 2 ( 4 threads)

       

      About directive lines

      Unlike automatic parallelization, users themselves need to insert "directive lines" to instruct the compiler to parallelize. The form of a directive line differs depending on the language:

      !$omp directive-name (in the case of Fortran)
      #pragma omp directive-name (in the case of C/C++)

      For details of the directives, please see the reference material.
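      As a sketch (not taken from the original page), the commonly used "parallel do" directive distributes the iterations of the following loop across threads; the program and variable names are arbitrary examples.

```fortran
      program directive_example
      implicit none
      integer :: i
      real(8) :: a(100), b(100)
      b = 1.0d0
! The parallel do directive splits the loop iterations among threads.
!$omp parallel do
      do i = 1, 100
         a(i) = 2.0d0*b(i)
      end do
!$omp end parallel do
      print *, a(1), a(100)
      end program directive_example
```

      The C/C++ equivalent is "#pragma omp parallel for" placed immediately before the for loop.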

       

      How to compile

      $ module load BaseCPU

      $ ifort -qopenmp [options] source_file (Fortran)
      $ icc -qopenmp [options] source_file (C)
      $ icpc -qopenmp [options] source_file (C++)

      If you want to output the parallelization report, please add the "-qopt-report=[n] -qopt-report-phase=openmp" options. You can specify a message level from 0 to 2 for [n].
       

      Writing a job script

      At runtime, the number of parallel threads is specified in the environment variable OMP_NUM_THREADS. The following is an example of a script that runs a 76-thread computation on one node with an elapsed time limit of 1 hour.
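      The script example itself did not survive on this page. The following is a sketch assuming SQUID's NQSV-style "#PBS" directives; the queue name, group ID, and executable name a.out are placeholders to replace with your own.

```shell
#!/bin/bash
#PBS -q SQUID                   # queue name (placeholder)
#PBS --group=G12345             # your group ID (placeholder)
#PBS -l elapstim_req=01:00:00   # elapsed time limit: 1 hour

module load BaseCPU             # same environment used for compiling
cd $PBS_O_WORKDIR               # move to the submission directory
export OMP_NUM_THREADS=76       # run with 76 parallel threads
./a.out                         # binary built with -qopenmp
```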

       

      Note

      Please do not forget to specify "OMP_NUM_THREADS" in the execution script. If you do not specify "OMP_NUM_THREADS", or specify a wrong value, the program will run with an unintended number of threads.

     

    Reference