There are two ways to perform intra-node parallel (shared memory parallel) processing: "Automatic parallelization" and "OpenMP".

Automatic parallelization

Basic use

Compile

The NEC compiler is available on the SQUID vector nodes. Note that the command differs for each programming language.

$ module load BaseVEC

$ nfort -mparallel [options] source_file (FORTRAN)
$ ncc -mparallel [options] source_file (C)
$ nc++ -mparallel [options] source_file (C++)

With the "-report-all" option, the compiler outputs compilation messages together with the code generation list, diagnostic list, format list, inline list, and option list.
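As an illustration of the kind of loop automatic parallelization is aimed at, the following is a minimal sketch in C. The function name scaled_add is hypothetical and is not taken from the NEC documentation. The iterations are independent of each other, which makes this kind of loop a typical candidate. Whether the compiler actually parallelizes it depends on the options and the loop length, so check the lists produced by "-report-all".

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * Each iteration reads and writes a different element, and "restrict"
 * promises that x and y do not overlap, so no iteration depends on
 * the result of another one. */
void scaled_add(size_t n, double a, const double *restrict x,
                double *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A file containing this function could be compiled with, for example, "ncc -mparallel -report-all" followed by the file name. No OpenMP directive is needed in the source.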

Job script

You must specify the number of parallel threads in your job script through the environment variable "OMP_NUM_THREADS". The following example job script runs a computation with 10 parallel threads on a vector engine (SX-Aurora TSUBASA) with an elapsed time limit of 1 hour.

#!/bin/bash
#PBS -q SQUID
#PBS --group=[group name]
#PBS -l elapstim_req=1:00:00
#PBS --venode=1
#PBS -v OMP_NUM_THREADS=10

module load BaseVEC
cd $PBS_O_WORKDIR
./a.out

Note

If "OMP_NUM_THREADS" is not specified, or if a wrong value is specified, the program will run with an unintended number of parallel threads, which may affect the calculations of other users. Please be careful.
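One way to catch a missing or malformed value early is to check the variable at program start. The following is a minimal sketch in C, not a facility provided by the system. The function names parse_thread_count and requested_threads are hypothetical, and only a single positive integer is accepted (list forms such as "4,2" are rejected).

```c
#include <stdlib.h>

/* Hypothetical helper: converts a string such as "10" to a thread count.
 * Returns -1 if the string is missing, empty, not a plain positive
 * integer, or unreasonably large. */
int parse_thread_count(const char *value)
{
    if (value == NULL || *value == '\0') {
        return -1;
    }
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > 100000) {
        return -1;
    }
    return (int)n;
}

/* Returns the thread count requested through OMP_NUM_THREADS,
 * or -1 if the variable is unset or invalid. */
int requested_threads(void)
{
    return parse_thread_count(getenv("OMP_NUM_THREADS"));
}
```

A program could call requested_threads() at the beginning of main and stop with an error message when it returns -1, instead of running with an unintended number of threads.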

Reference

NEC SX-Aurora TSUBASA Documentation

OpenMP

This is a method of intra-node parallelization (shared memory parallel processing) in which you insert directive lines for the compiler into the program. The notation differs slightly between C and Fortran. Basic usage is described below.

Basic use

program sample
implicit none
include "omp_lib.h"
!$omp parallel
print *, "HELLO WORLD mythread = ",omp_get_thread_num(),"(",omp_get_num_threads(),"threads)"
!$omp end parallel
end

Lines 3-6 of the program are the OpenMP-related parts.
include "omp_lib.h" (line 3), omp_get_thread_num() (line 5), and omp_get_num_threads() (line 5) were inserted only to print information about the OpenMP threads. The program is parallelized in the same way without them.

Line 3, include "omp_lib.h", specifies the include file for the OpenMP library routines.

!$omp parallel (line 4) and !$omp end parallel (line 6) are OpenMP directives.
The parallel directive indicates that the block from that line up to the end parallel directive is to be executed in parallel.
In this case, line 5 is processed in parallel. The number of parallel threads is specified by the environment variable OMP_NUM_THREADS.

On line 5, omp_get_thread_num and omp_get_num_threads are OpenMP library routines. To use these routines, you need to specify the include file as on line 3. omp_get_thread_num returns the number of the calling thread, starting from 0. omp_get_num_threads returns the total number of parallel threads.

The result of running the above program with 4 parallel threads is shown below. The order of the output lines can differ from run to run.

HELLO WORLD mythread= 3 ( 4 threads)
HELLO WORLD mythread= 1 ( 4 threads)
HELLO WORLD mythread= 0 ( 4 threads)
HELLO WORLD mythread= 2 ( 4 threads)
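For comparison, a C counterpart of the Fortran program above could look like the following minimal sketch. The function name hello_threads is hypothetical. The header omp.h plays the role of omp_lib.h, and the _OPENMP guards are added here so that the file still compiles when OpenMP is not enabled (the function then runs with a single thread). On the NEC compiler, OpenMP is assumed to be enabled with the "-fopenmp" option; please confirm this in the NEC documentation.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP library routines (C counterpart of omp_lib.h) */
#endif

/* Hypothetical example: prints one line per thread and returns the
 * number of threads that took part in the parallel region. */
int hello_threads(void)
{
    int team_size = 1;

#pragma omp parallel
    {
        int id = 0;      /* number of this thread, starting from 0 */
        int count = 1;   /* total number of threads */
#ifdef _OPENMP
        id = omp_get_thread_num();
        count = omp_get_num_threads();
#endif
        printf("HELLO WORLD mythread = %d ( %d threads)\n", id, count);

        /* Thread 0 always exists, so it alone records the team size. */
        if (id == 0) {
            team_size = count;
        }
    }
    return team_size;
}
```

Calling hello_threads() from main with OMP_NUM_THREADS=4 should print four lines, one per thread, in an order that is not fixed.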

Directive

Unlike automatic parallelization, users themselves must insert "directive lines" to instruct the compiler how to parallelize. The form of the directive line differs depending on the language:

!$omp directive name (in the case of Fortran)
#pragma omp directive name (in the case of C/C++)

For details on the individual directives, please see the reference material.
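As one concrete use of the C/C++ form, the following minimal sketch applies the "parallel for" directive with a "reduction" clause to a summation loop. The function name parallel_sum is hypothetical. It is only an illustration of the directive syntax and not a recommended setting for SQUID.

```c
#include <stddef.h>

/* Hypothetical example: sums the n elements of values.
 * "parallel for" divides the loop iterations among the threads, and
 * "reduction(+:total)" gives each thread a private partial sum that is
 * added into total when the loop ends. */
double parallel_sum(const double *values, size_t n)
{
    double total = 0.0;

    /* A signed loop index is used so that the loop is also accepted by
     * compilers that follow older OpenMP versions. */
#pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += values[i];
    }
    return total;
}
```

When OpenMP is not enabled at compile time, the directive is ignored and the loop runs serially with the same result for this input. With floating-point data in general, the rounding of the sum can differ slightly between thread counts because the order of the additions changes.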