To run a compiled load module (a.out) on the large-scale computing system, write a job script file (a shell script that describes the job request) and submit it to the scheduler.
The following is an example of a job script.
#!/bin/bash
#PBS -q OCTOPUS
#PBS -l elapstim_req=1:00:00,cpunum_job=24
#PBS -M user@hpc.cmc.osaka-u.ac.jp
#PBS -m b
cd $PBS_O_WORKDIR
./a.out > result.txt
A job script file consists of two parts.
1. Specifying the computer resources and environment (lines starting with #PBS)
2. Writing the process to be executed on the computer (shell script)
1. Specifying the computer resources and environment (lines starting with #PBS)
#PBS (option)
Write the submission options as comment lines in the job script. Each option is written in the form "#PBS ..." in the comment block before the first shell command.
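The placement rule above can be checked with a small helper. The following is a minimal sketch, not part of the scheduler: list_pbs_directives is a hypothetical function name, and it assumes that "#PBS" lines written after the first shell command are treated as ordinary comments, which is what the rule above implies.

```shell
#!/bin/bash
# Sketch: print the "#PBS" option lines found before the first shell command.
# list_pbs_directives is a hypothetical helper, not part of the scheduler.
list_pbs_directives() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '#PBS'*)        printf '%s\n' "$line" ;;  # option line for the scheduler
            '#'*)           ;;                        # shebang or ordinary comment
            *[![:space:]]*) break ;;                  # first shell command: stop reading
        esac
    done < "$1"
}
```

Running `list_pbs_directives job.sh` (job.sh being your own script) lists the options that sit in the leading comment block. A "#PBS" line that you expected but that is missing from the output was probably written after a command.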
Introduction of options
* Options marked [for use!] are required.
* Options marked [for MPI!] are required for MPI calculations.
The -l option (explained below) takes the following resource settings:
- elapstim_req: runtime limit
  (Example for a 2-hour run: elapstim_req=2:00:00)
- cpunum_job: number of CPU cores used per node
  If omitted, it is automatically set to 24.
  (Example for 1 node and 24 cores: cpunum_job=24)
- gpunum_job: number of GPUs used per node
  It can be omitted if you do not use GPUs.
  (Example for 1 node and 4 GPUs: gpunum_job=4)
Job-class (-q option) [for use!]
#PBS -q [job-class name]
Specify the job class of the computer you want to use as [job-class name]. Examples of [job-class name] are shown below.
destination node | job-class name |
---|---|
CPU node | OCTOPUS / DBG* |
GPU node | OCTOPUS / DBG* |
Xeon Phi node | OCTPHI |
Large-scale shared memory node | OCTMEM |
Please see the job-class table for details.
* "OCTOPUS" and "DBG" job classes
When the job-class name is "OCTOPUS" or "DBG" and gpunum_job (explained below) is set to 1 or more in the -l option, the job is submitted to a GPU node. "DBG" is a short-time job class for debugging. Its elapstim_req (explained below) can be at most 10 minutes, but the waiting time is shorter because jobs turn over quickly.
Limit of resource (-l option) [for use!]
#PBS -l elapstim_req=[runtime limit],cpunum_job=[usage number of cpu cores],gpunum_job=[usage number of GPUs per node]
Specify the computer resources you want to use.
For a multi-node job as well, specify the resources for one node.
If your job runs longer than the value given in elapstim_req, it will be cancelled. The maximum run time differs depending on the computer system. Please see the job-class table for details.
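As an illustration of the -q and -l options together, the sketch below writes a single-node GPU job script to a file. The file name job_gpu.sh is hypothetical, the resource values are taken from the examples in this section, and a.out is assumed to be a program that uses the GPUs.

```shell
#!/bin/bash
# Sketch: write a single-node GPU job script to job_gpu.sh (hypothetical name).
# The quoted 'EOF' keeps $PBS_O_WORKDIR unexpanded until the job itself runs.
cat > job_gpu.sh <<'EOF'
#!/bin/bash
#PBS -q OCTOPUS
#PBS -l elapstim_req=2:00:00,cpunum_job=24,gpunum_job=4
cd $PBS_O_WORKDIR
./a.out > result.txt

EOF
```

Because gpunum_job is 1 or more, a job in the OCTOPUS class goes to a GPU node. The generated file ends with a blank line, as required for job scripts.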
Number of nodes (-b option) [for MPI!]
#PBS -b [number of nodes]
Specify the number of nodes if you run the calculation on multiple nodes.
Type of job (-T option) [for MPI!]
#PBS -T [type]
Specify the MPI type of the job you want to execute. The values of [type] are shown below.
MPI type | [type] |
---|---|
non-MPI | not needed (this option can be omitted) |
Intel MPI | intmpi |
OpenMPI | openmpi |
MVAPICH2 | mvapich |
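The two [for MPI!] options can be combined as in the following sketch, which writes a 2-node Intel MPI job script to a file. The file name job_mpi.sh is hypothetical. The mpirun line, including the ${NQSII_MPIOPTS} variable and the process count of 48 (2 nodes x 24 cores), is an assumption that this section does not specify, so please confirm the launch command in the system's MPI documentation before using it.

```shell
#!/bin/bash
# Sketch: write a 2-node Intel MPI job script to job_mpi.sh (hypothetical name).
cat > job_mpi.sh <<'EOF'
#!/bin/bash
#PBS -q OCTOPUS
#PBS -l elapstim_req=1:00:00,cpunum_job=24
#PBS -b 2
#PBS -T intmpi
cd $PBS_O_WORKDIR
# ASSUMPTION: this launch line is not given in this section; 48 = 2 nodes x 24 cores.
mpirun ${NQSII_MPIOPTS} -np 48 ./a.out > result.txt

EOF
```

Note that the -l values still describe one node only; the -b option supplies the node count.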
Specifying the project ID (--group option)
#PBS --group=[HPCI or JHPCN project ID]
When you run a program under an HPCI project or a JHPCN-HPCI project, you must write this line. If your project continues from the previous fiscal year, specify the project ID for the current fiscal year.
Examples:
#PBS --group=hp****** (project ID)
#PBS --group=jh****** (project ID)
Environment variables (-v option)
#PBS -v [environment variable name]="[setting value]"
This option sets the specified environment variable on all nodes where the job runs. You can also set environment variables with "setenv", but when the job runs on multiple nodes, the variable is not set on the slave nodes. Therefore, please use the -v option.
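A minimal sketch of the -v option in a complete job script follows. The file name job_env.sh is hypothetical, and OMP_NUM_THREADS with the value 24 is only an illustrative choice that assumes a.out is an OpenMP program; substitute the variable your program actually reads.

```shell
#!/bin/bash
# Sketch: write a job script that passes an environment variable with -v.
# job_env.sh and the OMP_NUM_THREADS value are illustrative assumptions.
cat > job_env.sh <<'EOF'
#!/bin/bash
#PBS -q OCTOPUS
#PBS -l elapstim_req=1:00:00,cpunum_job=24
#PBS -v OMP_NUM_THREADS="24"
cd $PBS_O_WORKDIR
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # confirm the value reached the job
./a.out > result.txt

EOF
```

The echo line writes the value to the job's standard output, which is a simple way to confirm that the setting was applied.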
Sending e-mail when the job starts or terminates (-m, -M options)
#PBS -m [mail option]
#PBS -M [email address]
When your job starts or terminates, the system sends a message to your e-mail address. Set [mail option] to the timing at which you want to receive the message.
b E-mail is sent when request execution starts.
e E-mail is sent when request execution terminates (including abnormal termination).
a E-mail is sent when request execution terminates abnormally. Abnormal termination means that at least one of the running batch jobs was terminated by a signal, or was forcibly terminated by a fault in SX hardware (CPU, IXS).
n E-mail is not sent.
eb E-mail is sent when request execution starts and when it terminates.
Result file options (-e, -o options)
#PBS -e [file name for standard error output]
#PBS -o [file name for standard output]
Specify the result files of a request with the -e and -o options. If you do not specify these options, the output files are named as follows.
standard error output : "Job-name" + ".e" + "request ID"
standard output : "Job-name" + ".o" + "request ID"
These default names are convenient when you analyze your program after execution, so we recommend that you do not specify these options.
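The naming rule above is a plain concatenation, which the following sketch spells out. default_output_names is a hypothetical helper, and the job name and request ID in the usage note are made-up values; the actual request ID is assigned by the scheduler when the job is submitted.

```shell
#!/bin/bash
# Sketch: build the default output file names from a job name and a request ID.
# default_output_names is a hypothetical helper for illustration only.
default_output_names() {
    local job_name=$1 request_id=$2
    printf '%s.e%s\n' "$job_name" "$request_id"   # standard error output
    printf '%s.o%s\n' "$job_name" "$request_id"   # standard output
}
```

For example, `default_output_names job.sh 12345` prints job.sh.e12345 and job.sh.o12345.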
Rerun setting (-r option)
#PBS -r n
The -r option enables or disables rerunning of a request.
“y” enables rerunning and “n” disables it.
Other options
For other options, please see "Part 4 1.16.qsub" in the following manual. (Authentication with your user ID is required.)
NQSII Users Guide
For the job classes (queues) and the per-class limits that apply to a job script, please see the following job class table.
Job class table
2. Writing the process to be executed on the computer (shell script)
Basically, write a shell script that runs executables and handles directories.
cd $PBS_O_WORKDIR
# move to the directory from which you submitted the job
./a.out > result.txt
# execute a.out and redirect its standard output to result.txt
* "$PBS_O_WORKDIR" is provided as an environment variable. It is automatically set to the directory from which you submitted the job.
* The limit for standard output and standard error output is 100 MB. If you want to output more than 100 MB, redirect the output to a file with ">".
./a.out > output.txt
Please note that the following examples are written for bash.
The executable file is "a.out".
The redirect file for standard output is "out.txt".
The redirect file for standard error output is "err.txt".
The redirect file for both standard output and standard error output is "out_and_err.txt".
Redirect only standard output to a file.
./a.out > out.txt
Redirect standard output and standard error output to the same file.
./a.out >& out_and_err.txt
Redirect standard output and standard error output to different files.
(./a.out > out.txt) >& err.txt
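The same three cases can also be written with numbered file descriptors (1 for standard output, 2 for standard error output), which is the more common bash style. The sketch below is runnable outside the scheduler: a_out is a stand-in shell function for ./a.out, introduced here only for illustration.

```shell
#!/bin/bash
# Sketch: a_out is a stand-in function for ./a.out that writes one line to
# standard output and one line to standard error output.
a_out() {
    echo "to stdout"
    echo "to stderr" >&2
}

cd "$(mktemp -d)" || exit 1              # work in a scratch directory

a_out > out.txt                          # standard output only
a_out > out_and_err.txt 2>&1             # both streams to the same file
a_out > out.txt 2> err.txt               # each stream to its own file
```

In a job script, replace a_out with ./a.out. In the second form the order matters: "2>&1" must come after "> out_and_err.txt" so that standard error output follows standard output into the file.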
Notes on writing a job script
Always insert a blank line at the end of the script. If you do not, the system will not execute the last line of your script.
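A small helper can guard against forgetting the final blank line. This is a minimal sketch with a hypothetical function name, ensure_trailing_blank_line; it modifies the file in place and does nothing if the script already ends with a blank line.

```shell
#!/bin/bash
# Sketch: make sure a job script ends with a blank line.
# ensure_trailing_blank_line is a hypothetical helper, not a scheduler command.
ensure_trailing_blank_line() {
    local file=$1
    # Terminate the last line if the file does not end with a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    # Add an empty line if the last line still has content.
    if [ -n "$(tail -n 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}
```

Call it as `ensure_trailing_blank_line job.sh` (with your own script name) before submitting the job.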