To execute the compiled load module (a.out) on a large-scale computing system, edit a job-script file (a shell script file for the job request) and submit it to the scheduler.
HPCI users must specify the HPCI project ID in the job script. Please see this section for details.
The following is an example of a job script.
#!/bin/csh
#PBS -q ACE
#PBS -l elapstim_req=5:30:00,memsz_job=10GB
#PBS -M user@hpc.cmc.osaka-u.ac.jp
#PBS -A hp123456
#PBS -m b
cd $PBS_O_WORKDIR
./a.out > result.txt
The job-script file consists of two parts.
1. Specify the resources of the computer and the environment (lines starting with #PBS)
2. Write the process executed on the computer (shell script)
1. Specifying the resources of the computer and the environment (lines starting with #PBS)
#PBS (option)
Write the submission options in the comment part of the job script. Specify the resources and environment as "#PBS ..." lines in the comment part, before the first shell command.
Introduction of the options
* Required options are marked [for use!].
* Options required for calculations using MPI are marked [for MPI!].
Job-class (-q option) [for use!]
#PBS -q [job-class name]
Please specify the job class of the computer you want to use in [job-class name]. Please see the job-class table for details.
Project ID (-A option) [for use!]
#PBS -A [Project ID]
If you submit using HPCI or JHPCN-HPCI, please specify the project ID. Even if you are continuing a project from the previous year, please specify the project ID of the current year.
Limit of resources (-l option) [for use!]
#PBS -l memsz_job=[memory per 1 node],elapstim_req=[runtime limit]
Please specify the resources of the computer you want to use.
If you use multiple nodes, please specify the resources per node.
* cpunum_job: number of CPUs per node
(Example for executing on 1 node with 2 CPUs: cpunum_job=2)
* memsz_job: memory size per node
(Example for executing with 20 GB of memory per node: memsz_job=20GB)
* elapstim_req: runtime limit
(Example for executing for 2 hours: elapstim_req=2:00:00)
* cputim_job: runtime limit on CPU
(Example for executing for 10 minutes per CPU: cputim_job=0:10:00)
If your job runs longer than the value you specify in the elapstim_req option, it will be cancelled. The maximum run time differs depending on the computer system. Please see the job-class table for details.
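As a sketch of how these resource names combine, the line below requests 2 CPUs and 20 GB of memory per node with a 2-hour elapsed-time limit. The values are taken from the examples above for illustration only. Listing cpunum_job in the same comma-separated -l list as the other resources is an assumption based on the sample job script, so please check the job-class table for the limits that actually apply.

```csh
# Illustrative values only: 2 CPUs and 20 GB per node, 2-hour runtime limit
#PBS -l cpunum_job=2,memsz_job=20GB,elapstim_req=2:00:00
```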
Number of nodes (-b option) [for MPI!]
#PBS -b [number of nodes]
Please specify the number of nodes if you calculate on multiple nodes.
Type of job (-T option) [for MPI!]
#PBS -T [type]
Please specify the type of job you want to execute. The following is an example of [type]:
* mpisx : in the case of executing with MPI or HPF on SX.
* intmpi: in the case of executing with MPI or HPF on a PC cluster.
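A minimal sketch of the header of a multi-node MPI job script might look as follows. The class name ACE is borrowed from the sample job script, and it is an assumption that this class runs on SX and accepts a 2-node request. The node count and resource values are illustrative, not recommended settings. The command that launches the MPI program depends on the system and is not covered in this section, so it is left as a comment.

```csh
#!/bin/csh
#PBS -q ACE
#PBS -T mpisx
#PBS -b 2
#PBS -l cpunum_job=4,memsz_job=20GB,elapstim_req=1:00:00
cd $PBS_O_WORKDIR
# Launch the MPI program here (the launch command depends on the system).
```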
Environment variables (-v option)
#PBS -v [Environment variables name]="[setting value]"
This option sets the specified environment variable on all nodes where the job executes. Environment variables can also be set with "setenv". However, a variable set that way is not passed to the slave nodes when the job runs on multiple nodes. Therefore, please use the -v option.
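For illustration, the line below passes one variable to every node of the job. OMP_NUM_THREADS and the value 4 are example choices made here, not settings taken from this guide; substitute the variable your program actually reads.

```csh
# Example variable name and value only; replace with your own
#PBS -v OMP_NUM_THREADS="4"
```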
Sending start/termination messages by e-mail (-m, -M options)
#PBS -m [mail option]
#PBS -M [email address]
When your job starts or terminates, the system sends a message to your e-mail address. Please set [mail option] to the event for which you want to receive a message.
b E-mail is sent when request execution is started.
e E-mail is sent when request execution is terminated (includes an abnormal termination).
a E-mail is sent when request execution is abnormally terminated. Abnormal termination indicates the cases when at least one of the running batch jobs is terminated by signal or when forcibly terminated by SX hardware trouble (CPU, IXS).
n E-mail is not sent.
eb E-mail is sent when request execution is started and terminated.
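As a short sketch, the two lines below ask for mail at both the start and the end of the request. The address is the placeholder from the sample job script and should be replaced with your own.

```csh
# "eb": send mail when the request starts and when it terminates
#PBS -m eb
#PBS -M user@hpc.cmc.osaka-u.ac.jp
```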
Result files option
#PBS -e [file name for standard error output]
#PBS -o [file name for standard output]
Please specify the result files of a request with the -e and -o options. If you do not specify these options, the output files are named as follows:
standard error output : "Job-name" + ".e" + "request ID"
standard output : "Job-name" + ".o" + "request ID"
These default names are convenient when you analyze your program after execution, so we recommend that you do not specify these options.
Staging files (-I option, -O option) * available only on HCC!
#PBS -I "[the client host],[the execution host]"
#PBS -O "[the client host],[the execution host]"
Staging is an option to transfer files between the client host and the execution host before and after the execution.
If you use HCC, this option is required! Please see the following page for details:
How to describe the job script (Staging)
Rerun setting(-r option)
#PBS -r n
Specify whether rerunning of a request is enabled or disabled with the -r option.
“y” means enabled and “n” means disabled.
Other options
Please see "Part 4 1.16.qsub" in the following manual for other options. (Authentication with a user ID is required.)
NQSII Users Guide
Please see the following job class table for the job classes (queues) and the limits of each class that apply to the job script.
Job class table
2. Writing the process executed on the computer (shell script)
In this part, write the shell script commands that execute files or operate on directories.
cd $PBS_O_WORKDIR
# move to the directory in which you submitted the job
./a.out > result.txt
# execute a.out, and redirect result to result.txt
* "$PBS_O_WORKDIR" is provided as an environment variable. It is automatically set to the directory in which you submitted the job.
* The limit of standard output and standard error output is 100 MB. Please redirect the output with ">" if you want to output more than 100 MB.
./a.out > output.txt
Please note that the following examples use csh syntax.
* The executable file is "a.out".
* The redirect file for standard output is "out.txt".
* The redirect file for standard error output is "err.txt".
* The redirect file for both standard output and standard error output is "out_and_err.txt".
Redirect only the standard output to a file.
./a.out > out.txt
Redirect the standard output and the standard error output to the same file.
./a.out >& out_and_err.txt
Redirect the standard output and the standard error output to different files.
(./a.out > out.txt) >& err.txt
Notes on writing a job script
Please always insert a blank line at the end of the script. If the blank line is not inserted, the system will not execute the last line of your script.
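Putting the pieces together, a minimal csh job script might be laid out as below: all #PBS lines come before the first shell command, and the file ends with a blank line after the last command. The class name and resource values are copied from the sample job script at the beginning and are placeholders, not recommended settings.

```csh
#!/bin/csh
#PBS -q ACE
#PBS -l elapstim_req=5:30:00,memsz_job=10GB
cd $PBS_O_WORKDIR
# Standard output goes to out.txt, standard error to err.txt (csh syntax)
(./a.out > out.txt) >& err.txt

```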