D3 Center, The University of Osaka » Category

2024.04.18

Can I get the maximum memory usage per compute node?

The maximum memory usage per generic CPU node can be collected by running "qstat -Jf" periodically.
In the "qstat -Jf" output, the memory cgroup section of the job (Memory Cgroup Resources) shows "Memory Usage", which is the current memory usage, and "Max Memory Usage", which is the maximum memory usage.
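The periodic collection described above could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function names, the polling interval, and the sample count are hypothetical, and the only thing assumed about the "qstat -Jf" output is that the relevant lines contain the text "Memory Usage".

```python
import subprocess
import time
from typing import List


def memory_usage_lines(qstat_output: str) -> List[str]:
    """Return the lines of qstat output that mention "Memory Usage".

    This matches both "Memory Usage" and "Max Memory Usage".
    """
    return [line.strip() for line in qstat_output.splitlines()
            if "Memory Usage" in line]


def poll_memory_usage(interval_sec: float = 60.0, samples: int = 10) -> None:
    """Run "qstat -Jf" `samples` times and print the memory usage lines."""
    for i in range(samples):
        result = subprocess.run(["qstat", "-Jf"], capture_output=True,
                                text=True, check=False)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for line in memory_usage_lines(result.stdout):
            print(stamp, line)
        if i < samples - 1:
            time.sleep(interval_sec)  # wait before taking the next sample
```

Calling poll_memory_usage() while the job is running would print one time-stamped set of lines per sample; the last "Max Memory Usage" line seen before the job ends is the value of interest.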

Posted at 02:42

2022.10.13

Can I use wandb on SQUID?

You can install wandb with the pip command.

pip install wandb
wandb login "XXXX"

The compute nodes of SQUID cannot access the internet in general; as an exception, access is permitted for "wandb" only.
Please write the job script as follows:

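#!/bin/bash
#PBS -q SQUID
#PBS --group=[group name]
#PBS -l elapstim_req=1:00:00
# Move to the directory from which the job was submitted
cd $PBS_O_WORKDIR
# Send HTTP(S) traffic through the proxy so that wandb can reach the internet
export http_proxy="http://ibgw1f-ib0:3128"
export https_proxy="http://ibgw1f-ib0:3128"
python test.py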

Please use version 2.24 of the requests package. Versions such as 2.26 do not work properly on SQUID.
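A job could check the installed requests version before starting a long run. The sketch below is only an illustration: the function names are hypothetical, and it treats any 2.24.x release as acceptable, which is an assumption based on the recommendation above.

```python
from importlib import metadata


def is_recommended_requests(version: str) -> bool:
    """Return True if `version` belongs to the recommended 2.24 series."""
    return version.split(".")[:2] == ["2", "24"]


def check_installed_requests() -> None:
    """Print a warning if the installed requests is not in the 2.24 series."""
    try:
        installed = metadata.version("requests")
    except metadata.PackageNotFoundError:
        print("requests is not installed")
        return
    if not is_recommended_requests(installed):
        print(f"warning: requests {installed} found; 2.24 is recommended on SQUID")
```

Calling check_installed_requests() at the top of test.py would surface a version mismatch in the job output instead of a later connection failure.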

Posted at 11:04

2019.01.10

Binary data gets into the output file of an MPI job

In rare cases, binary data ends up in an output file when every process writes its output to a file with the same name.

Please use MPI-IO, or have each process write its output to a separate file.

Please see the following page for MPI-IO:
I want to write the results of an MPI run to a single file
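The "separate file per process" approach can be sketched as follows. This is an illustration under assumptions: the helper names and the zero-padded naming scheme are hypothetical, and the rank is passed in as a plain integer (in a real program it would come from the MPI library, for example from mpi4py).

```python
from pathlib import Path


def rank_output_path(base: str, rank: int, width: int = 4) -> Path:
    """Build a per-rank file name such as 'result.0003.txt' from 'result.txt'."""
    path = Path(base)
    return path.with_name(f"{path.stem}.{rank:0{width}d}{path.suffix}")


def write_rank_output(base: str, rank: int, text: str) -> Path:
    """Write `text` to the file that belongs to `rank` only and return its path."""
    path = rank_output_path(base, rank)
    path.write_text(text)
    return path
```

Because no two ranks share a file name, their writes cannot interleave within one file.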

Posted at 03:29

2017.06.06

How do I assign MPI processes to nodes one at a time in round-robin order?

[Supplement to the question]
When I run a parallel job (Intel MPI) on 4 nodes of VCC (20 cores per node), I want to assign the MPI ranks as follows:

node 1: rank 0, 4, 8, ..., 76
node 2: rank 1, 5, 9, ..., 77
node 3: rank 2, 6, 10, ..., 78
node 4: rank 3, 7, 11, ..., 79

[Answer]
In this case, please specify the following in the job script:

#PBS -b 4
mpiexec -ppn 1 -n 80 ./a.out

From the manual for the -ppn option of mpiexec (Intel MPI):

-perhost <# of processes>, -ppn <# of processes>, -grr <# of processes>
Use this option to place the specified number of consecutive MPI processes on every host in the group using round robin scheduling.
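To see why -ppn 1 produces the layout requested in the question, the placement rule quoted above can be modeled in a few lines of Python. This is only an illustrative model of round-robin placement, not Intel MPI code; the function name round_robin_placement and its parameters are made up for this sketch.

```python
from typing import Dict, List


def round_robin_placement(n_ranks: int, n_nodes: int, ppn: int) -> Dict[int, List[int]]:
    """Map node index -> ranks when `ppn` consecutive ranks go to each node in turn."""
    placement: Dict[int, List[int]] = {node: [] for node in range(n_nodes)}
    for rank in range(n_ranks):
        node = (rank // ppn) % n_nodes  # move to the next node after every `ppn` ranks
        placement[node].append(rank)
    return placement


# The case from the question: 80 ranks, 4 nodes, one rank per node per round.
placement = round_robin_placement(n_ranks=80, n_nodes=4, ppn=1)
for node, ranks in placement.items():
    print(f"node {node + 1}: rank {ranks[0]}, {ranks[1]}, {ranks[2]}, ..., {ranks[-1]}")
```

With ppn=1 the model prints the same four lines as in the question (node 1: rank 0, 4, 8, ..., 76 and so on). With a larger ppn, each node would instead receive blocks of consecutive ranks.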

Posted at 02:50

2017.05.29

Why does the processing time of the non-parallelized part increase with OpenMP?

When you specify the OpenMP or auto-parallelization option at compile time, the compiler links the parallel version of the library whether or not the code contains parallel directives.
The functions in the parallel library differ from the normal functions in that they contain lock routines that restrict access to resources by other threads.

When these functions are called in a non-parallelized part, they run on a single thread, so they never have to wait on a lock held by another thread. They still need a little extra processing time, however, because each function has to decide whether or not to perform the lock routine. Please keep this in mind.

Posted at 01:35

2017.01.20

I want to redirect the standard output of an MPI program on the SQUID vector nodes to separate files.

To redirect the standard output of an MPI program on the SQUID vector nodes, please use the script "/opt/nec/ve/bin/mpisep.sh".

Use the script as follows:

#PBS -v MPISEPSELECT=3
mpirun -np 160 /opt/nec/ve/bin/mpisep.sh ./a.out

In this case, standard output is written to stdout.0:(MPI process ID) and standard error is written to stderr.0:(MPI process ID) in real time.

For details, please see section "3.3" of the following manual:
NEC MPI User's Guide

You can change the stdout/stderr file names to whatever you like by modifying mpisep.sh.
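After the job finishes, the per-process files can be gathered in process order. The following is a minimal sketch that assumes the default names described above (stdout.0:(MPI process ID)); the helper names are hypothetical, and the prefix has to be adjusted if mpisep.sh was modified.

```python
from pathlib import Path
from typing import List


def process_id(path: Path) -> int:
    """Return the MPI process ID encoded after the colon in the file name."""
    return int(path.name.rsplit(":", 1)[1])


def files_in_process_order(directory: str, prefix: str = "stdout.0:") -> List[Path]:
    """List the per-process files in `directory`, sorted numerically by process ID."""
    return sorted(Path(directory).glob(prefix + "*"), key=process_id)


def merge_outputs(directory: str, prefix: str = "stdout.0:") -> str:
    """Concatenate the per-process files, labelling each part with its process ID."""
    parts = []
    for path in files_in_process_order(directory, prefix):
        parts.append(f"--- process {process_id(path)} ---\n{path.read_text()}")
    return "".join(parts)
```

The numeric sort matters: a plain alphabetical sort would place stdout.0:10 before stdout.0:2.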

Posted at 11:58

2016.09.16

Can I specify the permissions of the standard output file and the standard error output file?

On SX-ACE and VCC, the permissions of the standard output file and the standard error output file depend on the "umask" setting. Please set the desired permissions with the "umask" command on the front-end server.
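The effect of a given umask value can be worked out in advance. The sketch below assumes the usual Unix behavior in which a regular file is requested with mode 0666 and the umask bits are then cleared; whether the job scheduler requests exactly 0666 for its output files is an assumption, and the function name is hypothetical.

```python
import stat


def file_mode_under_umask(umask: int, requested: int = 0o666) -> str:
    """Return the 'ls -l' style mode of a regular file created under `umask`."""
    mode = requested & ~umask  # every bit set in the umask is removed
    return stat.filemode(stat.S_IFREG | mode)


for mask in (0o022, 0o027, 0o077):
    print(f"umask {mask:03o} -> {file_mode_under_umask(mask)}")
```

Under this assumption, a umask of 077 makes the output files readable and writable by the owner only.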

Posted at 04:47

2016.08.24

How can I generate independent random numbers?

Many PRNGs (pseudo-random number generators) generate random numbers from a specified seed. If you specify the same seed, the same sequence of random numbers is generated. To obtain independent random numbers, you have to change the seed on every run.
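The behavior described above can be demonstrated with Python's standard random module. This is a minimal sketch; the helper names are hypothetical, and drawing the seed from the operating system with secrets.randbits is just one possible way to obtain a different seed on every run.

```python
import random
import secrets
from typing import List


def sample(seed: int, n: int = 5) -> List[float]:
    """Draw `n` numbers from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def fresh_seed() -> int:
    """Return a new 64-bit seed taken from the operating system's entropy source."""
    return secrets.randbits(64)


print(sample(1) == sample(1))  # same seed, same sequence
print(sample(1) == sample(2))  # different seeds, different sequences
print(sample(fresh_seed()))    # changes on every run
```

Recording the seed returned by fresh_seed() in the job log keeps a run reproducible even though the seed changes each time.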

Posted at 09:35

2015.06.17

How do I check the MPIPROGINF output after running an MPI program on SX-ACE?

Please see section "2.13.6" of the following manual for the information output by MPIPROGINF:
How to use MPI/SX

* Note
If you run a hybrid MPI and OpenMP parallel program, "UserTime" and "SysTime" in the output are CPU times per process. These times can be longer than "RealTime" because they are the sums of the individual CPU times within each process.

Posted at 06:28

2014.07.14

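I want to submit the next job automatically when a running job finishes, and to choose which job to submit depending on whether the running job succeeded.

This can be done with the "qwait" command.

This command waits for the request whose ID is given as an argument (e.g. 12345.cmc).
When the job with the specified request ID finishes, the command prints a message and exits.

For details of the command, please refer to Chapter 1, "User Commands", in the Reference part of the manual
"NQS User's Guide", which is published on the portal.

NQSII User's Guide (authentication required)
NQSV User's Guide
* Help is also available with "man qwait".

qwait can be used as follows.

Run a monitoring script in the background and call qwait inside the script.
The script checks the exit code of qwait (described in the manual above) and branches on it to decide what to do next.
Please use it as a reference.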

-----------
$ qsub job1-1
Request 12345.cmc submitted to queue: Pxx.
$ (./chkjob >& log &)

----- chkjob
#!/bin/sh
while :
do
    qwait 12345.cmc   # replace the request ID with your own
    # Branch on the exit code of qwait (see the manual for its meaning)
    case $? in
    0) qsub job1-2;exit;;
    1) qsub job2-1;exit;;
    2) qsub job3-1;exit;;
    3) echo NQS error | mail xxxx@yyyy.ac.jp;exit;;   # replace the mail address with your own
    7) continue;;
    *) ;;
    esac
done
------------

That is all.

Posted at 03:23