On a shared system, a job may have to wait before it runs if other users are using the computer.
The execution order of jobs is determined and managed by software called a scheduler, based on the amount of resources requested and various restrictions.
Users cannot control this order directly, but by adjusting the scale of the calculation and the amount of resources requested in the job script, it is possible to aim for shorter wait times.
The method is explained below.
 

1. Check the current usage status

    There are several ways to check the usage status of all users, including yourself.
    To check it visually, including the future usage schedule, use the following web pages.

    OCTOPUS Usage Status SQUID Usage Status
     

    The following image shows the OCTOPUS Usage Status.
    "#" represents the computing nodes that are in use/scheduled.
    "." represents the computing nodes that are available.
     
    Depending on the status of the computer, some computing nodes may be available within the next few hours.
    The example below shows a "gap time" of 5 hours starting at 15:55 on 3 nodes of the OCTOPUS general-purpose CPU node group.
    If you set the elapsed time (elapstim_req) so that the job fits into this "gap time" and then submit it, the calculation may be executed without waiting.
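
    As a minimal sketch of fitting a request into a gap, the helper below converts a gap length read from the usage page into an elapstim_req value. The function name gap_to_elapstim and the default 10-minute safety margin are assumptions made for illustration, not part of the system; the margin only allows for the time that passes between reading the page and submitting the job.

    ```shell
    #!/bin/bash
    # Hypothetical helper (not a system command): turn a gap length in minutes
    # into an H:MM:SS value for elapstim_req, leaving a safety margin.
    gap_to_elapstim() {
        local gap_min=$1            # gap length in minutes, read from the usage page
        local margin_min=${2:-10}   # assumed safety margin in minutes
        local usable=$(( gap_min - margin_min ))
        if (( usable <= 0 )); then
            echo "gap too short for the requested margin" >&2
            return 1
        fi
        printf '%d:%02d:00\n' $(( usable / 60 )) $(( usable % 60 ))
    }

    # A 5-hour gap (300 minutes) with the default 10-minute margin
    gap_to_elapstim 300   # prints 4:50:00
    ```

    The printed value could then be used in the "#PBS -l elapstim_req=" line of the job script, provided the program really finishes within that time.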
     

     

2. Adjust the scale of computation and resource amount

    The following example is a job script that executes a 24-hour calculation on the OCTOPUS general-purpose CPU node group.
    If you submit it as is with qsub, it will not fit into the "gap time" above, and the job will have to wait before it is executed.

    #!/bin/bash
    #PBS -q OCTOPUS
    #PBS -l elapstim_req=24:00:00
    cd $PBS_O_WORKDIR
    ./a.out

     

    In such a case, check whether there is a discrepancy between the time the program actually takes to run and the "elapstim_req" time specified in the job script.
    The scheduler schedules (reserves) computing nodes based on the "elapstim_req" of the job script. If it is set too long, the job will not fit into the "gap time" and will be scheduled after the other jobs, which may result in a long waiting time.
     
    In this example, if the program finishes in about 3 hours, it would be good to change the elapstim_req specified in the job script to 4 hours.
    Even if the calculation really does take about 24 hours, the work can end up finishing earlier if the program can be run at a reduced scale, divided into multiple jobs, or interrupted and resumed: change the program's parameters so that each run finishes within 5 hours, and execute it in several runs.
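
    One way to picture the "multiple jobs" approach is the sketch below, which prints a job script for one part of a divided calculation. It rests on an assumption that is not in the original: that ./a.out accepts a part number as its first argument. The function name make_part_job is likewise hypothetical.

    ```shell
    #!/bin/bash
    # Hypothetical helper: print a job script for one part of a divided calculation.
    # ASSUMPTION: ./a.out takes the part number as its first argument.
    make_part_job() {
        local part=$1                  # which part of the calculation to run
        local elapsed=${2:-5:00:00}    # per-part elapstim_req, sized to fit the 5-hour gap
        printf '%s\n' \
            '#!/bin/bash' \
            '#PBS -q OCTOPUS' \
            "#PBS -l elapstim_req=${elapsed}" \
            'cd $PBS_O_WORKDIR' \
            "./a.out ${part}"
    }

    # Show the job script for part 1
    make_part_job 1
    ```

    Each generated script could be saved to a file and submitted with qsub. Whether the parts can run independently or must run in sequence depends on the program.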
     
    Note that when the elapsed time reaches the value specified by elapstim_req, the program is forcibly terminated, so set elapstim_req with a small margin over the actual calculation time.
    If the execution time cannot be estimated, specify a longer time at first, run the job several times to get a rough idea of the actual calculation time, and then gradually shorten elapstim_req.
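
    The margin calculation can be sketched as follows. The helper suggest_elapstim and its default 20% margin are assumptions chosen for illustration; the measured time might come, for example, from printing bash's SECONDS variable at the end of a trial job.

    ```shell
    #!/bin/bash
    # Hypothetical helper: given a measured run time in seconds, print an
    # elapstim_req value with a percentage margin, rounded up to whole minutes.
    suggest_elapstim() {
        local measured_sec=$1        # measured run time of a trial job, in seconds
        local margin_pct=${2:-20}    # assumed margin, in percent
        local total=$(( measured_sec * (100 + margin_pct) / 100 ))
        local minutes=$(( (total + 59) / 60 ))   # round up to the next minute
        printf '%d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
    }

    # A run measured at 3 hours (10800 seconds) with the default 20% margin
    suggest_elapstim 10800   # prints 3:36:00
    ```

    A larger margin is safer when run times vary between executions, at the cost of a request that fits into fewer gaps.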
     

    Here, we changed the value of elapstim_req to 4 hours.

    #!/bin/bash
    #PBS -q OCTOPUS
    #PBS -l elapstim_req=4:00:00
    cd $PBS_O_WORKDIR
    ./a.out

     

    The job was executed successfully on a "free" node.

     

    Also, when running a short job such as an operation check, specify "DBG" as the job class in the #PBS -q option.
    The DBG queue is a job class dedicated to short jobs, with "elapstim_req" limited to at most 10 minutes. Because jobs turn over quickly, the waiting time is shorter than in the normal queue.
    Here is an example of a job script:

    #!/bin/bash
    #PBS -q DBG
    #PBS -l elapstim_req=10:00
    cd $PBS_O_WORKDIR
    ./a.out

 

Notes

    The usage status web page is not real-time; it is updated once every 5 minutes.
    Also, while maintenance is being performed on a computing node, it may temporarily appear as a "free" node. Please note that the calculation may not be executed on the node you targeted.
     
    As mentioned above, shortening elapstim_req can be expected to shorten the waiting time, but if the job runs longer than elapstim_req, it will be forcibly terminated at that point, so please be careful.

 

If you really want to execute it early: Use the high-priority queue (SQUID only)

    SQUID has a high priority queue called "SQUID-H".
     
    If you specify "SQUID-H" in the -q option of the job script and submit the job, it becomes a high-priority job that is scheduled ahead of jobs submitted to the normal queue, which can shorten the waiting time. However, because its consumption coefficient is set higher than that of the normal queue, it consumes more points when executed. For details on the point system, such as the consumption coefficients of the normal queue and the high-priority queue, please see the following page.

    About the point system

    Because it is more expensive to use, as noted above, we recommend it for cases where elapstim_req cannot be shortened due to the nature of the program, or where shortening it does not improve the waiting time because of heavy congestion.
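
    For reference, a job script for the high-priority queue might look like the sketch below. It simply mirrors the earlier examples with the queue name changed. The 24-hour elapstim_req is only an illustrative value, and any other directives your SQUID jobs normally require are omitted here.

    ```shell
    #!/bin/bash
    # Submit to the high-priority queue (SQUID only); consumes more points than the normal queue
    #PBS -q SQUID-H
    # Illustrative value; set it to match your program's actual run time plus a margin
    #PBS -l elapstim_req=24:00:00
    cd $PBS_O_WORKDIR
    ./a.out
    ```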