(Closed) I/O delays in the SQUID file system (Lustre)

Added October 10, 10:00 a.m.
 

We are currently performing urgent maintenance on the SQUID file system (Lustre) to deal with the I/O delays that have been occurring since October 4.
 

The investigation into the cause of the issue and the restoration work are now almost complete. All services will resume at 13:00 today, October 10.
Job submission and execution on SQUID and access to ONION-file will also be available.
 

We apologize for the long service suspension.
Thank you for your cooperation.
 

----

Added October 6, 3:00 p.m.
 

Since October 4, there have been I/O delays in the SQUID file system (Lustre). To address this, we have been conducting emergency maintenance on SQUID since 10:00 a.m. on October 5.
 

Maintenance is still in progress, but some services will resume at 3:00 p.m. on October 6, as detailed below.
Please note that job submission and execution on SQUID are still not possible. We appreciate your understanding.

 

- SQUID
Maintenance is still in progress. However, you can log in to the HPC front-end server (squidhpc.hpc.cmc.osaka-u.ac.jp). The HPDA front-end server is not accessible. You can access data on the file system, but you cannot submit or execute jobs.
 

- ONION
ONION-file will be unavailable during maintenance. Users cannot log in or access data during the maintenance.
ONION-object can be used normally during maintenance.
 

- OCTOPUS and other WEB systems
These can be used normally during maintenance.
 

We apologize for the prolonged service interruption.

We will update you if there are any changes to the situation.
Thank you for your understanding.
 

----

Added October 5, 10:00 a.m.
 

Emergency maintenance will be performed from 10:00 a.m. on Thursday, October 5, to address this issue.
We apologize for the inconvenience.
We will inform you as soon as the maintenance is completed.
 

- SQUID
All SQUID services will be unavailable during maintenance. Users cannot log in or submit jobs during the maintenance.
All currently running jobs will be rerun: after the maintenance, each job will be executed again from the beginning. Affected users will be notified individually.
 

- ONION
ExaScaler and ONION-file will be unavailable during maintenance. Users cannot log in or access data during the maintenance.
ONION-object can be used normally during maintenance.
 

- OCTOPUS and other WEB systems
These can be used normally during maintenance.
 

----
I/O delays have been occurring again in the SQUID file system (Lustre) since Wednesday, October 4. Specifically, the following symptoms have been observed:
- Jobs running on the compute nodes are delayed.
- Command operations on the frontend server are delayed.
We are currently investigating the cause of the failure and working on restoration.
We apologize for the inconvenience.




Posted: October 4, 2023