Two of the most basic commands of the Slurm scheduling system are sinfo and squeue. For extended documentation, refer to the corresponding webpages (sinfo, squeue). A brief summary of these two commands follows. For job monitoring through more advanced use of sinfo and squeue, see the article Jobs Monitoring in this documentation.
sinfo: view of the current state of nodes and partitions
The following picture depicts the output of the sinfo command:
- Column PARTITION: the cluster partitions. The asterisk next to the “batch” partition means that batch is the default partition (if a job does not specify a partition, it is allocated to the batch partition).
- Column AVAIL: the availability of the partition is provided (expected values up/down).
- Column TIMELIMIT: the maximum time a job is allowed to run. In the fast partition, this limit is 12 hours.
- Column NODES: the number of nodes in a specific state. For example, four nodes (nodes 4, 5, 6 and 10) are in the “alloc” state.
- Column STATE: the current state of the nodes. Typical values:
  - idle: the partition/node is available for allocation and job execution.
  - alloc (allocated): one or more jobs are already running on the partition/node, so no further allocation is possible.
  - down: the partition/node is unavailable for technical reasons.
  - mix: one or more jobs are already running on the partition/node, but resources (cores/memory) remain, so new jobs can still be allocated.
- Column NODELIST: the set (names) of nodes at a specific state.
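The column descriptions above can be put to work with a small script. The following is a minimal sketch, not part of the cluster's tooling: it parses text laid out like the default sinfo table and counts, per partition, the nodes that can still accept jobs (idle or mix). The names `SAMPLE_SINFO`, `parse_sinfo` and `usable_nodes` are hypothetical, and the sample values are invented for illustration, apart from the details mentioned in the text (batch as default partition, a 12-hour limit on fast, nodes 4, 5, 6 and 10 allocated).

```python
# Illustrative sinfo-style output; values are made up for this example.
SAMPLE_SINFO = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*       up 1-00:00:00      4  alloc node[4-6,10]
batch*       up 1-00:00:00      2    mix node[7-8]
batch*       up 1-00:00:00      3   idle node[1-3]
fast         up   12:00:00      1   down node9
"""


def parse_sinfo(text):
    """Turn sinfo-style tabular text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        # A trailing asterisk marks the default partition.
        row["DEFAULT"] = row["PARTITION"].endswith("*")
        row["PARTITION"] = row["PARTITION"].rstrip("*")
        row["NODES"] = int(row["NODES"])
        rows.append(row)
    return rows


def usable_nodes(rows):
    """Count nodes per partition that can still accept jobs (idle or mix)."""
    counts = {}
    for row in rows:
        if row["AVAIL"] == "up" and row["STATE"] in ("idle", "mix"):
            counts[row["PARTITION"]] = counts.get(row["PARTITION"], 0) + row["NODES"]
    return counts


rows = parse_sinfo(SAMPLE_SINFO)
print(usable_nodes(rows))
```

On a real cluster the text would come from running sinfo itself rather than from a sample string. Real state values can carry suffix characters (for example `*` for a node that is not responding), which this sketch does not handle.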
squeue: view of the current state of the Slurm scheduling queue
The following pictures illustrate the output of squeue.
- Column JOBID: the ID of the job. The job ID is assigned automatically by the Slurm controller.
- Column PARTITION: the partition where the job is running.
- Column NAME: the name of the job (assigned by user).
- Column USER: the name of the user whose job is running.
- Column ST: the state of the job. Typical values are: R (running), PD (pending), CG (completing), CD (completed).
- Column TIME: the time elapsed since the job started executing.
- Column NODELIST(REASON): the node(s) where the job is running or, for a pending job, the reason it is waiting. Typical values of REASON: Resources (the job is waiting for resources to become available), Priority (one or more higher-priority jobs exist for this partition), Dependency (the job is waiting for a job it depends on to complete).
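As a rough illustration of how the ST and NODELIST(REASON) columns fit together, the sketch below parses squeue-style text and groups pending jobs by the reason they are waiting. The names `SAMPLE_SQUEUE`, `parse_squeue` and `pending_by_reason` are hypothetical, and the job IDs, job names and users in the sample are invented. The sample uses only the columns described above; the parser reads the header line, so additional columns would also be picked up.

```python
from collections import defaultdict

# Illustrative squeue-style output; all jobs and users are made up.
SAMPLE_SQUEUE = """\
  JOBID PARTITION     NAME     USER ST       TIME NODELIST(REASON)
   1201     batch    sim_a    alice  R    1:02:33 node[4-5]
   1202     batch    sim_b      bob  R      45:10 node6
   1203      fast   test_c    alice PD       0:00 (Resources)
   1204     batch    sim_d    carol PD       0:00 (Priority)
   1205     batch   post_d    carol PD       0:00 (Dependency)
"""


def parse_squeue(text):
    """Turn squeue-style tabular text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    jobs = []
    for ln in lines[1:]:
        # Limit the number of splits so the last column keeps any spaces
        # that may appear inside a reason string.
        fields = ln.strip().split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


def pending_by_reason(jobs):
    """Map each pending reason to the list of job IDs waiting for it."""
    grouped = defaultdict(list)
    for job in jobs:
        if job["ST"] == "PD":
            reason = job["NODELIST(REASON)"].strip("()")
            grouped[reason].append(job["JOBID"])
    return dict(grouped)


jobs = parse_squeue(SAMPLE_SQUEUE)
print(pending_by_reason(jobs))
```

Splitting on whitespace assumes job names contain no spaces; a job name with spaces would shift the columns and would need a different parsing approach.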