Using Slurm

Last updated on August 18, 2025

Add the Slurm binaries to the PATH environment variable: export PATH=$PATH:/opt/slurm/bin

To apply this automatically at every login, edit the shell startup file with vim ~/.bashrc and add export PATH=$PATH:/opt/slurm/bin as the last line.
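
The same edit can be scripted so that it is safe to run more than once. This is a minimal sketch, not the only way to do it: add_slurm_path is a hypothetical helper name, and /opt/slurm/bin is the install path used in this note, so adjust it if Slurm lives elsewhere on your cluster.

```shell
#!/bin/bash
# add_slurm_path (hypothetical helper): append the Slurm PATH line to a
# startup file, but only if that exact line is not already present.
add_slurm_path() {
    local rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $PATH literal so it is expanded at login, not now.
    local line='export PATH=$PATH:/opt/slurm/bin'

    touch "$rc_file"

    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$line" "$rc_file"; then
        return 0
    fi

    # If the file does not end with a newline, add one first so the new
    # line is not glued onto the previous one.
    if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
        printf '\n' >> "$rc_file"
    fi

    printf '%s\n' "$line" >> "$rc_file"
}
```

Calling add_slurm_path with no argument edits ~/.bashrc; run source ~/.bashrc afterwards (or open a new shell) for the change to take effect.
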
Running nvcc --version reports that the command is not found.

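There is also no CUDA x.x directory under /usr/local. This is because HPC (High-Performance Computing) clusters and servers usually manage their software environments with "modules".
Run module avail and check whether the list contains an entry such as cuda or cudatoolkit.
Pick one version from that list and load it. For example, to load version 12.1: module load cuda/12.1
To unload a module, use module unload xx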
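
The check-then-load steps above can be wrapped in a small function. This is only a sketch: ensure_nvcc is a hypothetical name, and cuda/12.1 is the example version from this note, so substitute whatever module avail actually lists on your cluster.

```shell
#!/bin/bash
# ensure_nvcc (hypothetical helper): make sure nvcc is usable in the
# current shell, loading a CUDA module if necessary.
ensure_nvcc() {
    local cuda_module="${1:-cuda/12.1}"   # example version from this note

    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc is already available"
        return 0
    fi

    # "module" is normally a shell function provided by the cluster's
    # module system, so it may be missing in non-login shells.
    if ! type module >/dev/null 2>&1; then
        echo "the module command is not available in this shell" >&2
        return 1
    fi

    module load "$cuda_module" || return 1

    # Succeeds only if nvcc can be found after loading the module.
    command -v nvcc >/dev/null 2>&1
}
```

A typical call would be ensure_nvcc cuda/12.1 && nvcc --version. The function has to run in the current shell (not in a subshell) because module load changes the environment of the shell that calls it.
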

srun -p debug -n 4 --gres=gpu:1 --time=00:30:00 --pty bash

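First request a GPU with salloc -p xx -n 4 --gres=gpu:1 and wait for it to be allocated.
- Then run the script with srun python xx.py
- Or use srun --pty bash to open a bash shell on the compute node, then use nvidia-smi and python xx.py as usual
- The request I use most often, with the partition filled in, is salloc -p i64m1tga800ue -n 4 --gres=gpu:1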
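
To avoid retyping the interactive request, the srun command from this note can be wrapped in a function. A minimal sketch: gpu_shell is a hypothetical name, the defaults (partition debug, 30 minutes, 4 tasks, 1 GPU) are simply the values used above, and your cluster's partitions and limits may differ.

```shell
#!/bin/bash
# gpu_shell (hypothetical helper): open an interactive bash session on a
# compute node with one GPU, using the same flags as the srun line above.
gpu_shell() {
    local partition="${1:-debug}"        # -p: partition to submit to
    local time_limit="${2:-00:30:00}"    # --time: wall-clock limit HH:MM:SS

    srun -p "$partition" -n 4 --gres=gpu:1 --time="$time_limit" --pty bash
}
```

For example, gpu_shell i64m1tga800ue 02:00:00 would request the partition mentioned above for two hours. Exiting the bash session releases the allocation.
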

Use squeue -u <username> to find your job ID, then use scancel <jobid> to cancel the job directly.
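
The lookup and the cancel can be combined. The sketch below assumes you really do want to cancel every job you own; list_my_jobs and cancel_my_jobs are hypothetical helper names built on squeue and scancel.

```shell
#!/bin/bash
# list_my_jobs (hypothetical helper): print "jobid state name" for each
# job owned by the given user (defaults to the current user).
list_my_jobs() {
    local user="${1:-${USER:-$(id -un)}}"
    # -h drops the header line; -o selects job ID (%i), state (%T), name (%j)
    squeue -u "$user" -h -o "%i %T %j"
}

# cancel_my_jobs (hypothetical helper): cancel every job listed by
# list_my_jobs, echoing each one before it is cancelled.
cancel_my_jobs() {
    local user="${1:-${USER:-$(id -un)}}"
    list_my_jobs "$user" | while read -r job_id state name; do
        echo "cancelling job $job_id ($name, $state)"
        scancel "$job_id"
    done
}
```

Running list_my_jobs first is a cheap way to confirm what cancel_my_jobs would stop.
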

Create a batch script so that keepbusy.py runs automatically once the main program has finished.

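Contents of xx.sh:

#!/bin/bash
# Run the main program first, then keepbusy.py once it has finished
python 1.py
python keepbusy.py

Make the script executable and run it (mind the current directory):

chmod +x xx.sh
./xx.sh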
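
In the script above, keepbusy.py starts whether or not the main program succeeded, and the exit status of the main program is lost. The variant below is a sketch that records the status before moving on; run_then_keepbusy is a hypothetical name, and the file names are just the ones used in this note.

```shell
#!/bin/bash
# run_then_keepbusy (hypothetical helper): run the main Python script,
# report how it exited, then start the follow-up script.
run_then_keepbusy() {
    local main_script="${1:-1.py}"
    local followup_script="${2:-keepbusy.py}"
    local status=0

    # "|| status=$?" captures a failure without aborting the function.
    python "$main_script" || status=$?
    echo "$main_script exited with status $status"

    # The follow-up runs regardless of how the main script ended.
    python "$followup_script"

    # Hand the main script's status back to the caller.
    return "$status"
}

# Example call, matching the file names in this note:
# run_then_keepbusy 1.py keepbusy.py
```
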

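With multiple terminals, you would normally monitor the GPU with: watch -n 1 nvidia-smi. On the Slurm system you can use sgview -j <job ID> instead, which is equivalent to running nvidia-smi once.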

ffmpeg can also be installed inside the conda environment with conda install -c conda-forge ffmpeg, and it works just the same.

export TRANSFORMERS_CACHE=/hpc2hdd/home/yhuang489/.cache/huggingface/hub
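
As far as I know, recent versions of transformers warn that TRANSFORMERS_CACHE is deprecated in favour of HF_HOME, with the hub cache living in $HF_HOME/hub. The sketch below sets both so that old and new versions agree. It assumes $HOME is the home directory shown above; if it is not, set HF_HOME to the full path explicitly.

```shell
#!/bin/bash
# Point the Hugging Face cache at a directory under the home directory.
# Assumption: $HOME resolves to the home directory used in this note.
export HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"

# Keep the older variable consistent with the newer one.
export TRANSFORMERS_CACHE="$HF_HOME/hub"

# Create the directory up front so the first download does not fail on it.
mkdir -p "$TRANSFORMERS_CACHE"
```

These lines can go into ~/.bashrc alongside the PATH setting so that every job and interactive shell uses the same cache.
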
