Thanks to CSC (Tieteen tietotekniikan keskus Oy, the Finnish IT Center for Science), students in Finland have the chance to use a high-performance computer for free. But they change the system frequently, so by the time you read my post I am not sure this page will still work the same way.
This page is my workflow for using the GPU cluster on CSC [July 2, 2024].
After logging in, choose “Login node shell”.
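If you prefer a plain terminal over the web shell, you can also SSH straight to Mahti's login node (assuming you have registered an SSH key in MyCSC):

ssh username@mahti.csc.fi

Either way, the first thing to run is csc-workspaces: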
[username@mahti-login ~]$ csc-workspaces
You will see the three disk areas you have:
Disk area Capacity(used/max) Files(used/max) Project description
----------------------------------------------------------------------------------
Personal home folder
----------------------------------------------------------------------------------
/users/username 500k/10G 0.11k/100k
Project applications
----------------------------------------------------------------------------------
/projappl/project_xxxxxxx 17.59G/50G 0k/100k
Project scratch
----------------------------------------------------------------------------------
/scratch/project_xxxxxxx 54.47G/1T 4.01k/1000k
Usually /scratch/project_xxxxxxx is used for large files, and /projappl/project_xxxxxxx is used for your actual project code.
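If you want to double-check how much space a directory is actually using, plain Linux tools work too:

du -sh /scratch/project_xxxxxxx    # total size of your scratch area

Now move into the project area and clone the code: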
cd /projappl/project_xxxxxxx
git clone https://github.com/PKU-YuanGroup/Video-LLaVA
cd Video-LLaVA
Don’t just try to use Conda directly; clusters usually ship their own containerization tooling. In CSC’s case, they use Tykky. Check the installation docs, since they update them from time to time.
module load tykky
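After loading the module, the conda-containerize command becomes available; --help is a quick way to confirm that, and to see whether CSC has changed any of the options:

conda-containerize --help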
Make sure you add all the dependencies you need up front; otherwise, every time you want to add another dependency later it will take FOREVER, because the container has to be rebuilt.
Step 3.2.1: Create a Conda environment YAML file (videollava_env.yml)
# Tykky debug setting so the installation files are kept for inspection
export CW_DEBUG_KEEP_FILES="/projappl/project_xxxxxxx/Video-LLaVA"
vim videollava_env.yml
Step 3.2.2: Add the requirements based on the installation instructions Video-LLaVA provides
This is the most difficult part, since CSC uses its own pip parameters under the hood. Internally, Tykky runs pip roughly like this:
'[/MAHTI_TYKKY_xj2Awum/miniconda/envs/env1/bin/python', '-m', 'pip', 'install', '-U', '-r', '/MAHTI_TYKKY_xj2Awum/condaenv.5mlbkjw8.requirements.txt', '--exists-action=b']
name: videollava
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
    - pip==22.0
    - -e /projappl/project_xxxxxxx/Video-LLaVA
    - -e "/projappl/project_xxxxxxx/Video-LLaVA/[train]"
The two -e lines correspond to the pip install -e . and pip install -e ".[train]" steps from the Video-LLaVA README; we will deal with the rest of the requirements after the environment is installed.
Step 3.3.1: Create a folder to store the environment
mkdir videollava_env
Step 3.3.2: Run the containerization command
conda-containerize new --prefix ./videollava_env videollava_env.yml
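When the build finishes, Tykky tells you to add the installation's bin directory to your PATH. You can sanity-check the fresh environment on the login node (the path below assumes you created the folder inside the Video-LLaVA repo, as above):

export PATH="/projappl/project_xxxxxxx/Video-LLaVA/videollava_env/bin:$PATH"
python --version    # should report the containerized Python 3.10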
If you’re lucky enough, you can install everything from the Conda YAML file, but a lot of the time you will end up here, constantly installing leftover packages by hand.
Step 3.4.1: Write a post-install script for the packages we could not install through Conda
vim restpackages.sh
Step 3.4.2: Add the content
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
Step 3.4.3: Run the container update command
First load CUDA, because the flash-attn package needs nvcc to build. Then update the Tykky container:
module load cuda
conda-containerize update ./videollava_env --post-install restpackages.sh
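The update rebuilds the container, so it can take a while. Afterwards, you can check that flash-attn actually made it in (assuming the environment's bin directory is still on your PATH, as in the test after Step 3.3.2):

python -c "import flash_attn"    # exits silently if the install worked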
Step 3.5.1: Create a GPU batch job script
NOTE: make sure you are in the project directory (/projappl/project_xxxxxxx/Video-LLaVA) before creating the script.
vim run_videollava.sh
Step 3.5.2: Add the following content. Remember to change xxxxxxx to your own project number.
The #SBATCH lines are directives telling the Slurm scheduler which account, partition, and resources we are requesting:
#!/bin/bash
#SBATCH --job-name=videollava_test
#SBATCH --account=project_xxxxxxx
#SBATCH --partition=gpusmall
#SBATCH --gres=gpu:a100:1
#SBATCH --time=00:15:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
# load any modules your program needs here (the CSC template's "module load myprog/1.2.3" is only a placeholder)
export PATH="/projappl/project_xxxxxxx/Video-LLaVA/videollava_env/bin:$PATH"
# point all caches at scratch; the 10G home quota fills up fast with model weights
export HF_DATASETS_CACHE=/scratch/project_xxxxxxx/videollava_cache
export XDG_CACHE_HOME=/scratch/project_xxxxxxx/videollava_cache
export PIP_CACHE_DIR=/scratch/project_xxxxxxx/videollava_cache
export TRANSFORMERS_CACHE=/scratch/project_xxxxxxx/videollava_cache
export HF_HOME=/scratch/project_xxxxxxx/videollava_cache
CUDA_VISIBLE_DEVICES=0 python -m videollava.serve.cli --model-path "LanguageBind/Video-LLaVA-7B" --file "/scratch/project_xxxxxxx/BDD/samples-1k/videos/08008acf-4c0ddc8b.mov" --load-4bit
Step 3.5.3: Submit the job to the queue.
sbatch run_videollava.sh
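You can watch the job with standard Slurm commands (squeue works everywhere; seff is provided on CSC systems; 3578575 is the example job ID from my run):

squeue -u $USER    # list your pending and running jobs
seff 3578575       # resource-usage summary after the job finishes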
Once your job is running, attach to its output log (replace 3578575 with your own job ID):
tail -f slurm-3578575.out
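If something goes wrong, you can cancel the job with plain Slurm:

scancel 3578575    # cancel the job with this ID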