OSC Introduction and Guide¶
This documentation is taken from a variety of resources, with a number of links and external resources available at the end of this guide.
Remember, during our first week of the course we received an excellent overview of the Ohio Supercomputer Center (OSC) from Dr. Kate Cahill of OSC. Please find her lecture on Carmen as a refresher.
Below is an introduction to using OSC (using Dr. Cahill’s slides as a guide); it is not an all-encompassing reference. If you need that, please consult the links at the bottom.
Systems¶
| System | Pitzer | Pitzer | Owens |
|---|---|---|---|
| Year | 2020 | 2018 | 2016 |
| Theoretical Performance | 1300 TF | 1200 TF | 1600 TF |
| Nodes | 398 | 260 | 824 |
| CPU cores | 19104 | 10560 | 23392 |
| Total Memory | 97.3 TB | 70.6 TB | 120 TB |
| Memory/Core | > 5 GB | > 5 GB | > 5 GB |
Notice that Pitzer (its two phases combined), though newer, has fewer nodes than Owens (about 80%) but more CPU cores (about 127%). However, it has roughly 150% of the theoretical performance. Why would that be? Turns out the CPUs are faster and there are more cores per node.
Login Nodes¶
The login nodes are where you:
Submit jobs
Manage and edit files
Do very small-scale, interactive work
There is a 1 GB memory and 20 min CPU limit on login nodes. Why? That’s all you should need! Login nodes are not for doing large scale jobs!
Filesystems¶
There are a number of different filesystems available on OSC, each with a different purpose.
Home¶
Where to store your files. This is backed up daily. Use $HOME to reference it.
Project¶
Available to project PIs by request. Shared by all users on a project. Backed up daily.
Our project is PAS1573. It should be available at /fs/scratch/PAS1573.
Scratch¶
Where to store large input or output files. Scratch has faster I/O than the home or project directories. What does that mean? It means reading from and writing to the disk is faster. It’s like downloading a large file from the internet versus copying a file from one folder to another on your local machine. This is also temporary storage and is not backed up. If you really need to keep a copy of data, copy/keep it in $HOME.
$TMPDIR¶
This is local storage (on the compute node) available during job execution. There’s about 1 TB available on Owens.
Connecting to OSC¶
From a Mac, Linux or UNIX-based machine:
ssh userid@pitzer.osc.edu
Replace “pitzer” with “owens” for which system you wish to log into, and replace “userid” with your OSC userid, usually osuXXXX.
From Windows:
Grab a free SSH client, like PuTTY. You’ll need to set up the configuration to connect, which is slightly more “difficult” than opening a terminal window and typing “ssh.” Once configured, though, logging in is a simple selection of which machine to connect to.
OR you can login using OSC’s OnDemand.
If you need to connect to OSC and be able to use an X-based GUI, simply add the “-X” flag after ssh (note: the X is capitalized), e.g. ssh -X userid@pitzer.osc.edu.
Transferring Data to/from OSC¶
There are 3 main ways of transferring data to/from OSC, each with its own advantages and disadvantages.
OnDemand¶
Login to OSC’s OnDemand, navigate to “Files” and then drag-and-drop files (up to 5 GB) from your local computer to OSC.
SFTP/SCP¶
To connect to OSC:
sftp userid@sftp.osc.edu
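You can also use scp, which copies files in a single command instead of an interactive session. A sketch (the file names and remote paths here are illustrative, not from the course):

```shell
# Copy a local file to your OSC home directory (illustrative file name)
scp mydata.fastq userid@sftp.osc.edu:~/

# Copy a results file from OSC back to the current local directory
scp userid@sftp.osc.edu:~/results.tsv .
```

As with ssh, replace “userid” with your OSC userid.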
Cyberduck¶
You can also download and setup Cyberduck.
Batch Processing (i.e. submitting jobs)¶
“Where the real data processing gets done.”
To create a job and submit it to OSC, the following steps are usually done (we’ll break down the steps afterward):
Create a batch script
Submit the batch script as a job
Job gets queued
When resources become available (length of time you wait depends on how many resources you request), job starts/runs
Job finishes up, output is written
Creating a batch script¶
A minimum set of resources must be specified in order for OSC to accept and run a job. First of all, the job file (we’re going to call it “ourJob.sh”):
#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --account=PAS1573
#SBATCH --output=job.log
# Load modules
module load blast/2.8.0+
# Note that the job starts in the directory it was submitted from
# but you can use this variable to get the path to that directory
echo "Starting dir: $SLURM_SUBMIT_DIR"
# Copy input data to the job node's local disk space
# This allows for faster processing
cp query.fasta $TMPDIR
# Change directory to node's local disk
cd $TMPDIR
# Execute the command
blastn -query query.fasta -db nr -outfmt 6 -out results.tsv
# Copy the results back to the directory where the job was submitted
cp results.tsv $SLURM_SUBMIT_DIR
Submitting the job¶
sbatch ourJob.sh
Job gets queued¶
How to show the status:
sstat -j jobid
squeue -u username
Show expected start time:
squeue --start --jobs=jobid
How to delete a job:
scancel jobid
When resources become available, job runs¶
Walltime limits:
168 hours for serial (single node) jobs
96 hours for parallel (multiple node) jobs
Per-user limits:
128 currently running jobs
2040 processor cores in use
1000 jobs in batch system (running + queued)
Per-group limits:
192 concurrently running jobs
2040 processor cores in use
How long will you be waiting for a job to run? It depends on many factors, such as how many other users are using OSC, how many resources you request (nodes, cores, GPUs, software licenses), and whether you (or your group!) are already using (or requesting) a lot of resources. You can use the squeue command (see above) to get an estimate for when the job will start. Note that this is only an estimate at that particular point in time. Newer jobs with higher priority could be submitted, pushing your job lower in the queue and resulting in a later estimated start time.
Job finishes up, with results!¶
What does the output look like? Besides the output generated from the tool, the job will generate a couple more files.
A job output file is created as soon as the job starts. This file will be named according to what you specified in the #SBATCH --output= parameter. It will contain output printed to stdout, but it may also contain errors that were written to stderr; by default, that SLURM batch parameter joins both stdout and stderr together. If you want separate output files for each, you can additionally specify #SBATCH --error=.
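For example, the two directives for splitting the streams might look like this (the file names are illustrative):

```shell
#SBATCH --output=ourJob.out   # stdout only
#SBATCH --error=ourJob.err    # stderr only
```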
Other cases¶
Interactive batch jobs: good for jobs that can’t be run on the login nodes, or for debugging (when you need more than 1 hour).
sinteractive -A PAS1573 -t 12:00:00 -J jobname -M 178000 -n 48 -N 1
Keep in mind that it might not be practical to wait for a job when the system load is high. If Pitzer or Owens is at >95% capacity, expect to wait quite a few minutes.
You can also grab a debug node:
sinteractive -A PAS1573 -J jobname -M 178000 -n 48 -N 1 -p debug
Notice that the memory specification is a bit different from the SLURM batch parameters. You can alternatively use the salloc command, which uses more familiar parameters:
salloc -t 12:00:00 --nodes=1 --ntasks=48 --mem=177gb --account=PAS1573 srun --pty bash
Similarly, a debug node can be requested:
salloc -t 01:00:00 --nodes=1 --ntasks=48 --mem=177gb --account=PAS1573 --partition=debug srun --pty bash
Interactive jobs can be exited with the exit command or with the keystroke Ctrl-D (use the control key, not command, on Apple keyboards).
Modules¶
Modules modify environment variables like $PATH and $MANPATH. Loading a module updates your $PATH so the system can find the tool you need.
module load blast/2.8.0+
With blast loaded, you can now access blastn, blastp, blastx and all the other blast-family executables!
A few things to keep in mind with modules:
Don’t fully replace $PATH with a single folder, like “/fs/project/PAS0000/bin/” - doing so will prevent essential system commands from being found. Instead, if you need to update your $PATH:
export PATH=$HOME/bin:$PATH
Here, you’re extending the $PATH variable to include the binaries/executables in $HOME/bin.
A short list to some module commands:
module list
Will show you what modules are loaded. Upon login, you’ll have several already loaded.
module spider
Get a list of what modules are available. If you want to know details about a specific module, add the name of the module to the command:
module spider blast/2.8.0+
Load a module:
module load blast/2.8.0+
Unload a module:
module unload blast/2.8.0+
Load a different version of a module:
module swap intel intel/13.1.3.192
Links¶
OSC Getting started guide: a first stop for figuring out how to navigate OSC and its resources.
Kate Cahill’s guide to OSC: this is an excellent BEGINNERS guide to high-performance computing (HPC) on the Ohio Supercomputer Center (OSC). It provides a step-by-step guide to pretty much everything you need to know to get started. You’ll learn, through the course, what HPC is, how to connect, how to use the scheduler (i.e. how to submit jobs), how to use the cluster efficiently, and basic UNIX commands.
Job Arrays¶
Sometimes you need to run the same analysis on many different inputs, e.g. assemble different datasets using identical parameters or identify viruses in each of them. Size is another reason: while it may be simpler to concatenate the data (like combining contigs from multiple assemblies), the combined analysis might take too long to complete in any reasonable amount of time, or the data might be too large for the program to process (i.e. memory limits). Instead, you can use job arrays to split up the work and apply the same job to many different inputs.
To submit your job:
sbatch --array=1-25 job.sh
When your job runs, the job system creates a variable, SLURM_ARRAY_TASK_ID, that can be used to specify the files you want to run your analysis on. Let’s look at the job file.
#!/bin/bash
#SBATCH --job-name=analysis_name
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --account=PAS1573
#SBATCH --output=job.log
singularity_dir=/users/PAS1117/osu9664/eMicro-Apps/
# Each array task gets a unique SLURM_ARRAY_TASK_ID (here 1-25), used to
# select that task's input file and a matching output directory
singularity run $singularity_dir/iPHoP-1.1.0.sif predict --fa_file ${SLURM_ARRAY_TASK_ID} --db_dir iphop_db/Sept_2021_pub/ -o results/${SLURM_ARRAY_TASK_ID}
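If your input files are not literally named after the task IDs, one common pattern is to keep a list of input paths (one per line) and let each task read its own line. A minimal sketch, assuming a hypothetical filelist.txt with the demo file names created below:

```shell
#!/bin/bash
# Demo list of inputs, one path per line (hypothetical names)
printf 'sample1.fasta\nsample2.fasta\nsample3.fasta\n' > filelist.txt

# Inside a real array job, SLURM sets SLURM_ARRAY_TASK_ID for you;
# default to 1 here so the sketch also runs outside SLURM
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-1}

# Pick the Nth line of the list as this task's input file
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
echo "This task processes: ${INPUT}"
```

With sbatch --array=1-3, task 2 would read line 2 of the list; you would then pass "$INPUT" to your command instead of the raw task ID.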