OSC Introduction and Guide

This documentation is taken from a variety of resources, with a number of links and external resources available at the end of this guide.

Remember, during our first week of the course we received an excellent overview of the Ohio Supercomputer Center (OSC) from Dr. Kate Cahill of OSC. Her lecture is available on Carmen if you need a refresher.

Below is an intro to using OSC (using Dr. Cahill’s slides as a guide); it is not meant to be all-encompassing. If you need that, please consult the links at the bottom.

Systems

                          Pitzer     Pitzer     Owens
Year                      2020       2018       2016
Theoretical Performance   1300 TF    1200 TF    1600 TF
Nodes                     398        260        824
CPU cores                 19,104     10,560     23,392
Total Memory              97.3 TB    70.6 TB    120 TB
Memory/Core               > 5 GB     > 5 GB     > 5 GB

Notice that Pitzer, though newer, has fewer nodes than Owens (about 80%) but more CPU cores (about 120%). However, it has roughly 150% of the theoretical performance. Why would that be? Turns out the CPUs are faster and there are more cores per node.

Login Nodes

The login nodes are where you:

  1. Submit jobs

  2. Manage and edit files

  3. Do very small-scale, interactive work

There is a 1 GB memory limit and a 20-minute CPU-time limit on login nodes. Why? That’s all you should need! Login nodes are not for doing large-scale jobs!

Filesystems

There are a number of different filesystems available on OSC, each with a different purpose.

Home

Where to store your files. This is backed up daily. Use $HOME to reference it.
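A quick way to see where it points and jump there:

echo $HOME    # prints the path to your home directory
cd $HOME      # move there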

Project

Available to project PIs by request. Shared by all users on a project. Backed up daily.

Our project is PAS1573. It should be available at /fs/scratch/PAS1573.

Scratch

Where to store large input or output files. This has faster I/O than the home or project directories. What does that mean? It means reading and writing to the disk is faster. It’s like downloading a large file from the internet versus copying a file from one folder to another on your local machine. This is also temporary storage and is not backed up. If you really need to keep a copy of data, copy/keep it in $HOME.
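For example, to keep a copy of an output file from our project’s scratch space (the file name here is just a placeholder):

# Copy a (hypothetical) results file from scratch back to backed-up home space
cp /fs/scratch/PAS1573/my_results.tsv $HOME/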

$TMPDIR

This is local (relative to the compute node) storage during job execution. There’s about 1 TB available on Owens. Anything left in $TMPDIR disappears when the job ends, so copy results you need back before the job finishes.

Connecting to OSC

From a Mac, Linux or UNIX-based machine:

ssh userid@pitzer.osc.edu

Replace “pitzer” with “owens” depending on which system you wish to log into, and replace “userid” with your OSC userid, usually osuXXXX.
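For example, a (made-up) user osu1234 logging into Owens would type:

ssh osu1234@owens.osc.edu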

From Windows:

Grab a free SSH client, like PuTTY. You’ll need to set up the configuration to connect, which is slightly more “difficult” than opening a terminal window and typing “ssh.” However, once that’s done, logging in is a simple matter of selecting which machine to log into.

OR you can log in using OSC’s OnDemand.

If you need to connect to OSC and use an X-based GUI, simply append the “-X” flag after ssh (note: the X is capitalized).
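For example:

ssh -X userid@pitzer.osc.edu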

Transferring Data to/from OSC

There are 3 main ways of transferring data to/from OSC, each with its own advantages and disadvantages.

OnDemand

Log in to OSC’s OnDemand, navigate to “Files” and then drag-and-drop files (up to 5 GB) from your local computer to OSC.

SFTP/SCP

To connect to OSC:

sftp userid@sftp.osc.edu
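scp works from the same terminal as well. A quick sketch that copies a (hypothetical) file named reads.fastq into our project’s scratch space:

scp reads.fastq userid@pitzer.osc.edu:/fs/scratch/PAS1573/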

Cyberduck

You can also download and set up Cyberduck.

Batch Processing (i.e. submitting jobs)

“Where the real data processing gets done.”

To create a job and submit it to OSC, the following steps are usually done (we’ll break down the steps afterward):

  1. Create a batch script

  2. Submit the batch script as a job

  3. Job gets queued

  4. When resources become available (how long you wait depends on how many resources you request), the job starts/runs

  5. Job finishes up, output is written

Creating a batch script

A minimum set of resources must be specified in order for OSC to accept and run a job. First of all, the job file (we’re going to call it “ourJob.sh”):

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --account=PAS1573
#SBATCH --output=job.log

# Load modules
module load blast/2.8.0+

# Note that the job starts in the directory it was submitted from
# but you can use this variable to get the path to that directory
echo "Starting dir: $SLURM_SUBMIT_DIR"

# Copy input data to the job node's local disk space
# This allows for faster processing
cp query.fasta $TMPDIR

# Change directory to node's local disk
cd $TMPDIR

# Execute the command
blastn -query query.fasta -db nr -outfmt 6 -out results.tsv

# Copy the results back to the directory where the job was submitted
cp results.tsv $SLURM_SUBMIT_DIR

Submitting the job

sbatch ourJob.sh
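Command-line options given to sbatch take precedence over the matching #SBATCH lines in the script, so you can tweak a single run without editing the file. For example (the walltime here is just an illustration):

sbatch --time=2:00:00 ourJob.sh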

Job gets queued

How to show the status:

squeue -j jobid
squeue -u username

Show expected start time:

squeue --start --jobs=jobid

How to delete a job:

scancel jobid

When resources become available, job runs

Walltime limits:

  • 168 hours for serial (single node) jobs

  • 96 hours for parallel (multiple node) jobs

Per-user limits:

  • 128 currently running jobs

  • 2040 processor cores in use

  • 1000 jobs in batch system (running + queued)

Per-group limits:

  • 192 concurrently running jobs

  • 2040 processor cores in use

How long will you be waiting for a job to run? It depends on many factors, such as how many other users are using OSC, how many resources you requested (nodes, cores, GPUs, software licenses), and whether you (or your group!) are already using (or requesting) a lot of resources. But you can use the squeue command (see above) to get an estimate for when the job will start. Note that this is only an estimate at that particular point in time. Newer jobs could be submitted that have higher priority, pushing your job lower in the queue, which would result in a later estimated start time.

Job finishes up, with results!

What does the output look like? Besides the output generated by the tool itself, the job will generate one or two more files.

A job output file is created as soon as the job starts. This file will be named according to what you specified in the #SBATCH --output= parameter. It will contain output printed to stdout, but it may also contain errors written to stderr; by default, SLURM joins stdout and stderr together in this one file. If you want a separate file for each, you can additionally specify #SBATCH --error=.
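For example, a minimal sketch that writes separate files and uses %j (which SLURM replaces with the job ID) so that reruns don’t overwrite each other:

#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err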

Other cases

Interactive batch jobs: good for work that can’t be run on the login nodes, or for debugging (when you need more than 1 hour).

sinteractive -A PAS1573 -t 12:00:00 -J jobname -M 178000 -n 48 -N 1

Keep in mind that it might not be practical to wait for a job when the system load is high. If Pitzer or Owens is at >95% capacity, expect to wait quite a few minutes.

You can also grab a debug node:

sinteractive -A PAS1573 -J jobname -M 178000 -n 48 -N 1 -p debug

Notice that the memory specification is a bit different from the SLURM batch parameters. You can alternatively use the salloc command which uses more familiar parameters:

salloc -t 12:00:00 --nodes=1 --ntasks=48 --mem=177gb --account=PAS1573 srun --pty bash

Similarly, a debug node can be requested:

salloc -t 01:00:00 --nodes=1 --ntasks=48 --mem=177gb --account=PAS1573 --partition=debug srun --pty bash

Interactive jobs can be exited with the exit command or with the keystroke ctrl-d (that’s the control key, not command, on Apple keyboards).

Modules

Modules modify environment variables like $PATH and $MANPATH. Loading a module updates your $PATH, which lets the system find the tool you need.

module load blast/2.8.0+

With blast loaded, you can now access blastn, blastp, blastx and all the other blast-family executables!
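A quick way to confirm that the module really put the tool on your $PATH:

which blastn    # prints the path to the blastn executable provided by the module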

A few things to keep in mind with modules:

  • Don’t fully replace $PATH with a single folder, like “/fs/project/PAS0000/bin/” - it will cause essential system commands to no longer be found. Instead, if you need to update your $PATH:

export PATH=$HOME/bin:$PATH

Here, you’re extending the $PATH variable to include the binaries/executables in $HOME/bin.

A short list to some module commands:

module list

Will show you what modules are loaded. Upon login, you’ll have several already loaded.

module spider

Get a list of what modules are available. If you want to know details about a specific module, add the name of the module to the command:

module spider blast/2.8.0+

Load a module:

module load blast/2.8.0+

Unload a module:

module unload blast/2.8.0+

Load a different version of a module:

module swap intel intel/13.1.3.192

Job Arrays

Sometimes you need to run the same analysis on many different inputs, e.g. assemble different datasets using identical parameters, or identify viruses in many samples. Size is another reason: while it’s sometimes simpler to concatenate the data (like combining contigs from multiple assemblies), the combined analysis might take too long to complete in any reasonable amount of time, or the data might be too large for the program to handle (i.e. memory limits). Instead, you can split the work up and apply the same job script to many different inputs. That’s job arrays.

To submit your job:

sbatch --array=1-25 job.sh

When each array task runs, the batch system sets a variable, $SLURM_ARRAY_TASK_ID, that can be used to pick which input file that task runs on. Let’s look at the job file.

#!/bin/bash
#SBATCH --job-name=analysis_name
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --account=PAS1573
#SBATCH --output=job_%A_%a.log

singularity_dir=/users/PAS1117/osu9664/eMicro-Apps/

# The %A_%a in the output name above gives each array task its own log file
# (array job ID and task ID), so the tasks don't overwrite each other's output.
# This example assumes the input files are simply named after the task ID
# (1, 2, 3, ...); adjust it to match your own file naming.
singularity run $singularity_dir/iPHoP-1.1.0.sif predict --fa_file ${SLURM_ARRAY_TASK_ID} --db_dir iphop_db/Sept_2021_pub/ -o results/${SLURM_ARRAY_TASK_ID}
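If your input files aren’t named by number, a common pattern is to list them, one per line, in a text file and let each task pick its own line. A minimal sketch, assuming a (hypothetical) list file named inputs.txt:

# Pick the Nth line of inputs.txt, where N is this task's array index
input_file=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)

singularity run $singularity_dir/iPHoP-1.1.0.sif predict --fa_file ${input_file} --db_dir iphop_db/Sept_2021_pub/ -o results/${SLURM_ARRAY_TASK_ID}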