
Commit f2e6906

Merge pull request #188 from NYU-RTS/review_tutorial_intro_hpc
Review tutorial: Intro to HPC
2 parents b36660c + 6361b5e commit f2e6906

9 files changed: +137 -128 lines changed

docs/hpc/13_tutorial_intro_hpc/01_intro_hpc.mdx

Lines changed: 7 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -3,14 +3,18 @@
33
This tutorial is an introduction to using the Greene high-performance computing systems at NYU effectively. It is not intended to be an exhaustive course on parallel programming. The goal is to give new users of Greene an introduction and overview of the tools available and how to use them effectively.
44

55
:::warning[Prerequisites]
6-
Command line experience is necessary for this lesson. We recommend the participants to go through our [Introduction to Using the Shell on Greene](../12_tutorial_intro_shell_hpc/01_intro.mdx), if new to the command line (also known as terminal or shell).
6+
Command line experience is necessary for this lesson. If you are new to the command line (also known as the terminal or shell), we recommend first going through our [Introduction to Using the Shell on Greene](../12_tutorial_intro_shell_hpc/01_intro.mdx) tutorial.
77
:::
88

99
:::note[Objectives]
10-
By the end of this workshop, students will know how to:
10+
By the end of this tutorial, participants will know how to:
1111
- Identify problems a cluster can help solve.
1212
- Use the UNIX shell (also known as terminal or command line) to connect to a cluster.
1313
- Transfer files onto a cluster.
1414
- Submit and manage jobs on a cluster using a scheduler.
1515
- Observe the benefits and limitations of parallel execution.
16-
:::
16+
:::
17+
18+
:::info[Provenance]
19+
This tutorial was adapted for NYU's Greene HPC cluster from [HPC Carpentry](https://github.com/hpc-carpentry).
20+
:::

docs/hpc/13_tutorial_intro_hpc/02_why_use_cluster.mdx

Lines changed: 7 additions & 7 deletions
@@ -17,18 +17,18 @@ How does computing help you do your research? How could more computing help you
1717

1818
Frequently, research problems that use computing can outgrow the desktop or laptop computer where they started:
1919

20-
- A statistics student wants to do cross-validate their model. This involves running the model 1000 times — but each run takes an hour. Running on their laptop will take over a month!
20+
- A statistics student wants to cross-validate their model. This involves running the model 1000 times — but each run takes an hour. Running on their laptop will take over a month!
2121
- A genomics researcher has been using small datasets of sequence data, but soon will be receiving a new type of sequencing data that is 10 times as large. It’s already challenging to open the datasets on their computer — analyzing these larger datasets will probably crash it.
2222
- An engineer is using a fluid dynamics package that has an option to run in parallel. So far, they haven’t used this option on their desktop, but in going from 2D to 3D simulations, simulation time has more than tripled and it might be useful to take advantage of that feature.
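The cross-validation example above can be checked with a little shell arithmetic. This is a minimal sketch: the figure of 100 concurrent runs is a hypothetical value chosen for illustration, and the estimate ignores queue wait time and any startup overhead.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope timing for the cross-validation example.
runs=1000          # number of model runs (from the example above)
hours_per_run=1    # each run takes about an hour (from the example above)
workers=100        # hypothetical number of runs executing at the same time

# One run after another on a laptop.
serial_hours=$(( runs * hours_per_run ))
serial_days=$(( serial_hours / 24 ))   # integer division, rounds down

# Ideal case on a cluster: runs proceed in batches of $workers.
batches=$(( (runs + workers - 1) / workers ))   # ceiling division
parallel_hours=$(( batches * hours_per_run ))

echo "Serial: ${serial_hours} hours (about ${serial_days} days)"
echo "With ${workers} concurrent runs: about ${parallel_hours} hours"
```

The runs in this example are independent of each other, which is what makes the ideal-case estimate plausible; work that must proceed step by step would not speed up this way.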
2323

24-
In all these cases, what is needed is access to more computers than can be used at the same time. Luckily, large scale computing systems — shared computing resources with lots of computers — are available at many universities, labs, or through national networks. These resources usually have more central processing units(CPUs), CPUs that operate at higher speeds, more memory, more storage, and faster connections with other computer systems. They are frequently called “clusters”, “supercomputers” or resources for “high performance computing” or HPC. In this lesson, we will usually use the terminology of HPC and HPC cluster.
24+
In all these cases, what is needed is access to more computers that can be used at the same time. Luckily, large-scale computing systems (shared computing resources with lots of computers) are available at many universities, labs, or through national networks. These resources usually have more central processing units (CPUs), CPUs that operate at higher speeds, more memory, more storage, and faster connections with other computer systems. They are frequently called “clusters”, “supercomputers” or resources for “high performance computing” or HPC. In this lesson, we will usually use the terminology of HPC and HPC cluster, and we focus on NYU's HPC cluster, Greene.
2525

2626
Using a cluster often has the following advantages for researchers:
2727

28-
- **Speed**: With many more CPU cores, often with higher performance specs, than a typical laptop or desktop, HPC systems can offer significant speed up.
28+
- **Speed**: With many more CPU cores, often with higher performance specs than a typical laptop or desktop, HPC systems can offer significant speedup.
2929
- **Volume**: Many HPC systems have both the processing memory (RAM) and disk storage to handle very large amounts of data. Terabytes of RAM and petabytes of storage are available for research projects.
3030
- **Efficiency**: Many HPC systems operate a pool of resources that are drawn on by many users. In most cases when the pool is large and diverse enough the resources on the system are used almost constantly.
31-
- **Cost**: Bulk purchasing and government funding mean that the cost to the research community for using these systems in significantly less that it would be otherwise.
31+
- **Cost**: Bulk purchasing and government funding mean that the cost to the research community for using these systems is significantly less than it would be otherwise.
3232
- **Convenience**: Maybe your calculations just take a long time to run or are otherwise inconvenient to run on your personal computer. There’s no need to tie up your own computer for hours when you can use someone else’s instead.
3333

3434
This is how a large-scale compute system like a cluster can help solve problems like those listed at the start of the lesson.
@@ -47,13 +47,13 @@ Learning to use Bash or any other shell sometimes feels more like programming th
4747
## The rest of this lesson
4848
The only way to use these types of resources is by learning to use the command line. This introduction to HPC systems has two parts:
4949

50-
- We will learn to use the UNIX command line (also known as Bash).
50+
- We will learn to use the UNIX command line (also known as the Bash shell).
5151
- We will use our new Bash skills to connect to and operate a high-performance computing supercomputer.
5252

5353
The skills we learn here have other uses beyond just HPC: Bash and UNIX skills are used everywhere, be it for web development, running software, or operating servers. It’s become so essential that Microsoft now [ships it as part of Windows](https://apps.microsoft.com/detail/9nblggh4msv6?hl=en-US&gl=US)! Knowing how to use Bash and HPC systems will allow you to operate virtually any modern device. With all of this in mind, let’s connect to a cluster and get started!
5454

5555
:::tip[Key Points]
5656
- High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
57-
- These HPC systems can be used to do work that would either be impossible or much slower or smaller systems.
58-
- The standard method of interacting with such systems is via a command line interface such as Bash.
57+
- These HPC systems can be used to do work that would either be impossible or much slower on smaller systems.
58+
- The standard method of interacting with such systems is via a command line interface such as the Bash shell.
5959
:::

docs/hpc/13_tutorial_intro_hpc/03_exploring_remote_resources.mdx

Lines changed: 5 additions & 5 deletions
@@ -6,17 +6,17 @@ Questions
66
- Are all compute nodes alike?
77

88
Objectives
9-
- Survey system resources using nproc, free, and the queuing system
9+
- Survey system resources using `nproc`, `free`, and the queuing system
1010
- Compare & contrast resources on the local machine, login node, and worker nodes
11-
- Learn about the various filesystems on the cluster using df
11+
- Learn about the various filesystems on the cluster using `df`
1212
- Find out who else is logged in
1313
- Assess the number of idle and occupied nodes
1414
:::
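The survey commands named in the objectives can be tried together. The following is a minimal sketch, assuming a Linux machine such as a login node; the `getconf` fallback and the guard around `free` are additions for portability and are not part of the tutorial.

```bash
#!/usr/bin/env bash
# Quick survey of the machine this shell is running on.

# Number of CPU cores available; fall back to getconf where nproc is missing.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "CPU cores: ${cores}"

# Memory in human-readable units (free is Linux-specific, so check first).
if command -v free >/dev/null 2>&1; then
  free -h
fi

# Size and usage of the filesystem that holds the home directory.
df -h "${HOME:-/}"
```

Running the same lines on your laptop, on the login node, and inside a job on a compute node is one way to compare the three environments.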
1515

1616
## Look Around the Remote System
1717
If you have not already connected to Greene, please do so now:
1818
```bash
19-
[NetID@glogin-1 ~]$ ssh NetID@greene.hpc.nyu.edu
19+
[user@laptop ~]$ ssh NetID@greene.hpc.nyu.edu
2020
```
2121
Take a look at your home directory on the remote system:
2222
```bash
@@ -47,9 +47,9 @@ Most high-performance computing systems run the Linux operating system, which is
4747
afs bin@ dev gpfs lib@ media mnt opt root sbin@ share state tmp var
4848
archive boot etc home lib64@ misc net proc run scratch srv sys usr vast
4949
```
50-
The `/home/NetID` directory is the one where we generally want to keep all of our files. Other folders on a UNIX OS contain system files and change as you install new software or upgrade your OS.
50+
The `/home/NetID`, `/scratch/NetID`, `/archive/NetID`, and `/vast/NetID` directories are created for you by default and they are where you'll probably store most of your files, but there are other options as well. Please see the tip below and our [storage documentation](../03_storage/01_intro_and_data_management.mdx) for details about how these directories differ, as well as other storage options available. Other folders on a UNIX OS contain system files and change as you install new software or upgrade your OS.
5151

52-
:::tip[Using HPC filesystems]
52+
:::tip[Using the HPC filesystems]
5353
On Greene, you have a number of places where you can store your files. These differ in both the amount of space allocated and whether or not they are backed up.
5454

5555
- **Home** – data stored here is available throughout the HPC system, and often backed up periodically. Please note the limit on the number of files (inodes) which can get used up easily. Use the `myquota` command to ensure that you are not running out of inodes!
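To get a feel for how quickly inodes are consumed, it can help to count the entries under a directory. The sketch below is an illustration only: `count_entries` is a hypothetical helper, not a Greene tool, and it does not replace `myquota`, which remains the way to check your actual quota.

```bash
#!/usr/bin/env bash
# Each file, directory, or link normally occupies one inode, so counting
# entries under a directory approximates its inode usage.
count_entries() {
  # -mindepth 1 leaves out the top-level directory itself.
  find "$1" -mindepth 1 | wc -l | tr -d '[:space:]'
}

# Build a small throwaway directory tree to demonstrate.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
mkdir "$demo_dir/sub"
touch "$demo_dir/sub/c.txt"

entries=$(count_entries "$demo_dir")
echo "Entries under demo directory: ${entries}"

rm -rf "$demo_dir"
```

Pointing `count_entries` at a real directory, for example one holding many small files from an installed software environment, shows where most of an inode quota is going.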

docs/hpc/13_tutorial_intro_hpc/04_scheduler_fundamentals.mdx

Lines changed: 3 additions & 3 deletions
@@ -16,7 +16,7 @@ Objectives
1616
## Job Scheduler
1717
An HPC system might have thousands of nodes and thousands of users. How do we decide who gets what and when? How do we ensure that a task is run with the resources it needs? This job is handled by a special piece of software called the *scheduler*. On an HPC system, the scheduler manages which jobs run where and when.
1818

19-
The following illustration compares these tasks of a job scheduler to a waiter in a restaurant. If you can relate to an instance where you had to wait for a while in a queue to get in to a popular restaurant, then you may now understand why sometimes your job do not start instantly as in your laptop.
19+
The following illustration compares the tasks of a job scheduler to those of a waiter in a restaurant. If you have ever had to wait in a queue to get into a popular restaurant, then you may understand why your job might not start instantly as it does on your laptop.
2020

2121
![Job Scheduler to Waiter in Restaurant](./static/restaurant_queue_manager.svg)
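The waiting behaviour in the illustration can be mimicked with a toy model. This is a sketch under loud assumptions: the job names and node counts are hypothetical, and the greedy rule used here is not how the real scheduler on Greene decides; it only shows why a job that asks for more than is currently free has to wait.

```bash
#!/usr/bin/env bash
# Toy model of a job queue: NOT the real scheduler's algorithm.
free_nodes=4                            # hypothetical number of idle nodes
queue=("jobA:2" "jobB:3" "jobC:1")      # hypothetical jobs as name:nodes_requested
started=()
waiting=()

for entry in "${queue[@]}"; do
  name=${entry%%:*}    # text before the colon
  need=${entry##*:}    # text after the colon
  if (( need <= free_nodes )); then
    # Enough idle nodes: the job starts and takes them out of the pool.
    free_nodes=$(( free_nodes - need ))
    started+=("$name")
  else
    # Not enough idle nodes right now: the job stays in the queue.
    waiting+=("$name")
  fi
done

echo "started: ${started[*]}"
echo "waiting: ${waiting[*]}"
echo "idle nodes left: ${free_nodes}"
```

In this toy run the small job submitted last starts before the larger job submitted earlier, which hints at why requesting only the resources you need can shorten your wait.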
2222

@@ -184,7 +184,7 @@ JOBID USER ACCOUNT NAME ST REASON START_TIME TIME TIME_LEFT NODES CPUS
184184
```
185185

186186
:::tip[Cancelling multiple jobs]
187-
We can also all of our jobs at once using the `-u` option. This will delete all jobs for a specific user (in this case us). Note that you can only delete your own jobs.
187+
We can also cancel all of our jobs at once using the `-u` option. This will delete all jobs for a specific user (in this case us). Note that you can only delete your own jobs.
188188

189189
Try submitting multiple jobs and then cancelling them all with `scancel -u NetID`.
190190
:::
@@ -212,7 +212,7 @@ Sometimes, you will need a lot of resource for interactive use. Perhaps it’s o
212212
```bash
213213
srun --pty bash
214214
```
215-
You should be presented with a bash prompt. Note that the prompt will likely change to reflect your new location, in this case the compute node we are logged on. You can also verify this with `hostname`.
215+
You should be presented with a bash prompt. Note that the prompt will likely change to reflect your new location, in this case the compute node we are logged onto. You can also verify this with `hostname`.
216216

217217
:::tip[Creating remote graphics]
218218
To see graphical output inside your jobs, you need to use X11 forwarding. To connect with this feature enabled, use the `-Y` option when you log in with `ssh`, as in `ssh -Y username@host`.

docs/hpc/13_tutorial_intro_hpc/06_modules.mdx

Lines changed: 6 additions & 6 deletions
@@ -15,18 +15,18 @@ Before we start using individual software packages, however, we should understan
1515
- versioning
1616
- dependencies
1717

18-
Software incompatibility is a major headache for programmers. Sometimes the presence (or absence) of a software package will break others that depend on it. Two well known examples are Python and C compiler versions. Python 3 famously provides a `python` command that conflicts with that provided by Python 2. Software compiled against a newer version of the C libraries and then run on a machine that has older C libraries installed will result in a nasty `'GLIBCXX_3.4.20' not found` error.
18+
**Software incompatibility** is a major headache for programmers. Sometimes the presence (or absence) of a software package will break others that depend on it. Two well-known examples are Python and compiler runtime versions. Python 3 famously provides a `python` command that conflicts with that provided by Python 2. Software compiled against a newer version of the C++ standard library (libstdc++) and then run on a machine that has an older version installed will result in a nasty `'GLIBCXX_3.4.20' not found` error.
1919

20-
Software versioning is another common issue. A team might depend on a certain package version for their research project - if the software version was to change (for instance, if a package was updated), it might affect their results. Having access to multiple software versions allows a set of researchers to prevent software versioning issues from affecting their results.
20+
**Software versioning** is another common issue. A team might depend on a certain package version for their research project; if the software version were to change (for instance, if a package was updated), it might affect their results. Having access to multiple software versions allows a set of researchers to prevent software versioning issues from affecting their results.
2121

22-
Dependencies are where a particular software package (or even a particular version) depends on having access to another software package (or even a particular version of another software package). For example, the VASP materials science software may depend on having a particular version of the FFTW (Fastest Fourier Transform in the West) software library available for it to work.
22+
**Dependencies** arise when a particular software package (or even a particular version) depends on having access to another software package (or even a particular version of another software package). For example, the VASP materials science software may depend on having a particular version of the FFTW (Fastest Fourier Transform in the West) software library available for it to work.
2323

2424
## Environment Modules
25-
Environment modules are the solution to these problems. A *module* is a self-contained description of a software package – it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.
25+
**Environment modules** are the solution to these problems. A *module* is a self-contained description of a software package – it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.
2626

2727
There are a number of different environment module implementations commonly used on HPC systems: the two most common are *TCL modules* and *Lmod*. Both of these use similar syntax and the concepts are the same so learning to use one will allow you to use whichever is installed on the system you are using. In both implementations the `module` command is used to interact with environment modules. An additional subcommand is usually added to the command to specify what you want to do. For a list of subcommands you can use `module -h` or `module help`. As for all commands, you can access the full help on the *man* pages with `man module`.
2828

29-
On login you may start out with a default set of modules loaded or you may start out with an empty environment; this depends on the setup of the system you are using.
29+
When you log in to Greene, you start out with an empty environment, with no modules loaded.
3030

3131
# Listing Available Modules
3232
To see available software modules, use `module avail`:
@@ -57,7 +57,7 @@ No modules loaded
5757
```
5858

5959
## Loading and Unloading Software
60-
To load a software module, use `module load`. In this example we will use R.
60+
To load a software module, use `module load`. In this example we will use `R`.
6161

6262
Initially, R is not loaded. We can test this by using the `which` command. `which` looks for programs the same way that Bash does, so we can use it to tell us where a particular piece of software is stored.
6363
```bash
