You can find a set of general introductory Unix tutorials here.
The vi editor is a standard Unix text editor. On Linux, vi is represented by vim (for Vi IMproved). A tutorial for that is here.
Writing job scripts for the PBS system requires some shell scripting. On Linux, the sh shell is bash. An introduction to writing scripts in bash (with more information than is usually needed for a PBS script) is here.
First we must set up the Windows environment to connect to a Unix system. Open SecureCRT. If using your own computer, select “Connect” from the file menu or the icon. Within the connect window, click on the “New Connection” icon to create a connection to the desired host. The connection type must be ssh2, as shown in the illustration.

X is the windowing system used by Unix operating systems. If you plan to run an X server such as Hummingbird eXceed or Cygwin/xorg so as to be able to use graphical programs from the frontend, enable X11 forwarding while setting up the new connection. Click on X11 in the Category sidebar; this will bring up a dialog. Check the box to permit ssh X11 forwarding. This allows X-based GUI applications to be displayed directly to your Windows system through the X server.

For further information about using Unix from Windows, including some important notes about file editing, please read http://www.itc.virginia.edu/research/unix/windows-tools
For this example we will go through Blue.unix. Log in to blue.unix.virginia.edu using SecureCRT.
We are now ready to log in to a cluster. For this tutorial, we will use the example hostname aspen; use the appropriate name for the cluster you wish to access. If using your own computer on Grounds, you may set up a session directly to the cluster; when it is set up, click “Connect” and choose the host from the list. Whether using Blue or the direct host, you will be presented with a dialog requesting your login ID and your password. If you have never logged on to an ITC research system, your password is your blue.unix password. Once those are entered, clicking “Connect” will start an ssh session on the cluster.
The next illustration shows the connection window within SecureCRT.

Throughout this document, we will use an example user ID of “mst3k.” You should use your actual login ID on the research systems. We shall also use the convention that characters you should type are on a separate line and are preceded by a $ symbol; do not type the $ symbol (which is an example of a prompt). The numbers at the right of these lines are for reference only; do not type them.
If using Blue, when you have logged on to Blue you should type
ssh +x -l mst3k aspen.itc.virginia.edu
Once you have logged on, you should be in your home directory on Aspen, in your default shell. In Unix, a shell is a program through which you can interact directly with the operating system – no icons required! However, using a command-line interface (CLI) can be confusing at first for those who are accustomed to using only GUIs. Once a CLI is mastered, however, it is far more efficient than most GUIs.
ITC's default shell is the Korn shell (ksh). In order to use the batch systems correctly, you may first need to edit your .profile file. Instructions can be found at
http://www.itc.virginia.edu/research/linux-cluster/aspen/tutor.html#Configuring Your Account
Many choices exist for editing files on the Unix systems. The standard Unix editors are command-oriented screen editors; that is, the file is presented as text on a screen and the keyboard is used to move around and make changes. The standard Unix editor is called vi (for “visual”), but emacs has a large and intensely loyal following. Most Unix users learn one or the other; some use alternatives such as nano, a simple editor (a clone of pico) that displays a menu of its commands. However, for beginners a GUI editor is easiest. Of the GUI editors, nedit is widely available and easy to use.
Nedit is an X application. The X Windowing System is used on all Unix systems; the program that actually displays the pixels is the X server, while the program that wishes to display is called the X client. Thus the X server runs on the local machine, while the client runs on the remote system. One of the more popular commercial X servers for Windows is eXceed, by Hummingbird. If you have eXceed on your system, start it now. Do not start an xterm (a terminal); simply start eXceed and allow it to set up a toolbar. It will run in the background and display clients as requested.
Not all users wish to pay for eXceed. A free alternative is Cygwin/xorg. It is not supported by ITC, but for most users it works well. If you install Cygwin from www.cygwin.com, be sure to include xorg and ssh; you will need both. When using Cygwin, clicking on the Cygwin icon opens a local shell (notice that bash is the default, not ksh, though ksh is available for Cygwin). The command startx starts xorg, which by default appears as a bash shell on the local Windows machine. To log in to a remote Unix system with xorg, simply start an xterm and issue the command
ssh -X -l mst3k aspen.itc.virginia.edu
where the text after “-l” is your login on the remote system.
XLiveCD is a simplified version of Cygwin/xorg which can run from the CD drive; it was put together by Indiana University. It is rather slow to start but once loaded into memory it works well. The creators recommend that it not be installed on a system with Cygwin already present, however.
An X server is not required to use the system, but without one you will not be able to use X-based applications, such as the debugger TotalView.
If you are not running an X server locally, use a console-based editor. If you do not know vi or emacs, we recommend nano, which is simple and mostly self-explanatory. Regardless of your choice of editor, we recommend that you edit on the Unix system due to certain incompatibilities between Windows and Unix.
Log in to the Aspen cluster, either via SecureCRT on Windows or by the ssh client used on Linux (or other Unix) or Mac OSX. If you are running an X server, type at the prompt:
| $printenv DISPLAY | 001 |
and confirm that you see a response similar to
$localhost:17.0
The numbers following the colon will vary. If using the GUI editor, you should now be ready to start nedit. Type
| $nedit | 002 |
A new window should appear. Open your .profile file and make any needed changes as indicated on the Web site referenced above, Configuring Your Account.
If you are not using a GUI editor, type the name of your editor (nano in the example) followed by the name of the file.
| $nano .profile | 003 |
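The choice between the two editors can also be made by the shell itself. The following is only a sketch, assuming that a non-empty DISPLAY variable means an X server is reachable; the variable name editor is made up for this illustration.

```shell
# Choose an editor depending on whether an X display is available.
if [ -n "${DISPLAY:-}" ]; then
    editor=nedit      # GUI editor; needs a running X server
else
    editor=nano       # console editor; works in any terminal
fi
echo "Suggested command: $editor .profile"
```

A non-empty DISPLAY does not guarantee that the X server is actually running, so treat the result as a hint rather than a test.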
This screenshot shows the nedit window on a Windows desktop.
Type
| $man ls | 004 |
PAGER=more
This should work on all ITC Unix systems. It is important to be able to run the man (for manual) command, since it is the quickest source of information about usage and options for most Unix commands.
When you have completed editing your files, exit the editor.
In order to continue in the current session, you must force any changes to .profile or .variables.ksh to take effect. This is accomplished by sourcing the .profile file:
| . .profile | 005 |
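The difference between sourcing a file and running it as a separate script can be seen in a small experiment. This is a sketch only; the file name demo_profile and the variable TUTORIAL_VAR are invented for the illustration and are not part of the ITC setup.

```shell
# Write a tiny stand-in for .profile (hypothetical file and variable names).
tmpdir=$(mktemp -d)
echo 'TUTORIAL_VAR=changed' > "$tmpdir/demo_profile"

TUTORIAL_VAR=original

# Running the file as a separate process does not affect the current shell.
sh "$tmpdir/demo_profile"
after_run=$TUTORIAL_VAR

# Sourcing it with the dot command runs it inside the current shell.
. "$tmpdir/demo_profile"
after_source=$TUTORIAL_VAR

echo "after running:  $after_run"
echo "after sourcing: $after_source"
rm -r "$tmpdir"
```

Only the sourced version changes the variable in the session you are typing in, which is why edits to .profile need the dot command (or a fresh login) to take effect.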
We are now ready to begin working with the Aspen cluster. Make sure you are in your home directory
| $pwd | 006 |
Like many Unix commands, it is three letters long—it stands for “print working directory.” You should see a response similar to
$/home/mst3k
Let's look at the files in your home directory. From your shell, issue the command
| $ls -a | 007 |
This means “list all files, including hidden files.” In Unix, hidden files begin with a period and for this reason they are often called “dotfiles.”
Some system administrators set users' profiles so that the ls command is aliased to ls -a (possibly with other options to ls added). In such a case, dotfiles will always appear when files are listed.
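To see the effect of the -a option without touching your own files, you can try it in a scratch directory. The sketch below assumes that mktemp is available; the file names are made up.

```shell
# Create a scratch directory with one ordinary file and one dotfile.
scratch=$(mktemp -d)
touch "$scratch/notes.txt" "$scratch/.hidden"

plain=$(ls "$scratch")        # dotfiles are omitted
all=$(ls -a "$scratch")       # dotfiles, plus . and .., are included

echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"
rm -r "$scratch"
```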
Copy the tutorial files to your home directory. The files are in a bundle called a “tarfile.” Tar (tape archive) is the standard archival format for Unix.
| $cp /export/rescomp/tutorial.tar . | 008 |
The period at the end of the line means the current directory, and it is required. This command means: copy the file /export/rescomp/tutorial.tar into the current working directory.
Now unpack the tarfile; it will create a directory called “tutorial.”
| $tar xf tutorial.tar | 009 |
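If you would like to see what a tarfile contains before unpacking it, the t option lists the contents. The following sketch builds its own small archive so that it can be tried anywhere; the names demo and demo.tar are invented and have nothing to do with the tutorial bundle.

```shell
# Build a small stand-in archive (the names here are made up for illustration).
work=$(mktemp -d)
mkdir "$work/demo"
echo "hello" > "$work/demo/readme.txt"
( cd "$work" && tar cf demo.tar demo && rm -r demo )

# List the contents without unpacking (t = table of contents, f = archive file).
listing=$(cd "$work" && tar tf demo.tar)
echo "$listing"

# Unpack it (x = extract); the directory demo is recreated.
( cd "$work" && tar xf demo.tar )
restored=$(cat "$work/demo/readme.txt")
echo "$restored"
rm -r "$work"
```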
Change directory into the tutorial directory.
| $cd tutorial | 010 |
You should see several files. They consist of a short Fortran code, several example makefiles, and some example PBS scripts.
We are going to build and run this simple code. The first task is to choose a compiler. Documentation for ITC-supported compilers can be found at http://www.itc.virginia.edu/research/compilers.html. If you are not familiar with using compilers under Unix, please read this documentation before you begin this section.
We shall start with the Intel compiler. The clusters use the module system to set up various software packages. (See http://www.itc.virginia.edu/research/modules/home.html for information about using modules.) In order to use the Intel compiler, type
| $module add ifort | 011 |
A path is a list of directories that the system will search when looking for executables. If you invoke a command or other executable, whether directly (from the command line) or indirectly (e.g. from a script), and it is not in one of the directories in your path, the system will not be able to find the command and an error will result. A default path is set by the system when you log in. You can change your path by loading and unloading modules. To check what your current path is, type
| $printenv PATH | 012 |
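The path is easier to read with one directory per line, and you can ask the shell directly whether it is able to find a given command. This is a sketch; no_such_command_xyz is a made-up name chosen so that the lookup fails.

```shell
# Print each directory in the search path on its own line.
echo "$PATH" | tr ':' '\n'

# A command is found only if one of those directories contains it.
# "no_such_command_xyz" is a made-up name that should not exist anywhere.
for cmd in ls no_such_command_xyz; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd: found at $(command -v "$cmd")"
    else
        echo "$cmd: not found in PATH"
    fi
done
```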
We can confirm that the system knows where the compiler is located by typing the command
| $which ifort | 013 |
The response should be similar to
$/opt/intel/fc/bin/ifort
Before the code can be run, it must be compiled. The “make” program greatly simplifies management of codes. Documentation on this command can be found starting from http://www.itc.virginia.edu/research/make.html
Make uses instruction files called description files, usually known colloquially as just “makefiles.” Makefiles can become quite complicated and their syntax is quirky, but a basic script for the compilation of a typical code is not difficult to understand.
Start your chosen editor again. If using nedit, this time use the command
| $nedit & | 014 |
The ampersand tells the system to run nedit in the background. You should see a prompt in your SecureCRT shell, which means you can run commands while editing files.
Open the file makefile1. The first line is
prog: main.o sub1.o sub2.o
In this line, “prog” is the target, the file to be created. The names after the colon are the dependencies or prerequisites of this target; those are the files that are required in order to build the target. However, we have not told make how to build the target. This is accomplished by specifying a rule. Make contains a set of built-in rules for common types of target, but we will specify the particular rule we want.
Immediately after the first line, add the line
<tab>ifort -o prog main.o sub1.o sub2.o
The tab is critical; one of make's idiosyncrasies is that every command line in a rule must begin with a tab character.
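Because a tab and a run of spaces look the same in most editors, it can help to check the file from the shell. The sketch below writes the two lines into a scratch file with printf, so that the tab is explicit, and then counts the lines that start with a tab and with a space. The file name makefile.demo is used only for this illustration.

```shell
# Write the two lines with printf so that the tab (\t) is explicit.
# "makefile.demo" is a scratch name used only for this illustration.
demo=$(mktemp -d)
printf 'prog: main.o sub1.o sub2.o\n' > "$demo/makefile.demo"
printf '\tifort -o prog main.o sub1.o sub2.o\n' >> "$demo/makefile.demo"

# Count the lines that begin with a real tab character.
tab=$(printf '\t')
tab_lines=$(grep -c "^$tab" "$demo/makefile.demo" || true)
# Count the lines that begin with a space, which make would not accept as a command line.
space_lines=$(grep -c '^ ' "$demo/makefile.demo" || true)

echo "lines starting with a tab:   $tab_lines"
echo "lines starting with a space: $space_lines"
rm -r "$demo"
```

If the second count is not zero for your own makefile, your editor has probably converted the tab to spaces.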
Save the changes as makefile and close the file. Now in your shell type
| $make prog | 015 |
Compilation should begin. When it completes, you should have new files sub1.o, sub2.o, main.o, and prog. Let us look at the permissions of these files
| $ls -l | 016 |
Among the files you should see something like
-rwxr-xr-x 1 mst3k users 433032 Dec 1 16:57 prog
The leftmost column gives the permissions on the file. The leading dash is usual for ordinary files. The next three letters are the permissions for the file owner; in this case the owner has read, write, and execute permissions. The next three indicate group permissions; members of the group “users” can read and execute this file. The final three are for all users; in this case “world” permissions are the same as for group. The x permission must be present, at least for the owner, in order for the file to be executed. The binary files generated by compilers normally receive execution permission automatically.
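Files that you create yourself, such as shell scripts, do not receive execute permission automatically; it is set with the chmod command. The following is a sketch using a scratch script with the invented name hello.sh.

```shell
# Create a scratch script (hypothetical name) and set known permissions.
pdir=$(mktemp -d)
printf '#!/bin/sh\necho "it runs"\n' > "$pdir/hello.sh"

chmod 644 "$pdir/hello.sh"                 # rw-r--r-- : no execute permission
before=$(ls -l "$pdir/hello.sh" | cut -c1-10)

chmod 755 "$pdir/hello.sh"                 # rwxr-xr-x : same as prog above
after=$(ls -l "$pdir/hello.sh" | cut -c1-10)

echo "before: $before"
echo "after:  $after"
if [ -x "$pdir/hello.sh" ]; then
    echo "hello.sh is now executable"
fi
rm -r "$pdir"
```

The three digits given to chmod correspond to owner, group, and world; 7 is read, write, and execute, 5 is read and execute, and 4 is read only.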
Make manages recompilation automatically. Suppose you later decide you want to change one of the functions to be evaluated from cosine to sine. We will make this change in sub2.f90. Open this file in your editor, find the string cos and replace it with sin (no “e” on the end). Close the file and type
| $make | 017 |
If no target is specified, make builds the first target it finds in the makefile. You should find that make recompiles only the file that changed; then it automatically relinks to build a new executable.
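Make decides what to rebuild by comparing modification times: a target is out of date if any of its prerequisites is newer than it. The sketch below imitates that comparison in the shell with the -nt (“newer than”) test, which ksh and bash both provide. The files are empty stand-ins and the timestamps are arbitrary.

```shell
# Empty stand-ins for a source file and its object file.
mdir=$(mktemp -d)
touch -t 202401010000 "$mdir/sub2.f90"   # source last changed at 00:00
touch -t 202401010100 "$mdir/sub2.o"     # object built later, at 01:00

# The object is newer than the source: nothing to do.
if [ "$mdir/sub2.f90" -nt "$mdir/sub2.o" ]; then first=rebuild; else first=up-to-date; fi

# Editing the source gives it a newer timestamp than the object.
touch -t 202401010200 "$mdir/sub2.f90"
if [ "$mdir/sub2.f90" -nt "$mdir/sub2.o" ]; then second=rebuild; else second=up-to-date; fi

echo "before edit: $first"
echo "after edit:  $second"
rm -r "$mdir"
```

This is only the core idea; make applies the same test along the whole chain of dependencies, which is why the executable is relinked after the object file is rebuilt.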
This makefile is the simplest possible. A feature of make that adds flexibility is its ability to use macros, which are definitions of variables for string substitution. Suppose you could run this code on either of two systems; on one the preferred compiler is the Intel compiler ifort, whereas on the other the compiler is the Portland Group compiler pgf90. You can use a variable for the compiler and, if necessary, for the compiler options, so that it will be simple to change compilers. The file makefile2 illustrates this. Open makefile2 in your editor. At the top are the variable declarations. The first is for the Fortran compiler. Later we see a line
LDR = $(FC)
It is generally preferable to use the compiler as the loader to link the executable; by introducing the variable we can automatically change the linker to correspond to the selected compiler.
We have decided to try a different compiler from the one we have been using. First check which compiler modules are loaded:
| $module list | 018 |
You should see
Currently Loaded Modulefiles:
ifort/8.1
To see which compilers might be selected, get a listing of all available modules:
| $module avail | 019 |
One of the modules is pgi. To avoid conflicts, remove the Intel module:
| $module unload ifort | 020 |
and load the pgi module
| $module load pgi | 021 |
Check that this has been accomplished correctly:
| $which ifort | 022 |
| $which pgf90 | 023 |
Open makefile2 in your editor again, or go back to your editor window if using a GUI editor. Choose pgf90 by deleting the hash mark (#) at the beginning of its line. Leave the other line alone. Save your changes as makefile (overwrite your old makefile). Type
| $rm *.o prog | 024 |
This will remove the executable and all the object files so that the code will be recompiled. It is extremely important to do this if you change compilers, since different compilers use different formats for the object files. Now type
| $make | 025 |
and the executable should be recreated.
If you wish, go back to makefile (not makefile2), replace the hash mark in your compiler line and remove it from the other compiler, then repeat the above steps to build your executable.
Another very handy feature of make is suffix rules. These are symbols that stand for common types of target or rule. Close makefile2 (or makefile) in your editor and open the file makefile3. As in makefile2, first several variables are declared. Next comes a suffix rule: we specify to make how all files with the suffix .f90 are to be handled. The SUFFIXES line defines the file types to be handled. Following this is a line that tells make to produce a .o file from a .f90 file. (Make generally contains this relationship as part of its set of default rules, but it can be specified as part of a suffix rule.) After this line is the rule. In this case it states how all .f90 files are to be compiled—if any such file needs special treatment, a separate rule must be written for it. Like all rules, the suffix rule begins with a tab character. Suffix rules can include special symbols; the most common one, used here, is $< which stands for “current prerequisite.”
Makefile3 also contains two handy targets. “Clean” is fairly standard and is included in many makefiles; in this example it removes only object files, but sometimes it also deletes executables and other ancillary files. Our makefile3 contains another target, “clobber,” to remove both the object files and the executable.
Copy makefile3 to makefile:
| $cp makefile3 makefile | 026 |
Delete all object and binary files:
| $make clobber | 027 |
Recreate the binary:
| $make | 028 |
Now suppose we decide we need to debug the code. Debugging requires that the -g option be passed to the compiler. Edit makefile appropriately to make this change.
Adding the debug flag means that the object files and the binary are affected. Check the size of the executable:
| $ls -l prog | 029 |
Now recompile with the debug flag turned on. Remember to remove the object files and binary (how can this be accomplished?) before building. Check the size of the new binary:
| $ls -l prog | 030 |
How do the sizes compare? The debugging-enabled prog should be larger.
Now that we can create executables, we are ready to run jobs. Editing, compilation, and other short jobs can be performed on the Aspen frontend, but all actual computing must be carried out on the nodes, through the PBS resource manager. In order to use the cluster, a job script is prepared on the frontend and submitted to PBS.
Examine the file tutor.sh
| $more tutor.sh | 031 |
The first line specifies that it is a shell script. The next line is a directive to PBS. It will be ignored by the shell because it begins with a hash symbol, but PBS will use it to set the parameters of the job. The first directive requests one node and one processor on the node (ppn stands for “processors per node”). PBS directives should all be located at the beginning of the script.
When the job is submitted, PBS reads the directives and places the job into the queue. When a processor becomes available, PBS requests that the assigned processor execute the job script. Any line not beginning with a hash mark is interpreted by the shell on the job's node. Most of the commands used on the command line can be inserted into a shell script. In our example, the lines following the PBS directives tell the shell to print the name of the host and the date and time at which the job begins. Next we change into the directory from which the job was submitted; this is represented in PBS with the environment variable PBS_O_WORKDIR. (PBS sets a number of these variables when the job is submitted.)
Next we execute our program. The leading ./ indicates that the executable resides in the current directory; it is a good idea to include this because sometimes the current directory is not in the user's path.
Finally, when the program has completed we print the ending time. PBS will return the results to us.
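Putting this description together, a job script of this kind might look like the sketch below. It is not the actual tutor.sh; the messages are invented, and the guard around ./prog is added only so that the sketch can be tried outside PBS. The nodes/ppn directive, PBS_O_WORKDIR, and the executable name prog come from the text above.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=1

# Record where and when the job started.
echo "Running on host $(hostname)"
echo "Started at $(date)"

# Move to the directory from which the job was submitted.
# Outside PBS the variable is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-.}"

# Run the program built earlier, if it is present in this directory.
if [ -x ./prog ]; then
    ./prog
else
    echo "./prog not found or not executable" >&2
fi

echo "Finished at $(date)"
```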
We submit a job to be run with the command qsub. Submit this job to the queue:
| $qsub tutor.sh | 032 |
PBS will return a number followed by an identifier for the frontend; this is the job id. Make a note of the number because it is how you will keep track of your job(s). Check on the job's progress:
| $qstat -a | 033 |
This will show you a list of all jobs in the queue, including those waiting for the requested number of processors to become available. If you wish to zero in on your job, type
| $qstat -a | grep <jobno> | 034 |
where <jobno> is the number that you received when the job was submitted. The vertical bar is a pipe; it tells the shell to take the output of the previous command (qstat -a) and send it as input to the command that follows the pipe. Grep, which stands for something similar to “get regular expression and print,” is a command that allows you to find strings in a group of files. For more information about general Unix usage, see http://www.itc.virginia.edu/research/unixbasics.html or the many Unix references available online and in book form.
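You can try the pipe and grep without a running job. In this sketch the three lines of text merely stand in for qstat output; their layout, the user names, and the job number 12345 are all made up.

```shell
# Three made-up lines standing in for qstat output; 12345 is a hypothetical job number.
listing='12344.lc0  alice  tutor.sh  R
12345.lc0  mst3k  tutor.sh  Q
12346.lc0  bob    run.sh    Q'

# The pipe feeds the text to grep, which prints only the matching line.
match=$(printf '%s\n' "$listing" | grep 12345)
echo "$match"
```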
If you want all available information about your job, type
| $qstat -f <jobno> | 035 |
This will return a great deal of output, so you may wish to scroll through it:
| $qstat -f <jobno> | more | 036 |
PBS provides many options so a job script can specify quite a few directives. Several examples are given at http://www.itc.virginia.edu/research/linux-cluster/aspen/tutor.html#PBS%20Job%20Command%20File
Go to this page and look at the first example; it contains many of the most common directives and provides an explanation of each one. Particularly useful is the directive for walltime; if it is absent the default is used, which on the clusters is 48 hours. Shorter jobs have higher priority and we know our job is very short. If your original job is still running, delete it from the queue:
| $qdel <jobno> | 037 |
Copy the tutor.sh file to a new job script
| $cp tutor.sh tutor2.sh | 038 |
Edit tutor2.sh, adding a line to request only an hour of walltime. Submit this job:
| $qsub tutor2.sh | 039 |
Check on your jobs' status:
| $qstat -a | 040 |
If you have trouble finding your jobs, use the -u option to qstat:
| $qstat -u mst3k | 041 |
In Unix, every executable has associated with it two files, standard output (stdout) and standard error (stderr). We can assign names to these files with PBS directives
#PBS -o outname
#PBS -e errname
If we do not specify, PBS will return them with default names given by
<job script name>.o<jobno>
<job script name>.e<jobno>
for standard out and standard error respectively. This pattern is sometimes useful to keep the output of different jobs separate.
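The shell offers the same separation of the two streams on the command line through redirection. The sketch below uses ordinary shell redirection, not PBS; the file names outname and errname simply echo the directives above, and the messages are invented.

```shell
# A command group that writes one line to each stream.
odir=$(mktemp -d)
{ echo "result: 42"; echo "warning: something odd" >&2; } > "$odir/outname" 2> "$odir/errname"

out_text=$(cat "$odir/outname")
err_text=$(cat "$odir/errname")
echo "stdout file contains: $out_text"
echo "stderr file contains: $err_text"
rm -r "$odir"
```

Here > captures standard output and 2> captures standard error, so each message ends up in its own file.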
Check your directory
| $ls | 042 |
If your jobs have finished, you should have new files: standard output, standard error, and a file called output. Look at your standard output and standard error files. If the job was successful, standard error should be empty and standard output should contain the information about the node and the starting and ending times. The file “output” contains the actual results from the run of the program.
Is the file “output” present? If not, what does the standard error file say? It appears that the colleague who gave you this program must have somewhere specified for some file an explicit path that is incorrect in your environment. Type
| $grep mst3k *f90 *h input | 41 |
This will search for all occurrences of the string “mst3k” in all of the listed text files. (Avoid using grep on object or binary files, as unexpected results can occur.) You should get a response similar to
input:restart_file = '/home/mst3k/tutorial/restart'
This tells you that the line starting with “restart_file” in the file “input” contains the string. No file “/home/mst3k/tutorial/restart” is available to your job. Edit “input,” changing the string mst3k to your login ID on the system. Resubmit the job. Did you get an output file this time? Is standard error empty?
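A change like this can also be made without opening an editor, using the stream editor sed. The following is a sketch that works on a scratch copy of the line; newid is a placeholder for your own login ID. It writes to a temporary file and then renames it, since not every version of sed can edit a file in place.

```shell
# Stand-in for the tutorial's input file; "newid" is a placeholder login ID.
sdir=$(mktemp -d)
echo "restart_file = '/home/mst3k/tutorial/restart'" > "$sdir/input"

# Replace the old login ID, writing to a temporary file and then renaming it.
sed 's/mst3k/newid/' "$sdir/input" > "$sdir/input.new"
mv "$sdir/input.new" "$sdir/input"

fixed=$(cat "$sdir/input")
echo "$fixed"
rm -r "$sdir"
```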
Please note that the name output is not unique to each job, so it can be overwritten if you do not move it elsewhere before another job completes. To avoid this, be sure to change the name between jobs:
| $mv output output.<jobno> | 043 |
Mv (“move”) changes the name or location of a file in Unix. (Remember that the true “name” of a file includes the full path to it; in this sense move simply renames files.) The current name comes first followed by the new name. In the example above, output.<jobno> is simply an example; you could call it whatever you like.
You may wish to transfer your output file to your local desktop system. On Windows you can use SecureFX to accomplish this. Either open SecureFX from the menu, or click on its icon in your SecureCRT window. Make a connection to aspen. You can maneuver around your directories in the usual way to find the file you wish to transfer and the location to which you wish to copy it. Once you are where you want to be in both panes, drag the file's icon from the Aspen pane to the local pane.
If the application you wish to run has already been compiled for the appropriate architecture, either because it was supplied to you in executable form or because it is a package such as licensed software (Matlab, Gaussian98, etc.), you can simply submit it via PBS. Make sure to add any required directives so that the software will run in batch (noninteractive) mode.
Suppose that we have parallelized our code and wish to compile and run it on the clusters. Return to your home directory
| $cd | 044 |
Copy the mpi_tutorial tarfile from the same location as the base tutorial:
| $cp /export/rescomp/mpi_tutorial.tar . |
Untar it:
| $tar xf mpi_tutorial.tar | 045 |
Change into the directory:
| $cd mpi_tutorial | 046 |
Compiling an MPI job is slightly more involved than compiling a serial job. In almost all cases, we should use the script provided by our implementation to make the job easier. Three or four of these scripts are available in every MPI installation: mpicc (for C codes), mpicxx (for C++ codes), mpif90 (for Fortran 90), and sometimes mpif77 (for Fortran 77, but often these codes are handled by mpif90). Our examples in this case are a C++ code and a Fortran code, each of which does little other than print its process number, so the makefile is extremely simple. First we must decide which compiler and which communication library MPI should use. On Aspen, only Ethernet is available for communications, so we have no choice there. On the Birch cluster, however, both Ethernet and Myrinet are available.
The compiler of our choice will be automatically loaded by the OpenMPI module. Look at the available choices again:
| $module avail | 047 |
Check which modules you have loaded:
| $module list | 048 |
We decide to use the PGI compilers over Ethernet. Remove any conflicting compiler module; if the Intel module is present, unload it:
| $module unload icc | 049 |
or
| $module unload ifort |
Add the appropriate OpenMPI module:
| $module load ompi-pgi | 050 |
You should choose to compile either the C++ or the Fortran version depending upon which language you plan to use for your code. (You can repeat the exercise for the other language later.)
Make sure that mpicxx or mpif90 is in your path:
| $which mpicxx | |
| $which mpif90 | 052 |
See the subsection Using Modules above for more information about modules.
Now compile the code:
| $make -f Makefile.cc | 053 |
or
| $make -f Makefile.f | 054 |
To learn more about Makefiles, see the above section Make and Makefiles and the ITC Web pages.
Submit the job:
Look at mpihello.sh:
| $more mpihello.sh | 056 |
It uses the mpiexec command to execute the job. More information about this command can be obtained from its “manpage”:
| $man mpiexec | 057 |
Study the list of options. You will probably never need any other than -comm. How would you change it to run this program over Myrinet on Birch?
This job may be slow to begin (to “roll in,” as it is sometimes called in queuing systems), since it is requesting two processors and will occupy a node completely.
Many, probably most, parallel jobs need significant working storage. Currently, each cluster includes one storage node consisting of a 1-Terabyte RAID. Input and output files can be copied at the beginning and at the end of the job. This is not permanent storage; files should be removed promptly, or they will be scrubbed after an interval of two weeks.
Information about creating a PBS job script for such a job is here.
An example is provided in mpi_tutorial as bighello.sh. Examine this script. Notice that it also contains commands to print starting and ending times and a list of compute nodes to standard output. Submit this script. Be sure to make note of the job id that is returned. When the job has completed, look at the appropriate directory
| $ls /bigtmp/pbstmp.<JOBID>.lc0 | 058 |
where JOBID is the number of the job. In this particular case there is little to be copied back, so the only content of the directory is the executable. In general, all the contents of the local directory would be transferred at the end of the job. Be sure to copy these files to more permanent storage within 72 hours, since /bigtmp is scrubbed.
Totalview is one of the few debugging tools that work for MPI programs. In order to use Totalview, you must first configure your system. Totalview is an X application, so an X server must be running on your workstation. If you are using Linux (or another Unix such as Solaris) or Mac OSX 10.3 or greater, an X server is built into your system. For Mac OSX versions below 10.3 you must obtain and install XDarwin. Windows users must install either a commercial X server such as eXceed or the free Cygwin server, as indicated earlier in this tutorial. Some further configuration is also required if you are behind a firewall; see the Totalview documentation for details.
Edit the appropriate Makefile to add the option -g either to FFLAGS or to CFLAGS. Recompile your executable. In order to use Totalview interactively, you must submit an interactive PBS job. The job script debug.sh is an example of such a job. Note that it includes only a few PBS directives -- no job is run because this will be performed interactively.
Type
| $qsub -I debug.sh | 059 |
Your interactive job must wait in the queue just like any other; it is thus prudent to request the minimum resources, such as walltime, that will enable you to complete your task. When your job is rolled in, you will find yourself in a shell on one of the compute nodes. Change into your directory:
| $cd mpi_tutorial | 060 |
For an MPI job, Totalview is invoked through the mpiexec or mpirun command. Type
| $mpiexec -tv [other options] mpihello | 061 |
If you have configured your system correctly, a Totalview window will appear. Totalview behaves much like other debuggers but can attach to individual processes as well as to the root process. We can do little but run this trivial code, so step through it. When it asks whether the parallel job should be stopped, click no and continue stepping. (Clicking yes allows you to set breakpoints and the like before continuing with the job.) To cycle through the processes, use the P+ and P- buttons on the far right of the toolbar. To obtain a separate window for each process, select the process in the process window, right click, and select Dive Anew.
As you step through each process, printing to standard output should appear on your terminal screen.
The ITC Totalview Web page contains links to some tutorials; the LLNL tutorial is particularly detailed, and contains information about using Totalview with parallel codes.
You should now understand the basics of using the system. You may do whatever you wish with your tutorials directories; feel free to use them to experiment on your own. Some suggested experiments:
If you have any questions, contact the Research Computing Support Group at 243-8799 or email res-consult@virginia.edu