This tutorial is designed for researchers who are new to the Birch Linux cluster. It covers basic information about the cluster, as well as how to create and submit batch jobs using the PBS resource management software. It also contains sample job command files that can be used as templates for running jobs under PBS.
The Birch Linux cluster uses Red Hat Linux as its operating system and the Portable Batch System (PBS) software to distribute the computational workload across the nodes. PBS is a batch job scheduling application that provides the facility for building, submitting, and processing batch jobs on the cluster.
Jobs are submitted to the cluster by creating a PBS job command file
that specifies certain attributes of the job, such as how long
the job is expected to run and how many nodes of the cluster are requested.
PBS then schedules when the job is to start
running on the cluster, runs and monitors the job
at the scheduled time, and returns any output to the user once the job
completes.
Important notice for Windows users: do not use a standard Windows editor such as Notepad to edit files that will be used on the Linux or other Unix systems. The two systems use different sequences of control characters to mark the end of line (EOL). If you are using the clusters from a Windows system, there are a number of options:
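Whichever option is used, a file can be checked for Windows line endings once it is on the cluster. The following is a minimal sketch, not part of the cluster software: the helper names has_crlf and strip_cr and the demonstration file names are assumptions made for illustration. It relies only on the standard grep and tr utilities.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains a carriage return,
# which marks Windows (CRLF) line endings.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: write a copy of file $1 to $2 with carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Demonstration on a throwaway file written with CRLF line endings.
demo="${TMPDIR:-/tmp}/crlf_demo.$$"
printf '#!/bin/sh\r\necho hello\r\n' > "$demo"
has_crlf "$demo" && echo "CRLF line endings found"
strip_cr "$demo" "$demo.unix"
has_crlf "$demo.unix" || echo "converted copy is clean"
rm -f "$demo" "$demo.unix"
```

The original file is left untouched and the cleaned copy is written separately, so nothing is lost if the conversion is interrupted.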
When a job is submitted to the cluster through PBS, a new login to your account is initiated, and any initialization commands in your startup files (.profile, .variables.ksh, .kshrc, etc.) are executed. PBS shells are not interactive, so interactive commands such as tset and stty must be disabled. If these precautions are not taken, error messages will be written to the batch job's error file and your program may not run.
The recommended procedure to disable the interactive sections of the startup files is to test the environment variable PBS_ENVIRONMENT, which is set when PBS runs. If the variable has been set, meaning a PBS job has initiated the login, the interactive parts of the startup files are skipped.
Below is an example of a .profile file configured for use with PBS on the Birch
cluster.
# The following command exports variables set here to your user shell.
set -a
# This command runs your ".variables.ksh" file.
. ${HOME}/.variables.ksh
# Exclude interactive commands & umenu if LL_JOB is TRUE (SP batch job)
# or PBS_ENVIRONMENT is set (PBS batch jobs on any architecture)
if [ -z "$LL_JOB" -a -z "$PBS_ENVIRONMENT" ] ; then
# Make /home/$USER the initial directory at the Linux command prompt
cd /home/$USER
# Interactive lines such as control key and terminal settings go here
# Close exclusion of interactive section (SP and PBS batch job requirement)
fi
The following link shows a complete .profile modified
to run PBS jobs using the Korn shell. If you are using the tcsh
shell, the following link shows a .login modified to run
PBS jobs. You should also make sure any stty commands are done inside
the PBS exclusion test in the .profile or .login.
Note: if you have trouble using the man command on Birch, in your .variables.ksh file replace the line
PAGER=/usr/bin/more
with
PAGER=more
This should work on all systems since more is normally in the path automatically.
Note that tcsh users may get the warning "Warning: no access to tty, thus no job control in this shell" as part of their PBS job output. This is documented on page 18 of the PBSPro User's Guide and does not affect the job.
To allow access to the PBS commands and manual pages, the appropriate paths have been added to the system PATH and MANPATH environment variables. Users should make sure they are including the system PATH and MANPATH variables as part of their account PATH and MANPATH variables (e.g. in .variables.ksh, PATH=${HOME}/bin:${PATH}:/home/loadl/bin:.).
Users may need to modify their PAGER variable (typically in the .variables.ksh file) to be /bin/more so that the man command will work correctly on the cluster.
In general, the Intel compiler offers better performance and provides the only Fortran 90/95 compiler on the cluster, but there are circumstances under which the GNU compiler might be chosen. Whether the Ethernet or Myrinet message layer is preferable depends on the application. Most "typical" code will probably run better under Myrinet, but code with little communication relative to computation may well perform better under Ethernet. Both should be tried when a new code is introduced onto the cluster.
The modules command has a number of options, some of which are similar. For example, module add is synonymous with module load.
A full listing of the available modules can be obtained by typing
module which
Executing module which on Birch at a particular time yields
icc/7.0 : loads the Intel C++ Compiler Environment
ifc/7.0 : loads the Intel Fortran Compiler Environment
imsl/5.0 : loads the IMSL scientific library
mpich-eth-gnu/1.2.4 : loads the mpich environment for Gnu over Ethernet
mpich-eth-intel/1.2.4: loads the mpich environment for Intel over Ethernet
mpich-eth-pgi/1.2.5 : loads the CRAY Freeware GNU mpich environment
mpich-gm-gnu/1.2.4..9: loads the mpich environment for Gnu over Myrinet
mpich-gm-intel/1.2.4..9: loads the mpich environment for Intel over Myrinet
pgi/4.0 : loads the PGI Compiler Environment
pgi/5.0 : loads the PGI Compiler Environment
icc [options] file.c, file.C, file.cpp, file.cxx (C, C++)
ifort [options] file.f, file.f90, file.for, file.ftn (Fortran 77/90/95)
For a complete list of options consult the relevant compiler man page, e.g. man ifc from your account on birch.itc. More detailed information about the Intel compilers can be found at
www.itc.virginia.edu/research/intel/
Documentation links are provided near the end of the page. Information about installing these compilers on your own Linux workstation can also be found on this Web page.
To compile parallel programs, the open source MPI (Message Passing Interface) libraries MPICH have been compiled with both the gnu and the Intel compilers. The appropriate module must be loaded in order to set up the correct compiler environment. MPICH is specific to compiler and to networking protocol. For example, to use an MPICH compiled with the Intel compiler over the Myrinet networking protocol, which is available only on Birch, the command would be
module load mpich-gm-intel
Once a module is loaded, the following commands should be used to compile programs that use MPI code:
mpicc [options] file.c (C)
mpiCC [options] file.C (C++)
mpif77 [options] file.f (Fortran 77)
mpif90 [options] file.f (Fortran 90)
The following webpage provides information on using the MPI libraries.
www.itc.Virginia.EDU/research/mpi/
Once you have obtained an executable version of a program you want to run, whether it's source code you've compiled yourself or a third party software package, you must use the PBS resource manager to run the code on the cluster.
The PBS resource management system handles the management and monitoring of the computational workload on the Birch cluster. Users submit "jobs" to the resource management system where they are queued until the system is ready to run them. PBS selects which jobs to run, when, and where, according to a predetermined site policy meant to balance competing user needs and to maximize efficient use of the cluster resources.
To use PBS, you create a batch job command file, which you submit to the PBS server to run on the cluster. A batch job file is simply a shell script containing the set of commands you want run on some set of cluster compute nodes. It also contains directives which specify the characteristics (attributes), and resource requirements (e.g. number of compute nodes and maximum runtime) that your job needs. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.
PBS also provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job in that it is queued up, and must wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session to one of the compute nodes. Many users find this useful for debugging their applications or for limited computational steering.
PBS provides two user interfaces for batch job submission: a command
line interface (CLI) and a graphical user interface (GUI). Both
interfaces provide the same functionality
and you can use either one to interact with PBS. The CLI lets you type commands
at the system prompt. The GUI is a graphical point-and-click
interface; it is invoked with the command
xpbs. A screen shot of xpbs is here.
The xpbs interface is composed of three windows: the first is the
"Hosts Panel" and displays the hostnames of the machines running PBS
servers to which jobs can be submitted. In the case of the Birch cluster, the
PBS server is running on the front-end login host birch.itc.virginia.edu and
is labeled lc1. The second window is the "Queues Panel" and displays
information about the queues managed by the server host selected in the
"Hosts Panel". It shows the single queue "workq" in this example. The
third window is the "Jobs Panel" and displays information about jobs that are
found in the queue(s) selected from the Queues listbox.
Further information about how to configure and use the xpbs
interface can be found in Chapter 5 of the
PBS Pro User Guide. The remainder of this
tutorial will focus on the PBS command line interface. More detailed
information about using PBS can be found in the PBS Pro User Guide.
To submit a job to run on the cluster, a PBS job command file must be
created. The job command file is a shell script that contains PBS directives
which are preceded by #PBS. The following is an example of a
PBS command file to run a serial job, which would require only
1 processor on one node. In this example, the executable to be run is
named serial_executable.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
./serial_executable
The first line identifies this file as a shell script. The next several
lines are PBS directives that must precede any commands to be executed
by the shell (e.g. the last two lines). The PBS directives illustrated
are explained in the table below:
PBS Directive                 Function
#PBS -l nodes=1:ppn=1         Specifies a PBS resource requirement of
                              1 compute node and 1 processor per node.
#PBS -l walltime=12:00:00     Specifies a PBS resource requirement of
                              12 hours of wall clock time to run the job.
#PBS -o output_filename       Specifies the name of the file where job
                              output is to be saved. May be omitted to
                              generate filename appended with jobid number.
#PBS -j oe                    Specifies that job output and error messages
                              are to be joined in one file.
#PBS -m bea                   Specifies that PBS send email notification
                              when the job begins (b), ends (e), or
                              aborts (a).
#PBS -M userid@virginia.edu   Specifies the email address where PBS
                              notification is to be sent.
#PBS -V                       Specifies that all environment variables
                              are to be exported to the batch job.
It is not necessary to use the -j (join) directive; sometimes it is helpful
to keep the output and error files separate. If -o or
-e directives are not specified, PBS will assign a name to each consisting
of the name of the script concatenated with .o or .e and the job
identification number (for example, job16x2.e1363 in the qstat -f output
shown in this tutorial).
The following is an example of a PBS email notification to the user at the
end of the job:
Note that the walltime-used information in the email should be used to
accurately estimate the walltime resource requirement in the PBS job
command file for future job submissions so that PBS can more effectively
schedule the job. When submitting a particular PBS job for the first time,
the walltime requirement should be overestimated to prevent premature
job termination. The walltime measurement corresponds closely to the
job cpu time since each job is allocated its own processor for execution.
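As a rough illustration of turning a measured walltime into the next request, the sketch below converts HH:MM:SS to seconds, adds a safety margin, and converts back. The helper names and the 25% margin are assumptions chosen for illustration, not a site recommendation; pick a margin that suits how much your run times vary.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds (awk reads "08" as decimal 8, avoiding octal pitfalls).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Convert a number of seconds back to HH:MM:SS.
to_hms() {
    awk -v s="$1" 'BEGIN { printf "%02d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60 }'
}

# Hypothetical policy: request the measured walltime plus a 25% margin.
padded_walltime() {
    used=$(to_seconds "$1")
    to_hms $(( used + used / 4 ))
}

# A job that used 8 hours would be resubmitted with a 10 hour request.
padded_walltime 08:00:00    # prints 10:00:00
```

The result can be pasted into the #PBS -l walltime= directive of the job command file.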
After the PBS directives in the command file, the shell executes a change
directory command to $PBS_O_WORKDIR, a PBS variable indicating the
directory from which the
PBS job was submitted and nominally where the program executable is located.
Other shell commands can be executed as well. In the last line, the
executable itself is invoked.
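Since a job command file is an ordinary shell script, a family of similar files can be generated from a template. The following is a minimal sketch under stated assumptions: the function name make_pbs_job and its argument order are hypothetical, and only directives already described in this tutorial are emitted.

```shell
#!/bin/sh
# Hypothetical helper: print a PBS job command file to standard output.
# Arguments: nodes, processors per node, walltime (HH:MM:SS), command to run.
make_pbs_job() {
    cat <<EOF
#!/bin/sh
#PBS -l nodes=$1:ppn=$2
#PBS -l walltime=$3
#PBS -j oe
cd \$PBS_O_WORKDIR
$4
EOF
}

# Print a serial job file equivalent to the resource request in the example above.
make_pbs_job 1 1 12:00:00 ./serial_executable
```

Redirecting the output to a file (for example, make_pbs_job 1 1 12:00:00 ./serial_executable > serial_job.sh) gives a file that can be edited further before submission. The backslash before $PBS_O_WORKDIR keeps the variable unexpanded so that PBS, not the generating shell, supplies its value.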
If the executable is a parallel program using the Message Passing Interface
(MPI), then it will require multiple processors of the cluster to run and this
is specified in the PBS nodes resource requirement. In addition, the MPI script
'mpiexec' is used to invoke the parallel executable. The following is an
example of a PBS command file to run a parallel (MPI) job over Myrinet:
In this case the PBS nodes resource requirement specifies 2 processors per
node on 4 nodes, for a total of 8 processors. The default behavior of
mpiexec is to use all processors assigned by PBS, so it is not necessary
to specify a processor number. See the manual page (man mpiexec)
for more information about this command.
Parallel jobs
should always specify a nodes requirement of 2 processors per node to
efficiently partition the compute nodes for these jobs.
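The arithmetic behind a nodes request can be checked with a few lines of shell. This is only a sketch: the function name total_procs is hypothetical, and it handles just the nodes=N:ppn=P form used in this tutorial, treating a missing ppn as 1.

```shell
#!/bin/sh
# Hypothetical helper: total processors for a "nodes=N:ppn=P" specification.
total_procs() {
    spec=${1#nodes=}              # e.g. "4:ppn=2"
    n=${spec%%:*}                 # node count, e.g. "4"
    case $spec in
        *ppn=*) p=${spec##*ppn=} ;;   # processors per node, e.g. "2"
        *)      p=1 ;;                # assume 1 when ppn is not given
    esac
    echo $(( n * p ))
}

total_procs nodes=4:ppn=2     # prints 8
total_procs nodes=16:ppn=2    # prints 32
```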
The PBS job command file can be given any name, although it is usually
appended with a .sh extension to indicate it's a shell script, or perhaps a
.sub to indicate it is a script to be submitted by qsub. The link
pbs_script.sh is an example PBS job script that
runs the High Performance Linpack benchmark across 4 nodes using the input
file HPL.dat. You can download these to your
cluster account and use them to test PBS job submission described below.
Remember to change the userid placeholder in the PBS email directive to
your own.
There are many options to the qsub command, as can be seen by
typing man qsub at the Linux command prompt on birch.itc or by
examining the
PBS Pro User Guide. Three of the more
useful ones are the -W option for allowing specification of
additional job attributes, the -I option which declares that the job
is to be run "interactively", and the -l option which allows resource
requirements to be listed as part of the qsub command. These are
discussed below.
Specifying Job Dependencies
The -W option allows for the specification of additional job attributes. In
particular, the "-W depend=dependency_list" option to qsub defines
the dependency between multiple jobs, which is useful if the jobs need to
execute in a certain order. For example, if pbs_script2.sh should not start
executing until pbs_script1.sh successfully completes because it needs a
file that pbs_script1.sh creates, then these two jobs should be submitted
to PBS in the following manner:
Other options for arguments of the dependency list are detailed in Chapter 8
of PBS Pro User Guide as well as the online manual
page for qsub; type man qsub at the Linux command prompt.
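When dependent jobs are submitted from a script rather than by hand, the job number has to be taken from the identifier that qsub prints. The sketch below shows one way to do that with shell parameter expansion; the function name job_number is hypothetical, and the qsub lines are left as comments because they only work on a system where PBS is installed.

```shell
#!/bin/sh
# Hypothetical helper: keep the part of a PBS job identifier before the first dot,
# e.g. "543.lc1.itc.virginia.edu" becomes "543".
job_number() {
    echo "${1%%.*}"
}

# Usage on the cluster (not run here, since it needs qsub):
#   first=$(qsub pbs_script1.sh)
#   qsub -W depend=afterok:$(job_number "$first") pbs_script2.sh

job_number 543.lc1.itc.virginia.edu    # prints 543
```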
Submitting an Interactive Job
The -I option of qsub declares that a job has to be run
"interactively." The job will be queued and scheduled as any PBS batch job,
but when executed, the standard input, output, and error streams of the job
are connected through qsub to the terminal session in which
qsub is running. Interactive jobs with PBS should be used only for
the purposes of testing and debugging the user's code, e.g. in cases using the TotalView debugger.
Once the PBS interactive job is executed, the terminal
session will be logged into one of the compute nodes allocated by PBS. The
executable can then be invoked manually from the Linux command prompt.
To ensure that a PBS interactive job is executed quickly, a small number
of nodes and a short wallclock time should be specified. These reduced resource
requirements can be listed as arguments of qsub with the
-l option.
The following is an example of running the High Performance Linpack Benchmark
as an interactive PBS job using 4 processors (2 nodes with 2 processors each)
and 10 minutes of walltime. Note
that the terminal session is actually logged into node compute-0-4.
Job Submission Policies
Because its primary mission is to support parallel computing, Birch is
configured to favor such jobs. Users are restricted to a maximum of
4 jobs at one time in order to keep nodes available for parallel jobs. ITC
reserves the right to make changes to the scheduling policy and/or the
queue configuration in order to increase utilization of the cluster or
to ensure that parallel jobs can be run.
The Aspen
cluster has no such restriction, and should be the primary resource for users
who must run a large number of serial jobs.
All PBS jobs submitted by users of the cluster will go to one execution
queue called
workq; the scheduler will first sort them by giving higher run priorities
to jobs requiring shorter walltime and smaller node resource requirements.
The scheduler further modifies these priorities based on a fair-share algorithm
that tries to guarantee that on average, all users will get an equal amount
of computing time. Finally, jobs that have been waiting to run for more than 24
hours will be considered "starving" and assigned a higher priority.
PBS is currently configured to limit the maximum amount of walltime a single
job can use to 168 hours. When that time limit is reached, the job will be
terminated whether it has completed or not. This ensures that no one job
can monopolize cluster compute nodes indefinitely. The time limit underscores
the need for users to implement some type of save-restart mechanism in
their code so they can restart the job close to where it was stopped and not
lose all the work done up to that point. The following URL provides some
guidelines for implementing save-restart in your code:
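Independently of those guidelines, the basic save-restart idea can be illustrated in a few lines of shell. This is a toy sketch, not a recipe for any particular application: the function name run_steps, the checkpoint file name, and the use of a simple step counter as the saved state are all assumptions made for illustration. A real program would save whatever data it needs to resume.

```shell
#!/bin/sh
# Hypothetical sketch: run steps 1..last, saving progress after each one so that
# a rerun resumes after the last completed step instead of starting over.
# Arguments: checkpoint file, last step number.
run_steps() {
    ckpt_file=$1
    last=$2
    step=0
    # Resume from the saved step if a checkpoint exists.
    if [ -f "$ckpt_file" ]; then
        step=$(cat "$ckpt_file")
    fi
    while [ "$step" -lt "$last" ]; do
        step=$(( step + 1 ))
        # ... one unit of real work would go here ...
        # Write to a temporary file first, then rename, so a job killed
        # mid-write does not leave a damaged checkpoint behind.
        echo "$step" > "$ckpt_file.tmp" && mv "$ckpt_file.tmp" "$ckpt_file"
    done
    echo "$step"
}

# Demonstration with a throwaway checkpoint file.
ckpt="${TMPDIR:-/tmp}/demo_checkpoint.$$"
run_steps "$ckpt" 5    # prints 5
rm -f "$ckpt"
```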
PBS also imposes a limit on the number of processors
users can require, based on how busy the cluster is. If no jobs are waiting
in the queue to run, a user can request up to 64 processors. If the cluster
becomes busy and jobs are waiting in the queue to run, the processor limit
is reduced automatically to 32 processors in order to improve turnaround time.
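A request can be checked against these limits before it is submitted. The sketch below hard-codes the 168-hour walltime limit and the 64/32 processor limits quoted above; the function names are hypothetical, and the numbers should be treated as a snapshot because the policy may change.

```shell
#!/bin/sh
# Limits copied from the policy text; they may change.
MAX_WALLTIME_HOURS=168

# Succeed if the requested walltime (HH:MM:SS) is within the limit.
check_walltime() {
    secs=$(echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    [ "$secs" -le $(( MAX_WALLTIME_HOURS * 3600 )) ]
}

# Print the processor limit given the number of jobs waiting in the queue.
max_procs() {
    if [ "$1" -gt 0 ]; then
        echo 32
    else
        echo 64
    fi
}

check_walltime 12:00:00 && echo "walltime within limit"
max_procs 0    # prints 64 (no jobs waiting)
max_procs 3    # prints 32 (jobs waiting)
```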
The PBS configuration and scheduling policies used on the cluster will be
periodically reviewed and modified as needed to ensure efficient and
equitable use of this high performance computing resource.
Researchers with extraordinary needs for the cluster, either in terms
of extended compute time or number of nodes, should contact the Research
Computing Support Group at res-consult@virginia.edu to discuss making
special arrangements to meet those needs.
The qstat -a command is used to obtain status information about jobs
submitted to PBS.
The following example shows how to use the qstat -f command to get
detailed information on a specific job using its job identification number.
For further information about the qstat command, type
man qstat on the cluster front-end machine lc0.itc or see the
PBS Pro User Guide.
For further information about the qdel command, type man qdel
on the cluster front-end machine lc0.itc or see the
PBS Pro User Guide.
In this section are a number of sample PBS command files for different types
of jobs.
A Perl script called tmpsync has been installed on the Birch Linux Cluster
to allow users to run programs that generate large scratch or output
files without exceeding their home directory disk space quota. The PBS
command file below shows how tmpsync can be used with the scatter/collect
options to distribute/collect files associated with a
parallel program to/from disk space on the cluster compute nodes. Once the
program has completed, the front option
to tmpsync can then be used to copy output files from the master compute
node to /bigtmp on the frontend node (birch.itc). Files older than 72 hours
are removed from /bigtmp, so users should download their output file to their
own longer-term storage. See the section
File Transfer to
and from the Cluster for details about copying files.
Note that on the Birch
Cluster, /bigtmp is local to birch.itc and does not exist on the compute
nodes. Also note that if the program is serial rather than parallel,
the scatter and collect operations of tmpsync would not be needed since
there would only be one execution node.
Note: in this script, there should be no spaces around the equals sign
in the line LS="/jobtmp/pbstmp.$PBS_JOBID".
If you are transferring
to and from a Unix system (this includes Linux), the
following are examples of transferring files from a directory
mydirectory on the cluster
front-end node birch.itc to a remote host, initiating the transfer either
from birch.itc or from the remote host. These examples use
the ksh line continuation character \ immediately followed
by a newline.
Mac OSX with Darwin includes scp and rsync, so these commands can be run
inside the terminal application exactly as in the Unix examples above.
From a Windows system, use SecureFX, a commercial product available to students, faculty, and staff.
The cluster runs ssh2; it does not run an ftp daemon, so sftp
is the correct protocol for file transfers to the cluster frontend.
Date: Mon, 21 Oct 2002 23:06:47 -0400
From: adm
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m abe
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
mpiexec -comm mpich-gm executable_parallel
The PBS qsub command is used to submit job command files for
scheduling and execution. For example, to submit your job with a
PBS command file called "pbs_script.sh", the syntax would be
lc1: /home/uconsult $ qsub pbs_script.sh
1354.lc1.itc.virginia.edu
lc1: /home/uconsult $
Notice that upon successful submission of a job, PBS returns a job identifier
(1354.lc1.itc.virginia.edu in the example above).
lc1: /home/uconsult $ qsub pbs_script1.sh
543.lc1.itc.virginia.edu
lc1: /home/uconsult $ qsub -W depend=afterok:543 pbs_script2.sh
544.frontend-0
After pbs_script1.sh is submitted, PBS returns the job identifier number, which
is then used as part of the dependence argument list when pbs_script2.sh is
submitted. The "afterok" argument in the dependency list indicates that
the job identified as 543 must complete successfully before pbs_script2.sh
will start.
lc1: /home/uconsult $ qsub -I -l nodes=2:ppn=2 -l walltime=00:10:00
qsub: waiting for job 1352.lc1.itc.virginia.edu to start
qsub: job 1352.lc1.itc.virginia.edu ready
localstorage is in /jobtmp/pbstmp.1352.lc1.itc.virginia.edu
compute-0-4: /home/uconsult $ mpirun -np 4 -machinefile $PBS_NODEFILE \
/opt/hpl-eth/bin/xhpl
============================================================================
HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
[further output not shown]
compute-0-4: /home/uconsult $ exit
lc1: /home/uconsult $
An interactive PBS job submission should require no more than 4 processors
(2 nodes, 2 processors each) for testing/debugging purposes.
In addition, an interactive PBS job will not terminate until the user exits the terminal
session. The allocated nodes will remain reserved as long as the terminal
session is open, so it is extremely important that users exit their
interactive sessions as soon as their debugging is done so that their nodes
are returned to the available pool of processors.
www.scd.ucar.edu/docs/chinook/save.html
lc1: /home/uconsult $ qstat -a
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1363.lc1        uconsult workq    job16x2     19094  16  32     -- 00:20 R 00:02
1364.lc1        teh1m    workq    job12x2      7149  12  24     -- 00:16 R 00:01
1365.lc1        teh1m    workq    job8x2       4166   8  16     -- 00:12 R 00:00
1366.lc1        uconsult workq    job20x2        --  20  40     -- 00:28 Q    --
1368.lc1        uconsult workq    STDIN       30942   2   4     -- 00:10 R 00:02
lc1: /home/uconsult $
The first five fields of the display are self-explanatory. Note that job ID
1368 has a jobname of STDIN which is short for standard input, indicating
that it is an interactive job. The sixth and seventh fields titled NDS and TSK
in the above display indicate
the total number of nodes and processors respectively required by each job. The
ninth field indicates the required walltime (hrs:min.) and the last field shows the
elapsed runtime. The tenth field titled S indicates the state of the job.
The job state can have the following values:
State Definition
E Job is exiting after having run
H Job is held
Q Job is queued, eligible to run or be routed
R Job is Running
T Job is in transition (being moved to a new location)
W Job is waiting for its requested execution time to be reached
S Job is suspended
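When many jobs are listed, the state column can be summarized with awk. The following sketch assumes the column layout of the qstat -a display shown in this tutorial, with the state in the tenth field; the function name count_states is hypothetical. The sample input is copied from that display so the sketch can be tried without access to PBS.

```shell
#!/bin/sh
# Count jobs per state from "qstat -a"-style output read on standard input.
# Job lines are recognised by a Job ID that starts with a digit, which skips
# the header and separator lines; the state is taken from field 10.
count_states() {
    awk '$1 ~ /^[0-9]/ { n[$10]++ } END { for (s in n) print s, n[s] }' | sort
}

# On the cluster this would be: qstat -a | count_states
count_states <<'EOF'
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1363.lc1 uconsult workq job16x2 19094 16 32 -- 00:20 R 00:02
1364.lc1 teh1m workq job12x2 7149 12 24 -- 00:16 R 00:01
1365.lc1 teh1m workq job8x2 4166 8 16 -- 00:12 R 00:00
1366.lc1 uconsult workq job20x2 -- 20 40 -- 00:28 Q --
1368.lc1 uconsult workq STDIN 30942 2 4 -- 00:10 R 00:02
EOF
```

For the sample input this prints "Q 1" and "R 4", one queued job and four running jobs.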
lc1: /home/uconsult $ qstat -f 1363
Job Id: 1363.lc1.itc.virginia.edu
Job_Name = job16x2
Job_Owner = uconsult@lc1.itc.virginia.edu
resources_used.cpupercent = 82
resources_used.cput = 00:01:59
resources_used.mem = 83384kb
resources_used.ncpus = 32
resources_used.vmem = 124920kb
resources_used.walltime = 00:02:33
job_state = R
queue = workq
server = lc1.itc.virginia.edu
Checkpoint = u
ctime = Fri Oct 25 03:00:41 2002
Error_Path = lc1.itc.virginia.edu:/h1/u/uc/uconsult/linux_cluster/job16x2.e1363
exec_host = compute-1-0/0+compute-0-15/0+compute-0-14/0+compute-0-13/0+comp
ute-0-12/0+compute-0-11/0+compute-0-10/0+compute-0-9/0+compute-0-8/0+co
mpute-0-7/0+compute-0-6/0+compute-0-5/0+compute-0-4/0+compute-0-3/0+com
pute-0-2/0+compute-0-1/0+compute-1-0/1+compute-0-15/1+compute-0-14/1+co
mpute-0-13/1+compute-0-12/1+compute-0-11/1+compute-0-10/1+compute-0-9/1
+compute-0-8/1+compute-0-7/1+compute-0-6/1+compute-0-5/1+compute-0-4/1+
compute-0-3/1+compute-0-2/1+compute-0-1/1
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = e
mtime = Fri Oct 25 03:00:42 2002
Output_Path = lc1.itc.virginia.edu:/h1/u/uc/uconsult/linux_cluster/16x2
Priority = 0
qtime = Fri Oct 25 03:00:41 2002
Rerunable = True
Resource_List.ncpus = 32
Resource_List.neednodes = 16:ppn=2
Resource_List.nodect = 16
Resource_List.nodes = 16:ppn=2
Resource_List.walltime = 20:00:00
session_id = 19094
Variable_List = PBS_O_HOME=/home/uconsult,PBS_O_LANG=en_US,
PBS_O_LOGNAME=uconsult,
PBS_O_PATH=/home/uconsult/bin:/usr/pbs/bin:/usr/share/mpi/bin:/uva/bin
:/usr/pgi/linux86/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X1
1R6/bin:.,PBS_O_MAIL=/var/spool/mail/uconsult,PBS_O_SHELL=/bin/ksh,
PBS_O_HOST=frontend-0,PBS_O_WORKDIR=/h1/u/uc/uconsult/linux_cluster,
PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
comment = Job run at started on Fri Oct 25 at 03:00
etime = Fri Oct 25 03:00:41 2002
PBS provides the qdel command for deleting jobs from the system using
the job identification number, as shown below.
lc1: /home/uconsult/linux_cluster $ qstat -a
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc1        uconsult workq    job16x2     18136  16  32     -- 48:00 R 00:01
lc1: /home/uconsult/linux_cluster $ qdel 1361
lc1: /home/uconsult/linux_cluster $ qstat -a
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc1        uconsult workq    job16x2     18136  16  32     -- 48:00 E 00:01
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -j oe
#PBS -m ea
#PBS -M uconsult@virginia.edu
#Load module for mpi
module add mpich-gm-intel
# Define variable for local storage on compute nodes associated with the job
LS="/jobtmp/pbstmp.$PBS_JOBID"
# Copy executable (e.g. xhpl) and data files (e.g. HPL.dat) from your
# home directory to local storage on the master compute node
cd $LS
/bin/cp $HOME/xhpl .
/bin/cp $HOME/HPL.dat .
# If parallel program, synchronize local storage from master compute node
# to slave compute nodes
/usr/bin/tmpsync -scatter
# Run parallel program using Ethernet
mpiexec -comm mpich-p4 ./xhpl > xhpl_out
# If parallel program, synchronize local storage from slave compute nodes
# to master compute node
/usr/bin/tmpsync -collect
# Copy all files from local storage on master compute node to
# /bigtmp/pbstmp.$PBS_JOBID on the front-end, where they can be examined
# and/or downloaded.
/usr/bin/tmpsync -front
Disk space on the home directory is extremely limited, and space on /bigtmp
is temporary. Once your jobs have run, you will need to transfer your
files to your local system for permanent storage.
File transfer to and from the cluster should be done using a secure method
such as scp or rsync.
Transfer from birch.itc (local source and remote destination):
/uva/bin/scp mydirectory/* \
userid@remote_host.virginia.edu:/home/userid/myoutput/.
Note: userid@ may be omitted if the user's id is the same on both
systems. The colon after the hostname is essential, however.
/uva/bin/rsync -e ssh -a mydirectory/. \
userid@remote_host.virginia.edu:/home/userid/myoutput/.
Transfer to remote_host (remote source and local destination):
/uva/bin/scp2 userid@birch.itc.virginia.edu:mydirectory/* \
/home/userid/myoutput/.
/uva/bin/rsync -e ssh -a \
userid@birch.itc.virginia.edu:mydirectory/. \
/home/userid/myoutput/.