This tutorial is designed for researchers who are new to the Aspen Linux cluster. It covers basic information about the cluster, as well as how to create and submit batch jobs using the PBS resource management software. It also contains sample job command files that can be used as templates for running jobs under PBS.
The Aspen Linux cluster uses Red Hat Linux version 7.2 as its operating system and the Portable Batch System (PBS) software to distribute the computational workload across the nodes. PBS is a batch job scheduling application that provides the facility for building, submitting and processing batch jobs on the cluster.
Jobs are submitted to the cluster by creating a PBS job command file
that specifies certain attributes of the job, such as how long
the job is expected to run and how many nodes of the cluster are needed (e.g.
for parallel programs). PBS then schedules when the job is to start running
on the cluster (based in part on those attributes), runs and monitors the job
at the scheduled time, and returns any output to the user once the job
completes.
Important notice for Windows users: do not use a standard Windows editor such as Notepad to edit files that will be used on the Linux or other Unix systems. The two systems use different sequences of control characters to mark the end of line (EOL). If you are using the clusters from a Windows system, there are a number of options:
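Whatever editor is used, a file that has already picked up Windows line endings can be repaired on the cluster itself. The following is a minimal sketch rather than an official procedure: strip_cr is a hypothetical helper name, and it relies only on the standard tr utility to delete the carriage return that Windows places before each newline.

```shell
#!/bin/sh
# Remove carriage returns (the extra character in Windows CRLF line endings)
# from a file, writing the cleaned copy to a new file.
# Usage: strip_cr input_file output_file
# strip_cr is a hypothetical helper name, not a cluster-provided command.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, strip_cr pbs_script_windows.sh pbs_script.sh would write a copy with Unix line endings; the file names here are placeholders.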
When a job is submitted to the cluster through PBS, a new login to your account is initiated, and any initialization commands in your startup files (.profile, .variables.ksh, .kshrc, etc.) are executed. In this case (running in batch mode) it is necessary to disable interactive commands such as tset and stty. If these precautions are not taken, error messages will be written to the batch job's error file and your program may not run.
The recommended procedure to disable the interactive sections of the startup files is to test the environment variable PBS_ENVIRONMENT, which is set when PBS runs. If the variable has been set, meaning a PBS job has initiated the login, the interactive parts of the startup files are skipped.
Below is an example of a .profile file configured for use with PBS on the Aspen
cluster.
# The following command exports variables set here to your user shell.
set -a
# This command runs your ".variables.ksh" file.
. ${HOME}/.variables.ksh
# Exclude interactive commands & umenu if LL_JOB is TRUE (SP batch job)
# or PBS_ENVIRONMENT is set (PBS batch jobs on any architecture)
if [ -z "$PBS_ENVIRONMENT" ] ; then
# Make /home/$USER the initial Linux command prompt directory path
cd /home/$USER
# Interactive lines such as control key and terminal settings go here
# Close exclusion of interactive section (SP and PBS batch job requirement)
fi
The following link shows a complete .profile modified
to run PBS jobs using K-shell. If you are using the shell tcsh, the
following link shows a .login modified to run PBS
jobs. You should also make sure any stty commands are done inside the
PBS exclusion test in the .profile or .login.
Note: if you have trouble using the man command on Aspen, in your .variables.ksh file replace the line
PAGER=/usr/bin/more
with
PAGER=more
This should work on all systems since more is normally in the path automatically.
Note that csh (tcsh) users may get the warning "Warning: no access to tty, thus no job control in this shell" as part of their PBS job output. This is documented on page 18 of the PBSPro User's Guide and should not affect the job itself.
To allow access to the PBS commands and manual pages, the appropriate paths have been added to the system PATH and MANPATH environment variables. Users should make sure they are including the system PATH and MANPATH variables as part of their account PATH and MANPATH variables (e.g. in .variables.ksh, PATH=${HOME}/bin:${PATH}:/home/loadl/bin:.).
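One quick way to confirm that the account PATH still includes the system entries is to check whether the PBS commands can be found. The following is a small sketch under that assumption; missing_commands is a hypothetical helper name, and it uses only the standard shell builtin command -v.

```shell
#!/bin/sh
# Report which of the named commands cannot be found through PATH.
# Prints one missing command per line; prints nothing if all are found.
# missing_commands is a hypothetical helper name used for illustration.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Running missing_commands qsub qstat qdel after logging in should print nothing; any name it prints suggests that the system PATH was overwritten rather than extended in the startup files.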
Users may need to modify their PAGER variable (typically in the .variables.ksh file) to be /bin/more so that the man command will work correctly on the cluster.
module load pgi
loads the current version of the PGI compiler suite, while the command
module load pgi/4.0
loads the older version of the PGI compilers.
The modules command has a number of options, some of which are similar. For example, module add is synonymous with module load.
A full listing of the available modules can be obtained by typing
module which
Executing module which on Aspen at a particular time yields
icc/7.0              : loads the Intel C++ Compiler Environment
ifc/7.0              : loads the Intel Fortran Compiler Environment
imsl/5.0             : loads the IMSL scientific library
java/1.5             : loads the Sun JDK Environment
mpich-eth-gnu/1.2.4  : loads the mpich environment for Gnu over Ethernet
mpich-eth-intel/1.2.4: loads the mpich environment for Intel over Ethernet
mpich-eth-pgi/1.2.5  : loads the mpich environment for PGI over Ethernet
pgi/4.0              : loads the PGI Compiler Environment
pgi/5.0              : loads the PGI Compiler Environment
The Portland Group
(PGI) Compilers are licensed by ITC to run on Linux platforms at the
University. The PGI compilers available on the Aspen Cluster are:
pgcc [options] file.c (C)
pgCC [options] file.C (C++)
pgf77 [options] file.f (Fortran 77)
pgf90 [options] file.f (Fortran 90)
pghpf [options] file.f (High Performance Fortran)
For a complete list of options consult the relevant compiler man page, e.g. man pgf77 from your account on aspen.itc. More detailed information about the PGI compilers can be found in the documentation on the webpage,
www.pgroup.com/doc/index.htm
Information about installing these compilers on your own Linux workstation can be found on the webpage,
www.itc.virginia.edu/research/pgi/
The Intel
compilers are licensed by ITC to run on Linux platforms at the
University. The Intel compilers available on the Aspen Cluster are:
icc [options] file.c (C)
icc [options] file.C file.cc file.cpp (C++)
ifc [options] file.f (Fortran 77)
ifc [options] file.f90 (Fortran 90/95)
For a complete list of options consult the relevant compiler man page, e.g. man ifc on Aspen. More detailed information about the Intel compilers can be found in the documentation on the Fortran and C/C++ Web pages.
To compile parallel programs, the open source MPI (Message Passing Interface) libraries MPICH have been provided. A module corresponding to the compiler you wish to use must be loaded in order to set up the correct environment. MPICH is specific to compiler and to networking protocol. For example, to use an MPICH compiled with the Intel compiler over the Ethernet networking protocol, which is the only protocol available on Aspen, the command would be
module load mpich-eth-intel
Once the module is loaded the following commands should be used to compile programs that use MPI code:
mpicc [options] file.c (C)
mpiCC [options] file.C (C++)
mpif77 [options] file.f (Fortran 77)
mpif90 [options] file.f (Fortran 90)
The following webpage provides information on using the MPI libraries.
www.itc.Virginia.EDU/research/mpi/
Once you have an executable version of a program you want to run, whether it's source code you've compiled yourself or a third party software package such as Matlab or Mathematica, you must use the PBS resource management software to run the code on the cluster.
The PBS resource management system handles the management and monitoring of the computational workload on the Aspen cluster. Users submit "jobs" to the resource management system where they are queued up until the system is ready to run them. PBS selects which jobs to run, when, and where, according to a predetermined site policy meant to balance competing user needs and to maximize efficient use of the cluster resources.
To use PBS, you create a batch job command file which you submit to the PBS server to run on the Aspen cluster. A batch job file is simply a shell script containing the set of commands you want run on some set of cluster compute nodes. It also contains directives which specify the characteristics (attributes), and resource requirements (e.g. number of compute nodes and maximum runtime) that your job needs. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.
PBS also provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job, in that it is placed into the queue and must wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session to one of the compute nodes. Many users find this useful for debugging their applications or for computational steering.
PBS provides two user interfaces for batch job submission: a command line interface (CLI) and a graphical user interface (GUI). Both interfaces provide the same functionality and you can use either one to interact with PBS. The CLI lets you type commands at the system prompt. The GUI is a graphical point-and-click interface.
The PBS graphical interface is invoked with the command
xpbs. A screen shot of xpbs is here.
The xpbs interface is composed of three windows: the first is the
"Hosts Panel" and displays the hostnames of the machines running PBS
servers to which jobs can be submitted. In the case of the Aspen cluster, the
PBS server is running on the front-end login host aspen.itc.virginia.edu and
is labeled lc0. The second window is the "Queues Panel" and displays
information about the queues managed by the server host selected in the
"Hosts Panel". It shows the single queue "workq" on the Aspen cluster. The
third window is the "Jobs Panel" and displays information about jobs that are
found in the queue(s) selected from the Queues listbox.
Further information about how to configure and use the xpbs
interface can be found in Chapter 5 of the
PBS Pro User Guide. The remainder of this
tutorial will focus on the PBS command line interface. More detailed
information about using PBS can be found in the PBS Pro User Guide.
To submit a job to run on the Aspen cluster, a PBS job command file must be
created. The job command file is a shell script that contains PBS
directives;
these directives are preceded by #PBS. The following is an example of
a PBS command file to run a serial job, which would only require
1 processor on one node. In this example, the executable to be run is named
serial_executable.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
./serial_executable
The first line identifies this file as a shell script. The next several
lines are PBS directives that must precede any commands to be executed
by the shell (e.g. the last two lines). The PBS directives illustrated are
explained in the table below:
PBS Directive Function
#PBS -l nodes=1:ppn=1 Specifies a PBS resource requirement of
1 compute node and 1 processor per node.
#PBS -l walltime=12:00:00 Specifies a PBS resource requirement of
12 hours of wall clock time to run the job.
#PBS -o output_filename Specifies the name of the file where job
output is to be saved. May be omitted to
generate filename appended with jobid number.
#PBS -j oe Specifies that job output and error messages
are to be joined in one file.
#PBS -m bea Specifies that PBS send email notification
when the job begins (b), ends (e), or
aborts (a).
#PBS -M userid@virginia.edu Specifies the email address where PBS
notification is to be sent.
#PBS -V Specifies that all environment variables
are to be exported to the batch job.
It is not necessary to use the -j (join) directive; sometimes it is helpful
to keep the output and error files separate. If -o or
-e directives are not specified, PBS will assign a name to each consisting
of the name of the script concatenated with .o (output) or .e (error) and the job identification number.
The following is an example of a PBS email notification to the user at the
end of the job:
Note that the walltime-used information in the email should be used to
accurately estimate the walltime resource requirement in the PBS job
command file for future job submissions so that PBS can more effectively
schedule the job. When submitting a particular PBS job for the first time,
the walltime requirement should be overestimated to prevent premature
job termination. The walltime measurement corresponds closely to the
job cpu time since each job is allocated its own processor for execution.
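As an illustration of turning the reported walltime into a request for the next submission, the sketch below adds a safety margin to a measured value. The helper name pad_walltime and the default margin of 25 percent are assumptions chosen for this example, not values recommended by PBS or by the cluster administrators.

```shell
#!/bin/sh
# Suggest a walltime request from the walltime a previous run actually used.
# Usage: pad_walltime HH:MM:SS [percent_margin]
# pad_walltime and the default 25 percent margin are illustrative assumptions.
pad_walltime() {
    margin=${2:-25}
    echo "$1" | awk -F: -v m="$margin" '{
        used = $1 * 3600 + $2 * 60 + $3          # measured walltime in seconds
        exact = used * (100 + m) / 100           # add the margin
        want = int(exact)
        if (want < exact) want++                 # round up to a whole second
        printf "%02d:%02d:%02d\n", int(want / 3600), int((want % 3600) / 60), want % 60
    }'
}
```

The printed value can be copied into the #PBS -l walltime= directive of the job command file.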
After the PBS directives in the command file, the shell executes a change
directory command to $PBS_O_WORKDIR, a PBS variable indicating the
directory from which the
PBS job was submitted. Normally this will also be where the program executable
is located. Other shell commands can be executed as well. In the last line,
the executable itself is invoked.
If your program was compiled with the PGI compiler or uses any of its libraries,
you will probably need to add the lines
If the executable is a parallel program using the Message Passing Interface
(MPI), then it will require multiple processors of the cluster to run and this
is specified in the PBS nodes resource requirement. The script
'mpiexec' is used to invoke the parallel executable. The following is an
example of a PBS command file to run a parallel (MPI) job:
In this case the PBS nodes resource requirement specifies 2 processors per
node on 4 nodes for a total of 8 processors. This number of processors
is automatically used by mpiexec, by default. The code was
compiled with the Intel compiler so the corresponding mpich module is loaded
before beginning the run.
Parallel jobs
should usually specify a nodes requirement of 2 processors per node to
efficiently partition the compute nodes for these jobs.
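The arithmetic behind a nodes requirement (nodes times processors per node) can be sketched in a few lines of shell. The helper name total_procs is hypothetical, and treating a missing ppn as 1 is an assumption about the PBS default rather than something stated in this tutorial.

```shell
#!/bin/sh
# Compute the total processor count implied by a PBS nodes specification
# of the form "N:ppn=P" (as used in "#PBS -l nodes=N:ppn=P").
# total_procs is a hypothetical helper; a missing ":ppn=P" is assumed to mean 1.
total_procs() {
    spec=$1
    nodes=${spec%%:*}                 # text before the first colon
    case $spec in
        *:ppn=*) ppn=${spec##*:ppn=} ;;
        *)       ppn=1 ;;
    esac
    echo $((nodes * ppn))
}
```

For the parallel example in this section, total_procs 4:ppn=2 gives the 8 processors that mpiexec uses by default.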
The PBS job command file can be given any name, although it is usually
appended with a .sh extension to indicate that it is a shell script. The link
pbs_script.sh is an example PBS job script that
runs the High Performance Linpack benchmark across 4 nodes using the input
file HPL.dat. You can download these to your
cluster account and use them to test PBS job submission described below.
Remember to change the userid placeholder in the PBS email directive to
your own.
There are many options to the qsub command as can be seen by
typing man qsub at the Linux command prompt on lc0.itc or looking at
the PBS Pro User Guide. Three of the more
useful ones are the -W option for allowing specification of
additional job attributes, the -I option, which declares that the job
is to be run "interactively", and the -l option, which allows resource
requirements to be listed as part of the qsub command. These are
discussed below.
Specifying Job Dependencies
The -W option allows for the specification of additional job attributes. In
particular, the "-W depend=dependency_list" option to qsub defines
the dependency between multiple jobs, which is useful if the jobs need to
execute in a certain order. For example, if pbs_script2.sh should not start
executing until pbs_script1.sh successfully completes because it needs a
file that pbs_script1.sh creates, then these two jobs should be submitted
to PBS in the following manner:
Other options for arguments of the dependency list are detailed in Chapter 8
of PBS Pro User Guide as well as the online manual
page for qsub by typing man qsub at the Linux command prompt.
Submitting an Interactive Job
The -I option of qsub declares that a job has to be run
"interactively". The job will be queued and scheduled as any PBS batch job,
but when executed, the standard input, output, and error streams of the job
are connected through qsub to the terminal session in which
qsub is running. Interactive jobs with PBS should be used only for
the purposes of testing/debugging the user's code, e.g. in cases using the
PGI or TotalView debuggers.
Once the PBS interactive job is executed, the terminal
session will be logged into one of the compute nodes allocated by PBS. The
executable can then be invoked manually from the Linux command prompt.
As will be discussed in the next section, the PBS scheduler is configured to
favor jobs with shorter walltime and smaller node resource requirements. To
ensure that a PBS interactive job is executed quickly, these reduced resource
requirements can be listed as arguments of qsub with the
-l option.
The following is an example of running the High Performance Linpack Benchmark
as an interactive PBS job using 4 nodes and requesting 10 minutes of walltime.
Note that the terminal session is actually logged into node compute-0-4.
Job Submission Policies
Users of the Aspen Cluster may submit as many jobs to PBS as they like.
The PBS scheduler will dynamically determine a user's priority based on
the number of jobs of other users and the number of available nodes, in order
to maximize cluster usage in an equitable fashion. Any jobs in excess of
the allowed upper limit on resources (such as cpus) will wait in the queue
until a slot opens when one or more of the user's other jobs finishes.
All PBS jobs submitted by users of the cluster will go to one execution
queue called
workq; the scheduler will first sort them by giving jobs requiring shorter
walltime and smaller node resource requirements higher run priorities. The
scheduler further modifies these priorities based on a fair-share algorithm
which tries to guarantee that on average, all users will get an equal amount
of computing time. Finally, jobs waiting for more than 24 hours to run will
be considered "starving" and given higher priority.
PBS is currently configured to limit the maximum amount of walltime a single
job can use to 168 hours. When that time limit is reached, the job will be
terminated whether it has completed or not. This ensures that no one job
can monopolize cluster compute nodes indefinitely and underscores the need
for users to implement some type of save-restart mechanism in their code
so they can restart the job close to where it was stopped and not lose all the
work done up to that point. The following URL provides some guidelines
for implementing save-restart in your code:
PBS also imposes a limit on the number of processors
users can require, based on how busy the cluster is. A user
can request up to 36 processors for a parallel job, though all must
become available for such a job to start; in practice this is unlikely to
occur. A maximum of 48 cpus aggregated over all jobs may be in use by a
single user.
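Before submitting, a request can be compared against the limits quoted in this section. The sketch below is only an illustration: check_request is a hypothetical helper, the two limits are copied from the text above and may change when the policies are reviewed, and PBS itself remains the authority on what it will accept.

```shell
#!/bin/sh
# Limits quoted in this tutorial; they may change when site policy is reviewed.
MAX_WALLTIME_HOURS=168
MAX_PROCS_PER_JOB=36

# Usage: check_request nodes ppn walltime_hours
# Prints a message for each limit exceeded and returns non-zero if any was.
# check_request is a hypothetical helper, not a PBS command.
check_request() {
    status=0
    procs=$(( $1 * $2 ))
    if [ "$procs" -gt "$MAX_PROCS_PER_JOB" ]; then
        echo "requested $procs processors; limit is $MAX_PROCS_PER_JOB"
        status=1
    fi
    if [ "$3" -gt "$MAX_WALLTIME_HOURS" ]; then
        echo "requested $3 hours; limit is $MAX_WALLTIME_HOURS"
        status=1
    fi
    return $status
}
```

For example, check_request 4 2 12 succeeds silently, while check_request 20 2 12 reports that 40 processors exceeds the per-job limit.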
The PBS configuration and scheduling policies used on the cluster will be
periodically reviewed and modified as needed to ensure efficient and
equitable use of this high performance computing resource.
Researchers with extraordinary needs for the cluster, either in terms
of extended compute time or number of nodes, should contact the Research
Computing Support Group at res-consult@virginia.edu to discuss making
special arrangements to meet those needs.
The qstat -a command is used to obtain status information about jobs
submitted to PBS.
The following example shows how to use the qstat -f command to get
detailed information on a specific job using its job identification number.
For further information about the qstat command, type
man qstat on the cluster front-end machine aspen.itc or see the
PBS Pro User Guide.
For further information about the qdel command, type man qdel
on the cluster front-end machine lc0.itc or see the
PBS Pro User Guide.
In this section are a number of sample PBS command files for different types
of jobs.
A perl script called tmpsync has been installed on the Aspen Linux Cluster
to allow users to deal with programs that generate large scratch or output
files without exceeding their home directory disk space quota. The PBS
command file below shows how tmpsync can be used with the scatter/collect
options to distribute/collect files associated with a
parallel program to/from disk space on the cluster compute nodes. Once the
PBS job has completed, all output files from the master compute
node will be copied to /bigtmp/$PBS_JOBID on the frontend node (lc0.itc).
The variable $PBS_JOBID is assigned when the job begins and contains the
ID number, so users should make a note of all their job ID numbers.
Files older than 72 hours are removed from /bigtmp, so users should download
their output file to their own longer-term storage.
File transfer to and from the cluster frontend should be
done using a secure method such as scp or rsync. The
following are examples of transferring files from /bigtmp on the cluster
front-end node lc0.itc to a remote host, initiating the transfer either
from lc0.itc or from the remote host. These examples use
the ksh line continuation character \ immediately followed
by a newline.
Note that if the program is serial rather than parallel,
the scatter and collect operations of tmpsync would not be needed since
there would only be one execution node.
This is a PBS job command file to run a Matlab batch job. The Matlab
program commands are in the file matlab_script.m (note the .m
extension is not included in the command syntax) and the output
of the program will go to the file matlab_output1 at the end of the
job and to matlab_output2 while the job is running.
This is a PBS job command file to run a Mathematica batch job. The Mathematica
program commands are in the file math_script and the output
of the program will go to the file math_output. These file names
are arbitrary and other names could be used.
It is sometimes useful to make the first line in the math_script
file the command
If you have Mathematica commands stored in a notebook that you
would like to transfer to your math_script file, you can use one
of Mathematica's front end features to help you.
A dialog box will appear prompting you to give the file a name and location. You can use this Package Format file as the input file for your
Mathematica batch job.
This is a PBS job command file to run an Ansys batch job. The Ansys
program input is in the file ansys.in and the output
of the program will go to the file ansys.out. Output from PBS is saved
in the file ansys.msg.
Date: Mon, 21 Oct 2002 23:06:47 -0400
From: adm
source /opt/Modules/default/init/sh
module add pgi
before or after the cd into the working directory.
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m abe
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
mpiexec -comm mpich-p4 executable_parallel
The PBS qsub command is used to submit job command files for
scheduling and execution. For example, to submit your job with a
PBS command file called "pbs_script.sh", the syntax would be
lc0: /home/uconsult $ qsub pbs_script.sh
1354.lc0
lc0: /home/uconsult $
Notice that upon successful submission of a job, PBS returns a job identifier (1354.lc0 in this example).
lc0: /home/uconsult $ qsub pbs_script1.sh
543.lc0
lc0: /home/uconsult $ qsub -W depend=afterok:543 pbs_script2.sh
544.lc0
After pbs_script1.sh is submitted, PBS returns the job identifier number which
is then used as part of the dependence argument list when pbs_script2.sh is
submitted. The "afterok" argument in the dependency list indicates that
the job identified as 543 must complete successfully before pbs_script2.sh
will start.
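Rather than copying the job identifier by hand, the two submissions can be chained in a small script that captures the identifier printed by the first qsub. This is a sketch under stated assumptions: submit_chain and the QSUB variable are hypothetical names introduced here (QSUB only makes dry runs possible), and the full identifier returned by qsub is passed to afterok instead of the bare number used in the session above.

```shell
#!/bin/sh
# Submit two PBS jobs so that the second starts only if the first succeeds.
# QSUB is a hypothetical override for dry runs; by default the real qsub is used.
QSUB="${QSUB:-qsub}"

# Usage: submit_chain first_script second_script
# Prints the job identifier of the second job.
submit_chain() {
    first_id=$($QSUB "$1") || return 1      # qsub prints the new job identifier
    $QSUB -W depend=afterok:"$first_id" "$2"
}
```

Calling submit_chain pbs_script1.sh pbs_script2.sh would then be equivalent to the two manual submissions in the example.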
lc0: /home/uconsult $ qsub -I -l nodes=2:ppn=2 -l walltime=00:10:00
qsub: waiting for job 1352.lc0 to start
qsub: job 1352.lc0 ready
localstorage is in /jobtmp/pbstmp.1352.lc0
compute-0-4: /home/uconsult $ mpiexec -comm mpich-p4 \
/opt/hpl-eth/bin/xhpl
============================================================================
HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
[further output not shown]
compute-0-4: /home/uconsult $ exit
lc0: /home/uconsult $
An interactive PBS job submission should require no more than 4 processors
(2 nodes, 2 processors each) for testing/debugging purposes.
In addition, an interactive PBS job will not terminate until the user exits the terminal
session. The allocated nodes will remain reserved as long as the terminal
session is open, up to the walltime limit, so it is extremely important
that users exit their interactive sessions as soon as their debugging is
done so that their nodes are returned to the available pool of processors.
www.scd.ucar.edu/docs/chinook/save.html
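The save-restart idea recommended for jobs that may reach the walltime limit can be illustrated at the level of a job script. This is a simplified sketch, not the method described at the URL above: CHECKPOINT, do_step and run_with_restart are hypothetical names, and a real application would normally save its own internal state rather than a step counter.

```shell
#!/bin/sh
# Save-restart sketch: run steps 1..last, recording each completed step in a
# checkpoint file so that a resubmitted job resumes after the last one done.
# CHECKPOINT, do_step and run_with_restart are hypothetical names.
CHECKPOINT="${CHECKPOINT:-checkpoint.txt}"

do_step() {
    echo "running step $1"        # stand-in for one unit of real work
}

# Usage: run_with_restart last_step
run_with_restart() {
    last=$1
    start=1
    if [ -f "$CHECKPOINT" ]; then
        start=$(( $(cat "$CHECKPOINT") + 1 ))   # resume after the saved step
    fi
    step=$start
    while [ "$step" -le "$last" ]; do
        do_step "$step" || return 1
        echo "$step" > "$CHECKPOINT"            # record progress only after success
        step=$((step + 1))
    done
}
```

If the job is terminated part way through, resubmitting the same script continues from the step after the one recorded in the checkpoint file.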
lc0: /home/uconsult $ qstat -a
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1363.lc0 uconsult workq job16x2 19094 16 32 -- 00:20 R 00:02
1364.lc0 teh1m workq job12x2 7149 12 24 -- 00:16 R 00:01
1365.lc0 teh1m workq job8x2 4166 8 16 -- 00:12 R 00:00
1366.lc0 uconsult workq job20x2 -- 20 40 -- 00:28 Q --
1368.lc0 uconsult workq STDIN 30942 2 4 -- 00:10 R 00:02
lc0: /home/uconsult $
The first five fields of the display are self-explanatory. Note that job ID
1368 has a jobname of STDIN, which is short for standard input, indicating
that it is an interactive job. The sixth and seventh fields, titled NDS and TSK
in the above display, indicate
the total number of nodes and processors, respectively, required by each job. The
ninth field indicates the required walltime (hrs:min) and the last field shows the
elapsed runtime. The tenth field, titled S, indicates the state of the job.
The job state can have the following values:
State Definition
E Job is exiting after having run
H Job is held
Q Job is queued, eligible to run or be routed
R Job is Running
T Job is in transition (being moved to a new location)
W Job is waiting for its requested execution time to be reached
S Job is suspended
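Because the state is always the tenth field of a qstat -a job line, the listing is easy to summarize with awk. The following sketch assumes the column layout shown in the example above and job names without embedded spaces; count_state is a hypothetical helper name.

```shell
#!/bin/sh
# Count jobs in a given state from "qstat -a" output read on standard input.
# Job lines start with a numeric job ID such as "1363.lc0"; the state is the
# tenth whitespace-separated field. count_state is a hypothetical helper.
count_state() {
    awk -v want="$1" '$1 ~ /^[0-9]+\./ && $10 == want { n++ } END { print n + 0 }'
}
```

A typical use would be qstat -a | count_state Q to see how many jobs are still waiting in the queue.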
lc0: /home/uconsult $ qstat -f 1363
Job Id: 1363.lc0
Job_Name = job16x2
Job_Owner = uconsult@lc0
resources_used.cpupercent = 82
resources_used.cput = 00:01:59
resources_used.mem = 83384kb
resources_used.ncpus = 32
resources_used.vmem = 124920kb
resources_used.walltime = 00:02:33
job_state = R
queue = workq
server = lc0
Checkpoint = u
ctime = Fri Oct 25 03:00:41 2002
Error_Path = lc0:/h1/u/uc/uconsult/linux_cluster/job16x2.e1363
exec_host = compute-1-0/0+compute-0-15/0+compute-0-14/0+compute-0-13/0+comp
ute-0-12/0+compute-0-11/0+compute-0-10/0+compute-0-9/0+compute-0-8/0+co
mpute-0-7/0+compute-0-6/0+compute-0-5/0+compute-0-4/0+compute-0-3/0+com
pute-0-2/0+compute-0-1/0+compute-1-0/1+compute-0-15/1+compute-0-14/1+co
mpute-0-13/1+compute-0-12/1+compute-0-11/1+compute-0-10/1+compute-0-9/1
+compute-0-8/1+compute-0-7/1+compute-0-6/1+compute-0-5/1+compute-0-4/1+
compute-0-3/1+compute-0-2/1+compute-0-1/1
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = e
mtime = Fri Oct 25 03:00:42 2002
Output_Path = lc0:/h1/u/uc/uconsult/linux_cluster/16x2
Priority = 0
qtime = Fri Oct 25 03:00:41 2002
Rerunable = True
Resource_List.ncpus = 32
Resource_List.neednodes = 16:ppn=2
Resource_List.nodect = 16
Resource_List.nodes = 16:ppn=2
Resource_List.walltime = 20:00:00
session_id = 19094
Variable_List = PBS_O_HOME=/home/uconsult,PBS_O_LANG=en_US,
PBS_O_LOGNAME=uconsult,
PBS_O_PATH=/home/uconsult/bin:/usr/pbs/bin:/usr/share/mpi/bin:/uva/bin
:/usr/pgi/linux86/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X1
1R6/bin:.,PBS_O_MAIL=/var/spool/mail/uconsult,PBS_O_SHELL=/bin/ksh,
PBS_O_HOST=lc0,PBS_O_WORKDIR=/h1/u/uc/uconsult/linux_cluster,
PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
comment = Job run at started on Fri Oct 25 at 03:00
etime = Fri Oct 25 03:00:41 2002
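A single attribute can be pulled out of the detailed listing with a short awk filter, which is convenient when only the walltime used or the job state is of interest. This is a sketch based on the "name = value" layout shown above; qstat_field is a hypothetical helper name, and for values that wrap onto continuation lines (such as exec_host) only the first line is returned.

```shell
#!/bin/sh
# Print the value of one "name = value" attribute from "qstat -f" output
# read on standard input. Only the first line of a wrapped value is returned.
# qstat_field is a hypothetical helper name.
qstat_field() {
    awk -v key="$1" '$1 == key && $2 == "=" { sub(/^[^=]*= /, ""); print; exit }'
}
```

For the job shown above, qstat -f 1363 | qstat_field resources_used.walltime would print the elapsed walltime reported by PBS.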
PBS provides the qdel command for deleting jobs from the system using
the job identification number, as shown below.
lc0: /home/uconsult/linux_cluster $ qstat -a
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc0 uconsult workq job16x2 18136 16 32 -- 48:00 R 00:01
lc0: /home/uconsult/linux_cluster $ qdel 1361
lc0: /home/uconsult/linux_cluster $ qstat -a
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc0 uconsult workq job16x2 18136 16 32 -- 48:00 E 00:01
Transfer from lc0.itc (local source and remote destination):
/uva/bin/scp /bigtmp/pbstmp.jobid.lc0/* \
userid@remote_host.virginia.edu:/home/userid/pbs_output/.
/uva/bin/rsync -e ssh -a /bigtmp/pbstmp.jobid.lc0/. \
userid@remote_host.virginia.edu:/home/userid/pbs_output/.
Transfer to remote_host (remote source and local destination):
/uva/bin/scp2 userid@lc0.itc.virginia.edu:/bigtmp/pbstmp.jobid.lc0/* \
/home/userid/pbs_output/.
/uva/bin/rsync -e ssh -a \
userid@lc0.itc.virginia.edu:/bigtmp/pbstmp.jobid.lc0/. \
/home/userid/pbs_output/.
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -j oe
#PBS -m ea
#PBS -M uconsult@virginia.edu
# Define variable for local storage on compute nodes associated with the job
LS="/jobtmp/pbstmp.$PBS_JOBID"
# Copy executable (e.g. xhpl) and data files (e.g. HPL.dat) from your
# home directory to local storage on the master compute node
cd $LS
/bin/cp $HOME/xhpl .
/bin/cp $HOME/HPL.dat .
# If parallel program, synchronize local storage from master compute node
# to slave compute nodes
/usr/bin/tmpsync -scatter
# Run parallel program
mpiexec -comm mpich-p4 ./xhpl > xhpl_out
# If parallel program, synchronize local storage from slave compute nodes
# to master compute node
/usr/bin/tmpsync -collect
Note: in this script, there should be no spaces around the equals sign
in the line LS="/jobtmp/pbstmp.$PBS_JOBID".
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:02:00
#PBS -o matlab_output1
#PBS -j oe
#PBS -m ea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
matlab -nojvm -nodesktop -r "matlab_script;exit" -logfile matlab_output2
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:02:00
#PBS -j oe
#PBS -m ea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
math < math_script > math_output
AppendTo[$Echo, "stdout"] so that the Mathematica input lines will
also be included in the output file.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=160:00:00
#PBS -o ansys.msg
#PBS -j oe
# Copy Ansys input file to compute node scratch space
LS="/jobtmp/pbstmp.$PBS_JOBID"
cd $LS
/bin/cp /home/mst3k/ansys/ansys.in .
ansys < $LS/ansys.in > $LS/ansys.out
This is a PBS job command file to run a Gaussian 98 batch job. The Gaussian 98 program input is in the file gaussian.in and the output of the program will go to the file gaussian.out. Output from PBS is saved in the file gaussian.msg.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=160:00:00
#PBS -o gaussian.msg
#PBS -j oe
# Copy Gaussian input file to compute node scratch space
LS="/jobtmp/pbstmp.$PBS_JOBID"
cd $LS
/bin/cp /home/userid/gaussian/gaussian.in .
# Define Gaussian scratch directory as compute node scratch space
export GAUSS_SCRDIR=$LS
# Load PGI module needed by the binary
. /opt/Modules/default/init/sh
module load pgi
g98 < $LS/gaussian.in > $LS/gaussian.out
This is a PBS job command file to run a SAS batch job. The SAS program commands are in the file myfile.sas and the output of the program will go to the file myfile.out. The log file will be myfile.log.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -m bea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
sas myfile.sas
If you are transferring
to and from a Unix system (this includes Linux), the
following are examples of transferring files from a directory
mydirectory on the cluster
front-end node aspen.itc to a remote host, initiating the transfer either
from aspen.itc or from the remote host. These examples use
the ksh line continuation character \ immediately followed
by a newline.
Transfer from aspen.itc (local source and remote destination):
/uva/bin/scp mydirectory/* \
userid@remote_host.virginia.edu:/home/userid/myoutput/.
Note: userid@ may be omitted if the user's id is the same on both systems. The colon after the hostname is essential, however. Also, if you are using Linux on your local workstation and are running OpenSSH rather than UVa's commercial SSH you should use sftp to transfer from the workstation to the clusters (scp will work in the opposite direction); sftp takes exactly the same form and commands as insecure ftp.
/uva/bin/rsync -e ssh -a mydirectory/. \
userid@remote_host.virginia.edu:/home/userid/myoutput/.
Transfer to remote_host (remote source and local destination):
/uva/bin/scp2 userid@birch.itc.virginia.edu:mydirectory/* \
/home/userid/myoutput/.
/uva/bin/rsync -e ssh -a \
userid@birch.itc.virginia.edu:mydirectory/. \
/home/userid/myoutput/.
Mac OSX with Darwin includes scp and rsync, so these commands can be run inside the terminal application exactly as in the Unix examples above.
From a Windows system, use SecureFX, a commercial product available to students, faculty, and staff. The cluster runs ssh2; it does not run an ftp daemon, so sftp is the correct protocol for file transfers to the cluster frontend.