- Introduction
- saveafterjob features
- Examples
- Notes and Recommendations
- saveafterjob utilities
The saveafterjob utility on NCSA's
SGI Altix system provides
automated, guaranteed saving of output files
from batch jobs to the mass storage system.
The basic process is as follows:
- Use saveafterjob to specify the files you want transferred to mass storage.
- The transfer occurs after the job ends. In all cases, the FTP (File Transfer Protocol) directives given to saveafterjob are executed for:
  - jobs that terminate normally
  - jobs that terminate abnormally, for example because of limit violations
  - jobs that are killed by the system or the user (via qdel, llcancel, etc.)
  - jobs that are killed when the system crashes.
- With saveafterjob, files are not purged from the scratch file system until they have been transferred successfully or the system has determined that the transfer will never happen (e.g., because of a user syntax error).
- You will be notified by email after the transfer requests to mass storage
are processed by the system.
Only a small subset of FTP directives is accepted by saveafterjob:
cd, lcd, mkdir, put, mput, tar, and umask.
Other FTP directives are ignored.
| saveafterjob directive | Description |
|---|---|
| cd | change directory on UniTree |
| lcd | change local working directory |
| mkdir | make directory on UniTree |
| put | send one file to UniTree |
| mput | send multiple files to UniTree |
| umask | get (set) umask on UniTree (see man umask) |
| tar | create or extract a tar file from UniTree |
tar is a special built-in command that is also accepted.
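saveafterjob ignores every FTP directive outside the accepted subset listed above. As a result, a typo or an unsupported directive such as get would simply be dropped from the request. Below is a minimal sketch of a pre-submission check, assuming bash. The function name check_directives is hypothetical and is not part of saveafterjob. It splits one quoted directive string on "," or ";" and flags any directive that is not in the accepted list.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check (not an NCSA tool). It splits a
# saveafterjob directive string on "," or ";" and reports any directive
# that saveafterjob would ignore. Only the first word of each directive
# is checked; arguments are not validated.
check_directives() {
  local accepted=" cd lcd mkdir put mput tar umask "
  local status=0 part word
  local -a parts
  IFS=',;' read -ra parts <<< "$1"   # split on either delimiter, no globbing
  for part in "${parts[@]}"; do
    read -r word _ <<< "$part"       # first word, surrounding blanks trimmed
    [ -z "$word" ] && continue       # tolerate empty pieces such as "a,,b"
    case "$accepted" in
      *" $word "*) ;;                # accepted directive
      *) echo "ignored by saveafterjob: $word"; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, check_directives "mkdir xyz, cd xyz, put output.dat" succeeds silently. check_directives "cd data; get res.dat" prints a warning for get and returns 1. The sketch handles only the single-string form. It does not handle the here-document (-f -) form, where directives appear one per line.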
IMPORTANT NOTES:
- The FTP directives specified in saveafterjob are saved by the system and executed some time after the job completes. Specify the saveafterjob commands as early as possible in the batch script, before the files to be transferred are even created. If the job dies before it reaches the saveafterjob command(s) in the script, the request to save files is never recorded, and none of the job's files are transferred to mass storage.
The files or file patterns specified are matched against the files that exist after the job has completed, not while the job is running.
- Issue the saveafterjob command(s) after changing to the directory where the job will execute ($SCR), because each transfer is executed in the directory from which its saveafterjob request was made.
The -c (or --clear) option removes all previously queued requests.
See the saveafterjob man page for more information on this option, and the ftp man page for more information on FTP directives.
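The rules above can be modeled in a few lines of bash:

- requests are recorded together with the directory they were issued from;
- file patterns are matched only after the job has completed;
- --clear discards earlier requests.

This is a hypothetical sketch for illustration only. record_request, replay_requests, and the queue file are invented names, and they are not how NCSA implements saveafterjob. The model handles one file pattern per request and prints what would be transferred instead of contacting mass storage.

```bash
#!/usr/bin/env bash
# Hypothetical model of the bookkeeping described above. It is NOT the
# NCSA implementation and transfers nothing: it records each request with
# the directory it was issued from and expands the file pattern only when
# the queue is "replayed" after the job.
# Simplification: one file pattern per request, no FTP directives.

QUEUE=$(mktemp)   # stands in for the request list the system keeps

# record_request [--clear] PATTERN
record_request() {
  if [ "$1" = "--clear" ]; then
    : > "$QUEUE"              # --clear discards every earlier request
    shift
  fi
  # Store the current directory and the unexpanded pattern, tab-separated.
  printf '%s\t%s\n' "$PWD" "$1" >> "$QUEUE"
}

# replay_requests: run after the job; match each pattern in its own directory.
replay_requests() {
  local dir pattern f
  while IFS=$'\t' read -r dir pattern; do
    (
      cd "$dir" || exit 1
      for f in $pattern; do    # unquoted on purpose: the glob is expanded now
        if [ -e "$f" ]; then
          echo "would put $dir/$f"
        fi
      done
    )
  done < "$QUEUE"
}
```

In this model, *.dat can be registered before any .dat file exists and still match a.dat and b.dat at replay time. That is why the saveafterjob command can appear early in the script. In the examples in this document the directive string is always quoted, which keeps the shell from expanding patterns such as *.dat when the request is issued.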
- If a job creates an output file named output.dat,
the job could begin as follows:
cd $SCR
saveafterjob "put output.dat"
Then, sometime after the job finishes, the file output.dat
would be saved in the user's mass storage home directory.
- If the file needs to be saved in a directory named xyz
in UniTree, the command is:
cd $SCR
saveafterjob "mkdir xyz, cd xyz, put output.dat"
Note: A semicolon (;) can be used instead of a comma (,) as the delimiter.
- If the output file will reside in a subdirectory that the job creates, for example a directory named Run under $SCR, the syntax is:
cd $SCR
saveafterjob "lcd Run, put output.dat"
- If the job creates multiple files matching *.dat, the built-in tar command in saveafterjob can combine them into a single file, job20.tar, which is then saved to UniTree (see the USING TAR section of the msscmd man page for details on this syntax):
saveafterjob "tar cvf job20.tar *.dat"
Note 1: The built-in tar command above automatically uses the tar -K option, which is required for files larger than 2 GB.
Note 2: To extract the files once job20.tar is saved to
UniTree, enter the following on copper:
% cd /scratch-global/$USER
% msscmd "tar xvf job20.tar"
IMPORTANT: Using tar is strongly recommended for efficient storage to and retrieval from UniTree. However, if the individual files are very large (on the order of GB) AND you usually need to retrieve only one file or a small subset of the files at a time from UniTree, it may be more efficient to save the files individually.
- Alternatively, the above example can be written with the general put syntax:
cd $SCR
saveafterjob put '"|tar cf - *.dat"' job20.tar
Note 1: The above syntax is explained in the ftp man page in the section
FILE NAMING CONVENTIONS.
Note 2: Use this syntax if your tar command requires special options.
Note 3: Specifying a path name for tar in the above
syntax will cause the saveafterjob command to fail.
- The built-in tar command can also be used with shell variable names:
set run=abc20
cd $SCR
saveafterjob "tar cf $run.tar $run.*"
- Example of the use of --clear:
cd $SCR
saveafterjob "mput *.dat"
saveafterjob "mkdir X, cd X, mput *.chk"
queues two requests. If the job later executes
saveafterjob --clear "mput *.tar"
the first two requests are removed and replaced by the new one.
This is useful for jobs that create intermediate files that must be saved if the job terminates prematurely. If the job completes normally and creates its output files, only the output files need to be saved and the intermediate files are no longer necessary. In this case, issue the saveafterjob --clear command after the execution line.
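The tar examples above are all issued from inside $SCR with relative file patterns, so the archive stores bare file names. The following runnable sketch compares the member names produced by a relative pattern and an absolute pattern. It uses only standard tar, with a temporary directory standing in for $SCR; that stand-in is an assumption for illustration, and nothing is sent to mass storage.

```bash
#!/usr/bin/env bash
# Illustration using only standard tar (no mass storage involved):
# tar stores member names exactly as they appear on its command line.
# "demo" is a hypothetical stand-in for $SCR.
demo=$(mktemp -d)
touch "$demo/a.dat" "$demo/b.dat"
cd "$demo" || exit 1

# Relative pattern, issued from inside the directory (as in the examples).
tar cf relative.tar *.dat

# Absolute pattern: the directory path becomes part of every member name.
# GNU tar removes only the leading "/" and warns about it on stderr.
tar cf absolute.tar "$demo"/*.dat 2>/dev/null

tar tf relative.tar    # prints a.dat and b.dat
tar tf absolute.tar    # each name includes the demo directory path
```

Extracting absolute.tar would recreate the entire directory chain below the current directory instead of placing a.dat and b.dat there directly. This is the extraction problem described in the note on using $SCR in put commands.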
- File transfer is guaranteed only for files in $SCR; transfers from other file systems are not reliable.
- If your batch jobs are chained (one job submits another job before ending) and the first job uses saveafterjob to save files, do not rely on those files reaching UniTree in time to be available to the second job.
- If you currently use mssftp in your batch scripts to save files, switch to saveafterjob; only saveafterjob guarantees a safe transfer of the files.
To save files, replace:
mssftp << EOF
.
.
.
EOF
with
saveafterjob -f - << EOF
.
.
.
EOF
This is the shell here document syntax.
Note: You should use msscmd to get files in your batch
jobs rather than mssftp.
- Do not use $SCR in the put commands. For example:
saveafterjob -f - << EOF
put "|tar cf - $SCR/*.dat" dat.tar
EOF
Here $SCR is expanded, so the full scratch path becomes part of the file names stored in dat.tar, which affects later extraction of the files.
The better syntax is:
cd $SCR
saveafterjob -f - << EOF
put "|tar cf - *.dat" dat.tar
EOF
- Each instance of saveafterjob starts from your home directory on mass storage.
For example:
saveafterjob "cd run12, put run12.cpt"
saveafterjob "cd data, put res.dat"
saves run12.cpt into the directory user/run12 and res.dat into the directory user/data on UniTree.
Note: It does not save res.dat into user/run12/data.
- If you cd to a UniTree directory that does not exist, the file is saved in the last valid directory reached by that saveafterjob command. If that cd is the only one in the command, the file is saved in your UniTree home directory.
For example:
saveafterjob "cd data, put res.dat"
If the subdirectory data does not exist, res.dat is saved in the home directory.
Or:
saveafterjob "cd run12, cd data, put res.dat"
If the subdirectory run12 exists but run12/data does not, res.dat is saved in run12.
- saveafterjob does not work with links (hard links or soft links).
- saveafterjob does not work with multiple wildcards in a single pattern. For example, you cannot use wildcards for both the directories and the files in a single command:
saveafterjob "tar ./*/data/*.dat"
will not work.
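The notes on cd behavior can be captured in a small model. Each request starts from the mass-storage home directory, and a cd to a missing directory leaves you in the last valid directory. The sketch below is hypothetical: resolve_put_dir is an invented bash function that contacts no server, and mkdir is not modeled. Its first argument is the list of directories assumed to exist on UniTree (relative to home). The remaining arguments are the cd targets of one saveafterjob command. It prints the directory where put would store the file.

```bash
#!/usr/bin/env bash
# Hypothetical model (not part of saveafterjob) of the documented cd rules:
# every request starts in the mass-storage home directory, and a cd to a
# directory that does not exist is skipped, so the file lands in the last
# directory that was reached.
# Usage: resolve_put_dir "EXISTING_DIRS" DIR...   (paths relative to home)
resolve_put_dir() {
  local existing=" $1 " cwd="" target candidate
  shift
  for target in "$@"; do
    candidate=${cwd:+$cwd/}$target       # path the cd would lead to
    case "$existing" in
      *" $candidate "*) cwd=$candidate ;; # directory exists: move into it
      *) ;;                               # missing: stay where we are
    esac
  done
  echo "${cwd:-home}"                     # "home" = mass-storage home directory
}
```

For example, resolve_put_dir "run12" run12 data prints run12, and resolve_put_dir "" data prints home, matching the two cd examples above. Calling the function once per request, each time starting from home, mirrors the rule that a separate saveafterjob "cd data, put res.dat" request stores res.dat in user/data rather than user/run12/data.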