File Systems and Storage
- Directories
- Home Directories
- Scratch Directories
- User Project Directories
- TeraGrid Environment Variables
- Permanent Storage
- Transferring Files
1.1 Home Directories
The /home file system contains all user home directories and is
NFS-mounted on all cluster nodes (login, gridftp, and compute).
Home directories are backed up on a regular basis. The quota on /home
is 10 GBytes per user. Use the quota command to see your disk usage
and limits.
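As a complement to the quota command, the following minimal sketch estimates how much of the 10 GByte limit a directory tree occupies. The helper name home_usage_report is hypothetical, and the sketch assumes 1 GByte = 1024^3 bytes. Because du counts blocks on disk, its figure may differ from what the quota system reports, so treat quota as the authoritative source.

```shell
#!/bin/sh
# Sketch only: estimate usage of a directory against the 10 GByte /home quota.
# home_usage_report is a hypothetical helper, not a site-provided command.
QUOTA_KB=$((10 * 1024 * 1024))   # 10 GBytes in KBytes (assumes 1 GByte = 1024^3 bytes)

home_usage_report() {
    dir="${1:-$HOME}"
    # du -sk prints the total size in 1 KByte units, followed by the path.
    used_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    # No output means du could not read the directory at all.
    [ -n "$used_kb" ] || return 2
    pct=$((used_kb * 100 / QUOTA_KB))
    echo "$dir: ${used_kb} KB used, about ${pct}% of quota"
    # Return non-zero when the estimate exceeds the quota.
    [ "$used_kb" -le "$QUOTA_KB" ]
}
```

Calling home_usage_report with no argument reports on $HOME; passing a path such as "$TG_CLUSTER_HOME" reports on that directory instead.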
1.2 Scratch Directories
Scratch file systems are intended for short-term use and should be
considered volatile. Please note that backups are not performed on
the scratch directories. In the event of a disk crash or file purge,
files on the scratch directories cannot be recovered. Therefore, you
should back up your files to permanent storage whenever significant
changes are made (at least daily).
With the exception of the node-local scratch file system, all scratch
file systems have a subdirectory for each user.
- GPFS (General Parallel File System)
The NCSA cluster has a GPFS scratch file system, /gpfs_scratch1, that
uses the Network Shared Disk (NSD) server configuration model. It
provides 55 TB of scratch space and is accessible from all nodes in
the cluster (login, gridftp, and compute). It is the recommended
scratch file system.
Starting May 19, 2005, files in the GPFS NSD file system
that have not been modified in 5 days will be purged.
Please do not attempt to circumvent this removal scheme (e.g.,
with touch). Such attempts may result in the loss of
access to the scratch file systems.
Starting November 6, 2007, there will be a 5 TB quota per user in
GPFS scratch.
- NFS Scratch Directories
(September 2007) The NFS scratch directories have been removed from
service.
- Node-Local Scratch Directories
The scratch directory local to each machine is /scr.
Each scratch directory has about 50 GB of space available.
Files in the local scratch space are not available to any other node,
so processes of your job that run on other nodes cannot access them
directly. Only processes running on the two CPUs that make up a node
have direct access to files in that node's local scratch space.
All files are automatically deleted after your batch job completes and
the nodes are deallocated. All files you want to save must be copied
from local scratch as part of your job. You will not be able to access
files in local scratch after your job has completed.
- TeraGrid Scratch Directory (GPFS-WAN)
TeraGrid GPFS-WAN (General Parallel File System-Wide Area Network)
is available. See the GPFS-WAN section of the TeraGrid documentation
for details.
GPFS-WAN is not mounted by default on the mercury compute nodes.
It is available from the login nodes.
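Because files in node-local scratch are deleted when the batch job completes, a job has to copy out anything worth keeping before it exits. The sketch below shows one way to do that. The helper name run_in_local_scratch, the work.<pid> and results.<pid> directory names, and the fallback paths are illustrative assumptions; only /scr, $TG_NODE_SCRATCH, and $TG_CLUSTER_SCRATCH come from this guide.

```shell
#!/bin/sh
# Sketch only: run a command inside node-local scratch, then copy everything
# it produced to shared scratch before the job ends and the files are deleted.
# run_in_local_scratch is a hypothetical helper; fallback paths are assumptions.
run_in_local_scratch() {
    work="${TG_NODE_SCRATCH:-/scr}/work.$$"
    dest="${TG_CLUSTER_SCRATCH:-$HOME}/results.$$"
    mkdir -p "$work" "$dest" || return 1

    # Run the job command with node-local scratch as its working directory;
    # its stdout and stderr are captured in job.log next to the results.
    ( cd "$work" && "$@" > job.log 2>&1 )
    status=$?

    # Copy results out even if the command failed, so the log is preserved.
    cp -R "$work"/. "$dest"/ || return 1
    rm -rf "$work"

    # Print where the results were saved and pass on the command's exit status.
    echo "$dest"
    return "$status"
}
```

Inside a batch script this might be called as run_in_local_scratch ./a.out input.dat, where a.out and input.dat stand in for your own program and arguments.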
1.3 User Project Directories
The Community Software Area (CSA) of TeraGrid is space that is
primarily intended for the installation of executables and libraries
that will be used by a community of users. This directory,
/usr/projects, is accessible from all nodes on the cluster and is
backed up. For more details and for the request form, see the
Community Software Area web page.
1.4 TeraGrid Environment Variables
Environment variables have been set up for the TeraGrid to help
make scripts work no matter which cluster they are run on. Please use
these variables instead of hardcoding paths.
| File System | Variable |
| Home Directory | $TG_CLUSTER_HOME |
| Community Software Area | $TG_COMMUNITY |
| Default Parallel File System | $TG_CLUSTER_PFS |
| GPFS Scratch Directory | $TG_CLUSTER_SCRATCH |
| Node-Local Scratch Directory | $TG_NODE_SCRATCH |
Note: On the NCSA TeraGrid cluster, these variables include the user
subdirectory for those file systems that have user subdirectories.
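A script that relies on these variables can fail in confusing ways if one of them is unset, for example when it is run outside the TeraGrid environment. The following minimal sketch checks the variables up front. The helper name require_tg_vars is hypothetical; the variable names are the ones listed in the table above.

```shell
#!/bin/sh
# Sketch only: verify that the TeraGrid path variables a script depends on are set.
# require_tg_vars is a hypothetical helper name.
require_tg_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: \$$name is not set" >&2
            missing=1
        fi
    done
    # 0 when every variable is set, 1 when at least one is missing.
    return "$missing"
}

# Typical use at the top of a job script (left commented out in this sketch):
#   require_tg_vars TG_CLUSTER_HOME TG_CLUSTER_SCRATCH TG_NODE_SCRATCH || exit 1
#   cd "$TG_CLUSTER_SCRATCH"
```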
1.5 Permanent Storage
NCSA's mass storage system, UniTree, is a high-speed, high-capacity
data storage system available for permanent storage. The host name to
use is mss.ncsa.teragrid.org. (mss.ncsa.uiuc.edu will also work, but
it will be a lot slower.) The following file transfer methods are
supported from the NCSA TeraGrid system to UniTree: globus-url-copy,
which requires a proxy certificate, and uberftp, which supports proxy
or passwordless access ("-a MSS" authentication).
Notes:
- gsiscp is not supported.
- For users also on NCSA HPC systems, note that mssftp and msscmd
are now available [2/14/2005].
1.6 Transferring Files
For general information on transferring files, see the TeraGrid
Data Transfer Overview. This section contains information
specific to the NCSA cluster.
GridFTP Servers
NCSA is running GridFTP servers on 4 dedicated machines for mercury.
gridftp-hg.ncsa.teragrid.org round-robins between those machines, so
we recommend using that name to help load balance usage. Since these
machines are dedicated to running GridFTP servers, you cannot directly
log into these machines. These machines have access to the following
file systems: $TG_CLUSTER_HOME, $TG_CLUSTER_SCRATCH, and
$TG_CLUSTER_PFS.
You should use these machines to transfer files to the NCSA cluster as
they will be faster than transferring files to the login node. This is
true even for transferring files from mass storage to NCSA's TeraGrid
cluster. You do not need to use these servers within a batch job since
the node is dedicated to your job.
Note: When transferring files through the server to the home
directory or NFS scratch directory, there will be a short delay
before the file appears in the directory on the cluster. See the
uberftp example of transferring a file from mass storage using the
GridFTP servers.
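To make it easier to use the round-robin name consistently, a script can build its GridFTP URLs in one place. The sketch below only formats a string and contacts no server. The helper name gridftp_url is hypothetical, and it assumes that a path placed directly after the host name is treated as an absolute path on the server.

```shell
#!/bin/sh
# Sketch only: build a gsiftp:// URL that points at the round-robin GridFTP name.
# gridftp_url is a hypothetical helper; it formats a string and nothing more.
GRIDFTP_HOST="gridftp-hg.ncsa.teragrid.org"

gridftp_url() {
    case "$1" in
        # Absolute paths already start with "/", so append them to the host as-is.
        /*) echo "gsiftp://${GRIDFTP_HOST}$1" ;;
        *)  echo "error: need an absolute path, got '$1'" >&2
            return 1 ;;
    esac
}

# Possible use with globus-url-copy (left commented out in this sketch):
#   globus-url-copy gsiftp://mss.ncsa.teragrid.org/~/prog.c \
#       "$(gridftp_url "$TG_CLUSTER_SCRATCH/prog.c")"
```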
File Transfer Commands
% scp <source> <destination>
For example, to copy everything in your local directory to NCSA's
TeraGrid cluster into the directory ~/run1:
% scp * tg-login.ncsa.teragrid.org:run1
Note: If your logins are different on the two machines you must specify
your remote login with the destination:
% scp * jdoe@tg-login.ncsa.teragrid.org:run1
Note: If you are copying files between TeraGrid clusters, as with ssh,
you must specify -obatchmode=no:
% scp -obatchmode=no * jdoe@tg-login.ncsa.teragrid.org:run1
scp with proxy certificate
If the remote machine is running an ssh server, you can use
scp to copy files to it. If you use GSI-enabled scp (gsiscp)
and have a proxy, it will use your proxy as authentication.
Otherwise, you will have to type your password. Just like
gsissh, if the server is running on a different port, you
will need to specify it on the command line. The syntax
of gsiscp is the same as with scp:
% gsiscp -o port=<port_num> <source> <destination>
For example, to copy everything in the local directory to
SDSC's TeraGrid cluster into the directory ~/run1:
% gsiscp * tg-login1.sdsc.teragrid.org:run1
To copy everything in the directory ~/run1 on SDSC's TeraGrid cluster
into the current directory:
% gsiscp tg-login1.sdsc.teragrid.org:run1/\* .
globus-url-copy
globus-url-copy is the command to transfer a file between
sites using GridFTP. It is not an interactive command. The usage is:
% globus-url-copy <source> <destination>
where <source> and <destination> each take one of these forms:
- if local file, file:<full path>
- if remote file, gsiftp://<hostname>/<full path>
Example copying a local file to SDSC's TeraGrid cluster:
% globus-url-copy file:`pwd`/prog.c \
gsiftp://tg-login1.sdsc.teragrid.org/~/run1/prog.c
Example getting a file from NCSA's mass storage system:
% globus-url-copy -tcp-bs 32768 gsiftp://mss.ncsa.teragrid.org/~/prog.c \
file:`pwd`/prog.c
Example of a third-party transfer, where the command is issued on a
third machine and transfers a file between two other machines:
% globus-url-copy gsiftp://tg-login1.sdsc.teragrid.org/~/prog.c \
gsiftp://tg-login1.caltech.teragrid.org/~/prog.c
Note: globus-url-copy does not work with wildcards and is not
recursive. The easiest way to work around this is to tar up everything
you want to copy into a single file, because tar does work with
wildcards and is recursive. Then use globus-url-copy to copy the
single tar file to the destination.
Example backing up a directory (~/run1) to NCSA's mass storage system:
% tar -czvf run1.tgz run1
% globus-url-copy -tcp-bs 4194304 file:`pwd`/run1.tgz \
gsiftp://mss.ncsa.teragrid.org/~/run1.tgz
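Since a backup is only useful if the archive can be read back, it may be worth verifying the tar file before it is transferred. The following minimal sketch bundles a directory and then lists the archive as a basic integrity check. The helper name pack_dir is hypothetical, and the sketch assumes a tar that supports the -z (gzip) option.

```shell
#!/bin/sh
# Sketch only: bundle a directory into one gzip-compressed tar file and check
# that it is readable before handing the single file to globus-url-copy.
# pack_dir is a hypothetical helper; it assumes tar supports -z (gzip).
pack_dir() {
    dir="$1"
    archive="${2:-$(basename "$dir").tgz}"
    # -C keeps paths inside the archive relative to the directory's parent.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing the archive fails if it is truncated or corrupt.
    tar -tzf "$archive" > /dev/null || return 1
    # Print the archive name so the caller can pass it to the transfer command.
    echo "$archive"
}
```

For example, pack_dir ~/run1 would write run1.tgz in the current directory, which can then be copied with globus-url-copy as a single file.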
uberftp
UberFTP is a GridFTP-enabled client that supports both interactive
use and FTP commands on the command line. It usually requires a proxy.
If you are using it to transfer files between NCSA's cluster and the
UniTree mass storage system, you can use passwordless authentication
(add the uberftp flag "-a MSS"). See the man page for details on
usage of UberFTP.
Examples:
- Passwordless authentication (from the NCSA cluster to UniTree only):
- Using proxy authentication:
- Get files from NCSA's mass storage:
% uberftp mss.ncsa.teragrid.org "passive; quote wait; cd run1; \
get a.out; get run1.input"
Note: "quote wait" is required when retrieving files from mass storage.
This tells uberftp to wait until the file is staged to UniTree disk and
retrieved before quitting. Otherwise it quits immediately and the file
isn't retrieved.
- Put all files with suffix .tar to NCSA's mass storage in
subdirectory run1:
% uberftp mss.ncsa.teragrid.org "passive; cd run1; mput *.tar"
- Transferring a file from mass storage
(test1/tg/a.out) to GPFS NSD scratch (/gpfs_scratch1/jdoe/3rd/a.out)
using the GridFTP servers:
% uberftp mss.ncsa.teragrid.org \
"passive; lopen gridftp-hg.ncsa.teragrid.org; lcd \
/gpfs_scratch1/jdoe/3rd; cd test1/tg; get a.out"