NCSA TeraGrid User Guide

 

File Systems and Storage

  1. Directories
    1. Home Directories
    2. Scratch Directories
    3. User Project Directories
    4. TeraGrid Environment Variables
  2. Permanent Storage
  3. Transferring Files

1. Directories

1.1 Home Directories

The /home file system contains all user home directories and is NFS-mounted on all cluster nodes (login, gridftp, compute). Home directories are backed up on a regular basis. The quota on /home is 10 GBytes per user. Use the quota command to see your disk usage and limits.

1.2 Scratch Directories

Scratch file systems are intended for short-term use and should be considered volatile. Backups are not performed on the scratch directories, so files lost in a disk crash or file purge cannot be recovered. You should therefore back up your files to permanent storage whenever significant changes are made (at least daily). With the exception of the node-local scratch file system, all scratch file systems have a subdirectory for each user.

  • GPFS (General Parallel File System)

    The NCSA cluster has a GPFS scratch file system, /gpfs_scratch1, that uses the Network Shared Disk (NSD) server configuration model. It provides 55 TB of scratch space, is accessible from all nodes in the cluster (login, gridftp, and compute), and is the recommended scratch file system.

    Starting May 19, 2005, files in the GPFS NSD file system that have not been modified in 5 days will be purged. Please do not attempt to circumvent this removal scheme (e.g., with touch). Such attempts may result in the loss of access to the scratch file systems.

    Starting November 6, 2007, there will be a 5 TB quota per user in GPFS scratch.

  • NFS Scratch Directories

    (September 2007) The NFS scratch directories have been removed from service.

  • Node-Local Scratch Directories

    The scratch directory local to each machine is /scr. Each scratch directory has about 50 GB of space available.

    Files in the local scratch space are not available to any other node, so processes of your job running on other nodes cannot access them directly. Only processes running on the two CPUs that make up a node have direct access to files in that node's local scratch space.

    All files are automatically deleted after your batch job completes and the nodes are deallocated. All files you want to save must be copied from local scratch as part of your job. You will not be able to access files in local scratch after your job has completed.

  • TeraGrid Scratch Directory (GPFS-WAN)

    TeraGrid GPFS-WAN (General Parallel File System-Wide Area Network) is available from the login nodes. It is not mounted by default on the mercury compute nodes. See the GPFS-WAN section of the TeraGrid documentation for details.
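Because GPFS scratch files that have not been modified in 5 days are purged, it can help to list the files that are approaching that age and copy them to permanent storage first. The following is a minimal sketch, assuming a POSIX find; the function name stale_files and the default threshold are illustrative and are not part of the NCSA tooling.

```shell
#!/bin/sh
# List regular files under a directory that were last modified more than
# N whole days ago. With the default N=3 this reports files that are at
# least 4 days old, i.e. within about a day of the 5-day purge.
# stale_files is a hypothetical helper name, not an NCSA-provided command.
stale_files() {
    dir=$1
    days=${2:-3}
    # find rounds file age down to whole days, so "-mtime +3" matches
    # files whose age is 4 days or more.
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, `stale_files "$TG_CLUSTER_SCRATCH"` would list candidates in your GPFS scratch directory. The sketch only reports files; it does not modify them.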

1.3 User Project Directories

The Community Software Area (CSA) of TeraGrid is space intended primarily for installing executables and libraries that will be used by a community of users. This directory, /usr/projects, is accessible from all nodes on the cluster and is backed up. For more details and for the request form, see the Community Software Area web page.

1.4 TeraGrid Environment Variables

Environment variables have been set up for the TeraGrid to help make scripts work no matter which cluster they are run on. Please use these variables instead of hardcoding paths.

File System                     Variable
Home Directory                  $TG_CLUSTER_HOME
Community Software Area         $TG_COMMUNITY
Default Parallel File System    $TG_CLUSTER_PFS
GPFS Scratch Directory          $TG_CLUSTER_SCRATCH
Node-Local Scratch Directory    $TG_NODE_SCRATCH

Note: On the NCSA TeraGrid cluster, these variables include the user subdirectory for those file systems that have user subdirectories.
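As an illustration of using these variables instead of hardcoded paths, the sketch below runs a command in node-local scratch and copies the results back to GPFS scratch before the job ends, since local scratch is deleted when the nodes are deallocated. The helper name run_in_node_scratch, the job.<pid> working directory, and the result subdirectory argument are assumptions made for this example; only the two $TG_ variables come from the table above.

```shell
#!/bin/sh
# Run a command inside node-local scratch, then copy everything it produced
# back to GPFS scratch. run_in_node_scratch is a hypothetical helper.
# usage: run_in_node_scratch <input-file> <result-subdir> <command> [args...]
run_in_node_scratch() {
    input=$1
    outdir=$2
    shift 2
    # Stop with an error if either variable is unset rather than guess a path.
    : "${TG_NODE_SCRATCH:?is not set}" "${TG_CLUSTER_SCRATCH:?is not set}"
    work=$TG_NODE_SCRATCH/job.$$
    dest=$TG_CLUSTER_SCRATCH/$outdir
    mkdir -p "$work" "$dest" || return 1
    cp "$input" "$work/" || return 1
    # Run the given command with local scratch as its working directory.
    ( cd "$work" && "$@" )
    status=$?
    # Copy the whole working directory (staged input included) to GPFS scratch.
    cp -R "$work"/. "$dest"/
    rm -rf "$work"
    return "$status"
}
```

Inside a batch script this might be called as `run_in_node_scratch run1.input run1 ./a.out`, which would leave the output under $TG_CLUSTER_SCRATCH/run1. Remember that GPFS scratch is itself purged, so results that must be kept still need to go to permanent storage.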

2. Permanent Storage

NCSA's mass storage system, UniTree, is a high-speed, high-capacity data storage system available for permanent storage. The host name to use is mss.ncsa.teragrid.org. (mss.ncsa.uiuc.edu will also work, but it will be much slower.) The following file transfer methods are supported from the NCSA TeraGrid system to UniTree: globus-url-copy, which requires a proxy certificate, and uberftp, which supports either a proxy or passwordless access ("-a MSS" authentication).

Notes:

  • gsiscp is not supported.
  • For users also on NCSA HPC systems, note that mssftp and msscmd are now available [2/14/2005].

3. Transferring Files

For general information on transferring files, see the TeraGrid Data Transfer Overview. This section contains information specific to the NCSA cluster.

GridFTP Servers

NCSA runs GridFTP servers on 4 dedicated machines for mercury. The name gridftp-hg.ncsa.teragrid.org round-robins between those machines, so we recommend using that name to help balance the load. Because these machines are dedicated to running GridFTP servers, you cannot log into them directly. They have access to the following file systems: $TG_CLUSTER_HOME, $TG_CLUSTER_SCRATCH, and $TG_CLUSTER_PFS.

Use these machines to transfer files to the NCSA cluster, as they will be faster than transferring files to the login node. This is true even when transferring files from mass storage to NCSA's TeraGrid cluster. You do not need to use these servers within a batch job, since the node is dedicated to your job.

Note: When transferring files through the server to the home directory or NFS scratch directory, there will be a short delay before the file appears in the directory on the cluster. For an example of transferring a file from mass storage through the GridFTP servers, see the uberftp examples under File Transfer Commands.
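To make the recommendation concrete, the sketch below builds a globus-url-copy command that pulls a file from mass storage onto the cluster through the round-robin name rather than the login node. It only prints the command. Whether a third-party transfer between these two hosts is the best route for a given file is an assumption here, and mss_to_cluster_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Print (do not run) a globus-url-copy command that moves a file from mass
# storage to the cluster through the round-robin GridFTP name.
# mss_to_cluster_cmd is a hypothetical helper; the host names are the ones
# given in this guide.
GRIDFTP_HOST=gridftp-hg.ncsa.teragrid.org
MSS_HOST=mss.ncsa.teragrid.org

mss_to_cluster_cmd() {
    src=$1     # path relative to your mass storage home directory
    dest=$2    # absolute path on the cluster, e.g. under $TG_CLUSTER_SCRATCH
    case $dest in
        /*) ;;
        *) echo "destination must be an absolute path" >&2; return 1 ;;
    esac
    printf 'globus-url-copy gsiftp://%s/~/%s gsiftp://%s%s\n' \
        "$MSS_HOST" "$src" "$GRIDFTP_HOST" "$dest"
}
```

Calling `mss_to_cluster_cmd run1/prog.c "$TG_CLUSTER_SCRATCH/prog.c"` prints a command that you can inspect before running it with a valid proxy certificate.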

File Transfer Commands

  • scp

    You can use normal scp to copy files to and from the TeraGrid cluster:

      % scp <source> <destination>

    For example, to copy everything in your local directory into the directory ~/run1 on NCSA's TeraGrid cluster:

      % scp * tg-login.ncsa.teragrid.org:run1

    Note: If your logins are different on the two machines, you must specify your remote login with the destination:

      % scp * jdoe@tg-login.ncsa.teragrid.org:run1

    Note: If you are copying files between TeraGrid clusters, as with ssh, you must specify -obatchmode=no:

      % scp -obatchmode=no * jdoe@tg-login.ncsa.teragrid.org:run1
  • scp with proxy certificate

    If the remote machine is running an ssh server, you can use scp to copy files to it. If you use GSI-enabled scp (gsiscp) and have a proxy, it will use your proxy for authentication. Otherwise, you will have to type your password. As with gsissh, if the server is running on a different port, you will need to specify it on the command line. The syntax of gsiscp is the same as that of scp:

       % gsiscp -o port=<port_num> <source> <destination>

    For example, to copy everything in the local directory into the directory ~/run1 on SDSC's TeraGrid cluster:

       % gsiscp * tg-login1.sdsc.teragrid.org:run1

    To copy everything in the directory ~/run1 on SDSC's TeraGrid cluster into the current directory:

       % gsiscp tg-login1.sdsc.teragrid.org:run1/\* .
  • globus-url-copy

    globus-url-copy is the command for transferring a file between sites using GridFTP. It is not an interactive command. The usage is:

    % globus-url-copy <source> <destination>

    where <source> and <destination> each take one of the following forms:

    • if local file, file:<full path>
    • if remote file, gsiftp://<hostname>/<full path>

    Example copying a local file to SDSC's TeraGrid cluster:

    % globus-url-copy file:`pwd`/prog.c \
    gsiftp://tg-login1.sdsc.teragrid.org/~/run1/prog.c

    Example getting a file from NCSA's mass storage system:

    % globus-url-copy -tcp-bs 32768 gsiftp://mss.ncsa.teragrid.org/~/prog.c \
    file:`pwd`/prog.c

    Example of a third-party transfer, in which the command is issued on a third machine to transfer a file between two other machines:

    % globus-url-copy gsiftp://tg-login1.sdsc.teragrid.org/~/prog.c \
    gsiftp://tg-login1.caltech.teragrid.org/~/prog.c

    Note: globus-url-copy does not work with wildcards and is not recursive. The easiest way to work around this is to tar everything you want to copy into a single file, since tar does work with wildcards and is recursive, and then use globus-url-copy to copy that single tar file to the destination.

    Example backing up a directory (~/run1) to NCSA's mass storage system:

    % tar -czvf run1.tgz run1
    % globus-url-copy -tcp-bs 4194304 file:`pwd`/run1.tgz \
    gsiftp://mss.ncsa.teragrid.org/~/run1.tgz
  • uberftp

    UberFTP is a GridFTP-enabled client that supports both interactive use and FTP commands given on the command line. It usually requires a proxy. If you are using it to transfer files between NCSA's cluster and the UniTree mass storage system, you can use passwordless authentication by adding the uberftp flag "-a MSS". See the man page for details on usage of UberFTP.

    Examples:

    • Passwordless authentication (from the NCSA cluster to UniTree only):

      • Interactively connect to NCSA's mass storage [while logged onto the NCSA cluster]:
        % uberftp -a MSS mss.ncsa.teragrid.org
        uberftp> passive
        Active
        uberftp>
              
      • Put a tar file named run1.data.tar to NCSA's mass storage in subdirectory run1:
        % uberftp -a MSS mss.ncsa.teragrid.org "passive; cd run1; put run1.data.tar"
    • Using proxy authentication:

      • Get files from NCSA's mass storage:
        % uberftp mss.ncsa.teragrid.org  "passive; quote wait; cd run1; \
        get a.out; get run1.input"

        Note: "quote wait" is required when retrieving files from mass storage. This tells uberftp to wait until the file is staged to UniTree disk and retrieved before quitting. Otherwise it quits immediately and the file isn't retrieved.

      • Put all files with suffix .tar to NCSA's mass storage in subdirectory run1:
        % uberftp mss.ncsa.teragrid.org "passive; cd run1; mput *.tar"
      • Transferring a file from mass storage (test1/tg/a.out) to GPFS NSD scratch (/gpfs_scratch1/jdoe/3rd/a.out) using the GridFTP servers:
        % uberftp mss.ncsa.teragrid.org \
        "passive; lopen gridftp-hg.ncsa.teragrid.org; \
        lcd /gpfs_scratch1/jdoe/3rd; cd test1/tg; get a.out"
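The tar workaround described for globus-url-copy above can be sketched as a small helper that bundles a directory into a single gzip-compressed tar file and then prints the matching transfer command. This is an illustration under stated assumptions: pack_for_mss is a hypothetical name, the archive is written next to the directory, and the command is printed rather than executed so you can review it first.

```shell
#!/bin/sh
# Bundle a directory into one gzip-compressed tar file so it can be sent
# with a single globus-url-copy call, then print that call.
# pack_for_mss is a hypothetical helper; nothing is transferred here.
pack_for_mss() {
    dir=${1%/}                         # directory to back up, trailing slash removed
    name=$(basename "$dir")
    parent=$(cd "$(dirname "$dir")" && pwd) || return 1
    archive=$parent/$name.tgz
    # -C stores paths relative to the parent, so the archive unpacks as name/...
    tar -czf "$archive" -C "$parent" "$name" || return 1
    # file: URLs need a full path, which is why parent was made absolute.
    printf 'globus-url-copy file:%s gsiftp://mss.ncsa.teragrid.org/~/%s.tgz\n' \
        "$archive" "$name"
}
```

Running `pack_for_mss ~/run1` would create ~/run1.tgz and print the globus-url-copy line to run once you have a proxy certificate.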
