Data Transfer

Table of Contents

  1. Data Transfer Overview
  2. Transfers on TeraGrid
  3. Transfers to and from NCSA's Mass Storage System
  4. Offsite Transfers

Data Transfer Overview

Some common data-movement tasks include:

  • Moving data from a production run off the scratch file system.
    • Transfer to the NCSA Mass Storage System (MSS) (see below).
    • Transfer from an NCSA cluster to an offsite location (see below).
    • Transfer between NCSA clusters.
  • SCP
    • Advantages:
      • Recursive feature allows simple reproduction of entire directory hierarchies of files
      • Data is transmitted over a secure channel.
      • Convenient way to transfer source code or other relatively small files to/from your /home directory.
    • Disadvantages:
      • Each file is transmitted separately, and the per-file overhead becomes significant when network latency is high.
      • Performance is poor over wide area links due to small TCP window sizes
      • File transfers larger than 2GB are not supported.
    • Recommendations:
      • Use to transfer small files and/or directories containing source code or other relatively small file sets.
      • Use tar to bundle directories containing large numbers of files before sending them over high-latency networks.
  • FTP
    • Advantages: Long-established Internet protocol and therefore widely available and easy to implement.
    • Disadvantages: Data is transmitted over an open channel.
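
The window-size limitation noted for SCP can be made concrete with a little arithmetic. A single TCP stream cannot send more than one window of data per round trip, so its throughput is bounded by the window size divided by the round-trip time. The sketch below computes that ceiling. The helper name tcp_ceiling_mbps is hypothetical, and the window sizes and the 60 ms round-trip time are illustrative assumptions, not measurements of any NCSA link.

```shell
#!/bin/sh
# Upper bound on single-stream TCP throughput: window / round-trip time.
# Arguments: window size in bytes, round-trip time in milliseconds.
# Prints the ceiling in Mbit/s. The sample values below are assumptions.
tcp_ceiling_mbps() {
    awk -v win="$1" -v rtt_ms="$2" \
        'BEGIN { printf "%.1f\n", (win * 8) / (rtt_ms / 1000) / 1000000 }'
}

tcp_ceiling_mbps 65536 60      # 64 KB window, 60 ms round trip
tcp_ceiling_mbps 4194304 60    # 4 MB window, same round trip
```

With these inputs the two calls print 8.7 and 559.2, which suggests why a small window can leave most of a fast wide area link unused no matter how much bandwidth is available.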

Transfers on TeraGrid

Transfers between TeraGrid sites have the full complement of TeraGrid tools available, including Globus GSI authentication and dedicated GridFTP servers at each site. The sites are connected over a high-bandwidth Wide Area Network (WAN). Within this framework, transfers between computing centers are best carried out by using the combined network bandwidth of several machines at each endpoint of a transfer. For more information about data transfer on TeraGrid, see the Data: Transfer Overview page.

Transfers to and from NCSA's Mass Storage System

Connectivity of each cluster into MSS varies. In general, multiple transfer streams will achieve the best aggregate transfer rates. The following utilities are installed on all production clusters at NCSA and can be used to transfer data to MSS.

  • uberftp
    • Command line or interactive FTP interface.
    • Parallel streams can be enabled.
    • GSI (grid-proxy) authentication available for TeraGrid users.
    • Supports third-party transfers and the GridFTP protocol.
  • mssftp/msscmd
    • mssftp allows a passwordless interactive FTP session to be initiated from any NCSA production machine.
    • msscmd is a command line interface to send FTP commands to MSS.
  • globus-url-copy
    • Command line GridFTP client.
    • Newer versions allow striped transfers across multiple servers.
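
As an illustration of a multi-stream transfer, the sketch below builds a globus-url-copy command that sends one local file to a GridFTP server over several parallel streams. It assumes a valid GSI proxy already exists. The hostname gridftp.example.org, the file paths, the stream count of 4, and the helper name mss_put are placeholders chosen for this example, not actual NCSA endpoints or settings. By default the function only prints the command so that it can be reviewed; setting RUN=1 executes it.

```shell
#!/bin/sh
# Sketch: copy one local file to a GridFTP server using parallel streams.
# MSS_HOST and both paths are placeholders (assumptions), not real endpoints.
MSS_HOST=${MSS_HOST:-gridftp.example.org}
STREAMS=${STREAMS:-4}

mss_put() {
    src=$1      # absolute local path
    dest=$2     # absolute remote path
    # -vb reports bytes transferred and the transfer rate,
    # -p sets the number of parallel data streams.
    set -- globus-url-copy -vb -p "$STREAMS" \
        "file://$src" "gsiftp://$MSS_HOST$dest"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"   # dry run: show the command without executing it
    fi
}

mss_put /scratch/user/run42.tar /home/user/run42.tar
```

The best stream count depends on the path between the two machines, so it is worth timing a few values rather than assuming that more streams are always faster.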

Offsite Transfers: From NCSA Facilities to Remote Systems

NCSA has eliminated clear text passwords. All outside connections must be made through SSH or Kerberos-enabled Telnet.

Enabling Passwordless Login via Kerberos

  • Verify that your system uses Kerberos-enabled SSH and FTP, and has Kerberos installed. Most newer Linux distributions come with Kerberos and Kerberos-enabled SSH, including Fedora Core 3 or newer.
  • Obtain the NCSA Kerberos install package. For systems without Kerberos, install as directed. For Linux systems with Kerberos already installed, simply replace /etc/krb5.conf with the NCSA version.
  • To enable passwordless SSH logins to NCSA resources, edit /etc/ssh/ssh_config or create a ~/.ssh/config file that contains the following lines:
            Host *
                    GSSAPIAuthentication yes
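
For the per-user case, the edit can be scripted so that it is safe to run more than once. The following is a minimal sketch, assuming a POSIX shell. The function name enable_gssapi is hypothetical, and the check only looks for an existing "GSSAPIAuthentication yes" line anywhere in the file, without distinguishing between Host blocks.

```shell
#!/bin/sh
# Append the GSSAPI stanza to an SSH client config file unless the file
# already enables GSSAPIAuthentication. Defaults to the per-user config.
enable_gssapi() {
    cfg=${1:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$cfg")"
    touch "$cfg"
    # ssh_config keywords are case-insensitive, hence grep -i.
    if ! grep -Eiq '^[[:space:]]*GSSAPIAuthentication[[:space:]]+yes' "$cfg"; then
        printf 'Host *\n        GSSAPIAuthentication yes\n' >> "$cfg"
    fi
    # Keep the file private to its owner.
    chmod 600 "$cfg"
}
```

Calling enable_gssapi with no argument updates ~/.ssh/config; passing a path such as a scratch file lets the result be inspected first.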
        

Delegating Grid Credentials to a Remote Workstation

  • To use GSI grid authentication from a remote workstation or non-TeraGrid cluster, the Globus Toolkit (or at least a subset of it) must be installed.
  • Grid credentials can then be passed to the remote client machine by using an existing TeraGrid- or NCSA-accepted X.509 certificate as the initial proxy.
    • Valid proxies can be issued and stored on a TeraGrid or NCSA MyProxy server and then delegated to a remote system. Refer to NCSA's MyProxy Server page for instructions on configuring a local installation of MyProxy to connect to the NCSA server. Note: MyProxy is included in the Globus Toolkit.
    • Once a valid proxy certificate exists on a correctly configured host, GSI authentication tools will automatically connect to hosts for which the user has been granted access.
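
The delegation step can be sketched as a small script that retrieves a proxy from a MyProxy server only when the local one is missing or close to expiring. It relies on the myproxy-logon and grid-proxy-info clients from the Globus Toolkit. The hostname myproxy.example.org, the GRID_USER variable, the 12-hour lifetime, the one-hour renewal threshold, and both function names are assumptions made for illustration; use the values given in your site's MyProxy instructions.

```shell
#!/bin/sh
# MYPROXY_SERVER is a placeholder hostname (an assumption); substitute the
# server named in your site's MyProxy instructions.
MYPROXY_SERVER=${MYPROXY_SERVER:-myproxy.example.org}

# True (exit 0) when fewer than MIN seconds remain on the proxy.
# Arguments: seconds left, minimum acceptable seconds (default 3600).
needs_renewal() {
    left=${1:-0}
    min=${2:-3600}
    [ "$left" -lt "$min" ]
}

# Retrieve a fresh proxy only if the current one is missing or about to expire.
refresh_proxy() {
    # grid-proxy-info -timeleft prints the remaining lifetime in seconds.
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    if needs_renewal "${left:-0}"; then
        # -s names the MyProxy server, -l the account name stored there,
        # -t the requested proxy lifetime in hours.
        myproxy-logon -s "$MYPROXY_SERVER" -l "${GRID_USER:-$(id -un)}" -t 12
    fi
}
```

Running refresh_proxy before a batch of GridFTP transfers avoids a failure partway through caused by an expired credential.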

Offsite Transfer Examples

SSH

The following transfers were performed from a Linux workstation outside of the NCSA domain. A valid NCSA-issued Kerberos ticket was obtained by running kinit, thus enabling secure passwordless access to NCSA HPC resources.

Copy a local directory structure via streaming tar onto NCSA TeraGrid.
   $ tar -cf - tst/ | ssh user@tg-login4.ncsa.teragrid.org "tar xf -"
Copy a local directory into a tarball on the Tungsten cluster.
   $ tar -cf - tst/ | ssh user@tuna.ncsa.uiuc.edu "cat > tit.tar"
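
The streaming pattern can be rehearsed locally before involving the network. In the sketch below, a subshell running in another directory stands in for the remote tar started through ssh, and diff confirms that the hierarchy was reproduced. The temporary directories and the sample file are created only for this illustration.

```shell
#!/bin/sh
# Rehearse the streaming-tar pattern locally: a subshell in another
# directory stands in for the remote "tar xf -" that ssh would run.
src_parent=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src_parent/tst/sub"
echo "sample" > "$src_parent/tst/sub/data.txt"

# Left side writes the archive to stdout; right side unpacks it from stdin.
(cd "$src_parent" && tar -cf - tst/) | (cd "$dest" && tar -xf -)

# diff -r exits 0 only if both directory trees are identical.
diff -r "$src_parent/tst" "$dest/tst" && echo "trees match"
```

Reversing the pipeline, with tar -cf - running on the remote side of ssh and tar -xf - running locally, retrieves a directory from the remote system in the same way.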

FTP

With a valid NCSA Kerberos ticket, users can enjoy passwordless access to the NCSA Mass Storage System from a remote workstation.

Performance

Check with the network administrator of your local site for connectivity details and possible firewall and/or network bottlenecks that can lead to unexpected or inconsistent network bandwidth or functionality. Transfers can only take place as fast as the slowest component in the network chain.
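
To illustrate the last point, the sketch below estimates a transfer time from the slowest link on a path. The function name transfer_seconds, the 10 GB file size, and the three link rates are hypothetical values chosen for the example; real transfers also lose time to protocol overhead and disk speed, so the result is a lower bound.

```shell
#!/bin/sh
# Estimate transfer time from the slowest link on the path.
# Usage: transfer_seconds SIZE_BYTES RATE_MBPS [RATE_MBPS ...]
transfer_seconds() {
    size=$1
    shift
    # One link rate per line; awk keeps the minimum and divides.
    printf '%s\n' "$@" | awk -v size="$size" '
        NR == 1 || $1 < min { min = $1 }
        END { printf "%.0f\n", (size * 8) / (min * 1000000) }'
}

# 10 GB file over a hypothetical path of 1000, 100, and 622 Mbit/s links.
transfer_seconds 10000000000 1000 100 622
```

The call prints 800: the 100 Mbit/s link sets the pace, and the faster links on either side of it do not shorten the transfer.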