NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.uiuc.edu
217-244-0710
help@ncsa.uiuc.edu

Scheduling Issues on the Origin2000 in IRIX 6.5

  1. Introduction
  2. Environment Variables to Control Scheduling in IRIX 6.5
  3. Gang Scheduled Jobs in IRIX 6.5

1. Introduction

The following is for shared memory parallel codes. These typically
  • use the automatic parallelizing (autotasking) option of the SGI compiler (you will have compiled your code with the -apo flag, or the -pfa/-pca flag in earlier versions of the compiler)
  • use compiler directives (OpenMP, MP, PCF, etc.) to parallelize the code (you will have compiled your code with the -mp flag for these directives to be recognized)
Other programming models are not affected.

In IRIX 6.4 and earlier, parallel threads were gang scheduled by default. In this type of scheduling, the threads of a job are treated as a gang, and the priority of the whole gang is boosted as soon as its first thread is scheduled, so all threads of the gang are running in parallel very quickly. In a multi-user shared environment, the IRIX 6.4 scheduler tended to favor gang-scheduled jobs over non-gang-scheduled jobs with the same number of processes.

Starting with IRIX 6.5, the scheduler was changed to level the playing field, so that gang-scheduled jobs no longer get priority over non-gang-scheduled jobs. Because a gang-scheduled job cannot run until enough processors are available to schedule all members of the gang at once, its turnaround time can be slow.

To fix this problem, SGI changed the default scheduling of shared memory applications to dynamic adjustment of threads between parallel regions. Here, the runtime environment can adjust the number of threads used to execute each parallel region in order to provide the best throughput. In other words, a job may complete sooner running with fewer threads than it would by waiting for enough processors to become available to run all requested threads. Thus, paradoxical as it may seem, running on fewer processors can give a job faster turnaround, because of the time saved by not waiting for busy processors to become free.

2. Environment Variables to Control Scheduling in IRIX 6.5

The environment variable OMP_DYNAMIC enables or disables dynamic adjustment of the number of threads available for executing parallel regions. Its default value is TRUE. The environment variable MPC_GANG controls the use of gang scheduling and is not set by default. Disabling dynamic scheduling (setting OMP_DYNAMIC to FALSE) enables gang scheduling (i.e., MPC_GANG gets set to ON).

Note that with dynamic scheduling, an extra watchdog thread is created to monitor and adjust the number of threads between parallel regions depending on the system load. This thread consumes minimal resources.

When running in dedicated mode, SGI recommends disabling dynamic threads. In the C-shell, this is:

    setenv OMP_DYNAMIC FALSE

The environment variable _DSM_VERBOSE gives messages about parameters used during execution. In the C-shell, this is:

    setenv _DSM_VERBOSE

See man pe_environ for more information on these environment variables.
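
Before submitting a job it can help to see how these variables are currently set. The following is a minimal sketch, written in Bourne shell (sh) rather than csh because it defines a function; show_sched_env is a hypothetical helper name, not an SGI or NCSA tool. It only reports what the environment contains and does not know the runtime defaults, so an unset OMP_DYNAMIC is shown as "(not set)" even though the default described above is TRUE.

```shell
#!/bin/sh
# Report how the scheduling-related variables from this section are set.
# An unset variable is reported as "(not set)"; a variable set to an
# empty value (as "setenv _DSM_VERBOSE" does) is reported as "(empty)".
show_sched_env() {
    for v in OMP_DYNAMIC MPC_GANG _DSM_VERBOSE; do
        if eval "[ -z \"\${$v+set}\" ]"; then
            echo "$v (not set)"
        elif eval "[ -z \"\${$v}\" ]"; then
            echo "$v (empty)"
        else
            eval "echo \"$v=\${$v}\""
        fi
    done
}

# Usage, assuming the function is saved in a script that ends with the
# call "show_sched_env" and is run from the shell that will launch the job:
#   sh show_sched_env.sh
```

Because the script inherits the environment of the shell that starts it, running it from the csh session or batch script that launches the job shows the values the job itself will see.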

3. Gang Scheduled Jobs in IRIX 6.5

Some codes may still be running in gang-scheduled mode in IRIX 6.5, and therefore could suffer from poor performance.

How to tell if an executable is running gang-scheduled

  • Run a short parallel test code (that will fit within the interactive limits) on modi4.
    setenv MP_SET_NUMTHREADS 4
    a.out &
    
  • Use the ps -l command: GN in the NI column indicates the job is running gang scheduled.
    ps -l
      F S   UID        PID       PPID  C PRI NI  P    SZ:RSS      WCHAN TTY     TIME CMD
      0 S  5109    1006229    1012221  0  20 20  *    43:33    61db6ab8 ttyr22  0:00 csh 
      0 R  5109    1239977    1241269  0  20 GN 18 12749:8485         - ttyr22  1:05 a.out 
      0 R  5109    1241269    1006229  0  20 GN 17 12749:8485         - ttyr22  1:07 a.out 
      0 R  5109    1241375    1241269  0  20 GN 19 12749:8485         - ttyr22  1:07 a.out 
      0 R  5109    1241481    1241269  0  20 GN 16 12749:8485         - ttyr22  1:08 a.out 
      0 R  5109    1242115    1006229  0  20 20  3   120:65           - ttyr22  0:00 ps 
    
  • Alternatively, use the command top -U $USER: the letter g at the beginning of the pri column indicates the job is running gang scheduled.
           PID       PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU% CPU% COMMAND
       1241269    1241269 sjohn    g20  199M  133M run/17   1:52 13.9 99.69 a.out
       1241375    1241269 sjohn    g20  199M  133M run/19   1:52 13.9 99.56 a.out
       1241481    1241269 sjohn    g20  199M  133M run/16   1:53 13.9 99.28 a.out
       1239977    1241269 sjohn    g20  199M  133M run/18   1:49 13.0 92.66 a.out
       1240301    1240301 sjohn     20 3088K 1568K run/27   0:00  1.1  7.60 top
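
The ps -l check described above can also be scripted. The following is a minimal sketch in Bourne shell (sh) with awk; gang_procs is a hypothetical helper name. It assumes that the ps -l header contains PID and NI columns, as in the listing above, and that no field in a process row is blank, so that whitespace-separated fields line up with the header.

```shell
#!/bin/sh
# Print the PID and command name of every process whose NI column
# reads GN, i.e. every process that is running gang scheduled.
# Column positions are taken from the header line rather than
# hard-coded, since the ps -l layout may differ between systems.
gang_procs() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "NI")  ni = i
                if ($i == "PID") pid = i
            }
            next
        }
        ni && $ni == "GN" { print $pid, $NF }
    '
}

# Usage on a live system:
#   ps -l | gang_procs
```

If the function prints nothing, none of the listed processes is gang scheduled. Applied to the ps -l listing above, it would report the four a.out processes and skip csh and ps.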
    

Possible reasons why some codes still run gang scheduled in IRIX 6.5, and solutions

  1. The MPC_GANG environment variable is turned on (as explained in section 2)

    Solution:

    Check your startup files (.cshrc, .login, etc.) as well as your batch script to make sure that MPC_GANG is not turned on.

  2. Your code calls the routine mp_numthreads, which returns the number of threads. Calling this routine enables gang scheduling (by definition, with dynamic threads, the number of threads can be changed at any time).

    Solution:

    • Use the environment variable MP_FREEZE, which is supported by the MIPSpro 7.3 compilers. NCSA has set the site default globally to OFF to facilitate the use of dynamic threads, so codes that call mp_numthreads and are compiled with the 7.3 compilers should automatically use dynamic scheduling.

      See /usr/news/SGI_MIPSpro_Compilers for more information on the MIPSpro 7.3 compilers.

    • Alternatively, change the mp_numthreads calls to omp_get_num_threads (which is from the OpenMP standard). This will use the default dynamic scheduling. See the omp_threads man page for more information. This will work with your older MP directives.

  3. Your code uses routines from the parallel (mp) versions of the SGI math libraries complib or scsl. Many of the parallel versions of these routines use mp_numthreads to determine the number of threads, which enables gang scheduling.

    Solution:

    • Because the parallel routines enable gang scheduling through their use of mp_numthreads, you can disable gang scheduling with either complib or scsl by using the MIPSpro version 7.3 compilers (the environment variable MP_FREEZE has been globally set to OFF, as described in item 2 above).

    • If you use scsl, the newest version (version 1.2 beta) has been changed to call omp_get_num_threads instead of mp_numthreads, so it does not enable gang scheduling.

      Use the command:

      module load scsl.1.2beta
      
      to load scsl 1.2 beta. You don't need to recompile your code. If you run your code in a batch job, putting the above load command in the batch script is recommended.

      To go back to the default version of scsl:

      module unload scsl.1.2beta
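
For reason 1 above, the startup files and batch script can be searched for MPC_GANG in one step. The following is a minimal sketch in Bourne shell (sh); find_mpc_gang is a hypothetical helper name, and batch.csh in the usage line is a placeholder for your own batch script.

```shell
#!/bin/sh
# List every line that mentions MPC_GANG in the given files, in the
# form file:line-number:text. Files that do not exist are skipped.
find_mpc_gang() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # /dev/null as a second file makes grep print the file name.
            grep -n 'MPC_GANG' "$f" /dev/null || true
        fi
    done
}

# Usage (batch.csh is a placeholder name):
#   find_mpc_gang ~/.cshrc ~/.login batch.csh
```

The search matches any mention of the variable, including comments and unsetenv lines, so each reported line still needs to be read to see whether it actually turns MPC_GANG on.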