Scheduling Issues on the Origin2000 in IRIX 6.5
- Introduction
- Environment Variables to control Scheduling in IRIX 6.5
- Gang Scheduled Jobs in IRIX 6.5
The following applies to shared memory parallel codes. These typically
- use the automatic parallelizing (autotasking) option of the SGI compiler
(you will have compiled your code with the -apo flag, or the
-pfa/-pca flag in earlier versions of the compiler), or
- use compiler directives (OpenMP, MP, PCF, etc.) to parallelize the code
(you will have compiled your code with the -mp flag for these
directives to be recognized).
Other programming models are not affected.
In IRIX 6.4 and earlier, parallel threads were gang scheduled by
default. In this type of scheduling, threads are treated as a gang,
and the priority of the gang is boosted as soon as the first thread is
scheduled, so all threads of the gang are running in parallel very
quickly. In a multi-user shared environment, the IRIX 6.4 scheduler tended
to favor gang-scheduled jobs over non-gang-scheduled jobs with the same
number of processes.
Starting with IRIX 6.5, the scheduler was changed to level the playing
field, so that gang-scheduled jobs no longer get priority over
non-gang-scheduled jobs. Because a gang-scheduled job cannot run until
enough processors are available to schedule all members of the gang,
its turnaround time can be slow.
To fix this problem, SGI changed the default scheduling of
shared memory applications to dynamic adjustment of threads between
parallel regions. Here, the runtime environment can adjust the number of
threads used to execute parallel regions to provide the best throughput.
In other words, a job may complete sooner running with fewer threads than
it would waiting for enough processors to become available for all
requested threads. Thus, paradoxical as it may seem, running on fewer
processors can give faster turnaround, because no time is spent waiting
for busy processors to become free.
The environment variable OMP_DYNAMIC enables or disables dynamic
adjustment of the number of threads available for execution of parallel
regions. The default value is TRUE. The environment variable MPC_GANG
controls the use of gang scheduling. By default, this environment
variable is not set. If dynamic scheduling is disabled (OMP_DYNAMIC set
to FALSE), this enables gang scheduling (i.e., MPC_GANG gets set to ON).
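The two rules above can be written down as a small pre-flight check for a job script. This is only a sketch in Bourne-shell syntax (not the C-shell used elsewhere on this page). The function name sched_mode is hypothetical, and it encodes nothing beyond what this section states: OMP_DYNAMIC defaults to TRUE, and either OMP_DYNAMIC=FALSE or MPC_GANG=ON leads to gang scheduling. The real runtime may take other settings into account.

```sh
# Report which scheduling mode the current environment would select,
# according to the two rules described in this section (an assumption,
# not a query of the actual runtime).
sched_mode() {
    if [ "${OMP_DYNAMIC:-TRUE}" = "FALSE" ] || [ "${MPC_GANG:-}" = "ON" ]; then
        echo gang
    else
        echo dynamic
    fi
}
```

Calling sched_mode before launching the executable gives a quick reminder of what to expect. The ps and top checks described later show what the running job actually does.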
Note that with dynamic scheduling, an extra watchdog thread
is created to monitor and adjust the number of threads between parallel
regions depending on the system load. This thread consumes minimal resources.
When running in dedicated mode, SGI recommends disabling dynamic threads. In
the C-shell, this is:
setenv OMP_DYNAMIC FALSE
The environment variable _DSM_VERBOSE gives messages about
parameters used during execution. In the C-shell, this is:
setenv _DSM_VERBOSE
See man pe_environ for more information on these environment
variables.
Some codes may still be running in gang-scheduled mode in IRIX 6.5, and
therefore could suffer from poor performance.
How to tell if an executable is running gang-scheduled
- Run a short parallel test code (that will fit within
the interactive limits) on modi4.
setenv MP_SET_NUMTHREADS 4
a.out &
- Use the ps -l command: GN in the NI
column indicates the job is running gang scheduled.
ps -l
F S UID PID PPID C PRI NI P SZ:RSS WCHAN TTY TIME CMD
0 S 5109 1006229 1012221 0 20 20 * 43:33 61db6ab8 ttyr22 0:00 csh
0 R 5109 1239977 1241269 0 20 GN 18 12749:8485 - ttyr22 1:05 a.out
0 R 5109 1241269 1006229 0 20 GN 17 12749:8485 - ttyr22 1:07 a.out
0 R 5109 1241375 1241269 0 20 GN 19 12749:8485 - ttyr22 1:07 a.out
0 R 5109 1241481 1241269 0 20 GN 16 12749:8485 - ttyr22 1:08 a.out
0 R 5109 1242115 1006229 0 20 20 3 120:65 - ttyr22 0:00 ps
- Alternatively, use the command top -U $USER: the letter
g at the beginning of the PRI column indicates the
job is running gang scheduled.
PID PGRP USERNAME PRI SIZE RES STATE TIME WCPU% CPU% COMMAND
1241269 1241269 sjohn g20 199M 133M run/17 1:52 13.9 99.69 a.out
1241375 1241269 sjohn g20 199M 133M run/19 1:52 13.9 99.56 a.out
1241481 1241269 sjohn g20 199M 133M run/16 1:53 13.9 99.28 a.out
1239977 1241269 sjohn g20 199M 133M run/18 1:49 13.0 92.66 a.out
1240301 1240301 sjohn 20 3088K 1568K run/27 0:00 1.1 7.60 top
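Both checks can be automated with awk instead of reading the listings by eye. The following is a sketch in Bourne-shell syntax. The function names gang_procs and gang_in_top are hypothetical, and both assume the column layouts shown in the listings above. In particular, they assume no column to the left of NI or PRI is blank, since a blank would shift the whitespace-separated fields.

```sh
# Read `ps -l` output on stdin; print "PID CMD" for each process whose
# NI column is GN. Column numbers are taken from the header line.
gang_procs() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") pid = i
                if ($i == "NI")  ni = i
            }
            next
        }
        ni && $ni == "GN" { print $pid, $NF }
    '
}

# Read a saved top display on stdin; print "PID COMMAND" for each process
# whose PRI value starts with g. Lines before the header row are skipped.
gang_in_top() {
    awk '
        !pri {
            for (i = 1; i <= NF; i++) if ($i == "PRI") pri = i
            next
        }
        $pri ~ /^g/ { print $1, $NF }
    '
}
```

For example, ps -l | gang_procs run against the ps listing above would print the four a.out processes and nothing else. Empty output means no gang-scheduled process was found.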
Possible reasons why some codes still run gang-scheduled in IRIX 6.5, and
solutions
- The MPC_GANG environment variable is turned on
(as explained in section 2)
Solution:
Check your startup files (.cshrc, .login, etc.) as
well as your batch script to make sure that MPC_GANG is not
turned on.
- Your code calls the routine mp_numthreads, which returns
the number of threads. Calling this routine enables gang scheduling
(by definition, with dynamic threads, the number of threads can be
changed at any time).
Solution:
- Your code uses routines from the parallel (mp) versions of the SGI math
libraries complib or scsl. Many of the parallel
versions of these routines use mp_numthreads to determine the
number of threads, which enables gang scheduling.
Solution:
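Returning to the first cause in this list (MPC_GANG turned on in a startup file or batch script), the check can be done with grep rather than by reading each file. This is a sketch in Bourne-shell syntax. The function name find_mpc_gang and the batch script name in the usage comment are hypothetical.

```sh
# Print file:line:text for every line that mentions MPC_GANG in the
# given files. Files that do not exist are skipped silently.
find_mpc_gang() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # /dev/null as a second file forces grep to print the file name
            grep -n MPC_GANG "$f" /dev/null || true
        fi
    done
}

# Usage: find_mpc_gang ~/.cshrc ~/.login ./myjob.csh
```

No output means none of the listed files mention MPC_GANG. A hit still needs to be read, since the line may be commented out.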