The VProf visual profiler from Sandia National Laboratories is
available on the NCSA Xeon cluster (Tungsten), the TeraGrid Itanium 2
cluster (Mercury), and the SGI Altix (Cobalt). VProf was developed by
Curtis Janssen. The VProf Home Page provides more information and a
brief user guide.
In many cases, you can use VProf without changing your source code;
you only need to recompile and relink your application. Recompilation
is necessary to include the symbol information that VProf uses. Refer
to the VProf User Guide for details.
Two screenshots of VProf in action are available
here.
The VProf package consists of a library and two tools: the graphical tool
vprof and the text-based tool cprof.
VProf allows you to profile your application in several different ways:
-
Statistical sampling of the program counter (as with traditional profilers),
using the
profil(3) subroutine. This is the default
method of profiling.
-
Using hardware performance counter statistics gathered through the
PAPI library
from the Innovative Computing Laboratory at the University
of Tennessee-Knoxville.
-
A direct interface to the x86 Linux "perfctr" performance counter
kernel patch developed by Mikael Pettersson. (This option is not
available on Tungsten because the perfctr support within VProf 0.12
does not include Pentium 4 performance counters.)
-
/usr/apps/tools/vprof/bin
/usr/projects/perftools/vprof/bin (on Mercury)
-
The command-line utilities vprof and cprof.
-
/usr/apps/tools/vprof/lib
/usr/projects/perftools/vprof/lib (on Mercury)
-
Libraries and object files that you can link with your application.
The performance counters on a number of different platforms,
including Xeon, support a hardware interrupt on overflow. This works
much like traditional time-based profiling. With time-based profiling,
an "alarm" is set periodically. When that alarm expires after some
amount of time (the threshold), the program counter is sampled and the
position in the program is noted. In the same way, hardware counters
that support interrupt on overflow can be programmed to notify software
when a given number of occurrences of a particular event (for example,
level 2 cache misses) has been counted. This is a very flexible
generalization of traditional statistical profiling techniques.
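As a rough illustration of the idea, the following Python sketch
simulates interrupt-on-overflow sampling. It is not VProf or PAPI code:
the function name sample_on_overflow, the trace format, and the event
counts are all invented for this example. Each time the simulated
counter reaches the threshold, the current program location receives
one sample and counting resumes.

```python
from collections import Counter


def sample_on_overflow(trace, threshold):
    """Simulate event-based sampling.

    trace is an iterable of (location, event_count) pairs in execution
    order; the result maps each location to the number of samples taken
    while the program was executing there.
    """
    samples = Counter()
    counter = 0
    for location, events in trace:
        counter += events
        # Every time the counter reaches the threshold it "overflows":
        # note where the program is and carry on counting.
        while counter >= threshold:
            samples[location] += 1
            counter -= threshold
    return samples


# Hypothetical program: each iteration generates 700 events in loop_a,
# 250 in loop_b, and 50 in setup (for example, cache misses).
trace = [("loop_a", 700), ("loop_b", 250), ("setup", 50)] * 100
profile = sample_on_overflow(trace, threshold=300)

for location, count in profile.most_common():
    print(location, count)
```

In this toy run, the share of samples per location roughly follows the
share of events generated there, which is what makes the resulting
profile useful even though only a small fraction of events is observed.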
Unfortunately, no single "threshold" at which the alarm goes off
suits every event (for example, level 1 cache misses probably occur
far more frequently than translation lookaside buffer misses). You
will need to adjust the threshold yourself according to the particular
hardware event that you are monitoring.
At NCSA, VProf has been modified to accept the environment variable
VMON_FREQ. You can set this variable to any integer value, which will
be used as the interrupt threshold during the run of your program.
You'll probably want to experiment with this variable to find a value
that gives you the best results: too high a value will result in an
inexact profile, while too low a value will likely slow down your
application significantly due to excessive calls to the interrupt
handler. It's probably best to start high and decrease the value until
you are getting reasonable results. By default, VMON_FREQ is set to
100000.
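One way to experiment with the threshold is to script the runs. The
Python sketch below is only an illustration under stated assumptions:
the helper names profiling_env and expected_interrupts are hypothetical
and are not part of VProf. The first helper prepares an environment
with VMON (the event selector described later on this page) and
VMON_FREQ set; the second gives a back-of-the-envelope estimate of how
many interrupts a run would trigger for a given total event count.

```python
import os

DEFAULT_VMON_FREQ = 100000  # default threshold stated on this page


def profiling_env(event, threshold=DEFAULT_VMON_FREQ, base_env=None):
    """Return a copy of the environment with VMON and VMON_FREQ set."""
    if not isinstance(threshold, int) or threshold <= 0:
        raise ValueError("threshold must be a positive integer")
    env = dict(os.environ if base_env is None else base_env)
    env["VMON"] = event
    env["VMON_FREQ"] = str(threshold)
    return env


def expected_interrupts(total_events, threshold=DEFAULT_VMON_FREQ):
    """Estimate how many overflow interrupts a run would trigger."""
    return total_events // threshold


# A run that generates about one billion events at the default threshold:
print(expected_interrupts(10**9))
```

The resulting dictionary could then be passed to a launcher, for
example subprocess.run(["./myprog"], env=profiling_env("PAPI_FP_OPS",
50000)). Halving the threshold roughly doubles the number of
interrupts, which is the overhead trade-off described above.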
Here are basic instructions for preparing your program for profiling
and then using the VProf graphical or text-based tools. You can find
more detailed information in the
VProf User Guide.
- Important note
-
If possible, you should link your application
statically (with the option -static). As mentioned
in the VProf documentation, if routines in shared libraries are sampled,
they will be outside of the range of VProf's profiling buffer and
no information about the event will be recorded. If, when linking
statically, you encounter link-time errors referring to missing symbols
with "pthread" in their name, then you should try adding the
flag -lpthread to the end of your link line.
If you cannot link your application statically, you can still run
your dynamically linked program and obtain a profile, but you should
be aware that samples may be missing.
- Tip
-
Because VProf can make use of the PAPI library, you may
want to review the
NCSA PAPI page for more complete instructions on that software
and the steps to follow when building a program that uses PAPI.
This page also contains a listing of the PAPI events that are available
on the Xeon platform (these are the values that you might supply to VProf
using the VMON environment variable).
- Generating VProf profiles with PerfSuite
-
You can also use the tools psrun and psprocess, which are
part of PerfSuite, to
generate VProf-format profiles from your application. This provides
a way to use cprof and/or vprof without the need for
relinking. Note that psrun requires dynamic linking and therefore
will only generate VProf profiles that contain information from your main
program (you can use psprocess independently from VProf
to view shared library profile data).
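The important note above explains that samples taken inside shared
libraries are lost. The following Python sketch shows why, using a
simplified model of a profiling histogram in the spirit of profil(3).
It is not VProf's implementation: the function name record_sample, the
addresses, and the bin size are assumptions made for the example.

```python
def record_sample(histogram, pc, base_address, bin_size):
    """Count a sampled program counter if it lies inside the buffer's range.

    Returns True when the sample was recorded and False when it was dropped.
    """
    index = (pc - base_address) // bin_size
    if 0 <= index < len(histogram):
        histogram[index] += 1
        return True
    return False


# Hypothetical layout: the executable's code occupies 0x1000-0x1fff,
# and a shared library is mapped far away near 0x7f000000.
histogram = [0] * 16  # 16 bins of 256 bytes cover 0x1000-0x1fff
samples = [0x1004, 0x10F0, 0x1A00, 0x7F000010, 0x7F000020]
recorded = sum(record_sample(histogram, pc, 0x1000, 256) for pc in samples)

print(recorded, "of", len(samples), "samples recorded")
```

The two samples that fall in the shared library's address range map to
no bin and are silently discarded, which is why static linking gives a
more complete profile.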
Compiling and linking a single-processor program
To compile and link the single-processor program "myprog.f"
for VProf-profiling on the Tungsten cluster:
% ifc -c -g myprog.f
% ifc -static -o myprog myprog.o \
/usr/apps/tools/vprof/lib/vmonauto_gcc.o \
-L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
-lvmon -lpapi
After your program completes successfully, you should have a single file
named "vmon.out" that contains the result of profiling.
Compiling and linking an MPI program
To compile and link the MPI program "mpiprog.f"
for VProf-profiling on the Tungsten cluster:
- ChaMPIon/Pro
-
% cmpifc -c -g mpiprog.f
% cmpifc -static -o mpiprog mpiprog.o \
/usr/apps/tools/vprof/lib/vmonauto_pmpi.o \
-L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
-lvmon -lpapi
- MPICH-GM
-
% mpif77 -c -g mpiprog.f
% mpif77 -static -o mpiprog mpiprog.o \
/usr/apps/tools/vprof/lib/vmonauto_pmpi.o \
-L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
-lvmon -lpapi -lpthread
After your program completes successfully, you should have one or
more output files in your working directory. Each will be named
"vmon.out.ID", where "ID" is an integer that
corresponds to the MPI task ID assigned during the run.
To enable automatic profiling of your program (as shown in these
examples), you should link in an additional object file, depending
on the programming model in use. You can choose from:
-
vmonauto_gcc.o
-
For use with serial applications, this file causes profiling to
start when your application begins and terminates profiling just before
your program exits.
-
vmonauto_pmpi.o
-
For use with MPI applications, this file will cause profiling to begin
when
MPI_Init() is called and will terminate profiling
when MPI_Finalize() is called.
Running Your Program and Obtaining a VProf Profile
Before you run a VProf-linked program, you should select the
type of profiling you'd like by setting the environment variable
VMON appropriately (refer to the
VProf User Guide
for details).
For example, to obtain a profile based on the number of total
floating point operations during the run of your program as measured
by PAPI, you would enter:
% setenv VMON PAPI_FP_OPS
- Note:
-
By default, MPICH-GM passes only the environment variables DISPLAY and
LD_LIBRARY_PATH to the remote tasks. To set the VMON environment
variable for an MPICH-GM job, you may want to use the following form
for your MPI job launch command:
mpirun.ch_gm VMON=PAPI_FP_OPS -np X mpiprog
If you've linked your application dynamically, you'll also want to set up
your environment properly so that the PAPI shared library can be located at
runtime. On Tungsten, you can use the SoftEnv package for this:
% soft add +papi3
Then run your program as you normally would. If all goes well, you
will have one or more VProf profiles in your working directory as
described above.
Viewing The Results
To view the profiles, invoke cprof or vprof, supplying the name of
your executable program followed by the names of the "vmon.out" files
that you'd like to view (you can also do this from inside vprof). For
example:
% cprof -e myprog vmon.out
In this example, the option -e asks cprof to display "everything" in
the profiles. Without this option, you'll receive a brief summary of
the information contained in the profiles. cprof also supports the
option -h, which displays a summary of this and other options that you
can use to tailor the output to your needs.
Note that you can supply multiple VProf profiles on the command line
(for example, when working with output from multiple MPI tasks) and
VProf's tools will present the results in an aggregate form. Here's an
example:
% cprof -e myprog vmon.out.0 vmon.out.1 vmon.out.2 vmon.out.3
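For runs with many MPI tasks, typing every file name becomes tedious.
The Python sketch below builds the same kind of cprof command line
from whatever "vmon.out.ID" files exist in a directory. The helper
name cprof_command is hypothetical; it relies only on the -e option
and the argument order shown above, and it sorts the files by numeric
task ID as a convenience.

```python
import glob
import os


def cprof_command(executable, directory="."):
    """Build a cprof command line covering every vmon.out.ID file found."""
    candidates = glob.glob(os.path.join(directory, "vmon.out.*"))
    # Keep only names whose suffix is an integer task ID.
    profiles = [p for p in candidates if p.rsplit(".", 1)[1].isdecimal()]
    # Sort numerically so that task 10 follows task 9 rather than task 1.
    profiles.sort(key=lambda p: int(p.rsplit(".", 1)[1]))
    return ["cprof", "-e", executable] + profiles


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(cprof_command("myprog")))
```

The returned list can be handed to subprocess.run() directly once the
printed command looks right.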
Please refer to the
VProf User Guide
for additional information about VProf.