Debugging in the ChaMPIon/Pro environment (tungsten)

The ChaMPIon/Pro MPI environment on the Xeon Cluster (tungsten) supports two debugging options. cmpirun can start the gdb or idb debugger for selected ranks, each in its own xterm, as in the first example below. Alternatively, the MPI environment can print a trace of the MPI calls made by your code, similar to the output of the strace facility, with pointer arguments shown as addresses. See the cmpirun man page for more information about these options.


Debugging with gdb or idb [requires xterms, scales to about 8 ranks]

Debugging with traced MPI calls [scales to any size, verbose output possible]

cmpirun with gdb or idb debuggers

Make sure that you can display an xterm by setting your DISPLAY variable properly.
CMPI exports the user environment to each MPI process so you shouldn't need to modify your ${HOME}/.soft file or shell resource files.
cmpirun can support parallel debugging with gdb or idb with the following flags.

 -gdb <list>   ----> run gdb on list of processes
or
 -idb <list>   ----> run idb on list of processes
where <list> should be a comma-separated set of ranks (e.g. "0,1,2,3") or any valid Perl list (e.g. "0..3").
Using an interactive batch session, the following will get you started:
% bsub -Is -n4 $SHELL   # (default wall clock is 30 minutes) 
% cmpirun -lsf -gdb "0..3" ./foo

OR

% cmpirun -lsf -idb "0..3" ./foo
You should get an xterm for each MPI process.
The processes will start up automatically and run to completion or until a runtime error occurs.
If a process fails, the usual gdb or idb prompt will appear in that process's xterm.
Note that idb is started with the gdb interface enabled so it will accept gdb commands.
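If no xterms appear, DISPLAY is the first thing to check. The following is a minimal sketch in sh/bash syntax (the sessions on this page use csh, where echo $DISPLAY gives the same information). The function name check_display and its messages are made up for illustration.

```shell
# Hypothetical helper: report whether DISPLAY is set.
# Returns non-zero when DISPLAY is empty or unset.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; xterms cannot open"
        return 1
    fi
    echo "DISPLAY=$DISPLAY"
}

check_display || echo "Set DISPLAY before running cmpirun with -gdb or -idb"
```

A set DISPLAY does not guarantee that the X server accepts connections, so starting a single xterm by hand is still a useful final check.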

TIP: gdb reads a .gdbinit file in the working directory, so you can make every rank stop at a given line with something like:
     echo "break 15" > .gdbinit
...which stops each rank at line 15 of your source code.
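When more than one gdb command is needed, the file can be written in one step with a here-document. This is a sketch in sh/bash syntax; the source file name foo.c is an assumption chosen to match the ./foo example above, and line 15 is only illustrative.

```shell
# Write a .gdbinit in the working directory for the ranks' gdb sessions.
# foo.c and line 15 are placeholders; substitute your own file and line.
cat > .gdbinit <<'EOF'
set pagination off
break foo.c:15
EOF

# Show what gdb will read.
cat .gdbinit
```

Newer gdb releases may refuse to load a .gdbinit from the current directory unless it is allowed by the auto-load safe-path setting, so check for a warning in the xterm if the breakpoint is not set.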

cmpirun and setting MSTI_PRINT

Note: this example was run from within a batch job. To trace MPI calls with MSTI_PRINT, either use an interactive
batch job [bsub -Is -n2 -W00:30 -q debug] or include the setenv command in your batch script.

[arnoldg@tuna121 ~/mpi]$ setenv MSTI_PRINT "API,DEBUG"
[arnoldg@tuna121 ~/mpi]$ cmpirun -np 2 allall 2 2 2
MPI [0] MPI_Init(argc=0xbffff110, argv=0xbffff114)
MPI [0] MPI_Init(argc=0xbfffd810, argv=0xbfffd814)
MPI [1] MPI_Comm_rank(comm=0, rank=0xbfffd7a8)
MPI [1] MPI_Comm_size(comm=0, size=0xbfffd7ac)
MPI [1] MPI_Barrier(comm=0)
MPI [0] MPI_Comm_rank(comm=0, rank=0xbffff0a8)
MPI [0] MPI_Comm_size(comm=0, size=0xbffff0ac)
MPI [0] MPI_Barrier(comm=0)
MPI [1] MPI_Irecv(buf=0x80567c8, count=2048, datatype=0, source=0, tag=99, comm=0, request=0x8051b38)
MPI [1] MPI_Irecv(buf=0x80567c8, count=2048, datatype=0, source=0, tag=99, comm=0, request=0x8051b3c)
MPI [1] MPI_Barrier(comm=0)
MPI [0] MPI_Irecv(buf=0x80567c8, count=2048, datatype=0, source=1, tag=99, comm=0, request=0x8051b40)
MPI [0] MPI_Irecv(buf=0x80567c8, count=2048, datatype=0, source=1, tag=99, comm=0, request=0x8051b44)
MPI [0] MPI_Barrier(comm=0)
MPI [0] MPI_Irecv(buf=0xbffed748, count=32, datatype=0, source=1, tag=55, comm=0, request=0xbfffd128)
MPI [0] MPI_Wtime()
MPI [0] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=1, tag=99, comm=0)
MPI [0] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=1, tag=99, comm=0)
MPI [0] MPI_Wait(request=0xbfffd128, status=0xbffff068)
MPI [1] MPI_Waitall(count=2, array_of_requests=0x8051b38, array_of_statuses=0x8052208)
MPI [1] MPI_Send(buf=0xbffebe48, count=32, datatype=0, dest=0, tag=55, comm=0)
MPI [1] MPI_Barrier(comm=0)
MPI [0] MPI_Wtime()
MPI [1] MPI_Irecv(buf=0xbffebe28, count=32, datatype=0, source=0, tag=55, comm=0, request=0xbfffb82c)
MPI [1] MPI_Wtime()
MPI [1] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=0, tag=99, comm=0)
MPI [1] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=0, tag=99, comm=0)
MPI [1] MPI_Wait(request=0xbfffb82c, status=0xbfffd768)
Node 0 Complete...
MPI [0] MPI_Barrier(comm=0)
MPI [0] MPI_Waitall(count=2, array_of_requests=0x8051b40, array_of_statuses=0x8052228)
MPI [0] MPI_Send(buf=0xbffed728, count=32, datatype=0, dest=1, tag=55, comm=0)
MPI [0] MPI_Barrier(comm=0)
MPI [1] MPI_Wtime()
Node 1 Complete...
MPI [1] MPI_Barrier(comm=0)
MPI [0] MPI_Finalize()
MPI [1] MPI_Finalize()
[arnoldg@tuna121 ~/mpi]$
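
Because ranks interleave their output, it is often easier to save the trace to a file and then filter it. The sketch below is in sh/bash syntax and uses a few lines copied from the trace above as sample data. The file name trace.log and the helper names calls_for_rank and count_calls are hypothetical, and whether the trace is written to stdout or stderr is not stated on this page, so redirect both when capturing a real run.

```shell
# Sample data copied from the trace above. For a real run, capture the
# output of cmpirun to trace.log instead (redirect stdout and stderr).
cat > trace.log <<'EOF'
MPI [0] MPI_Barrier(comm=0)
MPI [0] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=1, tag=99, comm=0)
MPI [0] MPI_Send(buf=0x8056fd0, count=2048, datatype=0, dest=1, tag=99, comm=0)
MPI [1] MPI_Waitall(count=2, array_of_requests=0x8051b38, array_of_statuses=0x8052208)
MPI [1] MPI_Barrier(comm=0)
Node 0 Complete...
MPI [0] MPI_Finalize()
MPI [1] MPI_Finalize()
EOF

# Print only the MPI calls made by one rank: calls_for_rank RANK FILE
calls_for_rank() {
    grep "^MPI \[$1]" "$2"
}

# Count calls per rank and MPI function name: count_calls FILE
count_calls() {
    awk '/^MPI \[/ { name = $3; sub(/\(.*/, "", name); n[$2 " " name]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

calls_for_rank 1 trace.log
count_calls trace.log
```

Lines that do not start with "MPI [" (such as the program's own "Node 0 Complete..." message) are ignored by both helpers.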