IBM Books

Hitchhiker's Guide


Tuning the performance of a parallel application

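There are two approaches to tuning the performance of a parallel application: you can tune a sequential program and then parallelize it, or you can parallelize a program and then tune the result.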

Both approaches yield comparable results. The difference lies in the tools each approach uses and in how those tools are used.

Note:
It may not be possible to use some tools in a parallel environment in the same way that they're used in a sequential environment. The tool may require root authority, and POE restricts the root ID from running parallel jobs. Or, when the tool runs in parallel, each task may attempt to write to the same files, corrupting the data. One tool that falls into both of these categories is tprof.

With either approach, you use the standard sequential tools in the traditional manner.

When you tune an application and then parallelize it, observe the communication performance, how it affects the performance of each individual task, and how the tasks affect each other. For example, does one task spend a lot of time waiting for messages from another? If so, you may need to rebalance the workload. Or, if a task starts waiting for a message long before the message arrives, perhaps it could do more algorithmic processing before it waits.

When you parallelize an application and then tune it, you need a way to collect performance data that includes both communication and algorithmic information. That way, if the performance of a task needs to be improved, you can decide whether to tune the algorithm or the communication.
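To make the question of waiting time concrete, here is a minimal Python sketch under a simplified model. It assumes that each task computes for a measured amount of time and then blocks until every other task reaches the same synchronization point (as at a barrier or an all-to-all exchange), so a task's idle time is the slowest task's compute time minus its own. The function names, the sample timings, and the 10% imbalance threshold are hypothetical illustrations, not part of PE; real per-task timings would come from whatever performance tool you use.

```python
def wait_times(compute_seconds):
    """Idle time of each task at a synchronization point.

    Simplified model (assumption): every task must wait for the slowest
    task, so its idle time is the slowest time minus its own time.
    """
    if not compute_seconds:
        raise ValueError("need at least one task timing")
    slowest = max(compute_seconds)
    return [slowest - t for t in compute_seconds]


def imbalance_ratio(compute_seconds):
    """Slowest task's time divided by the mean time (1.0 means perfectly balanced)."""
    if not compute_seconds:
        raise ValueError("need at least one task timing")
    mean = sum(compute_seconds) / len(compute_seconds)
    return max(compute_seconds) / mean


def needs_rebalancing(compute_seconds, threshold=1.10):
    """Flag the workload when the slowest task exceeds the mean by more than
    the (hypothetical) threshold, 10% by default."""
    return imbalance_ratio(compute_seconds) > threshold
```

With hypothetical timings of 4, 4, 4, and 8 seconds, `wait_times` reports that the first three tasks each idle for 4 seconds while the fourth finishes, and the imbalance ratio of 1.6 suggests moving work off the fourth task. A real application with point-to-point messages would need a finer-grained model, because a task waits only for the specific senders it depends on.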

This section will not deal with standard algorithmic tuning techniques. Rather, we will discuss some of the ways PE can help you tune the parallel nature of your application, regardless of the approach you take. To illustrate this, we'll use two examples.

