Since parallelizing Path Integral Monte Carlo (PIMC) on the SGI/Cray Origin2000, we have seen performance five times higher than before, and we expect it to increase further. We achieved this spectacular improvement by altering our program in two ways. First, we optimized the serial code. Using pixie, we generated a profile of the time spent in each routine. The profile is nearly flat, indicating that the program spends its time across a relatively large number of routines, which makes a considerable speedup more difficult to obtain. Beginning at the top of the profile, we studied the performance of each subroutine. Most of the speedup came from better use of the cache. In some cases we simply interchanged loops and array indices. In other cases we introduced temporary arrays, which increased the number of cache hits.
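
The loop interchange mentioned above can be illustrated with a small sketch. It is not taken from the PIMC code: the array shape and all function names are assumptions, and Python is used only to show the access pattern, not to measure cache behavior. The sketch assumes a row-major layout, where the last index varies fastest in memory; Fortran 77 arrays are column-major, so there the first index should be in the innermost loop.

```python
# Illustrative only: the array shape and function names are assumptions,
# not taken from the PIMC code.

def flat_index(i, j, ncols):
    """Position of element (i, j) in a row-major flat array."""
    return i * ncols + j


def access_order(nrows, ncols, row_outer):
    """Return the flat memory positions visited by a doubly nested loop."""
    order = []
    if row_outer:
        for i in range(nrows):
            for j in range(ncols):  # inner loop walks along one row
                order.append(flat_index(i, j, ncols))
    else:
        for j in range(ncols):
            for i in range(nrows):  # inner loop jumps from row to row
                order.append(flat_index(i, j, ncols))
    return order


def strides(order):
    """Distance in memory between consecutive accesses."""
    return [b - a for a, b in zip(order, order[1:])]
```

For a 3 by 4 array, `strides(access_order(3, 4, True))` is a list of ones (consecutive memory positions), while `strides(access_order(3, 4, False))` begins `4, 4, -7`. Both orders visit the same elements; only the first uses every word of a cache line before moving on.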
The other way in which we improved performance was by altering our MPI code, which is written in Fortran 77. The system of paths is split among several processors, each working on a particular section. The boundaries are held fixed temporarily, but after a number of Monte Carlo steps new boundaries are introduced and the system is redistributed among the processors. Some Monte Carlo steps, particularly redistribution, require a lot of communication. We improved scalability by minimizing the number and size of the messages. In some cases it was much faster to recompute a particular array than to communicate it.
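
The splitting and redistribution described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the actual PIMC decomposition: it assumes the paths are indexed by a periodic chain of slices, that each processor owns one contiguous section, and that new boundaries are obtained by rotating the old ones. The names `sections`, `boundaries`, `slices_to_move`, and the `offset` parameter are hypothetical, and no MPI calls are made.

```python
# Illustrative only: a periodic chain of slices and rotated boundaries
# are assumptions; all names here are hypothetical.

def sections(n_slices, n_procs, offset=0):
    """Split a periodic chain of n_slices into n_procs contiguous sections.

    Returns one list of slice indices per processor. `offset` rotates
    the section boundaries around the ring.
    """
    base, extra = divmod(n_slices, n_procs)
    result = []
    start = offset
    for rank in range(n_procs):
        # The first `extra` ranks take one additional slice.
        size = base + (1 if rank < extra else 0)
        result.append([(start + k) % n_slices for k in range(size)])
        start += size
    return result


def boundaries(parts):
    """First slice of each section, taken here as the section boundary."""
    return [part[0] for part in parts]


def slices_to_move(old, new):
    """For each processor, the slices it does not yet own after re-splitting."""
    return [sorted(set(n) - set(o)) for o, n in zip(old, new)]
```

With 10 slices on 3 processors, `sections(10, 3)` gives boundaries at slices 0, 4, and 7, and `sections(10, 3, offset=2)` moves them to 2, 6, and 9. In this example `slices_to_move` returns two slices per processor, which gives a rough sense of how message size depends on how far the boundaries shift.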
As a result of these changes, we achieved a good speedup on as many as eight processors.
