Next: Non-Symmetric Problem Up: Results Previous: Results

Symmetric Problem

As mentioned previously, PETSc offers a number of options for the multigrid preconditioner. Figure 1 compares the scalability of two options for the coarse grid solve when multigrid is used as a preconditioner for GMRES(30) on a structured grid on the IA32 cluster. With the number of unknowns per processor fixed at $\frac{1}{4}$ million, ideal scaling would appear as a horizontal line. The orange plot uses a direct method, LU factorization, for the coarse grid solve. The blue plot uses an iterative method that applies only the preconditioner, which is set to block Jacobi. LU does better than the iterative method below 128 processors; at 128 processors the time for the coarse grid solve with LU jumps to around 700 seconds. This is most likely the result of the poor scaling of the direct solver. Multigrid methods can tolerate an approximate (and cheaper) coarse grid solver, so we use the block Jacobi preconditioner on the coarse grid for all other PETSc results presented here.
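To make the "horizontal line" criterion concrete, the following minimal Python sketch computes weak-scaling efficiency, taken here as the time on the smallest run divided by the time on p processors. The function names and the timings are hypothetical and chosen only for illustration; they are not values read from Figure 1.

```python
# Weak scaling: the work per processor is fixed, so ideal wall-clock
# time is constant and the efficiency T(base) / T(p) stays at 1.0.

UNKNOWNS_PER_PROC = 250_000  # from the text: 1/4 million unknowns per processor


def total_unknowns(procs):
    """Global problem size for a given processor count."""
    return UNKNOWNS_PER_PROC * procs


def weak_scaling_efficiency(times):
    """Map processor count -> T(base) / T(p), where base is the smallest count.

    `times` is a dict {processor count: wall-clock seconds}.
    """
    base = min(times)
    t_base = times[base]
    return {p: t_base / t for p, t in sorted(times.items())}


# Hypothetical timings in seconds (assumption, NOT measured data).
example_times = {1: 20.0, 4: 22.0, 16: 25.0, 64: 40.0}

eff = weak_scaling_efficiency(example_times)
for p, e in eff.items():
    print(f"{p:4d} procs  {total_unknowns(p):>11,d} unknowns  efficiency {e:.2f}")
```

An efficiency that stays near 1.0 corresponds to the flat curve described above; a sharp drop at some processor count corresponds to the kind of jump seen with the LU coarse grid solve.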

Repeating the same scaling study on IA32 with the more standard preconditioners implemented in PETSc, this time on an unstructured grid, makes it more evident how important a good preconditioner is. Figure 2 shows that Jacobi does not work well as a preconditioner on the model problem, whereas block Jacobi does. Of these options, CG preconditioned with block Jacobi is the best choice.
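The difference between the two preconditioners can be illustrated on a toy problem. The sketch below is a serial, pure-Python preconditioned CG applied to the 1D Laplacian tridiag(-1, 2, -1); this matrix, the problem size, the number of blocks, and all function names are assumptions made for illustration and are not the model problem or the PETSc implementation used in this study. Each block of the block Jacobi preconditioner is solved exactly, whereas in PETSc the block solver is configurable.

```python
def apply_A(x):
    """y = A x for the 1D Laplacian A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def jacobi(r):
    """Jacobi preconditioner: divide by the diagonal of A (constant 2)."""
    return [ri / 2.0 for ri in r]


def solve_tridiag_block(r):
    """Solve tridiag(-1, 2, -1) z = r exactly with the Thomas algorithm."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * m
    z[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def make_block_jacobi(n, nblocks):
    """Block Jacobi: drop the coupling between contiguous diagonal blocks."""
    size = n // nblocks  # assumes nblocks divides n

    def apply(r):
        z = []
        for start in range(0, n, size):
            z.extend(solve_tridiag_block(r[start:start + size]))
        return z

    return apply


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(b, precond, tol=1e-8, max_it=1000):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    r0 = dot(r, r) ** 0.5
    for k in range(1, max_it + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * r0:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it


n = 64
b = [1.0] * n
x_j, it_j = pcg(b, jacobi)
x_bj, it_bj = pcg(b, make_block_jacobi(n, 4))
print("Jacobi iterations:      ", it_j)
print("block Jacobi iterations:", it_bj)
```

Because the diagonal of this matrix is constant, Jacobi here amounts to a uniform scaling and gives essentially unpreconditioned CG, while block Jacobi captures the coupling inside each block and should need noticeably fewer iterations. This is only a toy analogue of the behaviour reported in Figure 2.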

Figure 3 compares GMRES preconditioned with multigrid (with block Jacobi on the coarse grid) against CG preconditioned with block Jacobi. Both are implemented in PETSc and run on IA32. The multigrid-preconditioned method scales much better than the others, which demonstrates the better algorithmic scalability of multigrid.

Figure 4 shows hypre solving a structured grid problem on two different architectures. As before, scalability is tested using $\frac{1}{4}$ million unknowns per processor on up to 256 processors. The architectures show similar single-processor performance, although the theoretical peak flop rate of the Linux cluster is higher than that of the SGI Origin2000. The slope of the curves is small for the Linux cluster, meaning that it scales well. The wall clock time on the SGI Origin2000, however, rises more as the number of processors increases. This is probably due to the better network connection in the Linux cluster. Comparing the two numerical solvers, GMRES with a multigrid preconditioner performs better than multigrid alone. The results from the IA64 cluster have been omitted from the plots because the IA64 clusters do not presently show improved times compared to the IA32 clusters. We are investigating what might be causing this.


John Fettig 2002-09-13