Applications of Parallel Programming in Statistics

Vanessa Beddo
Ph.D., 2002
Advisor: Jan de Leeuw

Statistics has long been burdened by massive data sets and computationally demanding algorithms. Parallel computing is one strategy for tackling these problems: it allows a programmer to employ several machines, or processors, during program execution. To do so, an algorithm must be partitioned in a way that is optimal for the statistical problem at hand. Machine architecture should be the first consideration in the development of an efficient parallel procedure. This collection of parallel statistical algorithms has been optimized for distributed memory machines and can be implemented on a cluster of workstations. This architecture, also referred to as a Multiple Instruction, Multiple Data (MIMD) type, has a high inter-node communication overhead. As a result, many types of algorithms are adversely affected by cluster designs. However, by design, all the algorithms in this dissertation achieve impressive, if not optimal, speedups using the Message Passing Interface (MPI).