Abstractions, Accommodations and Applications:
Thoughts on Developing Object-Oriented Software
Using Someone Else's Class Library

Chris Myers
Computational Science and Engineering Research Group
Cornell Theory Center
As an example of such an affiliation, I have been using (since mid-1994) the LPARX programming system developed by Scott Baden, Scott Kohn and co-workers at UC-San Diego, and very recently I have been experimenting with KeLP, the imminent successor to LPARX, being developed by Baden and Steve Fink. Although I interact considerably with Baden and his group, my efforts have largely been those of an interested and grateful user: interested because I find that LPARX/KeLP does a nice job of creating "intuitive and useful abstractions" for a certain class of problems, and grateful because their high-level constructs spare me from many of the low-level details of message passing that plague much of distributed memory programming. These abstractions and constructs give me the freedom to work toward developing still higher-level objects, at the algorithmic and application levels, in which some understanding and appreciation of the mathematical and scientific phenomena of interest can be put to use. I am developing application programming interfaces (APIs) to sit on top of LPARX/KeLP, to enable the solution of partial differential equations (PDEs) on parallel computers. I am also incorporating training about parallel object-oriented methods in general, and LPARX in particular, into the Cornell Theory Center's educational materials.
LPARX has aided me considerably in developing parallelized codes for solving PDEs. This is because LPARX: (1) provides useful abstractions for the manipulation of block-structured domains (primarily through its Region calculus) and (2) uses those abstractions to provide a high-level programming interface for interprocessor communication of the sort that arises in such calculations (block-copy-on-intersection). I will not describe the details of LPARX, but refer rather to the user's guide and related documentation. Instead, I will describe some issues that have arisen in my use of LPARX, to illustrate how such a package can be used and extended.
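The flavor of these two ingredients can be sketched in a few lines of C++. The following is a toy illustration, not LPARX's actual classes: a Region is a rectangular index set in a shared global index space, and communication is specified by copying data wherever two Regions intersect.

```cpp
#include <algorithm>
#include <vector>

// Toy miniature of a Region calculus: a Region is a rectangular index
// set [lo, hi] in 2-D, and communication is specified by copying data
// on the intersection of two Regions in the global index space.
struct Region {
    int lo[2], hi[2];              // inclusive index bounds
    bool empty() const { return lo[0] > hi[0] || lo[1] > hi[1]; }
};

// Intersection of two Regions (may be empty).
Region intersect(const Region& a, const Region& b) {
    Region r;
    for (int d = 0; d < 2; ++d) {
        r.lo[d] = std::max(a.lo[d], b.lo[d]);
        r.hi[d] = std::min(a.hi[d], b.hi[d]);
    }
    return r;
}

// A Patch owns data over its Region, addressed by global indices.
struct Patch {
    Region region;
    std::vector<double> data;      // row-major over the region
    double& at(int i, int j) {
        int nx = region.hi[0] - region.lo[0] + 1;
        return data[(j - region.lo[1]) * nx + (i - region.lo[0])];
    }
};

// Block-copy-on-intersection: copy src data into dst wherever their
// Regions overlap; in a distributed setting this overlap is exactly
// the data that must be communicated.
void copyOnIntersection(Patch& dst, Patch& src) {
    Region r = intersect(dst.region, src.region);
    if (r.empty()) return;
    for (int j = r.lo[1]; j <= r.hi[1]; ++j)
        for (int i = r.lo[0]; i <= r.hi[0]; ++i)
            dst.at(i, j) = src.at(i, j);
}
```

The appeal of this style is that the application programmer describes *what* data coincides, and the library derives the message traffic from the geometry.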
Because LPARX supports the creation of distributed grids of arbitrary objects (and provides support for interprocessor communication of those objects), it is reasonably straightforward to develop a C++ class describing a two-component real vector field and then to create a parallelized grid of such elements (an LPARX XArray), distributed across multiple processors according to a specified decomposition. Simple LPARX applications often work directly with XArrays, but I found it useful to work with a derived class, a DistributedCartesianMesh (publicly derived from XArray1(Grid2(Vec2))), because there are operations that need to be performed on the DistributedCartesianMesh that are not appropriate for the more general XArray. For example, I need to compute strains and local strain energies, but those computations require other information (such as parameters in the Ginzburg-Landau model) that has no meaning in the context of the more general distributed grid.
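The design idea can be sketched as follows. This is a hedged illustration with stand-in names, not the actual LPARX API: a generic grid container knows nothing about the physics, while the derived mesh carries the model parameters and model-specific operations.

```cpp
#include <vector>
#include <cstddef>

struct Vec2 { double x, y; };      // two-component real vector field value

// Stand-in for the general distributed container (XArray1(Grid2(Vec2))).
class GridOfVec2 {
public:
    explicit GridOfVec2(std::size_t n) : elems_(n) {}
    Vec2& operator[](std::size_t i) { return elems_[i]; }
    std::size_t size() const { return elems_.size(); }
private:
    std::vector<Vec2> elems_;
};

// The derived class adds operations that make sense only for this
// application: Ginzburg-Landau parameters and a (toy) energy density.
class DistributedCartesianMesh : public GridOfVec2 {
public:
    DistributedCartesianMesh(std::size_t n, double alpha, double beta)
        : GridOfVec2(n), alpha_(alpha), beta_(beta) {}

    // Toy local energy alpha*|u|^2 + beta*|u|^4 summed over the grid;
    // a real code would compute strains via finite differences.
    double totalEnergy() {
        double e = 0.0;
        for (std::size_t i = 0; i < size(); ++i) {
            double u2 = (*this)[i].x * (*this)[i].x
                      + (*this)[i].y * (*this)[i].y;
            e += alpha_ * u2 + beta_ * u2 * u2;
        }
        return e;
    }
private:
    double alpha_, beta_;          // Ginzburg-Landau model parameters
};
```

The base class remains reusable for any field type; only the derived class knows about the model.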
For simple user-defined objects (containing no pointers), LPARX utilities generate all the necessary code to pass data between processors. In this example, boundary data needs to be passed between neighboring blocks at each time step, involving the communication of one-dimensional arrays of two-component vectors (the faces of each block). Communication of such objects is carried out with the same high-level programming interface as is used for communication of native datatypes.
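The reason pointer-free objects are easy to communicate is that they can be packed into a flat byte buffer and unpacked on the receiving side unchanged. The sketch below mimics that round trip for a block face (a one-dimensional array of two-component vectors); it is illustrative, not the code LPARX's utilities actually generate.

```cpp
#include <cstring>
#include <vector>

struct Vec2 { double x, y; };      // contains no pointers: safe to byte-copy

// Pack a face (1-D array of Vec2) into a flat buffer, as would be sent
// to a neighboring block's processor.
std::vector<unsigned char> pack(const std::vector<Vec2>& face) {
    std::vector<unsigned char> buf(face.size() * sizeof(Vec2));
    std::memcpy(buf.data(), face.data(), buf.size());
    return buf;
}

// Unpack on the receiving side; the objects arrive intact because their
// entire state lives in their own bytes.
std::vector<Vec2> unpack(const std::vector<unsigned char>& buf) {
    std::vector<Vec2> face(buf.size() / sizeof(Vec2));
    std::memcpy(face.data(), buf.data(), buf.size());
    return face;
}
```

An object holding pointers would break this scheme, since the addresses would be meaningless in the receiving process.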
LPARX provides an XWindow class that one can use, e.g., to display the values of a field defined on a grid. The XWindow class is somewhat unwieldy, however, if one wishes to display data distributed across multiple processors. One can designate a processor to be the "visualization node" that displays all such data, but then one must pass a lot of data from all processors to the visualization node. Alternatively, one can have every process open an XWindow and display only its portion of the total data. A cleaner solution, exploiting the client-server nature of X Windows, involves deriving a DistributedXWindow class from the base XWindow class and arranging for all processes to draw to a common X window (via a broadcast of the window ID from one processor to all the others). As long as each processing node is capable of making X client calls (as is the case on our SP2), visualization can be done without the need for expensive data copies to a master node. LPARX's support for a shared grid index space enables the coordination of all the grid patches, so that each patch is placed appropriately in the global XWindow space. This allows for real-time animation of time-evolving data on distributed grids.
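The coordination step is worth making concrete. Because every patch lives in one shared global index space, each process can compute where its patch belongs in the common window from purely local information; no data exchange is needed beyond the initial window-ID broadcast. The names and the uniform pixels-per-cell scaling below are illustrative, not the DistributedXWindow implementation.

```cpp
// Map a patch's lower corner, given in global grid indices, to pixel
// coordinates in the shared window. Each process calls this for its own
// patch; the shared index space guarantees the placements are consistent.
struct PixelOrigin { int px, py; };

PixelOrigin patchOrigin(int globalLoX, int globalLoY,   // global domain corner
                        int patchLoX, int patchLoY,     // this patch's corner
                        int pixelsPerCell) {            // assumed uniform scale
    return PixelOrigin{ (patchLoX - globalLoX) * pixelsPerCell,
                        (patchLoY - globalLoY) * pixelsPerCell };
}
```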
FastIndex macros that have been added to LPARX provide a workaround: at the cost of slightly ungainly code, one can write C++ numerical kernels and regain the factor of 3 to 4 in execution speed. There is some promise more generally that the Photon C++ compilers being developed by Kuck and Associates may remedy such problems. Their benchmarks on the "Haney kernels" are promising, although I have yet to test their compiler.
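The kind of workaround such macros enable looks roughly like the following (this is an illustration of the technique, not the actual FastIndex definition): hoist the raw data pointer out of the inner loop and index the flat storage directly, rather than calling an abstract element accessor on every iteration.

```cpp
#include <vector>

// Flat indexing into row-major 2-D storage; the ungainliness the text
// mentions is visible, but the inner loop is plain pointer arithmetic.
#define FAST_INDEX(ptr, nx, i, j) ((ptr)[(j) * (nx) + (i)])

// Fill a 2-D field with i + 10*j using the flat-indexing macro.
void fillField(std::vector<double>& field, int nx, int ny) {
    double* p = field.data();          // hoisted out of the loops
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i)
            FAST_INDEX(p, nx, i, j) = i + 10.0 * j;
}
```

The speedup comes from what the macro avoids: no function-call or bounds-checking overhead per element, and an addressing pattern the compiler can readily optimize.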
Importantly, KeLP does not present too different an interface to the applications programmer. A few new objects have been created (FloorPlan, MotionPlan, Mover), but the old objects (Point, Region, Grid, XArray) remain, and the API is quite familiar. The major differences, involving the use of MotionPlans and Movers to perform communication in KeLP, will tend to be localized to specific routines, and the high-level block-copy-on-intersection method of specifying communication patterns remains a cornerstone of the environment.
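The schedule-then-execute idea behind this style of communication can be sketched as follows. This is a hedged illustration of the general pattern, not KeLP's actual MotionPlan/Mover API: record the block copies implied by region intersections once, then replay the plan at every time step.

```cpp
#include <vector>

// A recorded block copy (1-D index ranges for brevity): copy elements
// lo..hi from one patch to another in a shared index space.
struct CopyOp { int srcPatch, dstPatch; int lo, hi; };

// The "plan": a reusable list of copies, built once from the geometry.
struct MotionPlanSketch {
    std::vector<CopyOp> ops;
    void add(int src, int dst, int lo, int hi) {
        ops.push_back({src, dst, lo, hi});
    }
};

// The "mover": execute every recorded copy against the patch data.
// In a distributed setting these copies would become messages.
void execute(const MotionPlanSketch& plan,
             std::vector<std::vector<double>>& patches) {
    for (const CopyOp& op : plan.ops)
        for (int i = op.lo; i <= op.hi; ++i)
            patches[op.dstPatch][i] = patches[op.srcPatch][i];
}
```

Separating the plan from its execution lets the (potentially expensive) analysis of intersections be amortized over many time steps.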
KeLP is accompanied by a finite-difference API, or it can be used as a substrate for the development of other problem-specific APIs. But KeLP, like LPARX, uses preprocessors and macro substitution, in lieu of C++ templates, to enable the use of different data types and dimensionalities. I feel that a general-purpose API for finite-difference methods needs to allow flexibly for the creation of grids of different objects, and the preprocessor approach currently in place is rather limiting in that regard. So I have been working to develop a templatized overlayer to sit on top of KeLP. This has involved defining template classes which, for specific dimensionalities and datatypes, derive from the concrete classes that KeLP defines. Parts of this have been easy and straightforward, while other parts have been painful and (as yet) unsuccessful, partly (I suspect) because the software was not designed to be extended in such a manner.
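The shape of that overlayer can be sketched as follows; all names here are hypothetical. The "concrete" class stands in for what the preprocessor would emit for one fixed datatype and dimensionality, and the template layer derives from it so that generic application code can be written once against the template.

```cpp
#include <vector>
#include <cstddef>

// As if macro-generated for (double, 2-D): one of many nearly identical
// concrete classes the preprocessor approach produces.
class ConcreteGrid2Double {
public:
    ConcreteGrid2Double(std::size_t nx, std::size_t ny)
        : nx_(nx), data_(nx * ny, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data_[j * nx_ + i]; }
protected:
    std::size_t nx_;
    std::vector<double> data_;
};

// The templatized overlayer: generic code targets TemplatedGrid2<T>;
// each specialization inherits the corresponding concrete class.
template <typename T> class TemplatedGrid2;   // other T's specialized similarly

template <> class TemplatedGrid2<double> : public ConcreteGrid2Double {
public:
    using ConcreteGrid2Double::ConcreteGrid2Double;  // inherit constructors
    void fill(double v) { for (double& x : data_) x = v; }
};
```

The pain point is visible even in this sketch: the template layer can only expose what the concrete classes happen to make protected or public, which is where a library not designed for such extension tends to resist.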
Why would such an infrastructure be useful? As an example, explicit and implicit time-stepping schemes are not conceptually so different, but their implementations are vastly different: the latter involves the solution of a coupled set of (possibly nonlinear) equations, which can be nontrivial on a distributed memory computer. Furthermore, different implicit schemes may require alternative solution methods; for example, the optimal algorithm may depend dramatically on the range of a spatial derivative operator. I envision, therefore, that one be able to specify the details of the finite-difference approximations and time-stepping schemes in terms of objects such as stencils and molecules, without having to work through the algebraic consequences of such approximations prior to coding. A system of equations would then be generated (and solved) using such information. This obviously requires the definition of an object describing a system of equations that can interface (at a lower level) with a number of different solvers; the potential for interoperability with existing libraries such as IML++ and ScaLAPACK certainly exists here, although I have not explored such avenues in any detail.
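A minimal sketch of the stencil-as-object idea, with illustrative names: the stencil records offsets and coefficients, and the same object that a time-stepper applies explicitly could also supply the matrix rows of an implicit system.

```cpp
#include <vector>
#include <cstddef>

// A 1-D stencil: coefficient c[k] multiplies u[i + offset[k]].
struct Stencil1D {
    std::vector<int> offsets;
    std::vector<double> coeffs;
};

// Second-order Laplacian on a unit-spaced grid:
// u''(x_i) ~ u[i-1] - 2 u[i] + u[i+1].
Stencil1D laplacian1D() {
    return Stencil1D{{-1, 0, 1}, {1.0, -2.0, 1.0}};
}

// Explicit application at an interior point i: what an explicit
// time-stepper would use. The same offsets/coefficients could instead
// populate row i of an implicit system matrix.
double apply(const Stencil1D& s, const std::vector<double>& u,
             std::size_t i) {
    double r = 0.0;
    for (std::size_t k = 0; k < s.offsets.size(); ++k)
        r += s.coeffs[k] * u[i + s.offsets[k]];
    return r;
}
```

The point is that the finite-difference approximation is specified once, as data, and both the explicit and implicit code paths consume it.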
There are performance considerations that must be dealt with in such a system. My stencil object would allow one to specify, for example, a finite-difference Laplacian operator (to a given order of accuracy) in the same manner that one would describe the stencil for that operator. Queries to the stencil object would provide the numerical coefficients required to calculate the Laplacian. Unfortunately, this type of query is likely to be buried in an inner loop, and accessing the stencil elements will impose a greater overhead than would arise in the case where the stencil coefficients are hard-coded into the subroutine. Whether or not such a general-purpose stencil can be efficiently implemented remains to be seen. Perhaps a solution involving symbolic code generation (with hard-wired coefficients) will prove to be an effective compromise.
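One form the code-generation compromise could take is sketched below (illustrative names, assumed design): query the stencil object once, outside the loop, and bake its coefficients into a hard-wired kernel so the inner loop pays no lookup overhead.

```cpp
#include <vector>
#include <functional>
#include <cstddef>

// Coefficients of a three-point stencil at offsets i-1, i, i+1.
struct Stencil3 { double cm, c0, cp; };

// "Generated" kernel: the stencil queries happen once, here; the
// returned closure captures the coefficients by value, so the inner
// loop below is plain arithmetic with effectively hard-wired constants.
std::function<void(const std::vector<double>&, std::vector<double>&)>
makeKernel(const Stencil3& s) {
    double cm = s.cm, c0 = s.c0, cp = s.cp;   // hoisted stencil queries
    return [cm, c0, cp](const std::vector<double>& u,
                        std::vector<double>& out) {
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            out[i] = cm * u[i - 1] + c0 * u[i] + cp * u[i + 1];
    };
}
```

A true symbolic-code-generation scheme would emit and compile source instead of a closure, but the principle is the same: generality at setup time, hard-coded coefficients in the inner loop.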