Abstractions, Accomodations and Applications:
Thoughts on Developing Object-Oriented Software Using Someone Else's Class Library

Chris Myers, Computational Science and Engineering Research Group , Cornell Theory Center

I. Introduction

Despite the fact that object-oriented programming has the potential to facilitate code reuse by enabling the creation of high-level, problem-specific object environments, such methods have not been widely adopted within the computational science community. (Curiously, the difficulties associated with programming on parallel architectures seems to have spurred recent developments in object definition.) There has not been to date a great deal of effort within the community to agree upon common abstractions and object definitions that reflect the underlying structure of the scientific and mathematical problems at hand. Such activities can be nontrivial, requiring expertise in applications, algorithms and fundamental aspects of computer science. Without intuitive and useful abstractions, however, widespread object reuse will continue to be rather slow in developing. The need for such breadth of expertise suggests that affiliated groups can exploit the hierarchical nature of object-oriented languages to focus on individual layers within a larger software environment, whereby appropriately chosen object abstractions can enable accomodations at object interfaces.

As an example of such an affiliation, I have been using (since mid-1994) the LPARX programming system developed by Scott Baden, Scott Kohn and co-workers at UC-San Diego, and very recently I have been experimenting with KeLP, the imminent successor to LPARX, being developed by Baden and Steve Fink. Although I interact considerably with Baden and his group, my efforts have largely been those of an interested and grateful user: interested because I find that LPARX/KeLP does a nice job of creating "intuitive and useful abstractions" for a certain class of problems, and grateful because their high-level constructs spare me from many of the low-level details of message passing that plague much of distributed memory programming. These abstractions and constructs give me the freedom to work toward developing still higher-level objects, at the algorithmic and application levels, in which some understanding and appreciation of the mathematical and scientific phenomena of interest can be put to use. I am developing applications programming interfaces (APIs) to sit on top of LPARX/KeLP, to enable the solution of partial differential equations (PDEs) on parallel computers. I am also incorporating training about parallel object-oriented methods in general and LPARX in particular into the Cornell Theory Center's educational materials.

II. Experiences with LPARX and KeLP

I am interested in solving a broad range of nonlinear, time-dependent PDEs arising in models of physical systems, and have recently studied: frictional slip on earthquake faults, shape deformation and memory effects in martensites, defect formation and electrical noise generation in sliding charge density wave conductors, and instabilities in rapid fracture. The dimensionalities, geometries, boundary conditions and nonlinearities of such models can vary, but the basic algorithmic structure that I have relied on remains constant (finite difference methods, parallelized via coarse-grained domain decomposition, with alternating computation and communication steps).

LPARX has aided me considerably in developing parallelized codes for solving PDEs. This is because LPARX: (1) provides useful abstractions for the manipulation of block-structured domains (primarily through its Region calculus) and (2) uses those abstractions to provide a high-level programming interface for interprocessor communication of the sort that arises in such calculations (block-copy-on-intersection). I will not describe the details of LPARX, but refer rather to the user's guide and related documentation. Instead, I will describe some issues that have arisen in my use of LPARX, to illustrate how such a package can be used and extended.

A. Grids of user-defined objects: simulations of elastic media

Shape-memory alloys are "smart" materials that can be "trained" to remember various shapes by applying particular thermal or stress cycles. With Jim Sethna of Cornell's Laboratory of Atomic and Solid State Physics (LASSP), I have been investigating a two-dimensional Ginzburg-Landau model describing the time-evolution of a nonlinear, deformable elastic medium, intended to describe such shape changes. The model can be phrased as a PDE describing the time evolution of a two-component displacement field, which we integrate with finite-difference methods, requiring us to specify the displacement of each site of a square lattice.

Because LPARX supports the creation of distributed grids of arbitrary objects (and provides support for interprocessor communication of those objects), it is reasonably straightforward to develop a C++ class describing a two-component real vector field and then to create a parallelized grid of such elements (an LPARX XArray), distributed across multiple processors according to a specified decomposition. Simple LPARX applications often work directly with XArrays, but I found it useful to work with a derived class, a DistributedCartesianMesh (publicly derived from XArray1(Grid2(Vec2)), because there are operations that need to be performed on the DistributedCartesianMesh that are not appropriate for the more general XArray. For example, I need to compute strains and local strain energies, but those computations require other information (such as parameters in the Ginzburg-Landau model) that have no meaning in the context of the more general distributed grid.

For simple user-defined objects (containing no pointers), LPARX utilities generate all the necessary code to pass data between processors. In this example, boundary data needs to be passed between neighboring blocks at each time step, involving the communication of one-dimensional arrays of two-component vectors (the faces of each block). Communication of such objects is carried out with the same high-level programming interface as is used for communication of native datatypes.

B. Multigrid and related multilevel methods

The example cited above involves a single uniform distributed grid, but the LPARX group emphasizes that its software is well-suited to algorithms requiring multilevel structured grids, such as multigrid and adaptive mesh refinement (AMR) with nested structured grids. LPARX facilitates the manipulation of single-level grid-like objects because it provides a structural abstraction which separates the layout of the data from the data itself. In order for multilevel calculations to be simple, a software environment needs to implement the same sort of structural abstraction. Therefore, I have been developing multilevel "meshes" describing data layouts and multilevel "fields" describing data on those meshes in a manner analogous to the single-level objects defined in LPARX. I have begun developing multigrid solvers to be used in conjunction with implicit time-stepping schemes for PDEs, with the eventual goal of using structured AMR to resolve localized, coherent dynamical structures that arise in many driven, nonlinear systems.

C. Distributed animation

Graphical displays are extremely useful for developing and debugging software and exploring dynamics in spatially-extended systems. This is even truer for methods using domain decomposition on parallel architectures, where errors that arise are often "geometric" in nature (incorrect communication patterns between blocks, lack of synchronization between processors, etc.). The LPARX distribution contains a simple XWindow class that one can use, e.g., to display the values of a field defined on a grid. The XWindow class is somewhat unwieldy, however, if one wishes to display data distributed across multiple processors. One can designate a processor to be the "visualization node" that displays all such data, but then one must pass alot of data from all processors to the visualization node. Alternatively, one can have every process open an XWindow, and display only its portion of the total data.

A cleaner solution, exploiting the client-server nature of X Windows, involves deriving a DistributedXWindow class from the base XWindow class and arranging for all processes to draw to a common X Window (via a broadcast of the Window ID from one processor to all the others). As long as each processing node is capable of making X client calls (as is the case on our SP2), visualization can be done without the need for expensive data copies to a master node. LPARX's support for a shared grid index space enables the coordination of all the grid patches, so that each patch is placed appropriately in the global XWindow space. This allows for real-time animation of time-evolving data on distributed grids.

D. Performance issues

LPARX has been structured to interoperate with Fortran numerical kernels that efficiently compute on individual grid patches, and the LPARX group uses such an approach in the applications it develops. I have wanted to avoid writing in Fortran, so I have worked to develop applications entirely in C++ without suffering (excessive) performance penalties. Unfortunately, even basic compiler optimizations like strength reduction to efficiently calculate addresses of multidimensional arrays do not get carried out by most C++ compilers. As a result, inner loops which iterate through such an array can be slowed down by a factor of 3 to 4. Fortunately, the FastIndex macros which have been added to LPARX provide a workaround; as a result, with the slightly ungainly code that results, one can write C++ numerical kernels and regain the factor of 3 to 4 in execution speed. There is some promise more generally that the Photon C++ compilers being developed by Kuck and Associates may remedy such problems. Their benchmarks on the "Haney kernels" are promising, although I have yet to test their compiler.

III. Current and future directions

A. Whither LPARX? KeLP washes ashore.

KeLP, the soon-to-be successor to LPARX, will improve interprocessor communication considerably by using an inspector/executor model rather than the Asynchronous Message Streams which underly LPARX. I have been working with a pre-release version of KeLP for a couple of months.

Importantly, KeLP does not present too different an interface for the applications programmer. A few new objects have been created (FloorPlan, MotionPlan, Mover), but the old objects (Point, Region, Grid, XArray) remain, and the API is quite familiar. The major differences, involving the use of MotionPlans and Movers to perform communication in KeLP, will tend to be localized to specific routines, and the high-level block-copy-on-intersection method of specifying communication patterns remains as a cornerstone of the environment.

KeLP is accompanied by a finite-difference API, or it can be used as a substrate for the development of other problem-specific APIs. But KeLP, like LPARX, uses preprocessors and macro substitution to enable the use of different data types and dimensionalities, in lieu of C++ templates. I feel that a general-purpose API for finite-difference methods needs to allow flexibly for the creation of grids of different objects, and the preprocessor approach currently in place is rather limiting in that regard. So I have been working to develop a templatized overlayer to sit on top of KeLP. This has involved defining template classes, which - for specific dimensionalities and datatypes - derive from the concrete classes that KeLP defines. Parts of this have been easy and straightforward, while other parts have been painful and (as yet) unsuccessful, partly (I suspect) because the software was not designed to be extended in such a manner.

B. Some further abstractions for solving PDEs

Let me conclude by mentioning some other abstractions that I feel should be incorporated into a general-purpose API for integrating PDEs using finite-difference methods and multilevel techniques. I perceive a need for an environment that supports the flexible specification and manipulation of various time-stepping schemes and finite-difference approximations. The theory of finite-difference methods has developed a rich and convenient syntax - centered around stencils and computational molecules - used to derive and explain specific approximations and schemes. Computational objects that reflect this syntax would benefit the applications programmer who wishes to implement specific algorithms or explore the relative performance of different algorithms.

Why would such an infrastructure be useful? As an example, explicit and implicit time-stepping schemes are not conceptually so different, but their implementations are vastly different, the latter involving the solution of a coupled set of (possibly nonlinear) equations, which can be nontrivial on a distributed memory computer. Furthermore, different implicit schemes may require alternative solution methods: for example, the optimal algorithm may depend dramatically on the range of a spatial derivative operator. I envision, therefore, that one be able specify the details of the finite-difference approximations and time-stepping schemes in terms of objects such as stencils and molecules, without having to work through the algebraic consequences of such approximations prior to coding. A system of equations will be generated (and solved) using such information. This obviously requires the definition of an object describing a system of equations which can interface (at a lower level) with a number of different solvers; the potential for interoperability with existing libraries such as IML++ and ScaLAPACK certainly exists here, although I have not explored such avenues in any detail.

There are performance considerations that must be dealt with in such a system. My stencil object would allow one to specify, for example, a finite-difference Laplacian operator (to a given order of accuracy) in the same manner that one would describe the stencil for that operator. Queries to the stencil object would provide the numerical coefficients required to calculate the Laplacian. Unfortunately, this type of query is likely to be buried in an inner loop, and accessing the stencil elements will impose a greater overhead than would arise in the case where the stencil coefficients are hard-coded into the subroutine. Whether or not such a general-purpose stencil can be efficiently implemented remains to be seen. Perhaps a solution involving symbolic code generation (with hard-wired coefficients) will prove to be an effective compromise.