Excessive energy consumption is a major constraint in designing and deploying the next generation of supercomputers. Minimizing the energy consumption of high-performance computing requires novel energy-conscious technologies at multiple layers, from architecture and system support to applications. One obstacle to exploring these technologies is the lack of tools and systems that provide accurate, fine-grained, real-time power and energy measurements for evaluating and verifying them.

This project bridges that gap by building Marcher, a heterogeneous high-performance computing infrastructure equipped with power-efficient accelerators, including Intel Many Integrated Core (MIC) coprocessors and Nvidia graphics processing units (GPUs), power-aware memory systems, hybrid storage that combines hard disk drives and solid-state disks, and high-performance interconnects. The Marcher system supports the development of two complementary component-level power measurement tools for the major computer components: (i) a pluggable Power Data Acquisition Card (PODAC) for direct, decomposed power measurement, and (ii) a Software Power Meter (SoftMeter) that indirectly estimates the power consumption of systems where direct measurement is infeasible or too costly.

Upon completion of this project, both PODAC and SoftMeter will be made available to the broader research community so that other groups can build their own power-aware systems. Marcher will be open to external research groups and will provide users with comprehensive, detailed performance and power profiles to aid research in energy-efficient software design and system development.

This project also addresses the energy efficiency of parallel algorithm execution. High energy cost is a salient constraint when running large-scale parallel applications on next-generation supercomputers that contain heterogeneous multicore processors and interconnects. This constraint motivates rethinking conventional approaches to modeling, designing, and scheduling parallel tasks so that they take energy efficiency into account. In this project, we collaborate with Marquette University and UC-Riverside to explore energy-efficient parallel task design and scheduling, and to develop a power profiling tool that measures the decomposed runtime power consumption of individual computing components (e.g., processors, memory, networks, and disks).
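
To make the idea of decomposed power measurement concrete, the following minimal sketch shows how per-component power samples could be turned into a per-component energy breakdown. It is an illustration only, not the PODAC or SoftMeter implementation: the sample format (timestamp in seconds, power in watts), the component names, the function names, and all numeric values are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical sample format: (timestamp in seconds, power in watts).
Sample = Tuple[float, float]


def energy_joules(samples: List[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule (result in joules)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_by_component(trace: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Compute the energy of each component from its own power trace."""
    return {name: energy_joules(samples) for name, samples in trace.items()}


# Made-up two-second trace for three components (illustrative values only).
trace = {
    "cpu": [(0.0, 40.0), (1.0, 60.0), (2.0, 60.0)],
    "memory": [(0.0, 10.0), (2.0, 10.0)],
    "disk": [(0.0, 5.0), (1.0, 7.0), (2.0, 5.0)],
}

breakdown = energy_by_component(trace)
total = sum(breakdown.values())

for name, joules in breakdown.items():
    print(f"{name}: {joules:.1f} J ({100.0 * joules / total:.1f}% of total)")
```

Because each component is integrated separately, the traces do not need to share a sampling rate, which matters when some components are measured directly and others are estimated in software.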