Automatic Generation of Algorithms for High-Speed Reliable Lossy Compression


Martin Burtscher (PI)
Sheng Di (Co-PI)
Franck Cappello (senior advisor)
Noushin Azami (Ph.D. student)
Brandon Burtchell (Ph.D. student)
Alex Fallin (Ph.D. student)
Yiqian Liu (Ph.D. student)
Kai Zhao (collaborator)

Project Summary

Fast, reliable data compression is urgently needed for many leading-edge scientific instruments and for exascale high-performance computing applications because they produce vast amounts of data at extremely high rates. The goal of this project is to develop a high-speed, reliable lossy-compression framework named LC that meets three critical needs: (i) improving the trustworthiness of lossy compression methods and the data reduction quality, (ii) increasing the compression/decompression speed to match the high data generation/acquisition rates, and (iii) supporting progressive compression and decompression with multiple levels of resolution to meet the demands of today's leading scientific applications and instruments.

The project comprises the following three research thrusts.

(1) To address the trustworthiness and data-reduction-quality issues, the LC framework will allow users to synthesize customized algorithms for the coding stage in the lossy compression pipeline, optimizing the quality-of-interest preservation and the compression ratio. To this end, LC will provide a very large tradeoff space with numerous coding algorithms to choose from and will automatically emit the code of the optimal configuration with reliable execution-time bounds.

(2) To address the speed challenge, we will develop lightweight error-bounded decorrelation strategies, high-speed data predictors, efficient quantization methods, and a new class of encoders called 'essentially lossless' that will compress faster and better than the current state of the art. We will also parallelize the LC framework as well as the generated compression/decompression codes for both CPUs and GPUs and will create algorithms that are portable across heterogeneous architectures.

(3) To enable users to build their own multi-resolution progressive compressors, we will extend LC to support the generation of progressive algorithms that adaptively meet user requirements by employing a hierarchical block-wise tree-based structure that can suppress subtrees on demand.

The resulting fast, reliable lossy compression framework will greatly benefit the many scientific applications that need not only high trustworthiness but also high performance.
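To illustrate the error-bounded guarantee that thrust (2) refers to, the following sketch shows absolute-error-bounded uniform quantization, a standard building block of lossy scientific compressors. This is a minimal illustrative example, not LC's actual quantizer: every value is mapped to an integer bin of width twice the user's error bound, so each reconstructed value is guaranteed to lie within that bound of the original.

```python
def quantize(values, eb):
    """Map each value to an integer bin index; bins have width 2*eb,
    so nearest-bin rounding keeps the reconstruction error within eb."""
    return [round(v / (2 * eb)) for v in values]

def dequantize(bins, eb):
    """Reconstruct approximate values from the bin indices."""
    return [b * 2 * eb for b in bins]

# Example: an absolute error bound of 0.01.
data = [1.503, 2.978, 4.511, 0.006]
eb = 0.01
recon = dequantize(quantize(data, eb), eb)
assert all(abs(a - b) <= eb for a, b in zip(data, recon))
```

The integer bin indices are highly repetitive for smooth scientific data, which is what makes the subsequent coding stage effective; real compressors typically quantize prediction residuals rather than raw values.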

DOE Press release

Texas State press release

Overview slide


Publications

N. Azami and M. Burtscher. "Compressed In-memory Graphs for Accelerating GPU-based Analytics." 12th SC Workshop on Irregular Applications: Architectures and Algorithms. November 2022. [pdf] [pptx]

Code Releases

Compressed In-memory Graphs for Accelerating GPU-based Analytics: MPLG code

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), under contracts DE-SC0022223 and DE-AC02-06CH11357.

Official Texas State University Disclaimer