Abstract
A crude but commonly used technique for compressing ordered scientific data consists of simply retaining every s-th datum (with a value of s = 10 generally the default) and discarding the remainder. Should the value of a discarded datum be required afterwards, an approximation is generated by linear interpolation of the two nearest retained values. Despite the widespread use of this and similar techniques, there is little by way of theoretical analysis of their expected performance. First, we quantify the accuracy achieved by linear interpolation when approximating values discarded by decimation, obtaining both deterministic bounds in terms of appropriate smoothness measures of the data and probabilistic bounds in terms of statistics of the data. Second, we investigate the efficiency of the lossless compression scheme consisting of decimation coupled with encoding of the interpolation errors. In particular, we bound the expected compression ratio in terms of the appropriate measures of the data. Finally, we provide numerical illustrations of the practical performance of the algorithm on some real datasets.
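The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the data are a non-empty one-dimensional integer array, the final sample is always retained so that every discarded datum lies between two retained values, and predictions are rounded to the nearest integer so that the residuals are exact. The function names `decimate`, `predict`, `residuals`, and `reconstruct` are hypothetical, and the entropy coding of the residuals is omitted.

```python
import numpy as np

def decimate(data, s=10):
    """Retain every s-th sample; also keep the last one (an assumption of this sketch)."""
    data = np.asarray(data)
    idx = np.arange(0, len(data), s)
    if idx[-1] != len(data) - 1:
        idx = np.append(idx, len(data) - 1)
    return idx, data[idx]

def predict(idx, kept, n):
    """Linearly interpolate the retained samples onto all n positions, rounded to integers."""
    approx = np.interp(np.arange(n), idx, kept)
    return np.rint(approx).astype(np.int64)

def residuals(data, idx, kept):
    """Interpolation errors; storing these alongside the retained samples makes the scheme lossless."""
    data = np.asarray(data, dtype=np.int64)
    return data - predict(idx, kept, len(data))

def reconstruct(idx, kept, res):
    """Recover the original data exactly from the retained samples and the residuals."""
    return predict(idx, kept, len(res)) + res

# Smooth integer-valued test signal (synthetic, for illustration only).
data = np.rint(1000 * np.sin(np.linspace(0.0, 3.0, 101))).astype(np.int64)
idx, kept = decimate(data, s=10)
res = residuals(data, idx, kept)
restored = reconstruct(idx, kept, res)
```

For smooth data the residuals are small in magnitude compared with the data themselves, which is what makes them cheap to encode. Dropping `res` and using `predict` alone gives the lossy variant.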
Original language | English |
---|---|
Pages (from-to) | B732-B757 |
Journal | SIAM Journal on Scientific Computing |
Volume | 39 |
Issue number | 4 |
State | Published - 2017 |
Funding
∗Submitted to the journal’s Computational Methods in Science and Engineering section July 25, 2016; accepted for publication (in revised form) April 4, 2017; published electronically August 30, 2017. http://www.siam.org/journals/sisc/39-4/M108624.html

Funding: This work was partially supported by DOE Storage Systems and Input/Output for Extreme Scale Science project, announcement LAB 15-1338, and DOE and UT–Battelle, LLC, contract DE-AC05-00OR22725.

†Division of Applied Mathematics, Brown University, 182 George St., Providence, RI 02912 and Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 (mark [email protected]).

‡Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 ([email protected]).

§Division of Applied Mathematics, Brown University, 182 George St., Providence, RI 02912 (ben [email protected]).
Keywords
- 68P30
- 94A24
- Decimation
- Lossless compression
- Lossy compression
- Predictive coding
- AMS subject classifications: 95B65