Optimising the processing and storage of visibilities using lossy compression

  • Richard Dodson
  • Alexander Williamson
  • Qian Gong
  • Pascal Elahi
  • Andreas Wicenec
  • María J. Rioja
  • Jieyang Chen
  • Norbert Podhorszki
  • Scott Klasky
  • Martin Meyer

Research output: Contribution to journal, Article, peer-review

Abstract

The next-generation radio astronomy instruments are providing a massive increase in sensitivity and coverage, largely through increasing the number of stations in the array and the frequency span sampled. The two primary problems encountered when processing the resultant avalanche of data are the need for abundant storage and the constraints imposed by I/O, as I/O bandwidths drop significantly on cold storage. An example of this is the data deluge expected from the SKA Telescopes of more than 60 PB per day, all to be stored on the buffer filesystem. While compressing the data is an obvious solution, the impacts on the final data products are hard to predict. In this paper, we chose an error-controlled compressor, MGARD, and applied it to simulated SKA-Mid and real pathfinder visibility data, in noise-free and noise-dominated regimes. As the data have an implicit error level in the system temperature, using an error bound in compression provides a natural metric for compression. MGARD ensures that the compression-incurred errors adhere to the user-prescribed tolerance. To measure the degradation of images reconstructed from the lossy compressed data, we proposed a list of diagnostic measures and, through a series of experiments, explored the trade-off between these error bounds and the corresponding compression ratios, as well as the impact on the science quality derived from the lossy compressed data products. We studied the global and local impacts on the output images for continuum and spectral line examples. We found that relative error bounds of as much as 10%, which provide compression ratios of about 20, have a limited impact on the continuum imaging, as the increased noise is less than the image RMS, whereas a 1% error bound (compression ratio of 8) introduces an increase in noise of about an order of magnitude less than the image RMS. For extremely sensitive observations and for very precious data, we would recommend an error bound with compression ratios of about 4. These have noise impacts two orders of magnitude less than the image RMS levels. At these levels, the limits are due to instabilities in the deconvolution methods. We compared the results to the alternative compression tool DYSCO, in terms of both the impacts on the images and the relative flexibility. MGARD provides better compression for similar error bounds and has a host of potentially powerful additional features.
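
The trade-off between error bound and compression ratio described in the abstract can be illustrated with a minimal Python sketch. This is not MGARD and does not use its API: it swaps in a plain uniform quantiser followed by zlib, applied to synthetic Gaussian noise standing in for noise-dominated visibilities. The function names (`compress`, `decompress`, `trade_off`) are hypothetical, and the relative bound is assumed here to be measured against the value range of the data, which may differ from the definition used in the paper.

```python
import zlib

import numpy as np


def compress(data, rel_bound):
    """Quantise `data` so each sample stays within a tolerance, then deflate.

    Assumption: the relative bound is taken relative to the data's value range.
    """
    eps = rel_bound * float(data.max() - data.min())  # absolute tolerance
    # Rounding to a grid of step 2*eps keeps every sample within eps of its original value.
    codes = np.round(data / (2.0 * eps)).astype(np.int32)
    return zlib.compress(codes.tobytes(), 9), eps


def decompress(payload, eps, shape):
    """Invert `compress`: inflate the integer codes and rescale them to the grid."""
    codes = np.frombuffer(zlib.decompress(payload), dtype=np.int32)
    return codes.reshape(shape).astype(np.float64) * (2.0 * eps)


def trade_off(data, rel_bound):
    """Return (maximum absolute error, absolute tolerance, compression ratio)."""
    payload, eps = compress(data, rel_bound)
    restored = decompress(payload, eps, data.shape)
    max_err = float(np.max(np.abs(restored - data)))
    ratio = data.nbytes / len(payload)
    return max_err, eps, ratio


# Synthetic stand-in for noise-dominated data (not real visibilities).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=100_000)

for rel_bound in (0.001, 0.01, 0.1):
    max_err, eps, ratio = trade_off(noise, rel_bound)
    print(f"bound={rel_bound:g}  tolerance={eps:.4f}  max error={max_err:.4f}  ratio={ratio:.1f}")
```

The sketch only demonstrates the qualitative behaviour: the reconstruction error never exceeds the requested tolerance, and looser bounds yield higher compression ratios. The ratios it prints are not comparable to those reported for MGARD or DYSCO.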

Original language: English
Article number: e093
Journal: Publications of the Astronomical Society of Australia
Volume: 42
State: Published - Jul 29 2025

Keywords

  • techniques: interferometric
  • astronomical instrumentation
  • methods and techniques
  • methods: data analysis
