[3dem] volume compression standard
Greg Pintilie
gregdp at gmail.com
Wed Apr 29 06:29:41 PDT 2026
> On Apr 29, 2026, at 3:52 AM, Tim Gruene via 3dem <3dem at ncmir.ucsd.edu> wrote:
>
> On Tue, 28 Apr 2026 15:40:23 +0000
> "Ludtke, Steven J. via 3dem" <3dem at ncmir.ucsd.edu> wrote:
>
>> Also, just to point out that float16 is not a great choice for final
>> maps for several different reasons:
>
> Interesting discussion. Could anyone explain the reason for storing
> data as floats? There are detectors that produce output as floats, like
> charge integrating detectors used at free electron lasers, but for TEM,
> they are not used, are they? And most detectors produce integer output,
> don't they? And the needed dynamic range is probably not very high in
> TEM imaging either, is it?
>
I’ll have a go… Technically, a 10-bit int and a 10-bit float can represent almost the same number of unique values (the float reserves some bit patterns for NaN, inf, etc.). With a float, the representable values are closely spaced around 0 (because of the exponent) and spaced further apart for large magnitudes. A float can hence represent both larger and smaller numbers than an int of the same width, but it is not just about the dynamic range, as you say: floats trade uniform precision for dynamic range, and give better precision around 0 in particular. Which one is appropriate for what, and why instruments use ints, I am not sure, but for numerical processing float probably wins.
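To make the spacing point concrete, here is a minimal sketch using only the Python standard library. It uses 16-bit values rather than the 10-bit ones mentioned above, simply because IEEE 754 half precision is what the struct module supports; the helper names are made up for illustration.

```python
import struct

def f16_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def f16_bits(x: float) -> int:
    """Round x to the nearest half-precision float and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def f16_spacing(x: float) -> float:
    """Gap between the float16 nearest to x (x > 0) and the next larger float16."""
    bits = f16_bits(x)
    return f16_from_bits(bits + 1) - f16_from_bits(bits)

def count_finite_f16() -> int:
    """Count bit patterns that are finite numbers (exponent field not all ones)."""
    return sum(1 for b in range(1 << 16) if (b >> 10) & 0x1F != 0x1F)

if __name__ == "__main__":
    # An int16 uses all 65536 patterns, always with a spacing of exactly 1.
    print("finite float16 patterns:", count_finite_f16(), "of", 1 << 16)
    for x in (0.001, 1.0, 1000.0):
        print(f"float16 spacing near {x}: {f16_spacing(x):.3g}")
```

The float16 grid is about a million times finer near 0.001 than near 1000, whereas an int16 has the same step everywhere.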
I like a lot that Steven pointed out that, visually and in terms of information content, a float16 and a float10 make very little difference. However, in implementing various numerical methods, we get better results when using the maximum precision possible at the start and during processing, even if the input data does not have this precision to begin with, because of the intermediate averaging, interpolation, etc. Having the highest precision in the input also helps. I.e. if I start from a compressed image and process it, the results may (and probably will) be worse than if I start from an uncompressed image (unless the compression is lossless, of course).
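A toy illustration of why precision during processing matters, even when the input itself is low precision: averaging 4096 values that are all exactly 1.0. This is only a sketch with made-up helper names; it emulates a float16 accumulator by rounding through the struct module's half-precision format after every addition.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mean_f16_accumulator(values):
    """Average with a running sum that is rounded to float16 after each add."""
    total = 0.0
    for v in values:
        total = to_f16(total + to_f16(v))
    return total / len(values)

def mean_f64_accumulator(values):
    """Average of the same float16 inputs, accumulated in double precision."""
    return sum(to_f16(v) for v in values) / len(values)

if __name__ == "__main__":
    data = [1.0] * 4096  # every input is exactly representable in float16
    print("float16 accumulator:", mean_f16_accumulator(data))
    print("float64 accumulator:", mean_f64_accumulator(data))
```

The float16 running sum stops growing at 2048, because 2049 is not representable and rounds back to 2048, so the computed mean comes out as 0.5 instead of 1.0. The inputs are identical in both cases; only the working precision differs.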
There is also an argument for needing the full noisy volume, rather than a masked one, because some methods try to characterize the background noise, e.g. in ‘denoising’ or local resolution estimation (ResMap, for example).
So, sorry Daniel, but you and all of us will probably have to keep waiting while uploading the highest-precision, full data rather than compressed data; at least that’s what I think method developers would want, and it is what archival purposes call for.
In terms of downloading the data for visualization purposes, compressed formats are amazing. I am still amazed seeing a 10MB HDF file look pretty much the same as a 2GB map in MRC format. I can also see why CZI is adopting the zarr format: for tomograms and SEM data you basically start seeing the image right away as you go through slices, e.g. with neuroglancer, rather than waiting to download a huge file. The datasets from SEM in particular can be huge, 100GB or even more than 1TB.
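A rough sketch of why a chunked layout lets a viewer show a slice right away: only the chunks that intersect the slice need to be fetched. This does not use the zarr library; it is just the index arithmetic, and the volume shape, chunk shape and function names are hypothetical.

```python
import math

def chunks_for_z_slice(shape, chunks, z):
    """Return (kz, ky, kx) grid indices of the chunks that intersect plane z."""
    nz, ny, nx = shape
    cz, cy, cx = chunks
    if not 0 <= z < nz:
        raise IndexError("z is outside the volume")
    kz = z // cz  # the single layer of chunks that contains this plane
    return [(kz, ky, kx)
            for ky in range(math.ceil(ny / cy))
            for kx in range(math.ceil(nx / cx))]

def fraction_fetched(shape, chunks, z):
    """Fraction of all chunks needed to display one z slice."""
    total = math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))
    return len(chunks_for_z_slice(shape, chunks, z)) / total

if __name__ == "__main__":
    shape = (1024, 1024, 1024)   # hypothetical volume, (z, y, x)
    chunks = (64, 64, 64)        # hypothetical chunk size
    needed = chunks_for_z_slice(shape, chunks, z=500)
    print(len(needed), "chunks needed,",
          f"{fraction_fetched(shape, chunks, 500):.4f} of the volume")
```

With these assumed sizes one slice touches 256 of 4096 chunks, and a viewer can cut that further by fetching only the chunks in the visible region or from a downsampled level.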
What would be great is if EMDB, say, would produce (or accept) various compressed formats as well as the maximum-precision data, so that the visualization software or developer gets to pick which one to download. Each format could be produced from the ‘raw’ data, rather than relying on the depositors to upload it... I think there are good libraries these days for the widely used formats, but of course maintaining and checking various formats and libraries is also not trivial.
One of my favorite quotes in CS is
“The wonderful thing about standards is that there are so many of them to choose from”
(by Andrew S. Tanenbaum in his book, Computer Networks, 1980)
Greg
> Best,
> Tim
>
> --
> --
> Tim Gruene
> Head of the Core Facility Crystal Structure Analysis
> Faculty of Chemistry
> University of Vienna
>
> Phone: +43-1-4277-70202
>
> https://ccsa.univie.ac.at
>
> GPG Key ID = A46BEE1A
> _______________________________________________
> 3dem mailing list
> 3dem at ncmir.ucsd.edu
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem