[3dem] volume compression standard
Ludtke, Steven J.
sludtke at bcm.edu
Tue Apr 28 08:24:06 PDT 2026
Yes, I'm not trying to make a big push for HDF5. It does work for this purpose, it is a usable cross-discipline standard, and CryoEM lacks a format which supports compression natively, but there are tradeoffs. I know CZII is trying really hard to push for Zarr, but I can't imagine they will get much more uptake than HDF5 ever did, as in several respects Zarr is even worse than HDF5.
However, it is quite possible to get similar results by using bit truncation to render volumes as 10 or 12 bit in MRC format and then applying your compression algorithm of choice. The key factor is that bit reduction discards the pure-noise bits, which also makes the data dramatically more compressible. Masks compress down to virtually nothing, which is critical if you want to do something like a binary mask representation of a tomogram segmentation.
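Why masks compress so well: a binary mask stored as float32 is almost entirely long runs of identical 4-byte values, which gzip's LZ77 stage encodes as short back-references. Below is a minimal sketch using a hypothetical hard-edged spherical mask (64³ voxels, radius 20, helper name `make_sphere_mask` invented for illustration). Real masks with soft edges contain many distinct values and will compress less well.

```python
import gzip
import struct

def make_sphere_mask(n, radius):
    """Hypothetical hard-edged binary spherical mask: n^3 voxels, values 0.0 or 1.0."""
    c = (n - 1) / 2.0
    r2 = radius * radius
    vals = []
    for z in range(n):
        for y in range(n):
            for x in range(n):
                inside = (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2 <= r2
                vals.append(1.0 if inside else 0.0)
    return vals

n = 64
mask = make_sphere_mask(n, radius=20)
raw = struct.pack(f"<{len(mask)}f", *mask)   # stored as float32, like an MRC mode-2 mask
packed = gzip.compress(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({len(packed) / len(raw):.1%})")
```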
We used GZIP (mainly) in that paper largely because it is so standard, and from a librarian/archivist perspective long term support is the most important aspect of an archive, but clearly there are better (faster and more effective) compression algorithms out there.
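Python's standard library ships three general-purpose lossless codecs, so a quick comparison is easy to run. This is a sketch on hypothetical data (Gaussian noise clipped to ±8σ and quantized to 12 bits in a 16-bit container); which codec wins, and by how much in size or speed, depends on the data and settings, so it is worth measuring on real maps.

```python
import bz2
import gzip
import lzma
import random
import struct
import time

# Hypothetical test data: Gaussian noise clipped to +/-8 sigma and quantized
# to 12 bits, stored in a 16-bit container (as a 16-bit MRC mode would hold it).
random.seed(1)
n = 64 * 64 * 16
levels = (1 << 12) - 1
ints = [round((min(max(random.gauss(0.0, 1.0), -8.0), 8.0) + 8.0) * levels / 16.0)
        for _ in range(n)]
raw = struct.pack(f"<{n}H", *ints)

codecs = {
    "gzip -9": (lambda b: gzip.compress(b, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda b: bz2.compress(b, compresslevel=9), bz2.decompress),
    "xz/LZMA": (lambda b: lzma.compress(b, preset=6), lzma.decompress),
}
results = {}
for name, (compress, decompress) in codecs.items():
    t0 = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - t0
    assert decompress(packed) == raw  # all three codecs are lossless
    results[name] = len(packed)
    print(f"{name:9s} {len(packed) / len(raw):6.1%} of raw, {elapsed * 1000:.0f} ms")
```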
The major takeaway is that you can both prove and demonstrate that 4-5 bits is sufficient to retain all information in noisy CryoEM images, and for final reconstructions 10-12 bits is demonstrably sufficient.
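One standard back-of-envelope way to see why so few bits suffice for noise-dominated images (not necessarily the argument used in the paper) models rounding as adding uniform error of step Δ to data whose noise has standard deviation σ. The clip range of ±8σ is an assumption; for Gaussian noise essentially no values fall outside it.

```latex
\begin{aligned}
e &\sim \mathcal{U}\!\left(-\tfrac{\Delta}{2}, \tfrac{\Delta}{2}\right),
\qquad \operatorname{Var}(e) = \frac{\Delta^2}{12},
\qquad \sigma_{\text{tot}}^2 = \sigma^2 + \frac{\Delta^2}{12} \\
\text{clip to } [-8\sigma, 8\sigma] \text{ with } b \text{ bits:}\quad
\Delta &\approx \frac{16\sigma}{2^{b}} \\
b = 4:\quad \Delta = \sigma,\qquad
\sigma_{\text{tot}} &= \sigma\sqrt{1 + \tfrac{1}{12}} \approx 1.041\,\sigma \\
b = 5:\quad \Delta = \tfrac{\sigma}{2},\qquad
\sigma_{\text{tot}} &= \sigma\sqrt{1 + \tfrac{1}{48}} \approx 1.010\,\sigma
\end{aligned}
```

Under this model, 4-5 bits add only a few percent to the noise already present; reconstructions, whose signal spans a much larger range relative to their noise, plausibly need correspondingly more bits.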
On Apr 28, 2026, at 9:42 AM, Daniel Larsson <daniel.larsson at icm.uu.se> wrote:
Thank you for the reply, Steven. Obviously greater minds than mine have thought about this before. EMAN2 has been a champion of file formats, and the HDF5 format is, as you say, a very flexible general-purpose data container format. I quickly skimmed your paper. As I understand it, one could reduce the bit depth from 32 bits to 12 bits while still maintaining the information content of the file for single-particle data. That is a significant saving. Even going to 16 bit, as Guillaume suggested on the CCPEM list, would cut the file size in half. Adding lossless compression could save even more, although noisy solvent regions are difficult to compress efficiently. Off-list, it was suggested that the bzip algorithm is able to compress further. /Daniel
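A minimal sketch of the bit-depth reduction discussed here, assuming pure Gaussian noise (σ = 1) as a stand-in for a noisy solvent region, a hypothetical clip range of ±8σ, and storage in a 16-bit container (MRC has no 12-bit mode, so 12-bit values are assumed to sit in 16-bit words whose unused high bits compress away). The helper names are hypothetical, not EMAN2 functions; exact sizes depend on the data.

```python
import gzip
import random
import struct

# Hypothetical stand-in for a noisy solvent region: pure Gaussian noise, sigma = 1.
random.seed(0)
N = 64 * 64 * 16
values = [random.gauss(0.0, 1.0) for _ in range(N)]

def quantize_to_bits(vals, bits, lo=-8.0, hi=8.0):
    """Clip to [lo, hi] and map linearly onto the integers 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    return [round((min(max(v, lo), hi) - lo) * scale) for v in vals]

def gzip_sizes_uint16(ints):
    """Pack as little-endian uint16 (a 16-bit container) and gzip; return (raw, gz) sizes."""
    raw = struct.pack(f"<{len(ints)}H", *ints)
    return len(raw), len(gzip.compress(raw))

float_raw = struct.pack(f"<{N}f", *values)
print(f"float32: {len(float_raw)} -> {len(gzip.compress(float_raw))} bytes")
for bits in (16, 12, 10, 8):
    raw_len, gz_len = gzip_sizes_uint16(quantize_to_bits(values, bits))
    print(f"{bits:2d}-bit : {raw_len} -> {gz_len} bytes")
```

Each bit dropped removes roughly one bit of incompressible noise per voxel, which is why the gzipped size keeps falling even though the 16-bit container size stays fixed.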
From: "Ludtke, Steven J." <sludtke at bcm.edu>
Date: Tuesday, 28 April 2026 at 15:07
To: Daniel Larsson <daniel.larsson at icm.uu.se>
Cc: "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>, Collaborative Computational Project in Electron cryo-Microscopy <CCPEM at JISCMAIL.AC.UK>
Subject: Re: [3dem] volume compression standard
Just for the sake of argument, I'll throw this out there:
https://pmc.ncbi.nlm.nih.gov/articles/PMC9645247/
gzipping only saves significant space if one discretizes the values stored in a float format or uses an integer representation.
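A small sketch of this point, assuming Gaussian noise as test data and a hypothetical grid step of σ/64: raw float32 noise barely compresses because its low mantissa bits are random, whereas the same values snapped to a coarse grid, still stored as float32, leave those bits at zero and compress much better.

```python
import gzip
import random
import struct

random.seed(2)
n = 100_000
noisy = [random.gauss(0.0, 1.0) for _ in range(n)]

def gz_ratio(floats):
    """Compressed/raw size ratio for values stored as little-endian float32."""
    raw = struct.pack(f"<{len(floats)}f", *floats)
    return len(gzip.compress(raw)) / len(raw)

# Same float32 storage, but values snapped to a grid of step sigma/64 (hypothetical choice).
step = 1.0 / 64
discretized = [round(v / step) * step for v in noisy]

print(f"raw float32 noise   : {gz_ratio(noisy):.2f}")
print(f"discretized float32 : {gz_ratio(discretized):.2f}")
```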
On Apr 28, 2026, at 7:38 AM, Daniel Larsson via 3dem <3dem at ncmir.ucsd.edu> wrote:
Would it not be very beneficial if the community could agree on a lossless compression standard for EM volumes (maps and masks)?
High-resolution maps take up significant disk space, often around 500-800 MB for large complexes. Each reconstruction comes with many associated volumes (half-maps, full map, sharpened map, masks, etc.); multiply this by all the reconstructions in a typical project and it adds up significantly. In addition, both RELION and CryoSPARC save volumes for intermediate iterations. These large files take a long time to download and upload, e.g. when transferring between computers or when depositing structures. Opening maps for visualization in ChimeraX and Coot is also slowed down by having to read large files from disk.
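For a sense of scale: an uncompressed cubic n³ MRC map takes 1024 + n³ × (bytes per voxel) bytes, assuming a standard 1024-byte header and no extended header. The box sizes below are hypothetical examples. A 512³ float32 map works out to about 537 MB, which matches the mask example mentioned next (consistent with, though not confirming, a 512³ box), and a 576³ float32 map to about 764 MB.

```python
def map_bytes(n, bytes_per_voxel=4, header_bytes=1024):
    """Uncompressed size of a cubic n^3 MRC map, assuming no extended header."""
    return header_bytes + n ** 3 * bytes_per_voxel

for n in (512, 576):
    for bpv, label in ((4, "float32"), (2, "16-bit")):
        print(f"{n}^3 {label:7s}: {map_bytes(n, bpv) / 1e6:6.1f} MB")
```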
So, we need a compression standard. Zip compression is fast and can reduce the file size considerably without changing the information in the file. Mask files are reduced to just a few percent of their original size (for example, 537 MB to 9 MB). What we need is:
1. Software generating volumes should by default write these in a lossless compressed format
2. Software visualizing volumes should accept compressed maps and decompress them on the fly in memory
3. Refinement packages should accept compressed maps and decompress them on the fly in memory
4. Repositories should accept compressed maps and un/recompress according to their own needs
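Points 2 and 3 can be sketched in a few lines: detect gzip by its two-byte magic number (0x1f 0x8b, from the gzip file format spec) rather than by the file extension, and hand back a file object that decompresses transparently. The function name `open_volume` is hypothetical. In principle an uncompressed map whose first bytes happen to equal the magic would be misdetected, which would require an implausible NX.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_volume(path):
    """Open a map for binary reading, decompressing on the fly if it is gzipped.

    Detection uses the gzip magic number rather than the '.gz' extension.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")

# Usage: the same bytes read back identically from a plain and a gzipped file.
payload = b"hypothetical map bytes " * 1000
with tempfile.TemporaryDirectory() as d:
    plain = os.path.join(d, "map.mrc")
    packed = os.path.join(d, "map.mrc.gz")
    with open(plain, "wb") as f:
        f.write(payload)
    with gzip.open(packed, "wb") as f:
        f.write(payload)
    for p in (plain, packed):
        with open_volume(p) as f:
            assert f.read() == payload
```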
Actors that immediately come to mind are CryoSPARC, RELION, Warp, cryoDRGN, IMOD, ChimeraX, PyMOL, Coot, Blender Molecular Nodes, Phenix, Servalcat and wwPDB/EMDB. The exact format is up for debate, but why not something simple, such as gzipped MRC files?
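A "gzipped MRC file" is simply an ordinary MRC2014 byte stream passed through gzip, so `gunzip` or `zcat` recovers a plain MRC file. The sketch below writes and reads one using only the standard library. It is a minimal illustration under stated assumptions (little-endian, mode 2 float32, no extended header, a hypothetical 1 Å/voxel cell, hypothetical function names), fills only a handful of header fields, and is not a substitute for a full library such as the mrcfile package.

```python
import gzip
import os
import struct
import tempfile

def write_mrc_gz(path, data, nx, ny, nz):
    """Write float32 voxels (MRC mode 2) to a gzip-compressed MRC file.

    Minimal sketch: little-endian, no extended header, hypothetical 1 A/voxel
    cell; DMIN/DMAX/DMEAN/RMS and labels are left at zero.
    """
    if len(data) != nx * ny * nz:
        raise ValueError("data length does not match nx*ny*nz")
    header = bytearray(1024)
    struct.pack_into("<4i", header, 0, nx, ny, nz, 2)                 # words 1-4: NX, NY, NZ, MODE
    struct.pack_into("<3i", header, 28, nx, ny, nz)                   # words 8-10: MX, MY, MZ
    struct.pack_into("<3f", header, 40, float(nx), float(ny), float(nz))  # words 11-13: cell (A)
    struct.pack_into("<3f", header, 52, 90.0, 90.0, 90.0)             # words 14-16: cell angles
    struct.pack_into("<3i", header, 64, 1, 2, 3)                      # words 17-19: MAPC/MAPR/MAPS
    struct.pack_into("<i", header, 108, 20140)                        # word 28: NVERSION
    header[208:212] = b"MAP "                                         # word 53: format tag
    header[212:216] = b"\x44\x44\x00\x00"                             # word 54: little-endian stamp
    with gzip.open(path, "wb") as f:
        f.write(bytes(header))
        f.write(struct.pack(f"<{len(data)}f", *data))

def read_mrc_gz(path):
    """Read back (nx, ny, nz) and the voxel list from a mode-2 gzipped MRC file."""
    with gzip.open(path, "rb") as f:
        header = f.read(1024)
        nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
        (nsymbt,) = struct.unpack_from("<i", header, 92)              # word 24: extended header size
        if mode != 2:
            raise ValueError("this sketch only handles mode 2 (float32)")
        f.read(nsymbt)                                                # skip any extended header
        n = nx * ny * nz
        data = list(struct.unpack(f"<{n}f", f.read(4 * n)))
    return (nx, ny, nz), data

# Usage: round-trip a tiny 4 x 3 x 2 volume through a temporary .mrc.gz file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.mrc.gz")
    voxels = [0.5 * i for i in range(4 * 3 * 2)]
    write_mrc_gz(path, voxels, 4, 3, 2)
    shape, back = read_mrc_gz(path)
    print(shape, back == voxels)
```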
Best regards,
Daniel (written while impatiently waiting for files to upload to the wwPDB)
_______________________________________________
3dem mailing list
3dem at ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem