[3dem] [ccpem] MRC file format (Compressing cryo-EM data to 8-bits/pix and beyond)

Marin van Heel marin.vanheel at googlemail.com
Fri Jun 19 02:14:16 PDT 2015


Hi Axel

Come to think about it... I don't think that amount of data is actually 
very much!

Let's assume the maximum exposure you want your biological cryo-EM sample to 
suffer is ~100 el/A2. (This is irrespective of whether you measure for 1 
sec or for one hour). Then, what you would be interested in beyond ~1000 
frames, are single bit ~4096x4096 pixel images. (We are not interested 
in the blasted remains of what once was our sample at > 100el/A2). Each 
of these YES/NO images would record whether in a given time interval ONE 
electron has arrived in a given pixel. (Of course, each of these binary 
images will contain 90% zeros.) The maximum amount of data you would 
ever need to collect from one area of the sample would thus be: 16 
Megabit * 1024 = 16 Gigabit = 2 Gigabyte. Since, however, this data would 
contain 90% zeroes it should be losslessly compressible to less than 200 
Megabyte. AU costs? A 100 AU (£/$/Eu) hard-drive should hold more than 
10,000 of these worst-case-scenario images. That is not much different 
from collecting 10 frames per movie. Since your productivity - at 10 sec 
overall exposure - will have dropped ten-fold compared to a 1-sec 
overall exposure, you actually will spend less money on your data 
storage per day. That is simply because you will be collecting less 
information per day, and thus need to collect data over a ten times 
longer period. But... your equipment continues to cost 5000 AUs per day!
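
A back-of-the-envelope check of these numbers, as a minimal Python sketch. It assumes every pixel in every frame is an independent yes/no event with a 10% chance of a hit; that model, and the names binary_entropy, p_hit and bound_bytes, are illustrative assumptions and not part of the original argument.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a yes/no event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

side = 4096      # pixels per image edge
frames = 1024    # single-bit frames per exposed area

raw_bits = side * side * frames   # 16 Megabit * 1024 = 16 Gigabit
raw_bytes = raw_bits // 8         # 2**31 bytes = 2 Gigabyte

p_hit = 0.10                      # assumed: 90% of the bits are zero
bits_per_pixel = binary_entropy(p_hit)

# Lower bound on lossless size IF pixels are independent (assumption).
bound_bytes = raw_bytes * bits_per_pixel

print(f"raw size        : {raw_bytes / 2**30:.2f} GiB")
print(f"entropy / pixel : {bits_per_pixel:.3f} bit")
print(f"lossless bound  : {bound_bytes / 2**30:.2f} GiB")
```

Under that independence assumption the Shannon bound works out to roughly 0.47 bit per pixel, so nearer half of the 2 Gigabyte than a tenth; reaching 200 Megabyte would need sparser frames (around 1% hits) or structure in the data that a compressor can exploit.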

Hope this helps,

Marin



On 17/06/2015 19:46, Axel Brilot wrote:
> Dear John, Marin, Pawel,
>
> Those raw K2 counts are already the result of compression (which may 
> not be lossless). The raw data coming off a K2 camera chip is at 400 
> fps, and not counted. I wonder what the RU cost would be of storing 
> uncompressed data at 400 fps for an ~10s exposure, never mind the 
> infrastructure that would have to be built to accommodate moving all 
> that data.
>
> Best wishes,
>
> Axel
>
>
> On Tue, Jun 16, 2015 at 9:02 AM, John Rubinstein 
> <john.rubinstein at utoronto.ca> wrote:
>
>     Dear Marin and Pawel,
>
>     For the K2 camera that outputs counts (e.g. 1, 2, 3, 4, etc) there
>     is no loss of information in storing these numbers as 4 bit or 8
>     bit as long as you don't exceed the highest integer that the data
>     type can hold. Any excess bits just hold 0s. Storing as 32 bits
>     does not cost much but it also has no purpose.
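
John's condition can be checked mechanically before choosing a storage type. A minimal Python sketch, assuming non-negative integer counts; the helper names min_bits_for_counts, pack_4bit and unpack_4bit are hypothetical and do not belong to any EM package.

```python
def min_bits_for_counts(counts):
    """Smallest bit depth that holds every non-negative count exactly."""
    if min(counts) < 0:
        raise ValueError("counts must be non-negative")
    return max(1, max(counts).bit_length())

def pack_4bit(counts):
    """Pack counts in the range 0..15 two per byte (high nibble first)."""
    if any(c < 0 or c > 15 for c in counts):
        raise ValueError("count does not fit in 4 bits")
    padded = list(counts) + [0] * (len(counts) % 2)  # pad odd lengths
    return bytes((hi << 4) | lo for hi, lo in zip(padded[0::2], padded[1::2]))

def unpack_4bit(packed, n):
    """Inverse of pack_4bit; n is the original number of counts."""
    out = []
    for byte in packed:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return out[:n]

counts = [0, 1, 0, 3, 2, 0, 0, 4, 1]
needed_bits = min_bits_for_counts(counts)          # 3 bits suffice here
round_trip = unpack_4bit(pack_4bit(counts), len(counts))
print(needed_bits, round_trip == counts)
```

The round trip is exact only because the largest count fits the chosen width; the sketch raises an error instead of silently truncating when it does not.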
>
>     Best wishes,
>     John
>
>     *From: *Marin van Heel
>     *Sent: *Tuesday, June 16, 2015 11:22 AM
>     *To: *Tom Houweling; CCPEM at JISCMAIL.AC.UK
>     *Cc: *3DEM
>     *Subject: *Re: [3dem] [ccpem] MRC file format (Compressing cryo-EM
>     data to 8-bits/pix and beyond)
>
>
>     Dear All,
>
>     For various reasons I don’t think this line of reasoning is very
>     productive. The data compression to 8 or even 4 bits as has been
>     suggested in this discussion can only lead to loss of data (see
>     below). It may also represent poor management of the available EM
>     resources.
>
>     Point by point:
>
>     A) Advanced cryo-EM equipment costs of the order of ~5000 AUs
>     (Arbitrary Units: $/Eu/£) per day to own and operate, and will
>     generate up to ~2 Tbyte of cryo-EM data per 24h. The costs of
>     storing this precious data for “eternity” will not exceed 100 AUs
>     per day, that is, one or two percent of the taxpayer's total
>     investment in your data collection. NOT storing that raw data may
>     NOT be a good idea for economic reasons alone (just in case you,
>     for example, need to repeat the experiment to get the data back).
>
>     B) Compressing all the raw data to save space can make sense as
>     long as the compression is loss-less
>     (https://en.wikipedia.org/wiki/Lossless_compression). The
>     compression (after movie alignment) as suggested, however, may
>     lead to a significant information loss.
>
>     C) The dynamic range of a raw image is mainly determined by the
>     low-frequency components of the data. Scaling the min-max
>     densities from 0-255 for compression/truncation to 8 bit data,
>     changes the data representation from image to image. The
>     high-resolution information we are interested in has a contrast of
>     probably less than 0.1% of the strong low-frequency components.
>     The signal we are interested in is thus already much smaller than
>     the discretisation error of 1:256 of the A-to-D conversion. That
>     does not mean one will not be able to fish that information from
>     the discretisation and Poisson noise in the raw data… But it will
>     certainly suffer.  The grey scales will change from image to image
>     purely dependent on whether there is, for example, an ice crystal
>     somewhere in the field of view. High-pass filtering will remove
>     the large-scale details and thus also increase the dynamic range
>     available for the high-resolution components of the data.
>
>     D) Note that the fact that you manage to get a 3D structure out is
>     no proof that you have not lost information. It is merely proof
>     that enough was left over to create a reasonable 3D structure
>     that satisfies you.
>
>     E) There are also other reasons for never deleting the original
>     data such as validation! You may be challenged – as has happened
>     in the recent past (PNAS 2013) - to show the original data set to
>     prove it is what you claim it is and was collected on the
>     instrumentation you claim it was taken on. (In the PNAS cases the
>     original data has still not been released).
>
>     F) What one can or wants to do with the raw data changes over
>     time. Many new movie alignment algorithms have been proposed
>     recently; access to exactly the same raw data is essential for
>     validation of the new algorithms. (You may even get more out of
>     your data!)
>
>     G) The raw data characterizes the camera (and validates the data
>     set as per E) and allows you to correct for its flaws
>     (http://www.nature.com/srep/2015/150611/srep10317/full/srep10317.html).
>     You may also want to see whether the camera itself deteriorated
>     over time.
>
>     H) Especially when the raw data are of some integer type, (and you
>     are using data with a limited dynamic range), the data on disk
>     will be written in a highly redundant fashion.  You may then use
>     loss-less compression algorithms to reduce the size of your data
>     without suffering any information loss. You may always compress
>     the data, you may never compromise on its information content!
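
Point H can be tried out with any general-purpose lossless compressor. A minimal sketch, assuming Python's standard zlib module as a stand-in (the thread does not name a particular algorithm) and synthetic low-count data written as 16-bit integers; the count distribution and the variable names are made up for illustration.

```python
import random
import zlib
from array import array

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "image": small counts only, i.e. a limited dynamic range.
counts = rng.choices([0, 1, 2, 3], weights=[60, 25, 10, 5], k=256 * 256)

# Stored as unsigned 16-bit integers, most of the bits on disk are zero.
raw = array("H", counts).tobytes()

compressed = zlib.compress(raw, 9)

# Decompress and confirm that every value comes back unchanged.
restored = array("H")
restored.frombytes(zlib.decompress(compressed))

print("raw bytes       :", len(raw))
print("compressed bytes:", len(compressed))
print("bit-identical   :", list(restored) == counts)
```

The size reduction depends entirely on the data; what matters for the argument is the last line, which confirms that nothing was lost.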
>
>     Cheers, Marin
>
>     ========================================
>
>     On 04/06/2015 00:15, Tom Houweling wrote:
>>     What I meant is that Relion appears to have no problem reading 16
>>     bit and 8 bit formats, therefore converting to 32bit floating
>>     point images should not be necessary.
>>
>>     However, the verdict on loss of resolution from reducing the data
>>     to 8 bits is still out. I’m motivated by conserving disk space.
>>
>>     I’m currently reprocessing a good dataset that yielded a high
>>     resolution structure. But this time I converted the aligned
>>     stacks of 32bit per pixel to just 8 by the following method:
>>
>>     1) Calculate the mean and std. deviation
>>     2) Cutoff at +/- 3 std dev
>>     3) Set lowest value to 0 and highest to 255
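
One possible reading of these three steps, as a minimal Python sketch. The function name to_8bit, the use of the population standard deviation, rounding to the nearest grey level, and mapping the clipped minimum and maximum to 0 and 255 are all assumptions; Tom's actual script is not shown in the thread. The function also returns the size of one grey level in the original units, so it can be compared with the amplitude of the signal one cares about.

```python
import random
import statistics

def to_8bit(pixels, n_sigma=3.0):
    """Clip at mean +/- n_sigma * std, then rescale to integers 0..255.

    Returns (levels, step), where step is the width of one grey level
    in the original pixel units.
    """
    mean = statistics.fmean(pixels)
    sd = statistics.pstdev(pixels)
    lo, hi = mean - n_sigma * sd, mean + n_sigma * sd
    clipped = [min(max(p, lo), hi) for p in pixels]   # step 2: cutoff
    vmin, vmax = min(clipped), max(clipped)
    if vmax == vmin:                                  # flat image
        return [0] * len(pixels), 0.0
    step = (vmax - vmin) / 255.0
    levels = [int(round((c - vmin) / step)) for c in clipped]  # step 3
    return levels, step

# Synthetic data: Gaussian background plus one strong outlier,
# standing in for something like an ice crystal in the field of view.
rng = random.Random(1)
pixels = [rng.gauss(100.0, 5.0) for _ in range(2000)] + [1000.0]

levels, step = to_8bit(pixels)
print("grey levels used:", min(levels), "to", max(levels))
print("one grey level  =", round(step, 3), "original units")
```

Because the mean and standard deviation are computed per image, the step size changes with the image content; a single outlier like the one above widens it for every other pixel.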
>>
>>     Tom
>>
>>
>>>     On Jun 3, 2015, at 10:58 AM, Amedee des Georges
>>>     <adesgeorges at GMAIL.COM> wrote:
>>>
>>>     Dear Tom,
>>>
>>>     Did you see any decrease in resolution with 8bit vs 16? How did
>>>     it look?
>>>     It’s obviously an advantage to use 8bits for storage if it
>>>     doesn’t decrease image quality significantly.
>>>
>>>     Best,
>>>
>>>     Amedee
>>>
>>>     On Jun 3, 2015, at 1:44 PM, Tom Houweling
>>>     <tom.houweling at BERKELEY.EDU> wrote:
>>>
>>>>     We have successfully processed MRC images and stacks in Relion
>>>>     that were in 16 bit mode 6 and also in the non MRC sanctioned
>>>>     mode 5 (8 bit unsigned).
>>>>
>>>>     —Tom
>>>>
>>>>
>>>>>     On Jun 3, 2015, at 10:22 AM, Rémi Fronzes
>>>>>     <remi.fronzes at PASTEUR.FR> wrote:
>>>>>
>>>>>     Dear All,
>>>>>
>>>>>     Maybe a silly question but still worth asking.
>>>>>     Is it a problem to extract and use particles in Relion from
>>>>>     16-bit MRC images (i.e. collected using EPU)?
>>>>>     Or do we have to convert the micrographs to 32-bit MRC format?
>>>>>
>>>>>     Cheers
>>>>>
>>>>>     Rémi
>>>>>
>>>>>
>>>>>     Rémi Fronzes
>>>>>     G5 biologie structurale de la sécrétion bactérienne, institut
>>>>>     Pasteur
>>>>>     CNRS UMR 3528, institut Pasteur
>>>>>
>>>>>     Office: +33 (0)145688864
>>>>>     Lab: +33 (0) 145688863
>>>>>     Mobile: +33 (0) 688263992
>>>>>     Email: remi.fronzes at pasteur.fr
>>>>>
>>>>>     25 rue du Docteur Roux
>>>>>     Bâtiment Metchnikoff, 3ème étage
>>>>>     75015 Paris, France
>>>>>
>>>>
>>>>     --
>>>>     Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard
>>>>     Hughes Medical Institute
>>>>     University of California Berkeley, 708D Stanley Hall, Berkeley,
>>>>     CA 94720
>>>>
>>>>
>>>
>>
>>     --
>>     Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard
>>     Hughes Medical Institute
>>     University of California Berkeley, 708D Stanley Hall, Berkeley,
>>     CA 94720
>>
>>
>
>     -- 
>     ================================================================
>
>          Prof Dr Ir Marin van Heel
>
>          Professor of Cryo-EM Data Processing
>
>          Leiden University
>
>
