[3dem] [ccpem] MRC file format (Compressing cryo-EM data to 8-bits/pix and beyond)

Marin van Heel marin.vanheel at googlemail.com
Wed Jun 17 14:30:35 PDT 2015


Yes, that was understood, John. The point is that with loss-less 
compression you can achieve even smaller files than with 4/8/16/32 bit 
integers, because those, as you state correctly, are largely filled with 
zeroes.
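
A minimal sketch of that size comparison, using only the Python standard library. The frame is synthetic: the Poisson mean of 0.1 counts per pixel, the 256x256 frame size, and the helper names are assumptions for illustration, not properties of any particular camera. zlib stands in here for any loss-less compressor.

```python
import math
import random
import zlib


def poisson_counts(n_pixels, mean, seed=0):
    """Draw n_pixels Poisson-distributed counts (Knuth's multiplication method)."""
    rng = random.Random(seed)
    limit = math.exp(-mean)
    counts = []
    for _ in range(n_pixels):
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def pack_4bit(counts):
    """Pack two counts (each 0..15) into one byte."""
    if max(counts) > 15:
        raise ValueError("count does not fit in 4 bits")
    if len(counts) % 2:
        counts = counts + [0]  # pad to an even number of pixels
    return bytes((counts[i] << 4) | counts[i + 1]
                 for i in range(0, len(counts), 2))


# Hypothetical sparse counting-mode frame (assumed mean of 0.1 counts/pixel).
counts = poisson_counts(256 * 256, mean=0.1)

raw8 = bytes(counts)             # one byte per pixel
raw4 = pack_4bit(counts)         # two pixels per byte
packed = zlib.compress(raw8, 9)  # loss-less compression of the 8-bit data

# Loss-less: decompression returns exactly the original counts.
assert list(zlib.decompress(packed)) == counts

print("8-bit :", len(raw8), "bytes")
print("4-bit :", len(raw4), "bytes")
print("zlib  :", len(packed), "bytes")
```

How much is gained depends on how sparse the counts are; for denser frames the advantage over fixed 4-bit packing shrinks.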

Cheers

Marin


On 16/06/2015 18:02, John Rubinstein wrote:
> Dear Marin and Pawel,
>
> For the K2 camera that outputs counts (e.g. 1, 2, 3, 4, etc) there is 
> no loss of information in storing these numbers as 4 bit or 8 bit as 
> long as you don't exceed the highest integer that the data type can 
> hold. Any excess bits just hold 0s. Storing as 32 bits does not cost 
> much but it also has no purpose.
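
The condition stated above (no loss as long as the largest count fits the data type) can be checked directly. This is a small sketch; the function names and the candidate widths are hypothetical choices for illustration.

```python
def fits_unsigned(values, bits):
    """True if every value can be stored in an unsigned integer of `bits` bits."""
    return all(0 <= v <= (1 << bits) - 1 for v in values)


def smallest_width(values, widths=(4, 8, 16, 32)):
    """Return the narrowest width in `widths` that holds all values without loss."""
    for bits in widths:
        if fits_unsigned(values, bits):
            return bits
    raise ValueError("values do not fit in any candidate width")


counts = [0, 1, 2, 3, 4]  # hypothetical electron counts
print(smallest_width(counts))         # 4
print(smallest_width(counts + [16]))  # 8, because 16 exceeds the 4-bit maximum of 15
```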
>
> Best wishes,
> John
>
> From: Marin van Heel
> Sent: Tuesday, June 16, 2015 11:22 AM
> To: Tom Houweling; CCPEM at JISCMAIL.AC.UK
> Cc: 3DEM
> Subject: Re: [3dem] [ccpem] MRC file format (Compressing cryo-EM 
> data to 8-bits/pix and beyond)
>
>
> Dear All,
>
> For various reasons I don’t think this line of reasoning is very 
> productive. The data compression to 8 or even 4 bits as has been 
> suggested in this discussion can only lead to loss of data (see 
> below). It may also represent poor management of the available EM 
> resources.
>
> Point by point:
>
> A) Advanced cryo-EM equipment costs on the order of 5000 AUs 
> (Arbitrary Units: $/Eu/£) per day to own and operate, and will 
> generate up to ~2 Tbyte of cryo-EM data per 24h. The costs of storing 
> this precious data for “eternity” will not exceed 100 AUs per day, 
> that is, one or two percent of the taxpayers' total investment in your 
> data collection. NOT storing that raw data may NOT be a good idea for 
> economic reasons alone (just in case you, for example, need to repeat 
> the experiment to get the data back).
>
> B) Compressing all the raw data to save space can make sense as long 
> as the compression is loss-less 
> (https://en.wikipedia.org/wiki/Lossless_compression). The compression 
> (after movie alignment) as suggested, however, may lead to a 
> significant information loss.
>
> C) The dynamic range of a raw image is mainly determined by the 
> low-frequency components of the data. Scaling the min-max densities 
> to 0-255 for compression/truncation to 8-bit data changes the data 
> representation from image to image. The high-resolution information we 
> are interested in probably has a contrast of less than 0.1% of the 
> strong low-frequency components. The signal we are interested in is 
> thus already much smaller than the discretisation error of 1:256 of 
> the A-to-D conversion. That does not mean one will not be able to fish 
> that information from the discretisation and Poisson noise in the raw 
> data, but it will certainly suffer. The grey scales will change from 
> image to image depending purely on whether there is, for example, an 
> ice crystal somewhere in the field of view. High-pass filtering will 
> remove the large-scale details and thus also increase the dynamic range 
> available for the high-resolution frequency components of the data.
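
A toy illustration of the image-to-image change described in C), assuming simple min-max scaling to 0-255. The pixel values are invented for the example; the single large value stands in for something like an ice crystal in the field of view.

```python
def scale_to_8bit(pixels):
    """Map min(pixels)..max(pixels) linearly onto the integer codes 0..255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]


# Hypothetical densities: five pixels carrying small differences.
clean = [100.0, 100.1, 100.2, 99.9, 100.05]
# The same five pixels plus one strong outlier in the same image.
with_outlier = clean + [160.0]

codes_clean = scale_to_8bit(clean)
codes_outlier = scale_to_8bit(with_outlier)[:len(clean)]

print("without outlier:", codes_clean)
print("with outlier   :", codes_outlier)
print("distinct grey levels:",
      len(set(codes_clean)), "vs", len(set(codes_outlier)))
```

With the outlier present, the same five input values collapse onto only two grey levels, so the small differences between them are largely lost to the discretisation.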
>
> D) Note that the fact that you manage to get a 3D structure out is no 
> proof that you have not lost information. It is merely proof for the 
> fact that there was enough left over to create a reasonable 3D that 
> satisfies you.
>
> E) There are also other reasons for never deleting the original data 
> such as validation! You may be challenged – as has happened in the 
> recent past (PNAS 2013) - to show the original data set to prove it is 
> what you claim it is and was collected on the instrumentation you 
> claim it was taken on. (In the PNAS cases the original data has still 
> not been released).
>
> F) What one can or wants to do with the raw data changes over time. 
> Many new movie alignment algorithms have been proposed recently; 
> access to exactly the same raw data is essential for validation of the 
> new algorithms. (You may even get more out of your data!)
>
> G) The raw data characterizes the camera (and validates the data set 
> as per E) and allows you to correct for its flaws 
> (http://www.nature.com/srep/2015/150611/srep10317/full/srep10317.html). You 
> may also want to see whether the camera itself deteriorated over time.
>
> H) Especially when the raw data are of some integer type (and you are 
> using data with a limited dynamic range), the data on disk will be 
> written in a highly redundant fashion. You may then use loss-less 
> compression algorithms to reduce the size of your data without 
> suffering any information loss. You may always compress the data; you 
> may never compromise on its information content!
>
> Cheers, Marin
>
> ========================================
>
> On 04/06/2015 00:15, Tom Houweling wrote:
>> What I meant is that Relion appears to have no problem reading 16 bit 
>> and 8 bit formats, therefore converting to 32bit floating point 
>> images should not be necessary.
>>
>> However, the verdict is still out on whether reducing the data to 8 
>> bits costs resolution. I’m motivated by conserving disk space.
>>
>> I’m currently reprocessing a good dataset that yielded a high 
>> resolution structure. But this time I converted the aligned stacks of 
>> 32bit per pixel to just 8 by the following method:
>>
>> 1) Calculate the mean and std. deviation
>> 2) Cut off at +/- 3 std dev
>> 3) Set lowest value to 0 and highest to 255
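
The three steps above can be sketched as follows. This is one possible reading, not the script actually used: it assumes the population standard deviation, and it takes "lowest" and "highest" to mean the minimum and maximum after clipping. The function name and the test pixels are hypothetical.

```python
import statistics


def to_8bit_3sigma(pixels, n_sigma=3.0):
    """Clip at mean +/- n_sigma std, then map the clipped range onto 0..255."""
    # 1) Calculate the mean and standard deviation.
    mean = statistics.mean(pixels)
    std = statistics.pstdev(pixels)  # assumption: population std
    # 2) Cut off at +/- n_sigma standard deviations.
    lo, hi = mean - n_sigma * std, mean + n_sigma * std
    clipped = [min(max(p, lo), hi) for p in pixels]
    # 3) Set the lowest clipped value to 0 and the highest to 255.
    vmin, vmax = min(clipped), max(clipped)
    if vmax == vmin:
        return bytes(len(pixels))  # constant image: all zeros
    return bytes(round(255 * (c - vmin) / (vmax - vmin)) for c in clipped)


# Hypothetical data: values 0..16 repeated, plus one extreme outlier.
pixels = [float(i % 17) for i in range(1000)] + [1000.0]
codes = to_8bit_3sigma(pixels)

print(len(codes), min(codes), max(codes))
```

The conversion is lossy by construction: everything beyond the cutoff is clipped, and the remaining range is rounded onto at most 256 levels.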
>>
>> Tom
>>
>>
>>> On Jun 3, 2015, at 10:58 AM, Amedee des Georges 
>>> <adesgeorges at GMAIL.COM> wrote:
>>>
>>> Dear Tom,
>>>
>>> Did you see any decrease in resolution with 8bit vs 16? How did it 
>>> look?
>>> It’s obviously an advantage to use 8bits for storage if it doesn’t 
>>> decrease image quality significantly.
>>>
>>> Best,
>>>
>>> Amedee
>>>
>>> On Jun 3, 2015, at 1:44 PM, Tom Houweling 
>>> <tom.houweling at BERKELEY.EDU> wrote:
>>>
>>>> We have successfully processed MRC images and stacks in Relion that 
>>>> were in 16 bit mode 6 and also in the non MRC sanctioned mode 5 (8 
>>>> bit unsigned).
>>>>
>>>> —Tom
>>>>
>>>>
>>>>> On Jun 3, 2015, at 10:22 AM, Rémi Fronzes <remi.fronzes at PASTEUR.FR> 
>>>>> wrote:
>>>>>
>>>>> Dear All,
>>>>>
>>>>> Maybe a silly question but still worth asking.
>>>>> Is it a problem to extract and use in Relion particles from 16-bit 
>>>>> MRC images (i.e. collected using EPU)?
>>>>> Or do we have to convert the micrographs to 32-bit MRC format?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Rémi
>>>>>
>>>>>
>>>>> Rémi Fronzes
>>>>> G5 biologie structurale de la sécrétion bactérienne, institut Pasteur
>>>>> CNRS UMR 3528, institut Pasteur
>>>>>
>>>>> Office: +33 (0)145688864
>>>>> Lab: +33 (0) 145688863
>>>>> Mobile: +33 (0) 688263992
>>>>> Email: remi.fronzes at pasteur.fr
>>>>>
>>>>> 25 rue du Docteur Roux
>>>>> Bâtiment Metchnikoff, 3ème étage
>>>>> 75015 Paris, France
>>>>>
>>>>
>>>> --
>>>> Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard 
>>>> Hughes Medical Institute
>>>> University of California Berkeley, 708D Stanley Hall, Berkeley, CA 
>>>> 94720
>>>>
>>>>
>>>
>>
>>
>>
>
>
> -- 
> ================================================================
>
>      Prof Dr Ir Marin van Heel
>
>      Professor of Cryo-EM Data Processing
>
>      Leiden University
>
>


-- 
================================================================

     Prof Dr Ir Marin van Heel

     Professor of Cryo-EM Data Processing

     Leiden University
     NeCEN Building Room 05.27
     Einsteinweg 55
     2333 CC Leiden
     The Netherlands
      
     Tel. NL: +31(0)715271424 // Mobile NL: +31(0)652736618
     Skype:    Marin.van.Heel
     email:  marin.vanheel(A_T)gmail.com
     and:    mvh.office(A_T)gmail.com


----------------------------------------------

     Emeritus Professor of Structural Biology

     Imperial College London
     Faculty of Natural Sciences
     Biochemistry Building (Room 512)
     South Kensington Campus
     London SW7 2AZ,  UK
     email:  m.vanheel(A_T)ic.ac.uk

     Tel. UK:   +44(0)2075945316 //Mobile: +44(0)7941540625

----------------------------------------------
     Visiting Professor at:

     Laboratório Nacional de Nanotecnologia - LNNano
     CNPEM/ABTLuS, Campinas, Brazil
     Brazilian mobile phone  +55-19-983189143

------------------------------------------------------------------

I receive many emails per day and, although I try,
there is no guarantee that I will actually read each incoming email.
Moreover, our spam filters can be strict and sometimes make
legitimate emails disappear (try the gmail accounts as an alternative).
