[3dem] [ccpem] MRC file format (Compressing cryo-EM data to 8-bits/pix and beyond)

Wed Jun 17 10:46:08 PDT 2015

Dear John, Marin, Pawel,

Those raw K2 counts are already the result of compression (which may not be
lossless). The raw data comes off a K2 camera chip at 400 fps and is not
counted. I wonder what the RU cost would be of storing uncompressed data at
400 fps for a ~10 s exposure, never mind the infrastructure that would have
to be built to accommodate moving all that data.
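
To put a rough number on that, here is a back-of-the-envelope sketch. Only the 400 fps rate and the ~10 s exposure come from the message; the sensor dimensions and the two bytes per raw pixel are assumptions made for illustration and should be replaced with the real figures for your detector.

```python
# Assumed values (not from the thread): sensor size and bytes per raw pixel.
SENSOR_WIDTH = 3838       # pixels, assumed
SENSOR_HEIGHT = 3710      # pixels, assumed
BYTES_PER_PIXEL = 2       # assumption: raw frames held as 16-bit values

FRAMES_PER_SECOND = 400   # from the message
EXPOSURE_SECONDS = 10     # from the message


def raw_exposure_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Return the uncompressed size in bytes of one exposure."""
    frames = fps * seconds
    return width * height * bytes_per_pixel * frames


total = raw_exposure_bytes(SENSOR_WIDTH, SENSOR_HEIGHT, BYTES_PER_PIXEL,
                           FRAMES_PER_SECOND, EXPOSURE_SECONDS)
print(f"{total / 1e9:.1f} GB per exposure")
```

Under these assumptions a single exposure comes to roughly 114 GB before any counting or compression.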

Best wishes,

Axel

On Tue, Jun 16, 2015 at 9:02 AM, John Rubinstein <
john.rubinstein at utoronto.ca> wrote:

> Dear Marin and Pawel,
>
> For the K2 camera that outputs counts (e.g. 1, 2, 3, 4, etc) there is no
> loss of information in storing these numbers as 4 bit or 8 bit as long as
> you don't exceed the highest integer that the data type can hold. Any
> excess bits just hold 0s. Storing as 32 bits does not cost much but it also
> has no purpose.
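
The point about integer counts can be checked directly. The sketch below uses made-up counts rather than real K2 data, and the helper name `smallest_lossless_width` is hypothetical. It picks the narrowest unsigned width that holds the largest count and confirms that an 8-bit and a 32-bit container hold identical numbers.

```python
from array import array

# Largest value each unsigned storage width can hold.
MAX_VALUE = {4: 2**4 - 1, 8: 2**8 - 1, 16: 2**16 - 1, 32: 2**32 - 1}


def smallest_lossless_width(counts):
    """Return the narrowest width in bits that holds every count unchanged."""
    if min(counts) < 0:
        raise ValueError("counts must be non-negative")
    highest = max(counts)
    for bits in sorted(MAX_VALUE):
        if highest <= MAX_VALUE[bits]:
            return bits
    raise ValueError("counts do not fit in 32 bits")


# Synthetic counting-mode pixels (illustrative values, not real K2 data).
counts = [0, 1, 0, 2, 1, 3, 0, 4, 1, 0, 2, 1]

bits = smallest_lossless_width(counts)
as_8bit = array("B", counts)    # one byte per pixel
as_32bit = array("I", counts)   # typically four bytes per pixel

# Both containers hold exactly the same numbers; only the padding differs.
assert list(as_8bit) == list(as_32bit) == counts
print(bits, as_8bit.itemsize, as_32bit.itemsize)
```

Note that this only applies to data that are already integers; it says nothing about floating-point images that have been rescaled before truncation.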
>
> Best wishes,
> John
>
> Sent from my BlackBerry 10 smartphone.
> From: Marin van Heel
> Sent: Tuesday, June 16, 2015 11:22 AM
> To: Tom Houweling; CCPEM at JISCMAIL.AC.UK
> Cc: 3DEM
> Subject: Re: [3dem] [ccpem] MRC file format (Compressing cryo-EM data
> to 8-bits/pix and beyond)
>
> Dear All,
>
> For various reasons I don’t think this line of reasoning is very
> productive. Compressing the data to 8 or even 4 bits, as has been suggested
> in this discussion, can only lead to loss of data (see below). It may also
> represent poor management of the available EM resources.
>
> Point by point:
>
> A) Advanced cryo-EM equipment costs on the order of 5000 AUs (Arbitrary
> Units: $/Eu/£) per day to own and operate, and will generate up to ~2 Tbyte
> of cryo-EM data per 24 h. The cost of storing this precious data for
> “eternity” will not exceed 100 AUs per day, that is, one or two percent of
> the taxpayers' total investment in your data collection. NOT storing that
> raw data may NOT be a good idea for economic reasons alone (just in case
> you, for example, need to repeat the experiment to get the data back).
>
> B) Compressing all the raw data to save space can make sense as long as
> the compression is loss-less (
> https://en.wikipedia.org/wiki/Lossless_compression). The compression
> (after movie alignment) as suggested, however, may lead to a significant
> information loss.
>
> C) The dynamic range of a raw image is mainly determined by the
> low-frequency components of the data. Scaling the min-max densities to
> 0-255 for compression/truncation to 8-bit data changes the data
> representation from image to image. The high-resolution information we are
> interested in has a contrast of probably less than 0.1% of the strong
> low-frequency components. The signal we are interested in is thus already
> much smaller than the discretisation error of 1:256 of the A-to-D
> conversion. That does not mean one will not be able to fish that
> information from the discretisation and Poisson noise in the raw data… But
> it will certainly suffer. The grey scales will change from image to image
> purely dependent on whether there is, for example, an ice crystal somewhere
> in the field of view. High-pass filtering will remove the large-scale
> details and thus also increase the dynamic range available for the
> high-resolution frequency components of the data.
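
Point C can be illustrated with a small numerical sketch. The pixel values are invented and the function name `quantisation_step` is hypothetical. It compares the size of one grey level after min-max scaling with a feature at 0.1% of the image range, with and without a single very dense pixel standing in for an ice crystal.

```python
def quantisation_step(pixels, levels=256):
    """Grey-level step when min..max is mapped linearly onto `levels` bins."""
    return (max(pixels) - min(pixels)) / (levels - 1)


# Two synthetic "images" (illustrative numbers): identical except that the
# second contains one very dense pixel standing in for an ice crystal.
clean = [100.0 + (i % 50) for i in range(1000)]   # spans 100..149
with_ice = clean[:-1] + [1000.0]                  # spans 100..1000

step_clean = quantisation_step(clean)
step_ice = quantisation_step(with_ice)

# A feature at 0.1% of the clean image's range, as in the estimate above.
signal = 0.001 * (max(clean) - min(clean))

print(f"step (clean)    = {step_clean:.3f}")
print(f"step (with ice) = {step_ice:.3f}")
print(f"signal          = {signal:.3f}")
```

In both cases the feature is smaller than one grey level, and the single outlier makes each grey level many times coarser, which is the image-to-image change in representation described above.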
>
> D) Note that the fact that you manage to get a 3D structure out is no
> proof that you have not lost information. It is merely proof that there
> was enough left over to create a reasonable 3D that satisfies you.
>
> E) There are also other reasons for never deleting the original data such
> as validation! You may be challenged – as has happened in the recent past
> (PNAS 2013) - to show the original data set to prove it is what you claim
> it is and was collected on the instrumentation you claim it was taken on.
> (In the PNAS cases the original data has still not been released).
>
> F) What one can or wants to do with the raw data changes over time. Many
> new movie alignment algorithms have been proposed recently; access to
> exactly the same raw data is essential for validation of the new
> algorithms. (You may even get more out of your data!)
>
> G) The raw data characterizes the camera (and validates the data set as
> per E) and allows you to correct for its flaws (
> http://www.nature.com/srep/2015/150611/srep10317/full/srep10317.html).
> You may also want to see whether the camera itself deteriorated over time.
>
> H) Especially when the raw data are of some integer type, (and you are
> using data with a limited dynamic range), the data on disk will be written
> in a highly redundant fashion.  You may then use loss-less compression
> algorithms to reduce the size of your data without suffering any
> information loss. You may always compress the data, you may never
> compromise on its information content!
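
As a minimal sketch of point H, the example below builds a synthetic low-count frame stored as 16-bit integers, compresses it with zlib (chosen only because it ships with Python; any lossless codec would serve), and checks that the round trip is exact. The data and the choice of codec are assumptions for illustration, not anything prescribed in the thread.

```python
import random
import struct
import zlib

# Synthetic low-count frame stored as 16-bit integers (illustrative data only).
random.seed(0)
counts = [random.choice([0, 0, 0, 1, 1, 2]) for _ in range(100_000)]
raw = struct.pack(f"<{len(counts)}H", *counts)

compressed = zlib.compress(raw, level=9)
restored = list(struct.unpack(f"<{len(counts)}H", zlib.decompress(compressed)))

assert restored == counts  # identical values after the round trip
print(len(raw), "->", len(compressed), "bytes")
```

The redundancy comes from the unused high byte of every pixel and the small number of distinct values, so the file shrinks substantially while every count is recovered unchanged.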
>
> Cheers, Marin
>
> ========================================
>
> On 04/06/2015 00:15, Tom Houweling wrote:
>
> What I meant is that Relion appears to have no problem reading 16 bit and
> 8 bit formats, therefore converting to 32bit floating point images should
> not be necessary.
>
>  However, the verdict on loss of resolution from reducing the data to
> 8 bits is still out. I’m motivated by conserving disk space.
>
>  I’m currently reprocessing a good dataset that yielded a high resolution
> structure. But this time I converted the aligned stacks of 32bit per pixel
> to just 8 by the following method:
>
>  1) Calculate the mean and std. deviation
>  2) Cutoff at +/- 3 std dev
>  3) Set lowest value to 0 and highest to 255
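
The three steps above can be sketched as follows. This is one reading of the recipe, not Tom's actual script. It assumes a population standard deviation and takes "lowest" and "highest" to mean the extremes of the clipped data. The function name `to_8bit` and the test values are hypothetical.

```python
import statistics


def to_8bit(pixels, n_sigma=3.0):
    """Clip to mean +/- n_sigma std. dev., then rescale linearly to 0..255."""
    # 1) Calculate the mean and standard deviation.
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    # 2) Cut off at +/- n_sigma standard deviations.
    low_cut, high_cut = mean - n_sigma * std, mean + n_sigma * std
    clipped = [min(max(p, low_cut), high_cut) for p in pixels]
    # 3) Map the lowest remaining value to 0 and the highest to 255.
    lowest, highest = min(clipped), max(clipped)
    if highest == lowest:
        return [0] * len(pixels)   # flat image: nothing to scale
    scale = 255.0 / (highest - lowest)
    return [round((p - lowest) * scale) for p in clipped]


# Illustrative data: 100 distinct values plus one extreme outlier.
pixels = [float(i) for i in range(100)] + [10_000.0]
out = to_8bit(pixels)
print(min(out), max(out), len(set(out[:100])))
```

With this input the outlier still dominates the scaling after clipping, so the 100 distinct input values collapse onto far fewer grey levels. How much this matters for real micrographs is exactly the open question in this thread.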
>
>  Tom
>
>
>  On Jun 3, 2015, at 10:58 AM, Amedee des Georges <adesgeorges at GMAIL.COM>
> wrote:
>
>  Dear Tom,
>
>  Did you see any decrease in resolution with 8bit vs 16? How did it look?
> It’s obviously an advantage to use 8bits for storage if it doesn’t
> decrease image quality significantly.
>
>  Best,
>
>  Amedee
>
>  On Jun 3, 2015, at 1:44 PM, Tom Houweling <tom.houweling at BERKELEY.EDU>
> wrote:
>
>  We have successfully processed MRC images and stacks in Relion that were
> in 16-bit mode 6 and also in the non-MRC-sanctioned mode 5 (8-bit unsigned).
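
For readers who want to check which mode a file uses, here is a minimal sketch that reads the mode field from an MRC header. It assumes a little-endian file whose first four 32-bit integers are nx, ny, nz and mode, as in the MRC2014 layout. The description of mode 5 is taken from the message above, not from the standard, and the complex modes 3 and 4 are omitted. The header here is synthetic, not read from disk.

```python
import struct

# Real-valued data modes from the MRC2014 convention, plus the non-standard
# mode 5 as it is described in the message above.
MRC_MODES = {
    0: "8-bit signed integer",
    1: "16-bit signed integer",
    2: "32-bit float",
    6: "16-bit unsigned integer",
    5: "8-bit unsigned integer (non-standard, per the message above)",
}


def read_mrc_mode(header):
    """Return (nx, ny, nz, mode) from the first 16 bytes of an MRC header."""
    return struct.unpack("<4i", header[:16])


# Synthetic 1024-byte header for a 16-frame 4096 x 4096 stack in mode 6.
header = struct.pack("<4i", 4096, 4096, 16, 6).ljust(1024, b"\0")
nx, ny, nz, mode = read_mrc_mode(header)
print(nx, ny, nz, MRC_MODES.get(mode, "unknown mode"))
```

For a real file, the same function could be applied to the first 1024 bytes read in binary mode; a dedicated MRC library would also handle byte order and extended headers, which this sketch ignores.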
>
>  —Tom
>
>
>  On Jun 3, 2015, at 10:22 AM, Rémi Fronzes <remi.fronzes at PASTEUR.FR>
> wrote:
>
>  Dear All,
>
>  Maybe a silly question but still worth asking.
> Is it a problem to extract and use in Relion particles from 16-bit MRC
> images (i.e. collected using EPU)?
> Or do we have to convert the micrographs to 32-bit MRC format?
>
>  Cheers
>
>  Rémi
>
>
>  Rémi Fronzes
> G5 biologie structurale de la sécrétion bactérienne, institut Pasteur
> CNRS UMR 3528, institut Pasteur
>
>  Office: +33 (0)145688864
> Lab: +33 (0) 145688863
> Mobile: +33 (0) 688263992
> Email: remi.fronzes at pasteur.fr
>
>  25 rue du Docteur Roux
> Bâtiment Metchnikoff, 3ème étage
> 75015 Paris, France
>
>
>  --
> Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard
> Hughes Medical Institute
> University of California Berkeley, 708D Stanley Hall, Berkeley, CA 94720
>
>
>
>
> --
> ================================================================
>
>     Prof Dr Ir Marin van Heel
>
>     Professor of Cryo-EM Data Processing
>
>     Leiden University
>
>
>
>
> _______________________________________________
> 3dem mailing list
> 3dem at ncmir.ucsd.edu
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>
>