[3dem] [ccpem] Converting MRC to tiff

Marin van Heel marin.vanheel at googlemail.com
Wed Jun 6 10:21:14 PDT 2018


Dear All,

Steve Ludtke (and others) are right: the best thing to do is to store the 
integer counting files for compact long-term storage, and to do any “gain 
correction” later, prior to processing. Movies collected in counting 
mode are what the discussion is about here! However, gain corrections (as 
per the camera manufacturer’s procedures) are always sub-optimal and I 
prefer to speak of “camera normalization” (see (5) below), but let us start 
at the beginning:

(1) The IMAGIC format, from its early beginnings [van Heel & Keegstra 
1981], was designed for handling large sets of images (“stacks”) with 
each image in the stack (stored in the “.IMG” file) having its own 
header (stored in a separate ‘.HED’ file). The same format also allows 
for multidimensional data, say stacks of 3D volumes, or stacks of 
thousands of movies, each containing, say 128 frames. The files can be 
huge (~terabytes) but fast random access of all images is guaranteed by 
all images having the same size, giving predictability of where each one 
starts. The stack approach has since spread to other systems like 
SPIDER and MRC, but only as an add-on and never as the core of the 
system, where every program is expected by default to loop over all 
images or 3Ds in the stack. Compressing an IMAGIC image stack containing 
a huge set of movies is trivial and very effective with standard 
ZIP-type compressors.
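
The random-access argument in (1) can be sketched in a few lines. This is not the IMAGIC reader: it is a minimal illustration that assumes a headerless stack of equal-sized images with one byte per pixel, and the function names are hypothetical.

```python
import io

def image_offset(index, nx, ny, bytes_per_pixel=1):
    """Byte offset of image `index` in a stack of equal-sized images."""
    return index * nx * ny * bytes_per_pixel

def read_image(stack, index, nx, ny, bytes_per_pixel=1):
    """Seek straight to one image and read only its bytes."""
    size = nx * ny * bytes_per_pixel
    stack.seek(image_offset(index, nx, ny, bytes_per_pixel))
    return stack.read(size)

# Toy stack (an assumption, not real data): 5 images of 4x3 one-byte
# pixels, image k filled with the value k.
nx, ny, n_images = 4, 3, 5
stack = io.BytesIO(b"".join(bytes([k]) * (nx * ny) for k in range(n_images)))
picked = read_image(stack, 3, nx, ny)
```

Because every image has the same size, the start of image 3 follows from a single multiplication and no other image needs to be touched.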

(2) The MRC format [Crowther 1996] is a derivative of the earlier CCP4 
format for 3D X-ray volume data. The MRC format is not optimized for 
storing many 2D images since it only supports ONE image and ONE header 
encoded in ONE file. Because of these limitations, many different 
researchers have introduced variants and extensions of the MRC format for 
specific purposes; these are difficult to keep track of and lead 
to incompatibilities. The extension of the MRC format to store a large 
number of individual images (stacks) is primitive, since all images share 
the same header. The MRC stack was adopted by some for storing cryo-EM 
movies. The individual frame images contain a low number of electron 
counts, implying that most of the bits in the movie are “0” and thus the 
movie can easily be compressed to a minimal size by any standard “ZIP” 
program.
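
To make the last point concrete, here is a rough sketch using Python's standard zlib module (the same deflate method used by ZIP programs). The frame is synthetic: the size and the 5% hit probability are assumptions chosen only for illustration, not measured camera values.

```python
import random
import zlib

rng = random.Random(0)          # fixed seed so the toy frame is reproducible
n_pixels = 256 * 256            # assumed toy frame size, one byte per pixel
hit_probability = 0.05          # assumed fraction of pixels holding one count

# Most pixels hold 0; a few hold a single electron count.
frame = bytes(1 if rng.random() < hit_probability else 0 for _ in range(n_pixels))

packed = zlib.compress(frame, 9)
restored = zlib.decompress(packed)   # loss-less: identical to the input
ratio = len(packed) / len(frame)
print(f"raw {len(frame)} bytes, compressed {len(packed)} bytes, ratio {ratio:.3f}")
```

The exact ratio depends on the dose per frame; the sparser the counts, the smaller the compressed file.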

(3) The TIFF (= Tagged Image File Format) is a popular format in its 
simplest form, but complicated and not universal in all its extensions. 
Many variants with specific TAGs exist that make no sense outside a 
given application. TIFFs exist that can hold multiple images (stacks) 
but don’t count on standard imaging programs being able to look at 
anything more than the first image in the stack. The enthusiasm Steve 
and others expressed concerns only the storing of stacks of 8-bits/pixel 
movies in compressed TIFF mode, exploiting the loss-less compression 
options available in standard libraries. However, having hundreds or 
thousands of such individual TIFF movies floating around in a single 
directory is as chaotic as having hundreds or thousands of MRC movie 
files in a directory. ZIP compression will actually allow you to 
organize such an overall mess a bit better into a single sub-directory/file.
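
As a small illustration of bundling a directory of movies into one compressed file, the sketch below uses Python's standard zipfile module. The helper name, the ".mrc" suffix filter and the fake zero-filled "movies" are assumptions for the demonstration; they are not real MRC files.

```python
import os
import tempfile
import zipfile

def bundle_movies(directory, archive_path, suffix=".mrc"):
    """Pack every file ending in `suffix` into one deflate-compressed ZIP."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            archive.write(os.path.join(directory, name), arcname=name)
    return names

# Toy demonstration with three fake "movies" of zero bytes.
workdir = tempfile.mkdtemp()
for k in range(3):
    with open(os.path.join(workdir, f"movie_{k}.mrc"), "wb") as handle:
        handle.write(bytes(10_000))
archive_path = os.path.join(workdir, "movies.zip")
bundled = bundle_movies(workdir, archive_path)
```

Each member of the archive can later be extracted on its own, so the originals can be restored bit for bit when needed.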

(4) Loss-less data compression of a stack of images gives each 
compressed image a different size, so that each image begins at 
an unpredictable point in the file (see (2) above). Random access to 
the compressed images therefore requires an additional level of 
organization/complexity, which may even require reading/decompressing the 
whole file into computer memory. So far, I have preferred to compress 
raw counting images loss-lessly (MRCs or IMAGIC byte format or…), separate 
from any real data processing.
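
The "additional level of organization" in (4) can be as simple as a table of offsets. The sketch below is one possible layout, not the scheme used by any particular file format: each frame is compressed separately with zlib and its start and length are recorded, so a single frame can be decompressed without touching the others. All names are hypothetical.

```python
import zlib

def pack_frames(frames):
    """Compress each frame separately; return the blob and an offset table."""
    blob = bytearray()
    offsets = []                       # offsets[i] = (start, length) of frame i
    for frame in frames:
        packed = zlib.compress(frame)
        offsets.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), offsets

def read_frame(blob, offsets, index):
    """Decompress only the requested frame, using the offset table."""
    start, length = offsets[index]
    return zlib.decompress(blob[start:start + length])

# Toy frames of equal raw size but different content, hence different
# compressed sizes.
frames = [bytes(4096), bytes(range(256)) * 16, b"\x00\x01" * 2048]
blob, offsets = pack_frames(frames)
```

Without the offset table, the only way to find frame 1 would be to decompress frame 0 first, which is the cost described above.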

(5) The camera manufacturer’s “gain corrections” are normally 
sub-optimal and our “camera normalization”, based on the collected data 
sets, can considerably improve the data set even after an a-priori 
standard gain correction [Afanasyev 2015 Sci. Rep.]. The floating-point 
images needed for the correction are the average image and the standard 
deviation image of a large data set, and these are normally derived from 
the data set itself. The camera normalization will remove spurious 
correlations between images (say, movie frames) and will thus, for 
example, allow better movie alignments, based on smaller subareas of the 
movies. We have not yet seen a case where the manufacturer’s gain 
correction could not be improved upon, judging by the spurious correlations 
between the corrected images measured by FRC [Afanasyev 2015]. The 
a-posteriori camera correction can, in fact, replace the manufacturer’s 
gain correction in many/most cases.
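
A minimal sketch of the idea in (5), under the assumption that the correction amounts to subtracting the per-pixel average image and dividing by the per-pixel standard deviation image, both computed from the data set itself. The published procedure may differ in detail; the function names and the tiny toy data are hypothetical.

```python
from statistics import fmean, pstdev

def camera_statistics(frames):
    """Per-pixel average and standard deviation over a list of flat frames."""
    columns = list(zip(*frames))       # one tuple of values per pixel
    average = [fmean(col) for col in columns]
    sigma = [pstdev(col) for col in columns]
    return average, sigma

def normalize(frame, average, sigma):
    """Subtract the average image and divide by the standard-deviation image."""
    return [(v - a) / s if s > 0 else 0.0 for v, a, s in zip(frame, average, sigma)]

# Toy data: 4 frames of 3 pixels; pixel 1 responds twice as strongly as
# pixel 0 to the same signal (an assumed "hot" pixel).
frames = [[1, 4, 0], [2, 6, 1], [0, 2, 1], [1, 4, 2]]
average, sigma = camera_statistics(frames)
corrected = [normalize(f, average, sigma) for f in frames]
```

After normalization, pixels 0 and 1 carry identical values in every frame: the doubled response of pixel 1 has been removed using nothing but the statistics of the data.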

(6) Finally, for a better understanding of the fundamentals, let us 
reverse the thinking and perform what theoretically would be the best 
possible data collection experiment: Let us register the arrival of each 
individual electron of a movie! Each electron hits somewhere on a 4kx4k 
chip (= 2**12 x 2**12 = 2**24 pixels possible), arriving in one of the 
128 movie frames (= 2**7 possibilities) to achieve an average 
accumulated dose per pixel of ~64 electrons (= 2**6). The size of such 
an ideal data set of individual electron arrivals would thus 
be: ~2**(12+12+7+6) = 2**37 bits; or, ~512 Mbyte. Had you measured your 
128 movie frames at the byte level (16 Mbyte/frame), you would have ended 
up with a movie of ~2 Gbyte. Thus, by counting the arrival of the 
individual electrons, you have achieved a “compression rate” of 75%. 
However, had you already been satisfied with an average count 
of only 16 electrons per pixel, your ideal collected movie data set 
would have shrunk to 128 Mbyte and, everything else being the same, your 
“loss-less compression rate” would have been 94%! What I am trying to 
say here is that those compression rates are NOT a virtue of your 
favorite compression algorithm but are due to the intrinsic redundancies 
of your movie data!
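
The compression rates quoted in (6) can be checked with a few lines of arithmetic. The sketch takes the ideal event-list sizes (512 and 128 Mbyte) as stated in the text rather than re-deriving them, and computes the byte-level movie size from 128 frames of 16 Mbyte each, which comes to 2 Gbyte.

```python
MIB = 2**20

frames = 128
frame_bytes = 16 * MIB                  # 4k x 4k pixels at one byte per pixel
raw_movie_bytes = frames * frame_bytes  # byte-level movie size

def saving(ideal_bytes, raw_bytes):
    """Fraction of space saved relative to the byte-level movie."""
    return 1 - ideal_bytes / raw_bytes

# Ideal event-list sizes are taken as stated in the post, not re-derived.
saving_64 = saving(512 * MIB, raw_movie_bytes)   # ~64 electrons/pixel
saving_16 = saving(128 * MIB, raw_movie_bytes)   # ~16 electrons/pixel
print(raw_movie_bytes // 2**30, saving_64, saving_16)
```

The savings come out at 75% and just under 94%, and they change only with the dose, not with any choice of compression algorithm.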

Bottom lines: (A) Use loss-less compression of your counting files and 
store them forever! (B) The on-the-fly loss-less compression of 
floating-point images probably causes more problems than the limited 
saving of space is worth. (C) Since in cryo-EM the noise is the carrier 
of the information, stay away from any compression algorithm that is not 
loss-less: it may kill your information!! (D) Loss-less compression of 
huge stacks of final-results floating-point data sets will probably also 
save you a significant amount of space (in the order of 10-40%).

My few cents (may save you thousands… ;) ).

Marin


=======================================================================================

On 03/06/2018 21:31, Nicolas, William (William) wrote:
> Hey Yehuda,
>
> As Marin is saying, converting MRC to TIFF isn’t going to save any 
> space. However, if you happen to want to perform this task, this is 
> how you can do it:
>
> Both methods involve ImageJ. You can use the Bio-Formats plugin to 
> open .mrc, although I don’t like doing that. Alternatively, there is 
> another plugin called U759_inputoutput.jar 
> (http://www.cmib.fr/en/download/softwares/input-output.html) that 
> allows you to open .mrc with ImageJ. Then all you have to do is save as 
> TIFF.
> Check whether the metadata are kept when doing so.
>
> Cheers,
>
> William Nicolas, HHMI Postdoc.
>
> *_Jensen Laboratory - Meyerowitz Laboratory_*
> Division of Biology and Biological Engineering
> 1200 East California Blvd
> Postal code: 156-29
> California Institute of Technology
> Pasadena, CA 91125, USA
>
>
>
>
>
>> On 3 June 2018 at 03:18, Marin van Heel <marin.vanheel at googlemail.com> 
>> wrote:
>>
>>
>> Dear Yehuda Halfon
>>
>> The em2em converter (Image-Science.de <http://image-science.de/>) 
>> should do the trick but storing (4-bit/8-bit?) MRC movies in tiff is 
>> not a good idea: very few programs can handle TIFF stacks. Moreover, 
>> if the TIFFs are not compressed you will not necessarily gain any 
>> space. You are probably best off using a standard loss-less 
>> compression program ("zip"), and converting them back when you need 
>> them again. You must always keep your original raw data "forever" in a 
>> loss-less form.
>>
>> Marin van Heel
>>
>>
>> On 03/06/2018 06:02, Yehuda Halfon wrote:
>>> Hi there,
>>>
>>> We have a bunch of MRC movie files that are eating up our storage, 
>>> and since more are coming I was wondering if there is a good way to 
>>> convert them into TIFF to save space?
>>>
>>> I know that the best way is to save them directly as TIFF from EPU/ 
>>> SerialEM and that is what we will do in the future. But we need to 
>>> find a solution for the ones we have now.
>>>
>>> Thanks,
>>>
>>> Yehuda Halfon
>>>
>>>
>>
-- 

==============================================================

     Prof Dr Ir Marin van Heel

     Laboratório Nacional de Nanotecnologia - LNNano
     CNPEM/LNNano, Campinas, Brazil

--------------------------------------------------
     Emeritus Professor of Cryo-EM Data Processing
     Leiden University
--------------------------------------------------
     Emeritus Professor of Structural Biology
     Imperial College London
--------------------------------------------------

I receive many emails per day and, although I try,
there is no guarantee that I will actually read each incoming email.
