[3dem] Data storage and compression

Ludtke, Steven J. sludtke at bcm.edu
Thu Aug 29 05:32:44 PDT 2019


EMPIAR is great, and while they say they will take whatever people provide, if all of the raw data (even if it were limited to 'good' data) from all of the Krios around the world were archived in EMPIAR, there would be massive bandwidth and storage issues. When used as it is now, as an archive for important reference data sets, it's great, but I'm not sure it's a viable strategy for archiving everything produced in the CryoEM community.

Back of the envelope:
2000 movies/day per Krios * 500 MB per compressed counting movie * 100 Krios ~ 100 TB/day

Even a dedicated 10 Gb/s network running flat out (about 108 TB/day in theory, before protocol overhead) could barely keep up.
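
For anyone who wants to plug in their own numbers, here is a minimal sketch of the same estimate in Python. The three input figures are the ones quoted above; the decimal units (1 TB = 1e12 bytes, 1 Gb = 1e9 bits) and the variable names are my assumptions. With these inputs the required sustained rate comes out at roughly 9.3 Gb/s, against a theoretical ceiling of about 108 TB/day for a 10 Gb/s link, so there is essentially no headroom once protocol overhead and downtime are counted.

```python
# Back-of-the-envelope archive bandwidth estimate.
# Input figures are the rough numbers quoted in the message above.
MOVIES_PER_DAY_PER_SCOPE = 2000
MOVIE_SIZE_BYTES = 500e6      # 500 MB per compressed counting movie
NUM_SCOPES = 100              # number of Krios assumed in the estimate

# Assumption: decimal units (1 TB = 1e12 bytes, 1 Gb = 1e9 bits).
SECONDS_PER_DAY = 86400

daily_bytes = MOVIES_PER_DAY_PER_SCOPE * MOVIE_SIZE_BYTES * NUM_SCOPES
daily_tb = daily_bytes / 1e12

# Theoretical capacity of a 10 Gb/s link with no protocol overhead.
LINK_GBIT_PER_S = 10
link_tb_per_day = LINK_GBIT_PER_S * 1e9 / 8 * SECONDS_PER_DAY / 1e12

# Sustained rate needed to move one day of data in one day.
required_gbit_per_s = daily_bytes * 8 / SECONDS_PER_DAY / 1e9

print(f"data produced        : {daily_tb:.0f} TB/day")
print(f"10 Gb/s link ceiling : {link_tb_per_day:.0f} TB/day")
print(f"sustained rate needed: {required_gbit_per_s:.2f} Gb/s")
```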

--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <sludtke at bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology                      (www.bcm.edu/biochem)
Academic Director, CryoEM Core                                        (cryoem.bcm.edu)
Co-Director CIBR Center                                    (www.bcm.edu/research/cibr)



On Aug 29, 2019, at 7:04 AM, Takanori Nakane <tnakane at mrc-lmb.cam.ac.uk> wrote:

Hi,

> Most people just opt for the "hard drive on a shelf" method for completed
> projects, which has advantages (cheap/simple) and disadvantages (what
> happens if the drive dies)...

After publication of your structures, I recommend depositing the raw data
in EMPIAR.
Not only is it useful for reproducibility, education and method development,
it also serves as an additional layer of backup. You might drop your disk,
water might leak from the ceiling, etc. Keeping backups in a physically
distant place is good practice.

Best regards,

Takanori Nakane

Julien,
are you referring to the raw data, or are you trying to archive all of the
files associated with a project?

Counting-mode movies are generally stored and archived as compressed TIFF
stacks. Movies collected on a Falcon are harder to handle this way, because
good compression is achieved only pre-normalization (or post-normalization
if you decide you are willing to switch back to an integer format).
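
To illustrate why normalization hurts compression, here is a toy sketch in Python using only the standard library. It is not a model of any real detector format: the sparse 0/1 event frame, the 5% event probability and the per-pixel gain within 10% of 1.0 are all assumptions made up for the example. The point is only that small integers compress well, the same data multiplied by a floating-point gain compresses poorly, and rounding back to integers recovers the compressibility at the cost of discarding the fractional part.

```python
import random
import zlib
from array import array

rng = random.Random(0)
N_PIXELS = 200_000
EVENT_PROBABILITY = 0.05  # assumed: sparse counting-mode frame

# Raw counts: small integers, one byte per pixel.
counts = array(
    "B", (1 if rng.random() < EVENT_PROBABILITY else 0 for _ in range(N_PIXELS))
)

# Hypothetical per-pixel gain reference, within +/-10% of 1.0.
gain = [rng.uniform(0.9, 1.1) for _ in range(N_PIXELS)]

# Normalized frame stored as 32-bit floats.
normalized = array("f", (c * g for c, g in zip(counts, gain)))

# Same frame switched back to an integer format (lossy relative to the floats).
rounded = array("B", (int(round(v)) for v in normalized))

size_int = len(zlib.compress(counts.tobytes()))
size_float = len(zlib.compress(normalized.tobytes()))
size_rounded = len(zlib.compress(rounded.tobytes()))

print(f"raw integer counts : {size_int} bytes compressed")
print(f"normalized float32 : {size_float} bytes compressed")
print(f"rounded to integer : {size_rounded} bytes compressed")
```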

If you want to perfectly archive everything exactly as it is (losslessly),
some compression algorithms may do very slightly better than others, but
pretty much any of the commonly used algorithms will do about the same.
Usually the slower ones will do slightly better, but you have to decide if
it's worth the CPU time the compression takes.  By definition, the noisier
the data is, the less compressible it is, unless you are willing to invoke
"lossy" compression and throw away some of the bits of pure noise.
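
If you want to check these trade-offs on your own machine, something like the sketch below may help. It uses only Python standard-library codecs (zlib, bz2, lzma) on a synthetic 16-bit frame; the sine-plus-Gaussian-noise signal, the noise levels and the helper names `synthetic_frame`, `drop_low_bits` and `compressed_sizes` are assumptions for illustration, not anything taken from real detector data. It prints size and CPU time per codec, shows that the noisier frame compresses worse, and shows that zeroing the lowest bits (a crude form of lossy compression) shrinks the output again.

```python
import bz2
import lzma
import math
import random
import time
import zlib
from array import array

CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def synthetic_frame(n_pixels, noise_sigma, seed=0):
    """Smooth signal plus Gaussian noise, stored as unsigned 16-bit integers."""
    rng = random.Random(seed)
    values = array("H")
    for i in range(n_pixels):
        signal = 1000 + 200 * math.sin(i / 500)
        noisy = int(round(signal + rng.gauss(0, noise_sigma)))
        values.append(min(max(noisy, 0), 65535))
    return values


def drop_low_bits(values, n_bits):
    """Crude lossy step: zero the n_bits least significant bits of each value."""
    mask = 0xFFFF & ~((1 << n_bits) - 1)
    return array("H", (v & mask for v in values))


def compressed_sizes(label, values):
    """Compress with each codec, print size and time, return sizes by codec name."""
    raw = values.tobytes()
    sizes = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        sizes[name] = len(compress(raw))
        elapsed = time.perf_counter() - start
        print(f"{label:>12} {name:>5}: {sizes[name]:>7} bytes in {elapsed:.3f} s")
    return sizes


N_PIXELS = 100_000
quiet = synthetic_frame(N_PIXELS, noise_sigma=2)
noisy = synthetic_frame(N_PIXELS, noise_sigma=50)

quiet_sizes = compressed_sizes("sigma=2", quiet)
noisy_sizes = compressed_sizes("sigma=50", noisy)
lossy_sizes = compressed_sizes("sigma=50 -4b", drop_low_bits(noisy, 4))
```

The timings will vary from machine to machine, so treat them only as a relative guide when deciding whether a slower codec is worth the CPU time.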

Most people just opt for the "hard drive on a shelf" method for completed
projects, which has advantages (cheap/simple) and disadvantages (what
happens if the drive dies)...




On Aug 29, 2019, at 6:30 AM, Julien Bous <julien.bous at etu.umontpellier.fr> wrote:



Dear Community,

I have a question about the best way to store my data once SPA projects
are completed. Can you advise me on which compression format is
preferable?

Thank you for your interest,

Julien


_______________________________________________
3dem mailing list
3dem at ncmir.ucsd.edu
https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.ncmir.ucsd.edu_mailman_listinfo_3dem&d=DwICAg&c=ZQs-KZ8oxEw0p81sqgiaRA&r=GWA2IF6nkq8sZMXHpp1Xpg&m=-Yu84q3MdcWvESYXpaK7NQdEWch6tE1eG9IVNTjLay4&s=NSrJg_YgFffwLELO1auXSC6yYLEsGHVoNV5TI_1eBqM&e=

