[3dem] Advice on storage server

Ludtke, Steven J. sludtke at bcm.edu
Wed Feb 14 18:07:00 PST 2024


If you don't expect to need to access the data again, i.e., it is purely an emergency backup, Amazon Glacier is a cost-effective solution, as long as you have the funds to keep paying for it. 100 TB of Glacier Deep Archive storage would run about $1200/year (plus additional cost if you need to retrieve it).
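
As a quick sanity check on that figure, here is a minimal Python sketch of the arithmetic. The per-GB price is an assumption (roughly the us-east-1 list price for Glacier Deep Archive, about $0.00099/GB-month); check current AWS pricing. Retrieval, request and egress fees are not included.

```python
# Rough storage-only cost of Glacier Deep Archive.
# ASSUMPTION: ~$0.00099 per GB-month (approximate us-east-1 list price;
# verify against current AWS pricing). Retrieval/request fees are NOT included.
PRICE_PER_GB_MONTH = 0.00099


def deep_archive_cost_per_year(terabytes: float,
                               price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Return storage-only cost in dollars per year, taking 1 TB = 1000 GB."""
    gigabytes = terabytes * 1000
    return gigabytes * price_per_gb_month * 12


if __name__ == "__main__":
    print(f"100 TB: ~${deep_archive_cost_per_year(100):,.0f}/year")
```

For 100 TB this comes out to roughly $1,190/year, consistent with the ~$1200/year quoted above.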

If you are storing it for possible additional processing, then you want the storage to be "close" to the processing power in data-transfer terms, i.e., if you are processing in the cloud, storing the data in the cloud makes sense. Clearly, though, you would not want to process the data directly from cloud storage. Keep in mind the relative transfer speeds of different devices/transfer methods:

M.2 SSD -> 2-4 GB/s
8-drive RAID array with spinning platters directly on the machine -> ~1 GB/s
SATA SSD -> 0.6 GB/s
single spinning platter on the machine -> 0.15 GB/s
gigabit network remote access -> 0.1 GB/s
slower-than-gigabit remote access (cloud at typical institutions) -> <0.1 GB/s

For size comparison, a 4k x 4k x 1k tomogram at 8 bits is 16 GB, so opening that from an M.2 SSD might take 4-8 seconds, whereas opening the same file over a gigabit NAS would take almost 3 minutes.
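
The tomogram numbers above can be reproduced with a few lines of Python. This is only a sketch: it treats 1 GB as 2**30 bytes (which is what makes 4096 x 4096 x 1024 bytes come out to exactly 16 GB), uses the rough throughputs from the list above, and ignores latency and filesystem overhead.

```python
# Rough read times for one 8-bit 4k x 4k x 1k tomogram on different storage tiers.
# Throughputs (GB/s) are the approximate figures from the list above;
# the "<0.1 GB/s" cloud case is omitted because it has no single value.
GIB = 2**30  # treat 1 GB as 2**30 bytes for simplicity

THROUGHPUT_GB_PER_S = {
    "M.2 SSD (fast end)": 4.0,
    "M.2 SSD (slow end)": 2.0,
    "8-drive RAID array": 1.0,
    "SATA SSD": 0.6,
    "single spinning platter": 0.15,
    "gigabit network": 0.1,
}


def volume_size_gb(nx: int, ny: int, nz: int, bytes_per_voxel: int = 1) -> float:
    """Size of an nx * ny * nz volume in GB (1 GB = 2**30 bytes)."""
    return nx * ny * nz * bytes_per_voxel / GIB


def read_time_s(size_gb: float, gb_per_s: float) -> float:
    """Seconds to stream size_gb at a sustained gb_per_s (no latency or overhead)."""
    return size_gb / gb_per_s


if __name__ == "__main__":
    size = volume_size_gb(4096, 4096, 1024)  # 8 bits = 1 byte per voxel
    print(f"tomogram size: {size:.0f} GB")
    for name, rate in THROUGHPUT_GB_PER_S.items():
        t = read_time_s(size, rate)
        print(f"{name:24s} {t:6.1f} s ({t / 60:.1f} min)")
```

For the gigabit case this gives about 160 s, i.e., the "almost 3 minutes" mentioned above, versus 4-8 s from an M.2 SSD.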

Personally, I have a 12-bay Synology NAS box with a 10 Gb/s network card under my desk. With 16 TB drives and RAID6, this gives about 150 TB of usable storage space, which you can access at ~1 GB/s. It cost ~$5000, and the expected drive life is ~5 years, i.e., expect to replace failed drives occasionally after the first few years.
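
The usable-capacity figure follows from RAID6 reserving two drives' worth of capacity for parity. A minimal sketch of that calculation; it ignores filesystem and Synology-specific overhead, so the volume you actually see will be somewhat smaller.

```python
def raid6_usable_tb(n_drives: int, drive_tb: float) -> float:
    """Usable RAID6 capacity in decimal TB: two drives' worth goes to parity."""
    if n_drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (n_drives - 2) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal TB (10**12 bytes, as drives are sold) to TiB (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    usable = raid6_usable_tb(12, 16)
    print(f"{usable:.0f} TB usable = {tb_to_tib(usable):.1f} TiB")
```

Twelve 16 TB drives give 160 TB (about 145.5 TiB) before filesystem overhead, which is where "about 150 TB" comes from, depending on which units the OS reports.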

It's worth noting that at $5000, with an expected life of ~5 years before you start having to pay for replacement drives, this works out to about $1000/year with high-speed access, compared to ~$1200/year for Glacier Deep Archive storage. However, Glacier storage is much more reliable than a single RAID6 array with no additional backup.
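
To put the two options on the same footing, one can amortize the NAS purchase over its expected life and compare cost per TB-year. This sketch uses only the figures quoted in this post and straight-line amortization; it ignores power, replacement drives, admin time and Glacier retrieval fees, so treat it as a rough comparison, not a full cost model.

```python
def amortized_per_year(purchase_usd: float, lifetime_years: float) -> float:
    """Straight-line amortization of a one-time purchase."""
    return purchase_usd / lifetime_years


def per_tb_year(annual_usd: float, capacity_tb: float) -> float:
    """Annual cost divided by the capacity it buys."""
    return annual_usd / capacity_tb


if __name__ == "__main__":
    nas_annual = amortized_per_year(5000, 5)  # NAS: ~$5000 over ~5 years, ~150 TB usable
    glacier_annual = 1200.0                   # Deep Archive: ~$1200/year for 100 TB, storage only
    print(f"NAS:     ${nas_annual:.0f}/year, ${per_tb_year(nas_annual, 150):.2f}/TB-year")
    print(f"Glacier: ${glacier_annual:.0f}/year, ${per_tb_year(glacier_annual, 100):.2f}/TB-year")
```

On these numbers the NAS is cheaper per TB-year, but its figure is for a single copy with no backup.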

Anyway, some food for thought  :^)

---
Steven Ludtke, Ph.D. <sludtke at bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center


On Feb 14, 2024, at 6:14 PM, Jobichen <jobichenc at yahoo.com> wrote:

Dear All,
We are looking for suggestions on storing raw datasets/movies. What would be the best option for storing around 100 TB of movies/processed data?
What are the pros/cons of having our own storage server vs. cloud storage options?
Thank you for your time.
Jobi



_______________________________________________
3dem mailing list
3dem at ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
