[3dem] Advice on storage server

Matthias Wolf matthias.wolf at oist.jp
Thu Feb 15 01:24:51 PST 2024


In addition to the university HPC cluster, we use the following in my lab at OIST for our own cryo-EM storage needs:


  * Archival: Qualstar Q24 LTO-8 24-tape library, single drive (~$100 per 12 TB uncompressed cartridge, lifetime >30 yrs), connected to a Supermicro storage server by a 6 Gbps SAS PCIe card, ~20 TB/day.
    https://www.backupworks.com/qualstar-Q24-LTO-8-Tape-libraries.aspx
  * Near-line storage: 800 TB (60x 16 TB HDD), SuperStorage server (2x Xeon, 512 GB, 4x 10 Gb Ethernet), TrueNAS, Archiware P5, mounted read-only by NFS on all workstations.
    https://www.truenas.com/truenas-core/
    https://www.archiware.com/products/p5-archive
    https://www.supermicro.com/en/products/system/storage/2u/ssg-540p-e1ctr60h
  * Online storage: DDN file server (was too expensive!): 700 TB (120 HDD x 4/6 TB), 40 Gb InfiniBand connected to the HPC cluster, read-write.
  * Flash storage (e.g. local GPU data cache): 8x 4 TB NVMe SSD RAID0 PCIe4 card (we have the PCIe3 version and a 4-drive PCIe4) for processing, to keep those GPUs busy.
    https://www.highpoint-tech.com/nvme-individual/ssd7540

I move data in stages from online > near-line > tape as it ages.
The Supermicro server running TrueNAS has been very reliable. Three drives have failed over the last 4 years, and they are easy to hot-swap after the server sends you an email. TrueNAS is highly recommended. Archiware is great for archival: it keeps a catalog of everything. Once the tape library is full, I move the barcoded tapes to a shelf in a temperature-controlled long-term storage room. If a tape is not found in the library, Archiware will tell you the barcode and you simply need to load it.
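
To put the tape figures above in context, here is a rough sketch of the arithmetic for archiving one copy of a dataset. It assumes the numbers quoted in this message (12 TB per cartridge, ~$100 per cartridge, ~20 TB/day) and the ~100 TB dataset size from the original question; the function name and defaults are illustrative only, and the cost of the library, drive, and server is not included.

```python
import math


def tape_archive_estimate(dataset_tb, cartridge_tb=12.0,
                          cartridge_usd=100.0, throughput_tb_per_day=20.0):
    """Return (cartridges, media cost in USD, days to write) for one copy.

    Defaults are the approximate figures quoted in the message above;
    they are assumptions, not vendor specifications.
    """
    # A partially filled cartridge still has to be bought, so round up.
    cartridges = math.ceil(dataset_tb / cartridge_tb)
    media_cost = cartridges * cartridge_usd
    days = dataset_tb / throughput_tb_per_day
    return cartridges, media_cost, days


if __name__ == "__main__":
    tapes, cost, days = tape_archive_estimate(100)
    print(f"{tapes} cartridges, ~${cost:.0f} in media, ~{days:.1f} days to write")
```

Under these assumptions, media cost scales with the number of cartridges, while the one-time hardware cost does not, so the per-TB cost of tape drops as the archive grows.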

   Matthias



________________________________
From: 3dem <3dem-bounces at ncmir.ucsd.edu> on behalf of Ludtke, Steven J. <sludtke at bcm.edu>
Sent: Thursday, February 15, 2024 12:57 PM
To: Jobichen <jobichenc at yahoo.com>
Cc: 3DEM Mailing List <3dem at ncmir.ucsd.edu>
Subject: Re: [3dem] Advice on storage server

I should add that for long-term backup, the most typical strategy is the convenient but unsafe "drives on a shelf". That would be a one-time purchase of ~$2k, but the chances that all of the drives work and you can fully recover the data in 5 or 10 years may be a little marginal. It is also worth noting that portable USB drives, as opposed to drives designed to be internal drives in a PC, have massively lower reliability ratings in general. Also note that SSDs lose data over time if they aren't plugged in to a power source periodically for a "refresh".

---
Steven Ludtke, Ph.D. <sludtke at bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center


On Feb 14, 2024, at 8:06 PM, Ludtke, Steven J. <sludtke at bcm.edu> wrote:

If you don't expect to need to access it again, i.e. it is purely an emergency backup, Amazon Glacier is a cost-effective solution, as long as you have the money to continue paying for it. 100 TB of deep archive Glacier storage would run about $1200/year (plus additional cost if you need to retrieve it).

If you are storing it for possible additional processing, then you want the storage to be "close" in data transfer terms to the processing power, i.e. if you are processing in the cloud, then storing the data in the cloud makes sense. Clearly you would not want to process the data directly from cloud storage. Keep in mind the relative speeds of transfer for different devices/transfer methods:

M.2 SSD -> 2-4 GB/s
8 drive RAID array with spinning platters directly on the machine -> ~1 GB/s
SATA SSD -> 0.6 GB/s
single spinning platter on machine -> 0.15 GB/s
gigabit network remote access -> 0.1 GB/s
less than gigabit remote access (cloud at typical institutions) -> <0.1 GB/s

For size comparison, a 4k x 4k x 1k tomogram at 8 bits is 16 GB, so opening that from an M.2 SSD might take 4-8 seconds, whereas opening the same file over a gigabit NAS would take almost 3 minutes.
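
The estimate above can be reproduced with a few lines of Python. This is only a sketch: the rates are the approximate figures from the list above, the names are illustrative, and "4k x 4k x 1k" is taken to mean 4096 x 4096 x 1024 voxels, which gives 16 GiB (about 17.2 GB), so the computed times come out slightly higher than the rounded ones in the text.

```python
# Approximate sustained transfer rates in GB/s, taken from the list above.
RATES_GB_PER_S = {
    "M.2 SSD (high end)": 4.0,
    "M.2 SSD (low end)": 2.0,
    "8-drive local RAID": 1.0,
    "SATA SSD": 0.6,
    "single spinning disk": 0.15,
    "gigabit network": 0.1,
}


def tomogram_bytes(nx, ny, nz, bits_per_voxel=8):
    """Size in bytes of an uncompressed volume (headers ignored)."""
    return nx * ny * nz * bits_per_voxel // 8


def transfer_seconds(n_bytes, rate_gb_per_s):
    """Ideal time to read n_bytes at the given rate (1 GB = 1e9 bytes)."""
    return n_bytes / (rate_gb_per_s * 1e9)


if __name__ == "__main__":
    size = tomogram_bytes(4096, 4096, 1024)
    for device, rate in RATES_GB_PER_S.items():
        print(f"{device:22s} {transfer_seconds(size, rate):7.1f} s")
```

These are best-case numbers; real reads are usually slower because of file system overhead, network contention, and other users.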

Personally, I have a 12-bay Synology NAS box with a 10 Gb network card in it under my desk. With 16 TB drives and RAID6 this gives about 150 TB of usable storage space, which you can access at ~1 GB/s. Cost ~$5000, with an expected drive life of ~5 years, i.e. expect to have to replace bad drives occasionally after the first few years.

It's worth noting here that at $5000, with an expected life of ~5 years before you start having to pay for more drives, this is $1000/year and gives high speed access, compared to the $1200/year for deep Glacier storage above. However, the Glacier storage has much better reliability than a single RAID6 array with no additional backup.
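
As a rough sketch of the comparison above, the capacity and yearly cost can be worked out as follows. The function names are illustrative, the prices are the approximate ones quoted in this message, and the RAID6 figure is the ideal one (two drives' worth of parity, no file system overhead), which is presumably why it comes out a little above the ~150 TB quoted.

```python
def raid6_usable_tb(n_drives, drive_tb):
    """Ideal RAID6 capacity: two drives' worth of space goes to parity."""
    if n_drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (n_drives - 2) * drive_tb


def cost_per_year(purchase_usd, lifetime_years):
    """Spread a one-time purchase evenly over its expected lifetime."""
    return purchase_usd / lifetime_years


if __name__ == "__main__":
    usable = raid6_usable_tb(12, 16)      # 12-bay NAS with 16 TB drives
    nas_yearly = cost_per_year(5000, 5)   # ~$5000 over ~5 years
    glacier_yearly = 1200                 # ~100 TB of deep archive, as quoted
    print(f"NAS: {usable} TB ideal usable, ${nas_yearly:.0f}/year, "
          f"${nas_yearly / usable:.2f} per TB per year")
    print(f"Glacier: 100 TB, ${glacier_yearly}/year, "
          f"${glacier_yearly / 100:.2f} per TB per year")
```

This ignores electricity, replacement drives, retrieval fees, and the cost of a second copy, so it is only a starting point for a real budget.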

Anyway, some food for thought  :^)



On Feb 14, 2024, at 6:14 PM, Jobichen <jobichenc at yahoo.com> wrote:

Dear All,
We are looking for suggestions on storing raw datasets/movies. What would be the best option for storing around 100 TB of movies and processed data?
What are the pros and cons of having our own storage server versus cloud storage options?
Thank you for your time.
Jobi


