[3dem] Advice for hardware of compute nodes

Rodarte, Justas V jrodarte at fredhutch.org
Thu May 20 10:47:40 PDT 2021


Hi all,

My apologies for jumping in on this conversation with something outside the scope of the original question, but Kilian's question about AVX-512 considerations in a build is related to an issue our lab is having with a workstation.

We recently got a workstation from a vendor with 2 RTX 3090s, 128 GB RAM, and a 10980XE processor on an ASUS x299 SAGE 10GbE. Since the start, it has had issues running 2D and 3D classification and refinement jobs, primarily in cryoSPARC or programs using the cryoSPARC engine (i.e. Scipion). The system will get 3-5 iterations into a job before spontaneously restarting. It doesn't log a kernel panic, nor any errors in the system logs besides orphaned processes and unrelated process errors from the crash. Standard stress tests of both CPU and GPU haven't turned up any issues.

I've tentatively traced the problem to AVX-512 workloads causing the CPU to trip some power or thermal cutoff and shut down the system. Limiting voltage and clock speed via the overclocking settings in the BIOS has helped, but the crash still happens, just later in the job (around iteration 15). Does anyone have a recommended general BIOS configuration for machines doing cryo-EM processing with cryoSPARC and related programs? Thank you for any possible help!
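For anyone debugging similar shutdowns, a quick Linux-side sanity check (assuming the lm-sensors and linux-tools packages are available; names vary by distro) is to confirm which AVX-512 feature flags the CPU exposes, then watch temperatures, clocks, and throttle counters while a job runs:

```shell
# List the AVX-512 feature flags the CPU advertises (empty output = none)
grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u

# While a classification job runs, watch package temperatures:
#   sensors
# and per-core clocks, power draw, and throttle counters:
#   sudo turbostat --interval 5
```

A sudden reboot with nothing in the logs is consistent with a hardware-level power or thermal trip, which these tools can catch in the seconds before it fires.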

Best,
Justas
________________________________
From: 3dem <3dem-bounces at ncmir.ucsd.edu> on behalf of Guillaume Gaullier <guillaume.gaullier at icm.uu.se>
Sent: Thursday, May 20, 2021 10:33:54 AM
To: Israel Fernandez <israel.elotro at gmail.com>
Cc: 3dem at ncmir.ucsd.edu <3dem at ncmir.ucsd.edu>
Subject: Re: [3dem] Advice for hardware of compute nodes

In my case it was 768 GB total (64 GB * 12 slots I believe, in a Supermicro case). But yes, 512 GB already qualifies as a "ridiculous amount" in my book. You could also call it "future proof" (as in: future programs implementing more sophisticated analyses than we use now may need more RAM), which is better vocabulary to use as a justification for your spending. :-)

Guillaume


On 20 May 2021, at 16:14, Israel Fernandez <israel.elotro at gmail.com> wrote:

A ridiculous amount of RAM could mean 512 GB?

On Thu, 20 May 2021 at 13:21, Guillaume Gaullier <guillaume.gaullier at icm.uu.se> wrote:
Hello Kilian,

Regarding the number of cores versus frequency, I would suggest going for more cores: some job types in RELION are not GPU-accelerated but scale very well with more MPI processes (motion correction, Bayesian polishing, and CTF refinement), so for those you will benefit a lot more from having many cores than from having fewer, faster ones. And 2.8-3.7 GHz is already plenty.

Another piece of general advice: don't skimp on RAM and storage, in terms of both speed and capacity. Fast CPUs and GPUs are of no use if you can't feed them your data fast enough (they would spend much of their time waiting for input). The amount of RAM is also easy to overlook if you assume you will only ever use RELION and cryoSPARC, which can both use an SSD cache efficiently; newer programs don't always have this capability (I am thinking of cryoDRGN in particular), and it would be frustrating to have to wait until they implement it, or to resort to workarounds (a RAM upgrade, not using your entire dataset, etc.) in the meantime. Last year I bought a workstation for my lab with what seemed like a ridiculous amount of RAM, and in hindsight it was a good idea: I have been happily running cryoDRGN with large box sizes and many particles without any problem (besides that it takes a long time to run, of course, but I haven't been limited by RAM when cryoDRGN loads the entire dataset).
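To put a rough number on that: a program that holds the whole particle stack in memory as float32 (as cryoDRGN does with the full dataset) needs on the order of n_particles × box² × 4 bytes. A back-of-the-envelope helper (the function name and example figures below are illustrative, not from any program):

```python
def dataset_ram_gib(n_particles: int, box: int, bytes_per_px: int = 4) -> float:
    """Approximate RAM (GiB) to hold an entire particle stack in memory.

    Assumes float32 pixels (4 bytes); ignores program overhead and copies,
    so treat the result as a lower bound.
    """
    return n_particles * box * box * bytes_per_px / 2**30

# 300k particles at box 256 already needs ~73 GiB just for the raw images:
print(round(dataset_ram_gib(300_000, 256), 1))  # → 73.2
```

So a "ridiculous" 512 GB leaves headroom even for million-particle datasets at large box sizes.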

I have no idea about your other questions, but there are probably compromises to make between all options (if not technical ones, at least in terms of budget).

I hope this helps, good luck!

Guillaume


On 20 May 2021, at 11:46, Kilian Schnelle <kilian.schnelle at uni-osnabrueck.de> wrote:

Hello everyone,

I am currently thinking about which hardware to get for compute nodes mostly used for cryoSPARC and RELION with Slurm, and would be glad if someone could offer some advice.

I was hovering over maybe something like the G292-Z43 servers from Gigabyte with 2-4 RTX 3090s, 64-128 GB RAM, and an NVMe SSD for cache. (Does anyone actually have experience with an NVMe SSD cache exported to all nodes? Is that worth it over having one in every node?) I am not sure which CPU to get, though. I was thinking of:

- 2x AMD Epyc 7543, 32c/64t, 2.8-3.7 GHz, 256 MB L3, ~$3700 each

Or higher frequency and fewer cores?

- 2x AMD Epyc 74F3, 24c/48t, 3.2-4.0 GHz, 256 MB L3, ~$2900 each

Or even just a single-socket CPU in a different server, like the 7532P? (I would have to calculate the PCIe lanes I need for everything, but would PCIe 4.0 x8 versus x16 for the GPUs even make a difference? In theory 4.0 x8 matches 3.0 x16 in throughput.)

- AMD Epyc 7543P, 32c/64t, 2.8-3.7 GHz, 256 MB L3, ~$2700

Is it even worth going for the 256 MB L3 cache? Or is it worth going for Team Blue (Intel) CPUs because of AVX-512?

Any insights would be highly appreciated.


Best wishes
Kilian
_______________________________________________
3dem mailing list
3dem at ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
--
Israel S Fernandez
Columbia University, New York City, NY, USA
