[3dem] Utilizing the Xeon Phi
matthias.wolf at oist.jp
Tue Nov 18 02:14:39 PST 2014
Hi Dewight, Alexis,
Just to chime in after Alexis' message - I compiled Frealign on a Phi 7120 about one year ago during a visit by Intel. While the procedure was straightforward, I did not attempt any non-standard optimizations. Out of the box, the performance was rather disappointing, and at the time I decided it was better to use standard multi-core Xeon processors.
Compared to Xueming Li's GPU version of Frealign, the Xeon Phi I tested was no competition - I use a 16-GPU box (8x NVIDIA GTX 590 in a Tyan barebone), which accelerates the program ~1500-fold compared to a single 2.7 GHz Xeon core.
While the concept of the Phi is nice - it feels like having a little Linux cluster in your PC to which you can ssh and run multi-threaded programs - it has clear limitations: the one I tested had only 16 GB of memory, which makes large reconstructions problematic. The 61 (Intel Atom-derived) cores per board run at only 1 GHz and they have a small cache. Now this is not much different from GPUs, but there are many more cores on most GPUs. Maybe with the right optimizations, the Phi would be a worthy adversary, but I did not have the time to find out.
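To put the 16 GB limit in perspective, here is a rough sketch of how the memory for a single cubic volume grows with box size. It assumes 4-byte (single-precision) voxels and treats 16 GB as 16 GiB; the function names are made up for illustration, and a real reconstruction program will hold additional buffers (padded volumes, Fourier transforms, per-thread copies) that are not counted here.

```python
def volume_gib(box_size, bytes_per_voxel=4):
    """Memory for one cubic volume of box_size^3 voxels, in GiB.

    Assumes single-precision (4-byte) voxels unless told otherwise.
    """
    return box_size ** 3 * bytes_per_voxel / 2 ** 30


def copies_that_fit(box_size, memory_gib=16.0, bytes_per_voxel=4):
    """How many such volumes fit in the given memory (ignoring all overhead)."""
    return int(memory_gib // volume_gib(box_size, bytes_per_voxel))


if __name__ == "__main__":
    for box in (256, 512, 1024):
        print(box, volume_gib(box), copies_that_fit(box))
```

The cost is cubic in the box size, so doubling the box edge multiplies the footprint by eight, and the number of working copies that fit on the board shrinks accordingly.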
Regarding Intel vs AMD, I agree 100% with Steve Ludtke's statements. I tested a 32-core Opteron system against the latest quad core Xeon a couple of years ago and, while the two were roughly comparable in single-threaded performance, the Xeon scaled linearly with the number of threads (frealign-mp), whereas the Opteron quickly saturated (more than 12 cores were useless) and its performance was significantly lower. I believe this has to do with AMD's HyperTransport interconnect having lower bandwidth than Intel's QPI. In particular the E-series Xeons are really very good.
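For anyone who wants to repeat this kind of comparison, below is a minimal sketch of how one might turn wall-clock timings into speedup and parallel efficiency and find where a machine stops scaling. The timings are invented placeholders, not my measurements, and the function names and the 5% threshold are arbitrary choices for illustration.

```python
def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows from {threads: seconds}.

    Speedup is relative to the single-thread run, so timings must contain key 1.
    """
    t1 = timings[1]
    return [(n, t1 / timings[n], t1 / timings[n] / n) for n in sorted(timings)]


def saturation_point(timings, min_gain=0.05):
    """Smallest thread count beyond which the next step gains < min_gain.

    Returns None if every step still improves by at least min_gain.
    """
    counts = sorted(timings)
    for current, nxt in zip(counts, counts[1:]):
        if timings[current] / timings[nxt] < 1.0 + min_gain:
            return current
    return None


# Hypothetical wall-clock seconds per thread count (placeholders only).
linear = {1: 1200, 2: 600, 4: 300, 8: 150, 16: 75}
saturating = {1: 1200, 4: 300, 8: 160, 12: 110, 16: 108, 32: 107}

if __name__ == "__main__":
    for name, data in (("linear", linear), ("saturating", saturating)):
        print(name, "saturates at:", saturation_point(data))
        for row in scaling_table(data):
            print("  threads=%d speedup=%.2f efficiency=%.2f" % row)
```

With the placeholder data, the second machine is flagged at 12 threads because going from 12 to 16 threads improves the run time by less than 5%.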
Finally (this came up in a previous thread) - there is no problem operating NVIDIA gaming GPUs in headless mode with Linux - my box is sitting in a rack in the datacenter and I simply ssh to it. Actually, the gaming cards use the same chips as their corresponding Quadro or Tesla relatives, minus the ECC memory. They are usually even higher clocked than the more expensive "professional" cards, but the chief difference is that the GTX series has less memory. So unless you need quad-buffered graphics for windowed stereo and a lot of memory, there is no point in buying anything else. The main issues are feeding them with data, which requires an SSD RAID, and providing sufficient current. Cooling can be improved by removing their on-board fans in a good rack-mounted case, which brings the temperatures down by 20-30 C.
Matthias Wolf, PhD MPharm - Assistant Professor
Molecular Cryo-Electron Microscopy Unit
Okinawa Institute of Science and Technology Graduate University
1919-1 Tancha, Onna-son, Kunigami-gun
Okinawa 904-0495, Japan
From: 3dem-bounces at ncmir.ucsd.edu [mailto:3dem-bounces at ncmir.ucsd.edu] On Behalf Of Alexis Rohou
Sent: Tuesday, November 18, 2014 2:11 PM
To: 3dem at ncmir.ucsd.edu
Subject: Re: [3dem] Utilizing the Xeon Phi
As far as I know, none of the 3DEM packages have been adapted to run on Phi boards. This means you could run them (provided you recompiled them using the Intel compilers) but only in native mode, which involves SSH'ing onto the boards. And even then, without optimization, you'd probably get worse performance than on a top-of-the-range Xeon chip. However, I guess if you pack enough cards per node you might get improved density for your cluster.
The topic of Phi boards was brought up at the NRAMM meeting last week at Scripps and it seemed no-one had tried them yet.
Here at Janelia we bought a Phi 7200 to test out, but haven't got round to doing much with it because of the time required to investigate program optimization and the relatively meager prospective gains.
So, bottom line: don't go for a cluster with Phi boards, because none of the 3DEM software will be ready for them.
Hope this helps,
On 11/12/2014 10:08 AM, Dewight R. Williams wrote:
Has anyone performed 3D single particle reconstruction on the new Intel Xeon Phi boards? When you performed this work, did the software need to be recompiled or was it implemented through standard openMPI? What software were you using: Frealign, Relion, Xmipp, EMAN2, etc.? Thanks. I'm debating which architecture I want to invest in for a local cluster, and any feedback on these questions would be much appreciated.