[3dem] GPU-ERROR - Relion-3.0-beta
Dieter Blaas
dieter.blaas at meduniwien.ac.at
Mon Feb 4 21:20:15 PST 2019
Dear all,
I did a test install of the latest version of Relion-3.0-beta on a
fairly powerful workstation (CentOS Linux release 7.5.1804), and
everything runs fine on all 64 CPUs. However, when running e.g. Class2D
on 2 or 4 GPUs (set to 5 MPIs/4 Threads and 3 MPIs/2 Threads,
respectively), I receive the error below immediately upon starting the
run. The 4 GPUs are 1080 Ti (11 GB). To the best of my knowledge, none of
the possibilities listed in the error message applies. I am not very
familiar with compilation and cannot exclude that I did something wrong.
However, there was no error during compilation with cmake3 followed by
make, and I previously installed this version of relion on two other
machines without any issue!
Thanks for any hints, best wishes, Dieter
-------------------------------------------
ERROR: unknown error in
/home/blaas/software/RELIONCUDA/relion-3.0_beta/src/ml_optimiser_mpi.cpp
at line 128 (error-code 30)
in:
/home/blaas/software/RELIONCUDA/relion-3.0_beta/src/acc/cuda/cuda_settings.h,
line 67
=== Backtrace ===
/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x448f81]
/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi() [0x45e513]
/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x2284) [0x468a54]
/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(main+0xb79) [0x4367b9]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fc01f87a3d5]
/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi() [0x439b0f]
==================
ERROR:
A GPU-function failed to execute.
If this occured at the start of a run, you might have GPUs which
are incompatible with either the data or your installation of relion.
If you
-> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
this may happen.
-> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
at least compute 3.5. You may be trying to use a GPU older than
this. If you have multiple generations, try specifying --gpu <X>
with X=0. Then try X=1 in a new run, and so on. The numbering of
GPUs may not be obvious from the driver or intuition. For a list
of GPU compute generations, see
en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
-> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
as to not require this, and may thus have unforeseen requirements
when run in this mode. If you think it is nonetheless necessary,
please consult the developers with this error.
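
The first two possibilities in the message above come down to comparing the architecture given at build time with the compute capability of each GPU. A minimal Python sketch of that comparison follows. It is only a simplified reading of the rules as the message states them: `parse_arch` and `check_gpu` are hypothetical helper names, not part of relion, and the sketch says nothing about driver or installation problems.

```python
# Hypothetical helper, not part of relion: a simplified reading of the
# compatibility rules quoted in the error message above.

MIN_COMPUTE = (3, 5)  # the message says relion needs at least compute 3.5


def parse_arch(arch):
    """Turn a value such as "50" or "3.5" into a (major, minor) tuple."""
    digits = arch.replace(".", "")
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("unrecognised architecture value: %r" % arch)
    return int(digits[:-1]), int(digits[-1])


def check_gpu(build_arch, gpu_compute):
    """Return a list of problems; an empty list means no mismatch was found."""
    built = parse_arch(build_arch)
    device = parse_arch(gpu_compute)
    problems = []
    if device < MIN_COMPUTE:
        problems.append("GPU is older than compute 3.5")
    if device < built:
        problems.append("binary was built for a newer architecture than the GPU")
    return problems
```

Under these assumptions, `check_gpu("50", "3.5")` reports the mismatch described in the message, while a 1080 Ti (compute 6.1) checked against a hypothetical build with `-DCUDA_ARCH=35` gives an empty list, which would point away from an architecture mismatch.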