[3dem] GPU-ERROR - Relion-3.0-beta
Dieter Blaas
dieter.blaas at meduniwien.ac.at
Tue Feb 5 19:18:22 PST 2019
Hi Takanori and Dario,
> Thanks a lot for your hints!
As I pointed out, I unfortunately cannot change either the CUDA version
or the NVIDIA driver, which is currently 410.78 (the newest one is
410.93). Setting "-DCUDA_ARCH=61" did not remove the error.
Could I have made a mistake (maybe something is missing from the path?),
even though I do not get any error during compilation?
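
For reference, the value passed to -DCUDA_ARCH is the GPU's compute capability with the dot removed. The following is a minimal Python sketch of that mapping, not anything taken from RELION itself. The helper names are hypothetical, the 6.1 figure is NVIDIA's published compute capability for the GTX 1080 Ti, and the 3.5 minimum is the one stated in the RELION error message quoted further down.

```python
def cuda_arch_flag(compute_capability: str) -> str:
    """Turn a compute capability such as '6.1' into a CUDA_ARCH value such as '61'."""
    major, minor = compute_capability.strip().split(".")
    return f"{int(major)}{int(minor)}"


def meets_minimum(compute_capability: str, minimum: str = "3.5") -> bool:
    """Compare two capabilities numerically as (major, minor) tuples."""
    def as_tuple(text: str) -> tuple:
        return tuple(int(part) for part in text.strip().split("."))
    return as_tuple(compute_capability) >= as_tuple(minimum)


if __name__ == "__main__":
    cc = "6.1"  # GTX 1080 Ti (assumption: all four cards are this model)
    print(f"-DCUDA_ARCH={cuda_arch_flag(cc)}", meets_minimum(cc))
```

Under these assumptions, 61 is the expected value for a 1080 Ti, so the architecture flag on its own would not be the obvious suspect.
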
> "echo $PATH" gives me:
> /usr/lib64/openmpi3/bin:/home/blaas/bin/:/home/blaas/software/RELIONCUDA/relion-3.0_beta/build/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin/
> and:
> "echo $LD_LIBRARY_PATH" gives me:
> /home/blaas/software/usr/include:/home/blaas/software/usr/lib64:/usr/lib64:/home/blaas/software/RELIONCUDA/relion-3.0_beta/build/lib:/usr/lib64/openmpi/lib:/usr/lib64:/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/lib:/usr/lib64/openmpi/lib
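
Long search paths like the two above are easier to check mechanically than by eye. The sketch below is a hypothetical helper (the function name and the keys of the returned dictionary are made up for illustration) that splits a colon-separated path and reports duplicate entries, entries that end in /include, entries that point into a RELION tree, and whether any entry has a path component starting with "cuda".

```python
import os
from collections import Counter


def audit_search_path(value: str) -> dict:
    """Split a colon-separated search path and flag entries worth a second look."""
    # Drop empty entries and normalise trailing slashes so duplicates compare equal.
    entries = [e.rstrip("/") or "/" for e in value.split(":") if e]
    counts = Counter(entries)
    return {
        "entries": entries,
        "duplicates": sorted(e for e, n in counts.items() if n > 1),
        # An include directory holds headers, not shared libraries.
        "include_dirs": [e for e in entries if e.endswith("/include")],
        "relion_dirs": [e for e in entries if "relion" in e.lower()],
        # Match a path component that starts with "cuda" (e.g. /usr/local/cuda),
        # so a directory merely named RELIONCUDA does not count.
        "mentions_cuda": any(
            part.lower().startswith("cuda") for e in entries for part in e.split("/")
        ),
    }


if __name__ == "__main__":
    for name in ("PATH", "LD_LIBRARY_PATH"):
        print(name, audit_search_path(os.environ.get(name, "")))
```

Applied to the values posted here, this would report /usr/lib64 and /usr/lib64/openmpi/lib as duplicated in LD_LIBRARY_PATH, one include directory, both a build/lib and a ttt/lib RELION directory, and no CUDA directory. PATH lists only build/bin, while the backtrace in the original report shows the binary under ttt/bin. None of this is necessarily the cause of the error (the CUDA runtime may be linked statically or found through the system linker configuration), but it may be worth ruling out.
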
>
>
> "nvcc --version" gives me: Cuda compilation tools, release 10.0,
> V10.0.130
and "cat /usr/local/cuda/version.txt" gives me: CUDA Version 10.0.130
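
Whether driver 410.78 is new enough for CUDA 10.0.130 can be checked by comparing version tuples. The sketch below is a minimal, hypothetical helper. The minimum Linux driver of 410.48 for CUDA 10.0.130 is an assumption recalled from NVIDIA's CUDA release notes and should be verified there before relying on it.

```python
def parse_version(text: str) -> tuple:
    """'410.78' -> (410, 78); '10.0.130' -> (10, 0, 130)."""
    return tuple(int(part) for part in text.strip().split("."))


# ASSUMPTION: minimum Linux driver per CUDA toolkit release, as recalled from
# NVIDIA's release notes. Verify against the official table.
MIN_LINUX_DRIVER = {"10.0.130": "410.48"}


def driver_is_new_enough(driver: str, toolkit: str) -> bool:
    """True if the installed driver meets the toolkit's assumed minimum."""
    return parse_version(driver) >= parse_version(MIN_LINUX_DRIVER[toolkit])


if __name__ == "__main__":
    print(driver_is_new_enough("410.78", "10.0.130"))
```

If that minimum is right, the installed driver passes the check, which would suggest the driver version is not too old for CUDA 10.0, though it does not rule out other driver or installation problems.
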
>
> Do you believe that updating the Nvidia-driver and/or downgrading the
> CUDA version might fix the problem? Does anybody use CUDA-10 yet?
> bw Dieter
>
>
> ------------------------------------------------------------------------
> Dieter Blaas,
> Max F. Perutz Laboratories
> Medical University of Vienna,
> Inst. Med. Biochem., Vienna Biocenter (VBC),
> Dr. Bohr Gasse 9/3,
> A-1030 Vienna, Austria,
> Tel: 0043 1 4277 61630,
> Fax: 0043 1 4277 9616,
> e-mail: dieter.blaas at meduniwien.ac.at
> ------------------------------------------------------------------------
>
> On 05.02.2019 at 08:45, Takanori Nakane wrote:
>> Hi,
>>
>> Which version of nvcc did you use for compilation?
>> Do you have the right version of CUDA runtime in LD_LIBRARY_PATH?
>>
>> Best regards,
>>
>> Takanori Nakane
>>
>> On 2019/02/05 5:20, Dieter Blaas wrote:
>>> Dear all,
>>>
>>> I did a test install of the latest version of Relion-3.0-beta
>>> on a relatively potent workstation (CentOS Linux release 7.5.1804)
>>> and everything runs fine on all 64 CPUs. However, when running e.g.
>>> Class2D using 2 or 4 GPUs (set to 5 MPIs/4 Threads and 3 MPIs/2
>>> Threads, respectively) I receive the error below immediately upon
>>> starting the run. The 4 GPUs are 1080 Ti (11 GB). To the best of my
>>> knowledge none of the possibilities below applies. I am not very
>>> familiar with compilation and cannot exclude that I did something
>>> wrong. However, there was no error during the compilation with
>>> cmake3 followed by make and I previously installed this version of
>>> relion on two other machines without any issue!
>>>
>>> Thanks for hints, bw Dieter
>>>
>>> -------------------------------------------
>>>
>>> ERROR: unknown error in
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/src/ml_optimiser_mpi.cpp
>>> at line 128 (error-code 30)
>>> in:
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/src/acc/cuda/cuda_settings.h,
>>> line 67
>>> === Backtrace ===
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41)
>>> [0x448f81]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi()
>>> [0x45e513]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x2284)
>>> [0x468a54]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(main+0xb79)
>>> [0x4367b9]
>>> /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fc01f87a3d5]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi()
>>> [0x439b0f]
>>> ==================
>>> ERROR:
>>>
>>> A GPU-function failed to execute.
>>>
>>> If this occured at the start of a run, you might have GPUs which
>>> are incompatible with either the data or your installation of relion.
>>> If you
>>>
>>> -> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
>>> and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
>>> this may happen.
>>>
>>> -> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
>>> at least compute 3.5. You may be trying to use a GPU older than
>>> this. If you have multiple generations, try specifying --gpu <X>
>>> with X=0. Then try X=1 in a new run, and so on. The numbering of
>>> GPUs may not be obvious from the driver or intuition. For a list
>>> of GPU compute generations, see
>>>
>>> en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
>>>
>>> -> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
>>> as to not require this, and may thus have unforeseen requirements
>>> when run in this mode. If you think it is nonetheless necessary,
>>> please consult the developers with this error.
>>>
>>>
>>> _______________________________________________
>>> 3dem mailing list
>>> 3dem at ncmir.ucsd.edu
>>> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem