[3dem] GPU-ERROR - Relion-3.0-beta

Dieter Blaas dieter.blaas at meduniwien.ac.at
Tue Feb 5 19:18:22 PST 2019


Hi Takanori and Dario,
Thanks a lot for your hints!

As I pointed out, I unfortunately cannot change either the CUDA version
or the NVIDIA driver, which is currently 410.78 (the newest one is
410.93). Compiling with "-DCUDA_ARCH=61" did not remove the error.
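
For reference, the architecture check described in the RELION error text
can be sketched roughly as below. This is only an illustration, not
RELION's own code: the helpers parse_arch and arch_ok are hypothetical
names, the rule (the device must be at least as new as the build target
and at least compute 3.5) is taken from the error message, and the value
6.1 for the GTX 1080 Ti is its published compute capability.

```python
# Simplified sketch; parse_arch and arch_ok are hypothetical helpers,
# not part of RELION or CUDA.

MIN_RELION_CC = (3, 5)  # "at least compute 3.5", per the RELION error text


def parse_arch(value):
    """Turn a -DCUDA_ARCH value such as "61" (or "6.1") into (major, minor)."""
    digits = value.replace(".", "")
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("unexpected CUDA_ARCH value: %r" % value)
    return int(digits[:-1]), int(digits[-1])


def arch_ok(build_arch, device_cc):
    """True if a build for build_arch should be able to run on device_cc.

    Assumption: the device must be at least as new as the build target
    and meet RELION's stated minimum; the real binary/PTX rules are subtler.
    """
    return device_cc >= parse_arch(build_arch) and device_cc >= MIN_RELION_CC


if __name__ == "__main__":
    gtx_1080_ti = (6, 1)  # compute capability of the GTX 1080 Ti
    print(arch_ok("61", gtx_1080_ti))  # build target matches the card
    print(arch_ok("50", (3, 5)))       # the mismatch named in the error text
```

Under these assumptions a build with -DCUDA_ARCH=61 passes the check for
a 1080 Ti, which would be consistent with the flag not being the cause
of the error.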

Could I have made a mistake somewhere (perhaps something is missing from
the path?), even though compilation finishes without any error?

> "echo $PATH" gives me:
> /usr/lib64/openmpi3/bin:/home/blaas/bin/:/home/blaas/software/RELIONCUDA/relion-3.0_beta/build/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin/ 
> and:
> "echo $LD_LIBRARY_PATH" gives me:
> /home/blaas/software/usr/include:/home/blaas/software/usr/lib64:/usr/lib64:/home/blaas/software/RELIONCUDA/relion-3.0_beta/build/lib:/usr/lib64/openmpi/lib:/usr/lib64:/home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/lib:/usr/lib64/openmpi/lib 
>
>
> "nvcc --version" gives me: Cuda compilation tools, release 10.0, 
> V10.0.130
and "cat /usr/local/cuda/version.txt" gives me: CUDA Version 10.0.130
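
The question of whether the right CUDA runtime is reachable through
LD_LIBRARY_PATH can also be checked mechanically. The following is a
minimal sketch, assuming the runtime library file name starts with
"libcudart.so" and the binary is called "relion_refine_mpi" (the name
that appears in the backtrace); dirs_containing is a hypothetical
helper. It only looks at the directories named in the variable, so a
library found through the system default paths would not show up here.

```python
import os


def dirs_containing(search_path, prefix):
    """Return the directories of a search path (e.g. the value of PATH)
    that hold at least one file whose name starts with prefix.
    Directories are reported once, in search order."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory or directory in hits:
            continue
        try:
            names = os.listdir(directory)
        except OSError:
            # Missing or unreadable entry: skip it.
            continue
        if any(name.startswith(prefix) for name in names):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Which directories on PATH provide the RELION MPI binary?
    print(dirs_containing(os.environ.get("PATH", ""), "relion_refine_mpi"))
    # Which directories on LD_LIBRARY_PATH provide a CUDA runtime library?
    print(dirs_containing(os.environ.get("LD_LIBRARY_PATH", ""), "libcudart.so"))
```

If the first list has more than one entry, the first directory is the
one the shell would pick. It may be worth comparing it with the path
printed in the backtrace, which refers to .../ttt/bin rather than
.../build/bin.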
>
> Do you believe that updating the Nvidia driver and/or downgrading the
> CUDA version might fix the problem? Does anybody use CUDA 10 yet?
>
> bw Dieter
>
>
> ------------------------------------------------------------------------
> Dieter Blaas,
> Max F. Perutz Laboratories
> Medical University of Vienna,
> Inst. Med. Biochem., Vienna Biocenter (VBC),
> Dr. Bohr Gasse 9/3,
> A-1030 Vienna, Austria,
> Tel: 0043 1 4277 61630,
> Fax: 0043 1 4277 9616,
> e-mail: dieter.blaas at meduniwien.ac.at
> ------------------------------------------------------------------------
>
> On 05.02.2019 at 08:45, Takanori Nakane wrote:
>> Hi,
>>
>> Which version of nvcc did you use for compilation?
>> Do you have the right version of CUDA runtime in LD_LIBRARY_PATH?
>>
>> Best regards,
>>
>> Takanori Nakane
>>
>> On 2019/02/05 5:20, Dieter Blaas wrote:
>>> Dear all,
>>>
>>>      I did a test install of the latest version of Relion-3.0-beta 
>>> on a relatively potent workstation (CentOS Linux release 7.5.1804) 
>>> and everything runs fine on all 64 CPUs. However, when running e.g. 
>>> Class2D using 2 or 4 GPUs (set to 5 MPIs/4 Threads and 3 MPIs/2 
>>> Threads, respectively) I receive the error below immediately upon 
>>> starting the run. The 4 GPUs are 1080 Ti (11 GB). To the best of my 
>>> knowledge none of the possibilities below applies. I am not very 
>>> familiar with compilation and cannot exclude that I did something 
>>> wrong. However, there was no error during the compilation with 
>>> cmake3 followed by make and I previously installed this version of 
>>> relion on two other machines without any issue!
>>>
>>> Thanks for hints, bw Dieter
>>>
>>> -------------------------------------------
>>>
>>> ERROR: unknown error in 
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/src/ml_optimiser_mpi.cpp 
>>> at line 128 (error-code 30)
>>> in: 
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/src/acc/cuda/cuda_settings.h, 
>>> line 67
>>> === Backtrace  ===
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) 
>>> [0x448f81]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi() 
>>> [0x45e513]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x2284) 
>>> [0x468a54]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi(main+0xb79) 
>>> [0x4367b9]
>>> /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fc01f87a3d5]
>>> /home/blaas/software/RELIONCUDA/relion-3.0_beta/ttt/bin/relion_refine_mpi() 
>>> [0x439b0f]
>>> ==================
>>> ERROR:
>>>
>>> A GPU-function failed to execute.
>>>
>>>   If this occured at the start of a run, you might have GPUs which
>>> are incompatible with either the data or your installation of relion.
>>> If you
>>>
>>>      -> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
>>>         and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
>>>         this may happen.
>>>
>>>      -> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
>>>         at least compute 3.5. You may be trying to use a GPU older than
>>>         this. If you have multiple generations, try specifying --gpu 
>>> <X>
>>>         with X=0. Then try X=1 in a new run, and so on. The 
>>> numbering of
>>>         GPUs may not be obvious from the driver or intuition. For a 
>>> list
>>>         of GPU compute generations, see
>>>
>>> en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
>>>
>>>      -> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
>>>         as to not require this, and may thus have unforeseen 
>>> requirements
>>>         when run in this mode. If you think it is nonetheless 
>>> necessary,
>>>         please consult the developers with this error.
>>>
>>>
>>> _______________________________________________
>>> 3dem mailing list
>>> 3dem at ncmir.ucsd.edu
>>> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem