[3dem] relion3.0 3d multi-body problem

Craig Yoshioka yoshiokc at ohsu.edu
Wed Aug 15 10:15:31 PDT 2018


I’d run the last round(s) on CPU, not GPU. The 385G of RAM won’t help when you are limited to the 11GB that each 1080Ti has for holding working data.
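
A rough back-of-the-envelope sketch of why the final, full-resolution iteration is the one that runs out of GPU memory. The box size is not given in this thread, so the values below are hypothetical. The padding factor of 2, single-precision storage, and the half-volume layout are also assumptions, not a description of RELION's actual allocator. Real usage will be higher, since the card also holds particle data and, for multi-body runs, one reference per body.

```python
def padded_reference_bytes(box_size, padding=2, bytes_per_value=4):
    """Rough size in bytes of one padded Fourier-space reference volume."""
    n = box_size * padding
    # Assumption: Hermitian symmetry lets roughly half of one axis be stored,
    # and every voxel carries a real and an imaginary value.
    voxels = n * n * (n // 2 + 1)
    return voxels * 2 * bytes_per_value


GPU_BYTES = 11 * 1024**3  # 11GB per 1080Ti, as quoted above

for box in (200, 300, 400, 500):  # hypothetical box sizes in pixels
    size = padded_reference_bytes(box)
    print(f"box {box}: {size / 1024**3:.2f} GiB per reference, "
          f"{size / GPU_BYTES:.0%} of one card")
```

The estimate grows with the cube of the box size, so a reference that fits comfortably at low resolution can stop fitting once the last iteration goes to Nyquist, regardless of how much host RAM the node has.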


On Aug 15, 2018, at 10:09 AM, Xu, Tinghai (Peter) <Tinghai.Xu at vai.org> wrote:

Dear all,
When I run relion3.0’s 3D multi-body refinement on an HPC node (4x 1080Ti GPUs with 385G of memory), it always fails with an out-of-memory error. I used almost all the default settings, with GPU acceleration enabled, Number of MPI procs: 5 and Number of threads: 2.
I tried both with and without "Pre-read all particles into RAM".

Here is the error message:
“ERROR: out of memory in /primary/vari/software/relion/relion-3.0_beta/src/acc/acc_projector_impl.h at line 62 (error-code 2)
in: /primary/vari/software/relion/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 67
=== Backtrace  ===
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x4417c1]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi() [0x642239]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(_ZN12AccProjector9setMdlDimEiiiiiii+0x19d) [0x642a2d]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(_ZN14MlDeviceBundle22setupFixedSizedObjectsEv+0x30b) [0x60813b]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x1585) [0x4475e5]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xaa) [0x454d9a]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi(main+0xb15) [0x433175]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaab9776b35]
/primary/vari/software/relion/relion-3.0_beta/build/bin/relion_refine_mpi() [0x4364af]
==================
ERROR:

A GPU-function failed to execute.

If this occured at the start of a run, you might have GPUs which
are incompatible with either the data or your installation of relion.
If you

                -> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
                   and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
                   this may happen.

                -> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
                   at least compute 3.5. You may be trying to use a GPU older than
                   this. If you have multiple generations, try specifying --gpu <X>
                   with X=0. Then try X=1 in a new run, and so on. The numbering of
                   GPUs may not be obvious from the driver or intuition. For a list
                   of GPU compute generations, see

                   en.wikipedia.org/wiki/CUDA#Version_features_and_specifications

                -> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
                   as to not require this, and may thus have unforeseen requirements
                   when run in this mode. If you think it is nonetheless necessary,
                   please consult the developers with this error.

If this occurred at the middle or end of a run, it might be that

                -> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is
                   subject to many restrictions, and relion is written to work within
                   common restraints. If you have exotic data or settings, unexpected
                   configurations may occur. See also above point regarding
                   double precision.
If none of the above applies, please report the error to the relion
developers at    github.com/3dem/relion/issues
”
The run.out output for the last iteration is:
“Auto-refine: Iteration= 13
Auto-refine: Resolution= 22.3816 (no gain for 13 iter)
 Auto-refine: Changes in angles= 1.37635 degrees; and in offsets= 0.424773 pixels (no gain for 1 iter)
 Auto-refine: Refinement has converged, entering last iteration where two halves will be combined...
Auto-refine: The last iteration will use data to Nyquist frequency, which may take more CPU and RAM.
Estimating accuracies in the orientational assignment ...
1.33/1.33 min ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 2.338 degrees; offsets= 0.956 pixels
Auto-refine: Angular step= 0.9375 degrees; local searches= true
Auto-refine: Offset search range= 2.151 pixels; offset step= 0.717 pixels
CurrentResolution= 22.3816 Angstroms, which requires orientationSampling of at least 10 degrees for a particle of diameter 250 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 1260
OrientationalSampling= 1.875 NrOrientations= 140
TranslationalSampling= 1.434 NrTranslations= 9
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 40320
OrientationalSampling= 0.9375 NrOrientations= 1120
TranslationalSampling= 0.717 NrTranslations= 36
=============================
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.
”
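
The sampling counts in the log above can be checked with a short sketch. It assumes that each oversampling order halves the step on three rotational and two translational axes (so 8x the orientations and 4x the translations), which matches the numbers printed in run.out. The function name is made up for illustration.

```python
def oversampled(n_orientations, n_translations, oversampling):
    """Return (orientations, translations, hidden-variable sampling points)."""
    # Assumption: halving the step multiplies orientations by 2**3
    # and translations by 2**2 for every oversampling order.
    rot = n_orientations * 8 ** oversampling
    trans = n_translations * 4 ** oversampling
    return rot, trans, rot * trans


print(oversampled(140, 9, 0))  # coarse pass from the log
print(oversampled(140, 9, 1))  # oversampled pass from the log
```

The oversampled pass evaluates 32 times as many sampling points per particle as the coarse pass, which adds to the working data held on the GPU in the last iteration.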
When I continue from iteration 12, the same error still appears.

Has anyone come across this situation?  Thank you so much.

Best Regards,
Tinghai (Peter) Xu

333 Bostwick Ave., N.E., Grand Rapids, Michigan 49503
Phone: 616-234-5787 | Email: Tinghai.Xu at vai.org
_______________________________________________
3dem mailing list
3dem at ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
