[3dem] Which resolution?

Mon Feb 24 01:48:06 PST 2020

Dear all,

For the newcomers to the cryo-EM field who may be more familiar with 
X-ray crystallography and who may not be familiar with this longstanding 
discussion, five observations:

1) For large numbers of Fourier components per shell, the FSC=0.143 
criterion is correct and equivalent to the FOM=0.5 criterion used in 
protein crystallography. From personal experience in RELION it has 
worked well in terms of expected behaviour: alpha-helices become tubular 
densities around 9-10A, beta-strands become separated at 4.7A, RNA base 
pairs at 3.6A, etc.
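
The equivalence in 1) can be checked numerically. This is a minimal sketch, assuming the relation C_ref = sqrt(2*FSC / (1 + FSC)) between the half-map FSC and the correlation of the full map with a perfect reference (Rosenthal & Henderson 2003), and taking C_ref = 0.5 as the counterpart of FOM = 0.5. The function names are illustrative only.

```python
import math


def cref_from_fsc(fsc):
    """Correlation of the full map with a perfect reference, from the half-map FSC."""
    return math.sqrt(2.0 * fsc / (1.0 + fsc))


def fsc_from_cref(cref):
    """Inverse relation: half-map FSC that corresponds to a given C_ref."""
    return cref ** 2 / (2.0 - cref ** 2)


if __name__ == "__main__":
    # C_ref = 0.5 maps to FSC = 0.25 / 1.75 = 1/7, i.e. about 0.143
    print(round(fsc_from_cref(0.5), 3))
    print(round(cref_from_fsc(0.143), 3))
```

Under this assumption the 0.143 value is simply 1/7, the FSC at which C_ref equals 0.5.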

2) Marin is correct that there is an argument for a variable threshold 
over a fixed one: when the number of (independent) Fourier components 
per shell drops (because the shell lies closer to the origin of the 
Fourier transform, i.e. is at lower spatial frequency; or in case of 
symmetry) one needs to raise the threshold.

3) However, the amount by which the threshold changes for typical 
cases does not, in my opinion, warrant the language used to make point 
2) every other year or so. Please see here 
https://twitter.com/SjorsScheres/status/935182696325763072/ for a plot 
of the frequency-dependent behaviour of the 1/2-bit criterion (without 
symmetry). It asymptotically approaches 0.172. It was chosen (somewhat 
arbitrarily) over the twice as large 1-bit criterion because it was 
closer to 0.143. However, for a typical case where the diameter of the 
particle D is half the box size L (see Marin's paper here: 
https://www.sciencedirect.com/science/article/pii/S1047847705001292), the 
1/2-bit criterion deviates less from 0.172 than 0.172 itself deviates 
from 0.143 for any Fourier shell that is more than 25 shells away from 
the origin. So, for typical single-particle reconstructions with box 
sizes of say 200-400 pixels used nowadays, and resolutions around say 
half-Nyquist, the frequency-dependent threshold will affect the 
resolution estimate very little, or at least much less than the 
arbitrariness in the 1/2-bit criterion itself. Perhaps someone should 
propose to multiply the 1/2-bit curve by 0.83 so that one gets a 
frequency-dependent threshold that asymptotically reaches 0.143? ;-)
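
The numbers in 3) can be illustrated with a short sketch. It assumes the threshold form T(n) = (SNR + 2*sqrt(SNR)/sqrt(n) + 1/sqrt(n)) / (SNR + 1 + 2*sqrt(SNR)/sqrt(n)) from van Heel & Schatz (2005), with SNR the per-half-map value and n the number of independent Fourier voxels in the shell. The per-shell voxel count used in the loop (a full shell of about 4*pi*r^2 voxels, no symmetry, no correction for the particle-to-box ratio D/L) is a crude assumption, so the printed values are illustrative and are not expected to reproduce the linked plot exactly. All names are hypothetical.

```python
import math

# SNR per half-map at which the full map carries 1/2 bit (or 1 bit) per voxel:
# log2(1 + SNR_full) = bits, and SNR_half = SNR_full / 2.
HALF_BIT_SNR = (2.0 ** 0.5 - 1.0) / 2.0   # about 0.2071
ONE_BIT_SNR = (2.0 ** 1.0 - 1.0) / 2.0    # 0.5


def bit_threshold(n_eff, snr_half):
    """Threshold for a shell with n_eff independent Fourier voxels."""
    root_n = math.sqrt(n_eff)
    cross = 2.0 * math.sqrt(snr_half) / root_n
    return (snr_half + cross + 1.0 / root_n) / (snr_half + 1.0 + cross)


def asymptote(snr_half):
    """Limit of the threshold as n_eff goes to infinity."""
    return snr_half / (snr_half + 1.0)


HALF_BIT_LIMIT = asymptote(HALF_BIT_SNR)   # about 0.172
ONE_BIT_LIMIT = asymptote(ONE_BIT_SNR)     # 1/3, roughly twice the 1/2-bit limit
SCALE_TO_0143 = 0.143 / HALF_BIT_LIMIT     # about 0.83

if __name__ == "__main__":
    print(f"1/2-bit limit {HALF_BIT_LIMIT:.3f}, 1-bit limit {ONE_BIT_LIMIT:.3f}, "
          f"scale {SCALE_TO_0143:.2f}")
    for radius in (5, 10, 25, 50, 100):
        # Crude assumption: full shell, no symmetry, no D/L correction.
        n_eff = 4.0 * math.pi * radius ** 2
        t = bit_threshold(n_eff, HALF_BIT_SNR)
        print(f"shell {radius:3d}: 1/2-bit = {t:.3f}, scaled = {SCALE_TO_0143 * t:.3f}")
```

The threshold falls monotonically towards its limit as the shell index grows, and the factor 0.143/0.172 is where the 0.83 in the text comes from.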

4) There are cases where the frequency-dependent threshold does become 
more relevant, e.g. for very low resolutions or when using much smaller 
box sizes. The latter occurs for example in sub-tomogram averaging or in 
(I suspect) methods some people use for local-resolution calculation.

5) The estimated overall resolution based on whatever FSC-criterion you 
favour is just a number. Besides different criteria, there are also 
subtle differences in the way different programs calculate the half-maps 
and correct for masking effects on the FSC curves. Therefore, don't 
obsess over how this number changes in its decimal: what really matters 
is the quality (interpretability) of your map.

Hope that helps,
Sjors

PS: I am not on CCP4BB, but I would be OK with someone who is on that 
list forwarding this message there.

Marin van Heel wrote:
>
> Hi Carlos Oscar and Jose-Maria,
>
> I choose to answer you guys first, because it will take little of my 
> time to counter your criticism and because I have long since been less 
> than amused by your published, ill-conceived criticism:
>
> “Marin, I always suffer with your reference to sloppy statistics. If 
> we take your paper of 2005 where the 1/2 bit criterion was proposed, 
> Eqs. 4 to 15 have completely ignored the fact that you are dealing 
> with Fourier components, that are complex numbers, and consequently 
> you have to deal with random variables that have TWO components, which 
> moreover the real and imaginary part are not independent and, in their 
> turn, they are not independent of the nearby Fourier coefficients so 
> that for computing radial averages you would need to account for the 
> correlation among coefficients”
>
> I had seen this argumentation against our (2005) paper in your 
> manuscript/paper years back. I was so stunned by the level of 
> misunderstanding expressed in your manuscript that I chose not to 
> spend any time reacting to those statements. Now that you choose to so 
> openly display your thoughts on the matter, I have no other choice 
> than to spell out your errors in public.
>
> All complex arrays in our 2005 paper are Hermitian (since they are the 
> FTs of real data), and so are all their inner products. In all the 
> integrals over rings one always averages a complex Fourier-space voxel 
> with its Hermitian conjugate yielding ONE real value (times two)! 
> Without that Hermitian property, FRCs and FSCs, which are real 
> normalised correlation functions, would not even have been possible. I 
> was - and still am - stunned by this level of misunderstanding!
>
> This is a blatant blunder that you are propagating over years, a 
> blunder that does not do any good to your reputation, yet also a 
> blunder that has probably damaged our research income. The fact 
> that you can divulgate such rubbish and leave it out there for years 
> for referees to read (who are possibly not as well educated in physics 
> and mathematics) will do – and may already have done – damage to our 
> research. An apology is appropriate but an apology is not enough.
>
> Maybe you should ask your granting agencies how to transfer 25% of 
> your grant income to our research, in compensation of damages created 
> by your blunder!
>
> Success with your request!
>
> Marin
>
> PS. You have also missed that our 2005 paper explicitly includes the 
> influence of the size of the object within the sampling box (your: 
> “they are not independent of the nearby Fourier coefficients”). I 
> remain flabbergasted.
>
>
> On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano
> <coss at cnb.csic.es> wrote:
>
>     Dear all,
>
>     I always try to refrain from getting into these
>     discussions, but I can no longer resist the temptation. Here are
>     some more ideas that I hope bring more light than confusion:
>
>     - There must be some functional relationship between the FSC and
>     the SNR, but the exact analytical form of this relationship is
>     unknown (I suspect that it must be at least monotonic, the worse
>     the SNR, the worse FSC; but even this is difficult to prove). The
>     relationship we normally use FSC=SNR/(1+SNR) was derived in a
>     context that does not apply to CryoEM (1D stationary signals in
>     real space; our molecules are not stationary), and consequently
>     any reasoning of any threshold based on this relationship is
>     incorrect (see our review).
>
>     - Still, as long as we all use the same threshold, the reported
>     resolutions are comparable to each other. In that regard, I am
>     happy that we have set 0.143 (although any other number would have
>     served the purpose) as the standard.
>
>     - I totally agree with Steve that the full FSC is much more
>     informative than its crossing with the threshold, especially
>     because we should be much more worried about its behavior when it
>     has high values than when it has low values. Before crossing the
>     threshold it should be as high as possible, and that is the "true
>     measure" of goodness of the map. When it crosses the threshold of
>     0.143, it has too low SNR, and by definition, that is a very
>     unstable part of the FSC, resulting in relatively unstable reports
>     of resolution. We made some tests about the variability of the FSC
>     (refining random splits of the dataset), trying to put the error
>     bars that Steve was asking for, and it turned out to be pretty
>     reproducible (rather low variance except in the region when it
>     crosses the threshold) as long as the dataset was large enough
>     (which is the current state).
>
>     - @Marin, I always suffer with your reference to sloppy
>     statistics. If we take your paper of 2005 where the 1/2 bit
>     criterion was proposed
>     (https://www.sciencedirect.com/science/article/pii/S1047847705001292),
>     Eqs. 4 to 15 have completely ignored the fact that you are dealing
>     with Fourier components, that are complex numbers, and
>     consequently you have to deal with random variables that have two
>     components, which moreover the real and imaginary part are not
>     independent and, in their turn, they are not independent of the
>     nearby Fourier coefficients so that for computing radial averages
>     you would need to account for the correlation among coefficients
>     (https://www.aimspress.com/fileOther/PDF/biophysics/20150102.pdf).
>     To deal properly with the statistics, one needs at least to carry
>     out a two-dimensional reasoning, including the complex conjugate
>     multiplication, which is all missing in your derivation, rather
>     than treating everything as one-dimensional, real-valued random
>     variables. Additionally, embedded in your whole reasoning is the
>     idea that the expected value of a ratio is the ratio of the
>     expected values, that is a 0-th order Taylor approximation of the
>     mean of the distribution of a ratio between two random variables.
>     Finally, I always find it extremely difficult to understand the 1
>     bit or 1/2 bit criteria, that is, what is the relationship between
>     the channel's capacity formula of Shannon
>     (https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem)
>     and our FSC (we do not have any channel through which we are
>     "transmitting" our volume, although it is true we have a model
>     y=x+n that is the same as in signal transmission, it is not true
>     that the average information of a signal is log2(1+SNR); for me,
>     the only relationship is that the SNR appears in both formulas,
>     FSC and channel capacity, but that does not automatically make
>     them comparable and interchangeable). This is not a criticism of
>     your work. I think the FSC is a very useful tool to measure some
>     properties of the reconstruction process and the quality of the
>     dataset (not everything is measured by the FSC) and it also has
>     its drawbacks (for instance, systematic errors are rewarded by the
>     FSC as they are reproducible in both halves). Moreover, I think
>     you are an extremely intelligent person, who I consider a good
>     friend, with a very good intuition about image processing and who
>     has brought very interesting ideas and methodologies into the
>     field. Only that we cannot become crazy about the FSC threshold
>     and the reported resolution, as the most interesting part of the
>     FSC is not when it is low, but when it is high.
>
>     I hope I can keep restraining myself in the future :-)
>
>     Cheers, Carlos Oscar
>
>     On 2/21/20 6:19 PM, Ludtke, Steven J. wrote:
>
>>     I've been steadfastly refusing to get myself dragged in this
>>     time, but with this very sensible statement (which I am largely
>>     in agreement with), I thought I'd throw in one thought, just to
>>     stir the pot a little more.
>>
>>     This is not a new idea, but I think it is the most sensible
>>     strategy I've heard proposed, and addresses Marin's concerns in a
>>     more conventional way. What we are talking about here is the
>>     statistical noise present in the FSC curves themselves. Viewed
>>     from the framework of traditional error analysis and propagation
>>     of uncertainties, which pretty much every scientist should be
>>     familiar with since high school (and which thus would not be
>>     confusing to non-statisticians), the 'correct' solution to this
>>     issue is not to adjust the threshold, but to present FSC curves
>>     with error bars.
>>
>>     One can then use a fixed threshold at a level based on
>>     expectation values, and simply produce a resolution value which
>>     also has an associated uncertainty. This is much better than
>>     using a variable threshold and still producing a single number
>>     with no uncertainty estimate! Not only does this approach account
>>     for the statistical noise in the FSC curve, but it also should
>>     stop people from reporting resolutions as 2.3397 Å, as it would
>>     be silly to say 2.3397 +- 0.2.
>>
>>     The cross terms are not ignored, but are used in the production
>>     of the error bars. This is a very simple approach, which is
>>     certainly closer to being correct than the fixed threshold
>>     without error-bars approach, and it solves many of the issues we
>>     have with how people report resolution.  Of course we still
>>     have people who will insist that 3.2+-0.2 is better than
>>     3.3+-0.2, but there isn't much you can do about them... (other
>>     than beat them over the head with a statistics textbook).
>>
>>     The caveat, of course, is that, like all propagation of
>>     uncertainty, it is a linear approximation, and the
>>     correlation axis isn't linear, so the typical Normal
>>     distributions with linear propagation used to justify propagation
>>     of uncertainty aren't _strictly_ true. However, the approximation
>>     is fine as long as the error bars are reasonably small compared
>>     to the -1 to 1 range of the correlation axis. Each individual
>>     error bar is computed around its expectation value, so the
>>     overall nonlinearity of the correlation isn't a concern.
>>
>>
>>
>>     --------------------------------------------------------------------------------------
>>     Steven Ludtke, Ph.D. <sludtke at bcm.edu>
>>     Baylor College of Medicine
>>     Charles C. Bell Jr., Professor of Structural Biology
>>     Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
>>     Academic Director, CryoEM Core (cryoem.bcm.edu)
>>     Co-Director CIBR Center (www.bcm.edu/research/cibr)
>>
>>
>>
>>>     On Feb 21, 2020, at 10:34 AM, Alexis Rohou
>>>     <a.rohou at gmail.com> wrote:
>>>
>>>     Hi all,
>>>
>>>     For those bewildered by Marin's insistence that everyone's been
>>>     messing up their stats since the bronze age, I'd like to offer
>>>     my understanding of the situation. More details are in this
>>>     thread from a few years ago on the exact same topic:
>>>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
>>>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>>>
>>>     Notwithstanding notational problems (e.g. strict equations as
>>>     opposed to approximation symbols, or omission of symbols to
>>>     denote estimation), I believe Frank & Al-Ali and "descendent"
>>>     papers (e.g. appendix of Rosenthal & Henderson 2003) are fine.
>>>     The cross terms that Marin is agitated about do indeed
>>>     have an expectation value of 0.0 (in the ensemble; if the
>>>     experiment were performed an infinite number of times with
>>>     different realizations of noise). I don't believe Pawel or Jose
>>>     Maria or any of the other authors really believe that the
>>>     cross-terms are orthogonal.
>>>
>>>     When N (the number of independent Fourier voxels in a shell) is
>>>     large enough, mean(Signal x Noise) ~ 0.0 is only an
>>>     approximation, but a pretty good one, even for a single FSC
>>>     experiment. This is why, in my book, derivations that depend on
>>>     Frank & Al-Ali are OK, under the strict assumption that N is
>>>     large. Numerically, this becomes apparent when Marin's half-bit
>>>     criterion is plotted - asymptotically it has the same behavior
>>>     as a constant threshold.
>>>
>>>     So, is Marin wrong to worry about this? No, I don't think so.
>>>     There are indeed cases where the assumption of large N is
>>>     broken. And under those circumstances, any fixed threshold
>>>     (0.143, 0.5, whatever) is dangerous. This is illustrated in
>>>     figures of van Heel & Schatz (2005). Small boxes, high-symmetry,
>>>     small objects in large boxes, and a number of other conditions
>>>     can make fixed thresholds dangerous.
>>>
>>>     It would indeed be better to use a non-fixed threshold. So why
>>>     am I not using the 1/2-bit criterion in my own work? While
>>>     numerically it behaves well at most resolution ranges, I was not
>>>     convinced by Marin's derivation in 2005. Philosophically though,
>>>     I think he's right - we should aim for FSC thresholds that are
>>>     more robust to the kinds of edge cases mentioned above. It would
>>>     be the right thing to do.
>>>
>>>     Hope this helps,
>>>     Alexis
>>>
>>>
>>>
>>>     On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A
>>>     <Pawel.A.Penczek at uth.tmc.edu> wrote:
>>>
>>>         Marin,
>>>
>>>         The statistics in the 2010 review is fine. You may disagree with
>>>         assumptions, but I can assure you the “statistics” (as you
>>>         call it) is fine. Careful reading of the paper would reveal
>>>         to you this much.
>>>
>>>         Regards,
>>>         Pawel
>>>
>>>>         On Feb 16, 2020, at 10:38 AM, Marin van Heel
>>>>         <marin.vanheel at googlemail.com> wrote:
>>>>
>>>>         Dear Pawel and All others ....
>>>>
>>>>         This 2010 review is - unfortunately - largely based on the
>>>>         flawed statistics I mentioned before, namely on the a
>>>>         priori assumption that the inner product of a signal vector
>>>>         and a noise vector is ZERO (an orthogonality assumption). 
>>>>         The (Frank & Al-Ali 1975) paper we have refuted on a number
>>>>         of occasions (for example in 2005, and most recently in our
>>>>         BioRxiv paper) but you still take that as the correct
>>>>         relation between SNR and FRC (and you never cite the
>>>>         criticism...).
>>>>
>>>>         Sorry
>>>>         Marin
>>>>
>>>>         On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A
>>>>         <Pawel.A.Penczek at uth.tmc.edu> wrote:
>>>>
>>>>             Dear Teige,
>>>>
>>>>             I am wondering whether you are familiar with
>>>>
>>>>                 Penczek PA. Resolution measures in molecular electron
>>>>                 microscopy. Methods Enzymol. 2010;482:73-100.
>>>>                 doi: 10.1016/S0076-6879(10)82003-8.
>>>>
>>>>             You will find there answers to all questions you asked
>>>>             and much more.
>>>>
>>>>             Regards,
>>>>             Pawel Penczek
>>>>             _______________________________________________
>>>>             3dem mailing list
>>>>             3dem at ncmir.ucsd.edu <mailto:3dem at ncmir.ucsd.edu>
>>>>             https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>>>>
>>>
>>
>>
>
>
>

-- 
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres