[3dem] 3dem Digest, Vol 150, Issue 49

Sindelar, Charles charles.sindelar at yale.edu
Fri Feb 21 11:56:30 PST 2020


Hear! Hear!  To Alexis and Steve's points- I think these concisely capture the reason why most of us (myself included) are content to use the widely accepted 0.143 criterion- bearing in mind the important caveats that have been raised.

One way to summarize the difference between the "0.143 threshold" approach and the "half-bit" approach (van Heel & Schatz, 2005) is that they attempt to answer different questions:

(1) What is a minimally biased estimate of the resolution? (0.143)

vs

(2) What is the highest resolution one can confidently claim one has? (half-bit)

The second approach suffers from the problem that, as the noise in the measurement increases, there is a systematic bias toward underestimating the resolution (and who wants to do that??). Plus, I enjoy the whimsy of rallying around a seemingly arbitrary number ("0.143").

To improve the statistics, one could compute a large number of gold-standard FSC curves from resampled subsets of the data. But that hardly seems worth it when the bigger problems often come from other sources of systematic error, as many have noted. I like Steve's idea of reporting error bars on the resolution estimate.
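
For anyone who wants to see exactly what is being thresholded, here is a rough NumPy sketch (variable names and the cubic-box assumption are mine, not code from any package) of computing a gold-standard FSC from two half-maps and reading off the 0.143 crossing. The resampling idea would just amount to repeating this on bootstrapped half-sets and looking at the spread of the resulting numbers.

import numpy as np

def fsc_curve(half1, half2):
    # Fourier Shell Correlation between two half-maps (assumed cubic, same size)
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    n = half1.shape[0]
    coords = np.indices(half1.shape) - n // 2
    radii = np.sqrt((coords ** 2).sum(axis=0)).round().astype(int)
    fsc = np.zeros(n // 2)
    for r in range(n // 2):
        shell = radii == r
        num = np.real(np.sum(f1[shell] * np.conj(f2[shell])))
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc

def resolution_0143(fsc, voxel_size):
    # resolution (in Angstrom) at the first dip below 0.143, linearly interpolated
    box = 2 * len(fsc)
    for r in range(1, len(fsc)):
        if fsc[r] < 0.143 <= fsc[r - 1]:
            frac = (fsc[r - 1] - 0.143) / (fsc[r - 1] - fsc[r])
            return box * voxel_size / (r - 1 + frac)
    return 2.0 * voxel_size  # never crossed: report Nyquist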

- Chuck
 

    Today's Topics:
    
       1. Re: Which resolution? (Alexis Rohou)
       2. Re: Which resolution? (Ludtke, Steven J.)
    
    
    ----------------------------------------------------------------------
    
    Message: 1
    Date: Fri, 21 Feb 2020 08:34:45 -0800
    From: Alexis Rohou <a.rohou at gmail.com>
    To: "Penczek, Pawel A" <Pawel.A.Penczek at uth.tmc.edu>
    Cc: "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>, Marin van Heel
    	<marin.vanheel at googlemail.com>, "ccpem at jiscmail.ac.uk"
    	<ccpem at jiscmail.ac.uk>, "CCP4BB at JISCMAIL.AC.UK"
    	<CCP4BB at jiscmail.ac.uk>
    Subject: Re: [3dem] Which resolution?
    Message-ID:
    	<CAM5goXS5xK2OoBUSFDQzv7HgnkiGw6nZSU0+A5BJoKVcT20o_A at mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"
    
    Hi all,
    
    For those bewildered by Marin's insistence that everyone's been messing up
    their stats since the bronze age, I'd like to offer my understanding
    of the situation. More details in this thread from a few years ago on the
    exact same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
    
    Notwithstanding notational problems (e.g. strict equations as opposed to
    approximation symbols, or omission of symbols to denote estimation), I
    believe Frank & Al-Ali and "descendent" papers (e.g. appendix of Rosenthal
    & Henderson 2003) are fine. The cross terms that Marin is agitated about
    indeed do in fact have an expectation value of 0.0 (in the ensemble; if the
    experiment were performed an infinite number of times with different
    realizations of noise). I don't believe Pawel or Jose Maria or any of the
    other authors really believe that the cross-terms are orthogonal.
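
    As a quick numerical sanity check (a toy example of my own, nothing more): fix a "signal" shell, draw fresh noise many times, and look at the shell-averaged cross term. Its ensemble average is essentially zero, and for any single realization it is small whenever N is large.

import numpy as np

rng = np.random.default_rng(0)
N = 10_000                                   # independent Fourier voxels in one shell
signal = rng.normal(size=N) + 1j * rng.normal(size=N)

# shell average of Re(Signal x conj(Noise)) for many independent noise realizations
cross = [np.mean(np.real(signal * np.conj(rng.normal(size=N) + 1j * rng.normal(size=N))))
         for _ in range(1000)]

print(np.mean(cross))   # ~ 0: the expectation over the ensemble really is zero
print(np.std(cross))    # nonzero for any single experiment, but small; shrinks like 1/sqrt(N)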
    
    When N (the number of independent Fourier voxels in a shell) is large
    enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty
    good one, even for a single FSC experiment. This is why, in my book,
    derivations that depend on Frank & Al-Ali are OK, under the strict
    assumption that N is large. Numerically, this becomes apparent when Marin's
    half-bit criterion is plotted - asymptotically it has the same behavior as
    a constant threshold.
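
    To see the asymptotic behavior, one can simply evaluate the 1/2-bit formula for shells of increasing size (constants below are as I read them in van Heel & Schatz (2005); please check against the paper). For small shells the threshold sits well above 0.143 - exactly the regime where fixed thresholds get dangerous - but it flattens out as n grows.

import numpy as np

def half_bit_threshold(n_eff):
    # 1/2-bit threshold for a shell containing n_eff independent Fourier voxels
    s = 1.0 / np.sqrt(n_eff)
    return (0.2071 + 1.9102 * s) / (1.2071 + 0.9102 * s)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(half_bit_threshold(n), 3))
# the printed values flatten out toward 0.2071 / 1.2071 ~ 0.17 as n grows,
# i.e. for large shells the criterion behaves like a constant threshold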
    
    So, is Marin wrong to worry about this? No, I don't think so. There are
    indeed cases where the assumption of large N is broken. And under those
    circumstances, any fixed threshold (0.143, 0.5, whatever) is dangerous.
    This is illustrated in figures of van Heel & Schatz (2005). Small boxes,
    high-symmetry, small objects in large boxes, and a number of other
    conditions can make fixed thresholds dangerous.
    
    It would indeed be better to use a non-fixed threshold. So why am I not
    using the 1/2-bit criterion in my own work? While numerically it behaves
    well at most resolution ranges, I was not convinced by Marin's derivation
    in 2005. Philosophically though, I think he's right - we should aim for FSC
    thresholds that are more robust to the kinds of edge cases mentioned above.
    It would be the right thing to do.
    
    Hope this helps,
    Alexis
    
    
    ------------------------------
    
    Message: 2
    Date: Fri, 21 Feb 2020 17:19:00 +0000
    From: "Ludtke, Steven J." <sludtke at bcm.edu>
    To: Alexis Rohou <a.rohou at gmail.com>
    Cc: "Pawel A. Penczek" <Pawel.A.Penczek at uth.tmc.edu>, Marin van Heel
    	<marin.vanheel at googlemail.com>, "CCPEM at JISCMAIL.AC.UK"
    	<ccpem at jiscmail.ac.uk>, "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>,
    	"CCP4BB at JISCMAIL.AC.UK" <CCP4BB at jiscmail.ac.uk>
    Subject: Re: [3dem] Which resolution?
    Message-ID: <84516CBB-AE8E-49B4-A123-7C6724E93CE2 at bcm.edu>
    Content-Type: text/plain; charset="utf-8"
    
    I've been steadfastly refusing to get myself dragged in this time, but with this very sensible statement (which I am largely in agreement with), I thought I'd throw in one thought, just to stir the pot a little more.
    
    This is not a new idea, but I think it is the most sensible strategy I've heard proposed, and it addresses Marin's concerns in a more conventional way. What we are talking about here is the statistical noise present in the FSC curves themselves. Viewed within the framework of traditional error analysis and propagation of uncertainty, which pretty much every scientist has been familiar with since high school (and which therefore should not confuse non-statisticians), the 'correct' solution to this issue is not to adjust the threshold, but to present FSC curves with error bars.
    
    One can then use a fixed threshold at a level based on expectation values, and simply produce a resolution value which also has an associated uncertainty. This is much better than using a variable threshold and still producing a single number with no uncertainty estimate!  Not only does this approach account for the statistical noise in the FSC curve, but it also should stop people from reporting resolutions as 2.3397 Å, as it would be silly to say 2.3397 +- 0.2.
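
    As a cartoon of what I mean (my own sketch, not any package's implementation; "sigma" stands for whatever per-shell standard error you trust, e.g. from resampled half-sets): bracket the threshold crossing by shifting the whole curve up and down by one standard error.

import numpy as np

def crossing_resolution(fsc, box, voxel_size, threshold=0.143):
    # resolution (in Angstrom) where the curve first drops below the threshold
    for r in range(1, len(fsc)):
        if fsc[r] < threshold <= fsc[r - 1]:
            frac = (fsc[r - 1] - threshold) / (fsc[r - 1] - fsc[r])
            return box * voxel_size / (r - 1 + frac)
    return 2.0 * voxel_size   # never crossed: report Nyquist

def resolution_with_error_bar(fsc, sigma, box, voxel_size):
    best = crossing_resolution(fsc, box, voxel_size)
    optimistic = crossing_resolution(fsc + sigma, box, voxel_size)    # curve shifted up
    pessimistic = crossing_resolution(fsc - sigma, box, voxel_size)   # curve shifted down
    return best, optimistic, pessimistic    # quote as best, with +/- the spread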
    
    The cross terms are not ignored, but are used in the production of the error bars. This is a very simple approach, which is certainly closer to being correct than a fixed threshold without error bars, and it solves many of the issues we have with how people report resolution.  Of course we still have people who will insist that 3.2+-0.2 is better than 3.3+-0.2, but there isn't much you can do about them... (other than beat them over the head with a statistics textbook).
    
    The caveat, of course, is that, like all propagation of uncertainty, this is a linear approximation, and the correlation axis isn't linear, so the Normal distributions and linear propagation used to justify the method aren't _strictly_ valid. However, the approximation is fine as long as the error bars are reasonably small compared to the -1 to 1 range of the correlation axis. Each individual error bar is computed around its expectation value, so the overall nonlinearity of the correlation isn't a concern.
    
    
    
    --------------------------------------------------------------------------------------
    Steven Ludtke, Ph.D. <sludtke at bcm.edu>                      Baylor College of Medicine
    Charles C. Bell Jr., Professor of Structural Biology

 


