[3dem] 3dem Digest, Vol 150, Issue 49

Jose Maria Carazo carazo at cnb.csic.es
Sat Feb 22 01:19:26 PST 2020


Dear Chuck, so nice to hear from you!

Yes, the idea of having error bars (to get a minimal understanding of the
distribution of FSCs characterizing your map) is very appealing... the
problem is that every ad hoc attempt we have made in that respect has
given us a very low spread for large data sets. I am sure it can be done
better, and that in some cases it may be of value, but it all seemed to
be quite a lot of work to do well, with not so great prospects. (On the
other hand, now that algorithms run so fast, perhaps simple resampling
strategies would be worth exploring, to re-check whether the spread
really is small.)
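
(If anyone feels like trying, here is a minimal sketch of the kind of
resampling I mean, numpy only; the shell binning and the
bootstrap-over-voxels scheme are my simplifying assumptions, not a
recipe from any particular package:)

    import numpy as np

    def fsc_curve(v1, v2, n_shells=50):
        """FSC between two half-maps (cubic, same shape), per frequency shell."""
        f1, f2 = np.fft.fftn(v1), np.fft.fftn(v2)
        freq = np.fft.fftfreq(v1.shape[0])
        kx, ky, kz = np.meshgrid(freq, freq, freq, indexing='ij')
        r = np.sqrt(kx**2 + ky**2 + kz**2)
        # bin by radius; corner frequencies get lumped into the last shell
        shell = np.minimum((r / 0.5 * n_shells).astype(int), n_shells - 1)
        curve = np.zeros(n_shells)
        for s in range(n_shells):
            a, b = f1[shell == s], f2[shell == s]
            den = np.sqrt((np.abs(a)**2).sum() * (np.abs(b)**2).sum())
            curve[s] = (a * np.conj(b)).sum().real / den if den > 0 else 0.0
        return curve

    def shell_spread(a, b, n_boot=200, seed=0):
        """Bootstrap std of one shell's FSC, resampling its voxels with replacement."""
        rng = np.random.default_rng(seed)
        vals = np.empty(n_boot)
        for i in range(n_boot):
            idx = rng.integers(0, a.size, a.size)
            x, y = a[idx], b[idx]
            vals[i] = (x * np.conj(y)).sum().real / np.sqrt(
                (np.abs(x)**2).sum() * (np.abs(y)**2).sum())
        return vals.std()

Resampling at the particle level (and re-reconstructing each subset)
would capture more of the real variability, which is where the cost
comes in.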

Wbw, JM

On Fri, Feb 21, 2020 at 8:56 PM Sindelar, Charles <charles.sindelar at yale.edu>
wrote:

> Hear! Hear!  To Alexis's and Steve's points: I think these concisely
> capture why most of us (myself included) are content to use the widely
> accepted 0.143 criterion, bearing in mind the important caveats that
> have been raised.
>
> One way to summarize the difference between the "0.143 threshold"
> approach and the "half-bit" approach (van Heel & Schatz (2005)) is that
> they attempt to answer different questions:
>
> (1) What is a minimally biased estimate of the resolution? (0.143)
>
> vs
>
> (2) What is the highest resolution one can confidently claim one has?
> (half-bit)
>
> The second approach suffers from the problem that as the noise in the
> measurement increases, there is a systematic bias to underestimate the
> resolution (and who wants to do that??). Plus, I enjoy the whimsy of
> rallying around a seemingly arbitrary number ("0.143").
>
> To improve the statistics, one could compute a large number of
> gold-standard FSC curves from resampled subsets of the data. But that
> hardly seems worth it when the bigger problems often come from other
> sources of systematic error, as many have noted. I like Steve's idea of
> reporting error bars on the resolution estimate.
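>
> A sketch of what that bookkeeping might look like once you have such an
> ensemble of resampled curves (all names hypothetical, numpy only; freqs
> in 1/Angstrom, so a threshold crossing maps to a resolution of
> 1/crossing):
>
>     import numpy as np
>
>     def crossing(freqs, curve, thresh=0.143):
>         """First frequency where the FSC drops below thresh (linear interp)."""
>         below = np.nonzero(curve < thresh)[0]
>         if below.size == 0:
>             return freqs[-1]          # never crosses: report the last shell
>         i = below[0]
>         if i == 0:
>             return freqs[0]
>         f0, f1 = freqs[i - 1], freqs[i]
>         c0, c1 = curve[i - 1], curve[i]
>         return f0 + (c0 - thresh) * (f1 - f0) / (c0 - c1)
>
>     def resolution_with_error(freqs, curves, thresh=0.143):
>         """Median resolution and a 16-84 percentile error bar."""
>         res = np.array([1.0 / crossing(freqs, c, thresh) for c in curves])
>         lo, med, hi = np.percentile(res, [16, 50, 84])
>         return med, med - lo, hi - med    # (median, minus, plus)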
>
> - Chuck
>
>
>     Today's Topics:
>
>        1. Re: Which resolution? (Alexis Rohou)
>        2. Re: Which resolution? (Ludtke, Steven J.)
>
>
>     ----------------------------------------------------------------------
>
>     Message: 1
>     Date: Fri, 21 Feb 2020 08:34:45 -0800
>     From: Alexis Rohou <a.rohou at gmail.com>
>     To: "Penczek, Pawel A" <Pawel.A.Penczek at uth.tmc.edu>
>     Cc: "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>, Marin van Heel
>         <marin.vanheel at googlemail.com>, "ccpem at jiscmail.ac.uk"
>         <ccpem at jiscmail.ac.uk>, "CCP4BB at JISCMAIL.AC.UK"
>         <CCP4BB at jiscmail.ac.uk>
>     Subject: Re: [3dem] Which resolution?
>     Message-ID:
>         <CAM5goXS5xK2OoBUSFDQzv7HgnkiGw6nZSU0+A5BJoKVcT20o_A at mail.gmail.com>
>     Content-Type: text/plain; charset="utf-8"
>
>     Hi all,
>
>     For those bewildered by Marin's insistence that everyone's been
>     messing up their stats since the Bronze Age, I'd like to offer my
>     understanding of the situation. More details in this thread from a
>     few years ago on the exact same topic:
>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>
>     Notwithstanding notational problems (e.g. strict equations as
>     opposed to approximation symbols, or omission of symbols to denote
>     estimation), I believe Frank & Al-Ali and "descendant" papers (e.g.
>     the appendix of Rosenthal & Henderson 2003) are fine. The cross
>     terms that Marin is agitated about do in fact have an expectation
>     value of 0.0 (in the ensemble; if the experiment were performed an
>     infinite number of times with different realizations of noise). I
>     don't believe Pawel or Jose Maria or any of the other authors
>     really believe that the cross-terms are exactly orthogonal.
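>
>     (A quick numerical check of that expectation-value claim, with
>     made-up Gaussian "signal" and "noise" standing in for real Fourier
>     shells; the cross term shrinks like 1/sqrt(N):)
>
>         import numpy as np
>
>         rng = np.random.default_rng(1)
>         for n in (100, 10_000, 1_000_000):
>             s = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # "signal"
>             e = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # "noise"
>             # |mean(S x conj(N))| decays ~ 1/sqrt(n); it is exactly 0
>             # only in the infinite-ensemble limit
>             print(n, abs((s * np.conj(e)).mean()))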
>
>     When N (the number of independent Fourier voxels in a shell) is
>     large enough, mean(Signal x Noise) ~ 0.0 is only an approximation,
>     but a pretty good one, even for a single FSC experiment. This is
>     why, in my book, derivations that depend on Frank & Al-Ali are OK,
>     under the strict assumption that N is large. Numerically, this
>     becomes apparent when Marin's half-bit criterion is plotted:
>     asymptotically it has the same behavior as a constant threshold.
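>
>     (For concreteness, here is the half-bit curve, with the constants
>     as published in van Heel & Schatz (2005), and its large-N limit:)
>
>         import numpy as np
>
>         def half_bit_threshold(n_eff):
>             """Half-bit FSC threshold for a shell with n_eff independent voxels."""
>             rn = np.sqrt(float(n_eff))
>             return (0.2071 + 1.9102 / rn) / (1.2071 + 0.9102 / rn)
>
>         for n in (10, 100, 1_000, 100_000, 10_000_000):
>             print(f"n_eff = {n:>10}: threshold = {half_bit_threshold(n):.4f}")
>         # tends to 0.2071 / 1.2071 ~ 0.1716 as n_eff grows, i.e. it
>         # behaves like a fixed threshold once shells are well populated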
>
>     So, is Marin wrong to worry about this? No, I don't think so. There
>     are indeed cases where the assumption of large N is broken, and
>     under those circumstances any fixed threshold (0.143, 0.5,
>     whatever) is dangerous. This is illustrated in figures of van Heel
>     & Schatz (2005). Small boxes, high symmetry, small objects in large
>     boxes, and a number of other conditions can make fixed thresholds
>     dangerous.
>
>     It would indeed be better to use a non-fixed threshold. So why am I
>     not using the 1/2-bit criterion in my own work? While numerically
>     it behaves well at most resolution ranges, I was not convinced by
>     Marin's derivation in 2005. Philosophically, though, I think he's
>     right: we should aim for FSC thresholds that are more robust to the
>     kinds of edge cases mentioned above. It would be the right thing to
>     do.
>
>     Hope this helps,
>     Alexis
>
>
>     ------------------------------
>
>     Message: 2
>     Date: Fri, 21 Feb 2020 17:19:00 +0000
>     From: "Ludtke, Steven J." <sludtke at bcm.edu>
>     To: Alexis Rohou <a.rohou at gmail.com>
>     Cc: "Pawel A. Penczek" <Pawel.A.Penczek at uth.tmc.edu>, Marin van Heel
>         <marin.vanheel at googlemail.com>, "CCPEM at JISCMAIL.AC.UK"
>         <ccpem at jiscmail.ac.uk>, "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>,
>         "CCP4BB at JISCMAIL.AC.UK" <CCP4BB at jiscmail.ac.uk>
>     Subject: Re: [3dem] Which resolution?
>     Message-ID: <84516CBB-AE8E-49B4-A123-7C6724E93CE2 at bcm.edu>
>     Content-Type: text/plain; charset="utf-8"
>
>     I've been steadfastly refusing to get myself dragged in this time,
>     but with this very sensible statement (which I am largely in
>     agreement with), I thought I'd throw in one thought, just to stir
>     the pot a little more.
>
>     This is not a new idea, but I think it is the most sensible
>     strategy I've heard proposed, and it addresses Marin's concerns in
>     a more conventional way. What we are talking about here is the
>     statistical noise present in the FSC curves themselves. Viewed from
>     the framework of traditional error analysis and propagation of
>     uncertainty, which pretty much every scientist has been familiar
>     with since high school (and which should therefore not confuse the
>     non-statisticians), the 'correct' solution to this issue is not to
>     adjust the threshold, but to present FSC curves with error bars.
>
>     One can then use a fixed threshold at a level based on expectation
>     values, and simply produce a resolution value which also has an
>     associated uncertainty. This is much better than using a variable
>     threshold and still producing a single number with no uncertainty
>     estimate!  Not only does this approach account for the statistical
>     noise in the FSC curve, but it should also stop people from
>     reporting resolutions as 2.3397 Å, as it would be silly to say
>     2.3397 +- 0.2.
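>
>     A sketch of that last step, assuming per-shell error bars sigma are
>     already in hand (all names hypothetical, numpy only; freqs in
>     1/Angstrom, strictly positive):
>
>         import numpy as np
>
>         def first_crossing(freqs, curve, thresh=0.143):
>             below = np.nonzero(curve < thresh)[0]
>             return freqs[below[0]] if below.size else freqs[-1]
>
>         def resolution_band(freqs, fsc, sigma, thresh=0.143):
>             """Resolution from the FSC itself and from the +/- 1 sigma curves."""
>             r_mid = 1.0 / first_crossing(freqs, fsc, thresh)
>             r_best = 1.0 / first_crossing(freqs, fsc + sigma, thresh)   # optimistic
>             r_worst = 1.0 / first_crossing(freqs, fsc - sigma, thresh)  # pessimistic
>             return r_mid, r_best, r_worst   # e.g. 3.3, 3.1, 3.5 Angstrom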
>
>     The cross terms are not ignored, but are used in the production of
>     the error bars. This is a very simple approach, which is certainly
>     closer to being correct than the fixed threshold without error
>     bars, and it solves many of the issues we have with how people
>     report resolution.  Of course we will still have people who insist
>     that 3.2+-0.2 is better than 3.3+-0.2, but there isn't much you can
>     do about them... (other than beating them over the head with a
>     statistics textbook).
>
>     The caveat, of course, is that, like all propagation of
>     uncertainty, this is a linear approximation, and the correlation
>     axis isn't linear, so the Normal distributions with linear
>     propagation typically used to justify the method aren't _strictly_
>     valid. However, the approximation is fine as long as the error bars
>     are reasonably small compared to the -1 to 1 range of the
>     correlation axis. Each individual error bar is computed around its
>     expectation value, so the overall nonlinearity of the correlation
>     isn't a concern.
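>
>     For what it's worth, one standard way to soften that caveat is to
>     do the propagation on Fisher's variance-stabilized z = atanh(FSC)
>     axis, where the sampling distribution is much closer to Normal, and
>     then map the interval back through tanh, which keeps it inside
>     (-1, 1). A sketch (n_eff, the effective number of independent
>     voxels in the shell, is an input you must estimate yourself;
>     symmetry and masking both reduce it):
>
>         import numpy as np
>
>         def fsc_interval(fsc, n_eff, n_sigma=1.0):
>             """Asymmetric (lo, hi) interval for one shell's FSC value."""
>             z = np.arctanh(np.clip(fsc, -0.999999, 0.999999))
>             se = n_sigma / np.sqrt(max(n_eff - 3, 1))  # classic 1/sqrt(N-3) rule
>             return np.tanh(z - se), np.tanh(z + se)
>
>         lo, hi = fsc_interval(0.143, n_eff=500)
>         print(f"FSC = 0.143, n_eff = 500: interval [{lo:.3f}, {hi:.3f}]")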
>
>
>
>
> --------------------------------------------------------------------------------------
>     Steven Ludtke, Ph.D. <sludtke at bcm.edu<mailto:sludtke at bcm.edu>>
>               Baylor College of Medicine
>     Charles C. Bell Jr., Professor of Structural Biology
>
>
>
> _______________________________________________
> 3dem mailing list
> 3dem at ncmir.ucsd.edu
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>


-- 
Prof. Jose-Maria Carazo
Biocomputing Unit, Head, CNB-CSIC
Spanish National Center for Biotechnology
Darwin 3, Universidad Autonoma de Madrid
28049 Madrid, Spain


Cell: +34639197980

