<div dir="ltr">Dear Chuck, so nice to hear from you!<div><br></div><div>Yes, the idea of having error bars (to gain a minimal understanding of the distribution of FSCs characterizing your map) is very appealing... The problem is that every ad hoc attempt we have made in that respect has given us very low spread for large data sets. I am sure it can be done better, and that in some cases it may be of value, but it all seemed to be quite a lot of work to do well, with not so great prospects. (On the other hand, now that algorithms run so fast, perhaps simple resampling strategies would be worth exploring to re-check whether the spread really is small.)</div><div><br></div><div>Wbw..JM</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 21, 2020 at 8:56 PM Sindelar, Charles <<a href="mailto:charles.sindelar@yale.edu">charles.sindelar@yale.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hear! Hear!  To Alexis and Steve's points- I think these concisely capture the reason why most of us (myself included) are content to use the widely accepted 0.143 criterion- bearing in mind the important caveats that have been raised.<br>

<br>

One way to summarize the difference between the "0.143 threshold" approach and the "half-bit" approach (van Heel & Schatz (2005)) is that they attempt to answer different questions- <br>

<br>

(1) What is a minimally biased estimate of the resolution? (0.143)<br>

<br>

vs<br>

<br>

(2)  What is the highest resolution one can confidently claim one has? (half-bit)<br>

<br>

The second approach suffers from the problem that as the noise in the measurement increases, there is a systematic bias to underestimate the resolution (and who wants to do that??). Plus, I enjoy the whimsy of rallying around a seemingly arbitrary number ("0.143").<br>

<br>

To improve the statistics, one could compute a large number of gold-standard FSC curves from resampled subsets of the data. But that hardly seems worth it when the bigger problems often come from other sources of systematic error, as many have noted. I like Steve's idea of reporting error bars on the resolution estimate.<br>

<br>

- Chuck<br>

<br>

<br>

    Today's Topics:<br>

<br>

       1. Re: Which resolution? (Alexis Rohou)<br>

       2. Re: Which resolution? (Ludtke, Steven J.)<br>

<br>

<br>

    ----------------------------------------------------------------------<br>

<br>

    Message: 1<br>

    Date: Fri, 21 Feb 2020 08:34:45 -0800<br>

    From: Alexis Rohou <<a href="mailto:a.rohou@gmail.com" target="_blank">a.rohou@gmail.com</a>><br>

    To: "Penczek, Pawel A" <<a href="mailto:Pawel.A.Penczek@uth.tmc.edu" target="_blank">Pawel.A.Penczek@uth.tmc.edu</a>><br>

    Cc: "<a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a>" <<a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a>>, Marin van Heel<br>

        <<a href="mailto:marin.vanheel@googlemail.com" target="_blank">marin.vanheel@googlemail.com</a>>, "<a href="mailto:ccpem@jiscmail.ac.uk" target="_blank">ccpem@jiscmail.ac.uk</a>"<br>

        <<a href="mailto:ccpem@jiscmail.ac.uk" target="_blank">ccpem@jiscmail.ac.uk</a>>, "<a href="mailto:CCP4BB@JISCMAIL.AC.UK" target="_blank">CCP4BB@JISCMAIL.AC.UK</a>"<br>

        <<a href="mailto:CCP4BB@jiscmail.ac.uk" target="_blank">CCP4BB@jiscmail.ac.uk</a>><br>

    Subject: Re: [3dem] Which resolution?<br>

    Message-ID:<br>

        <<a href="mailto:CAM5goXS5xK2OoBUSFDQzv7HgnkiGw6nZSU0%2BA5BJoKVcT20o_A@mail.gmail.com" target="_blank">CAM5goXS5xK2OoBUSFDQzv7HgnkiGw6nZSU0+A5BJoKVcT20o_A@mail.gmail.com</a>><br>

    Content-Type: text/plain; charset="utf-8"<br>

<br>

    Hi all,<br>

<br>

    For those bewildered by Marin's insistence that everyone's been messing up<br>

    their stats since the bronze age, I'd like to offer my understanding<br>

    of the situation. More details in this thread from a few years ago on the<br>

    exact same topic:<br>

<a href="https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html" rel="noreferrer" target="_blank">https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html</a><br>

<br>

    Notwithstanding notational problems (e.g. strict equations as opposed to<br>

    approximation symbols, or omission of symbols to denote estimation), I<br>

    believe Frank & Al-Ali and "descendent" papers (e.g. appendix of Rosenthal<br>

    & Henderson 2003) are fine. The cross terms that Marin is agitated about<br>

    do in fact have an expectation value of 0.0 (in the ensemble; if the<br>

    experiment were performed an infinite number of times with different<br>

    realizations of noise). I don't believe Pawel or Jose Maria or any of the<br>

    other authors really believe that the cross-terms are orthogonal.<br>

<br>

    When N (the number of independent Fourier voxels in a shell) is large<br>

    enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty<br>

    good one, even for a single FSC experiment. This is why, in my book,<br>

    derivations that depend on Frank & Al-Ali are OK, under the strict<br>

    assumption that N is large. Numerically, this becomes apparent when Marin's<br>

    half-bit criterion is plotted - asymptotically it has the same behavior as<br>

    a constant threshold.<br>
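<br>
The asymptotic behavior described above can be sketched numerically. This is only an illustration, assuming the closed-form half-bit curve and rounded coefficients as given in van Heel & Schatz (2005); the names `half_bit_threshold` and `n_eff` are hypothetical, and any correction for symmetry or partial box filling is assumed to be folded into `n_eff` by the caller.<br>

```python
import math


def half_bit_threshold(n_eff):
    """Half-bit FSC threshold for a shell with n_eff effective independent voxels.

    Assumption: closed form and rounded coefficients as given in
    van Heel & Schatz (2005); check against the paper before relying on it.
    """
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    inv_sqrt_n = 1.0 / math.sqrt(n_eff)
    return (0.2071 + 1.9102 * inv_sqrt_n) / (1.2071 + 0.9102 * inv_sqrt_n)


# The curve falls toward a constant (0.2071 / 1.2071) as n_eff grows,
# and rises steeply when a shell contains only a few independent voxels.
for n in (10, 100, 10_000, 1_000_000):
    print(n, round(half_bit_threshold(n), 4))
```

For shells with many voxels the curve is nearly flat, which is why it is hard to distinguish from a fixed threshold there; the two only part ways when `n_eff` is small.<br>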

<br>

    So, is Marin wrong to worry about this? No, I don't think so. There are<br>

    indeed cases where the assumption of large N is broken. And under those<br>

    circumstances, any fixed threshold (0.143, 0.5, whatever) is dangerous.<br>

    This is illustrated in figures of van Heel & Schatz (2005). Small boxes,<br>

    high-symmetry, small objects in large boxes, and a number of other<br>

    conditions can make fixed thresholds dangerous.<br>
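<br>
A toy simulation makes the small-N danger concrete. This is a sketch under simplifying assumptions, not a real FSC calculation: each "shell" is modelled as N independent real Gaussian values (real Fourier voxels are complex and partly correlated), both half-maps contain pure noise, and the function names are hypothetical.<br>

```python
import math
import random


def noise_correlation(n, rng):
    """Normalized cross-correlation of two independent noise vectors of length n."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def exceed_fraction(n, threshold=0.143, trials=500, seed=0):
    """Fraction of pure-noise trials whose correlation exceeds the threshold."""
    rng = random.Random(seed)
    hits = sum(noise_correlation(n, rng) > threshold for _ in range(trials))
    return hits / trials


# With no signal at all, small shells cross a fixed threshold by chance
# far more often than large shells do.
for n in (20, 200, 1000):
    print(n, exceed_fraction(n))
```

The spread of the noise-only correlation shrinks roughly as 1/sqrt(N), so a fixed 0.143 cut-off is rarely crossed by chance in large shells but can be crossed often in small ones.<br>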

<br>

    It would indeed be better to use a non-fixed threshold. So why am I not<br>

    using the 1/2-bit criterion in my own work? While numerically it behaves<br>

    well at most resolution ranges, I was not convinced by Marin's derivation<br>

    in 2005. Philosophically though, I think he's right - we should aim for FSC<br>

    thresholds that are more robust to the kinds of edge cases mentioned above.<br>

    It would be the right thing to do.<br>

<br>

    Hope this helps,<br>

    Alexis<br>

<br>

<br>

    ------------------------------<br>

<br>

    Message: 2<br>

    Date: Fri, 21 Feb 2020 17:19:00 +0000<br>

    From: "Ludtke, Steven J." <<a href="mailto:sludtke@bcm.edu" target="_blank">sludtke@bcm.edu</a>><br>

    To: Alexis Rohou <<a href="mailto:a.rohou@gmail.com" target="_blank">a.rohou@gmail.com</a>><br>

    Cc: "Pawel A. Penczek" <<a href="mailto:Pawel.A.Penczek@uth.tmc.edu" target="_blank">Pawel.A.Penczek@uth.tmc.edu</a>>, Marin van Heel<br>

        <<a href="mailto:marin.vanheel@googlemail.com" target="_blank">marin.vanheel@googlemail.com</a>>, "<a href="mailto:CCPEM@JISCMAIL.AC.UK" target="_blank">CCPEM@JISCMAIL.AC.UK</a>"<br>

        <<a href="mailto:ccpem@jiscmail.ac.uk" target="_blank">ccpem@jiscmail.ac.uk</a>>, "<a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a>" <<a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a>>,<br>

        "<a href="mailto:CCP4BB@JISCMAIL.AC.UK" target="_blank">CCP4BB@JISCMAIL.AC.UK</a>" <<a href="mailto:CCP4BB@jiscmail.ac.uk" target="_blank">CCP4BB@jiscmail.ac.uk</a>><br>

    Subject: Re: [3dem] Which resolution?<br>

    Message-ID: <<a href="mailto:84516CBB-AE8E-49B4-A123-7C6724E93CE2@bcm.edu" target="_blank">84516CBB-AE8E-49B4-A123-7C6724E93CE2@bcm.edu</a>><br>

    Content-Type: text/plain; charset="utf-8"<br>

<br>

    I've been steadfastly refusing to get myself dragged in this time, but with this very sensible statement (which I am largely in agreement with), I thought I'd throw in one thought, just to stir the pot a little more.<br>

<br>

    This is not a new idea, but I think it is the most sensible strategy I've heard proposed, and addresses Marin's concerns in a more conventional way. What we are talking about here is the statistical noise present in the FSC curves themselves. Viewed from the framework of traditional error analysis and propagation of uncertainties, which pretty much every scientist should be familiar with since high school (and thus would not be confusing to non-statisticians), the 'correct' solution to this issue is not to adjust the threshold, but to present FSC curves with error bars.<br>

<br>

    One can then use a fixed threshold at a level based on expectation values, and simply produce a resolution value which also has an associated uncertainty. This is much better than using a variable threshold and still producing a single number with no uncertainty estimate!  Not only does this approach account for the statistical noise in the FSC curve, but it also should stop people from reporting resolutions as 2.3397 Å, as it would be silly to say 2.3397 +- 0.2.<br>
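<br>
As a rough illustration of a fixed threshold combined with error bars, the sketch below locates the threshold crossing by linear interpolation and propagates the FSC error bar to the resolution to first order. It assumes the per-shell error bars are already available (how they are computed is not specified here); the function name, argument names and example numbers are all hypothetical.<br>

```python
def resolution_with_error(freqs, fsc, fsc_sigma, threshold=0.143):
    """Return (resolution, uncertainty) at the first threshold crossing.

    freqs     : spatial frequencies in 1/Angstrom, increasing
    fsc       : FSC value in each shell
    fsc_sigma : assumed standard error of each FSC value
    """
    for i in range(1, len(freqs)):
        if fsc[i - 1] >= threshold > fsc[i]:
            step = freqs[i] - freqs[i - 1]
            slope = (fsc[i] - fsc[i - 1]) / step           # dFSC/ds, negative here
            s_cross = freqs[i - 1] + (threshold - fsc[i - 1]) / slope
            t = (s_cross - freqs[i - 1]) / step            # position between shells
            sigma_fsc = (1 - t) * fsc_sigma[i - 1] + t * fsc_sigma[i]
            sigma_s = sigma_fsc / abs(slope)               # linear propagation to s
            resolution = 1.0 / s_cross
            sigma_res = sigma_s / s_cross ** 2             # d(1/s) = ds / s^2
            return resolution, sigma_res
    raise ValueError("FSC never crosses the threshold")


# Made-up example curve, for illustration only.
res, err = resolution_with_error(
    freqs=[0.2, 0.3, 0.4],
    fsc=[0.8, 0.243, 0.043],
    fsc_sigma=[0.02, 0.02, 0.02],
)
print(f"{res:.1f} +- {err:.1f} A")
```

Printing the result with only as many digits as the uncertainty supports is the point: a shallow FSC slope at the crossing turns a small error bar on the correlation into a large one on the resolution.<br>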

<br>

    The cross terms are not ignored, but are used in the production of the error bars. This is a very simple approach, which is certainly closer to being correct than the fixed threshold without error-bars approach, and it solves many of the issues we have with the way people report resolution.  Of course we still have people who will insist that 3.2+-0.2 is better than 3.3+-0.2, but there isn't much you can do about them... (other than beat them over the head with a statistics textbook).<br>

<br>

    The caveat, of course, is that, like all propagation of uncertainty, it is a linear approximation, and the correlation axis isn't linear, so the typical Normal distributions with linear propagation used to justify propagation of uncertainty aren't _strictly_ true. However, the approximation is fine as long as the error bars are reasonably small compared to the -1 to 1 range of the correlation axis. Each individual error bar is computed around its expectation value, so the overall nonlinearity of the correlation isn't a concern.<br>

<br>

<br>

<br>

    --------------------------------------------------------------------------------------<br>

    Steven Ludtke, Ph.D. <<a href="mailto:sludtke@bcm.edu" target="_blank">sludtke@bcm.edu</a>>                      Baylor College of Medicine<br>

    Charles C. Bell Jr., Professor of Structural Biology<br>

<br>

<br>

<br>

_______________________________________________<br>

3dem mailing list<br>

<a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a><br>

<a href="https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem" rel="noreferrer" target="_blank">https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Prof. Jose-Maria Carazo<br>Biocomputing Unit, Head, CNB-CSIC<br>Spanish National Center for Biotechnology</div><div>Darwin 3, Universidad Autonoma de Madrid</div><div>28049 Madrid, Spain</div><div><p style="margin:0cm 0cm 0.0001pt;font-size:12pt;font-family:"Times New Roman",serif"><br></p>Cell: +34639197980<br></div></div></div>