[3dem] 3dem Digest, Vol 150, Issue 49

Carlos Oscar Sorzano coss at cnb.csic.es
Sun Feb 23 00:56:10 PST 2020


I support this idea of computing error bars, and we might consider computing 
"horizontal" error bars rather than "vertical" ones; that is, an error bar 
on the reported resolution itself (where the FSC crosses one of several 
thresholds). In a way, that was explored in Fig. 2 of the BlocRes paper 
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3837392/), although the 
results were not represented as error bars, there were no multiple subsets, 
and the resolution was measured locally instead of globally. But what was 
clear was that the variability at low thresholds and high resolution could 
be very large, which is also our experience.
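A minimal sketch of the "horizontal" error bar idea, assuming several FSC curves (e.g. one per particle subset) sampled on a common frequency grid in 1/Å. The function names are hypothetical, the curves below are synthetic, and linear interpolation between shells is only one way to locate the crossing.

```python
import numpy as np

def crossing_resolution(freqs, fsc, threshold):
    """Resolution (Å) where `fsc` first drops below `threshold`.

    freqs: increasing spatial frequencies (1/Å), one per shell.
    Linearly interpolates between the last shell above and the first shell
    below the threshold; returns None if no such pair exists.
    """
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    i = below[0]
    f0, f1, c0, c1 = freqs[i - 1], freqs[i], fsc[i - 1], fsc[i]
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    return 1.0 / f_cross

def horizontal_error_bar(freqs, fsc_curves, threshold):
    """Mean and sample standard deviation of the crossing resolution
    over several FSC curves (e.g. one per particle subset)."""
    res = [crossing_resolution(freqs, c, threshold) for c in fsc_curves]
    res = np.array([r for r in res if r is not None])
    return res.mean(), res.std(ddof=1)

# Synthetic, logistic-shaped curves standing in for real subset FSCs.
freqs = np.linspace(0.01, 0.5, 50)
curves = [1.0 / (1.0 + np.exp((freqs - f50) / 0.02)) for f50 in (0.29, 0.30, 0.31)]
for t in (0.5, 0.143):
    mean_res, sd_res = horizontal_error_bar(freqs, curves, t)
    print(f"threshold {t}: {mean_res:.2f} +- {sd_res:.2f} Å")
```

Reporting the spread at more than one threshold shows directly how much less stable the lower-threshold crossing can be.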

Kind regards, Carlos Oscar

On 22/02/2020 at 15:49, Ludtke, Steven J. wrote:
> Hi Jose-Maria,
> I completely agree that the error bars are going to be small in the 
> typical large data set at high resolution that we're dealing with 
> today. That's exactly as it should be.
>
> However, two counterpoints:
> 1) with the rise of in-situ subtomogram averaging, many data sets and 
> volume sizes are going to be smaller again, at least for a while.
>
> 2) even if the error bars are small, they give a basis for producing 
> an uncertainty value on the resolution, which I think is something we 
> really need to help combat the people going to extreme lengths (and 
> screwing up their maps) to get a completely meaningless 0.05 Å 
> resolution 'improvement'.
>
>
> --------------------------------------------------------------------------------------
> Steven Ludtke, Ph.D. <sludtke at bcm.edu>
>               Baylor College of Medicine
> Charles C. Bell Jr., Professor of Structural Biology
> Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
> Academic Director, CryoEM Core (cryoem.bcm.edu)
> Co-Director CIBR Center (www.bcm.edu/research/cibr)
>
>
>
>> On Feb 22, 2020, at 3:19 AM, Jose Maria Carazo <carazo at cnb.csic.es> wrote:
>>
>> Dear Chuck, so nice to hear from you!
>>
>> Yes, the idea of having error bars (to have a minimal understanding of 
>> the distribution of FSCs characterizing your map) is very 
>> appealing... The problem is that every ad hoc attempt we have made 
>> in that respect has given us very low spread for large data sets. I 
>> am sure it can be done better, and that in some cases it may be of 
>> value, but it all seemed to be quite a lot of work if you wanted to do 
>> it well, with not-so-great prospects. (On the other hand, now that 
>> algorithms run so fast, perhaps simple resampling strategies would be 
>> worth exploring to re-check whether the spread really is small.)
>>
>> Wbw..JM
>>
>> On Fri, Feb 21, 2020 at 8:56 PM Sindelar, Charles 
>> <charles.sindelar at yale.edu> wrote:
>>
>>     Hear! Hear! To Alexis's and Steve's points: I think these
>>     concisely capture the reason why most of us (myself included) are
>>     content to use the widely accepted 0.143 criterion, bearing in
>>     mind the important caveats that have been raised.
>>
>>     One way to summarize the difference between the "0.143 threshold"
>>     approach and the "half-bit" approach (van Heel & Schatz (2005))
>>     is that they attempt to answer different questions:
>>
>>     (1) What is a minimally biased estimate of the resolution? (0.143)
>>
>>     vs
>>
>>     (2) What is the highest resolution one can confidently claim one
>>     has? (half-bit)
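For context, the 0.143 value is commonly motivated through the appendix of Rosenthal & Henderson (2003): the estimated correlation of the full map with a perfect reference is C_ref = sqrt(2·FSC/(1 + FSC)), and a half-map FSC of 1/7 ≈ 0.143 corresponds to C_ref = 0.5. A small sketch of that relation; the function names are illustrative only.

```python
import math

def cref_from_fsc(fsc):
    """C_ref = sqrt(2*FSC / (1 + FSC)): estimated correlation of the full map
    with a perfect reference, from the FSC between two half maps (FSC >= 0)."""
    if fsc < 0:
        raise ValueError("relation assumes a non-negative half-map FSC")
    return math.sqrt(2.0 * fsc / (1.0 + fsc))

def fsc_from_cref(cref):
    """Inverse relation, obtained by solving C^2 (1 + FSC) = 2 FSC for FSC."""
    return cref**2 / (2.0 - cref**2)

print(fsc_from_cref(0.5))   # 1/7, about 0.1429
print(cref_from_fsc(0.143))
```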
>>
>>     The second approach suffers from the problem that, as the noise in
>>     the measurement increases, there is a systematic bias to
>>     underestimate the resolution (and who wants to do that??). Plus,
>>     I enjoy the whimsy of rallying around a seemingly arbitrary
>>     number ("0.143").
>>
>>     To improve the statistics, one could compute a large number of
>>     gold-standard FSC curves from resampled subsets of the data. But
>>     that hardly seems worth it when the bigger problems often come
>>     from other sources of systematic error, as many have noted. I
>>     like Steve's idea of reporting error bars on the resolution estimate.
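A toy sketch of the resampling idea, under heavy simplifying assumptions: the "particles" are already-aligned 3D volumes (truth plus white noise, no CTF), so a "reconstruction" is just an average. In a real pipeline every subset would need its own refinement and reconstruction, which is where the cost lies. Each repeat keeps the two half sets disjoint so they remain gold-standard independent; all names here are hypothetical.

```python
import numpy as np

def shell_index(n):
    """Integer Fourier-shell radius of every voxel in an n^3 FFT grid."""
    f = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(f, f, f, indexing="ij")
    return np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

def fsc(map1, map2, shells):
    """FSC between two real-space maps for shells 0 .. n//2 - 1."""
    F1, F2 = np.fft.fftn(map1), np.fft.fftn(map2)
    out = np.empty(map1.shape[0] // 2)
    for r in range(out.size):
        m = shells == r
        num = np.real(np.sum(F1[m] * np.conj(F2[m])))
        den = np.sqrt(np.sum(np.abs(F1[m]) ** 2) * np.sum(np.abs(F2[m]) ** 2))
        out[r] = num / den
    return out

def resampled_fsc(particles, n_repeats=20, fraction=0.8, seed=0):
    """Repeat a gold-standard split on random particle subsets.

    Each repeat draws a random subset (without replacement), splits it into
    two disjoint halves, averages each half and computes the FSC, so the two
    half maps never share a particle. Returns per-shell mean and std.
    """
    rng = np.random.default_rng(seed)
    n_part = particles.shape[0]
    shells = shell_index(particles.shape[1])
    k = int(fraction * n_part) // 2 * 2          # even subset size
    curves = []
    for _ in range(n_repeats):
        idx = rng.permutation(n_part)[:k]
        half1 = particles[idx[: k // 2]].mean(axis=0)
        half2 = particles[idx[k // 2 :]].mean(axis=0)
        curves.append(fsc(half1, half2, shells))
    curves = np.array(curves)
    return curves.mean(axis=0), curves.std(axis=0, ddof=1)

# Toy usage: "particles" = truth + noise, no alignment or CTF.
rng = np.random.default_rng(1)
truth = rng.standard_normal((16, 16, 16))
particles = truth + 3.0 * rng.standard_normal((40, 16, 16, 16))
mean_fsc, std_fsc = resampled_fsc(particles)
```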
>>
>>     - Chuck
>>
>>
>>         Today's Topics:
>>
>>            1. Re: Which resolution? (Alexis Rohou)
>>            2. Re: Which resolution? (Ludtke, Steven J.)
>>
>>
>>     ----------------------------------------------------------------------
>>
>>         Message: 1
>>         Date: Fri, 21 Feb 2020 08:34:45 -0800
>>         From: Alexis Rohou <a.rohou at gmail.com>
>>         To: "Penczek, Pawel A" <Pawel.A.Penczek at uth.tmc.edu>
>>         Cc: "3dem at ncmir.ucsd.edu" <3dem at ncmir.ucsd.edu>, Marin van Heel
>>             <marin.vanheel at googlemail.com>, "ccpem at jiscmail.ac.uk"
>>             <ccpem at jiscmail.ac.uk>,
>>             "CCP4BB at JISCMAIL.AC.UK" <CCP4BB at jiscmail.ac.uk>
>>         Subject: Re: [3dem] Which resolution?
>>         Message-ID:
>>             <CAM5goXS5xK2OoBUSFDQzv7HgnkiGw6nZSU0+A5BJoKVcT20o_A at mail.gmail.com>
>>         Content-Type: text/plain; charset="utf-8"
>>
>>         Hi all,
>>
>>         For those bewildered by Marin's insistence that everyone's
>>     been messing up
>>         their stats since the bronze age, I'd like to offer what my
>>     understanding
>>         of the situation. More details in this thread from a few
>>     years ago on the
>>         exact same topic:
>>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>>     <https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.ncmir.ucsd.edu_pipermail_3dem_2015-2DAugust_003944.html&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=Dk5VoQQ-wINYVssLMZihyC5Dj_sWYKxCyKz9E4Lp3gc&m=xmaCSRNPE3u67PDAef4f-xpFdDA-9JanE7OO-BicWjE&s=o3kcrNwzvjqKQXAYVB7CWUcF2U6kQrfnuV6keMXervs&e=>
>>
>>         Notwithstanding notational problems (e.g. strict equations as
>>         opposed to approximation symbols, or omission of symbols to
>>         denote estimation), I believe Frank & Al-Ali and "descendant"
>>         papers (e.g. the appendix of Rosenthal & Henderson 2003) are
>>         fine. The cross terms that Marin is agitated about do in fact
>>         have an expectation value of 0.0 in the ensemble (i.e., if the
>>         experiment were performed an infinite number of times with
>>         different realizations of noise). I don't believe Pawel or Jose
>>         Maria or any of the other authors really believe that the cross
>>         terms are orthogonal.
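A quick numerical check of the ensemble argument, assuming a shell's Fourier coefficients can be modeled as N independent real Gaussian values, with the signal held fixed while the noise is redrawn for each "experiment". The normalized cross term averages to about 0 over realizations, and its spread within any single realization is about 1/sqrt(N) (for an isotropic noise vector its variance is exactly 1/N). The function name is hypothetical.

```python
import numpy as np

def normalized_cross_term(n_voxels, n_trials=1000, seed=0):
    """sum(S*N) / sqrt(sum(S^2) * sum(N^2)) for one fixed signal S and
    n_trials independent noise realizations N, each of n_voxels values."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal(n_voxels)              # fixed "signal" in the shell
    N = rng.standard_normal((n_trials, n_voxels))  # one noise draw per trial
    return (N @ S) / np.sqrt(np.sum(S**2) * np.sum(N**2, axis=1))

for n in (16, 256, 4096):
    r = normalized_cross_term(n)
    # ensemble mean ~ 0; single-experiment spread shrinks like 1/sqrt(n)
    print(n, r.mean(), r.std(), 1 / np.sqrt(n))
```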
>>
>>         When N (the number of independent Fourier voxels in a shell) is
>>         large enough, mean(Signal x Noise) ~ 0.0 is only an
>>         approximation, but a pretty good one, even for a single FSC
>>         experiment. This is why, in my book, derivations that depend on
>>         Frank & Al-Ali are OK, under the strict assumption that N is
>>         large. Numerically, this becomes apparent when Marin's half-bit
>>         criterion is plotted: asymptotically it has the same behavior as
>>         a constant threshold.
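The half-bit curve as it is usually quoted from van Heel & Schatz (2005), written as a function of the effective number of voxels n_eff in a shell. It equals 1 at n_eff = 1 and levels off at 0.2071/1.2071 ≈ 0.172 as n_eff grows, which is the constant-threshold behavior described above. This is a plotting sketch, not a reference implementation.

```python
import numpy as np

def half_bit_threshold(n_eff):
    """1/2-bit threshold vs. effective number of voxels per Fourier shell,
    using the coefficients commonly quoted from van Heel & Schatz (2005)."""
    s = np.sqrt(np.asarray(n_eff, dtype=float))
    return (0.2071 + 1.9102 / s) / (1.2071 + 0.9102 / s)

ASYMPTOTE = 0.2071 / 1.2071   # limit for very large n_eff, about 0.172

for n in (1, 10, 100, 1000, 10000, 100000):
    print(n, round(float(half_bit_threshold(n)), 4))
```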
>>
>>         So, is Marin wrong to worry about this? No, I don't think so.
>>         There are indeed cases where the assumption of large N is
>>         broken, and under those circumstances any fixed threshold
>>         (0.143, 0.5, whatever) is dangerous. This is illustrated in the
>>         figures of van Heel & Schatz (2005). Small boxes, high symmetry,
>>         small objects in large boxes, and a number of other conditions
>>         can make fixed thresholds dangerous.
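To see why small boxes and high symmetry matter, one can count voxels per Fourier shell. The sketch below assumes integer shells one voxel wide and a rough, hypothetical effective count that halves the total for Friedel symmetry (a real map's transform satisfies F(-k) = conj(F(k))) and divides by the point-group order (e.g. 60 for icosahedral). Real software may apply different corrections.

```python
import numpy as np

def voxels_per_shell(box):
    """Number of Fourier voxels in each integer shell 0 .. box//2 - 1."""
    f = np.fft.fftfreq(box) * box
    kx, ky, kz = np.meshgrid(f, f, f, indexing="ij")
    r = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    return np.bincount(r.ravel())[: box // 2]

def effective_n(box, sym_order=1):
    """Rough effective count (assumption): halve for Friedel symmetry of a
    real map, divide by the point-group order for a symmetrized map."""
    return voxels_per_shell(box) / (2.0 * sym_order)

for sym in (1, 60):
    print(sym, effective_n(32, sym)[[1, 2, 4, 8, 15]])
```

With icosahedral symmetry, the innermost shells of a 32-voxel box end up with fewer than one effective independent voxel, far from the large-N regime.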
>>
>>         It would indeed be better to use a non-fixed threshold. So why
>>         am I not using the 1/2-bit criterion in my own work? While
>>         numerically it behaves well at most resolution ranges, I was not
>>         convinced by Marin's derivation in 2005. Philosophically,
>>         though, I think he's right: we should aim for FSC thresholds
>>         that are more robust to the kinds of edge cases mentioned above.
>>         It would be the right thing to do.
>>
>>         Hope this helps,
>>         Alexis
>>
>>
>>         ------------------------------
>>
>>         Message: 2
>>         Date: Fri, 21 Feb 2020 17:19:00 +0000
>>         From: "Ludtke, Steven J." <sludtke at bcm.edu>
>>         To: Alexis Rohou <a.rohou at gmail.com>
>>         Cc: "Pawel A. Penczek" <Pawel.A.Penczek at uth.tmc.edu>, Marin van Heel
>>             <marin.vanheel at googlemail.com>, "CCPEM at JISCMAIL.AC.UK"
>>             <ccpem at jiscmail.ac.uk>, "3dem at ncmir.ucsd.edu"
>>             <3dem at ncmir.ucsd.edu>,
>>             "CCP4BB at JISCMAIL.AC.UK" <CCP4BB at jiscmail.ac.uk>
>>         Subject: Re: [3dem] Which resolution?
>>         Message-ID: <84516CBB-AE8E-49B4-A123-7C6724E93CE2 at bcm.edu>
>>         Content-Type: text/plain; charset="utf-8"
>>
>>         I've been steadfastly refusing to get myself dragged into this
>>     one, but with this very sensible statement (which I am largely
>>     in agreement with), I thought I'd throw in one thought, just to
>>     stir the pot a little more.
>>
>>         This is not a new idea, but I think it is the most sensible
>>     strategy I've heard proposed, and it addresses Marin's concerns in
>>     a more conventional way. What we are talking about here is the
>>     statistical noise present in the FSC curves themselves. Viewed
>>     from the framework of traditional error analysis and propagation
>>     of uncertainties, which pretty much every scientist should have
>>     been familiar with since high school (and which thus would not
>>     confuse non-statisticians), the 'correct' solution to this issue
>>     is not to adjust the threshold, but to present FSC curves with
>>     error bars.
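One possible way to attach error bars to an FSC curve, swapped in here for illustration rather than taken from the message: treat the voxels of each shell as independent samples and use the Fisher z-transform of the correlation coefficient, whose standard error is approximately 1/sqrt(n - 3). Counting Friedel mates once and assuming independence are both assumptions, and they are exactly what breaks down in the edge cases discussed earlier, so these bars are approximate.

```python
import numpy as np

def fsc_with_error_bars(map1, map2, n_sigma=1.0):
    """Per-shell FSC with approximate error bars (shells 1 .. n//2 - 1).

    Assumption: the Fourier voxels in a shell are independent samples, with
    Friedel mates counted once (n_ind = voxels // 2). Error bars come from the
    Fisher z-transform, atanh(FSC) having standard error ~ 1/sqrt(n_ind - 3).
    """
    n = map1.shape[0]
    f = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(f, f, f, indexing="ij")
    shells = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    F1, F2 = np.fft.fftn(map1), np.fft.fftn(map2)
    radii = np.arange(1, n // 2)
    fsc = np.empty(radii.size)
    lower = np.empty(radii.size)
    upper = np.empty(radii.size)
    for i, r in enumerate(radii):
        m = shells == r
        num = np.real(np.sum(F1[m] * np.conj(F2[m])))
        den = np.sqrt(np.sum(np.abs(F1[m]) ** 2) * np.sum(np.abs(F2[m]) ** 2))
        fsc[i] = num / den
        n_ind = max(m.sum() // 2, 4)             # guard against tiny shells
        se_z = 1.0 / np.sqrt(n_ind - 3)
        z = np.arctanh(np.clip(fsc[i], -0.999999, 0.999999))
        lower[i] = np.tanh(z - n_sigma * se_z)
        upper[i] = np.tanh(z + n_sigma * se_z)
    return radii, fsc, lower, upper

# Toy half maps: shared "truth" plus independent noise.
rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32, 32))
half1 = truth + rng.standard_normal(truth.shape)
half2 = truth + rng.standard_normal(truth.shape)
radii, fsc, lower, upper = fsc_with_error_bars(half1, half2)
```

With a pixel size p (Å), shell r corresponds to the spatial frequency r / (32 * p) in 1/Å for this box.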
>>
>>         One can then use a fixed threshold at a level based on
>>     expectation values, and simply produce a resolution value which
>>     also has an associated uncertainty. This is much better than
>>     using a variable threshold and still producing a single number
>>     with no uncertainty estimate!  Not only does this approach
>>     account for the statistical noise in the FSC curve, but it also
>>     should stop people from reporting resolutions as 2.3397 Å, as it
>>     would be silly to say 2.3397 +- 0.2.
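Given an FSC curve and its error-bar curves, a resolution with an uncertainty can be read off by intersecting all three with the same fixed threshold. A minimal sketch, assuming frequencies in 1/Å and using hypothetical function names and synthetic curves; the resulting interval is generally asymmetric.

```python
import numpy as np

def crossing_frequency(freqs, curve, threshold):
    """First frequency at which `curve` drops below `threshold`, by linear
    interpolation between neighbouring shells; None if it never does."""
    below = np.nonzero(np.asarray(curve) < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    i = below[0]
    f0, f1 = freqs[i - 1], freqs[i]
    c0, c1 = curve[i - 1], curve[i]
    return f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)

def resolution_with_uncertainty(freqs, fsc, lower, upper, threshold=0.143):
    """Resolution (Å) at a fixed threshold, plus the range obtained by
    intersecting the lower and upper error-bar curves with that threshold."""
    f_mid = crossing_frequency(freqs, fsc, threshold)
    f_lo = crossing_frequency(freqs, lower, threshold)  # pessimistic curve
    f_hi = crossing_frequency(freqs, upper, threshold)  # optimistic curve
    if f_mid is None or f_lo is None or f_hi is None:
        return None
    return 1.0 / f_mid, 1.0 / f_hi, 1.0 / f_lo   # (estimate, best, worst)

freqs = np.linspace(0.01, 0.5, 50)

def logistic(f50):
    """Synthetic FSC-like curve falling through 0.5 at f50 (1/Å)."""
    return 1.0 / (1.0 + np.exp((freqs - f50) / 0.02))

est, best, worst = resolution_with_uncertainty(
    freqs, logistic(0.30), logistic(0.29), logistic(0.31))
print(f"{est:.2f} Å (range {best:.2f} to {worst:.2f} Å)")
```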
>>
>>         The cross terms are not ignored, but are used in the
>>     production of the error bars. This is a very simple approach,
>>     which is certainly closer to being correct than the fixed
>>     threshold without error-bars approach, and it solves many of the
>>     issues we have with how people report resolution. Of course we
>>     still have people who will insist that 3.2+-0.2 is better than
>>     3.3+-0.2, but there isn't much you can do about them... (other
>>     than beat them over the head with a statistics textbook).
>>
>>         The caveat, of course, is that, like all propagation of
>>     uncertainty, it is a linear approximation, and the
>>     correlation axis isn't linear, so the typical Normal
>>     distributions with linear propagation used to justify propagation
>>     of uncertainty aren't _strictly_ true. However, the approximation
>>     is fine as long as the error bars are reasonably small compared
>>     to the -1 to 1 range of the correlation axis. Each individual
>>     error bar is computed around its expectation value, so the
>>     overall nonlinearity of the correlation isn't a concern.
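The caveat can be made concrete by comparing a linearized error bar, r +- k(1 - r^2)/sqrt(n - 3), with the interval obtained by transforming through atanh and back. The Fisher transform is used here only as an illustration, assuming n independent samples; it is not the computation described in the message. With many samples the two agree closely, while with few samples the linear version can run past 1.

```python
import numpy as np

def fisher_interval(r, n, n_sigma=2.0):
    """Interval for a correlation r from n independent samples via atanh/tanh."""
    se = 1.0 / np.sqrt(n - 3)
    z = np.arctanh(r)
    return np.tanh(z - n_sigma * se), np.tanh(z + n_sigma * se)

def linear_interval(r, n, n_sigma=2.0):
    """First-order (linear) propagation of the same error: dr/dz = 1 - r^2."""
    half_width = n_sigma * (1.0 - r**2) / np.sqrt(n - 3)
    return r - half_width, r + half_width

for n in (10, 10000):
    print(n, fisher_interval(0.9, n), linear_interval(0.9, n))
```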
>>
>>
>>
>>
>>
>>
>>     _______________________________________________
>>     3dem mailing list
>>     3dem at ncmir.ucsd.edu
>>     https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>>
>>
>>
>> --
>> Prof. Jose-Maria Carazo
>> Biocomputing Unit, Head, CNB-CSIC
>> Spanish National Center for Biotechnology
>> Darwin 3, Universidad Autonoma de Madrid
>> 28049 Madrid, Spain
>>
>> Cell: +34639197980

