[3dem] Which resolution?

Carlos Oscar Sorzano coss at cnb.csic.es
Fri Feb 21 10:15:29 PST 2020


Dear all,

I always try to refrain from getting into these discussions, but I can 
no longer resist the temptation. Here are some more ideas that I hope 
bring more light than confusion:

- There must be some functional relationship between the FSC and the 
SNR, but the exact analytical form of this relationship is unknown (I 
suspect that it must be at least monotonic: the worse the SNR, the worse 
the FSC; but even this is difficult to prove). The relationship we 
normally use, FSC=SNR/(1+SNR), was derived in a context that does not 
apply to CryoEM (1D stationary signals in real space; our molecules are 
not stationary), and consequently any reasoning about a threshold based 
on this relationship is incorrect (see our review).

- Still, as long as we all use the same threshold, the reported 
resolutions are comparable to each other. In that regard, I am happy 
that we have set 0.143 (although any other number would have served the 
purpose) as the standard.

- I totally agree with Steve that the full FSC is much more informative 
than its crossing with the threshold, especially because we should be 
much more worried about its behavior when it has high values than when 
it has low values. Before crossing the threshold it should be as high as 
possible, and that is the "true measure" of goodness of the map. Where 
it crosses the 0.143 threshold, the SNR is too low and, by definition, 
that is a very unstable part of the FSC, resulting in relatively 
unstable reports of resolution. We ran some tests on the variability of 
the FSC (refining random splits of the dataset), trying to put in the 
error bars that Steve was asking for, and the FSC turned out to be 
pretty reproducible (rather low variance except in the region where it 
crosses the threshold) as long as the dataset was large enough (which is 
usually the case nowadays).

- @Marin, I always suffer with your reference to sloppy statistics. If 
we take your 2005 paper, where the 1/2 bit criterion was proposed 
(https://www.sciencedirect.com/science/article/pii/S1047847705001292), 
Eqs. 4 to 15 completely ignore the fact that you are dealing with 
Fourier components, which are complex numbers. Consequently, you have 
to deal with random variables that have two components; moreover, the 
real and imaginary parts are not independent and, in turn, they are not 
independent of the nearby Fourier coefficients, so that for computing 
radial averages you would need to account for the correlation among 
coefficients 
(https://www.aimspress.com/fileOther/PDF/biophysics/20150102.pdf). To 
deal with the statistics properly, one needs at least to carry out a 
two-dimensional reasoning, including the complex conjugate 
multiplication, all of which is missing in your derivation, rather than 
treating everything as one-dimensional, real-valued random variables. 
Additionally, embedded in your whole reasoning is the idea that the 
expected value of a ratio is the ratio of the expected values, which is 
only a 0-th order Taylor approximation of the mean of the distribution 
of a ratio between two random variables. Finally, I always find it 
extremely difficult to understand the 1 bit or 1/2 bit criteria, that 
is, what the relationship is between Shannon's channel capacity formula 
(https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem) and our 
FSC. We do not have any channel through which we are "transmitting" our 
volume; although it is true that we have a model y=x+n that is the same 
as in signal transmission, it is not true that the average information 
of a signal is log2(1+SNR). For me, the only relationship is that the 
SNR appears in both formulas, FSC and channel capacity, but that does 
not automatically make them comparable and interchangeable. This is not 
a criticism of your work. I think the FSC is a very useful tool to 
measure some properties of the reconstruction process and the quality of 
the dataset (not everything is measured by the FSC), and it also has its 
drawbacks (for instance, systematic errors are rewarded by the FSC as 
they are reproducible in both halves). Moreover, I think you are an 
extremely intelligent person, whom I consider a good friend, with a very 
good intuition about image processing and who has brought very 
interesting ideas and methodologies into the field. It is only that we 
should not go crazy about the FSC threshold and the reported resolution, 
as the most interesting part of the FSC is not when it is low, but when 
it is high.
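
A minimal numerical sketch of two of the points above (the textbook 
relation FSC=SNR/(1+SNR) and the larger spread of the FSC where it is 
low), under a deliberately simplistic toy model. Everything here is an 
assumption made for illustration: the function names are hypothetical, 
and the Fourier coefficients of a shell are modelled as independent 
circular complex Gaussians following y=x+n, which is exactly the 
idealization questioned above (real coefficients are correlated with 
their neighbours), so the numbers only indicate qualitative behavior.

```python
import numpy as np


def fsc_from_snr(snr):
    """Textbook relation FSC = SNR / (1 + SNR) (its validity is questioned above)."""
    return snr / (1.0 + snr)


def snr_from_fsc(fsc):
    """Inverse of the textbook relation: SNR = FSC / (1 - FSC)."""
    return fsc / (1.0 - fsc)


def complex_gaussian(rng, shape, variance):
    """Circular complex Gaussian samples with the given total variance."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def simulate_shell_fsc(snr, n_coeffs, n_trials, rng):
    """FSC of one Fourier shell for n_trials independent noise realizations.

    Assumed toy model: half-map coefficients y1 = x + n1 and y2 = x + n2,
    with a fixed signal x of variance `snr` and unit-variance noise.
    """
    x = complex_gaussian(rng, (1, n_coeffs), snr)  # same signal in every trial
    y1 = x + complex_gaussian(rng, (n_trials, n_coeffs), 1.0)
    y2 = x + complex_gaussian(rng, (n_trials, n_coeffs), 1.0)
    # Complex-conjugate product summed over the shell; its real part is the numerator.
    numerator = np.real(np.sum(y1 * np.conj(y2), axis=1))
    denominator = np.sqrt(
        np.sum(np.abs(y1) ** 2, axis=1) * np.sum(np.abs(y2) ** 2, axis=1)
    )
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # High FSC, mid FSC, and the SNR that the textbook relation maps to 0.143.
    for snr in (10.0, 1.0, snr_from_fsc(0.143)):
        for n_coeffs in (20, 2000):
            fsc = simulate_shell_fsc(snr, n_coeffs, 500, rng)
            print(f"SNR={snr:.3f} N={n_coeffs:5d} "
                  f"textbook={fsc_from_snr(snr):.3f} "
                  f"mean={fsc.mean():.3f} std={fsc.std():.3f}")
```

Running the script prints, for each SNR and shell size, the textbook 
value next to the mean and standard deviation over the simulated noise 
realizations, so the spread in the high-FSC region can be compared with 
the spread near the 0.143 crossing, and shells with few coefficients 
with shells with many.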

I hope I can keep restraining myself in the future :-)

Cheers, Carlos Oscar

On 2/21/20 6:19 PM, Ludtke, Steven J. wrote:

> I've been steadfastly refusing to get myself dragged in this time, but 
> with this very sensible statement (which I am largely in agreement 
> with), I thought I'd throw in one thought, just to stir the pot a 
> little more.
>
> This is not a new idea, but I think it is the most sensible strategy 
> I've heard proposed, and addresses Marin's concerns in a more 
> conventional way. What we are talking about here is the statistical 
> noise present in the FSC curves themselves. Viewed from the framework 
> of traditional error analysis and propagation of uncertainties, which 
> pretty much every scientist should have been familiar with since high 
> school (and which thus would not be confusing to non-statisticians), 
> the 'correct' solution to this issue is not to adjust the threshold, 
> but to present FSC curves with error bars.
>
> One can then use a fixed threshold at a level based on expectation 
> values, and simply produce a resolution value which also has an 
> associated uncertainty. This is much better than using a variable 
> threshold and still producing a single number with no uncertainty 
> estimate!  Not only does this approach account for the statistical 
> noise in the FSC curve, but it also should stop people from reporting 
> resolutions as 2.3397 Å, as it would be silly to say 2.3397 +- 0.2.
>
> The cross terms are not ignored, but are used in the production of the 
> error bars. This is a very simple approach, which is certainly closer 
> to being correct than the fixed threshold without error-bars approach, 
> and it solves many of the issues we have with the way people report 
> resolution.  Of course we still have people who will insist that 
> 3.2+-0.2 is better than 3.3+-0.2, but there isn't much you can do 
> about them... (other than beat them over the head with a statistics 
> textbook).
>
> The caveat, of course, is that, like all propagation of uncertainty, 
> it is a linear approximation, and the correlation axis isn't 
> linear, so the typical Normal distributions with linear propagation 
> used to justify propagation of uncertainty aren't _strictly_ true. 
> However, the approximation is fine as long as the error bars are 
> reasonably small compared to the -1 to 1 range of the correlation 
> axis. Each individual error bar is computed around its expectation 
> value, so the overall nonlinearity of the correlation isn't a concern.
>
>
>
> --------------------------------------------------------------------------------------
> Steven Ludtke, Ph.D. <sludtke at bcm.edu>
> Baylor College of Medicine
> Charles C. Bell Jr., Professor of Structural Biology
> Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
> Academic Director, CryoEM Core (cryoem.bcm.edu)
> Co-Director CIBR Center (www.bcm.edu/research/cibr)
>
>
>
>> On Feb 21, 2020, at 10:34 AM, Alexis Rohou <a.rohou at gmail.com> 
>> wrote:
>>
>> Hi all,
>>
>> For those bewildered by Marin's insistence that everyone's been 
>> messing up their stats since the bronze age, I'd like to offer my 
>> understanding of the situation. More details are in this thread from 
>> a few years ago on the exact same topic:
>> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html 
>> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html 
>>
>> Notwithstanding notational problems (e.g. strict equations as opposed 
>> to approximation symbols, or omission of symbols to denote 
>> estimation), I believe Frank & Al-Ali and "descendant" papers (e.g. 
>> appendix of Rosenthal & Henderson 2003) are fine. The cross terms 
>> that Marin is agitated about do indeed have an expectation 
>> value of 0.0 (in the ensemble; if the experiment were performed an 
>> infinite number of times with different realizations of noise). I 
>> don't believe Pawel or Jose Maria or any of the other authors really 
>> believe that signal and noise are exactly orthogonal in a single 
>> experiment.
>>
>> When N (the number of independent Fourier voxels in a shell) is large 
>> enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a 
>> pretty good one, even for a single FSC experiment. This is why, in my 
>> book, derivations that depend on Frank & Al-Ali are OK, under the 
>> strict assumption that N is large. Numerically, this becomes apparent 
>> when Marin's half-bit criterion is plotted - asymptotically it has 
>> the same behavior as a constant threshold.
>>
>> So, is Marin wrong to worry about this? No, I don't think so. There 
>> are indeed cases where the assumption of large N is broken. And under 
>> those circumstances, any fixed threshold (0.143, 0.5, whatever) is 
>> dangerous. This is illustrated in figures of van Heel & Schatz 
>> (2005). Small boxes, high-symmetry, small objects in large boxes, and 
>> a number of other conditions can make fixed thresholds dangerous.
>>
>> It would indeed be better to use a non-fixed threshold. So why am I 
>> not using the 1/2-bit criterion in my own work? While numerically it 
>> behaves well at most resolution ranges, I was not convinced by 
>> Marin's derivation in 2005. Philosophically though, I think he's 
>> right - we should aim for FSC thresholds that are more robust to the 
>> kinds of edge cases mentioned above. It would be the right thing to do.
>>
>> Hope this helps,
>> Alexis
>>
>>
>>
>> On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A 
>> <Pawel.A.Penczek at uth.tmc.edu> wrote:
>>
>>     Marin,
>>
>>     The statistics in the 2010 review are fine. You may disagree with
>>     the assumptions, but I can assure you the “statistics” (as you
>>     call them) are fine. A careful reading of the paper would reveal
>>     this much to you.
>>
>>     Regards,
>>     Pawel
>>
>>>     On Feb 16, 2020, at 10:38 AM, Marin van Heel
>>>     <marin.vanheel at googlemail.com> wrote:
>>>
>>>     Dear Pawel and All others ....
>>>
>>>     This 2010 review is - unfortunately - largely based on the
>>>     flawed statistics I mentioned before, namely on the a priori
>>>     assumption that the inner product of a signal vector and a noise
>>>     vector is ZERO (an orthogonality assumption).  We have refuted
>>>     the (Frank & Al-Ali 1975) paper on a number of occasions (for
>>>     example in 2005, and most recently in our BioRxiv paper) but you
>>>     still take that as the correct relation between SNR and FRC (and
>>>     you never cite the criticism...).
>>>
>>>     Sorry
>>>     Marin
>>>
>>>     On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A
>>>     <Pawel.A.Penczek at uth.tmc.edu> wrote:
>>>
>>>         Dear Teige,
>>>
>>>         I am wondering whether you are familiar with
>>>
>>>             Penczek PA. Resolution measures in molecular electron
>>>             microscopy. Methods Enzymol. 2010;482:73-100.
>>>             doi: 10.1016/S0076-6879(10)82003-8.
>>>
>>>
>>>         There you will find answers to all the questions you asked
>>>         and much more.
>>>
>>>         Regards,
>>>         Pawel Penczek
>>>
>>>         _______________________________________________
>>>         3dem mailing list
>>>         3dem at ncmir.ucsd.edu <mailto:3dem at ncmir.ucsd.edu>
>>>         https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem