[3dem] [ccpem] on FSC curve (A can of worms...)

Ludtke, Steven J sludtke at bcm.edu
Sun Aug 30 10:14:41 PDT 2015


Ok, I've tried to avoid this discussion, as it seems like a somewhat pointless rehashing of old debates. However, based on direct emails I've gotten from people new to the field, it appears to be causing a lot of confusion and uncertainty in this group, since they lack the historical context to understand the point of the debate.  Let me add a couple of minor points to the discussion:

1) Compensating for statistical uncertainty by adjusting the threshold is confusing to people raised in experimental science. In essence, it conceals the fact that the FSC values have considerable uncertainty due to counting statistics and other effects. The final resolution plots wind up being the intersection of two curves with no uncertainty presented at all, and we find people reading off specific intersection points between these two curves with ridiculous levels of precision.

A much more sensible way to present this result would be to produce FSC curves with error bars, which would do a much better job of expressing the considerable uncertainty in the resulting intersection!  The difficulty is how best to produce such error bars.
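
To make that concrete, here is a minimal sketch (Python/NumPy, not taken from any particular package) of one way to attach rough error bars to an FSC curve. It assumes the per-shell FSC behaves like a correlation coefficient over n_eff roughly independent Fourier samples, taken crudely as half the shell's voxel count (Hermitian symmetry) divided by any imposed point-group redundancy, and uses the Fisher z-transform large-sample approximation sigma ~ (1 - FSC^2)/sqrt(n_eff - 1). The function name, the symmetry_order parameter and the n_eff counting are illustrative assumptions; a bootstrap over particle subsets would be a more rigorous alternative.

import numpy as np

def fsc_with_errors(half_map1, half_map2, symmetry_order=1):
    """Per-shell FSC between two half-maps, with rough 1-sigma error bars."""
    f1 = np.fft.fftshift(np.fft.fftn(half_map1))
    f2 = np.fft.fftshift(np.fft.fftn(half_map2))
    n = half_map1.shape[0]                    # assumes cubic maps of equal size
    grid = np.indices(half_map1.shape) - n // 2
    radius = np.sqrt((grid ** 2).sum(axis=0)).round().astype(int)

    shells = np.arange(1, n // 2)
    fsc, sigma = [], []
    for r in shells:
        mask = radius == r
        a, b = f1[mask], f2[mask]
        c = np.real(np.sum(a * np.conj(b))) / np.sqrt(
            np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
        # Crude count of independent samples in the shell: half the voxels
        # (Hermitian symmetry), divided by the point-group redundancy.
        n_eff = max(mask.sum() / (2.0 * symmetry_order), 2.0)
        fsc.append(c)
        sigma.append((1.0 - c ** 2) / np.sqrt(n_eff - 1.0))
    return shells, np.array(fsc), np.array(sigma)

One could then plot FSC +/- 2*sigma and read the threshold crossing as a range of spatial frequencies rather than a single number.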

Once you have an FSC with error bars, you still have the question of a threshold value/curve. I would argue that the error bars subsume the uncertainty, and using Alexis's arguments about expectation values, you can then use a fixed-value threshold.  I think Alexis's arguments are spot-on in this case (the FSC's relationship to SNR is an expectation value), and Marin's orthogonality argument is fundamentally incorrect. The cross-terms in the presence of noise do have an expectation value of zero, of course!  The cross-terms contribute to the uncertainty in the estimator, not to its asymptotic value.
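
For reference, the expectation-value relation at issue (the same one that appears later in this thread in its CCC/(1-CCC) form; the exact factor of two depends on whether the SNR refers to one half-map or to the averaged full data set) can be sketched as

  \mathrm{E}\!\left[\mathrm{FSC}(k)\right] \;\approx\; \frac{\mathrm{SNR}(k)}{\mathrm{SNR}(k) + 1}
  \qquad\Longleftrightarrow\qquad
  \mathrm{SNR}(k) \;\approx\; \frac{\mathrm{FSC}(k)}{1 - \mathrm{FSC}(k)}

The point is precisely that this holds only in expectation: any single measured FSC scatters around it, which is what error bars on the curve would make visible.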

2) Closely related to point #1 is the issue that our resolution estimates simply are not that precise. They carry considerable uncertainty (which an FSC with error bars would help to express). They also ignore differences in the FSC curve at resolutions lower than the cutoff resolution, which are also significant from the perspective of map interpretation. A map whose FSC stays close to 1 and then falls smoothly and rapidly to zero near some target resolution is not equivalent in quality to one whose FSC begins falling gradually at much lower resolution and undergoes considerable gymnastics before finally dropping below the 'threshold' value.
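
As a purely synthetic illustration of that last point (the curves, the 0.143 fixed threshold and the "mean FSC below the crossing" readout are stand-ins for the argument, not data from any real reconstruction or a proposed criterion), here are two FSC curves that cross the same threshold at the same spatial frequency yet describe very different maps below it:

import numpy as np

def logistic_fsc(k, k_cross, width, threshold=0.143):
    # Logistic curve constructed so that it passes through `threshold` at `k_cross`.
    shift = k_cross - width * np.log(1.0 / threshold - 1.0)
    return 1.0 / (1.0 + np.exp((k - shift) / width))

k = np.linspace(0.0, 0.5, 200)                    # spatial frequency (arbitrary units)
fsc_sharp   = logistic_fsc(k, 0.35, width=0.01)   # stays near 1, then falls rapidly
fsc_gradual = logistic_fsc(k, 0.35, width=0.08)   # sags from low resolution onward

for name, curve in (("sharp", fsc_sharp), ("gradual", fsc_gradual)):
    crossing = k[np.argmax(curve < 0.143)]
    print(f"{name:8s} crosses 0.143 at k = {crossing:.3f}; "
          f"mean FSC below the crossing = {curve[k < crossing].mean():.2f}")

Both curves report the same nominal resolution; only the shape of the curve, which the single number discards, distinguishes them.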

----
Our field takes these resolution numbers MUCH too seriously and has unwisely turned them into the sole measure of map quality. I do not believe it is possible to make the FSC into a single catch-all measure.

Following the 'error-bar' approach (if we can agree on one) would properly associate an uncertainty with each measured resolution value, making the limits of this estimator clear in a way that a reviewer from any field could easily grasp. Like the X-ray community, we need to adopt additional criteria rather than continue these pointless debates trying to make the FSC more statistically rigorous than it can possibly be.



----------------------------------------------------------------------------
Steven Ludtke, Ph.D.
Professor, Dept of Biochemistry and Mol. Biol.         (www.bcm.edu/biochem)
Co-Director National Center For Macromolecular Imaging        (ncmi.bcm.edu)
Co-Director CIBR Center                          (www.bcm.edu/research/cibr)
Baylor College of Medicine                             
sludtke at bcm.edu





> On Aug 30, 2015, at 11:44 AM, Marin van Heel <marin.vanheel at googlemail.com> wrote:
> 
> 
> Hi Pavel,
> 
> You have lost me completely now.
> 
> This thread was very much devoted to Bershad & Rockmore 1974... I don't know what you are talking about.  What do you mean by the "true source of the FSC"? Have you read the whole thread from the beginning? Are you confusing the FSC formula with the Frank & Al-Ali 1975 formula?
> 
> Sorry, you really lost me here...
> 
> Cheers
> 
> Marin
> 
> On 30/08/2015 13:14, Penczek, Pawel A wrote:
>> Hi,
>> 
>> these are indeed interesting days, to see a scientific argument based on a Wikipedia note!  I wonder who could have written it,
>> as it conveniently omits the true source of the FSC, the one that contains the correct derivation:
>> 
>> Bershad, N. J., and Rockmore, A. J. (1974). On estimating signal-to-noise ratio using the
>> sample correlation coefficient. IEEE Trans. Inf. Theory IT20, 112–113.
>> 
>> The note about N nicely illustrates my point about misleading arguments.  In the real world, the statistical significance of a correlation
>> coefficient depends on both the number of samples and its value, so the reasoning is plain wrong.
>> 
>> Regards,
>> -
>> Pawel Penczek
>> pawel.a.penczek at uth.tmc.edu
>> 
>> 
>> 
>>> On Aug 30, 2015, at 10:58 AM, Marin van Heel <marin.vanheel at googlemail.com> wrote:
>>> 
>>> 
>>> Hi Pawel,
>>> 
>>> I was waiting for your reaction, since you have mentioned more than once over a beer that our 2005 paper was wrong... I am glad my shaky understanding of statistics was at least good enough to come up with the FRC/FSC concept in the first place (https://en.wikipedia.org/wiki/Fourier_shell_correlation).  :)
>>>  Now to your only concrete point so far: what size of "N" is big enough?
>>> 
>>> This thread has been going on for a while and you have apparently missed what was discussed earlier, so let me help you out by repeating it below. :)
>>> 
>>> Cheers
>>> Marin
>>> 
>>> ================================================================================
>>> 
>>> The problem with this “N” is that, in the context of the FSC, “N” refers to the number of complex numbers in a Fourier shell, which can be pretty low close to the origin: say, at 5 pixels from the Fourier-space origin, N ~ 125 (4*Pi*R**2; with R=5 -> N ~= 250, divided by 2 because of the Hermitian symmetry). When we are dealing with, say, an icosahedral structure, there is a 60-fold redundancy within that shell, thus N ~= 125/60 ~= 2; not a very large N at all!  At R=10 … N~8; at R=20 … N~32, etc.  (The expected random-correlation sigma is 1/√N: 1/√2 = 1/1.41; 1/√8 = 1/2.82 …) Of course, for a C1 structure, say a ribosome, these N values are much higher and hence the relevance thresholds much lower (VH&S05). Everyone who still claims a fixed-value threshold for any structure (icosahedral, C1, or any other point-group symmetry) has a real problem counting!  I suggest that referees out there no longer accept such ignorance!
>>> 
>>> ===============================================================
>>> 
>>> On 30/08/2015 11:16, Penczek, Pawel A wrote:
>>>> Marin,
>>>> 
>>>> your understanding of statistics is shaky.  You confuse the expected value with the outcome of a single experiment.  This confusion first surfaced in your paper on the half-bit criterion.
>>>> 
>>>> Most of your statements below can be quantified and shown to be wrong.
>>>> Others stem from hidden assumptions, which are also incorrect.
>>>> 
>>>> For example, what does it mean that N is not large in all cases?  What cases?  How large would satisfy you?
>>>> 
>>>> I am not sure why you picked up this subject or what your point is, but you are not helping out.
>>>> 
>>>> Regards,
>>>> Pawel
>>>> 
>>>> 
>>>>> On Aug 30, 2015, at 9:05 AM, Marin van Heel <marin.vanheel at googlemail.com>
>>>>>  wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Alexis,
>>>>> 
>>>>> My final pennies:
>>>>> 
>>>>> (1) We FULLY agree that the B&R74 formula (=F&A75 formula) is wrong when used in connection with a single experiment!
>>>>> 
>>>>> (2) The field of view of the image does not change upon binning, hence the SNR obtained from the same field of view increases upon binning. There is no paradox: the SNR just cannot be interpreted as an information metric on the object you are observing, that is all. We choose to FULLY disagree here.
>>>>> 
>>>>> (3) "N is not so large" in ALL cases of interest. The issue was settled more than a decade ago (but largely ignored since). Defining an agreed standard resolution indicator is not a "marginal" issue, it is like defining the "meter" as the unit of length. Of course, even a well defined resolution value can only be a single-valued indicator of what we will be able to interpret in terms of the 3D map details. But still, we need no further "elastic-band" criteria in cryo-EM. For example, when calculating a local resolution map and continuously reducing the comparison area, with the same FSC fixed-threshold value, will necessarily lead to absurd results. It leaves too much room for deliberately "polishing" the quality metrics.
>>>>> 
>>>>> (4) Again, this is a one-off experiment that does not tolerate negative SNR values. It is WRONG to interpret this in terms of an expectation value because that does not reflect the experimental reality.
>>>>> 
>>>>> (5) "Of course, signal and noise are not orthogonal, and the cross-terms are not zero." We FULLY agree that this is what really matters!
>>>>> "It's just that the expectation value of these cross terms is zero."  Well yes, that is true but - again - this expectation value is irrelevant in a one-off experiment! You simply cannot apply that to a single experiment. For example, in VH&S(2005) hundreds of such individual FSC plots are plotted on top of each other to illustrate their expected "one-off" behaviour. One would get close to the expectation value of zero if we would sum hundreds or thousands such FSC plots. But that would not be relevant to any individual FSC or SSNR experiment contained in the set.
>>>>> 
>>>>> (6) Alexis, note that in (B&R74) even their formula 15 estimator is only valid "for large N".
>>>>> 
>>>>> (7) What is relevant is that the cross term between signal and noise cannot be neglected. At the point where we state that we have collected enough information to achieve some predetermined resolution threshold, the variance of the signal is typically of the same order as that of the noise... Thus the cross terms between the signal and the noise will be of the same order of magnitude as the equally "uncorrelated" noise-to-noise cross terms, which "everybody" does include in their calculations. For consistency, "everybody" should also use the zero expectation value for the noise-to-noise cross terms. In that case, both sides of the formula go to infinity. Ha Ha!
>>>>> 
>>>>> Let us look at all the extremes of the B&R formula (as any physicist is taught to do at university; I have already hammered on the SNR positivity issue):
>>>>> 
>>>>> A) Signal --> 0 (Noise =/ 0) thus: SNR --> 0 and CCC --> Noise-to-Noise cross term; ergo B&R gives WRONG results in any single experiment with a limited number of sampling points N.
>>>>> It can only be made correct if the number of sampling points goes to infinity (which is not our case) or if the experiment is repeated an infinite number of times (which is not our case). In both cases the B&R formula would become correct (but pointless).
>>>>> 
>>>>> B) Noise --> 0 (Signal =/ 0) thus: SNR --> Infinity  and CCC --> 1  thus CCC/(1-CCC) --> Infinity (here one could define B&R to be correct; but irrelevant)
>>>>> 
>>>>> C) In all other cases the Signal-to-Noise cross terms may not be ignored and the B&R74 formula - which excludes these cross-terms - is wrong.
>>>>> 
>>>>> 
>>>>> To summarise: the B&R74 formula (= the F&A75 formula) is wrong, or at best not relevant. In deriving the formula, the all-important cross terms between signal and noise, which are crucial for our one-off FSC & SSNR experiments, have been ignored. A fixed-value FSC threshold criterion is WRONG and leaves room for deliberate manipulation of the resulting metrics.
>>>>> 
>>>>> May the pennies drop!
>>>>> 
>>>>> Marin
>>>>> 
>>>>> Alexis, thank you for insisting on the nitty-gritty! You forced me to be explicit.
>>>>> Maybe that was just what was needed to clean up this mess in cryo-EM.
>>>>> The bottom line is "Marin van Heel" is right and "everybody else" is wrong! Or did I miss something ...?
>>>>> 
>>>>> 
>>>>> https://en.wikipedia.org/wiki/Fourier_shell_correlation
>>>>>  =============================================================
>>>>> 
>>>>> 
>>>>> 
>>>> 
> 
> _______________________________________________
> 3dem mailing list
> 3dem at ncmir.ucsd.edu
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem


