[3dem] [ccpem] on FSC curve (A can of worms...)

Marin van Heel marin.vanheel at googlemail.com
Fri Aug 21 15:55:57 PDT 2015


Hi Alexis

What now? My own (ex-)students questioning me? Well, I guess I need to 
take that leniently and interpret it as a success: having educated a 
new generation of independent critical scientists… HeHe. So, no hard 
feelings, but you do force me to go into much more detail here than I 
had anticipated, so let’s start:

1)    You correctly discuss and interpret the SNR/CCC issue in the 
context of the original work by Bershad & Rockmore (1974), from which 
Frank & Al-Ali (1975) took the formulas. Not all of the detailed 
considerations of B&R(74) survived into the F&A(75) manuscript. I mention 
this explicitly because the formula “SNR = (CCC/(1-CCC))” is used in 
cryo-EM today as if it were an exact relation between the CCC and the 
SNR! All subsequent papers from the Frank group cite only the F&A(75) 
paper; they typically replace the “~” by a “=” and apply the formula to 
everything, effectively without any restriction (as does almost 
everybody else in the cryo-EM field). It also doesn't help that the only 
citation of the vH&S (2005) paper (in which we criticize the F&A(75) 
paper) by the Frank group is in a review on resolution by Liao & Frank 
(2010). There our 2005 paper is cited falsely (quoting the opposite of 
what we actually say) and on a subject unrelated to our criticism!

2)    One more thing, Alexis, before I go into the issues you raise. The 
F&A(75) paper is based on the assumption (as are very many papers in 
image processing) that the real-space SNR of an image is a measure of 
its “significant information” content. That assumption is simply wrong. 
A straightforward counter-example: take two 4Kx4K standard cryo-EM 
images of the same vitreous ice sample and calculate/guesstimate their 
SNR-4096. Then bin the two images to 2Kx2K each by averaging four 
pixels into one and calculate SNR-2048; do that again to find 
SNR-1024, SNR-512, etc., until you reach SNR-1. Each of these binned 
images obviously looks at the same overall area of the sample, but you 
will see that: …  SNR-128 > SNR-256 > SNR-512 > SNR-1024 > SNR-2048 > 
SNR-4096. Does that mean the information content of the binned images 
is better than that of the un-binned originals?? NO! Because the 
high-frequency (Poisson) noise tends to be damped by the binning 
operation whereas the more low-frequency sample signal is largely 
preserved, the SNR of the binned images is better than that of the 
originals. If the image’s real-space SNR were a real information-content 
metric, we would all first bin down our precious 4Kx4K direct-electron 
cryo-EM images to 512x512 or even 256x256 before starting our real data 
processing. Ergo: the real-space SNR is not an information-content 
metric, and the a priori assumption upon which F&A(75) is based is thus 
incorrect. F&A(75) also argue that N, the number of pixels, is ~10,000 
(in 1975; more like N ~ 16,000,000 now) and that the approximations are 
thus valid. We will see below that this is an entirely inappropriate 
argument in the context of the FSC.
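A minimal simulation of the binning argument above, offered as a sketch under stated assumptions: the “sample” signal is a hypothetical smooth 2D cosine pattern, the noise is Gaussian white noise standing in for Poisson shot noise, and signal and noise are kept as separate grids (only possible in a simulation) so that the real-space SNR can be computed directly as var(signal)/var(noise). Because 2x2 averaging is linear, binning signal and noise separately is equivalent to binning the noisy image. All function names are illustrative only.

```python
import math
import random


def make_image(size, noise_sigma, seed=0):
    """Return a hypothetical (signal, noise) pair as separate size x size grids."""
    rng = random.Random(seed)
    # Smooth, low-frequency "sample" signal: one period of a 2D cosine pattern.
    signal = [[math.cos(2 * math.pi * x / size) * math.cos(2 * math.pi * y / size)
               for x in range(size)] for y in range(size)]
    # White, pixel-independent Gaussian noise (a stand-in for Poisson shot noise).
    noise = [[rng.gauss(0.0, noise_sigma) for _ in range(size)] for _ in range(size)]
    return signal, noise


def bin2x2(img):
    """Average each 2x2 block of a square grid into one pixel (halves each side)."""
    half = len(img) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(half)] for i in range(half)]


def variance(img):
    """Population variance of all pixel values in a 2D grid."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def snr_by_binning_level(size=256, noise_sigma=2.0, seed=0):
    """Real-space SNR = var(signal) / var(noise) after 0, 1, 2, ... binning steps."""
    signal, noise = make_image(size, noise_sigma, seed)
    results = []
    while len(signal) >= 8:
        results.append((len(signal), variance(signal) / variance(noise)))
        # Binning is linear: binning signal and noise separately equals
        # binning the noisy image and splitting it afterwards.
        signal, noise = bin2x2(signal), bin2x2(noise)
    return results


if __name__ == "__main__":
    for side, snr in snr_by_binning_level():
        print(f"{side:4d} x {side:<4d}  SNR = {snr:8.3f}")
```

Under these assumptions the real-space SNR should grow by roughly a factor of four per binning step, even though the binning has only thrown away high-spatial-frequency content; nothing has been gained.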

3)    In the context of the FSC, “N” rather refers to the number of 
complex numbers in a Fourier shell, which can be pretty low close to 
the origin. Say at 5 pixels from the Fourier-space origin: a shell one 
pixel thick contains ~4πR² voxels, i.e. ~314 for R=5; divide that by 2 
because of the Hermitian symmetry, giving N ~ 157. When we are then also 
dealing with an icosahedral structure, there is a 60-fold redundancy 
within that shell, thus N ~ 157/60 -> N ~ 2.6; not a very large N at 
all! At R=10 … N ~ 10; at R=20 … N ~ 42, etc. (the expected random 
correlation sigma is 1/√N: 1/√2.6 ≈ 0.62; 1/√10 ≈ 0.32 …). Of course, 
for a C1 structure these N values are 60 times higher, and hence the 
relevance thresholds much lower (VH&S05). Everyone who claims a 
fixed-value threshold for any structure (icosahedral, C1, or with any 
other point-group symmetry) has a real problem counting! I hope all 
referees out there take good note and think twice before accepting 
flawed fixed-valued thresholds!
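The shell-counting arithmetic above can be checked with a short sketch that counts Fourier-grid voxels in a one-voxel-thick shell, halves the count for Hermitian symmetry, and divides by the point-group order. The helper names and the shell definition (radius - 0.5 <= |k| < radius + 0.5) are assumptions for illustration; the result is only the rough “N” behind the 1/√N estimate, not a full threshold criterion.

```python
import math


def shell_voxel_count(radius):
    """Count integer grid points k with radius - 0.5 <= |k| < radius + 0.5."""
    lo2, hi2 = (radius - 0.5) ** 2, (radius + 0.5) ** 2
    count = 0
    r = radius + 1  # every point in the shell has |component| <= radius
    for kx in range(-r, r + 1):
        for ky in range(-r, r + 1):
            for kz in range(-r, r + 1):
                if lo2 <= kx * kx + ky * ky + kz * kz < hi2:
                    count += 1
    return count


def independent_samples(radius, symmetry_order=1):
    """Rough number of independent complex values in a shell (hypothetical helper).

    Halved for the Hermitian symmetry of a real-valued map's Fourier transform,
    then divided by the point-group order (60 for icosahedral, 1 for C1).
    """
    return shell_voxel_count(radius) / 2 / symmetry_order


if __name__ == "__main__":
    for radius in (5, 10, 20):
        n_c1 = independent_samples(radius, 1)
        n_ico = independent_samples(radius, 60)
        print(f"R={radius:2d}  N_C1={n_c1:7.1f} sigma={1 / math.sqrt(n_c1):.3f}  "
              f"N_ico={n_ico:5.1f} sigma={1 / math.sqrt(n_ico):.3f}")
```

Counting on an actual grid rather than using 4πR² directly shows that the lattice count scatters around the analytic estimate, but the order of magnitude (a handful of independent values per shell near the origin for an icosahedral particle) is the same.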

4)    Finally, back to your points, Alexis, and to the original B&R(74) 
work. You mention B&R(74) formula #6. I had to look up the paper again… 
Actually, the formula appears twice there, with slightly different 
definitions. The first is formula #6, which is the one I cited in my 
first email; the second (formula #7) is the real “estimator” you are 
referring to, I believe. I agree with you that using CCC = -1 is a bit 
of an extreme example and maybe not in sync with the gist of B&R(74). 
However, it is fully in sync with the interpretation given to the 
formula since F&A(75). You do not discuss my main point here, namely 
what happens when the CCC fluctuates around the zero mark. The estimator 
(formula #7) will then yield negative values ~50% of the time, and that 
can only be corrected by repeating the experiment an infinite number of 
times, which would then lead to an exact SNR = 0. One can only hope your 
cryo-EM sample will survive the necessary infinite number of 
experiments/exposures! Sorry Alexis, your argument that negative SNRs 
should be accepted since they will eventually average out to a “0” 
real SNR value is very far-fetched (even though mathematically 
justifiable...). Since FSC experiments are typically one-off 
experiments, I’d rather keep both feet on the ground and not follow you 
along this path. We are dealing here with low “N” values in one-off FSC 
experiments.
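The ~50% figure is easy to check numerically. The sketch below assumes the F&A(75) form SNR = CCC/(1-CCC) (not B&R's formula #7, whose exact form is not reproduced here), uses an uncentred normalised inner product as a real-valued stand-in for the FSC within one shell, and draws two independent pure-noise vectors of a hypothetical low length (N = 3) for each simulated one-off experiment, so the true SNR is exactly zero.

```python
import math
import random


def normalised_correlation(x, y):
    """<x, y> / (|x| |y|): a real-valued stand-in for the FSC within one shell."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def snr_from_ccc(ccc):
    """The F&A(75)-style relation SNR = CCC / (1 - CCC)."""
    return ccc / (1.0 - ccc)


def fraction_negative_snr(n_samples=3, trials=4000, seed=0):
    """Fraction of simulated one-off 'experiments' on pure noise giving SNR < 0."""
    rng = random.Random(seed)
    negative = 0
    for _ in range(trials):
        # Two independent noise-only measurements: the true SNR is zero.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if snr_from_ccc(normalised_correlation(x, y)) < 0:
            negative += 1
    return negative / trials


if __name__ == "__main__":
    print("CCC = -1  ->  SNR =", snr_from_ccc(-1.0))
    print("fraction of negative SNR estimates (N = 3):", fraction_negative_snr())
```

Because the sign of the inner product of two independent zero-mean Gaussian vectors is symmetric, the fraction of negative “SNR” values should come out close to one half, regardless of N.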

5)    Nevertheless, this is not where the real problem is! The real 
problem in terms of the B&R(74) paper surfaces earlier, namely at the 
level of formulas #4 and #5A which are the ones discussed in B&R(74) 
that are closest to the FRC/FSC formulas (VH&S05). These formulas still 
contain the CROSS TERMS between the (constant) signal and the random 
noise (xi and yi each contain both the signal and the noise). In 
formulas #6 (and possibly in #7) these cross terms have vanished.
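For readers who want to see where the cross terms enter, here is a generic sketch in real-valued vector notation (not B&R's own notation or formula numbering), assuming two measurements x = s + n₁ and y = s + n₂ of the same signal s. Expanding the inner products exposes the cross terms; only by setting all of them to zero and assuming ‖n₁‖ = ‖n₂‖ = ‖n‖ does one arrive at the familiar SNR ≈ CCC/(1-CCC).

```latex
\begin{aligned}
\langle x, y\rangle &= \langle s, s\rangle + \langle s, n_2\rangle + \langle n_1, s\rangle + \langle n_1, n_2\rangle \\
\|x\|^2 &= \langle s, s\rangle + 2\langle s, n_1\rangle + \langle n_1, n_1\rangle \\
\|y\|^2 &= \langle s, s\rangle + 2\langle s, n_2\rangle + \langle n_2, n_2\rangle \\[4pt]
\text{cross terms} \to 0,\ \|n_1\| = \|n_2\| = \|n\|:\quad
\mathrm{CCC} &= \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \approx \frac{\|s\|^2}{\|s\|^2 + \|n\|^2} \\
\Rightarrow\quad \mathrm{SNR} = \frac{\|s\|^2}{\|n\|^2} &\approx \frac{\mathrm{CCC}}{1 - \mathrm{CCC}}
\end{aligned}
```

Each dropped cross term, once normalised, is of order 1/√N, so the approximation is weakest exactly where N is small.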

6)    The basic problem in all these derivations is the inner product 
between signal and noise: s(t).n(t) (in B&R speak). We discussed this 
extensively in VH&S05. People first state that the signal and the noise 
are UNCORRELATED and that thus s(t).n(t) = 0! (See the many references 
discussed in VH&S05, including the Rosenthal & Henderson 2003 “0.143” 
FSC paper.) A zero inner product actually means that these two vectors 
are assumed to be ORTHOGONAL, not UNCORRELATED. By the same token, two 
realizations of the UNCORRELATED noise vectors should then also be 
ORTHOGONAL, and their inner product n(t).n’(t) should thus also be 
identically zero! Here, suddenly, everybody correctly agrees that the 
normalized inner product n(t).n’(t) is of order 1/√N. It doesn’t help 
that the concepts of “correlated”, “uncorrelated” and “independent” have 
various and often conflicting definitions in the statistical literature. 
So let us rather stick to the clean, unambiguous definitions of 
orthogonality and non-orthogonality (“Two vectors are orthogonal if and 
only if their inner product is zero.”). Then let us henceforth ask 
ourselves two clean questions: “Is our signal orthogonal to each of our 
noise vectors?” and “Is one realization of the noise vector orthogonal 
to any other noise-vector realization?”
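The non-zero inner products can be illustrated with a short sketch: a fixed, hypothetical “signal” vector is compared against many independent Gaussian noise realisations, and the normalised inner product is recorded each time. Under these assumptions (white Gaussian noise, real-valued vectors) the spread of the values should come out close to 1/√N, and none of them is exactly zero. Replacing the signal by a second noise realisation gives the same behaviour.

```python
import math
import random


def normalised_inner_product(a, b):
    """<a, b> / (|a| |b|); zero if and only if a and b are orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def inner_product_spread(n, trials=1000, seed=1):
    """Std dev and smallest |value| of <s, n_k>/(|s||n_k|) over noise realisations n_k."""
    rng = random.Random(seed)
    # A fixed, hypothetical "signal" vector of length n.
    signal = [math.sin(2 * math.pi * i / n) + 0.5 for i in range(n)]
    values = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(normalised_inner_product(signal, noise))
    mean = sum(values) / trials
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (trials - 1))
    return std, min(abs(v) for v in values)


if __name__ == "__main__":
    for n in (100, 400):
        std, smallest = inner_product_spread(n)
        print(f"N = {n:4d}: std = {std:.4f} (1/sqrt(N) = {1 / math.sqrt(n):.4f}),"
              f" smallest |value| = {smallest:.2e}")
```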

7)    If the answer to either of these questions by any cryo-EM 
scientist is “YES”, I will not accept that person as a friend on 
Facebook! (That was a joke.)

8)    Sorry Smith Liu, this may have been far more than you bargained 
for, but the take-home lesson is: fixed-valued FSC thresholds are 
mathematically wrong and must be avoided for the sake of science. They 
confuse newcomers to the field, and the continued use of incorrect 
statistics will continue to damage the reputation of the cryo-EM field.

Hope this helps,

Marin

===========================================================

On 12/08/2015 15:17, Alexis Rohou wrote:
> Hi Marin,
>
> So many tasty worms in there. As you & others already know, I agree 
> with you on the dangers of fixed-threshold criteria.
>
> However, on the topic of the “SNR = (CCC/(1-CCC))” formula, I am not 
> convinced by your argument involving CC=-1.
>
> The reason is that this formula is really an /estimator/ for the true, 
> unknown, SNR. This is explicitly stated by Bershad & Rockmore (1974), 
> whose work Frank & Al-Ali (1975) build on, as well as by Frank & 
> Al-Ali themselves. See in B&R (1974) equation 6, where the 
> left-hand side is an estimate for SNR (alpha circumflex in their 
> notation) based on the right-hand side, which involves the sample 
> cross-correlation (r in their notation). Or, indeed, see the title: 
> "On estimating signal-to-noise ratio using the sample correlation 
> coefficient".
>
> To put it bluntly, estimators should be expected to "fail" or "get it 
> wrong" sometimes (i.e. if used after a single, one-off experiment). 
> Thankfully, B&R derived estimates for the variance (error) of their 
> estimator.  Saxton (1978) also derives this, and Pawel Penczek has a 
> nice & detailed review (2010 Methods Enzym) of confidence intervals 
> that can be derived from such estimator variances.
>
> If the sample CC (FSC in a particular shell) comes out as -1, either 
> (1) the fundamental assumption B&R used to derive the estimator, 
> namely that we are measuring two noise-corrupted versions of the same 
> signal, was violated and we shouldn't be using this estimator at all, 
> or (2) the specific occurrences of the noise in our two measurements 
> conspired to give us exactly anti-correlated measurements. If the 
> number of measurements is not tiny, this is incredibly, incredibly 
> unlikely. Therefore, no matter what the SNR estimator says (-0.5 in 
> your example), it's OK that the truth is very different since if we 
> were to repeat the experiment we would almost never, ever get the same 
> (CC=-1) result again.
>
> In fact, according to B&R, if we repeated the experiment an infinite 
> number of times, the average estimate would be exactly correct (if you 
> used their unbiased estimator, but even the one you mention is 
> basically fine). The CC=-1 measurement would just be seen as a freak 
> outlier. The distribution of estimates can be characterized, and this 
> freak measurement would be way out in the tail.
>
> I find no reason (yet?) to believe that B&R's estimator is wrong.
>
> Cheers,
> Alexis
>
>
>
> -- 
> Alexis Rohou
>
> Research Specialist
> Grigorieff Lab
> http://grigoriefflab.janelia.org
> Tel. +1 571 209 4000 x3485
>
>
> On 08/12/2015 08:19 AM, Marin van Heel wrote:
>>
>> Dear Smith Liu,
>>
>> You have hit upon a can of worms here… Although the FRC/FSC metrics 
>> we introduced in 1982/1986 [1, 2] are now considered the "gold 
>> standard" cryo-EM resolution criterion, these resolution issues 
>> continue to be heavily debated [3]. Many FSC add-ons/variants and 
>> tangential issues such as “reference bias” have been inserted into 
>> the resolution criterion discussion. These discussions unfortunately 
>> confuse even established researchers (referees of major journals…), 
>> let alone newcomers to the field. Many believe the resolution issue 
>> is better resolved in X-ray crystallography. In fact, the FSC is arguably 
>> a better metric than the R-factor, the generally accepted resolution 
>> metric in X-ray crystallography [4]. Fortunately, FRC/FSC criteria 
>> are now slowly also becoming the standard in optical microscopy, 
>> X-ray microscopy, X-ray crystallography, and other fields of 2D/3D 
>> imaging.
>>
>> The most controversial part of the FSC discussion is the FSC 
>> threshold value to serve as a resolution criterion (such as the FSC 
>> 0.5 value you mention). It took more than a decade to remove the 
>> mathematically flawed DPR (Differential Phase Residual) from the 
>> literature, after I explicitly discussed its shortcomings and 
>> proposed a corrected phase residual in 1987 [3]. The discussion in 
>> the field was then diverted towards the FSC threshold at which one 
>> defines the average resolution of a 3D structure. The “0.5” 
>> “criterion” was just postulated ad hoc, without any scientific 
>> justification. Ten years ago, we argued that all fixed-valued FSC 
>> threshold criteria (such as: “0.5” and “0.143”) are based on flawed 
>> statistics [5]. Virtually all more formal justifications for 
>> resolution criteria start off referring to the old formula “SNR = 
>> (CCC/(1-CCC))” by Frank & Al-Ali (1975) [6]. Unfortunately, this 
>> formula is also mathematically incorrect as was discussed previously 
>> [5].
>>
>> Here is another very simple argument to illustrate its flawed 
>> definition: the normalised CCC (or FSC) has values in the range 
>> -1 <= CCC <= +1, whereas the SNR (= S²/N²) is, by definition, positive. 
>> Now insert the value CCC = -1, the case of perfectly anti-correlated 
>> data, into the formula. This yields SNR = “-0.5”, a rampant 
>> violation of the SNR definition range. The formula could be valid in 
>> the limiting case where the CCC is close to unity, but such high 
>> correlation values are not relevant in the resolution-threshold context. For 
>> uncorrelated signals/noise the CCC oscillates around the zero mark 
>> and, through the flawed Frank & Al-Ali formula, produces as many 
>> positive as it does erroneous negative SNR values.
>>
>> Unfortunately, virtually all (~100?) papers on resolution criteria 
>> and validation tests in cryo-EM (from friends and foes) are based on 
>> this formula and are thus based on “flawed statistics” to say the 
>> least. With the great recent success of cryo-EM, everybody appears to 
>> have stopped thinking about the basics, and merrily continues to refer 
>> to incorrect stuff while focusing on “my resolution is better than 
>> yours”. After decades of funny jokes and verbal FSC controversies at 
>> GRC meetings, I don’t find it so funny anymore: it is time to clean 
>> up the mess. I have lost the patience to discuss these issues with 
>> referees who continue to consider the subject as debatable. 
>> Questionable actions are sometimes hidden behind this controversy 
>> such as in Mao & Sodroski [7], who cynically accuse us - their 
>> critics - of not knowing how to interpret the FSC: “FSC estimates of 
>> resolution are known to be quite sensitive to statistical bias …” 
>> etc. etc.  As I said, this whole issue is no longer amusing; it has 
>> become a matter of the debatable scientific culture (integrity?) in 
>> the cryo-EM field.
>>
>> Oh, by the way, Smith Liu, what I really was going to say when I 
>> started typing an answer to your question is that if you are new to 
>> the field it is a good idea to read some basic literature in Fourier 
>> Optics. Maybe my lecture notes can help [8]. The horizontal axis in 
>> the FSC is spatial frequency, i.e. 1/resolution (we are in Fourier space), and the FSC 
>> values in the curve indicate the cross-correlation level at that 
>> level of resolution (= inside that specific 3D Fourier shell).
>>
>> Hope this helps,
>>
>> Marin
>>
>> [1] Van Heel M, Keegstra W, Schutter W, van Bruggen EFJ: Arthropod 
>> hemocyanin structures studied by image analysis 
>> http://singleparticles.org/methodology/MvH_FRC_Leeds_1982.pdf
>> [2] Harauz G & van Heel M: Exact filters for general geometry three 
>> dimensional reconstruction, Optik 73 (1986) 146-156
>> [3] Van Heel M: Similarity measures between images. Ultramicroscopy 21 
>> (1987) 95-100.
>> [4] Van Heel M: Unveiling ribosomal structures: the final phases. 
>> Current Opinion in Structural Biology 10 (2000) 259-264.
>> [5] Van Heel M & Schatz M:  Fourier Shell Correlation Threshold 
>> Criteria, J. Struct. Biol. 151 (2005) 250-262
>> [6] Frank J & Al-Ali L:  Signal-to-noise ratio of electron 
>> micrographs obtained by cross correlation. Nature (1975)
>> [7] Mao Y, Castillo-Menendez LR, Sodroski JG: Reply to Subramaniam, 
>> van Heel, and Henderson: Validity of the cryo-electron microscopy 
>> structures of the HIV-1 envelope glycoprotein complex. PNAS 2013 
>> www.pnas.org/cgi/doi/10.1073/pnas.1316666110
>> [8] Van Heel:  Principles of Phase Contrast (Electron) Microscopy. 
>> http://www.single-particles.org/methodology/MvH_Phase_Contrast.pdf
>>
>> ===========================================
>>
>>
>>
>> On 08/08/2015 07:45, Smith Liu wrote:
>>> Dear All,
>>>
>>> I know the x-axis of the FSC curve is the reciprocal of the 
>>> resolution, and the x-axis value corresponding to FSC 0.5 is 
>>> usually regarded as the reciprocal of the resolution of the whole EM map.
>>>
>>> Here I do not understand the meaning of the resolution on the x-axis. 
>>> The whole map has only one resolution, corresponding to FSC 0.5, so why 
>>> does the x-axis span different resolutions (for example, from 
>>> resolution 0 to 20 Å, or the reciprocal of that range)? Is it 
>>> because different parts of the map have different resolutions 
>>> (caused by different parts of the map having different quality), or is 
>>> it because the x-axis of the FSC curve has some relation to the Fourier 
>>> shell? If the x-axis of the FSC is a property related to the 
>>> Fourier shell, then what is the relation between the resolution (or its 
>>> reciprocal) on the x-axis and the Fourier shell (and, in addition, what 
>>> is a Fourier shell)?
>>>
>>> Best regards.
>>>
>>> Smith
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> 3dem mailing list
>> 3dem at ncmir.ucsd.edu
>> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>

