[3dem] Which resolution?

Sun Feb 23 03:15:34 PST 2020

Hi Carlos Oscar and Jose-Maria,

I choose to answer you guys first, because it will take little of my time
to counter your criticism and because I have long since been less than
amused by your published, ill-conceived criticism:

“*Marin, I always suffer with your reference to sloppy statistics. If we
take your paper of 2005 where the 1/2 bit criterion was proposed, Eqs. 4 to
15 have completely ignored the fact that you are dealing with Fourier
components, that are complex numbers, and consequently you have to deal
with random variables that have TWO components, which moreover the real and
imaginary part are not independent and, in their turn, they are not
independent of the nearby Fourier coefficients so that for computing radial
averages you would need to account for the correlation among coefficients*”

I had seen this argumentation against our (2005) paper in your
manuscript/paper years back. I was so stunned by the level of
misunderstanding expressed in your manuscript that I chose not to spend any
time reacting to those statements. Now that you choose to so openly display
your thoughts on the matter, I have no other choice than to spell out your
errors in public.

All complex arrays in our 2005 paper are Hermitian (since they are the FTs
of real data), and so are all their inner products. In all the integrals
over rings one always averages a complex Fourier-space voxel with its
Hermitian conjugate yielding *ONE* real value (times two)!  Without that
Hermitian property, FRCs and FSCs, which are real normalised correlation
functions, would not even have been possible. I was - and still am - stunned
by this level of misunderstanding!
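
A minimal numerical sketch of the Hermitian argument, assuming two synthetic
real-valued volumes (random data, not a real reconstruction) and NumPy. The
names `half1`, `half2`, `shell_index` and `shell_stats` are illustrative only
and are not taken from the 2005 paper. Because every Fourier voxel in a shell
is summed together with its Hermitian mate, the imaginary part of the shell
sum should cancel to rounding error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32

# Synthetic real-valued "half maps": a shared signal plus independent noise.
signal = rng.standard_normal((n, n, n))
half1 = signal + rng.standard_normal((n, n, n))
half2 = signal + rng.standard_normal((n, n, n))

F1 = np.fft.fftn(half1)
F2 = np.fft.fftn(half2)

# Integer shell index of every Fourier voxel (|k| rounded to nearest integer).
k = np.fft.fftfreq(n) * n
kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
shell_index = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

def shell_stats(shell):
    """Return the complex cross-term sum over one shell and the resulting FSC."""
    mask = shell_index == shell
    cross = np.sum(F1[mask] * np.conj(F2[mask]))
    norm = np.sqrt(np.sum(np.abs(F1[mask]) ** 2) * np.sum(np.abs(F2[mask]) ** 2))
    return cross, cross.real / norm

cross, fsc_value = shell_stats(8)
print("imaginary / real part of shell sum:", cross.imag / cross.real)
print("FSC in shell 8:", fsc_value)
```

The first printed ratio should be at the level of floating-point rounding,
which is the "ONE real value" per conjugate pair referred to above.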

This is a blatant blunder that you have been propagating for years, a blunder
that does not do any good to your reputation, yet also a blunder that has
probably damaged our research income. The fact that you can circulate
such rubbish and leave it out there for years for referees to read (who are
possibly not as well educated in physics and mathematics) will do – and may
already have done – damage to our research.  An apology is appropriate but
an apology is not enough.

Maybe you should ask your granting agencies how to transfer 25% of your
grant income to our research, in compensation for the damage caused by your
blunder!

Success with your request!

Marin

PS. You have also missed that our 2005 paper explicitly includes the
influence of the size of the object within the sampling box (your: “*they
are not independent of the nearby Fourier coefficients*”). I remain
flabbergasted.

On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano <coss at cnb.csic.es>
wrote:

> Dear all,
>
> I always try to refrain from getting into these discussions, but I can no
> longer resist the temptation. Here are some more ideas that I hope will
> bring more light than confusion:
>
> - There must be some functional relationship between the FSC and the SNR,
> but the exact analytical form of this relationship is unknown (I suspect
> that it must be at least monotonic, the worse the SNR, the worse FSC; but
> even this is difficult to prove). The relationship we normally use
> FSC=SNR/(1+SNR) was derived in a context that does not apply to CryoEM (1D
> stationary signals in real space; our molecules are not stationary), and
> consequently any reasoning of any threshold based on this relationship is
> incorrect (see our review).
>
> - Still, as long as we all use the same threshold, the reported
> resolutions are comparable to each other. In that regard, I am happy that
> we have set 0.143 (although any other number would have served the purpose)
> as the standard.
>
> - I totally agree with Steve that the full FSC is much more informative
> than its crossing with the threshold, especially because we should be much
> more worried about its behavior when it has high values than when it has
> low values. Before crossing the threshold it should be as high as possible,
> and that is the "true measure" of goodness of the map. When it crosses the
> threshold of 0.143, it has too low SNR, and by definition, that is a very
> unstable part of the FSC, resulting in relatively unstable reports of
> resolution. We made some tests about the variability of the FSC (refining
> random splits of the dataset), trying to put the error bars that Steve was
> asking for, and it turned out to be pretty reproducible (rather low
> variance except in the region where it crosses the threshold) as long as the
> dataset was large enough (which is the current state).
>
> - @Marin, I always suffer with your reference to sloppy statistics. If we
> take your paper of 2005 where the 1/2 bit criterion was proposed (
> https://www.sciencedirect.com/science/article/pii/S1047847705001292),
> Eqs. 4 to 15 have completely ignored the fact that you are dealing with
> Fourier components, that are complex numbers, and consequently you have to
> deal with random variables that have two components, which moreover the
> real and imaginary part are not independent and, in their turn, they are
> not independent of the nearby Fourier coefficients so that for computing
> radial averages you would need to account for the correlation among
> coefficients (
> https://www.aimspress.com/fileOther/PDF/biophysics/20150102.pdf). To
> deal properly with the statistics, one needs at least to carry out a
> two-dimensional reasoning, including the complex conjugate multiplication,
> all of which is missing in your derivation, rather than treating everything
> as one-dimensional, real-valued random variables. Additionally, embedded in
> your whole reasoning is the idea that the expected value of a ratio is the
> ratio of the expected values, which is a 0-th order Taylor approximation of
> the mean of the distribution of a ratio between two random variables.
> Finally, I always find it extremely difficult to understand the 1 bit or 1/2
> bit criteria, that is, what the relationship is between the channel's
> capacity formula of Shannon (
> https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem) and our
> FSC (we do not have any channel through which we are "transmitting" our
> volume, although it is true we have a model y=x+n that is the same as in
> signal transmission, it is not true that the average information of a
> signal is log2(1+SNR); for me, the only relationship is that the SNR
> appears in both formulas, FSC and channel capacity, but that does not
> automatically make them comparable and interchangeable). This is not a
> criticism on your work. I think the FSC is a very useful tool to measure
> some properties of the reconstruction process and the quality of the
> dataset (not everything is measured by the FSC) and it also has its
> drawbacks (for instance, systematic errors are rewarded by the FSC as they
> are reproducible in both halves). Moreover, I think you are an extremely
> intelligent person, who I consider a good friend, with a very good
> intuition about image processing and who has brought very interesting ideas
> and methodologies into the field. It is only that we should not go crazy about
> the FSC threshold and the reported resolution, as the most interesting part
> of the FSC is not when it is low, but when it is high.
>
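
On the ratio-of-expectations remark in the last item above, a worked
expansion may help. This is only a sketch under stated assumptions: X and Y
stand for the numerator and denominator random variables, with means \mu_X
and \mu_Y (\mu_Y nonzero); these symbols are introduced here for illustration
and are not the notation of the 2005 paper. The second-order Taylor (delta
method) approximation of the mean of a ratio reads:

```latex
\mathbb{E}\!\left[\frac{X}{Y}\right]
\approx \frac{\mu_X}{\mu_Y}
- \frac{\operatorname{Cov}(X,Y)}{\mu_Y^{2}}
+ \frac{\mu_X \operatorname{Var}(Y)}{\mu_Y^{3}}
```

The first term alone is the 0-th order approximation, i.e. the ratio of the
expected values; the other two terms are the corrections it neglects.
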
> I hope I can keep restraining myself in the future :-)
>
> Cheers, Carlos Oscar
>
> On 2/21/20 6:19 PM, Ludtke, Steven J. wrote:
>
> I've been steadfastly refusing to get myself dragged in this time, but
> with this very sensible statement (which I am largely in agreement with), I
> thought I'd throw in one thought, just to stir the pot a little more.
>
> This is not a new idea, but I think it is the most sensible strategy I've
> heard proposed, and addresses Marin's concerns in a more conventional way.
> What we are talking about here is the statistical noise present in the FSC
> curves themselves. Viewed from the framework of traditional error analysis
> and propagation of uncertainties, which pretty much every scientist should
> be familiar with since high school (and thus would not be confusing to
> non-statisticians), the 'correct' solution to this issue is not to adjust
> the threshold, but to present FSC curves with error bars.
>
> One can then use a fixed threshold at a level based on expectation values,
> and simply produce a resolution value which also has an associated
> uncertainty. This is much better than using a variable threshold and still
> producing a single number with no uncertainty estimate!  Not only does this
> approach account for the statistical noise in the FSC curve, but it also
> should stop people from reporting resolutions as 2.3397 Å, as it would be
> silly to say 2.3397 +- 0.2.
>
> The cross terms are not ignored, but are used in the production of the
> error bars. This is a very simple approach, which is certainly closer to
> being correct than the fixed threshold without error-bars approach, and it
> solves many of the issues we have with the way people report resolution. Of
> course we still have people who will insist that 3.2+-0.2 is better than
> 3.3+-0.2, but there isn't much you can do about them... (other than beat
> them over the head with a statistics textbook).
>
> The caveat, of course, is that, like all propagation of uncertainty, it
> is a linear approximation, and the correlation axis isn't linear, so the
> typical Normal distributions with linear propagation used to justify
> propagation of uncertainty aren't _strictly_ true. However, the
> approximation is fine as long as the error bars are reasonably small
> compared to the -1 to 1 range of the correlation axis. Each individual
> error bar is computed around its expectation value, so the overall
> nonlinearity of the correlation isn't a concern.
>
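
The error-bar proposal above can be sketched roughly as follows. This is a
minimal sketch under stated assumptions, not anyone's actual implementation:
the FSC curve is synthetic, the number of independent voxels per shell
(`n_eff`) is simply taken proportional to the shell surface, and the error bar
uses the large-N standard error (1 - FSC^2)/sqrt(N) of a correlation
coefficient. `first_crossing` is a hypothetical helper, not part of any
package.

```python
import numpy as np

def first_crossing(freq, curve, threshold):
    """Frequency at which `curve` first drops below `threshold` (linear interpolation)."""
    below = np.nonzero(curve < threshold)[0]
    if below.size == 0:
        return freq[-1]          # never crosses within the sampled range
    i = below[0]
    if i == 0:
        return freq[0]
    # Interpolate between the last point above and the first point below.
    f0, f1 = freq[i - 1], freq[i]
    c0, c1 = curve[i - 1], curve[i]
    return f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)

# Synthetic, purely illustrative FSC curve (not measured data).
freq = np.linspace(0.01, 0.5, 50)             # spatial frequency in 1/Angstrom
fsc = 1.0 / (1.0 + (freq / 0.3) ** 6)         # smooth fall-off
n_eff = 4.0 * np.pi * (freq / freq[0]) ** 2   # assumption: grows with shell surface

# Assumed large-N standard error of a correlation coefficient.
sigma = (1.0 - fsc**2) / np.sqrt(n_eff)

threshold = 0.143
f_mid = first_crossing(freq, fsc, threshold)
f_lo = first_crossing(freq, fsc - sigma, threshold)   # pessimistic curve
f_hi = first_crossing(freq, fsc + sigma, threshold)   # optimistic curve
print(f"resolution = {1 / f_mid:.2f} A, range {1 / f_hi:.2f} to {1 / f_lo:.2f} A")
```

The fixed threshold is applied to the curve and to the curve shifted by plus
or minus one error bar, which turns a single number into a resolution with an
associated range.
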
>
>
>
> --------------------------------------------------------------------------------------
> Steven Ludtke, Ph.D. <sludtke at bcm.edu>
> Baylor College of Medicine
> Charles C. Bell Jr., Professor of Structural Biology
> Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
> Academic Director, CryoEM Core (cryoem.bcm.edu)
> Co-Director CIBR Center (www.bcm.edu/research/cibr)
>
>
>
> On Feb 21, 2020, at 10:34 AM, Alexis Rohou <a.rohou at gmail.com> wrote:
>
> Hi all,
>
> For those bewildered by Marin's insistence that everyone's been messing up
> their stats since the bronze age, I'd like to offer my understanding
> of the situation. More details in this thread from a few years ago on the
> exact same topic:
> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
> https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>
> Notwithstanding notational problems (e.g. strict equations as opposed to
> approximation symbols, or omission of symbols to denote estimation), I
> believe Frank & Al-Ali and "descendant" papers (e.g. appendix of Rosenthal
> & Henderson 2003) are fine. The cross terms that Marin is agitated about
> do indeed have an expectation value of 0.0 (in the ensemble; if the
> experiment were performed an infinite number of times with different
> realizations of noise). I don't believe Pawel or Jose Maria or any of the
> other authors really believe that the cross-terms are orthogonal.
>
> When N (the number of independent Fourier voxels in a shell) is large
> enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty
> good one, even for a single FSC experiment. This is why, in my book,
> derivations that depend on Frank & Al-Ali are OK, under the strict
> assumption that N is large. Numerically, this becomes apparent when Marin's
> half-bit criterion is plotted - asymptotically it has the same behavior as
> a constant threshold.
>
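
The large-N behaviour described above can be checked with a toy simulation.
This is a minimal sketch assuming real-valued, independent, white Gaussian
"signal" and "noise" vectors, which is a simplification of real Fourier
shells; `rms_cross_term` is a hypothetical helper name. In this toy model the
normalised cross term is zero on average, and its spread for a single
realisation shrinks like 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(1)

def rms_cross_term(n_voxels, trials=1000):
    """RMS of the normalised inner product <S, N> / (|S| |N|) over many realisations."""
    s = rng.standard_normal((trials, n_voxels))
    noise = rng.standard_normal((trials, n_voxels))
    dots = np.sum(s * noise, axis=1)
    norms = np.linalg.norm(s, axis=1) * np.linalg.norm(noise, axis=1)
    return np.sqrt(np.mean((dots / norms) ** 2))

for n_voxels in (10, 100, 1000):
    print(n_voxels, rms_cross_term(n_voxels), 1.0 / np.sqrt(n_voxels))
```

For small N (small boxes, high symmetry, small objects in large boxes) the
cross term in a single experiment is therefore far from negligible, which is
the regime where a fixed threshold becomes dangerous.
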
> So, is Marin wrong to worry about this? No, I don't think so. There are
> indeed cases where the assumption of large N is broken. And under those
> circumstances, any fixed threshold (0.143, 0.5, whatever) is dangerous.
> This is illustrated in figures of van Heel & Schatz (2005). Small boxes,
> high-symmetry, small objects in large boxes, and a number of other
> conditions can make fixed thresholds dangerous.
>
> It would indeed be better to use a non-fixed threshold. So why am I not
> using the 1/2-bit criterion in my own work? While numerically it behaves
> well at most resolution ranges, I was not convinced by Marin's derivation
> in 2005. Philosophically though, I think he's right - we should aim for FSC
> thresholds that are more robust to the kinds of edge cases mentioned above.
> It would be the right thing to do.
>
> Hope this helps,
> Alexis
>
>
>
> On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A <
> Pawel.A.Penczek at uth.tmc.edu> wrote:
>
>> Marin,
>>
>> The statistics in the 2010 review is fine. You may disagree with the
>> assumptions, but I can assure you the “statistics” (as you call it) is fine.
>> Careful reading of the paper would reveal this much to you.
>>
>> Regards,
>> Pawel
>>
>> On Feb 16, 2020, at 10:38 AM, Marin van Heel <
>> marin.vanheel at googlemail.com> wrote:
>>
>>
>> Dear Pawel and All others ....
>>
>> This 2010 review is - unfortunately - largely based on the flawed
>> statistics I mentioned before, namely on the a priori assumption that the
>> inner product of a signal vector and a noise vector is ZERO (an
>> orthogonality assumption).  The (Frank & Al-Ali 1975) paper we have refuted
>> on a number of occasions (for example in 2005, and most recently in our
>> BioRxiv paper) but you still take that as the correct relation between SNR
>> and FRC (and you never cite the criticism...).
>> Sorry
>> Marin
>>
>> On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <
>> Pawel.A.Penczek at uth.tmc.edu> wrote:
>>
>>> Dear Teige,
>>>
>>> I am wondering whether you are familiar with
>>>
>>> Penczek PA. Resolution measures in molecular electron microscopy.
>>> Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.
>>>
>>> You will find there answers to all questions you asked and much more.
>>>
>>> Regards,
>>> Pawel Penczek
>>>
>>> _______________________________________________
>>> 3dem mailing list
>>> 3dem at ncmir.ucsd.edu
>>> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>>>