[3dem] B-factors

Alexis Rohou a.rohou at gmail.com
Thu Feb 1 11:16:41 PST 2024


Dear Scott, Henning, colleagues,

I now think that using 1/d instead of 1/d^2 is not only suboptimal from a
theoretical point of view but also may lead to erroneous estimates of
resolution, number of particles, and/or B factors.

Unless someone can show me the error of my ways, I would recommend that
no one use linear fits to ResLog plots whose resolution axis is in
reciprocal Å (1/d) any more, and that cryoSPARC (and other packages that
use 1/d plots for this) be modified to plot in reciprocal squared Å (1/d^2)
instead.

Below is an explanation of how I have reached this conclusion. If I missed
something or made an error somewhere, I'd appreciate a correction!

Scott and others correctly point out that many ResLog plots with a 1/d axis
(appear to) show linear relationships between ln(N) and 1/d. How could this
be?

The answer, I believe, is that the vast majority of ResLog plots do not
cover a wide enough range of resolutions.

Let's consider the theoretical predictions from Rosenthal & Henderson
(2003), which Henning mentioned in his original post. Below is a plot of
the 1/d or 1/d^2 resolution as a function of ln(N) (the natural log of the
number of particles) using this theoretical prediction, with resolutions
ranging from 8 Å (~1100 particles) to 1.5 Å (~50 million particles; for
this simulation, I used a B factor of 50 Å^2). The blue dots use 1/d on
the Y axis, and the red dots use 1/d^2:
[image: image.png]
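A minimal sketch of a simulation of this kind, assuming the simplified scaling N(d) = N0 * exp(B / (2 d^2)), i.e. with the weakly varying prefactors of Rosenthal & Henderson (including their 1/d term) folded into a constant N0 that is pinned so that 8 Å needs ~1100 particles. With B = 50 Å^2 this lands at roughly 5 x 10^7 particles at 1.5 Å, consistent with the range quoted above. The function and constant names are hypothetical.

```python
import numpy as np

B = 50.0        # B factor in Å^2 (value quoted for the simulation above)
D_REF = 8.0     # reference resolution in Å
N_REF = 1100.0  # particles at D_REF (approximate value quoted above)


def particles_needed(d, b=B, d_ref=D_REF, n_ref=N_REF):
    """Simplified Rosenthal & Henderson scaling: N grows as exp(B / (2 d^2)).

    Assumption: all prefactors (including the 1/d term) are absorbed into the
    normalisation, which is pinned so that N(d_ref) = n_ref.
    """
    d = np.asarray(d, dtype=float)
    return n_ref * np.exp(b / 2.0 * (1.0 / d**2 - 1.0 / d_ref**2))


resolutions = np.array([8.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.8, 1.5])
n = particles_needed(resolutions)
for d, count in zip(resolutions, n):
    # Tabulate both candidate x-axes of a ResLog plot against ln(N).
    print(f"{d:4.1f} Å  N = {count:12.0f}  ln(N) = {np.log(count):6.2f}  "
          f"1/d = {1 / d:.3f}  1/d^2 = {1 / d**2:.4f}")
```

In this idealised model 1/d^2 is exactly linear in ln(N), so any curvature in the 1/d series comes purely from the square root relating the two axes.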

When plotted like that, it is obvious that the linear fit is not a good
model for 1/d vs. ln(N) (compare the blue line and the blue dots). But notice
that the deviations from linearity are not that large in the blue series;
in the presence of experimental noise you would be forgiven for missing them.

Now, let's look at the exact same plots from the theoretical prediction,
but pretending we only reached 3 Å resolution and that the resolution
estimates at resolutions worse than 8 Å are either not available or
unreliable (due to effects of the molecular envelope or secondary
structure, say):
[image: image.png]
Even when considering only these theoretical predictions, a ResLog plot
using 1/d looks very linear indeed in the 8 Å - 3 Å range! In other words,
the exponential damping due to the B factor is only noticeable when
considering wide resolution ranges, wider than those covered by most ResLog
plots.
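The effect of the resolution range on apparent linearity can be checked numerically. The sketch below assumes the same idealised scaling ln N = ln N_REF + (B/2)(1/d^2 - 1/8^2), with ~1100 particles at 8 Å and B = 50 Å^2 (assumptions for illustration), samples ln(N) in steps of ln 2 as a ResLog run with doubling subset sizes would, and compares the R^2 of straight-line fits of 1/d and of 1/d^2 against ln(N) over 8-1.5 Å and over 8-3 Å. All names are hypothetical.

```python
import numpy as np

B = 50.0                     # Å^2
D_REF, N_REF = 8.0, 1100.0   # assumed anchor: ~1100 particles at 8 Å


def ln_n_for(d, b=B):
    """Idealised model: ln N = ln N_REF + (b/2)(1/d^2 - 1/D_REF^2)."""
    return np.log(N_REF) + b / 2.0 * (1.0 / d**2 - 1.0 / D_REF**2)


def inv_d_from_ln_n(ln_n, b=B):
    """Invert the model above to get 1/d (Å^-1) for a given ln(N)."""
    return np.sqrt(1.0 / D_REF**2 + 2.0 * (ln_n - np.log(N_REF)) / b)


def r_squared(x, y):
    """R^2 of an ordinary least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


def fit_quality(d_best):
    # Subset sizes double from N_REF for as long as the model stays at or below d_best.
    ln_n = np.arange(np.log(N_REF), ln_n_for(d_best) + 1e-9, np.log(2.0))
    inv_d = inv_d_from_ln_n(ln_n)
    return r_squared(ln_n, inv_d), r_squared(ln_n, inv_d**2)


for d_best in (1.5, 3.0):
    r2_lin, r2_sq = fit_quality(d_best)
    print(f"8-{d_best} Å: R^2 (1/d) = {r2_lin:.4f}, R^2 (1/d^2) = {r2_sq:.6f}")
```

Because 1/d is the square root of a quantity that is linear in ln(N), its relative curvature grows with the ratio of the highest to the lowest 1/d^2 sampled, so the narrower 8-3 Å window gives the straighter-looking 1/d fit even though the underlying model is identical.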

In recent days, I have inspected ResLog plots from several of our in-house
projects with ongoing refinements. The large majority of them do not cover
a wide resolution range (because those projects are stuck at, say, 3 Å,
and/or because the low-resolution points on the ResLog plots fit neither
1/d nor 1/d^2), and they have good-looking linear fits against both 1/d
and 1/d^2.

However, we have two live projects that are delivering structures in the
2.2-2.3 Å range, and for which the low-resolution parts of the ResLog plots
are also "fittable". The first one is a C2-symmetric protein with the
following ResLog plot, from 9 Å (100 particles per half) to 2.2 Å (554,277
particles):
[image: image.png]
The second project is on an icosahedral object - here's one of the ResLog
plots from that project, spanning 21 Å (5 particles per half dataset) to
2.4 Å (17,144 particles):
[image: image.png]

At least in these cases, I think it's obvious that one should not model 1/d
resolution as a linear function of ln(N). In fact, fitting 1/d linearly over
the low-resolution range would yield a significant underestimate of the B
factor, and of the number of particles needed to reach high resolution.
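A sketch of the particle-number part of this argument, under an idealised, noise-free model (an assumption: ln N = ln N_REF + (B/2)(1/d^2 - 1/8^2), with ~1100 particles at 8 Å and B = 50 Å^2). It fits straight lines to the 8-3 Å points only, once with 1/d and once with 1/d^2 on the resolution axis, and extrapolates each to the number of particles needed for 1.5 Å. Names are hypothetical.

```python
import numpy as np

B = 50.0                     # Å^2
D_REF, N_REF = 8.0, 1100.0   # assumed anchor: ~1100 particles at 8 Å


def ln_n_for(d, b=B):
    """Idealised model: ln N = ln N_REF + (b/2)(1/d^2 - 1/D_REF^2)."""
    return np.log(N_REF) + b / 2.0 * (1.0 / d**2 - 1.0 / D_REF**2)


# Low-resolution "observations": subset sizes doubling from N_REF up to ~3 Å.
ln_n_obs = np.arange(np.log(N_REF), ln_n_for(3.0) + 1e-9, np.log(2.0))
inv_d_obs = np.sqrt(1.0 / D_REF**2 + 2.0 * (ln_n_obs - np.log(N_REF)) / B)

target_d = 1.5  # Å

# Straight-line fit of 1/d against ln(N), solved for ln(N) at 1/target_d.
slope_lin, icpt_lin = np.polyfit(ln_n_obs, inv_d_obs, 1)
n_from_inv_d = np.exp((1.0 / target_d - icpt_lin) / slope_lin)

# Straight-line fit of 1/d^2 against ln(N), solved for ln(N) at 1/target_d^2.
slope_sq, icpt_sq = np.polyfit(ln_n_obs, inv_d_obs**2, 1)
n_from_inv_d2 = np.exp((1.0 / target_d**2 - icpt_sq) / slope_sq)

n_true = np.exp(ln_n_for(target_d))
print(f"model: {n_true:.3g}  1/d fit: {n_from_inv_d:.3g}  "
      f"1/d^2 fit: {n_from_inv_d2:.3g}")
```

Since 1/d is a concave function of ln(N) here, a least-squares line through the low-resolution points lies above the curve beyond the fitted range, so the 1/d extrapolation predicts that the target is reached with too few particles, while the 1/d^2 extrapolation recovers the model value.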

Conversely, plotting 1/d^2 as a function of ln(N) would lead to
approximately the same estimate of B factor (or of number of particles
needed to reach a target resolution) regardless of the number of points
plotted or the resolution range used.
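In an idealised, noise-free model where ln(N) depends on d only through B/(2 d^2), the slope of 1/d^2 against ln(N), and hence B = 2/slope, does not depend on which resolution range is sampled. The sketch below illustrates this; the model, the anchor of ~1100 particles at 8 Å and the function names are assumptions for illustration. With real data, or with the full Rosenthal & Henderson expression (which has an extra 1/d prefactor), the estimates would only be approximately constant.

```python
import numpy as np

B_TRUE = 50.0                # Å^2
D_REF, N_REF = 8.0, 1100.0   # assumed anchor: ~1100 particles at 8 Å


def b_from_reslog(resolutions, n_particles):
    """B (Å^2) = 2 / slope of 1/d^2 (Å^-2) plotted against ln(N)."""
    inv_d2 = 1.0 / np.asarray(resolutions, dtype=float) ** 2
    slope, _ = np.polyfit(np.log(n_particles), inv_d2, 1)
    return 2.0 / slope


def simulate(d):
    """Idealised model (assumption): N = N_REF * exp(B/2 * (1/d^2 - 1/D_REF^2))."""
    d = np.asarray(d, dtype=float)
    return N_REF * np.exp(B_TRUE / 2.0 * (1.0 / d**2 - 1.0 / D_REF**2))


for d_range in ([8.0, 6.0, 4.0, 3.0], [4.0, 3.0, 2.5], [3.0, 2.2, 1.8, 1.5]):
    b_est = b_from_reslog(d_range, simulate(d_range))
    print(f"{d_range[0]}-{d_range[-1]} Å: B = {b_est:.2f} Å^2")
```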

Thus, for reasons both theoretical and practical, I do not recommend that
anyone use ResLog plots with a 1/d scale.

Cheers,
Alexis



On Fri, Jan 19, 2024 at 1:03 PM Scott Stagg <sstagg at fsu.edu> wrote:

> Dear colleagues,
>
> I just want to clarify the usage of the ResLog plots from my perspective.
> Our intent with ResLog was to provide an empirical metric for assessing
> reconstruction and data quality. We noticed that plots of the spatial
> frequency vs. the log of the number of particles were linear, which makes
> it very convenient to compare reconstructions in terms of the ResLog slope
> and intercept. The units were also sensible with respect to our
> understanding of structure (Å and particles) which fits with how we think
> about structures. Our claim was that the ResLog slope related to the
> overall B factor while the intercept (or rather the value at nptlcs = 1,
> since ln(0) is undefined) relates to reconstruction quality, but we never
> provided a theoretical formulation for determining B factor from the
> plots. I agree that the most conventional formulation (though not the only
> one) for determining the B factor is the Guinier plot etc. as described in
> Rosenthal and Henderson, with units of Å^2. It also has the same units as
> the crystallographic B factor, which is quite sensible.
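For readers unfamiliar with this convention, a minimal sketch of how a ResLog slope and intercept can be extracted: a straight-line fit of spatial frequency 1/d against ln(N), with the intercept read at N = 1, where ln(N) = 0. This is an illustration only, not the Appion or cryoSPARC implementation; the function name and the synthetic points are hypothetical.

```python
import numpy as np


def reslog_slope_intercept(n_particles, resolutions_angstrom):
    """Fit 1/d (Å^-1) = slope * ln(N) + intercept.

    The intercept is the fitted spatial frequency at N = 1, where ln(N) = 0;
    N = 0 cannot be used because ln(0) is undefined.
    """
    ln_n = np.log(np.asarray(n_particles, dtype=float))
    inv_d = 1.0 / np.asarray(resolutions_angstrom, dtype=float)
    slope, intercept = np.polyfit(ln_n, inv_d, 1)
    return slope, intercept


# Synthetic points lying exactly on 1/d = 0.03 * ln(N) + 0.02 (made-up values).
n = np.array([1000, 2000, 4000, 8000, 16000, 32000])
d = 1.0 / (0.03 * np.log(n) + 0.02)
slope, intercept = reslog_slope_intercept(n, d)
print(f"slope = {slope:.4f} Å^-1 per unit ln(N), intercept = {intercept:.4f} Å^-1")
```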
>
> Regarding ResLog slopes and intercepts, they remain very convenient ways
> of comparing reconstructions and datasets. I have always intended to go
> back and determine what particular values for ResLog intercepts tell us
> about the reliability of a given reconstruction. There is definitely a
> number below which the reconstruction is questionable. The idea is that the
> ResLog intercept is another metric for reconstruction validation and/or can
> tell a given investigator whether or not there are issues with their data
> that need further investigation.
>
> Best regards,
> Scott
>
> Scott Stagg
> Professor
> Institute of Molecular Biophysics
> Department of Biological Sciences
> Florida State University
> Tallahassee, FL 32306-4380
> w: 850-645-7872
> f: 850-644-7244
> sstagg at fsu.edu
> https://www.stagglab.com
>
>
>
>
> On Jan 19, 2024, at 2:38 AM, Takanori Nakane <
> tnakane.protein at osaka-u.ac.jp> wrote:
>
> Hi,
>
> This is a tricky point.
>
> Indeed, poses obtained from fewer particles are
> less accurate, because the reference volume is of lower quality.
> Thus, RELION's way is more solid from a theoretical point of view.
> But we have to recall that other parameters have been optimised against
> ALL particles in earlier CtfRefine and Polish jobs.
> To make it *really* solid, one would have to repeat everything from
> raw movies! But this is computationally unrealistic.
>
> In reality, fortunately, the difference is tiny (if any) unless
> you are looking at the region with the fewest particles.
>
> Also note that stochasticity in random sampling introduces
> variations in the resolution. To be precise (e.g. if you are
> writing a paper comparing two datasets), one should perform
> 3 or 5 trials for each data point.
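A minimal sketch of drawing independent random subsets for such replicate trials, assuming particles are identified by integer indices 0..n_total-1. Reconstructing or refining each subset and measuring its resolution is left to the processing package; summarising the replicates in 1/d^2 is a choice made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def replicate_subsets(n_total, subset_size, n_trials=3, seed=0):
    """Return n_trials independent random subsets drawn without replacement.

    Each trial is a separate draw, so measuring resolution on every subset
    samples the variation caused by random particle selection.
    """
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n_total, size=subset_size, replace=False))
            for _ in range(n_trials)]


def summarize(resolutions_angstrom):
    """Mean and sample standard deviation of 1/d^2 over replicate trials."""
    inv_d2 = 1.0 / np.asarray(resolutions_angstrom, dtype=float) ** 2
    return inv_d2.mean(), inv_d2.std(ddof=1)


subsets = replicate_subsets(n_total=100_000, subset_size=5_000, n_trials=5, seed=42)
print([s[:3] for s in subsets])
```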
>
> Best regards,
>
> Takanori Nakane
>
> On 2024/01/19 16:23, Hagen, Wim J. wrote:
>
> Dear Henning,
> You mention “ResLog B-factor”, but could that be mixing up two things?
> Reslog Analysis, at least in cryoSPARC, uses reconstruction only (without
> re-refinement) for the subsets of particles used to build the plot, based
> on the poses of the particles from 3D refinement of the full dataset.
>
> https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/post-processing/job-reslog-analysis
> The Rosenthal-Henderson B-factor plot, at least in the Relion
> bfactor_plot.py script, uses 3D-refinement for the subsets of particles.
>
> https://github.com/3dem/relion/blob/master/scripts/bfactor_plot.py
> This makes the outcome of both procedures different, regardless of the
> units used for fitting the plotted line. Reslog Analysis still serves the
> quality assessments discussed in Stagg et al, but one could argue that it
> is not the best indicator of how many (more) particles one needs to reach
> a certain resolution.
> Best,
> Wim
> *From:* 3dem <3dem-bounces at ncmir.ucsd.edu> *On Behalf Of* Alexis Rohou
> *Sent:* 19 January 2024 00:45
> *To:* Henning Stahlberg <henning.stahlberg at epfl.ch>
> *Cc:* 3dem at ncmir.ucsd.edu
> *Subject:* Re: [3dem] B-factors
> Dear Henning,
> Thanks for bringing this to our attention.
> I for one hadn't picked up on this discrepancy until you pointed it out. I
> had assumed that all ResLog plots always used Å^2 units. To my mind this is
> the only dimensionality that makes sense if we're going to use the "B
> factor" concept and wording (in analogy to temperature factors) to describe
> incoherent averaging in our reconstructions. I would have thought we'd
> always want to give B factors in units of Å^2, following the logic outlined
> in R&H2003.
> Perhaps Scott and/or co-authors could comment as to why they moved from
> Å^2 to Å.
> Cheers,
> Alexis
> PS. My understanding of this topic is summarized in section 4.7 of this
> 2021 book <https://iopscience.iop.org/book/edit/978-0-7503-3039-8>.
> I regret that I didn't notice this discrepancy in the units at the time of
> writing - my text makes the assumption that all ResLog plots use 1/Å^2.
> Despite this omission, readers who are not familiar with the topic in
> question may still find it a useful introduction. I'd be happy to share
> preprints if you do not have access to the final publication.
> On Thu, Jan 18, 2024 at 6:27 AM Henning Stahlberg <
> henning.stahlberg at epfl.ch> wrote:
>    Dear Colleagues,
>    There are two B-factors used in cryo-EM:
>    Rosenthal and Henderson, JMB (2003) discuss the Guinier plot, where
>    the amplitude falloff beyond 10 Å resolution can be fitted with a
>    B-factor that has the unit "Angstrom^2".
>    They also discuss the dependency of the resolution d (in Angstroem)
>    on the number # of particles, and provide the basis for a ResLog
>    B-factor, which is obtained from the slope of 1/d^2 as a function of
>    ln(#). The number of particles needed to reach a resolution "d" is
>    then obtained with:
>    # = (1/Nasym) * (<S>/<N>)^2 * (30 pi) / (N_e * sigma_e * d) * exp(B / (2 * d^2))
>    The B-factor in this case is also defined in Angstrom^2.
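The expression above can be transcribed into a small function. This is a literal transcription of the formula as stated in this message, with every input supplied by the user in mutually consistent units (no physical values are assumed); the parameter names are hypothetical. Taking logarithms gives ln(#) = B/(2 d^2) - ln(d) + const, so if the slowly varying ln(d) term is neglected, the slope of 1/d^2 against ln(#) is 2/B, i.e. B = 2/slope.

```python
import math


def particles_needed(d, b_factor, n_asym, snr, n_e, sigma_e):
    """Particles needed to reach resolution d, transcribed from the formula above:

    # = (1/Nasym) * (<S>/<N>)^2 * (30 pi) / (N_e * sigma_e * d) * exp(B / (2 d^2))

    `snr` stands for <S>/<N>; all units must be mutually consistent.
    """
    prefactor = (1.0 / n_asym) * snr**2 * (30.0 * math.pi) / (n_e * sigma_e * d)
    return prefactor * math.exp(b_factor / (2.0 * d**2))


def b_from_slope(slope_inv_d2_vs_ln_n):
    """B factor (Å^2) from the slope of 1/d^2 (Å^-2) plotted against ln(#)."""
    return 2.0 / slope_inv_d2_vs_ln_n
```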
>    Stagg et al., JSB (2014) define a dependency of the resolution d
>    on ln(#), with
>    d = constant * ln(#) + constant.
>    So, here, “d” is linear, not squared. Their ResLog B-factor is
>    then presumably obtained from the first "constant" in that equation,
>    therefore in Angstrom, not Angstrom^2.
>    This is also implemented in CryoSPARC, which also plots 1/d as a
>    function of ln(#).
>    But other papers, such as Yip et al. and Holger Stark, Nature (2020)
>    discuss the ResLog B-factor in Å^2 again.
>    It would be useful for a map to report all three: the FSC 0.143
>    resolution, the sharpening B-factor in Å^2, and the reconstruction
>    ("ResLog") B-factor in Å^2.
>    But what is the most commonly used definition of the ResLog
>    B-factor, Å or Å^2?
>    Best wishes,
>    Henning.
>    Henning Stahlberg
>    Laboratory of Biological Electron Microscopy
>    Institute of Physics, School of Basic Sciences, EPFL, and
>    Dep. of Fund. Microbiology, Faculty of Biology and Medicine, UNIL,
>    Cubotron, BSP421, 1015 Lausanne, Switzerland
>    https://lbem.ch, +41 21 693 45 07
>    _______________________________________________
>    3dem mailing list
>    3dem at ncmir.ucsd.edu
>    https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
-------------- next part --------------
Scrubbed image attachments (the image.png figures referenced above):
<http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/ebc45825/attachment-0004.png>
<http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/ebc45825/attachment-0005.png>
<http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/ebc45825/attachment-0006.png>
<http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/ebc45825/attachment-0007.png>

