[3dem] B-factors

Alex Noble anoble at nysbc.org
Thu Feb 1 12:31:41 PST 2024


Hi Alexis,

Your analysis seems valid. When we did the analysis in the ResLog paper, we were heavily limited by resolution (4.5 angstroms) and didn't look below 100 particles because it became visually unreliable. That was over 10 years ago in the cryoEM middle ages.

However, I'm not convinced that ResLog plots are useless. The y-intercept crossings may have real value, as we investigated in the ResLog paper. So I think saying that no-one should use them is too strong a statement. This is something that needs to be investigated further.

Anyways, that's my 2-cents.
Best,
-Alex
________________________________
From: 3dem <3dem-bounces at ncmir.ucsd.edu> on behalf of Alexis Rohou <a.rohou at gmail.com>
Sent: Thursday, February 1, 2024 2:16 PM
To: Scott Stagg <sstagg at fsu.edu>
Cc: 3dem at ncmir.ucsd.edu
Subject: Re: [3dem] B-factors

Dear Scott, Henning, colleagues,

I now think that using 1/d instead of 1/d^2 is not only suboptimal from a theoretical point of view but also may lead to erroneous estimates of resolution, number of particles, and/or B factors.

Unless someone can show me the errors of my ways, I would recommend that no-one use linear fits to ResLog plots where the resolution axis is in reciprocal Å anymore, and that cryoSPARC (and other packages that use 1/d plots for this) be modified to plot in reciprocal squared Å instead.

Below is an explanation of how I have reached this conclusion. If I missed something or made an error somewhere, I'd appreciate a correction!

Scott and others correctly point out that many ResLog plots with a 1/d axis (appear to) show linear relationships between ln(N) and 1/d. How could this be?

The answer I believe is that the vast majority of ResLog plots do not cover a wide enough range of resolutions.

Let's consider the theoretical predictions from Rosenthal & Henderson (2003), which Henning mentioned in his original post. Below is a plot of the 1/d or 1/d^2 resolution as a function of ln(N) (the natural log of the number of particles) using this theoretical prediction, with resolutions ranging from 8 Å (~1100 particles) to 1.5 Å (~50 million particles; for this simulation, I used a B factor of 50 Å^2). The blue dots are using 1/d on the Y axis, and the red dots are using 1/d^2:
[image.png]

When plotted like that, it is obvious that the linear fit is not a good model for 1/d vs. ln(N) (compare the blue line and the blue dots). But notice that the deviations from linearity in the blue series are not that large - in the presence of experimental noise you would be forgiven for missing them.
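
A minimal NumPy sketch of this comparison, under one simplifying assumption that belongs to this sketch and is not stated in the thread: only the exponential factor of the Rosenthal & Henderson expression is kept, so ln(N) = ln(N0) + B/(2d^2) with a hypothetical constant ln(N0) set to 0. Under that assumption ln(N) is exactly linear in 1/d^2, and the R^2 of a straight-line fit against 1/d measures how much curvature the 1/d axis introduces. Helper names are illustrative.

```python
import numpy as np

B_FACTOR = 50.0  # Å^2, the value used for the simulated plot above


def ln_particles(d, b_factor=B_FACTOR, ln_n0=0.0):
    """ln(N) under the simplifying assumption N = N0 * exp(B / (2 d^2)).

    ln_n0 is a hypothetical constant lumping every d-independent factor."""
    return ln_n0 + b_factor / (2.0 * d**2)


def r_squared(x, y):
    """Coefficient of determination of a least-squares straight line y = a*x + b."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


d = np.linspace(8.0, 1.5, 30)  # resolutions in Å, from 8 Å down to 1.5 Å
ln_n = ln_particles(d)

r2_inv_d = r_squared(ln_n, 1.0 / d)       # ResLog-style axis (1/Å)
r2_inv_d2 = r_squared(ln_n, 1.0 / d**2)   # Rosenthal & Henderson axis (1/Å^2)
print(f"R^2 for 1/d:   {r2_inv_d:.4f}")
print(f"R^2 for 1/d^2: {r2_inv_d2:.6f}")
```

Even without noise, the 1/d fit still gives an R^2 that looks respectable, which is consistent with the point that the curvature is easy to miss.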

Now, let's look at the exact same plots from the theoretical prediction, but pretending we only reached 3 Å resolution and that the resolution estimates at 8 Å and below are either not available or unreliable (due to effects of the molecular envelope or secondary structures, say):
[image.png]
Even when considering only these theoretical predictions, a ResLog plot using 1/d looks very linear indeed in the 8 Å - 3 Å range! In other words, the exponential damping due to the B factor is only noticeable when considering wide resolution ranges, wider than those covered by most ResLog plots.
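
The same effect can be sketched by restricting the fit range (same exp-only assumption and hypothetical ln(N0) = 0 as above; names are illustrative): the R^2 of the 1/d fit over 8 Å to 3 Å is closer to 1 than over the full 8 Å to 1.5 Å range, so the narrow range hides the curvature.

```python
import numpy as np

B_FACTOR = 50.0  # Å^2


def r_squared(x, y):
    """R^2 of a least-squares straight line y = a*x + b."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


def r2_of_inverse_d_fit(d, b_factor=B_FACTOR):
    """R^2 of 1/d versus ln(N), with ln(N) = B / (2 d^2) (exp-only assumption)."""
    ln_n = b_factor / (2.0 * d**2)
    return r_squared(ln_n, 1.0 / d)


r2_full = r2_of_inverse_d_fit(np.linspace(8.0, 1.5, 30))    # 8 Å to 1.5 Å
r2_narrow = r2_of_inverse_d_fit(np.linspace(8.0, 3.0, 30))  # 8 Å to 3 Å only
print(f"1/d fit, 8-1.5 Å: R^2 = {r2_full:.4f}")
print(f"1/d fit, 8-3 Å:   R^2 = {r2_narrow:.4f}")
```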

In recent days, I have inspected ResLog plots from several of our in-house projects with ongoing refinements. The large majority of them do not cover a wide resolution range, because those projects are stuck at, say, 3 Å, and/or because the low-resolution points on the ResLog plots fit neither 1/d nor 1/d^2. Over the range that remains, they show pretty good-looking linear fits against both 1/d and 1/d^2.

However, we have two live projects that are delivering structures in the 2.2-2.3 Å range, and for which the low-resolution parts of the ResLog plots are also "fittable". The first one is a C2-symmetric protein with the following ResLog plot, from 9 Å (100 particles per half) to 2.2 Å (554,277 particles):
[image.png]
The second project is on an icosahedral object - here's one of the ResLog plots from that project, spanning 21 Å (5 particles per half dataset) to 2.4 Å (17,144 particles):
[image.png]

At least in these cases, I think it's obvious that one should not model 1/d resolution as a linear function of ln(N). In fact doing so in the low-resolution range would yield a significant underestimate of the B factor, and of the number of particles needed to reach high resolution.
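
The extrapolation problem can be sketched the same way (exp-only assumption again; the 2 Å target and all names are hypothetical). Under that model 1/d is a concave function of ln(N), so a straight line fitted over 8 Å to 3 Å lies above the true curve at higher ln(N), and it therefore predicts that fewer particles are needed than the model requires.

```python
import math

import numpy as np

B_FACTOR = 50.0  # Å^2
TARGET_D = 2.0   # Å, hypothetical target resolution

# Exp-only assumption: ln(N) = B / (2 d^2), with a hypothetical ln(N0) = 0.
d_fit = np.linspace(8.0, 3.0, 30)  # low-resolution points only
ln_n_fit = B_FACTOR / (2.0 * d_fit**2)

# ResLog-style straight line: 1/d = a * ln(N) + b
a, b = np.polyfit(ln_n_fit, 1.0 / d_fit, 1)

ln_n_predicted = (1.0 / TARGET_D - b) / a    # what the 1/d line extrapolates to
ln_n_true = B_FACTOR / (2.0 * TARGET_D**2)   # what the model actually requires
shortfall = math.exp(ln_n_true - ln_n_predicted)
print(f"The 1/d fit underestimates the particle count by a factor of {shortfall:.1f}")
```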

Conversely, plotting 1/d^2 as a function of ln(N) would lead to approximately the same estimate of B factor (or of number of particles needed to reach a target resolution) regardless of the number of points plotted or the resolution range used.
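
A sketch of this range independence, under the same exp-only assumption: since 1/d^2 = (2/B) * ln(N) + const, the B factor is 2/slope of the 1/d^2 fit, and it comes out the same whichever range is fitted, while the slope of the 1/d fit changes with the range. With real, noisy data and the slowly varying prefactors restored, the 1/d^2 estimate would only be approximately constant. Names are illustrative.

```python
import numpy as np

B_FACTOR = 50.0  # Å^2, as in the simulation above


def fit_both_axes(d, b_factor=B_FACTOR):
    """Return (slope of 1/d vs ln N, B factor from the 1/d^2 vs ln N fit)."""
    ln_n = b_factor / (2.0 * d**2)  # exp-only assumption, hypothetical ln(N0) = 0
    slope_inv_d, _ = np.polyfit(ln_n, 1.0 / d, 1)
    slope_inv_d2, _ = np.polyfit(ln_n, 1.0 / d**2, 1)
    return slope_inv_d, 2.0 / slope_inv_d2  # 1/d^2 = (2/B) ln N + const


results = {}
for low_res, high_res in [(8.0, 3.0), (8.0, 1.5)]:
    slope, b_est = fit_both_axes(np.linspace(low_res, high_res, 30))
    results[(low_res, high_res)] = (slope, b_est)
    print(f"{low_res}-{high_res} Å: 1/d slope = {slope:.4f}, B from 1/d^2 = {b_est:.1f} Å^2")
```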

Thus, for reasons both theoretical and practical, I do not recommend that anyone use ResLog plots with a 1/d scale.

Cheers,
Alexis



On Fri, Jan 19, 2024 at 1:03 PM Scott Stagg <sstagg at fsu.edu> wrote:
Dear colleagues,

I just want to clarify the usage of the ResLog plots from my perspective. Our intent with ResLog was to provide an empirical metric for assessing reconstruction and data quality. We noticed that plots of the spatial frequency vs. the log of the number of particles were linear, which makes it very convenient to compare reconstructions in terms of the ResLog slope and intercept. The units (Å and particles) were also sensible and fit with how we think about structures. Our claim was that the ResLog slope relates to the overall B factor, while the intercept (or rather, the value at nptcls = 1, since 0 is undefined on a log axis) relates to reconstruction quality, but we never provided a theoretical formulation for determining the B factor from the plots. I agree that the most conventional (though not the only) formulation for determining the B factor is the Guinier plot etc., as described in Rosenthal and Henderson, with units of Å^2. It also has the same units as the crystallographic B factor, which is quite sensible.

Regarding ResLog slopes and intercepts, they remain very convenient ways of comparing reconstructions and datasets. I have always intended to go back and determine what particular values for ResLog intercepts tell us about the reliability of a given reconstruction. There is definitely a number below which the reconstruction is questionable. The idea is that the ResLog intercept is another metric for reconstruction validation and/or can tell a given investigator whether or not there are issues with their data that need further investigation.

Best regards,
Scott

Scott Stagg
Professor
Institute of Molecular Biophysics
Department of Biological Sciences
Florida State University
Tallahassee, FL 32306-4380
w: 850-645-7872
f: 850-644-7244
sstagg at fsu.edu
https://www.stagglab.com




On Jan 19, 2024, at 2:38 AM, Takanori Nakane <tnakane.protein at osaka-u.ac.jp> wrote:

Hi,

This is a tricky point.

Indeed, poses obtained from fewer particles are
less accurate, because the reference volume is of lower quality.
Thus, RELION's way is more solid from a theoretical point of view.
But we have to recall that other parameters have been optimised against
ALL particles in earlier CtfRefine and Polish jobs.
To make it *really* solid, one would have to repeat everything from
raw movies! But this is computationally unrealistic.

In reality, fortunately, the difference is tiny (if any) unless
you are looking at the region with the fewest particles.

Also note that stochasticity in random sampling introduces
variations in the resolution. To be precise (e.g. you are
writing a paper comparing two datasets), one should perform
3 or 5 trials for each data point.
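
A sketch of the repeated-trials idea in plain Python. `estimate_resolution` is a hypothetical callable standing in for whatever reconstruction and FSC calculation is actually run for each subset; the toy estimator at the bottom exists only to make the example runnable and computes nothing physical.

```python
import random
import statistics


def resolution_over_trials(particle_ids, n_subset, n_trials, estimate_resolution, seed=0):
    """Mean and standard deviation of resolution over repeated random subsets.

    estimate_resolution is a hypothetical callable (e.g. a wrapper around a real
    reconstruction + FSC job) that takes a list of particle ids and returns a
    resolution in Å."""
    if n_trials < 2:
        raise ValueError("at least two trials are needed to estimate the spread")
    rng = random.Random(seed)  # fixed seed so the chosen subsets are reproducible
    resolutions = [
        estimate_resolution(rng.sample(particle_ids, n_subset))
        for _ in range(n_trials)
    ]
    return statistics.mean(resolutions), statistics.stdev(resolutions)


# Toy stand-in for illustration only: NOT a real FSC calculation.
def toy_estimator(subset):
    return 3.0 + (sum(subset) % 7) * 0.01  # small subset-dependent jitter


all_ids = list(range(10_000))
mean_d, sd_d = resolution_over_trials(all_ids, 1_000, 5, toy_estimator)
print(f"{mean_d:.3f} ± {sd_d:.3f} Å over 5 trials")
```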

Best regards,

Takanori Nakane

On 2024/01/19 16:23, Hagen, Wim J. wrote:
Dear Henning,
You mention “ResLog B-factor”, but could that be mixing up two things?
Reslog Analysis, at least in cryoSPARC, uses reconstruction-only for the subsets of particles to build the plot, based on the poses of the particles from 3D-refinement of the full dataset.
https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/post-processing/job-reslog-analysis
The Rosenthal-Henderson B-factor plot, at least in the Relion bfactor_plot.py script, uses 3D-refinement for the subsets of particles.
https://github.com/3dem/relion/blob/master/scripts/bfactor_plot.py
This makes the outcomes of the two procedures different, regardless of the units used for fitting the plotted line. ResLog Analysis still serves the quality assessments discussed in Stagg et al., but one could argue that it is not the best indicator of how many (more) particles one needs to reach a certain resolution.
Best,
Wim
*From:* 3dem <3dem-bounces at ncmir.ucsd.edu> *On Behalf Of* Alexis Rohou
*Sent:* 19 January 2024 00:45
*To:* Henning Stahlberg <henning.stahlberg at epfl.ch>
*Cc:* 3dem at ncmir.ucsd.edu
*Subject:* Re: [3dem] B-factors
Dear Henning,
Thanks for bringing this to our attention.
I for one hadn't picked up on this discrepancy until you pointed it out. I had assumed that all ResLog plots always used Å^2 units. To my mind this is the only dimensionality that makes sense if we're going to use the "B factor" concept and wording (in analogy to temperature factors) to describe incoherent averaging in our reconstructions. I would have thought we'd always want to give B factors in units of Å^2, following the logic outlined in R&H2003.
Perhaps Scott and/or co-authors could comment as to why they moved from Å^2 to Å.
Cheers,
Alexis
PS. My understanding of this topic is summarized in section 4.7 of this 2021 book <https://iopscience.iop.org/book/edit/978-0-7503-3039-8>. I regret that I didn't notice this discrepancy in the units at the time of writing - my text makes the assumption that all ResLog plots use 1/Å^2. Despite this omission, readers who are not familiar with the topic in question may still find it a useful introduction. I'd be happy to share preprints if you do not have access to the final publication.
On Thu, Jan 18, 2024 at 6:27 AM Henning Stahlberg <henning.stahlberg at epfl.ch> wrote:
   Dear Colleagues,
   There are two B-factors used in cryo-EM:
   Rosenthal and Henderson, JMB (2003) discuss the Guinier plot, where
   the amplitude falloff beyond 10 Å resolution can be fitted with a
   B-factor that has the unit "Angstrom^2".
   They also discuss the dependency of the resolution d (in Angstrom)
   on the number # of particles, and provide the basis for a ResLog
   B-factor, which is obtained from the slope of 1/d^2 as a function of
   ln(#). The number of particles needed to reach a resolution "d" is
   then obtained with:
   # = (1/Nasym) * (<S>/<N>)^2 * (30 pi) / (N_e * sigma_e * d) * exp(B / (2 * d^2))
   Here too, the B-factor is defined in Angstrom^2.
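
The expression above transcribed into Python as a sketch (the function names are hypothetical and `snr` stands for <S>/<N>). Because the absolute prefactor depends on quantities that are often not known precisely, `particle_ratio` uses only the ratio between two resolutions, in which every d-independent constant cancels.

```python
import math


def rh_particle_count(d, b_factor, n_asym, snr, n_e, sigma_e):
    """Number of particles needed for resolution d, transcribed from the
    expression quoted above (hypothetical helper name):
        # = (1/Nasym) * (<S>/<N>)^2 * (30 pi) / (N_e * sigma_e * d) * exp(B / (2 d^2))
    snr stands for <S>/<N>; the caller must supply mutually consistent units."""
    prefactor = snr**2 * 30.0 * math.pi / (n_asym * n_e * sigma_e * d)
    return prefactor * math.exp(b_factor / (2.0 * d**2))


def particle_ratio(d_target, d_current, b_factor):
    """N(d_target) / N(d_current) under the same expression; the d-independent
    constants (Nasym, <S>/<N>, N_e, sigma_e) cancel."""
    return (d_current / d_target) * math.exp(
        0.5 * b_factor * (1.0 / d_target**2 - 1.0 / d_current**2)
    )


# Example: going from 3 Å to 2 Å with B = 50 Å^2.
print(f"{particle_ratio(2.0, 3.0, 50.0):.1f}x more particles")
```

With B = 50 Å^2, this expression gives roughly a 48-fold increase in particle number to go from 3 Å to 2 Å.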
   Stagg et al., JSB (2014) define a dependency of the resolution d
   on ln(#), with
   d = constant * ln(#) + constant.
   So here, "d" is linear, not squared. Their ResLog B-factor is
   then presumably obtained from the first "constant" in that equation,
   and is therefore in Angstrom, not Angstrom^2.
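
To make the unit question concrete, here is a sketch that fits both conventions to the same (N, d) pairs. For the 1/d-versus-ln(#) form that cryoSPARC plots, the slope and intercept come out in 1/Å (the intercept being the value at N = 1, i.e. ln(N) = 0), while a Rosenthal & Henderson-style B factor, taken as 2/slope of 1/d^2 versus ln(#), comes out in Å^2. The function name is hypothetical, and the input is synthetic, generated with the exp-only assumption, B = 50 Å^2 and an arbitrary prefactor of 1000.

```python
import numpy as np


def reslog_summary(n_particles, resolutions):
    """Fit both ResLog conventions to (N, d) pairs (hypothetical helper).

    Returns the slope and intercept of 1/d versus ln(N) (both in 1/Å) and the
    B factor 2/slope from 1/d^2 versus ln(N) (in Å^2)."""
    ln_n = np.log(np.asarray(n_particles, dtype=float))
    d = np.asarray(resolutions, dtype=float)
    slope_inv_d, intercept_inv_d = np.polyfit(ln_n, 1.0 / d, 1)
    slope_inv_d2, _ = np.polyfit(ln_n, 1.0 / d**2, 1)
    return {
        "slope_inv_A": float(slope_inv_d),          # 1/Å per unit of ln(N)
        "intercept_inv_A": float(intercept_inv_d),  # 1/d extrapolated to N = 1
        "b_factor_A2": 2.0 / float(slope_inv_d2),   # Å^2
    }


# Synthetic example only: N = 1000 * exp(B / (2 d^2)) with B = 50 Å^2.
d_example = np.array([6.0, 5.0, 4.0, 3.0, 2.5])
n_example = 1000.0 * np.exp(50.0 / (2.0 * d_example**2))
summary = reslog_summary(n_example, d_example)
print(summary)
```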
   This is also implemented in CryoSPARC, which also plots 1/d as a
   function of ln(#).
   But other papers, such as Yip et al. and Holger Stark, Nature (2020),
   discuss the ResLog B-factor in Å^2 again.
   It would be useful for a map to report all three: the FSC 0.143
   resolution, the sharpening B-factor in Å^2, and the reconstruction
   ("ResLog") B-factor in Å^2.
   But what is the most commonly used definition of the ResLog
   B-factor, Å or Å^2?
   Best wishes,
   Henning.
   Henning Stahlberg
   Laboratory of Biological Electron Microscopy
   Institute of Physics, School of Basic Sciences, EPFL, and
   Dep. of Fund. Microbiology, Faculty of Biology and Medicine, UNIL,
   Cubotron, BSP421, 1015 Lausanne, Switzerland
   https://lbem.ch, +41 21 693 45 07
   _______________________________________________
   3dem mailing list
   3dem at ncmir.ucsd.edu
   https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/0c1cf987/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 63829 bytes
Desc: image.png
URL: <http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/0c1cf987/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 67340 bytes
Desc: image.png
URL: <http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/0c1cf987/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 66715 bytes
Desc: image.png
URL: <http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/0c1cf987/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 72072 bytes
Desc: image.png
URL: <http://mail.ncmir.ucsd.edu/pipermail/3dem/attachments/20240201/0c1cf987/attachment-0007.png>

