<div dir="ltr">Hi Carlos Oscar,<div><br></div><div>I wish I understood more about x-ray scattering, because I agree there are likely useful lessons there. But your point about SAXS is only really valid when considering small angles (low resolutions), isn't it? Can much be said about the identity or structure of a protein from the wide-angle x-ray scattering profiles?</div><div><br></div><div>Cheers,</div><div>Alexis</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 27, 2020 at 9:27 AM Carlos Oscar Sorzano <<a href="mailto:coss@cnb.csic.es">coss@cnb.csic.es</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>I totally agree with Arjen, and I think that all the ideas put
forward are useful in their context. Another thing to consider: SAXS
curves are closely related to the topic we are discussing, are
they not? (Allowing for the differences between X-ray photon and
electron diffraction, frozen proteins and proteins in solution,
possible mixtures of various conformations, etc.) Would the fact that
different SAXS curves are measured for different proteins not
show that there is no single Platonic curve for all
proteins?</p>
<p>Cheers, Carlos Oscar<br>
</p>
<div>On 27/08/2020 at 16:49, Arjen Jakobi
- TNW wrote:<br>
</div>
<blockquote type="cite">
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Hi Alexis,</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">You bring up an interesting point.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">My understanding is that Wilson statistics
assumes (and is strictly valid only for) independent and
uniformly distributed (= random) atoms. This is why I think
that Wilson statistics as derived in the original paper are
primarily valid in the high-resolution (better than 3Å) part
of the Wilson/Guinier plot. </span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">At lower resolution, in particular at those
spatial frequencies corresponding to repetitive features in
real space, the regular paths that proteins (secondary
structure) and nucleic acids (base stacking) follow in 3D
space, together with (ordered) solvent, give rise to characteristic
features in the pair-distribution function. This is what you
typically see in a “Guinier plot”: this plot is in principle
a (rotationally averaged) representation of the texture of
the macromolecule and will contain characteristic deviations
from the exponential (or linear in log-plot) decay expected
from Wilson statistics, because at some resolution/spatial
frequencies the arrangement of atoms is not random. This is
true regardless of whether you consider X-ray or EM
experiments. This is also a reason why I think B-factor
estimates, if the fit includes these regions, are
systematically off; the R^2 of the linear regression will be
poor. Once you move to higher resolution, let us say
3.0 Å and beyond, a protein structure can very well be
considered as a collection of randomly distributed atoms and
here Wilson statistics hold and the slope gives the
B-factor. If you do a fit in this region of a Guinier plot,
e.g. for high-resolution ApoF structures, the fit will be
very good.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
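<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">The dependence of the fitted B-factor on the chosen resolution window can be illustrated with a minimal sketch. It assumes the amplitude convention exp(-B/(4d^2)), so that B = -4 x slope of ln(amplitude) against 1/d^2. The function name <code>fit_b_factor</code> and the synthetic profile (a Gaussian fall-off with B = 50 Å^2 plus an invented bump near 10 Å standing in for secondary-structure features) are hypothetical and are not taken from any dataset discussed in this thread.</span></p>

```python
import numpy as np

def fit_b_factor(d_angstrom, amplitude, d_min, d_max):
    """Fit ln(A) = ln(A0) - (B / 4) / d^2 for d_min <= d <= d_max (in Å).

    Returns (B, R^2). The exp(-B / (4 d^2)) amplitude convention is an assumption.
    """
    d = np.asarray(d_angstrom, dtype=float)
    amp = np.asarray(amplitude, dtype=float)
    window = (d >= d_min) & (d <= d_max)
    x = 1.0 / d[window] ** 2          # Guinier-plot abscissa, 1/d^2
    y = np.log(amp[window])           # Guinier-plot ordinate, ln(amplitude)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    r_squared = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return -4.0 * slope, r_squared

# Synthetic profile (not real data): Gaussian fall-off with B = 50 Å^2 plus a
# made-up bump near 10 Å that breaks the random-atom assumption at low resolution.
s2 = np.linspace(0.0025, 0.25, 500)   # 1/d^2 from 20 Å to 2 Å
s = np.sqrt(s2)
d = 1.0 / s
amplitude = np.exp(-50.0 * s2 / 4.0) * (1.0 + 0.5 * np.exp(-((s - 0.1) / 0.03) ** 2))

b_high, r2_high = fit_b_factor(d, amplitude, d_min=2.0, d_max=3.0)
b_low, r2_low = fit_b_factor(d, amplitude, d_min=4.0, d_max=20.0)
print(f"3-2 Å fit:  B = {b_high:.1f} Å^2, R^2 = {r2_high:.4f}")
print(f"20-4 Å fit: B = {b_low:.1f} Å^2, R^2 = {r2_low:.4f}")
```

<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">In this toy profile the 3-2 Å window recovers the input B-factor with an R^2 very close to 1, while the 20-4 Å window, which includes the bump, gives a poorer linear fit.</span></p>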
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Regarding Carlos Oscar’s statement that the
fall-off is “arbitrary depending on its shape”, I do not
necessarily agree but I guess the point he is trying to make
is that the radially averaged fall-off will be modulated by
these effects (e.g. secondary structure) and this could be
considered a “fingerprint” of the protein in question. In
practice, when radially averaging over the entire structure,
the fall-off, including deviations from Wilson statistics,
will be very similar for most proteins unless they are e.g.
all-alpha, all-beta or contain a significant amount of nucleic
acid, as e.g. ribosomes do.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">This aside: Could it be that the difference you
observe comes from the fact that in your simulated model you
do not account for solvent? If you make a thought experiment
and place your protein in a “vacuum” then this would lead to
an overestimation of “contrast” at the molecule surface
compared to the situation where you have solvent. Taking
this to your simulated structure factor, the calculated
structure factor would be expected to be systematically
larger than the observed structure factor amplitude in
regions where, in the real situation, bulk solvent has a
noticeable effect (e.g. at 5 Å resolution and lower). </span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">If in your case you have derived your B_ideal
from a fit in this region (e.g. 20 – 4 Å), then the
calculated fall-off would probably be steeper than the
observed amplitude fall-off. If you have fit in the
high-resolution region then this should have no effect.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Not sure if it helps.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Best,</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Arjen</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(181,196,223);padding:3pt 0cm 0cm">
<p class="MsoNormal" style="margin-left:36pt"><b><span style="font-size:12pt;color:black">From:
</span></b><span style="font-size:12pt;color:black">3dem
<a href="mailto:3dem-bounces@ncmir.ucsd.edu" target="_blank"><3dem-bounces@ncmir.ucsd.edu></a> on behalf of Alexis
Rohou <a href="mailto:a.rohou@gmail.com" target="_blank"><a.rohou@gmail.com></a><br>
<b>Date: </b>Thursday, 27 August 2020 at 07:06<br>
<b>To: </b>Carlos Oscar Sorzano <a href="mailto:coss@cnb.csic.es" target="_blank"><coss@cnb.csic.es></a><br>
<b>Cc: </b>3dem <a href="mailto:3dem@ncmir.ucsd.edu" target="_blank"><3dem@ncmir.ucsd.edu></a><br>
<b>Subject: </b>Re: [3dem] what is the ideal B factor?<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt">Thank you
Carlos Oscar for summarizing your work so succinctly!
<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt">I just would
like to pick up on your concluding sentence:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal" style="margin-left:36pt">In
conclusion, from my point of view, there is not an optimal
decay valid for all proteins, but it depends on each
specific protein. And the shape of the decay is not a
straight line, but arbitrary depending on its shape.<u></u><u></u></p>
</blockquote>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt">If your
point of view is correct, this implies that ResLog plots
and the resulting B factors should not be compared to each
other if they were obtained from images of different
proteins. This would be quite a departure from the field's
consensus. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt">Cheers,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt">Alexis<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal" style="margin-left:36pt">On Wed, Aug
26, 2020 at 12:32 AM Carlos Oscar Sorzano <<a href="mailto:coss@cnb.csic.es" target="_blank">coss@cnb.csic.es</a>>
wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<p style="margin-left:36pt">Dear Alexis and all,<u></u><u></u></p>
<p style="margin-left:36pt">a very condensed summary
of what Jose Luis Vilas and the rest of us showed in the paper you
mentioned below is:<u></u><u></u></p>
<p style="margin-left:36pt">1. the Fourier spectrum of a
single atom is expected to decay in amplitude with
frequency (the only way it can be flat is if the atom is
infinitely thin). This is well known and follows from the
electron atomic scattering factors.<u></u><u></u></p>
<p style="margin-left:36pt">2. the Fourier spectrum of a
collection of atoms is mostly determined by the shape of
that collection, rather than by the specific nature of the
atoms involved (we performed extremely harsh
modifications to the atoms and the decay did not change
significantly).<u></u><u></u></p>
<p style="margin-left:36pt">3. the reasons normally
given for making the spectrum flat do not apply to
macromolecules, and the reason why B-factor sharpening
produces "nice" structures is mostly one of
visualization (higher amplitudes at high frequencies result in
sharper edges whose isosurfaces are easier to track and
fit with an atomic model).<u></u><u></u></p>
<p style="margin-left:36pt">Because of 2, the decay of
the radial average of the Fourier transform of a
macromolecule cannot be expected to follow any
particular shape (for instance, a straight line) a
priori, so we cannot estimate its slope (also a priori)
and force our 3D reconstruction to follow that slope. In
that regard, the question of what the expected slope is
is ill-posed. From my point of view, the amplitude
correction is much more meaningful when performed in the
spirit of LocScale of Jakobi and Sachse. You fit an
atomic model to the map, then convert it into a map,
estimate its decay and force the map to follow this
decay. In this way, the shape of the collection of atoms
(and their nature) is explicitly taken into account.
This process can be performed iteratively (with the
corrected map, you may refine the atomic model, refine
the map amplitudes again, ...). I also like the idea
that this process is performed locally.<u></u><u></u></p>
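<p style="margin-left:36pt">The idea of forcing a map to follow the amplitude decay of a model-derived map can be sketched as follows. This is a simplified, global illustration only and is not the LocScale implementation, which applies the scaling locally. The function names (<code>radial_shells</code>, <code>radial_amplitude_profile</code>, <code>match_radial_amplitudes</code>) are hypothetical, the voxel size is assumed to be 1 Å, and the "reference" here is random noise standing in for a map simulated from an atomic model.</p>

```python
import numpy as np

def radial_shells(shape):
    """Integer Fourier-shell index for every voxel of an FFT grid."""
    freqs = [np.fft.fftfreq(n) * n for n in shape]
    grid = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grid))
    return np.rint(radius).astype(int)

def radial_amplitude_profile(volume):
    """Mean Fourier amplitude in each shell (the radial average)."""
    amplitudes = np.abs(np.fft.fftn(volume))
    shells = radial_shells(volume.shape).ravel()
    sums = np.bincount(shells, weights=amplitudes.ravel())
    counts = np.maximum(np.bincount(shells), 1)
    return sums / counts

def match_radial_amplitudes(target_map, reference_map, eps=1e-12):
    """Rescale each Fourier shell of target_map so that its radial amplitude
    profile follows that of reference_map; phases are left untouched."""
    ft = np.fft.fftn(target_map)
    shells = radial_shells(target_map.shape)
    scale = radial_amplitude_profile(reference_map) / (
        radial_amplitude_profile(target_map) + eps
    )
    return np.real(np.fft.ifftn(ft * scale[shells]))

# Toy data: a random "reference" and a copy blurred with B = 20 Å^2.
rng = np.random.default_rng(0)
reference = rng.normal(size=(16, 16, 16))
axis_freq = np.fft.fftfreq(16)                       # cycles per Å at 1 Å/voxel
fx, fy, fz = np.meshgrid(axis_freq, axis_freq, axis_freq, indexing="ij")
s2 = fx ** 2 + fy ** 2 + fz ** 2
blurred = np.real(np.fft.ifftn(np.fft.fftn(reference) * np.exp(-20.0 * s2 / 4.0)))

restored = match_radial_amplitudes(blurred, reference)
```

<p style="margin-left:36pt">After the call, the radial amplitude profile of <code>restored</code> follows that of <code>reference</code>. Iterating with a refined atomic model, as described above, would amount to regenerating the reference map and repeating the call.</p>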
<p style="margin-left:36pt">If we do not want to wait
for the atomic model to make the amplitude correction,
we have devised an alternative based on the local
resolution (E. Ramirez-Aportela, J.L.Vilas, A. Glukhova,
R. Melero, P. Conesa, M. Martinez, D. Maluenda, J. Mota,
A. Jimenez, J. Vargas, R. Marabini, P.M. Sexton, J.M.
Carazo, C.O.S. Sorzano. Automatic local resolution-based
sharpening of cryo-EM maps. Bioinformatics 36: 765-772
(2020)). There is no guarantee that it will follow the
correct decay, but in practice we have observed that it
normally approximates the correct decay quite closely
(there are some examples of this in the paper).<u></u><u></u></p>
<p style="margin-left:36pt">The procedure above, correction
based on local resolution, is local and
does not require an atomic model. If we still want
to do a global correction without an atomic model,
procedures like the one of phenix (<a href="https://urldefense.com/v3/__https:/journals.iucr.org/d/issues/2018/06/00/ic5102/ic5102.pdf__;!!Mih3wA!Rn8ptfdW3i-ZJAsyJRtf84MeTMX0gbKZxAVhSD_WbmT6PxiAn3RA49bpffFudf71Mg$" target="_blank">https://journals.iucr.org/d/issues/2018/06/00/ic5102/ic5102.pdf</a>)
provide some clues based on the maximum continuity of
the isosurface.<u></u><u></u></p>
<p style="margin-left:36pt">Finally, we found that a
combination of DeepRes (E. Ramírez-Aportela, J. Mota, P.
Conesa, J.M. Carazo, C.O.S. Sorzano. DeepRes: A New Deep
Learning and aspect-based Local Resolution Method for
Electron Microscopy Maps. IUCrJ 6: 1054-1063 (2019))
and BlocRes (<a href="https://urldefense.com/v3/__https:/www.sciencedirect.com/science/article/pii/S1047847713002086__;!!Mih3wA!Rn8ptfdW3i-ZJAsyJRtf84MeTMX0gbKZxAVhSD_WbmT6PxiAn3RA49bpffHaOthRoQ$" target="_blank">https://www.sciencedirect.com/science/article/pii/S1047847713002086</a>)
could help to find a B-factor that does not result in
overfitting.<u></u><u></u></p>
<p style="margin-left:36pt">In conclusion, from my point
of view, there is not an optimal decay valid for all
proteins, but it depends on each specific protein. And
the shape of the decay is not a straight line, but
arbitrary depending on its shape.<u></u><u></u></p>
<p style="margin-left:36pt">I hope these reflections
helped a bit.<u></u><u></u></p>
<p style="margin-left:36pt">Cheers, Carlos Oscar<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-left:36pt">On
8/26/20 7:05 AM, Alexis Rohou wrote:<u></u><u></u></p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<div>
<p class="MsoNormal" style="margin-left:36pt">Dear
colleagues,<br>
<br>
I hope you may be able to help me get my head around
something.<br>
<br>
When considering the radially-averaged amplitudes of
an ideal 3D protein structure, the expectation (as
laid out in Fig1 of Rosenthal & Henderson, 2003
(PMID: 14568533), among others) is that in the
Wilson-statistics regime (q > 0.1 Å^-1, let’s
say), amplitudes will decay in a Gaussian manner, or
linearly when plotted on a log scale against q^2,
reflecting the decay of structure factors.
<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-left:36pt"><br>
This expectation is certainly met when simulating
maps from PDB files, as described nicely for
example by Carlos Oscar Sorzano and colleagues
recently (Vilas et al., 2020, PMID: 31911170).
Let’s call the rate of decay of this ideal curve
B_ideal, the “ideal” B factor.<br>
<br>
Assuming for a moment that noise has a flat
spectrum (reasonable so long as shot noise is
dominant), one may follow in Rosenthal &
Henderson’s footsteps and draw a horizontal line
on our plot to represent the noise floor. As more
averaging is carried out, the noise floor is
lowered relative to our protein’s amplitude
profile. As more particles are averaged (without
error, let’s say) the intersection between the
protein’s ideal radial amplitude profile and the
noise floor moves to higher and higher
frequencies.<br>
<br>
This is the basis for the so-called ResLog plots,
where one charts the resolution as a function of
the number of averaged particles. The slope of the
ResLog plot is related to the slope of the radial
amplitude profile of the protein. Assuming no
additional sources of errors (i.e. ideal
instrument and no processing errors), B_ideal (the
slope of the ideal protein amplitude profile) can
be computed from the slope of the ResLog plot via
B_ideal = 2.0/slope.<br>
<br>
Now, to my question. By looking at the slope of a
schematic Guinier plot generated using Wilson
statistics and atomic scattering factors for
electrons, I estimated a B_ideal of approximately
50 Å^2 (a decay of ~1.37 in the natural log of the amplitude
over 0.1 Å^-2 of q^2). The problem is that recent
high-resolution studies have reported
ResLog-estimated B factors of 32.5 Å^2 (Nakane et
al., 2020) and 36 Å^2 (Yip et al., 2020), leading
me to wonder what is wrong in the above model.<u></u><u></u></p>
</div>
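<p class="MsoNormal" style="margin-left:36pt">The noise-floor argument and the relation B_ideal = 2.0/slope can be checked numerically with a minimal sketch. It assumes the amplitude convention F(d) = F0 exp(-B/(4d^2)) and a flat noise floor that drops as 1/sqrt(N) with the number of averaged particles N. Both assumptions, the function names and the value of <code>f0_over_sigma</code> are illustrative and not taken from the papers cited above.</p>

```python
import numpy as np

def inv_d2_at_noise_floor(n_particles, b_factor, f0_over_sigma=1.0):
    """1/d^2 at which F0 * exp(-B / (4 d^2)) equals the noise floor sigma / sqrt(N).

    Solving for 1/d^2 gives (4 / B) * (ln(F0 / sigma) + 0.5 * ln(N)),
    i.e. a straight line in ln(N) with slope 2 / B.
    """
    n = np.asarray(n_particles, dtype=float)
    return (4.0 / b_factor) * (np.log(f0_over_sigma) + 0.5 * np.log(n))

def b_from_reslog(n_particles, inv_d2):
    """B factor from the slope of a ResLog plot (1/d^2 against ln N)."""
    slope, _ = np.polyfit(np.log(n_particles), inv_d2, 1)
    return 2.0 / slope

n = np.array([1e3, 1e4, 1e5, 1e6])
inv_d2 = inv_d2_at_noise_floor(n, b_factor=50.0, f0_over_sigma=2.0)
b_recovered = b_from_reslog(n, inv_d2)   # recovers the input 50 Å^2

# Slope read off the schematic Guinier plot: ln(amplitude) falls by ~1.37
# over 0.1 Å^-2, so under the same convention B = 4 * 1.37 / 0.1.
b_from_plot = 4.0 * 1.37 / 0.1
print(b_recovered, b_from_plot)
```

<p class="MsoNormal" style="margin-left:36pt">Under this convention the plot reading corresponds to about 55 Å^2, in line with the "approximately 50 Å^2" quoted above.</p>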
<div>
<p class="MsoNormal" style="margin-left:36pt"><br>
I see several possibilities:<br>
<br>
(1) B_ideal is actually significantly less than
50 Å^2. This would be consistent with the
empirical observation that “flattening” maps’
amplitude spectrum (i.e. assuming B_ideal = 0 Å^2)
gives very nice maps. Either:<u></u><u></u></p>
</div>
<blockquote style="margin-left:30pt;margin-right:0cm">
<div>
<p class="MsoNormal" style="margin-left:36pt">a.
I mis-estimated B_ideal when reading the
simulated amplitude spectrum plot. Has anyone
done this (i.e. fit a B factor to a simulated
map’s amplitude spectrum, or to a simulated
spectrum)? What did you find?<u></u><u></u></p>
</div>
</blockquote>
<blockquote style="margin-left:30pt;margin-right:0cm">
<div>
<p class="MsoNormal" style="margin-left:36pt">b.
The simulations using atomic scattering
factors and Wilson statistics do not correctly
capture the actual amplitude profile of
proteins, which is actually much flatter than
the atomic scattering factors suggest.<u></u><u></u></p>
</div>
</blockquote>
<div>
<p class="MsoNormal" style="margin-left:36pt">(2)
B_ideal actually is ~ 50 Å^2, but the assumption
of a flat noise spectrum is wrong. I guess that if
the true noise spectrum were also decaying as a
function of q^2, this would cause the ResLog plot
to report “too small” a B factor.<u></u><u></u></p>
</div>
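<p class="MsoNormal" style="margin-left:36pt">Possibility (2) can be explored with a small numerical sketch. It assumes that both the signal and the noise amplitudes decay as Gaussians in q, with B factors <code>b_signal</code> and <code>b_noise</code>, and that averaging N particles lowers the noise by 1/sqrt(N). In that model the crossing point moves as 1/d^2 = 2 ln(N) / (b_signal - b_noise), so the ResLog plot reports the difference of the two B factors. The noise B factor of 15 Å^2 used below is a purely hypothetical value chosen for illustration, not a measurement.</p>

```python
import numpy as np

def apparent_reslog_b(b_signal, b_noise, n_particles):
    """B factor a ResLog plot would report if the noise amplitudes also decay.

    Assumed model (not from the thread): signal ~ exp(-b_signal * s2 / 4),
    noise ~ exp(-b_noise * s2 / 4) / sqrt(N), with s2 = 1/d^2 and b_noise < b_signal.
    """
    s2 = np.linspace(0.0, 2.0, 2001)
    log_n = np.log(np.asarray(n_particles, dtype=float))
    crossings = []
    for ln_n in log_n:
        # ln(signal) - ln(noise): decreases with s2 and is zero at the crossing.
        log_ratio = -(b_signal - b_noise) * s2 / 4.0 + 0.5 * ln_n
        # np.interp needs increasing x values, hence the sign flip.
        crossings.append(np.interp(0.0, -log_ratio, s2))
    slope, _ = np.polyfit(log_n, np.array(crossings), 1)
    return 2.0 / slope

n = [1e3, 1e4, 1e5, 1e6]
b_flat_noise = apparent_reslog_b(50.0, 0.0, n)        # flat noise: reports 50
b_decaying_noise = apparent_reslog_b(50.0, 15.0, n)   # decaying noise: reports 35
print(b_flat_noise, b_decaying_noise)
```

<p class="MsoNormal" style="margin-left:36pt">This only shows that a decaying noise spectrum would lower the reported value; whether real noise spectra decay this way is the open question.</p>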
<div>
<p class="MsoNormal" style="margin-left:36pt"><br>
What do you think? <br>
<br>
Cheers,<br>
Alexis<u></u><u></u></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:36pt"><br>
<br>
<u></u><u></u></p>
<pre style="margin-left:36pt">_______________________________________________<u></u><u></u></pre>
<pre style="margin-left:36pt">3dem mailing list<u></u><u></u></pre>
<pre style="margin-left:36pt"><a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a><u></u><u></u></pre>
<pre style="margin-left:36pt"><a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.ncmir.ucsd.edu_mailman_listinfo_3dem&d=DwMFaQ&c=XYzUhXBD2cD-CornpT4QE19xOJBbRy-TBPLK0X9U2o8&r=D1ILr-LNOMGqt1qo-zsdtVsmr4I50KRZdcn6bv1MFNw&m=cIQ9kDZ2XD1GyeGVoYRZijEGtEi7t5R_FnZ8g1qbHl0&s=geo8EXc_Gnz8XpAmgRX12qTby_h7HIC3ar4LnxfFpSY&e=" target="_blank">https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem</a><u></u><u></u></pre>
</blockquote>
</div>
</blockquote>
</div>
</div>
<br>
</blockquote>
</div>
</blockquote></div>