<div dir="ltr">Hi Carlos Oscar,<div><br></div><div>I wish I understood more about x-ray scattering, because I agree there are likely useful lessons there. But your point about SAXS is only really valid when considering small angles (low resolutions), isn't it? Can much be said about the identity or structure of a protein from the wide-angle x-ray scattering profiles?</div><div><br></div><div>Cheers,</div><div>Alexis</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 27, 2020 at 9:27 AM Carlos Oscar Sorzano <<a href="mailto:coss@cnb.csic.es">coss@cnb.csic.es</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>I totally agree with Arjen, and I think that all the ideas thrown
      in are useful in their context. Another thing to consider: SAXS
      curves are very much related to the topic we are discussing, are
      they not? (Allowing for the differences between X-ray photon and
      electron diffraction, frozen proteins and proteins in solution,
      possible mixtures of various conformations, etc.) Does the fact that
      different SAXS curves are measured for different proteins not
      show that there is no single Platonic curve for all
      proteins?</p>

    <p>Cheers, Carlos Oscar<br>

    </p>

    <div>On 27/08/2020 at 16:49, Arjen Jakobi
      - TNW wrote:<br>

    </div>

    <blockquote type="cite">

      <div>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Hi Alexis,</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">You bring up an interesting point.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">My understanding is that Wilson statistics

            assumes (and is strictly valid only for) independent and

            uniformly distributed (= random) atoms. This is why I think

            that Wilson statistics as derived in the original paper are

            primarily valid in the high-resolution (better than 3Å) part

            of the Wilson/Guinier plot. </span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">At lower resolution, repetitive features in
            real space, i.e. the regular paths that proteins (secondary
            structure) and nucleic acids (base stacking) follow in 3D
            space, as well as (ordered) solvent, give rise to characteristic
            features in the pair-distribution function at the corresponding
            spatial frequencies. This is what you
            typically see in a “Guinier plot”: the plot is in principle
            a (rotationally averaged) representation of the texture of
            the macromolecule and will contain characteristic deviations
            from the exponential (or linear in a log plot) decay expected
            from Wilson statistics, because at some resolutions/spatial
            frequencies the arrangement of atoms is not random. This is
            true regardless of whether you consider X-ray or EM
            experiments. This is also why I think B-factor
            estimates that include these regions are
            systematically off; the R^2 of the linear regression will be
            poor. Once you move to higher resolution, let us say
            3.0 Å and beyond, a protein structure can very well be
            considered a collection of randomly distributed atoms;
            here Wilson statistics hold and the slope gives the
            B-factor. If you do a fit in this region of a Guinier plot,
            e.g. for high-resolution ApoF structures, the fit will be
            very good.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>
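        <p class="MsoNormal">To illustrate the point about fit ranges, here is a minimal Python sketch (not taken from any of the packages discussed). It assumes the amplitude convention ln F = ln F0 - (B/4) q^2 with q = 1/d, and it uses a synthetic profile in which a hypothetical Gaussian bump near 5 Å stands in for secondary-structure texture. The function name fit_b_factor, the bump and all numbers are illustrative assumptions.</p>

```python
import numpy as np

def fit_b_factor(q, amplitude, d_max):
    """Fit ln(amplitude) = ln(A0) - (B/4) * q**2 over shells with d at or finer than d_max.

    q is the spatial frequency 1/d in 1/Å, d_max is the low-resolution limit in Å.
    Returns (B in Å^2, R^2 of the linear regression).
    """
    q = np.asarray(q, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    mask = q >= 1.0 / d_max              # keep only shells inside the fit range
    x = q[mask] ** 2
    y = np.log(amplitude[mask])
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - residuals.var() / y.var()
    return -4.0 * slope, r_squared

# Synthetic radial amplitude profile (assumption: B = 50 Å^2, arbitrary scale).
q = np.linspace(0.05, 0.5, 200)          # 20 Å to 2 Å
b_true = 50.0
wilson = np.exp(-b_true * q**2 / 4.0)    # Wilson-like Gaussian fall-off
# Hypothetical non-random feature centred at 5 Å (q = 0.2 1/Å).
bump = 1.0 + 0.5 * np.exp(-((q - 0.2) / 0.03) ** 2)
profile = wilson * bump

b_all, r2_all = fit_b_factor(q, profile, d_max=20.0)  # fit includes the bump
b_hi, r2_hi = fit_b_factor(q, profile, d_max=3.0)     # fit beyond 3 Å only

print(f"fit 20-2 Å: B = {b_all:.1f} Å^2, R^2 = {r2_all:.4f}")
print(f"fit  3-2 Å: B = {b_hi:.1f} Å^2, R^2 = {r2_hi:.4f}")
```

        <p class="MsoNormal">In this toy profile the high-resolution fit recovers the input B, while the fit that includes the bump has a lower R^2. How large the effect is for a real map depends on the structure.</p>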

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Regarding Carlos Oscar’s statement that the

            fall-off is “arbitrary depending on its shape”, I do not

            necessarily agree but I guess the point he is trying to make

            is that the radially averaged fall-off will be modulated by
            these effects (e.g. secondary structure) and this could be
            considered a “fingerprint” of the protein in question. In
            practice, when radially averaging over the entire structure,
            the fall-off, including deviations from Wilson statistics,
            will be very similar for most proteins unless they are e.g.
            all-alpha or all-beta, or contain a significant amount of nucleic
            acid, as e.g. ribosomes do.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">This aside: Could it be that the difference you

            observe comes from the fact that in your simulated model you

            do not account for solvent? If you make a thought experiment

            and place your protein in a “vacuum” then this would lead to

            an overestimation of “contrast” at the molecule surface

            compared to the situation where you have solvent. Taking

            this to your simulated structure factor, the calculated

            structure factor would be expected to be systematically

            larger than the observed structure factor amplitude in

            regions where in the real situation bulk solvent has a
            noticeable effect (e.g. 5 Å and lower resolution). </span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">If in your case you have derived your B_ideal

            from a fit in this region (e.g. 20 – 4 Å), then the

            calculated fall-off would probably be steeper than the

            observed amplitude fall-off. If you have fit in the

            high-resolution region then this should have no effect.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Not sure if it helps.</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Best,</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>

        <p class="MsoNormal"><span style="font-size:10.5pt;font-family:Helvetica;color:black" lang="EN-US">Arjen</span><span style="font-family:-webkit-standard,serif;color:black"><u></u><u></u></span></p>


        <div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(181,196,223);padding:3pt 0cm 0cm">

          <p class="MsoNormal" style="margin-left:36pt"><b><span style="font-size:12pt;color:black">From:

              </span></b><span style="font-size:12pt;color:black">3dem

              <a href="mailto:3dem-bounces@ncmir.ucsd.edu" target="_blank"><3dem-bounces@ncmir.ucsd.edu></a> on behalf of Alexis

              Rohou <a href="mailto:a.rohou@gmail.com" target="_blank"><a.rohou@gmail.com></a><br>

              <b>Date: </b>Thursday, 27 August 2020 at 07:06<br>

              <b>To: </b>Carlos Oscar Sorzano <a href="mailto:coss@cnb.csic.es" target="_blank"><coss@cnb.csic.es></a><br>

              <b>Cc: </b>3dem <a href="mailto:3dem@ncmir.ucsd.edu" target="_blank"><3dem@ncmir.ucsd.edu></a><br>

              <b>Subject: </b>Re: [3dem] what is the ideal B factor?<u></u><u></u></span></p>

        </div>

        <div>

          <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

        </div>

        <div>

          <p class="MsoNormal" style="margin-left:36pt">Thank you

            Carlos Oscar for summarizing your work so succinctly!

            <u></u><u></u></p>

          <div>

            <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt">I just would

              like to pick up on your concluding sentence:<u></u><u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

          </div>

          <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">

            <p class="MsoNormal" style="margin-left:36pt">In

              conclusion, from my point of view, there is not an optimal

              decay valid for all proteins, but it depends on each

              specific protein. And the shape of the decay is not a

              straight line, but arbitrary depending on its shape.<u></u><u></u></p>

          </blockquote>

          <div>

            <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt">If your

              point of view is correct, this implies that ResLog plots

              and the resulting B factors should not be compared to each

              other if they were obtained from images of different

              proteins. This would be quite a departure from the field's

              consensus. <u></u><u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt">Cheers,<u></u><u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt">Alexis<u></u><u></u></p>

          </div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

          </div>

        </div>

        <p class="MsoNormal" style="margin-left:36pt"><u></u> <u></u></p>

        <div>

          <div>

            <p class="MsoNormal" style="margin-left:36pt">On Wed, Aug

              26, 2020 at 12:32 AM Carlos Oscar Sorzano <<a href="mailto:coss@cnb.csic.es" target="_blank">coss@cnb.csic.es</a>>

              wrote:<u></u><u></u></p>

          </div>

          <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">

            <div>

              <p style="margin-left:36pt">Dear Alexis and all,<u></u><u></u></p>

              <p style="margin-left:36pt">a very condensed summary
                of what Jose Luis Vilas and the rest of us showed in the
                paper you mentioned below is that:<u></u><u></u></p>

              <p style="margin-left:36pt">1. the Fourier spectrum of a
                single atom is expected to decay in amplitude with
                frequency (it can only be flat if the atom is
                infinitely thin). This is well known and follows from the
                electron atomic scattering factors.<u></u><u></u></p>

              <p style="margin-left:36pt">2. the Fourier spectrum of a
                collection of atoms is mostly determined by the shape of
                that collection, more than by the specific nature of the
                atoms involved (we performed extremely harsh
                modifications to the atoms and the decay did not change
                significantly).<u></u><u></u></p>

              <p style="margin-left:36pt">3. the reasons normally
                given for making the spectrum flat do not apply to
                macromolecules, and the reason why B-factor sharpening
                produces "nice" structures is mostly one of visualization
                (higher amplitudes at high frequencies result in
                sharper edges whose isosurfaces are easier to track and
                fit with an atomic model).<u></u><u></u></p>

              <p style="margin-left:36pt">Because of 2, the decay of
                the radial average of the Fourier transform of a
                macromolecule cannot be expected a priori to follow any
                particular shape (for instance, a straight line) whose
                slope we could estimate (also a priori) and then force
                our 3D reconstruction to follow. In
                that regard, the question of the expected slope
                is ill-posed. From my point of view, the amplitude
                correction is much more meaningful when performed in the
                spirit of LocScale by Jakobi and Sachse. You fit an

                atomic model to the map, then convert it into a map,

                estimate its decay and force the map to follow this

                decay. In this way, the shape of the collection of atoms

                (and their nature) is explicitly taken into account.

                This process can be performed iteratively (with the

                corrected map, you may refine the atomic model, refine

                the map amplitudes again, ...). I also like the idea

                that this process is performed locally.<u></u><u></u></p>
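              <p style="margin-left:36pt">The model-based amplitude correction described above can be sketched roughly as follows. This is a global simplification in Python/NumPy, not the LocScale code itself: it rescales each Fourier shell of a map so that its mean amplitude matches that of a model-derived map. The function names are hypothetical, and the random "model map" and the Gaussian blur are stand-ins chosen only so that the example runs.</p>

```python
import numpy as np

def radial_shell_index(shape):
    """Integer shell index (radius in Fourier pixels) for every voxel of an FFT grid."""
    freqs = [np.fft.fftfreq(n) * n for n in shape]
    grid = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grid))
    return np.rint(radius).astype(int)

def radial_amplitude(ft, shells):
    """Mean Fourier amplitude in each shell."""
    flat = shells.ravel()
    sums = np.bincount(flat, weights=np.abs(ft).ravel())
    counts = np.bincount(flat)
    return sums / np.maximum(counts, 1)

def match_radial_amplitude(exp_map, model_map):
    """Scale exp_map shell by shell so its radial amplitude profile follows model_map."""
    shells = radial_shell_index(exp_map.shape)
    ft_exp = np.fft.fftn(exp_map)
    ft_model = np.fft.fftn(model_map)
    amp_exp = radial_amplitude(ft_exp, shells)
    amp_model = radial_amplitude(ft_model, shells)
    scale = amp_model / np.maximum(amp_exp, 1e-12)   # guard against empty shells
    # Phases are kept from the experimental map; only amplitudes are rescaled.
    return np.real(np.fft.ifftn(ft_exp * scale[shells]))

# Toy data: a random volume stands in for the map simulated from an atomic model,
# and a Gaussian fall-off in Fourier space stands in for the experimental damping.
rng = np.random.default_rng(0)
model_map = rng.normal(size=(16, 16, 16))
shells = radial_shell_index(model_map.shape)
blur = np.exp(-0.02 * shells.astype(float) ** 2)
exp_map = np.real(np.fft.ifftn(np.fft.fftn(model_map) * blur))

corrected = match_radial_amplitude(exp_map, model_map)
print(np.abs(corrected - model_map).max())
```

              <p style="margin-left:36pt">Because the toy damping depends only on the shell radius, the correction undoes it almost exactly here. With real data the phases are noisy and the model is imperfect, which is one reason to iterate and to work locally.</p>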

              <p style="margin-left:36pt">If we do not want to wait

                for the atomic model to make the amplitude correction,

                we have devised an alternative based on the local

                resolution (E. Ramirez-Aportela, J.L.Vilas, A. Glukhova,

                R. Melero, P. Conesa, M. Martinez, D. Maluenda, J. Mota,

                A. Jimenez, J. Vargas, R. Marabini, P.M. Sexton, J.M.

                Carazo, C.O.S. Sorzano. Automatic local resolution-based

                sharpening of cryo-EM maps. Bioinformatics 36: 765-772

                (2020)). There is no guarantee that it will follow the

                correct decay, but in practice we have observed that it

                normally approximates the correct decay quite closely

                (there are some examples of this in the paper).<u></u><u></u></p>

              <p style="margin-left:36pt">The local-resolution-based
                correction above is local and
                does not require an atomic model. If we still want

                to do a global correction without an atomic model,

                procedures like the one of phenix (<a href="https://journals.iucr.org/d/issues/2018/06/00/ic5102/ic5102.pdf" target="_blank">https://journals.iucr.org/d/issues/2018/06/00/ic5102/ic5102.pdf</a>)

                provide some clues based on the maximum continuity of

                the isosurface.<u></u><u></u></p>

              <p style="margin-left:36pt">Finally, we found that a

                combination of DeepRes (E. Ramírez-Aportela, J. Mota, P.

                Conesa, J.M. Carazo, C.O.S. Sorzano. DeepRes: A New Deep

                Learning and aspect-based Local Resolution Method for

                Electron Microscopy Maps. IUCrJ 6: 1054-1063 (2019))

                and BlocRes (<a href="https://www.sciencedirect.com/science/article/pii/S1047847713002086" target="_blank">https://www.sciencedirect.com/science/article/pii/S1047847713002086</a>)

                could help to find a B-factor that does not result in

                overfitting.<u></u><u></u></p>

              <p style="margin-left:36pt">In conclusion, from my point

                of view, there is not an optimal decay valid for all

                proteins, but it depends on each specific protein. And

                the shape of the decay is not a straight line, but

                arbitrary depending on its shape.<u></u><u></u></p>

              <p style="margin-left:36pt">I hope these reflections

                helped a bit.<u></u><u></u></p>

              <p style="margin-left:36pt">Cheers, Carlos Oscar<u></u><u></u></p>

              <div>

                <p class="MsoNormal" style="margin-left:36pt">On

                  8/26/20 7:05 AM, Alexis Rohou wrote:<u></u><u></u></p>

              </div>

              <blockquote style="margin-top:5pt;margin-bottom:5pt">

                <div>

                  <p class="MsoNormal" style="margin-left:36pt">Dear

                    colleagues,<br>

                    <br>

                    I hope you may be able to help me get my head around

                    something.<br>

                    <br>

                    When considering the radially-averaged amplitudes of

                    an ideal 3D protein structure, the expectation (as

                    laid out in Fig1 of Rosenthal & Henderson, 2003

                    (PMID: 14568533), among others) is that in the

                    Wilson-statistics regime (q > 0.1 Å^-1, let’s

                    say), amplitudes will decay in a Gaussian manner, or

                    linearly when plotted on a log scale against q^2,

                    reflecting the decay of structure factors.

                    <u></u><u></u></p>

                  <div>

                    <p class="MsoNormal" style="margin-left:36pt"><br>

                      This expectation is certainly met when simulating

                      maps from PDB files, as described nicely for

                      example by Carlos Oscar Sorzano and colleagues

                      recently (Vilas et al., 2020, PMID: 31911170).

                      Let’s call the rate of decay of this ideal curve

                      B_ideal, the “ideal” B factor.<br>

                      <br>

                      Assuming for a moment that noise has a flat

                      spectrum (reasonable so long as shot noise is

                      dominant), one may follow in Rosenthal &

                      Henderson’s footsteps and draw a horizontal line

                      on our plot to represent the noise floor. As more

                      averaging is carried out, the noise floor is

                      lowered relative to our protein’s amplitude

                      profile. As more particles are averaged (without

                      error, let’s say) the intersection between the

                      protein’s ideal radial amplitude profile and the

                      noise floor moves to higher and higher

                      frequencies.<br>

                      <br>

                      This is the basis for the so-called ResLog plots,

                      where one charts the resolution as a function of

                      the number of averaged particles. The slope of the

                      ResLog plot is related to the slope of the radial

                      amplitude profile of the protein. Assuming no

                      additional sources of errors (i.e. ideal

                      instrument and no processing errors), B_ideal (the

                      slope of the ideal protein amplitude profile) can

                      be computed from the slope of the ResLog plot via

                      B_ideal = 2.0/slope.<br>

                      <br>

                      Now, to my question. By looking at the slope of a

                      schematic Guinier plot generated using Wilson

                      statistics and atomic scattering factors for

                      electrons, I estimated a B_ideal of approximately

                      50 Å^2 (decay of ~ 1.37 natural log in amplitude

                      over 0.1 Å^-2). The problem is that recent

                      high-resolution studies have reported

                      ResLog-estimated B factors of 32.5 Å^2 (Nakane et

                      al., 2020) and 36 Å^2 (Yip et al., 2020), leading

                      me to wonder what is wrong in the above model.<u></u><u></u></p>
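                    <p class="MsoNormal" style="margin-left:36pt">The relation between the ResLog slope and B_ideal can be checked with a small Python sketch of the flat-noise model described above. It assumes the amplitude convention ln S(q) = ln S0 - (B/4) q^2 with q = 1/d and a noise floor that drops as 1/sqrt(N). The function name, the single-particle signal-to-noise value and the particle counts are hypothetical and chosen only for illustration.</p>

```python
import numpy as np

def crossover_q2(n_particles, b_ideal, ln_ratio_single=-4.0):
    """q^2 (in 1/Å^2) where the ideal amplitude profile meets a flat noise floor.

    Signal:  ln S(q) = ln S0 - (b_ideal / 4) * q^2
    Noise:   ln sigma - 0.5 * ln N   (flat spectrum, averaged over N particles)
    ln_ratio_single = ln(S0 / sigma) for one particle (hypothetical value).
    """
    n = np.asarray(n_particles, dtype=float)
    return (4.0 / b_ideal) * (ln_ratio_single + 0.5 * np.log(n))

# Hypothetical particle counts from 1e4 to 1e6.
n = np.logspace(4, 6, 9)
q2 = crossover_q2(n, b_ideal=50.0)

# ResLog plot: 1/d^2 against ln(N); its slope is 2 / B in this model.
slope = np.polyfit(np.log(n), q2, 1)[0]
b_from_reslog = 2.0 / slope

# Guinier-plot reading quoted above: ~1.37 natural-log units over 0.1 1/Å^2.
b_from_guinier = 4.0 * 1.37 / 0.1

print(f"B from ResLog slope:  {b_from_reslog:.1f} Å^2")
print(f"B from Guinier slope: {b_from_guinier:.1f} Å^2")
```

                    <p class="MsoNormal" style="margin-left:36pt">Under this convention the quoted Guinier reading corresponds to about 55 Å^2, which is in line with the "approximately 50 Å^2" estimate.</p>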

                  </div>

                  <div>

                    <p class="MsoNormal" style="margin-left:36pt"><br>

                      I see several possibilities:<br>

                      <br>

                      (1)   B_ideal is actually significantly less than

                      50 Å^2. This would be consistent with the

                      empirical observation that “flattening” maps’

                      amplitude spectrum (i.e. assuming B_ideal = 0 Å^2)

                      gives very nice maps. Either:<u></u><u></u></p>

                  </div>

                  <blockquote style="margin-left:30pt;margin-right:0cm">

                    <div>

                      <p class="MsoNormal" style="margin-left:36pt">a.

                            I mis-estimated B_ideal when reading the

                        simulated amplitude spectrum plot. Has anyone

                        done this (i.e. fit a B factor to a simulated

                        map’s amplitude spectrum, or to a simulated

                        spectrum)? What did you find?<u></u><u></u></p>

                    </div>

                  </blockquote>

                  <blockquote style="margin-left:30pt;margin-right:0cm">

                    <div>

                      <p class="MsoNormal" style="margin-left:36pt">b.

                            The simulations using atomic scattering

                        factors and Wilson statistics do not correctly

                        capture the actual amplitude profile of

                        proteins, which is actually much flatter than

                        the atomic scattering factors suggest.<u></u><u></u></p>

                    </div>

                  </blockquote>

                  <div>

                    <p class="MsoNormal" style="margin-left:36pt">(2)

                        B_ideal actually is ~ 50 Å^2, but the assumption

                      of a flat noise spectrum is wrong. I guess that if

                      the true noise spectrum were also decaying as a

                      function of q^2, this would cause the ResLog plot

                      to report “too small” a B factor<u></u><u></u></p>
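                    <p class="MsoNormal" style="margin-left:36pt">Possibility (2) can be explored with a short Python sketch. It assumes that both signal and noise amplitudes fall off as Gaussians in q (with q = 1/d), with B factors b_ideal and b_noise, and that averaging N particles lowers the noise by 1/sqrt(N). The function name and the value b_noise = 15 Å^2 are hypothetical. In this model the ResLog plot reports b_ideal - b_noise rather than b_ideal.</p>

```python
import numpy as np

def apparent_reslog_b(b_ideal, b_noise, n_particles, ln_ratio_single=-4.0):
    """B factor a ResLog plot would report if the noise spectrum also decays.

    Signal:  ln S(q) = ln S0    - (b_ideal / 4) * q^2
    Noise:   ln N(q) = ln sigma - (b_noise / 4) * q^2 - 0.5 * ln N
    Requires b_noise to be smaller than b_ideal so that the curves cross.
    ln_ratio_single = ln(S0 / sigma) for one particle (hypothetical value).
    """
    n = np.asarray(n_particles, dtype=float)
    # Crossover: (b_ideal - b_noise) / 4 * q^2 = ln(S0 / sigma) + 0.5 * ln N
    q2 = 4.0 * (ln_ratio_single + 0.5 * np.log(n)) / (b_ideal - b_noise)
    slope = np.polyfit(np.log(n), q2, 1)[0]   # slope of 1/d^2 against ln(N)
    return 2.0 / slope

n = np.logspace(4, 6, 9)                      # hypothetical particle counts
b_flat = apparent_reslog_b(50.0, 0.0, n)      # flat noise: reports b_ideal
b_decay = apparent_reslog_b(50.0, 15.0, n)    # decaying noise: reports less

print(f"flat noise:     B_ResLog = {b_flat:.1f} Å^2")
print(f"decaying noise: B_ResLog = {b_decay:.1f} Å^2")
```

                    <p class="MsoNormal" style="margin-left:36pt">This only shows that the arithmetic of possibility (2) works out. It says nothing about whether real noise spectra decay this way.</p>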

                  </div>

                  <div>

                    <p class="MsoNormal" style="margin-left:36pt"><br>

                      What do you think? <br>

                      <br>

                      Cheers,<br>

                      Alexis<u></u><u></u></p>

                  </div>

                </div>

                <p class="MsoNormal" style="margin-left:36pt"><br>

                  <br>

                  <u></u><u></u></p>

                <pre style="margin-left:36pt">_______________________________________________<u></u><u></u></pre>

                <pre style="margin-left:36pt">3dem mailing list<u></u><u></u></pre>

                <pre style="margin-left:36pt"><a href="mailto:3dem@ncmir.ucsd.edu" target="_blank">3dem@ncmir.ucsd.edu</a><u></u><u></u></pre>

                <pre style="margin-left:36pt"><a href="https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem" target="_blank">https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem</a><u></u><u></u></pre>

              </blockquote>

            </div>

          </blockquote>

        </div>

      </div>

      <br>


    </blockquote>

  </div>


</blockquote></div>