[3dem] [ccpem] K-means clustering algorithm

Penczek, Pawel A Pawel.A.Penczek at uth.tmc.edu
Fri Aug 18 12:47:16 PDT 2017


Dear Leonardo,

you are correct.  In the effort to make the text as brief as possible, some definitions
were eliminated.

CCC means cross-correlation coefficient, that is, a single number that reflects the similarity of two
images, computed without changing their orientations.
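To make the definition concrete, here is a minimal sketch of a CCC in Python with NumPy. The function name ccc and the choice to return 0 for a constant image are assumptions made for illustration only; this is not code from any particular package.

```python
import numpy as np

def ccc(image, template):
    """Correlation coefficient of two same-sized images, in the range [-1, 1].

    Both images are used exactly as given: no rotation or shift is applied.
    """
    a = np.asarray(image, dtype=float).ravel()
    b = np.asarray(template, dtype=float).ravel()
    # Remove the mean so that a constant offset does not affect the result.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # Assumption: a constant image has no defined correlation; return 0.
        return 0.0
    return float((a * b).sum() / denom)
```

An image compared with itself gives 1, with its contrast-inverted copy gives -1, and a linear change of scale or offset leaves the value unchanged.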

ccf means cross-correlation function, i.e., a collection of correlation coefficients computed
for various orientations of an image with respect to the template.  Further, the cryptic
text in Figure 3, “Calculation of 2D ccfs (psi, s_x, s_y)”, is meant to say that
cross-correlation functions are computed between a 2D image and a template over a set of possible
in-plane angles and shifts (translations).  The resulting ccfs are therefore 3D, i.e., functions of three
variables: the in-plane angle and two translations.
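As an illustration only, a brute-force ccf over (psi, s_x, s_y) might be sketched as follows. All names here (ccc, rotate_nn, ccf, best_alignment) are hypothetical. The nearest-neighbour rotation and the cyclic shifts via np.roll are simplifications chosen to keep the example short, and the sign convention of the rotation is arbitrary. Production codes typically use interpolation and FFT-based correlation instead of explicit loops.

```python
import numpy as np

def ccc(a, b):
    """Correlation coefficient of two same-sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def rotate_nn(img, psi_deg):
    """Rotate about the image centre with nearest-neighbour sampling (a simplification)."""
    n0, n1 = img.shape
    c0, c1 = (n0 - 1) / 2.0, (n1 - 1) / 2.0
    y, x = np.indices(img.shape)
    t = np.deg2rad(psi_deg)
    # Inverse mapping: for every output pixel, find the source pixel.
    ys = np.cos(t) * (y - c0) + np.sin(t) * (x - c1) + c0
    xs = -np.sin(t) * (y - c0) + np.cos(t) * (x - c1) + c1
    yi = np.rint(ys).astype(int)
    xi = np.rint(xs).astype(int)
    inside = (yi >= 0) & (yi < n0) & (xi >= 0) & (xi < n1)
    out = np.zeros(img.shape, dtype=float)
    out[inside] = img[yi[inside], xi[inside]]
    return out

def ccf(image, template, angles, shifts):
    """Return a 3D array indexed by (psi, s_x, s_y), one CCC per trial orientation."""
    out = np.empty((len(angles), len(shifts), len(shifts)))
    for i, psi in enumerate(angles):
        rotated = rotate_nn(image, psi)
        for j, sx in enumerate(shifts):
            for k, sy in enumerate(shifts):
                # Cyclic shift: sy along rows (axis 0), sx along columns (axis 1).
                moved = np.roll(rotated, (sy, sx), axis=(0, 1))
                out[i, j, k] = ccc(moved, template)
    return out

def best_alignment(image, template, angles, shifts):
    """Return the (psi, s_x, s_y) at the ccf maximum and the CCC value there."""
    f = ccf(image, template, angles, shifts)
    i, j, k = np.unravel_index(np.argmax(f), f.shape)
    return angles[i], shifts[j], shifts[k], float(f[i, j, k])
```

The array returned by ccf has three indices, one per variable, which is the sense in which the ccf of a 2D image is 3D. The position of its maximum gives the alignment parameters, and the value at the maximum is the CCC of the aligned image with the template.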

As for the exact details of the RELION implementation, they would be better explained by its author.
However, as we argue in the Supplement of the Primer, ML-based (maximum-likelihood) clustering can be seen
as a generalization of the simple K-means algorithm or, vice versa, K-means as a simplification of the ML method.
The cited reference (it is available on the net and can be read in a browser):
MacKay, D.J.C. (2003). Information Theory, Inference and Learning Algorithms
(Cambridge, Cambridge University Press).
gives a very good and accessible introduction to clustering, the mixture-of-Gaussians approach, and other
related matters, and in particular contains a discussion of why some ML-based methods
can suffer from numerical instabilities.
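The relation between the two can be sketched on plain data vectors as follows. This is a toy illustration, not the RELION algorithm and not code from the Primer. The names responsibilities, update_centers and cluster are hypothetical, and the mixture is assumed to have equal weights and a single fixed isotropic width sigma. With sigma=None each point is assigned entirely to its nearest center (K-means); with a finite sigma each point is shared among the centers according to Gaussian weights (soft assignment, as in an ML mixture model).

```python
import numpy as np

def responsibilities(data, centers, sigma=None):
    """Assignment weights, shape (n_points, n_clusters); each row sums to 1."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        # Hard assignment (K-means): all weight goes to the nearest center.
        r = np.zeros_like(d2)
        r[np.arange(len(data)), d2.argmin(axis=1)] = 1.0
        return r
    # Soft assignment: Gaussian weights, computed stably in log space.
    logp = -d2 / (2.0 * sigma ** 2)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)

def update_centers(data, r, old_centers):
    """Weighted means of the data; a cluster with zero weight keeps its old center."""
    weight = r.sum(axis=0)
    sums = r.T @ data
    out = np.array(old_centers, dtype=float)
    nonempty = weight > 0
    out[nonempty] = sums[nonempty] / weight[nonempty, None]
    return out

def cluster(data, centers, sigma=None, n_iter=20):
    """Alternate assignment and center update for a fixed number of iterations."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        r = responsibilities(data, centers, sigma)
        centers = update_centers(data, r, centers)
    return centers, responsibilities(data, centers, sigma)
```

As sigma becomes small compared with the distances between clusters, the soft weights approach 0 and 1 and the procedure behaves like K-means. Holding sigma fixed is a deliberate simplification here; letting each cluster estimate its own width is one of the places where the instabilities discussed by MacKay can appear.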

Regards,
—
Pawel A. Penczek, Ph.D.
Professor
Structural Biology Imaging Center, Director
The University of Texas
phone: 713-500-5416
fax: 713-500-0652
https://med.uth.edu/bmb/faculty/pawel-a-penczek/





On 18/08/2017 08:05, Leonardo Feletto wrote:
Dear all,

I am a newbie in using image analysis software and in studying image analysis in general. I was reading this review

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4409659/
to better understand the k-means clustering algorithm, which underlies many processes in image analysis. I had some trouble figuring out something in this figure:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4409659/figure/F3/
I cannot understand the meaning of the abbreviations CCCs and ccfs used to describe the processes taking place while the k-means clustering algorithm is running. In addition, I would like to understand whether 3D classification in RELION also relies on a variation of this process, integrating a Bayesian approach to correctly split the particle dataset into the different models eventually produced.

I hope someone can answer my questions.

Best regards


Leonardo

