[3dem] CryoPPP - A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking

Liguo Wang wanguw11 at gmail.com
Mon Mar 27 08:23:49 PDT 2023

Dear colleagues,

We are pleased to announce a newly launched cryo-EM single-particle dataset for training AI-based particle picking programs, available at https://github.com/BioinfoMachineLearning/cryoppp .

As we know, particle picking is a critical step in reconstructing protein structures by single particle analysis, and machine learning is a promising approach to automating it. However, the development of machine learning methods is hindered by the lack of large, high-quality, manually labelled training data. To address this bottleneck, we created CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for single protein particle picking and analysis. It consists of 9,089 manually labelled cryo-EM micrographs from 32 non-redundant, representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The coordinates of the protein particles in these diverse, high-resolution micrographs were labelled by human experts. The labelling process was rigorously validated by both 2D particle class validation and 3D density map validation with the gold standard. CryoPPP is several times larger than any other particle dataset in the field and is expected to greatly facilitate the training and testing of machine learning and artificial intelligence methods for automated cryo-EM protein particle picking.

Please see the manuscript for details at https://www.biorxiv.org/content/10.1101/2023.02.21.529443v1 and try the new dataset at https://github.com/BioinfoMachineLearning/cryoppp .


Liguo Wang and Jianlin Cheng
