Dataset Open Access

proteinNet3D

Li, Rui; Yushkevich, Artsemi; Kudryashev, Misha; Yakimovich, Artur


JSON-LD (schema.org) Export

{
  "description": "<p>ProteinNet3D is a curated large-scale dataset of 3D macromolecular density volumes designed to support representation learning and benchmarking in structural biology. The dataset is derived from the publicly available Electron Microscopy Data Bank (EMDB), a comprehensive repository of experimentally determined cryo-electron microscopy (cryo-EM) maps spanning diverse macromolecules, molecular assemblies, and subcellular structures.</p>\n\n<p>ProteinNet3D focuses specifically on individual macromolecules resolved by single-particle analysis (SPA) or subtomogram averaging (STA), ensuring methodological consistency across samples. To emphasize biologically meaningful structures while avoiding extreme cases, entries were restricted to a molecular weight range of 100&ndash;1500 kDa. This criterion excludes small domains and excessively large complexes, resulting in a dataset well-suited for learning size-robust structural representations.</p>\n\n<p>All volumes are standardized through isotropic resampling, spatial normalization to a fixed grid (64&sup3; voxels), and intensity normalization to zero mean and unit variance. Background regions are masked using annotated contour levels to reduce noise contributions. To enhance diversity and rotational invariance, each structure is augmented with multiple random 3D rotations.</p>\n\n<p>Overall, ProteinNet3D comprises 26,110 processed samples and captures substantial structural heterogeneity, experimental variability, and realistic noise characteristics, making it a rigorous benchmark for 3D deep learning in cryo-EM.</p>", 
  "@id": "https://doi.org/10.14278/rodare.4516", 
  "sameAs": [
    "https://www.hzdr.de/publications/Publ-43018"
  ], 
  "@type": "Dataset", 
  "creator": [
    {
      "name": "Li, Rui", 
      "@type": "Person", 
      "@id": "https://orcid.org/0000-0002-3085-5267", 
      "affiliation": "Center for Advanced Systems Understanding"
    }, 
    {
      "name": "Yushkevich, Artsemi", 
      "@type": "Person", 
      "@id": "https://orcid.org/0000-0002-8729-9281", 
      "affiliation": "Max Delbr\u00fcck Center for Molecular Medicine"
    }, 
    {
      "name": "Kudryashev, Misha", 
      "@type": "Person", 
      "@id": "https://orcid.org/0000-0003-3550-6274", 
      "affiliation": "Max Delbr\u00fcck Center for Molecular Medicine"
    }, 
    {
      "name": "Yakimovich, Artur", 
      "@type": "Person", 
      "@id": "https://orcid.org/0000-0003-2458-4904", 
      "affiliation": "Center for Advanced Systems Understanding"
    }
  ], 
  "datePublished": "2026-02-18", 
  "identifier": "https://doi.org/10.14278/rodare.4516", 
  "keywords": [
    "cryo-EM", 
    "deep learning", 
    "proteins", 
    "EMDB"
  ], 
  "name": "proteinNet3D", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "@type": "DataDownload", 
      "fileFormat": "npz", 
      "contentUrl": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/test_202212.npz"
    }, 
    {
      "@type": "DataDownload", 
      "fileFormat": "npz", 
      "contentUrl": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/train_202212.npz"
    }, 
    {
      "@type": "DataDownload", 
      "fileFormat": "npz", 
      "contentUrl": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/val_202212.npz"
    }
  ], 
  "url": "https://rodare.hzdr.de/record/4516"
}
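
The description names the preprocessing steps (contour-level masking, isotropic resampling, a fixed 64&sup3; grid, zero-mean/unit-variance normalization, random 3D rotations) but not their implementation. The following is a minimal sketch of one way such a pipeline could look, not the authors' code. The step order, linear interpolation, the pad-then-rescale fitting strategy, zeroing voxels below the contour level, and all function names (`to_fixed_grid`, `random_rotation`, `normalize`) are assumptions made for illustration; the input is a synthetic blob, not an EMDB map.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.transform import Rotation

GRID = 64  # fixed grid edge length stated in the dataset description


def to_fixed_grid(volume, voxel_size, contour_level, grid=GRID):
    """Mask background, resample to isotropic voxels, and fit into grid^3."""
    vol = np.asarray(volume, dtype=np.float32)
    # Assumed masking rule: zero out every voxel below the contour level.
    vol = np.where(vol >= contour_level, vol, np.float32(0.0))
    # Isotropic resampling: stretch each axis to the finest input spacing.
    spacing = np.asarray(voxel_size, dtype=np.float64)
    vol = ndimage.zoom(vol, spacing / spacing.min(), order=1)
    # Center the map in a cube so the final rescale keeps the aspect ratio.
    side = max(vol.shape)
    pad = [((side - s) // 2, side - s - (side - s) // 2) for s in vol.shape]
    vol = np.pad(vol, pad, mode="constant")
    # Rescale the cube to the fixed grid.
    return ndimage.zoom(vol, grid / side, order=1)


def random_rotation(vol, rng):
    """Rotate about the volume center by a uniformly random 3D rotation."""
    # A normalized 4D Gaussian sample is a uniformly distributed unit quaternion.
    rot = Rotation.from_quat(rng.normal(size=4)).as_matrix()
    center = (np.asarray(vol.shape) - 1) / 2.0
    # affine_transform maps output coordinates to input coordinates:
    # x_in = rot @ x_out + offset, so this offset keeps the center fixed.
    offset = center - rot @ center
    return ndimage.affine_transform(
        vol, rot, offset=offset, order=1, mode="constant", cval=0.0
    )


def normalize(vol):
    """Shift to zero mean and scale to unit variance."""
    std = float(vol.std())
    return (vol - vol.mean()) / (std if std > 0 else 1.0)


# Synthetic stand-in for a density map: a noisy Gaussian blob with
# anisotropic voxel spacing (values chosen only for the demonstration).
rng = np.random.default_rng(0)
zz, yy, xx = np.mgrid[:40, :50, :60]
blob = np.exp(-((zz - 20) ** 2 + (yy - 25) ** 2 + (xx - 30) ** 2) / 200.0)
raw = (blob + 0.05 * rng.normal(size=blob.shape)).astype(np.float32)

fixed = to_fixed_grid(raw, voxel_size=(2.0, 1.0, 1.0), contour_level=0.2)
out = normalize(random_rotation(fixed, rng))
print(out.shape, out.dtype)
```

Rotating before normalizing keeps the padded background at exactly zero during interpolation; if the published samples were normalized first, the fill value for the rotation would need to match the normalized background instead.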
                   All versions   This version
Views              40             40
Downloads          8              8
Data volume        73.0 GB        73.0 GB
Unique views       35             35
Unique downloads   7              7
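
The record lists three NumPy archives (`train_202212.npz`, `val_202212.npz`, `test_202212.npz`) but does not document the array names or shapes stored inside them. As a hedged sketch, the helper below reads only the `.npy` headers inside an `.npz` archive, so it can report names, shapes, and dtypes without loading the arrays into memory. The helper name `peek_npz` and the toy archive with its keys `volumes` and `labels` are hypothetical and are not the dataset's actual layout.

```python
import os
import tempfile
import zipfile

import numpy as np


def peek_npz(path):
    """Return {array_name: (shape, dtype_string)} by reading headers only."""
    summary = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as fp:
                version = np.lib.format.read_magic(fp)
                if version == (1, 0):
                    header = np.lib.format.read_array_header_1_0(fp)
                elif version == (2, 0):
                    header = np.lib.format.read_array_header_2_0(fp)
                else:
                    # Other header versions are not handled by this sketch.
                    summary[member[:-4]] = None
                    continue
                shape, _fortran_order, dtype = header
                summary[member[:-4]] = (shape, str(dtype))
    return summary


# Toy archive standing in for a downloaded split; the keys are made up.
toy_path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(
    toy_path,
    volumes=np.zeros((3, 4, 4, 4), dtype=np.float32),
    labels=np.arange(3, dtype=np.int64),
)
summary = peek_npz(toy_path)
for name, info in summary.items():
    print(name, info)
```

Once the real array names are known, `np.load(path)` returns a lazy archive object whose arrays are read only when indexed by name, which helps when a split does not fit comfortably in memory.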
