Dataset Open Access
Li, Rui;
Yushkevich, Artsemi;
Kudryashev, Misha;
Yakimovich, Artur
{
"owners": [
1078
],
"links": {
"badge": "https://rodare.hzdr.de/badge/doi/10.14278/rodare.4516.svg",
"doi": "https://doi.org/10.14278/rodare.4516",
"conceptbadge": "https://rodare.hzdr.de/badge/doi/10.14278/rodare.4515.svg",
"conceptdoi": "https://doi.org/10.14278/rodare.4515",
"bucket": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f",
"html": "https://rodare.hzdr.de/record/4516",
"latest": "https://rodare.hzdr.de/api/records/4516",
"latest_html": "https://rodare.hzdr.de/record/4516"
},
"stats": {
"volume": 72977746720.0,
"unique_downloads": 7.0,
"version_unique_downloads": 7.0,
"unique_views": 35.0,
"downloads": 8.0,
"version_unique_views": 35.0,
"version_views": 40.0,
"version_downloads": 8.0,
"version_volume": 72977746720.0,
"views": 40.0
},
"doi": "10.14278/rodare.4516",
"metadata": {
"pub_id": "43018",
"title": "proteinNet3D",
"creators": [
{
"orcid": "0000-0002-3085-5267",
"name": "Li, Rui",
"affiliation": "Center for Advanced Systems Understanding"
},
{
"orcid": "0000-0002-8729-9281",
"name": "Yushkevich, Artsemi",
"affiliation": "Max Delbr\u00fcck Center for Molecular Medicine"
},
{
"orcid": "0000-0003-3550-6274",
"name": "Kudryashev, Misha",
"affiliation": "Max Delbr\u00fcck Center for Molecular Medicine"
},
{
"orcid": "0000-0003-2458-4904",
"name": "Yakimovich, Artur",
"affiliation": "Center for Advanced Systems Understanding"
}
],
"communities": [
{
"id": "health"
},
{
"id": "rodare"
}
],
"relations": {
"version": [
{
"is_last": true,
"count": 1,
"index": 0,
"parent": {
"pid_value": "4515",
"pid_type": "recid"
},
"last_child": {
"pid_value": "4516",
"pid_type": "recid"
}
}
]
},
"doi": "10.14278/rodare.4516",
"access_right_category": "success",
"publication_date": "2026-02-18",
"related_identifiers": [
{
"scheme": "url",
"identifier": "https://www.hzdr.de/publications/Publ-43018",
"relation": "isIdenticalTo"
},
{
"scheme": "doi",
"identifier": "10.14278/rodare.4515",
"relation": "isVersionOf"
}
],
"access_right": "open",
"description": "<p>ProteinNet3D is a curated large-scale dataset of 3D macromolecular density volumes designed to support representation learning and benchmarking in structural biology. The dataset is derived from the publicly available Electron Microscopy Data Bank (EMDB), a comprehensive repository of experimentally determined cryo-electron microscopy (cryo-EM) maps spanning diverse macromolecules, molecular assemblies, and subcellular structures.</p>\n\n<p>ProteinNet3D focuses specifically on individual macromolecules resolved by single-particle analysis (SPA) or subtomogram averaging (STA), ensuring methodological consistency across samples. To emphasize biologically meaningful structures while avoiding extreme cases, entries were restricted to a molecular weight range of 100–1500 kDa. This criterion excludes small domains and excessively large complexes, resulting in a dataset well-suited for learning size-robust structural representations.</p>\n\n<p>All volumes are standardized through isotropic resampling, spatial normalization to a fixed grid (64³ voxels), and intensity normalization to zero mean and unit variance. Background regions are masked using annotated contour levels to reduce noise contributions. To enhance diversity and rotational invariance, each structure is augmented with multiple random 3D rotations.</p>\n\n<p>Overall, ProteinNet3D comprises 26,110 processed samples and captures substantial structural heterogeneity, experimental variability, and realistic noise characteristics, making it a rigorous benchmark for 3D deep learning in cryo-EM.</p>",
"keywords": [
"cryo-EM",
"deep learning",
"proteins",
"EMDB"
],
"license": {
"id": "CC-BY-4.0"
},
"doc_id": "1",
"resource_type": {
"title": "Dataset",
"type": "dataset"
}
},
"conceptdoi": "10.14278/rodare.4515",
"created": "2026-02-18T10:16:42.960521+00:00",
"revision": 4,
"updated": "2026-02-23T08:29:23.687742+00:00",
"files": [
{
"type": "npz",
"key": "test_202212.npz",
"links": {
"self": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/test_202212.npz"
},
"checksum": "md5:c2ca2945e73a5eaa970940806fc542af",
"bucket": "8b333ff7-eb02-409c-a8a5-bf754a9fc52f",
"size": 2516582756
},
{
"type": "npz",
"key": "train_202212.npz",
"links": {
"self": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/train_202212.npz"
},
"checksum": "md5:adb8edf49732c0f661a146589b53552b",
"bucket": "8b333ff7-eb02-409c-a8a5-bf754a9fc52f",
"size": 20131610980
},
{
"type": "npz",
"key": "val_202212.npz",
"links": {
"self": "https://rodare.hzdr.de/api/files/8b333ff7-eb02-409c-a8a5-bf754a9fc52f/val_202212.npz"
},
"checksum": "md5:45de8755b7ddbb4ccbaffd5cda4c6615",
"bucket": "8b333ff7-eb02-409c-a8a5-bf754a9fc52f",
"size": 2516582756
}
],
"id": 4516,
"conceptrecid": "4515"
}
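
Each entry under `files` in the record above carries an `md5:` checksum, which can be used to confirm that a multi-gigabyte download arrived intact. The following is a minimal sketch, assuming the three files were saved under their original `key` names in one local directory. The helper names `md5_of_file` and `verify_download` are illustrative and not part of any Rodare tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the "files" entries of the record above.
EXPECTED_MD5 = {
    "test_202212.npz": "c2ca2945e73a5eaa970940806fc542af",
    "train_202212.npz": "adb8edf49732c0f661a146589b53552b",
    "val_202212.npz": "45de8755b7ddbb4ccbaffd5cda4c6615",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory):
    """Map each expected file name to True or False (checksum match), or None if missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

Calling `verify_download("path/to/downloads")` returns a dictionary keyed by file name. Reading in chunks matters here because the training file alone is about 20 GB.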
| | All versions | This version |
|---|---|---|
| Views | 40 | 40 |
| Downloads | 8 | 8 |
| Data volume | 73.0 GB | 73.0 GB |
| Unique views | 35 | 35 |
| Unique downloads | 7 | 7 |
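
The record description mentions two per-volume steps that are easy to illustrate: masking background voxels with the annotated contour level and normalizing intensities to zero mean and unit variance. The sketch below is one plausible reading of those steps, not the authors' pipeline. The order of operations, the choice to zero masked voxels, and computing the statistics over the whole grid are all assumptions, and the function name `mask_and_normalize` is hypothetical. Isotropic resampling to the 64³ grid and the random 3D rotations are not covered.

```python
import numpy as np


def mask_and_normalize(volume, contour_level):
    """Zero voxels below the contour level, then scale to zero mean and unit variance.

    Assumption: statistics are taken over the full grid after masking.
    The dataset description does not specify this detail.
    """
    volume = np.asarray(volume, dtype=np.float32)
    # Background masking: keep density at or above the contour level.
    masked = np.where(volume >= contour_level, volume, 0.0).astype(np.float32)
    centered = masked - masked.mean()
    std = masked.std()
    # Guard against constant volumes, where the variance is zero.
    if std == 0:
        return centered
    return centered / std
```

For example, `mask_and_normalize(density, contour_level=0.5)` applied to a `(64, 64, 64)` array returns an array of the same shape in `float32`.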