Dataset Open Access
Blangiardi, Francesco;
Ratliff, Hunter;
Kögler, Toni
Introduction
This dataset contains the simulation data used by the AI methods in _"Fast proton transport and neutron production in proton therapy using Fourier neural operators"_ [2]. It has been extracted from the corresponding PHITS dataset [1] related to the same work, and is used by the codebase provided in [2], which implements all important AI methods in the paper.
The purpose of this entry is to provide a more easily accessible version of the data in [2], ready to be used for AI applications. The dataset has been greatly reduced in size and put into a format that gives access to the phase space density of both protons and neutrons at each individual depth in the phantom, in the form of discretized histograms.
A concise description of the simulation setup is provided in [2]. Please refer to the paper for a detailed discussion, description and analysis, and for further results derived from this dataset.
General information
The phase space density data is divided into discretized histograms as defined in the related paper. This follows the approximation made in that paper, where only 4 dimensions are kept: the depth, radial distance (R), energy (E) and azimuthal divergence (θ) of the particles. The depth dimension is treated as a pseudo-time dimension, meaning that time is not provided within the data. To simulate different beams propagating through different materials, a total of 47 phantoms have been simulated, each with a unique starting energy. Phantoms have been divided into slabs along the depth dimension. Each slab is assumed to be of homogeneous material along the dimensions perpendicular to the beam axis, while the material differs from slab to slab.
The proton density is provided as the Monte Carlo simulated protons, binned into the defined discretizations whenever one of the surfaces of a slab is crossed. The neutron phase space density is instead provided as the angle, energy and radius distributions of the secondary neutrons produced within each slab. Both densities are to be considered as integrated with respect to time. For each slab, the energy deposited by the protons is also provided, as an energy deposition probability distribution along E and R.
Moreover, each of the 47 phantoms has been irradiated according to three different sets of treatment head parameters, leading to three datasets: ES8, ES9 and NES8. For the sake of reproducibility, the weights of each of the models discussed in [2] are also provided.
Parametrization
The densities are observed through the discretizations identified in the paper. In this work, the resolution along the beam depth is fixed to 0.5 mm and the energy resolution is set to 1 MeV and 2 MeV for the proton and neutron fluences respectively, while the radial distance and angle are handled differently for the two particles. For protons, they are discretized in logarithmically spaced bins, with the first bin also including 0, ranging up to 95.9 mm and 58.76° respectively. For neutrons, both dimensions are instead uniformly discretized, ranging from 0 up to 60 mm and 180° respectively. The R, E and θ dimensions are divided into 30x250x30 bins in the proton data and into 30x125x30 bins in the neutron data, and these histograms are provided at each discretized depth. The energy deposition data follows the same radial binning as the proton density, but its energy binning is logarithmic, ranging from 1.0e-3 up to 97.7 MeV.
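The binning described above can be sketched with NumPy as a quick consistency check. This is only an illustration: the bin counts, widths and upper limits come from the text, while the assumption that the energy axes start at 0 MeV and the upper edge of the first logarithmic bin (`first_upper`, set to a placeholder of 0.1) are not stated here and are hypothetical. The exact bin edges are stored in the data files themselves and should be preferred.

```python
import numpy as np

# Bin counts and widths stated in the text.
N_R, N_THETA = 30, 30
N_E_PROTON, DE_PROTON_MEV = 250, 1.0
N_E_NEUTRON, DE_NEUTRON_MEV = 125, 2.0


def uniform_edges(upper, n_bins, lower=0.0):
    """Return n_bins + 1 equally spaced bin edges on [lower, upper]."""
    return np.linspace(lower, upper, n_bins + 1)


def log_edges_with_zero(first_upper, last_upper, n_bins):
    """Log-spaced edges whose first bin is extended down to 0.

    first_upper (upper edge of the first bin) is NOT given in the text;
    it is a hypothetical parameter of this sketch.
    """
    upper_edges = np.geomspace(first_upper, last_upper, n_bins)
    return np.concatenate(([0.0], upper_edges))


# Neutron grid: uniform in R and theta, 2 MeV energy bins.
# ASSUMPTION: the energy axis starts at 0 MeV.
neutron_r = uniform_edges(60.0, N_R)            # mm
neutron_theta = uniform_edges(180.0, N_THETA)   # degrees
neutron_e = uniform_edges(N_E_NEUTRON * DE_NEUTRON_MEV, N_E_NEUTRON)  # MeV

# Proton grid: logarithmic in R and theta, 1 MeV energy bins.
# ASSUMPTION: 0.1 mm and 0.1 deg are placeholder first-bin upper edges.
proton_r = log_edges_with_zero(0.1, 95.9, N_R)           # mm
proton_theta = log_edges_with_zero(0.1, 58.76, N_THETA)  # degrees
proton_e = uniform_edges(N_E_PROTON * DE_PROTON_MEV, N_E_PROTON)      # MeV

# Histogram shapes per depth, in (R, E, theta) order as in the text.
proton_shape = (N_R, N_E_PROTON, N_THETA)
neutron_shape = (N_R, N_E_NEUTRON, N_THETA)
```

Under these assumptions both energy axes span the same 250 MeV range (250 bins of 1 MeV for protons, 125 bins of 2 MeV for neutrons), and the uniform neutron bins are 2 mm wide in R and 6° wide in θ.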
As already mentioned, the ES8, ES9 and NES8 datasets differ in terms of the treatment head parameters. More details about the specifics of each dataset can be found in [1]. As ES8 and ES9 share the same treatment head parameters with the exception of the intensity, the proton density is not provided for the ES9 dataset to limit storage size.
Model weights for each surrogate trained on each of the provided datasets (called MES8, MES9 and MNES8) are also provided, following the surrogate structure defined in [2]. In particular, each surrogate is composed of a proton model and a neutron model for both density and intensity prediction. Models can be used as detailed in the GitHub repository [3] related to [2].
File description
The aforementioned density discretizations are named internally "phits_logfull" for the protons and "hn_phits" for the neutrons, with the energy deposition discretization following the same convention as the protons. All files contained within this dataset are therefore named according to their discretization as either "phits_logfull_cube_protons_\<depth in millimeters\>_data.nc", "phits_logfull_cube_dose_\<depth in millimeters\>_data.nc" or "hn_phits_cube_neutrons_\<depth in millimeters\>_data.nc". Each .nc file contains an `xarray` variable holding the MC-approximated histogram and the details of the discretization, as well as important parameters such as the CT number of the considered slab, its density and the material's ID within the PHITS environment.
Surrogates are provided in separate .zip files. Each surrogate contains 4 subfolders, one for each surrogate component. The PDF components come in the form of PyTorch checkpoints encapsulating Fourier Neural Operator models defined through the `neuraloperator` package [4] [5], version 0.3.0. Intensity components are instead .pickle files containing XGBoost regressor objects defined through the `XGBoost` package [6]. Each component also comes with a pickled dictionary containing important metadata related to the model hyperparameters.
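As an illustration of the naming convention above, the following sketch splits a file name into its parts and opens a file with `xarray`. The helper names `parse_slab_filename` and `open_slab` are hypothetical and not part of the dataset or its codebase. Since the exact formatting of the depth in the file name is not specified here, it is kept as an opaque string, and `xarray.open_dataset` is used on the assumption that the files are regular netCDF files.

```python
import re
from pathlib import Path

# Naming convention quoted in the text. <depth> is kept as an opaque token
# because its exact formatting (e.g. decimal separator) is not specified.
FILE_PATTERN = re.compile(
    r"^(?P<discretization>phits_logfull|hn_phits)_cube_"
    r"(?P<quantity>protons|dose|neutrons)_(?P<depth>.+)_data\.nc$"
)


def parse_slab_filename(name):
    """Split a file name into discretization, quantity and depth token.

    Returns None when the name does not follow the convention.
    """
    match = FILE_PATTERN.match(Path(name).name)
    return match.groupdict() if match else None


def open_slab(path):
    """Open one slab file with xarray (imported lazily; must be installed)."""
    import xarray as xr

    return xr.open_dataset(path)
```

For example, `parse_slab_filename("hn_phits_cube_neutrons_3_data.nc")` yields the discretization `hn_phits`, the quantity `neutrons` and the depth token `3`; the depth value in this example is made up.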
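The pickled metadata dictionaries can be read with the Python standard library alone, as in the minimal sketch below. The helper name `load_pickled_metadata`, the file name and the dictionary key used in the demo are hypothetical; the actual file names and keys are defined by the archives, and loading the models themselves is covered by the GitHub repository [3]. Unpickling executes code, so it should only be done on files whose checksum has been verified.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_metadata(path):
    """Load a pickled metadata dictionary from a trusted file."""
    with Path(path).open("rb") as handle:
        metadata = pickle.load(handle)
    if not isinstance(metadata, dict):
        raise TypeError(f"expected a dict, got {type(metadata).__name__}")
    return metadata


# Round-trip demo with a made-up dictionary (file name and key are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metadata.pickle"
    demo_path.write_bytes(pickle.dumps({"n_modes": 16}))
    demo_metadata = load_pickled_metadata(demo_path)
```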
Folder Structure
The provided data consists of three different .zip files, one for each of the ES8, ES9 and NES8 datasets. Each .zip file comes already divided into train, validation and test splits on the basis of the starting energy. Within each split folder, simulations are represented by folders named in the format "\<Starting Energy\>MeV_05mm_800layers", each containing the related proton and neutron fluences in files with the previously specified naming convention.
It should be noted that, although the total size of the proposed dataset is around 7 GB, uncompressing the files requires 180.2 GB of disk space.
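The folder layout described above can be indexed with a short script such as the sketch below. The function name `index_simulations` is hypothetical, and the split folder names and energies used in the demo are made up for illustration; the script treats every first-level folder as a split and keeps the starting energy as a string, since its exact formatting is not specified here.

```python
import re
import tempfile
from pathlib import Path

# Folder naming convention quoted in the text.
SIM_DIR = re.compile(r"^(?P<energy>.+)MeV_05mm_800layers$")


def index_simulations(root):
    """Map each split folder name to its simulations' starting-energy tokens."""
    index = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        energies = []
        for sim_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            match = SIM_DIR.match(sim_dir.name)
            if match:
                energies.append(match.group("energy"))
        index[split_dir.name] = energies
    return index


# Demo on a throwaway tree; split names and energies are made up.
with tempfile.TemporaryDirectory() as tmp:
    for split, energy in [("train", "100"), ("train", "150"), ("test", "200")]:
        (Path(tmp) / split / f"{energy}MeV_05mm_800layers").mkdir(parents=True)
    demo_index = index_simulations(tmp)
```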
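Given the disk space needed for unpacking, it can be worth checking each downloaded archive against the MD5 checksum published in this record's file list before extracting it. The sketch below uses only the Python standard library; the helper names `md5_of` and `verify_download` are hypothetical, and the checksums are copied from the file list of this record.

```python
import hashlib
import tempfile
from pathlib import Path

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "ES8.zip": "346d0698efd23fa1fb549c435529b80c",
    "ES9.zip": "407589760a67b72dd42a8397a68428e4",
    "MES8.zip": "09c023d96f75da63c2cc7b806d6d7982",
    "MES9.zip": "6d3a2392e94e8dfad1425fb0f2e36fa6",
    "MNES8.zip": "df16efcaabe875815ba0893a78d9d404",
    "NES8.zip": "d18243d164c1c4e5bc9624593d1a9a54",
    "README.md": "0cc28a25ac5438454ce7f19a841f6469",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True when the file's MD5 matches the published value."""
    path = Path(path)
    return md5_of(path) == EXPECTED_MD5[path.name]


# Self-check of md5_of against the RFC 1321 test vector for b"abc".
with tempfile.TemporaryDirectory() as tmp:
    probe = Path(tmp) / "probe.bin"
    probe.write_bytes(b"abc")
    probe_md5 = md5_of(probe)
```

A call such as `verify_download("ES8.zip")` then returns `True` only if the local file matches the published checksum.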
References
[1] H. N. Ratliff, F. Blangiardi, PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms, Rodare (in preparation for submission) (2025).
[2] "Fast proton transport and neutron production in proton therapy using Fourier neural operators" (to be filled)
[3] F. Blangiardi, AI_phase_space_PT [Computer software], GitHub (2025). [https://github.com/f-blan/AI_phase_space_PT](https://github.com/f-blan/AI_phase_space_PT)
[4] J. Kossaifi, N. Kovachki, Z. Li, D. Pitt, M. Liu-Schiaffini, R. J. George, B. Bonev, K. Azizzadenesheli, J. Berner, A. Anandkumar, A library for learning neural operators (2024). arXiv:2412.10354.
[5] N. B. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces, CoRR abs/2108.08481 (2021).
[6] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, ACM, 2016, p. 785–794. doi:10.1145/2939672.2939785. URL http://dx.doi.org/10.1145/2939672.2939785
Acknowledgements
The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.
| Name | MD5 | Size |
|---|---|---|
| ES8.zip | 346d0698efd23fa1fb549c435529b80c | 4.6 GB |
| ES9.zip | 407589760a67b72dd42a8397a68428e4 | 431.6 MB |
| MES8.zip | 09c023d96f75da63c2cc7b806d6d7982 | 52.4 MB |
| MES9.zip | 6d3a2392e94e8dfad1425fb0f2e36fa6 | 53.1 MB |
| MNES8.zip | df16efcaabe875815ba0893a78d9d404 | 52.4 MB |
| NES8.zip | d18243d164c1c4e5bc9624593d1a9a54 | 3.8 GB |
| README.md | 0cc28a25ac5438454ce7f19a841f6469 | 8.6 kB |