<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="/static/xsl/oai2.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-05-09T22:17:00Z</responseDate>
  <request metadataPrefix="oai_dc" verb="ListRecords" set="user-novo">https://rodare.hzdr.de/oai2d</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:rodare.hzdr.de:4444</identifier>
        <datestamp>2026-02-23T08:29:00Z</datestamp>
        <setSpec>openaire_data</setSpec>
        <setSpec>user-novo</setSpec>
        <setSpec>user-rodare</setSpec>
        <setSpec>user-health</setSpec>
        <setSpec>user-oncoray</setSpec>
        <setSpec>user-hzdr</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Müller, Sara Tabea</dc:creator>
          <dc:creator>Akgun, Bora</dc:creator>
          <dc:creator>Bekkevoll, Anna</dc:creator>
          <dc:creator>Blorstad Thu, Sander</dc:creator>
          <dc:creator>Engebertsen, Anders</dc:creator>
          <dc:creator>Jagt, Thyrza</dc:creator>
          <dc:creator>Pausch, Guntram</dc:creator>
          <dc:creator>Phan, Than Binh</dc:creator>
          <dc:creator>Ratliff, Hunter</dc:creator>
          <dc:creator>Römer, Katja</dc:creator>
          <dc:creator>Smeland Ytre-Hauge, Kristian</dc:creator>
          <dc:creator>Stokkevag, Camilla</dc:creator>
          <dc:creator>Tarakoglu, Engin</dc:creator>
          <dc:creator>Turko, Joseph</dc:creator>
          <dc:creator>Wolf, Andreas</dc:creator>
          <dc:creator>Yazici, Berkay</dc:creator>
          <dc:creator>Meric, Ilker</dc:creator>
          <dc:creator>Kögler, Toni</dc:creator>
          <dc:date>2026-01-22</dc:date>
          <dc:description>This data set contains the experimental raw data of the NOVO compact detector array (NOVCoDA) from the measurement campaign at OncoRay Dresden, Germany in December 2025. This experiment is the first test of the NOVCoDA prototype at a clinical proton beam. The aim of the measurement campaign was to characterize the response behavior of the scintillators used under high-energy neutron irradiation (especially the pulse-shape discrimination behavior), as well as to test the imaging, range-shift, and rate-processing capabilities of the system.

Setup:

Measurements 01.12.-09.12.: miniNOVO (version 5): The prototype consists of 12 organic scintillator elements (6 × M600 and 6 × organic glass scintillator) with dimensions of 12×12×140 mm³

Measurements 10.12.-12.12.: miniNOVO (version 5.1): The prototype consists of 14 organic scintillator elements (7 × M600 and 7 × organic glass scintillator) with dimensions of 12×12×140 mm³

The scintillator bars have dual readout composed of


	2 × Hamamatsu R7378A (1’’) PMTs [1],
	4 × Hamamatsu S14161-3050HS-04 SiPM [1] + U301 [2] (+ custom front-end electronics) (only 2 × for miniNOVO version 5) and
	8 × Hamamatsu R2059-01 (2’’) PMTs [1].


The data was recorded with two CAEN V1730S [3] 14-bit, 16-channel digitizers (named dta and dtb) at a sampling frequency of 425.216 MS/s.

The detector array was placed at 90° w.r.t. the fixed-beam research beam line of the Dresden proton therapy facility at OncoRay, Dresden. A cylindrical PMMA phantom (solid/with air gap/with bone insert) was placed centrally in front of the detector head and irradiated with proton energies from 75 to 225 MeV and varying currents between 10 and 2000 pA at various positions (± 180 mm w.r.t. central position).

In addition, measurements with the online-adaptive RAPTOR phantom in different configurations (air insert/bone insert/swelling/no swelling) were performed.

Data structure:

The directory DOI_calibration contains the position calibration measurements with a Sr-90 source. Energy_calibration holds the energy calibration measurements with Na-22 and Cs-137 sources. efficiency_measurement contains the measurements with a Na-22 source at the phantom position (with and without PMMA phantom). PMMA_phantom is dedicated to all beam measurements with the cylindrical phantom (with and without various inserts), while online_adaptive_phantom provides the same for the measurements with the RAPTOR phantom. All measurements for which waveforms were recorded are stored in waveforms. backend_comparison comprises repeat measurements with the cylindrical PMMA phantom in which one detector (dtb, ch2 and ch3) was connected to an alternative back-end system for comparison. All other measurements and test runs are in the tests folder.

The PDF files 2025-12_ NOVO-first-proton-facility-tests-PGTV-Wiki.pdf and 2025-12_ NOVO-first-proton-facility-tests-Week-2-PGTV-Wiki.pdf hold information about the setup of the experiment and more details about the individual measurements (elog). The file 2025-12_ NOVO-first-proton-facility-tests-Run-List-PGTV-Wiki.pdf contains the run list with all parameters for each measurement.

The archive 2025-12_OncoRay_HEBC_Monitor_Data.zip contains CSV files with the beam control metadata (one file for each measurement day).

The main configuration file for the digitizers is called template_main.cfg.

Data Format:

All data is saved in ROOT files, each of which contains two ROOT trees, one for each digitizer, named “dta” and “dtb”. The trees hold the following information in the form of list-mode data for each event: digitizer channel ("channel"), charge integrated over long gate ("Elong"), charge integrated over short gate ("Eshort"), digitizer flags ("flags") and the timestamp (separated into three parts: "timestamp", "timestampExtended", "time"). Additionally, the ROOT files contain a TArrayD which holds the start time of the measurement in UNIX time at its first index and the stop time at its second.

There are two configuration files for each data file (named “filename_dtx.config”), one for each digitizer card. These text files contain the information about the digitizer settings for each run.
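
As an illustration of how the two charge integrals can be combined, the following minimal sketch computes a pulse-shape discrimination (PSD) parameter per event. The ratio (Elong - Eshort) / Elong is a common convention for charge-integrating digitizers and is an assumption here, not a definition taken from this data set; the function name psd_parameter and the sample values are hypothetical.

```python
def psd_parameter(elong, eshort):
    """Return (Elong - Eshort) / Elong, or None if Elong is not positive.

    Assumption: this tail-to-total ratio is a common PSD convention;
    the data set itself does not prescribe a PSD definition.
    """
    if not elong > 0:
        return None  # avoid division by zero and unphysical integrals
    return (elong - eshort) / elong


# Hypothetical (Elong, Eshort) pairs standing in for list-mode events.
events = [(1000, 800), (1000, 650), (0, 0)]
psd_values = [psd_parameter(el, es) for el, es in events]
print(psd_values)
```

In practice the pairs would come from the "Elong" and "Eshort" branches of the "dta" and "dtb" trees; reading those branches is left out here to keep the sketch free of external dependencies.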

[1] Hamamatsu Photonics Deutschland GmbH, Arzbergerstr. 10, 82211 Herrsching am Ammersee, Germany.

[2] Target Systemelektronik, Heinz-Fangman-Straße 4, 42287 Wuppertal, Germany. 

[3] CAEN S.p.A., Via Vetraia 11, 55049 Viareggio (LU), Italy.</dc:description>
          <dc:description>The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118</dc:description>
          <dc:identifier>https://rodare.hzdr.de/record/4444</dc:identifier>
          <dc:identifier>10.14278/rodare.4444</dc:identifier>
          <dc:identifier>oai:rodare.hzdr.de:4444</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>url:https://www.hzdr.de/publications/Publ-43021</dc:relation>
          <dc:relation>doi:10.14278/rodare.4443</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/health</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/hzdr</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/novo</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/oncoray</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
          <dc:rights>info:eu-repo/semantics/restrictedAccess</dc:rights>
          <dc:subject>NOVO</dc:subject>
          <dc:subject>Neutron imaging</dc:subject>
          <dc:subject>Prompt gamma ray imaging</dc:subject>
          <dc:subject>Dual particle imaging</dc:subject>
          <dc:subject>Range verification in proton therapy</dc:subject>
          <dc:subject>OncoRay</dc:subject>
          <dc:title>First tests of the NOVO Compact Detector Array at a Proton Facility (OncoRay)</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:rodare.hzdr.de:4128</identifier>
        <datestamp>2026-05-05T10:56:02Z</datestamp>
        <setSpec>openaire_data</setSpec>
        <setSpec>user-health</setSpec>
        <setSpec>user-novo</setSpec>
        <setSpec>user-rodare</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Blangiardi, Francesco</dc:creator>
          <dc:creator>Ratliff, Hunter</dc:creator>
          <dc:creator>Kögler, Toni</dc:creator>
          <dc:date>2025-11-14</dc:date>
          <dc:description>Introduction

This dataset corresponds to the simulation data used within AI methods in _"Fast proton transport and neutron production in proton therapy using Fourier neural operators"_ [CITE]. It has been extracted from the corresponding PHITS dataset [1] related to the same work, and is used by the codebase provided in [2] implementing all important AI methods within the paper.

The purpose of this entry is to provide a more easily accessible version of the data in [2], ready to be used for AI applications. The dataset has been greatly reduced in size and put into a format that allows access to the phase space density at each individual depth in the phantom, for both protons and neutrons, in the form of discretized histograms.

A concise description of the simulation setup is provided in [2]; please refer to the paper for detailed discussion, description, analysis, and further results derived from this dataset.

General information

The phase space density data is divided into discretized histograms as defined in the related paper. This follows the approximation within said paper where only 4 dimensions are kept, related to the depth, radial distance (R), energy (E) and azimuthal divergence (θ) of the particles. The depth dimension is treated as a pseudo-time dimension, meaning that time is not provided within the data. In order to simulate examples of different beams propagating through different materials, a total of 47 phantoms have been simulated, each with a unique starting energy. Phantoms have been divided into slabs along the depth dimension; each slab is assumed to be of homogeneous material along the dimensions perpendicular to the beam axis, but the material differs from slab to slab. The proton density is provided as the Monte Carlo simulated protons, binned into the defined discretizations whenever one of the surfaces of a slab is crossed. The neutron phase space density is instead provided as the angle, energy and radius distributions of secondary neutrons produced within each slab. Both densities are to be considered as integrated with respect to time. For each slab, the energy deposited by the protons is also provided as an energy deposition probability distribution along E and R. Moreover, each of the 47 phantoms has been irradiated according to three different sets of treatment head parameters, leading to the creation of three datasets: ES8, ES9 and NES8. For the sake of reproducibility, weights for each of the models discussed in [2] are also provided.

Parametrization

The densities are observed through discretizations as identified in the paper. Within this work, the resolution along the beam depth is fixed to 0.5 mm and the energy resolution is set to 1 MeV and 2 MeV for the proton and neutron fluences respectively, while the radial distance and angle are handled differently for the two particles. For protons, these are discretized in logarithmically spaced bins, with the first bin also comprising 0, and range up to 95.9 mm and 58.76° respectively. For neutrons, both dimensions are instead uniformly discretized, ranging from 0 up to 60 mm and 180° respectively. The R, E and θ dimensions are divided into 30x250x30 bins within the proton data and into 30x125x30 bins in the case of the neutrons, which are provided at each discretized depth. Data about energy deposition follows the same radial binning as the proton density, but the energy binning is instead logarithmic, ranging from 1.0e-3 up to 97.7 MeV.
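
To make the uniform neutron discretization concrete, the following minimal sketch rebuilds the bin edges from the numbers quoted above. It assumes that the neutron energy axis starts at 0 MeV (so that 125 bins of 2 MeV reach 250 MeV); the helper name uniform_edges is hypothetical, and the authoritative edges are the ones stored in the data files.

```python
def uniform_edges(start, stop, n_bins):
    """Return n_bins + 1 equally spaced bin edges from start to stop."""
    width = (stop - start) / n_bins
    return [start + i * width for i in range(n_bins + 1)]


# Neutron discretization: 30 x 125 x 30 bins in R, E and theta.
r_edges = uniform_edges(0.0, 60.0, 30)       # radial distance in mm
theta_edges = uniform_edges(0.0, 180.0, 30)  # angle in degrees
# Assumption: the energy axis starts at 0 MeV.
e_edges = uniform_edges(0.0, 250.0, 125)     # energy in MeV

print(len(r_edges), len(e_edges), len(theta_edges))
print(r_edges[1] - r_edges[0], e_edges[1] - e_edges[0], theta_edges[1] - theta_edges[0])
```

Under these assumptions the neutron bins are 2 mm, 2 MeV and 6° wide. The proton axes are logarithmically spaced and are not reconstructed here because their lower edges are not stated.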

As already mentioned, the ES8, ES9 and NES8 datasets differ in terms of the treatment head parameters. More details about the specifics of each dataset can be found in [1]. As ES8 and ES9 share the same treatment head parameters with the exception of the intensity, the proton density is not provided for the ES9 dataset to limit storage size.

Model weights for each surrogate trained on each of the provided datasets (called MES8, MES9 and MNES8) are also provided, following the surrogate structure defined in [2]. In particular, each surrogate is composed of a proton and a neutron model for both density and intensity prediction. Models can be used as detailed in the GitHub repository [3] related to [2].

File description

The aforementioned density discretizations are named internally "phits_logfull" and "hn_phits" for the protons and neutrons respectively, with the energy deposition discretization following the same convention as the protons. All files contained within this dataset are therefore named according to the discretizations as either "phits_logfull_cube_protons_&lt;depth in millimeters&gt;_data.nc", "phits_logfull_cube_dose_&lt;depth in millimeters&gt;_data.nc" or "hn_phits_cube_neutrons_&lt;depth in millimeters&gt;_data.nc". Each .nc file contains `xarray` variables holding the MC-approximated histogram, details of the discretization, as well as important parameters such as the CT number of the considered slab, its density and the material's ID within the PHITS environment.

Surrogates are provided in separate .zip files. Each surrogate contains 4 subfolders, one for each surrogate component. The PDF components come in the form of PyTorch checkpoints encapsulating Fourier Neural Operator models defined with the `neuraloperator` package [4] [5], version 0.3.0. Intensity components are instead .pickle files containing XGBoostRegressor objects defined with the `XGBoost` package [6]. Each component also comes with a pickled dictionary containing important metadata related to model hyperparameters.

Folder Structure

The provided data consists of three different .zip files, one for each of the ES8, ES9 and NES8 datasets. Each .zip file is already divided into train, validation and test splits on the basis of the starting energy. Within each split folder, simulations are represented through folders named in the format "&lt;Starting Energy&gt;MeV_05mm_800layers", each of which contains the related proton and neutron fluences in files with the previously specified naming convention.

It should be noted that, although the total size of the compressed dataset is around 7 GB, the uncompressed files require a total of 180.2 GB.

References

[1] H. N. Ratliff, F. Blangiardi, PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms, Rodare (in preparation for submission) (2025).

[2] "Fast proton transport and neutron production in proton therapy using Fourier neural operators" (to be filled)

[3] Blangiardi, F. (2025). AI_phase_space_PT [Computer software]. GitHub. [https://github.com/f-blan/AI_phase_space_PT](https://github.com/f-blan/AI_phase_space_PT)

[4] J. Kossaifi, N. Kovachki, Z. Li, D. Pitt, M. Liu-Schiaffini, R. J. George, B. Bonev, K. Azizzadenesheli, J. Berner, A. Anandkumar, A library for learning neural operators (2024). arXiv:2412.10354.

[5] N. B. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces, CoRR abs/2108.08481 (2021).

[6] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, ACM, 2016, p. 785–794. doi:10.1145/2939672.2939785. URL http://dx.doi.org/10.1145/2939672.2939785

Acknowledgements

The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.</dc:description>
          <dc:identifier>https://rodare.hzdr.de/record/4128</dc:identifier>
          <dc:identifier>10.14278/rodare.4128</dc:identifier>
          <dc:identifier>oai:rodare.hzdr.de:4128</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>url:https://www.hzdr.de/publications/Publ-42226</dc:relation>
          <dc:relation>doi:10.14278/rodare.4127</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/health</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/novo</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
          <dc:subject>Proton Therapy</dc:subject>
          <dc:subject>Surrogate Modelling</dc:subject>
          <dc:subject>Proton Transport</dc:subject>
          <dc:subject>Neutron Production</dc:subject>
          <dc:subject>Deep Learning</dc:subject>
          <dc:subject>Neural Operators</dc:subject>
          <dc:subject>Monte Carlo</dc:subject>
          <dc:title>Proton and Neutron reduced phase space for surrogate modeling of Proton Therapy from PHITS simulations</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:rodare.hzdr.de:4526</identifier>
        <datestamp>2026-05-05T10:54:57Z</datestamp>
        <setSpec>openaire_data</setSpec>
        <setSpec>user-novo</setSpec>
        <setSpec>user-rodare</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Ratliff, Hunter</dc:creator>
          <dc:creator>Blangiardi, Francesco</dc:creator>
          <dc:creator>Kögler, Toni</dc:creator>
          <dc:date>2025-09-24</dc:date>
          <dc:description>PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms

Hunter N. Ratliff¹, Francesco Blangiardi², Toni Kögler³˒⁴

¹Department of Computer science, Electrical engineering and Mathematical sciences, Western Norway University of Applied Sciences, Inndalsveien 28, Bergen, 5063, Vestland, Norway

²Technology Methods and Systems Data Based Methods, Fraunhofer ENAS, Technologie Campus 3, Chemnitz, 09126, Saxony, Germany

³Helmholtz-Zentrum Dresden — Rossendorf, Institute of Radiooncology — OncoRay, Dresden, Germany; ⁴OncoRay — National Center for Radiation Research in Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Helmholtz-Zentrum Dresden — Rossendorf, Dresden, Germany

Introduction

This dataset corresponds to the PHITS simulation data used in "Fast proton transport and neutron production in proton therapy using Fourier neural operators" [1]. A concise description of the simulation setup is provided here; please refer to the paper for detailed discussion, description, analysis, and further results derived from this dataset, along with additional references.

Description of simulations

This dataset consists of PHITS [2] simulations for 47 different proton energies from 70 MeV to 250 MeV incident upon different "1D" heterogeneous cylindrical phantoms (varied materials every 0.5 mm in length, uniform radially and rotationally). The composition of each phantom (materials and sequence along length) is taken from randomly sampled rays cast through a 3D CT phantom, with CT number mapped to material composition and density via the HumanVoxelTable-KumamotoUniv.data conversion table within the RT-PHITS utility distributed with PHITS. Included tallies score spatial distributions of energy deposition, LET, proton current (with an additional angular dimension), neutron production, gamma-ray production, and a variety of diagnostic tallies. Event-by-event "list-mode" data is scored for neutron and gamma-ray production, called "dump" tallies in PHITS.

Given that the objective of these simulations was AI model development, the 47 energies are divided into 37 training energies (70 MeV to 250 MeV in 5 MeV steps) and 10 testing energies (73 MeV to 245.8 MeV in 19.2 MeV steps). For each energy, two simulations were run: (1) a simulation with 1E8 (one hundred million) protons in which all tallies (including dump tallies) were enabled and (2) a simulation with 1E9 (one billion) protons with only dump tallies enabled (other tallies disabled to reduce memory consumption and increase simulation speed). Furthermore, all of the above was performed twice: (1) initially with purely monoenergetic beam energies and a spatial spread of 2.5 mm and (2) a second "more realistic" set with Gaussian-distributed energies (with energy-dependent FWHM) and a slightly wider 4.0 mm beam spread.

All simulation outputs were automatically processed from the plaintext and binary files produced by PHITS into compressed pickle file objects (NumPy arrays, Pandas DataFrames, dictionaries) using the PHITS Tools [3] Python utility. These Python objects were then utilized in the subsequent analysis of the paper this simulation set was generated for. The corresponding data repository used for AI model development can be found at [4].

Structure of this repository

The volume of data present in this repository is quite substantial (~700 GB available here / some TB including files only available upon request). Therefore, the repository has been structured in a way to allow flexibility in only downloading data of interest.

The root directory of this repository consists of 39 top-level directories whose names indicate their contents; each has been .tar archived and has either undergone .xz compression via xz on the tarball or with Python's LZMA compression on the tarball's contents prior to archiving. Within each are two directories: training and testing. Within each of these are directories of the format ???_MeV, where ??? is replaced by three digits specifying the nominal beam energy in MeV. (This is ???p? for the energies of the testing dataset, with p in place of a decimal point.) Thus, each training directory contains 37 subdirectories, and each testing directory contains 10 subdirectories. (One should note that there are no setup differences between training and testing data; they are simply divided here in the same way as in the paper.) Each ???_MeV/???p?_MeV directory contains simulation input/output and/or PHITS Tools processed output, depending on the top-level directory it is contained within. Input and output file names do not differ between different energies; directory structure is used to keep them distinguished/separated.
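
The naming scheme for the energy subdirectories can be sketched as follows, using the training and testing energies listed earlier. This is a minimal illustration, not code from the repository: the function names are hypothetical, and the zero-padding of energies below 100 MeV (for example 070_MeV and 073p0_MeV) is inferred from the three-digit ??? pattern and should be checked against the actual archives.

```python
def training_dirs():
    """Directory names for the 37 training energies (70 to 250 MeV, 5 MeV steps)."""
    return [f"{energy:03d}_MeV" for energy in range(70, 251, 5)]


def testing_dirs():
    """Directory names for the 10 testing energies (73 to 245.8 MeV, 19.2 MeV steps).

    Energies are handled in tenths of MeV to avoid floating-point rounding.
    """
    tenths = [730 + 192 * i for i in range(10)]
    return [f"{t // 10:03d}p{t % 10}_MeV" for t in tenths]


print(training_dirs()[:2], training_dirs()[-1])
print(testing_dirs()[:2], testing_dirs()[-1])
```

Working in integer tenths of MeV keeps the testing energies exact, which matters when the resulting strings are used as path components.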

PHITS input information

One top-level directory differs from all of the others, and this is common_inputs. As the name suggests, this directory contains all PHITS input information used in generating all of the simulation outputs.

The two core PHITS input files used are beam-on-target_phits-input_MonoE.inp for the monoenergetic beam simulation set and beam-on-target_phits-input_GaussE.inp for the Gaussian-distributed beam energy simulation set. Within these inputs are lines using the PHITS insert file function infl:{*}; all inserted files used in the PHITS simulations are also contained within this common_inputs directory. The single exception to this is PARAMETERS_files-1-and-7.txt, which simply contains the file(1) and file(7) PHITS [Parameters] arguments; these are system-specific paths to PHITS installation/data files. Also note that relative paths are used in the infl:{*} commands; these relative paths differ from how this repository is structured because the repository was restructured afterwards for distribution convenience. File names are still unique and can be found in this common_inputs directory. The CELL subdirectory contains the [Cell] sections used for the varied phantom compositions, and the MAPPINGS_OF_ENERGY_TO_CELL_FILES.csv file details how these files are paired with the 47 different beam energies.

PHITS outputs (raw and processed)

The remaining 38 top-level directories contain simulation/processed output. When these simulations were run, all output was contained in each ???_MeV directory. As detailed earlier, these have been split into various top-level directories here to allow more convenient download of only desired files. Nominally, each of these ???_MeV directories contained the following before being split:


	a beam-on-target_phits-input.inp PHITS input file (and a simple phits.in pointing to this input file, needed for parallel running of PHITS); note that these inputs have all specific source energy information populated within this file
	a phantom_composition_info.csv file also detailing the phantom composition used for that beam energy
	phits*.out file(s), raw summary output files generated by PHITS
	*.out raw plaintext tally output files from PHITS
	*.eps graphical visualizations of tally output, generated by PHITS
	*_dmp.out* raw binary tally dump files from PHITS
	*.pickle.xz processed tally output (and phits.out metadata) from PHITS Tools, LZMA-compressed pickle files
	*_dmp_namedtuple_list.pickle.xz processed tally dump output from PHITS Tools, formatted as a NumPy record array (np.recarray)
	*_dmp_Pandas_df.pickle.xz processed tally dump output from PHITS Tools, formatted as a Pandas DataFrame (same numerical data as in NumPy recarray)
	*.png and *.pdf graphical visualizations of tally output, generated by PHITS Tools


The top-level directories of this repository are named in a way to detail (1) which simulations their contents pertain to and (2) which output files are contained within them. The directories are named using an underscore-delimited pattern whose components have the following names and meanings:


	Beam type:
	
		MonoE refers to simulations with the monoenergetic beams with 2.5 mm spread
		GaussE refers to simulations with the Gaussian-distributed energies and 4.0 mm spread
	
	
	Simulated number of protons:
	
		1E8 refers to simulations with 10^8 (one hundred million) protons simulated
		1E9 refers to simulations with 10^9 (one billion) protons simulated
	
	
	Output source/type:
	
		raw refers to the PHITS input and PHITS-generated output
		processed refers to the Python-formatted processed output produced by PHITS Tools
		plots refers to the *.eps files produced by PHITS and the *.png and *.pdf files produced by PHITS Tools, all containing graphical plots of tally output (only relevant to 1E8 simulations)
	
	
	Other labels:
	
		proton-tally refers to output from the huge [T-Cross] tally used only in 1E8 simulations for scoring proton phase space as a function of energy, position, and direction (separated from others owing to its considerable size)
		neutron-dump refers to the event-by-event neutron production data scored by a [T-Product] tally's "dump" option
		
			NumPy and Pandas denote whether processed contents are formatted as NumPy record arrays or Pandas DataFrames
		
		
		gamma-dump refers to the event-by-event gamma-ray production data scored by a [T-Product] tally's "dump" option
		
			NumPy and Pandas denote whether processed contents are formatted as NumPy record arrays or Pandas DataFrames
		
		
		other refers to output from all other tallies aside from the above three (energy deposition, LET, diagnostic tallies, etc.; only relevant to 1E8 simulations given all tallies except dump tallies were disabled for 1E9 simulations) along with (for raw directories) PHITS input-related files and phits*.out file(s).
	
	


For clarity, the dataset notation here corresponds to that used in [1] as follows: GaussE_1E8 = ES8 and GaussE_1E9 = ES9. (The paper did not use MonoE_1E8 and MonoE_1E9, but if it had, they would have been designated NES8 and NES9, respectively.)
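
The underscore-delimited naming pattern and the correspondence to the paper's notation can be sketched as a small parser. This is a minimal illustration under the naming rules described above; the function name parse_top_level, the dictionary keys and the PAPER_NAMES table are hypothetical helpers, not part of the repository.

```python
# Mapping from (beam type, number of protons) to the notation used in the paper.
PAPER_NAMES = {
    ("GaussE", "1E8"): "ES8",
    ("GaussE", "1E9"): "ES9",
    ("MonoE", "1E8"): "NES8",
    ("MonoE", "1E9"): "NES9",
}


def parse_top_level(name):
    """Split a top-level directory name into its underscore-delimited parts."""
    if name == "common_inputs":
        # The only top-level directory that does not follow the pattern.
        return {"kind": "inputs"}
    beam, protons, output, *labels = name.split("_")
    return {
        "kind": "outputs",
        "beam": beam,          # MonoE or GaussE
        "protons": protons,    # 1E8 or 1E9
        "output": output,      # raw, processed or plots
        "labels": labels,      # e.g. ["neutron-dump", "NumPy"]; empty for plots
        "paper_name": PAPER_NAMES.get((beam, protons)),
    }


info = parse_top_level("GaussE_1E8_processed_neutron-dump_NumPy")
print(info["paper_name"], info["output"], info["labels"])
```

Because the labels themselves use hyphens rather than underscores (proton-tally, neutron-dump, gamma-dump), a plain split on the underscore is sufficient to separate the components.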

All put together, this results in the following top-level directories contained in this repository:

\(\begin{array}{lrrc} \textbf{Directory} &amp; \textbf{Files} &amp; \textbf{Uncompressed size (GB)} &amp; \textbf{Available upon request} \\ \hline \texttt{common_inputs} &amp; 54 &amp; 0.002 &amp; \\ \texttt{GaussE_1E8_raw_proton-tally} &amp; 564 &amp; 361.30 &amp; \\ \texttt{GaussE_1E8_raw_neutron-dump} &amp; 611 &amp; 41.24 &amp; \\ \texttt{GaussE_1E8_raw_gamma-dump} &amp; 611 &amp; 85.82 &amp; \\ \texttt{GaussE_1E8_raw_other} &amp; 1927 &amp; 22.73 &amp; \\ \texttt{GaussE_1E8_processed_proton-tally} &amp; 564 &amp; 17.19 &amp; \\ \texttt{GaussE_1E8_processed_neutron-dump_NumPy} &amp; 611 &amp; 37.25 &amp; \\ \texttt{GaussE_1E8_processed_neutron-dump_Pandas} &amp; 611 &amp; 45.77 &amp; \\ \texttt{GaussE_1E8_processed_gamma-dump_NumPy} &amp; 611 &amp; 73.81 &amp; \\ \texttt{GaussE_1E8_processed_gamma-dump_Pandas} &amp; 611 &amp; 90.16 &amp; \\ \texttt{GaussE_1E8_processed_other} &amp; 1551 &amp; 4.35 &amp; \\ \texttt{GaussE_1E8_plots} &amp; 3525 &amp; 7.60 &amp; \\ \texttt{GaussE_1E9_raw_neutron-dump} &amp; 1081 &amp; 408.81 &amp; \times \\ \texttt{GaussE_1E9_raw_gamma-dump} &amp; 1081 &amp; 854.58 &amp; \times \\ \texttt{GaussE_1E9_raw_other} &amp; 1316 &amp; 0.59 &amp; \\ \texttt{GaussE_1E9_processed_neutron-dump_NumPy} &amp; 1081 &amp; 372.46 &amp; \times \\ \texttt{GaussE_1E9_processed_neutron-dump_Pandas} &amp; 1081 &amp; 457.42 &amp; \times \\ \texttt{GaussE_1E9_processed_gamma-dump_NumPy} &amp; 1081 &amp; 738.00 &amp; \times \\ \texttt{GaussE_1E9_processed_gamma-dump_Pandas} &amp; 1081 &amp; 901.41 &amp; \times \\ \texttt{GaussE_1E9_processed_other} &amp; 1175 &amp; 0.02 &amp; \\ \texttt{MonoE_1E8_raw_proton-tally} &amp; 94 &amp; 360.92 &amp; \\ \texttt{MonoE_1E8_raw_neutron-dump} &amp; 282 &amp; 40.83 &amp; \\ \texttt{MonoE_1E8_raw_gamma-dump} &amp; 282 &amp; 83.69 &amp; \\ \texttt{MonoE_1E8_raw_other} &amp; 1222 &amp; 30.17 &amp; \\ \texttt{MonoE_1E8_processed_proton-tally} &amp; 47 &amp; 13.69 &amp; \\ \texttt{MonoE_1E8_processed_neutron-dump_NumPy} 
&amp; 94 &amp; 37.23 &amp; \\ \texttt{MonoE_1E8_processed_neutron-dump_Pandas} &amp; 94 &amp; 45.77 &amp; \\ \texttt{MonoE_1E8_processed_gamma-dump_NumPy} &amp; 94 &amp; 72.23 &amp; \\ \texttt{MonoE_1E8_processed_gamma-dump_Pandas} &amp; 94 &amp; 88.29 &amp; \\ \texttt{MonoE_1E8_processed_other} &amp; 846 &amp; 2.18 &amp; \\ \texttt{MonoE_1E8_plots} &amp; 799 &amp; 2.25 &amp; \\ \texttt{MonoE_1E9_raw_neutron-dump} &amp; 2364 &amp; 407.99 &amp; \times \\ \texttt{MonoE_1E9_raw_gamma-dump} &amp; 2364 &amp; 836.55 &amp; \times \\ \texttt{MonoE_1E9_raw_other} &amp; 329 &amp; 0.04 &amp; \\ \texttt{MonoE_1E9_processed_neutron-dump_NumPy} &amp; 94 &amp; 371.30 &amp; \times \\ \texttt{MonoE_1E9_processed_neutron-dump_Pandas} &amp; 94 &amp; 455.94 &amp; \times \\ \texttt{MonoE_1E9_processed_gamma-dump_NumPy} &amp; 94 &amp; 721.08 &amp; \times \\ \texttt{MonoE_1E9_processed_gamma-dump_Pandas} &amp; 94 &amp; 879.13 &amp; \times \\ \texttt{MonoE_1E9_processed_other} &amp; 188 &amp; 0.01 &amp; \\ \hline \textbf{TOTAL} &amp; \textbf{30397} &amp; \textbf{8969.84} &amp; \end{array}\)

(Directories marked with × in the "Available upon request" column are provided only upon an additional, specific request.)

And, as stated earlier, each of these top-level directories is divided into a training subdirectory (containing 37 ???_MeV directories) and a testing subdirectory (containing 10 ???p?_MeV directories), where the ???[p?]_MeV directories only (1) contain particular files (2) relevant to certain simulations—as specified by the top-level directory's name.

As a note to anyone surveying the raw files, all GaussE simulations were run with OpenMP parallelization with 10 processes. For 1E8 simulations, this was conducted as ten PHITS runs of 1E7 protons each; for 1E9 simulations, this was conducted as twenty runs of 5E7 protons each. (PHITS runs can be "chained" as "restart calculations", where one run can resume from where a previous run ended.) In these simulations, the generated phits.out files from each run were renamed to phits-#A-#B.out (where #A is an internal number, 1 to 47, paired with each simulated beam energy, and #B is the run number, 0 to 19) and moved into a phitsout subdirectory after each run's completion. However, this was less uniform for the MonoE simulations; for those, the strategy was to complete each simulation in a single run of PHITS. This generally involved using a hybrid OpenMP + MPI parallelization with anywhere from 80 to 160 processes each, split between OMP and MPI (noting that some 1E9 runs were conducted with only MPI parallelization). None of this influences the output format of the standard tally outputs. However, the number of dump files produced is equal to the number of MPI processes utilized. This means that each GaussE simulation only has one dump file per dump tally owing to only using OpenMP parallelization (which merges its dump files at the end of calculation), while the MonoE simulations contain a varied number of dump files per dump tally owing to variations in the parallelization strategies employed in those simulations. PHITS Tools ultimately merges all dump outputs back together in its processing, so this quirk of how the simulations were conducted should not be apparent at all in the processed output.
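
As a hedged illustration of the phits-#A-#B.out naming described above, the following minimal Python sketch splits such a file name into its energy index (#A) and run number (#B). The helper name parse_phitsout_name is hypothetical, and the sketch assumes both numbers are written as plain decimal digits; the actual file names in the phitsout subdirectories should be checked before relying on it.

```python
import re

# Assumed pattern: "phits-#A-#B.out", with #A and #B written as decimal digits.
_PHITSOUT_NAME = re.compile(r"^phits-(\d+)-(\d+)\.out$")


def parse_phitsout_name(name):
    """Return (energy_index, run_number), or None if the name does not match."""
    match = _PHITSOUT_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the parser.
    for example in ("phits-12-0.out", "phits-47-19.out", "phits.out"):
        print(example, parse_phitsout_name(example))
```

Sorting the parsed tuples would put the chained restart runs of one beam energy back into the order in which they were executed.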

Given PHITS Tools was under ongoing development as this dataset was being produced, the GaussE directories contain some extra output not present in the MonoE directories. Most notably, only for the GaussE simulations do the plot directories contain PNG and PDF plot files generated by PHITS Tools and the *_processed_* directories contain dictionary objects of the processed phits*.out files.

Note that, for convenience, the phits*.out file(s) for each simulation are also copied to all *_raw_* directories. The phits*.out file(s) contain the full PHITS input echo, among other information about the simulation. For the GaussE simulations, these are within a further phitsout subdirectory for each beam energy. Also for all GaussE_*_processed_* directories, the processed phits*.out file(s), phits*_out.pickle.xz, are included too.

References

[1] F. Blangiardi, H.N. Ratliff et al., "Fast proton transport and neutron production in proton therapy using Fourier neural operators", (in preparation for submission) (2025)

[2] T. Sato, Y. Iwamoto, S. Hashimoto, T. Ogawa, T. Furuta, S. Abe, T. Kai, Y. Matsuya, N. Matsuda, Y. Hirata, T. Sekikawa, L. Yao, P.E. Tsai, H.N. Ratliff, H. Iwase, Y. Sakaki, K. Sugihara, N. Shigyo, L. Sihver and K. Niita, "Recent improvements of the Particle and Heavy Ion Transport code System - PHITS version 3.33", Journal of Nuclear Science and Technology, 61, 127-135, (2024) doi:10.1080/00223131.2023.2275736

[3] H.N. Ratliff, "The PHITS Tools Python package for parsing, organizing, and analyzing results from the PHITS radiation transport and DCHAIN activation codes", The Journal of Open Source Software, 10(113), 8311, (2025) doi:10.21105/joss.08311, github.com/Lindt8/PHITS-Tools

[4] F. Blangiardi, H.N. Ratliff and T. Kögler, "Proton and Neutron reduced phase space for surrogate modeling of Proton Therapy from PHITS simulations", Rodare [Data set], (2025) doi:10.14278/rodare.4128

Acknowledgements

The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.</dc:description>
          <dc:identifier>https://rodare.hzdr.de/record/4526</dc:identifier>
          <dc:identifier>10.14278/rodare.4526</dc:identifier>
          <dc:identifier>oai:rodare.hzdr.de:4526</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>doi:10.14278/rodare.3996</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/novo</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
          <dc:subject>proton therapy</dc:subject>
          <dc:subject>treatment verification</dc:subject>
          <dc:subject>particle transport calculations</dc:subject>
          <dc:subject>PHITS</dc:subject>
          <dc:title>PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:rodare.hzdr.de:4525</identifier>
        <datestamp>2026-05-05T10:56:02Z</datestamp>
        <setSpec>openaire_data</setSpec>
        <setSpec>user-health</setSpec>
        <setSpec>user-novo</setSpec>
        <setSpec>user-rodare</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Blangiardi, Francesco</dc:creator>
          <dc:creator>Ratliff, Hunter</dc:creator>
          <dc:creator>Kögler, Toni</dc:creator>
          <dc:date>2025-11-14</dc:date>
          <dc:description>Introduction

This dataset corresponds to the simulation data used within AI methods in _"Fast proton transport and neutron production in proton therapy using Fourier neural operators"_ [CITE]. It has been extracted from the corresponding PHITS dataset [1] related to the same work, and is used by the codebase provided in [2] implementing all important AI methods within the paper.

The purpose of this entry is to provide a more easily accessible version of the data in [2], ready to be used for AI applications. The size of the dataset has been greatly reduced, and the data has been put into a format allowing access to the phase space density at each individual depth in the phantom, for both protons and neutrons, in the form of discretized histograms.

A concise description of the simulation setup is provided in [2]; please refer to the paper for detailed discussion, description, analysis, and further results derived from this dataset.

General information

The phase space density data is divided into discretized histograms as defined in the related paper. This follows the approximation within said paper where only 4 dimensions are kept, related to the depth, radial distance (R), energy (E) and azimuthal divergence (θ) of the particles. The depth dimension is considered as a pseudo-time dimension, meaning that time is not provided within the data. In order to simulate examples of different beams propagating through different materials, a total of 47 phantoms have been simulated, each with a unique starting energy. Phantoms have been divided into slabs along the depth dimension; each slab is assumed to be of homogeneous material along the dimensions perpendicular to the beam axis, but the material differs from slab to slab. The proton density is provided as the Monte Carlo simulated protons appropriately binned into the defined discretizations whenever one of the surfaces of each slab is crossed. The neutron phase space density is instead provided as the angle, energy and radius distributions of secondary neutrons produced within each slab. Both densities are to be considered as integrated with respect to time. For each slab, the energy deposited by the protons is also provided, as an energy deposition probability distribution along E and R. Moreover, each of the 47 phantoms has been irradiated according to three different sets of treatment head parameters, leading to the creation of three datasets: ES8, ES9 and NES8. For the sake of reproducibility, weights for each of the models discussed in [2] are also provided.

Parametrization

The densities are observed through discretizations as identified in the paper. Within this work, the resolution along the beam depth is fixed to 0.5 mm and the energy resolution is set to 1 and 2 MeV for the proton and neutron fluences respectively, while the radial distance and angle are handled differently for the two particles. For protons these are discretized in logarithmically spaced bins, with the first bin also comprising 0, and ranging up to 95.9 mm and 58.76° respectively. For neutrons, both dimensions are instead uniformly discretized, ranging from 0 up to 60 mm and 180° respectively. The R, E and θ dimensions are divided into 30x250x30 bins within the proton data, and into 30x125x30 in the case of the neutrons, which are provided at each discretized depth. Data about energy deposition follows the same radial binning as in the case of the proton density, but the energy binning is instead logarithmic, ranging from 1.0e-3 up to 97.7 MeV.
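
The uniform neutron binning described above can be sketched in a few lines of Python. This is only an illustration: the helper uniform_edges is hypothetical, and the energy axis is assumed to start at 0 MeV (the text gives only the 2 MeV resolution and the 125 bins). The logarithmic proton bins are not reproduced, since their lower edges are not stated here.

```python
def uniform_edges(lower, upper, n_bins):
    """Return the n_bins + 1 edges of n_bins equal-width bins on [lower, upper]."""
    width = (upper - lower) / n_bins
    return [lower + i * width for i in range(n_bins + 1)]


# Neutron discretization: 30 x 125 x 30 bins in R, E and theta.
neutron_r_edges_mm = uniform_edges(0.0, 60.0, 30)        # 2 mm wide bins
neutron_theta_edges_deg = uniform_edges(0.0, 180.0, 30)  # 6 degree wide bins
# Assumption: the energy axis starts at 0 MeV, so 125 bins of 2 MeV end at 250 MeV.
neutron_e_edges_mev = uniform_edges(0.0, 250.0, 125)

if __name__ == "__main__":
    print(len(neutron_r_edges_mm), neutron_r_edges_mm[1] - neutron_r_edges_mm[0])
    print(len(neutron_theta_edges_deg), neutron_theta_edges_deg[-1])
    print(len(neutron_e_edges_mev), neutron_e_edges_mev[1])
```

The bin edges actually used are stored with each file as part of the discretization details, and those should take precedence over this sketch.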

As already mentioned, the ES8, ES9 and NES8 datasets differ in terms of the treatment head parameters. More details about the specifics of each dataset can be found in [1]. As ES8 and ES9 share the same treatment head parameters with the exception of the intensity, the proton density is not provided for the ES9 dataset to limit storage size.

Model weights for each surrogate trained on each of the provided datasets (called MES8, MES9 and MNES8) are also provided, following the surrogate structure defined in [2]. In particular, each surrogate is composed of a proton and a neutron model for both density and intensity prediction. Models can be used as detailed in the GitHub repository [3] related to [2].

File description

The aforementioned density discretizations are named internally "phits_logfull" and "hn_phits" for the protons and neutrons respectively, with the energy deposition one following the same convention as the protons. All files contained within this dataset are therefore named according to the discretizations as either "phits_logfull_cube_protons_\&lt;depth in millimeters\&gt;_data.nc", "phits_logfull_cube_dose_\&lt;depth in millimeters\&gt;_data.nc" or "hn_phits_cube_neutrons_\&lt;depth in millimeters\&gt;_data.nc". Each nc file contains `xarray` variables holding the MC-approximated histogram, details of the discretization, as well as important parameters such as the CT number of the considered slab, its density and the material's ID within the PHITS environment.
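
A minimal sketch of how the three file-name patterns above might be told apart in Python follows. The helper classify_nc_name is hypothetical, and because the exact formatting of the depth field is not specified here, it is returned as an uninterpreted string. The files themselves are presumably NetCDF and would typically be opened with xarray (for example via xarray.open_dataset), which is not exercised in this sketch.

```python
import re

# Groups: discretization name, quantity, depth field (kept as text).
_NC_NAME = re.compile(
    r"^(phits_logfull|hn_phits)_cube_(protons|dose|neutrons)_(.+)_data\.nc$"
)

# Quantities expected for each internal discretization name, per the text above.
_EXPECTED = {
    "phits_logfull": ("protons", "dose"),
    "hn_phits": ("neutrons",),
}


def classify_nc_name(name):
    """Return (discretization, quantity, depth_text), or None if unrecognized."""
    match = _NC_NAME.match(name)
    if match is None:
        return None
    discretization, quantity, depth_text = match.groups()
    if quantity not in _EXPECTED[discretization]:
        return None
    return discretization, quantity, depth_text


if __name__ == "__main__":
    # The depth value "12.5" is a made-up example, not taken from the dataset.
    print(classify_nc_name("hn_phits_cube_neutrons_12.5_data.nc"))
    print(classify_nc_name("phits_logfull_cube_dose_12.5_data.nc"))
```

Keeping the depth as text avoids guessing how it is written in the real file names; it can be converted once the actual convention has been confirmed.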

Surrogates are provided in separate .zip files. Each surrogate contains 4 subfolders, one for each surrogate component. The PDF components come in the form of PyTorch checkpoints encapsulating Fourier Neural Operator models defined through the `neuraloperator` package [4] [5], version 0.3.0. Intensity components are instead .pickle files containing XGBoostRegressor objects defined through the `XGBoost` package [6]. Each component also comes with a pickled dictionary containing important metadata related to model hyperparameters.

Folder Structure

The provided data consists of three different .zip files, one each for the ES8, ES9 and NES8 datasets. Each .zip file comes already divided into train, validation and test splits on the basis of the starting energy. Within each split folder, simulations are represented through folders named in the format "\&lt;Starting Energy\&gt;MeV_05mm_800layers", each containing the related proton and neutron fluences in files with the previously specified naming convention.

It should be noted that, although the total size of the provided dataset is around 7 GB, uncompressing the files requires a total of 180.2 GB.

References

[1] H. N. Ratliff, F. Blangiardi, PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms, Rodare, (in preparation for submission) (2025).

[2] "Fast proton transport and neutron production in proton therapy using Fourier neural operators" (to be filled)

[3] Blangiardi, F. (2025). AI_phase_space_PT [Computer software]. GitHub. [https://github.com/f-blan/AI_phase_space_PT](https://github.com/f-blan/AI_phase_space_PT)

[4] J. Kossaifi, N. Kovachki, Z. Li, D. Pitt, M. Liu-Schiaffini, R. J. George, B. Bonev, K. Azizzadenesheli, J. Berner, A. Anandkumar, A library for learning neural operators (2024). arXiv:2412.10354.

[5] N. B. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces, CoRR abs/2108.08481 (2021).

[6] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, ACM, 2016, p. 785–794. doi:10.1145/2939672.2939785. URL http://dx.doi.org/10.1145/2939672.2939785

Acknowledgements

The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.</dc:description>
          <dc:description>The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.</dc:description>
          <dc:identifier>https://rodare.hzdr.de/record/4525</dc:identifier>
          <dc:identifier>10.14278/rodare.4525</dc:identifier>
          <dc:identifier>oai:rodare.hzdr.de:4525</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>url:https://www.hzdr.de/publications/Publ-42226</dc:relation>
          <dc:relation>doi:10.14278/rodare.4127</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/health</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/novo</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
          <dc:rights>info:eu-repo/semantics/restrictedAccess</dc:rights>
          <dc:subject>Proton Therapy</dc:subject>
          <dc:subject>Surrogate Modelling</dc:subject>
          <dc:subject>Proton Transport</dc:subject>
          <dc:subject>Neutron Production</dc:subject>
          <dc:subject>Deep Learning</dc:subject>
          <dc:subject>Neural Operators</dc:subject>
          <dc:subject>Monte Carlo</dc:subject>
          <dc:title>Proton and Neutron reduced phase space for surrogate modeling of Proton Therapy from PHITS simulations</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:rodare.hzdr.de:3997</identifier>
        <datestamp>2026-05-05T10:54:57Z</datestamp>
        <setSpec>openaire_data</setSpec>
        <setSpec>user-novo</setSpec>
        <setSpec>user-rodare</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Ratliff, Hunter</dc:creator>
          <dc:creator>Blangiardi, Francesco</dc:creator>
          <dc:creator>Kögler, Toni</dc:creator>
          <dc:date>2025-09-24</dc:date>
          <dc:description>Introduction

This dataset corresponds to the PHITS simulation data used in "Fast Phase Space Reconstruction for Proton Beam Traversal and Neutron Emission in Proton Therapy using Fourier Neural Operators".
A concise description of the simulation setup is provided here; please refer to the paper for detailed discussion, description, analysis, and further results derived from this dataset.


Description of simulations

This dataset consists of PHITS simulations for 47 different proton energies from 70 MeV to 250 MeV incident upon different "1D" heterogeneous cylindrical phantoms (varied materials every 0.5 mm in length, uniform radially and rotationally) whose compositions (materials and sequence along length) are taken from randomly sampled rays cast through a 3D CT phantom [CITE], with CT number mapped to material composition and density via the HumanVoxelTable-KumamotoUniv.data conversion table within the RT-PHITS utility distributed with PHITS.
Included tallies score spatial distributions of energy deposition, LET, proton current (with an additional angular dimension), neutron production, gamma-ray production, and a variety of diagnostic tallies. 
Event-by-event "list-mode" data is scored for neutron and gamma-ray production, called "dump" tallies in PHITS.

Given the objective of these simulations was for AI model development, the 47 energies are divided into 37 training energies (70 MeV to 250 MeV in 5 MeV steps) and 10 testing energies (73 MeV to 245.8 MeV in 19.2 MeV steps). 
For each energy, two simulations were run: (1) a simulation with 1E8 protons simulated where all tallies (including dump tallies) were included/enabled and (2) a simulation with 1E9 protons simulated (available on request) but with only dump tallies enabled (other tallies disabled to reduce memory consumption and increase simulation speed).
Furthermore, all of the above was actually performed twice: (1) initially with purely monoenergetic beam energies and with a spatial spread of 2.5 mm and (2) a second "more realistic" set with Gaussian-distributed energies (with energy-dependent FWHM) and slightly wider 4.0 mm beam spread.

All simulation outputs were automatically processed from the plaintext and binary files produced by PHITS into compressed pickle file objects (NumPy arrays, Pandas DataFrames, dictionaries) using the PHITS Tools Python utility.
These Python objects were then utilized in the subsequent analysis of the paper this simulation set was generated for.


Structure of this repository

The volume of data present in this repository is quite substantial (~ 700 GB). 
Therefore, the repository has been structured in a way to allow flexibility in only downloading data of interest.

The root directory of this repository consists of 39 top-level directories whose names indicate their contents.
Within each are two directories: training and testing.
Within each of these are directories of the format ???_MeV, where ??? is replaced by three digits specifying the nominal beam energy in MeV. 
(This is ???p? for the energies of the testing dataset, with p in place of a decimal point.)
Thus, each training directory contains 37 subdirectories, and each testing directory contains 10 subdirectories.
(One should note that there are no setup differences between training and testing data; they are simply divided here in the same way as in the paper.)
Each ???_MeV/???p?_MeV directory contains simulation input/output and/or PHITS Tools processed output, depending on the top-level directory it is contained within.
Input and output file names do not differ between different energies; directory structure is used to keep them distinguished/separated.
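
As an illustration of the ???_MeV and ???p?_MeV naming scheme, the 37 training and 10 testing subdirectory names could be generated as in the Python sketch below. The function names are hypothetical, and the sketch assumes that energies below 100 MeV are zero-padded to three digits (for example 070_MeV); this padding is inferred from the "three digits" wording rather than confirmed.

```python
def training_dir_names():
    """37 training energies: 70 MeV to 250 MeV in 5 MeV steps."""
    # Assumption: zero-padded to three digits, e.g. "070_MeV".
    return [f"{energy:03d}_MeV" for energy in range(70, 251, 5)]


def testing_dir_names():
    """10 testing energies: 73 MeV to 245.8 MeV in 19.2 MeV steps."""
    names = []
    for step in range(10):
        tenths = 730 + 192 * step  # energy in units of 0.1 MeV, exact integers
        names.append(f"{tenths // 10:03d}p{tenths % 10}_MeV")
    return names


if __name__ == "__main__":
    print(training_dir_names())
    print(testing_dir_names())
```

Working in integer tenths of an MeV sidesteps floating-point rounding when the decimal point is replaced by the letter p.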

PHITS input information

One top-level directory differs from all of the others, and this is common_inputs.
As the name suggests, this directory contains all PHITS input information used in generating all of the simulation outputs. 

The core two PHITS input files used are beam-on-target_phits-input_MonoE.inp for the monoenergetic beam simulation set and beam-on-target_phits-input_GaussE.inp for the Gaussian-distributed beam energy simulation set.
Within these inputs are lines using the PHITS insert file function infl:{*}; all inserted files used in the PHITS simulations are also contained within this common_inputs directory.
The single exception to this is PARAMETERS_files-1-and-7.txt, which is simply the file(1) and file(7) PHITS [Parameters] arguments and will be system-specific paths to PHITS installation/data files.
Also note that relative paths are used in the infl:{*} commands; these relative paths differ to how this repository is structured given the repository has been restructured in post for distribution convenience. 
File names are still unique and can be found in this common_inputs directory.
The CELL subdirectory contains the [Cell] sections used for the varied phantom compositions, and the MAPPINGS_OF_ENERGY_TO_CELL_FILES.csv file details how these files are paired with the 47 different beam energies.

PHITS outputs (raw and processed)

The remaining 38 top-level directories contain simulation/processed output.
When these simulations were run, all output was contained in each ???_MeV directory.
As detailed earlier, these have been split into various top-level directories here to allow more convenient download of only desired files.
Nominally, each of these ???_MeV directories contained the following before being split:


	a beam-on-target_phits-input.inp PHITS input file (and a simple phits.in pointing to this input file, needed for parallel running of PHITS); note that these inputs have all specific source energy information populated within this file
	a phantom_composition_info.csv file also detailing the phantom composition used for that beam energy
	phits*.out file(s), raw summary output files generated by PHITS
	*.out raw plaintext tally output files from PHITS
	*.eps graphical visualizations of tally output, generated by PHITS
	*_dmp.out* raw binary tally dump files from PHITS
	*.pickle.xz processed tally output (and phits.out metadata) from PHITS Tools, LZMA-compressed pickle files
	*_dmp_namedtuple_list.pickle.xz processed tally dump output from PHITS Tools, formatted as a NumPy record array (np.recarray)
	*_dmp_Pandas_df.pickle.xz processed tally dump output from PHITS Tools, formatted as a Pandas DataFrame (same numerical data as in NumPy recarray)
	*.png and *.pdf graphical visualizations of tally output, generated by PHITS Tools
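
Since the processed outputs listed above are LZMA-compressed pickle files, they can presumably be read with the Python standard library alone, as in the minimal sketch below. The helper names and the demo file name are hypothetical; reading the real files additionally requires NumPy and/or Pandas to be installed so that the stored arrays and DataFrames can be reconstructed.

```python
import lzma
import os
import pickle
import tempfile


def load_pickle_xz(path):
    """Load one LZMA-compressed pickle file and return the stored object."""
    # Only unpickle files from a source you trust.
    with lzma.open(path, "rb") as handle:
        return pickle.load(handle)


def save_pickle_xz(obj, path):
    """Write obj as an LZMA-compressed pickle (used here only for the demo)."""
    with lzma.open(path, "wb") as handle:
        pickle.dump(obj, handle)


if __name__ == "__main__":
    # Round trip with a stand-in dictionary; the file name is hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo_tally.pickle.xz")
        save_pickle_xz({"tally": "demo", "values": [1.0, 2.0]}, demo_path)
        print(load_pickle_xz(demo_path))
```

The same load_pickle_xz call would apply to the *_dmp_namedtuple_list.pickle.xz and *_dmp_Pandas_df.pickle.xz files, with the returned object being a NumPy record array or a Pandas DataFrame respectively.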


The top-level directories of this repository are named in a way to detail (1) which simulations their contents pertain to and (2) which output files are contained within them.
The directories are named using an underscore-delimited pattern whose components have the following names and meanings:


	Beam type: 
	
		 MonoE refers to simulations with the monoenergetic beams with 2.5 mm spread
		 GaussE refers to simulations with the Gaussian-distributed energies and 4.0 mm spread
	
	
	Simulated number of protons:
	
		1E8 refers to simulations with 10^8 (one hundred million) protons simulated
		1E9 refers to simulations with 10^9 (one billion) protons simulated (only available on request)
	
	
	Output source/type:
	
		raw refers to the PHITS input and PHITS-generated output 
		processed refers to the Python-formatted processed output produced by PHITS Tools
		plots refers to the *.eps files produced by PHITS and the *.png and *.pdf files produced by PHITS Tools, all containing graphical plots of tally output (only relevant to 1E8 simulations)
	
	
	Other labels:
	
		proton-tally refers to output from the huge [T-Cross] tally used only in 1E8 simulations for scoring proton phase space as a function of energy, position, and direction (separated from others owing to its considerable size)
		neutron-dump refers to the event-by-event neutron production data scored by a [T-Product] tally's "dump" option
		NumPy and Pandas denote whether the processed contents are formatted as NumPy record arrays or Pandas DataFrames
		gamma-dump refers to the event-by-event gamma-ray production data scored by a [T-Product] tally's "dump" option
		NumPy and Pandas denote whether the processed contents are formatted as NumPy record arrays or Pandas DataFrames
		other refers to output from all other tallies aside from the above three (energy deposition, LET, diagnostic tallies, etc.; only relevant to 1E8 simulations given all tallies except dump tallies were disabled for 1E9 simulations) along with (for raw directories) PHITS input-related files and phits*.out file(s).
	
	


All put together, this results in the following top-level directories contained in this repository:


	common_inputs
	GaussE_1E8_raw_proton-tally
	GaussE_1E8_raw_neutron-dump
	GaussE_1E8_raw_gamma-dump
	GaussE_1E8_raw_other
	GaussE_1E9_raw_neutron-dump (upon request)
	GaussE_1E9_raw_gamma-dump (upon request)
	GaussE_1E9_raw_other
	GaussE_1E8_processed_proton-tally
	GaussE_1E8_processed_neutron-dump_NumPy
	GaussE_1E8_processed_neutron-dump_Pandas
	GaussE_1E8_processed_gamma-dump_NumPy
	GaussE_1E8_processed_gamma-dump_Pandas
	GaussE_1E8_processed_other
	GaussE_1E9_processed_neutron-dump_NumPy (upon request)
	GaussE_1E9_processed_neutron-dump_Pandas (upon request)
	GaussE_1E9_processed_gamma-dump_NumPy (upon request)
	GaussE_1E9_processed_gamma-dump_Pandas (upon request)
	GaussE_1E9_processed_other 
	GaussE_1E8_plots
	MonoE_1E8_raw_proton-tally
	MonoE_1E8_raw_neutron-dump
	MonoE_1E8_raw_gamma-dump
	MonoE_1E8_raw_other
	MonoE_1E9_raw_neutron-dump (upon request)
	MonoE_1E9_raw_gamma-dump (upon request)
	MonoE_1E9_raw_other
	MonoE_1E8_processed_proton-tally
	MonoE_1E8_processed_neutron-dump_NumPy
	MonoE_1E8_processed_neutron-dump_Pandas
	MonoE_1E8_processed_gamma-dump_NumPy
	MonoE_1E8_processed_gamma-dump_Pandas
	MonoE_1E8_processed_other
	MonoE_1E9_processed_neutron-dump_NumPy (upon request)
	MonoE_1E9_processed_neutron-dump_Pandas (upon request)
	MonoE_1E9_processed_gamma-dump_NumPy (upon request)
	MonoE_1E9_processed_gamma-dump_Pandas (upon request)
	MonoE_1E9_processed_other
	MonoE_1E8_plots
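
The naming pattern can be cross-checked against the list above with a short Python sketch that rebuilds all 39 directory names and flags those marked "(upon request)". The function name is hypothetical, and the rules encoded here are simply read off the list and the label descriptions; they are not taken from any tooling used to build the repository.

```python
def top_level_directories():
    """Return {directory name: available only upon request?} for all 39 names."""
    dumps = ("neutron-dump", "gamma-dump")
    dirs = {"common_inputs": False}
    for beam in ("GaussE", "MonoE"):
        for count in ("1E8", "1E9"):
            full = count == "1E8"  # non-dump tallies were enabled only for 1E8
            prefix = f"{beam}_{count}"
            if full:
                dirs[f"{prefix}_raw_proton-tally"] = False
                dirs[f"{prefix}_processed_proton-tally"] = False
                dirs[f"{prefix}_plots"] = False
            for dump in dumps:
                # The 1E9 dump data is the part marked "(upon request)".
                dirs[f"{prefix}_raw_{dump}"] = not full
                for fmt in ("NumPy", "Pandas"):
                    dirs[f"{prefix}_processed_{dump}_{fmt}"] = not full
            dirs[f"{prefix}_raw_other"] = False
            dirs[f"{prefix}_processed_other"] = False
    return dirs


if __name__ == "__main__":
    for name, on_request in sorted(top_level_directories().items()):
        print(name, "(upon request)" if on_request else "")
```

Under these rules, 12 of the 39 directories are flagged as available only upon request, matching the entries so marked in the list.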


And, as stated earlier, each of these top-level directories is divided into a training subdirectory (containing 37 ???_MeV directories) and a testing subdirectory (containing 10 ???p?_MeV directories), where the ???[p?]_MeV directories only (1) contain particular files (2) relevant to certain simulations, as specified by the top-level directory's name.

 

As a note to anyone surveying the raw files, all GaussE simulations were run with OpenMP parallelization with 10 processes.
For 1E8 simulations, this was conducted as ten PHITS runs of 1E7 protons each; for 1E9 simulations, this was conducted as twenty runs of 5E7 protons each.
(PHITS runs can be "chained" as "restart calculations", where one run can resume from where a previous run ended.)
In these simulations, the generated phits.out files from each run were renamed to phits-#.out (where # is the run number, 0 to 19) and moved into a phitsout subdirectory after each run's completion.
However, this was less uniform for the MonoE simulations; for those, the strategy was to complete each simulation in a single run of PHITS. 
This generally involved using a hybrid OpenMP + MPI parallelization with anywhere from 80 to 160 processes each, split between OMP and MPI (noting that some 1E9 runs were conducted with only MPI parallelization).
None of this influences the output format of the standard tally outputs.
However, the number of dump files produced is equal to the number of MPI processes utilized.
This means that each GaussE simulation only has one dump file per dump tally owing to only using OpenMP parallelization (which merges its dump files at the end of calculation), while the MonoE simulations contain a varied number of dump files per dump tally owing to variations in the parallelization strategies employed in those simulations.
PHITS Tools ultimately merges all dump outputs back together in its processing, so this quirk of how the simulations were conducted should not be apparent at all in the processed output.


Given PHITS Tools was under ongoing development as this dataset was being produced, the GaussE directories contain some extra output not present in the MonoE directories.  Most notably, only for the GaussE simulations do the plot directories contain PNG and PDF plot files generated by PHITS Tools and the *_processed_* directories contain dictionary objects of the processed phits*.out files.

Note that, for convenience, the phits*.out file(s) for each simulation are also copied to all *_raw_* directories.  The phits*.out file(s) contain the full PHITS input echo, among other information about the simulation.  For the GaussE simulations, these are within a further phitsout subdirectory for each beam energy.  Also for all GaussE_*_processed_* directories, the processed phits*.out file(s), phits*_out.pickle.xz, are included too.

References

TO BE POPULATED

Acknowledgements

The NOVO project has received funding from the European Innovation Council (EIC) under grant agreement No. 101130979. The EIC receives support from the European Union's Horizon Europe research and innovation programme. Partners from The University of Manchester have received funding from UK Research and Innovation under grant agreement No. 10102118.</dc:description>
          <dc:identifier>https://rodare.hzdr.de/record/3997</dc:identifier>
          <dc:identifier>10.14278/rodare.3997</dc:identifier>
          <dc:identifier>oai:rodare.hzdr.de:3997</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>url:https://www.hzdr.de/publications/Publ-43014</dc:relation>
          <dc:relation>doi:10.14278/rodare.3996</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/novo</dc:relation>
          <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
          <dc:rights>info:eu-repo/semantics/restrictedAccess</dc:rights>
          <dc:subject>proton therapy</dc:subject>
          <dc:subject>treatment verification</dc:subject>
          <dc:subject>particle transport calculations</dc:subject>
          <dc:subject>PHITS</dc:subject>
          <dc:title>PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
