Dataset Open Access

Dataset for Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations

Shah, Karan; Cangi, Attila


Citation Style Language JSON Export

{
  "type": "dataset", 
  "id": "3995", 
  "issued": {
    "date-parts": [
      [
        2025, 
        9, 
        24
      ]
    ]
  }, 
  "title": "Dataset for Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations", 
  "author": [
    {
      "family": "Shah", 
      "given": "Karan"
    }, 
    {
      "family": "Cangi", 
      "given": "Attila"
    }
  ], 
  "version": "2025_09_24", 
  "DOI": "10.14278/rodare.3995", 
  "language": "eng", 
  "publisher": "Rodare", 
  "abstract": "<p># Dataset for &quot;Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations&quot;</p>\n\n\n\n<p>This repository contains the dataset supporting the paper &quot;Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations&quot; by Karan Shah and Attila Cangi.</p>\n\n\n\n<p>## Overview</p>\n\n\n\n<p>This dataset comprises time-dependent density functional theory (TDDFT) simulations of one-dimensional diatomic molecules under laser excitation. The data is used to train and evaluate autoregressive Fourier Neural Operator (FNO) models that serve as machine learning time propagators for electron density evolution.</p>\n\n\n\n<p>## Physical System</p>\n\n\n\n<p>The simulations model diatomic molecules with:</p>\n\n\n\n<p>- Soft-Coulomb ionic potential:</p>\n\n<p>$$</p>\n\n<p>v_{\\text{ion}}(x) = -\\frac{Z_{1}}{\\sqrt{(x - d/2)^{2} + a^{2}}} \\;-\\; \\frac{Z_{2}}{\\sqrt{(x + d/2)^{2} + a^{2}}}</p>\n\n<p>$$</p>\n\n<p>- Sinusoidal laser excitation:</p>\n\n<p>$v_{\\text{las}}(t) = A \\sin(\\omega t)$ in the dipole approximation</p>\n\n<p>- Two-electron systems under the adiabatic local density approximation (ALDA)</p>\n\n<p>- Fixed boundary conditions at domain edges</p>\n\n\n\n<p>## Dataset Specifications</p>\n\n\n\n<p>- **Spatial domain**: $[-9.0, 9.0]$ atomic units with spacing $\\Delta x = 0.05$ a.u. 
(361 grid points)</p>\n\n<p>- **Temporal domain**: $[0, 5.0]$ femtoseconds with ML time step $\\Delta t = 0.1$ fs (51 time steps)</p>\n\n<p>- **Reference resolution**: $\\Delta t = 0.01$ fs for high-accuracy ground truth</p>\n\n<p>- **Total systems**: 2048 independent simulations</p>\n\n<p>- **System parameters**:</p>\n\n<p>- Nuclear charges $(Z_{1}, Z_{2})$: $1.0$&ndash;$4.0$ a.u.</p>\n\n<p>- Internuclear distances $(d)$: $1.0$&ndash;$4.0$ a.u.</p>\n\n<p>- Laser wavelengths: $400$&ndash;$750$ nm (optical range)</p>\n\n<p>- Laser intensities: $10^{12}$&ndash;$10^{14}$ W/cm$^{2}$</p>\n\n<p>- Softening parameter $(a)$: $1.0$ a.u.</p>\n\n\n\n<p>## Data Format</p>\n\n\n\n<p>Each `combined_data.npz` file contains:</p>\n\n\n\n<p>- `densities`: Electron densities [systems, spatial_points, time_steps]</p>\n\n<p>- `lasers_sliced`: Laser field values during simulation period</p>\n\n<p>- `lasers_val`: Full laser field temporal profile</p>\n\n<p>- `lasers_t`: Time grid for laser fields</p>\n\n<p>- `x`: Spatial coordinate grid</p>\n\n<p>- `t`: Temporal coordinate grid</p>\n\n\n\n<p>## Baseline Directory</p>\n\n\n\n<p>```</p>\n\n<p>baseline/</p>\n\n<p>\u251c\u2500\u2500 combined_data.npz</p>\n\n<p>\u251c\u2500\u2500 combined_static_energy.npz</p>\n\n<p>\u251c\u2500\u2500 data_exclude.yaml</p>\n\n<p>\u251c\u2500\u2500 data_indices.npy</p>\n\n<p>\u251c\u2500\u2500 data_static_exclude.yaml</p>\n\n<p>\u251c\u2500\u2500 data_static_exclude_40_percentile.yaml</p>\n\n<p>\u251c\u2500\u2500 data_static_exclude_60_percentile.yaml</p>\n\n<p>\u251c\u2500\u2500 inp_gs.yaml</p>\n\n<p>\u251c\u2500\u2500 inp_td.yaml</p>\n\n<p>\u251c\u2500\u2500 param_set.yaml</p>\n\n<p>\u251c\u2500\u2500 params.csv</p>\n\n<p>\u2514\u2500\u2500 summary_statistics.md</p>\n\n<p>```</p>\n\n\n\n<p>- `combined_data.npz` &mdash; consolidated float32 arrays for 2,048 TDDFT trajectories, including spatial grid (`x`, 361 points), temporal grid (`t`, 51 steps across 0&ndash;5 fs &asymp; 0&ndash;206.7 a.u.), density 
snapshots, and laser waveforms.</p>\n\n<p>- `combined_static_energy.npz` &mdash; derived observables such as particle-number integrals, dipole moments, and Thomas&ndash;Fermi energies corresponding to each trajectory and time slice.</p>\n\n<p>- `data_exclude.yaml` &mdash; explicit indices removed from the dataset because of boundary reflections or low temporal variation.</p>\n\n<p>- `data_indices.npy` &mdash; NumPy array of retained example indices matching the rows in `params.csv`; use it to align parameter metadata with data tensors without reparsing the NPZ archive.</p>\n\n<p>- `data_static_exclude*.yaml` &mdash; helper masks listing systems to omit when screening for near-static densities; percentile-specific files (`40`, `60`) filter systems with low temporal variation.</p>\n\n<p>- `inp_gs.yaml`, `inp_td.yaml` &mdash; Octopus ground-state and real-time input templates used for the simulations (Crank&ndash;Nicolson propagator, 0.01 fs internal time step, custom diatomic potential). Parameters are overridden according to `params.csv`.</p>\n\n<p>- `param_set.yaml` &mdash; YAML description of the parameter sweep ranges (nuclear charges, internuclear distance, driving-field amplitudes, carrier frequencies).</p>\n\n<p>- `params.csv` &mdash; resolved parameter combinations with units, suitable for quick inspection or tabular joins.</p>\n\n<p>- `summary_statistics.md` &mdash; autogenerated report validating float32 conversion, spatial/temporal ranges, and observable metrics for every stored quantity.</p>\n\n\n\n<p>## Dataset Variants</p>\n\n\n\n<p>- `baseline/` &mdash; fine-grid reference generated at 0.01 fs Crank&ndash;Nicolson resolution and downsampled to 0.1 fs outputs; used to train the model and to produce the baseline results.</p>\n\n<p>- `octopus_coarse/` &mdash; numerical TDDFT rollouts computed directly on the coarser 0.1 fs grid for solver-versus-model comparisons.</p>\n\n<p>- `spatial_superresolution/` &mdash; trajectories evaluated on a doubled spatial 
resolution (&Delta;x = 0.025 a.u., 721 grid points) to test Fourier Neural Operator generalization without retraining.</p>\n\n<p>- `time_extension/` &mdash; long-horizon rollouts propagated to 10 fs for assessing error accumulation beyond the 0&ndash;5 fs time window covered by the training data.</p>"
}
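
The grid sizes and array layout stated in the abstract can be cross-checked with a short script. The following is a minimal sketch, not part of the dataset or its tooling: the function names `expected_grids`, `particle_numbers`, and `check_archive` are hypothetical, and it assumes the arrays are stored under the keys and in the axis order [systems, spatial_points, time_steps] listed under "Data Format". The femtosecond-to-atomic-unit factor is the standard value of the atomic unit of time.

```python
import numpy as np

# One atomic unit of time expressed in femtoseconds (standard constant).
FS_PER_AU = 0.024188843


def expected_grids(x_max=9.0, dx=0.05, t_max_fs=5.0, dt_fs=0.1):
    """Rebuild the spatial and temporal grids described in the abstract."""
    nx = int(round(2 * x_max / dx)) + 1   # 361 points for the baseline grid
    nt = int(round(t_max_fs / dt_fs)) + 1  # 51 steps for the baseline grid
    x = np.linspace(-x_max, x_max, nx)
    t_fs = np.linspace(0.0, t_max_fs, nt)
    return x, t_fs


def particle_numbers(densities, dx=0.05):
    """Riemann-sum integral of the density over x for each system and time.

    Assumes axis order [systems, spatial_points, time_steps]; for the
    two-electron systems described, values should be close to 2.
    """
    return densities.sum(axis=1) * dx


def check_archive(data, n_systems=2048):
    """Compare array sizes in a loaded archive (or dict) with the stated ones."""
    x, t_fs = expected_grids()
    expected = (n_systems, x.size, t_fs.size)
    shape = tuple(data["densities"].shape)
    if shape != expected:
        raise ValueError(f"densities has shape {shape}, expected {expected}")
    if data["x"].size != x.size or data["t"].size != t_fs.size:
        raise ValueError("x or t grid length differs from the description")
    return shape


# Hypothetical usage, assuming the archive sits at the path shown in the
# directory listing:
#     with np.load("baseline/combined_data.npz") as data:
#         check_archive(data)
#         n = particle_numbers(data["densities"])
```

Only array sizes are checked here, because the abstract does not state whether `t` is stored in femtoseconds or atomic units; dividing a time in femtoseconds by `FS_PER_AU` gives the atomic-unit value (5 fs corresponds to about 206.7 a.u.).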