Software · Closed Access
Starke, Sebastian;
Smid, Michal
Record: https://rodare.hzdr.de/record/2586 (also listed at https://www.hzdr.de/publications/Publ-37977)
Published: 2023-11-29 · Version 1 · Language: English

Software for training and inference of neural network models to remove bremsstrahlung background from SAXS imaging data obtained at the European XFEL laboratory.

We thank Peter Steinbach for providing the codebase for the equivariant UNet, which we integrated into our repository.

Below we share a brief description of our method:

**1. Introduction**

Experimental data from cameras in ultra-high-intensity laser interaction experiments very often contain not only the desired signal but also a large number of traces of high-energy photons created via the bremsstrahlung process during the interaction. For example, the Jungfrau camera detecting the small-angle x-ray scattering (SAXS) signal in a combined XFEL + optical laser (OL) experiment at the European XFEL laboratory still records a lot of bremsstrahlung background, even though considerable experimental effort (adding a mirror to reflect the signal, and a massive lead wall to block the direct view) was made to reduce it (Šmíd et al., 2020). Especially in the SAXS case, the signal becomes gradually weaker with increasing scattering angle. The experimentally observed signal-to-noise ratio therefore sets the limit on the scattering angles for which the signal can be extracted, limiting the physics that can be observed.

As the noise is produced by high-energy photons, whose origin is very different from that of the signal photons, the signal and noise are additive.
The currently used Jungfrau camera has a resolution of 1024 × 512 pixels and a pixel size of 75 μm, and the read-out values are calibrated to deposited keV per pixel.

**2. Methods**

The process of removing the noise from the data was split into three steps. First, the training dataset was curated and cut into patches of 128 × 128 pixels. Second, a neural network was created and trained on those data. Splitting the data into patches is what enables the whole process, because no 'noise-only' data are measured in the detector areas where the signal typically lies. In the third step, an image with actual data is split into patches, these are processed by the neural network, and the results are merged to produce the final signal and noise predictions.

**Data preparation**

The experimental data used for training the neural network came from two sets:

- X-ray-only shots: collected when only the XFEL beam was used, i.e. they contain an example of the useful signal but no bremsstrahlung background at all.
- Full shots: data from the real physics shots, containing both the XFEL and OL beams and therefore a mixture of signal and noise.

To train the neural network in a supervised manner, we need to provide two sets of data: the signal patches and the noise patches. The signal patches are created from the x-ray-only data as follows: from each image, a set of randomly positioned and randomly oriented patches is extracted. The randomness in rotation is important, as these training x-ray data have significant dominant directions, which are expected to change in the real full-shot data. Next, the patches are checked, and only those with integrated intensity above a given threshold are used, to prevent close-to-empty patches from entering the training set.
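The signal-patch extraction just described can be sketched as follows. The function and parameter names (`extract_signal_patches`, `threshold`) are illustrative, not taken from the repository, and for brevity the random orientation is restricted here to multiples of 90 degrees; arbitrary angles would need an interpolating rotation such as `scipy.ndimage.rotate`.

```python
import numpy as np

def extract_signal_patches(image, n_patches=16, patch=128, threshold=1.0, rng=None):
    """Sample randomly positioned, randomly oriented patches from an
    x-ray-only image, keeping only those whose integrated intensity
    exceeds a threshold (to discard close-to-empty patches)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    kept = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        # random orientation: here only 90-degree multiples (see lead-in)
        p = np.rot90(image[y:y + patch, x:x + patch], k=rng.integers(0, 4))
        if p.sum() > threshold:
            kept.append(p)
    return kept
```

A per-patch amplitude rescaling, as mentioned below, would be applied on top of this before training.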
In the last step, the amplitude of the patches is randomized to keep the algorithm more general. Note that the dynamic range of the detector, as well as of the signal, is large, i.e. above approximately four orders of magnitude.

The noise patches are created from the full-shot data. To avoid regions containing signal being used, those regions are masked out. The masking is performed automatically using a corresponding x-ray-only image. Then, patches of the given size are randomly selected from the remaining data. Note that neither rotation nor amplitude changes are applied, as the noise patches can contain signatures of the structure of the bremsstrahlung, which could simplify the task for the neural network.

**Neural network**

In the modelling approach we followed, noise was assumed to be additive, i.e. a noisy input signal x_in can be decomposed into noise and clean-signal components n and s, respectively, via the relationship

x_in = n + s.

The removal of the bremsstrahlung background n was achieved with the help of a convolutional neural network, which estimated both the noise n̂ to be subtracted from the input and the denoised image ŝ itself. More specifically, a UNet architecture (Ronneberger et al., 2015) was adopted with four encoder blocks using 32, 64, 128 and 256 feature maps. Each encoder block consisted of two separate convolutional layers with ReLU nonlinearities. No batch normalization was employed. The corresponding decoder network matched the number of filters. The decoder output produced latent feature maps l with 16 channels.

In preliminary experiments, we found an equivariant version of the UNet, implemented using the 'escnn' library (https://github.com/QUVA-Lab/escnn) (Cesa et al., 2022), to show favorable performance compared to the original version.
It consisted of 5.88 million trainable parameters and implemented operations to make the network equivariant to input transformations under discrete rotations with angles corresponding to multiples of 90 degrees.

The input to the neural network consisted of image patches of shape 128 × 128. The training data comprised 1754 signal patches and a further 4711 noise patches.

During network training, we randomly sampled a new noise patch each time a clean signal patch was accessed, as a means of data augmentation and to avoid overfitting. The pixelwise addition of both patches produced a synthetic noisy patch, which was used as model input. Both summands were treated as labels during model training. Image intensity normalization of the raw pixel values was performed as follows: lower and upper bounds were computed as the 1 and 99.95 percentiles of the noisy patch. The lower bound was subtracted from the noisy patch and the result was divided by the difference between the upper and lower bounds. Subsequently, the result was clipped to the unit range, i.e. values below zero were set to zero and values above one were reduced to one. The same normalization and clipping strategy, using the bounds obtained from the noisy patch, was then applied to the signal patch and the noise patch, respectively.

From the latent representation of the equivariant UNet, pixelwise noise was estimated by applying a further convolutional layer to the latent feature map, using a kernel size of three with stride and padding of one to retain the spatial dimensions. A ReLU activation was applied, as the noise contribution was known to be non-negative. The estimated noise n̂ was then subtracted from the input.
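The percentile-based intensity normalization described above can be sketched as follows; `normalize_with_noisy_bounds` is an illustrative name, and, as in the text, the bounds are always taken from the noisy patch and reused for the companion (signal or noise) patch.

```python
import numpy as np

def normalize_with_noisy_bounds(noisy, companion, lo_pct=1.0, hi_pct=99.95):
    """Compute bounds as the 1 / 99.95 percentiles of the noisy patch,
    map to [0, 1] with clipping, and apply the same bounds to a
    companion patch so that all inputs share one scale."""
    lo, hi = np.percentile(noisy, [lo_pct, hi_pct])
    scale = hi - lo
    norm = lambda a: np.clip((a - lo) / scale, 0.0, 1.0)
    return norm(noisy), norm(companion)
```

Reusing the noisy-patch bounds for the labels keeps the additive relationship between signal, noise, and their sum intact after normalization (up to clipping).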
To enforce non-negativity of the estimated signal as well, a ReLU nonlinearity was again applied. In total, the procedure worked as follows:

l = eqUNet(x_in),
n̂ = ReLU(conv(l)),
ŝ = ReLU(x_in − n̂).

The network was implemented using the 'PyTorch' library (version 1.12.1) for the Python programming language (version 3.10.4). It was trained for 400 epochs with a batch size of 16 on a single NVIDIA A100 GPU, using the AdamW optimizer with a learning rate of 10⁻⁴ and no weight decay. For both estimated components n̂ and ŝ, the mean absolute error loss was applied; the two loss components were added to obtain the loss function the model was trained on.

**Application**

Once the model was trained, the bremsstrahlung background was removed from full-sized experimental images by applying the model to image patches, followed by a recombination of the patch predictions into full-sized model predictions. A simple sliding-window approach, i.e. a regular splitting of the image into non-overlapping patches and subsequent recombination, would produce unwanted artifacts at the borders between patches, so a more elaborate method was developed.

Each image is split into a grid of patches four times, with the initial pixel offsets [0, 0], [96, 32], [32, 96] and [64, 64]. The patches are normalized in the same way as described for the training procedure before being processed by the network. The obtained predictions for each patch are then rescaled to the original data range by undoing the normalization (i.e.
by multiplying the output by the difference between the upper and lower bounds, then adding the lower bound).

In the last step, the four predictions produced for the four offsets are combined into the final result. Each pixel of the final image is calculated as a weighted mean of these four predictions, with weights

w_i = 1 / (|p_i − m| / 2 + 2),

where w_i is the weight of the i-th prediction p_i, and m is the mean of all predictions for the given pixel. This approach effectively eliminates the outliers which are sometimes produced close to the edges of the patches.

**3. References**

[1] Cesa, G., Lang, L., & Weiler, M. (2022). A program to build E(n)-equivariant steerable CNNs. International Conference on Learning Representations. https://openreview.net/forum?id=WE4qe9xlnQw

[2] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Lecture Notes in Computer Science, 9351, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28

[3] Šmíd, M., Baehtz, C., Pelka, A., Laso García, A., Göde, S., Grenzer, J., Kluge, T., Konopkova, Z., Makita, M., Prencipe, I., Preston, T. R., Rödel, M., & Cowan, T. E. (2020). Mirror to measure small angle x-ray scattering signal in high energy density experiments. Review of Scientific Instruments, 91(12), 123501.
https://doi.org/10.1063/5.0021691

Record metadata:

- Title: Software: removal of bremsstrahlung background from SAXS signals with deep neural networks
- Type: SoftwareSourceCode
- Creators: Starke, Sebastian (HZDR; ORCID 0000-0001-5007-1868); Smid, Michal (HZDR; ORCID 0000-0002-7162-7500)
- Keywords: SAXS, XFEL, equivariant neural networks, noise removal
- DOI: https://doi.org/10.14278/rodare.2586
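The weighted combination of the four offset predictions described in the Application section above can be sketched as follows; `combine_offset_predictions` is an illustrative name, and `preds` is assumed to stack the four per-offset predictions pixelwise (boundary pixels not covered by all four grids are not handled in this sketch).

```python
import numpy as np

def combine_offset_predictions(preds):
    """Combine predictions from the four offset grids per pixel with
    the weighted mean w_i = 1 / (|p_i - m| / 2 + 2), where m is the
    plain mean over the four predictions; outliers near patch edges
    are thereby down-weighted. `preds` has shape (4, H, W)."""
    preds = np.asarray(preds, dtype=float)
    m = preds.mean(axis=0)                      # per-pixel plain mean
    w = 1.0 / (np.abs(preds - m) / 2.0 + 2.0)   # per-pixel weights
    return (w * preds).sum(axis=0) / w.sum(axis=0)
```

For example, if three grids predict 1 and one grid predicts 9 at a pixel, the plain mean is 3 but the weighted mean is 7/3 ≈ 2.33, pulling the result towards the three agreeing predictions.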
| | All versions | This version |
|---|---|---|
| Views | 290 | 290 |
| Downloads | 1 | 1 |
| Data volume | 37.7 MB | 37.7 MB |
| Unique views | 267 | 267 |
| Unique downloads | 1 | 1 |