Is It Here/There Yet? - Real Life Experiences of Generating/Evaluating Extreme Data Sets Around the World

Juckeland, Guido; Huebl, Axel; Bussmann, Michael

doi:10.14278/rodare.71

September 18, 2018 Presentation Open Access

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Juckeland, Guido</dc:creator>
  <dc:creator>Huebl, Axel</dc:creator>
  <dc:creator>Bussmann, Michael</dc:creator>
  <dc:date>2018-09-18</dc:date>
  <dc:description>Large-scale simulations easily produce vast amounts of data that cannot always be evaluated in situ. At that point parallel file systems come into play, but their per-node performance is essentially limited to about the speed of a USB 2.0 thumb drive (e.g. the Spider file system at OLCF provides over 1 TB/s write bandwidth, but with 18000+ nodes of Titan writing simultaneously, this number is reduced to about 50 MB/s per node). Making the most of such a limited resource requires I/O libraries that actually scale. In addition, such libraries offer on-the-fly data transformations (e.g. compression) to better utilize the raw I/O bandwidth, albeit at the cost of a new trade-off between compression throughput and compression ratio. We will present a detailed study of I/O performance and various compression techniques at OLCF and compare them against smaller local I/O installations, demonstrating the highest achieved I/O performance for real-world applications at OLCF. Furthermore, we demonstrate that the best-performing I/O setup can be determined prior to starting the job based on hardware characteristics.
Once your data is on disk, the clock starts ticking and you are racing against the deadline at which your data will be purged, since most centers offer high-performance storage only on a temporary basis. Extracting all valuable information from a petabyte-sized data set requires parallel processing as well, which induces wait times until resources are available and, quite naturally, a lot of trial and error during evaluation. The time constraint on temporary data becomes even more troublesome when comparing multiple large simulations, which naturally wait multiple days until they are scheduled and write their results. Ideally, analysis could embrace the data of multiple simulations of a quarterly accounted, yet year-long computing campaign. Another challenge for actually making scientific discoveries arises when utilizing multiple compute sites. This seems to be rather common for research groups, as they will use all the compute cycles they can get, wherever that may be. For comparative studies, the data sets now need to be available at the same time for analysis, e.g. via archiving solutions or transfer to one location. The achievable transfer bandwidth between data centers is, in our experience, still much lower than expected. The talk will also report on the experiences of evaluating petabyte-sized data sets in such a diverse environment.</dc:description>
  <dc:identifier>https://rodare.hzdr.de/record/71</dc:identifier>
  <dc:identifier>10.14278/rodare.71</dc:identifier>
  <dc:identifier>oai:rodare.hzdr.de:71</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>url:https://www.hzdr.de/publications/Publ-27984</dc:relation>
  <dc:relation>doi:10.14278/rodare.70</dc:relation>
  <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:title>Is It Here/There Yet? - Real Life Experiences of Generating/Evaluating Extreme Data Sets Around the World</dc:title>
  <dc:type>info:eu-repo/semantics/lecture</dc:type>
  <dc:type>presentation</dc:type>
</oai_dc:dc>
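
The bandwidth figures quoted in the description can be checked with a short back-of-the-envelope calculation. The sketch below is only an illustration, not the authors' method: the function names are hypothetical, the compressor speeds and ratios and the 1 GB/s site-to-site link are made-up example values, and the pipeline model (the slower of compression and writing limits throughput) is a simplifying assumption. Only the 1 TB/s aggregate bandwidth and the 18000 nodes come from the abstract.

```python
"""Back-of-the-envelope I/O estimates (illustrative sketch under stated assumptions)."""


def per_node_bandwidth(aggregate_bytes_per_s: float, nodes: int) -> float:
    """Share an aggregate file-system bandwidth evenly across all writing nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return aggregate_bytes_per_s / nodes


def effective_write_rate(raw_bytes_per_s: float,
                         compress_bytes_per_s: float,
                         ratio: float) -> float:
    """Uncompressed bytes per second a node can sink when compressing before writing.

    Assumption: compression and writing overlap perfectly, so the slower
    stage limits the pipeline. `ratio` is uncompressed size / compressed size.
    """
    return min(compress_bytes_per_s, raw_bytes_per_s * ratio)


def transfer_days(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Days needed to move a data set at a sustained bandwidth."""
    return size_bytes / bandwidth_bytes_per_s / 86400.0


if __name__ == "__main__":
    # Figures from the abstract: 1 TB/s aggregate, 18000 nodes writing at once.
    node_rate = per_node_bandwidth(1e12, 18000)
    print(f"per-node share: {node_rate / 1e6:.1f} MB/s")

    # Hypothetical compressors: fast with a low ratio vs. slow with a high ratio.
    fast = effective_write_rate(node_rate, compress_bytes_per_s=500e6, ratio=2.0)
    slow = effective_write_rate(node_rate, compress_bytes_per_s=30e6, ratio=4.0)
    print(f"fast compressor: {fast / 1e6:.1f} MB/s effective")
    print(f"slow compressor: {slow / 1e6:.1f} MB/s effective")

    # Hypothetical 1 GB/s sustained link between two data centers.
    print(f"1 PB transfer: {transfer_days(1e15, 1e9):.1f} days")
```

Under this simple model a compressor helps only if it processes data faster than the node's raw I/O share. A slow compressor with a high ratio can leave the node worse off than writing uncompressed data, which is the trade-off the abstract alludes to.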


	All versions	This version
Views	945	945
Downloads	240	240
Data volume	635.1 MB	635.1 MB
Unique views	750	750
Unique downloads	203	203


Publication date: September 18, 2018

DOI: 10.14278/rodare.71

Related identifiers: Identical to https://www.hzdr.de/publications/Publ-27984

Communities: RODARE

License (for files): Creative Commons Attribution 4.0 International

Versions

Version 1, 10.14278/rodare.71, Sep 18, 2018

Cite all versions? You can cite all versions by using the DOI 10.14278/rodare.70. This DOI represents all versions and will always resolve to the latest one.
