
The Preconditioned Conjugate Gradient method is often employed for the solution of linear systems of equations arising in numerical simulations of physical phenomena. While widely used, the solver is also known for its lack of accuracy when computing the residual. In this article, we propose two algorithmic solutions that originate from the ExBLAS project to enhance the accuracy of the solver as well as to ensure its reproducibility in a hybrid MPI + OpenMP tasks programming environment. One is based on ExBLAS and preserves every bit of information until the final rounding, while the other relies upon floating-point expansions and, hence, expands the intermediate precision. Instead of converting the entire solver into its ExBLAS-related implementation, we identify those parts that violate reproducibility because of floating-point non-associativity, secure them, and combine this with the sequential executions. These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these approaches on two modern HPC systems: both versions deliver reproducible numbers of iterations, residuals, direct errors, and vector-solutions at an overhead of less than 37.7% on 768 cores.

FELTOR is a modular and free scientific software package. It allows developing platform independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed memory systems. FELTOR consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behavior. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behavior translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. We briefly discuss alternative methods to ensure the correctness of results like the convergence of reduced physical quantities of interest, ensemble simulations, invariants or reduced simulation times.

In a second part, we explore important performance tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in FELTOR and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 10⁻¹ and 10³ MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node to achieve a scaling efficiency above 50% (both strong and weak).

FELTOR is both a numerical library and a scientific software package built on top of it. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique.
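
Both abstracts attribute non-reproducible results to floating-point arithmetic: addition is not associative, so a dot product computed in a different summation order (as happens across threads or MPI ranks) can round differently. The following is a minimal sketch of that effect, not the long-accumulator implementation used in ExBLAS or FELTOR. It assumes Python's standard `fractions` module as a stand-in for exact accumulation, and the names `naive_dot` and `exact_dot` are hypothetical.

```python
from fractions import Fraction


def naive_dot(x, y):
    """Left-to-right floating-point dot product; rounds after every operation."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def exact_dot(x, y):
    """Dot product accumulated exactly in rational arithmetic, rounded once.

    Assumption: exact rationals stand in for a long accumulator here; a real
    implementation would use fixed-size integer accumulators instead.
    """
    total = Fraction(0)
    for a, b in zip(x, y):
        total += Fraction(a) * Fraction(b)  # every finite float is an exact rational
    return float(total)  # one rounding, at the very end


# The same four terms in two different summation orders.
x1 = [1e16, 1.0, -1e16, 1.0]
x2 = [1e16, -1e16, 1.0, 1.0]
ones = [1.0, 1.0, 1.0, 1.0]

naive_1 = naive_dot(x1, ones)  # the first 1.0 is absorbed by 1e16
naive_2 = naive_dot(x2, ones)  # the large terms cancel first
exact_1 = exact_dot(x1, ones)
exact_2 = exact_dot(x2, ones)

print(naive_1, naive_2)  # the two orders disagree
print(exact_1, exact_2)  # the two orders agree
```

Because the exact variant rounds only once, its result does not depend on the order in which the partial products arrive, which is the property a parallel reduction needs for bitwise reproducibility.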

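The FELTOR abstract's statement that the product of latency and bandwidth gives a minimum array size can be illustrated with a two-parameter timing model. The sketch below assumes the simplest such model, time = latency + size / bandwidth, which may differ from the model in the paper. The function names and the hardware numbers are hypothetical and chosen only for illustration.

```python
import math


def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed model: a fixed latency plus a bandwidth-limited transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def efficiency(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the predicted time spent on bandwidth-limited work."""
    ideal = size_bytes / bandwidth_bytes_per_s
    return ideal / predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def min_size_for_half_efficiency(latency_s, bandwidth_bytes_per_s):
    """Array size at which transfer time equals latency under this model."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical hardware: 10 microseconds latency, 100 GB/s memory bandwidth.
latency = 10e-6
bandwidth = 100e9

n_min = min_size_for_half_efficiency(latency, bandwidth)
eff_at_min = efficiency(n_min, latency, bandwidth)
eff_small = efficiency(n_min / 10, latency, bandwidth)
eff_large = efficiency(n_min * 10, latency, bandwidth)

print(f"minimum size: {n_min / 1e6:.1f} MB")
print(f"efficiency at minimum size: {eff_at_min:.2f}")
```

Under this model, arrays smaller than the latency-bandwidth product are dominated by latency and fall below 50% efficiency, while larger arrays approach the bandwidth limit.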