4 Conclusion

This paper presents MPI_XSTAR, a parallel implementation of multiple XSTAR runs using the MPI protocol (e.g., Gropp et al., 1999) for clusters with distributed memory. MPI_XSTAR expedites the generation of tabulated models in high-performance computing environments. Like XSTAR2XSPEC and PVM_XSTAR, MPI_XSTAR invokes the FTOOLS programs XSTINITABLE and XSTAR2TABLE. However, table models take an extremely long time to produce with XSTAR2XSPEC, which executes the XSTAR runs serially. Moreover, PVM_XSTAR relies on the PVM technology (Geist et al., 1994), which is no longer supported by modern supercomputers. Hence, an MPI-based manager for parallelizing XSTAR can overcome the current difficulties in producing the multiplicative tabulated models.

The MPI_XSTAR code that we have developed is available via GitHub (github.com/xstarkit/mpi_xstar). Note that it makes use of the locally installed XSTAR and its associated tools, and will run regardless of XSTAR version so long as the XSTAR parameter inputs and calling sequence do not change. However, should newer versions of XSTAR arise requiring such changes, updates to the MPI_XSTAR code will be made and documented on the GitHub site.

The code was evaluated by generating XSTAR table models on a $9 \times 6$ grid in the $N_{\rm H}$-$\xi$ parameter space. The parallel execution is significantly faster than the serial execution: a computation that previously took 10 days requires only about 18 hours using 32 CPUs. However, our benchmarking studies with 1 to 54 CPUs indicate that the parallel efficiency decreases as the number of processors increases. Moreover, the speedup does not scale linearly with the number of processors, as shown in Fig. 1. Although we did not achieve an ideal speedup ($\mathcal{S}(N)\approx N$), the running times (see Table 2) of parallel execution with 32 and 54 CPUs are far shorter than the time of a serial execution. We note that the performance of MPI_XSTAR is limited by the maximum running time of a single XSTAR run (about 17.5 hours for our benchmark model results listed in Table 2), since the whole grid cannot finish before its longest run does. Nevertheless, MPI_XSTAR provides a faster way to generate photoionization grid models for spectral model fitting tools such as XSPEC (Arnaud, 1996) and ISIS (Houck & Denicola, 2000).
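To make the quoted figures concrete, the following minimal Python sketch computes the speedup, the parallel efficiency, and the ceiling on the speedup implied by the longest single XSTAR run. It assumes the usual definitions $\mathcal{S}(N) = T_1/T_N$ and $\mathcal{E}(N) = \mathcal{S}(N)/N$, and it uses only the rounded times quoted in the text (10 days, about 18 hours, about 17.5 hours), so the results are approximate and are not a substitute for the values in Table 2. The function and variable names are illustrative and are not part of MPI_XSTAR.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(N) = T_1 / T_N, with both times in the same unit."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(N) = S(N) / N."""
    return speedup(t_serial, t_parallel) / n_procs


# Rounded values quoted in the text (assumed here, in hours).
T_SERIAL_H = 10 * 24.0     # serial execution: about 10 days
T_PARALLEL_32_H = 18.0     # parallel execution on 32 CPUs: about 18 hours
T_LONGEST_RUN_H = 17.5     # longest single XSTAR run: about 17.5 hours

s32 = speedup(T_SERIAL_H, T_PARALLEL_32_H)
e32 = efficiency(T_SERIAL_H, T_PARALLEL_32_H, 32)

# No schedule can finish before the longest individual run does, so the
# speedup is bounded by T_1 / T_longest regardless of the number of CPUs.
s_max = speedup(T_SERIAL_H, T_LONGEST_RUN_H)

print(f"S(32)  ~ {s32:.1f}")
print(f"E(32)  ~ {e32:.2f}")
print(f"S_max  ~ {s_max:.1f}")
```

With these rounded inputs, the speedup on 32 CPUs is already close to the ceiling set by the longest run, which is consistent with the efficiency declining as more processors are added.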

In summary, the new code MPI_XSTAR is able to speed up the photoionization modeling procedure. An important application is the fast generation of photoionization table models of X-ray warm absorbers in AGNs (e.g., Danehkar et al., 2017), whose computation time, depending on the number of CPUs requested for the parallel execution, is shorter than that of a serial execution using XSTAR2XSPEC. The parallelization of XSTAR might also be implementable on graphics processing units (GPUs) using the CUDA library. Moreover, it might be possible to parallelize the internal routines (currently in Fortran 77) of the program XSTAR, which would significantly expedite photoionization simulations of ionized gaseous clouds. An MPI-CUDA GPU-based parallelization and a rewrite of the XSTAR internal routines based on the MPI library for high-performance computing environments deserve further investigation in the future.

Ashkbiz Danehkar