The flag -DSCHWARZ_BUILD_BENCHMARKING (default ON) enables the examples and benchmarking snippets. If schwarz-lib has been built with deal.ii, then the deal.ii examples, ex_6 and ex_9, are also built; otherwise only the bench_ras example is built.
The following command line options are available for this example. This is set up using gflags.
The executable is run in the following fashion:
[MPI_COMMAND] [MPI_OPTIONS] PATH_TO_EXECUTABLE [FLAGS]
where FLAGS are the options listed below, following the template flag_name [type][default_value]. For example, to set the number of iterations of the RAS solver to 100, one would add --num_iters=100 to the command above.
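For instance, assuming the executable is named bench_ras and mpirun is the MPI launcher (both names are illustrative and depend on your build and MPI installation), a run on four processes with 100 outer iterations could look like:

mpirun -np 4 ./bench_ras --executor=reference --num_iters=100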
executor [std::string][reference] : The executor used to run the solver, one of reference, cuda or omp (see the sketch after this list).
explicit_laplacian [bool][false] : Use the explicit Laplacian instead of deal.ii's matrix.
set_1d_laplacian_size [uint32][16] : The number of grid points in one dimension for the 2D Laplacian problem.
enable_random_rhs [bool][false] : Use a random rhs instead of the default of all 1.0's.
overlap [uint32][2] : Overlap between the domains.
timings_file [std::string][null] : The filename for the timings.
partition [std::string][regular] : The partitioner used. The choices are metis, regular or regular2d.
metis_objtype [std::string][null] : The objective type to minimize for the metis partitioner. The choices are edgecut and totalvol.
num_threads [uint32][1] : Number of threads to bind to a process.
non_symmetric_matrix [bool][false] : Explicitly state that the matrix is non-symmetric so that the local GMRES solver is used.
use_mixed_precision [bool][false] : Use mixed precision in the communication.
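As a rough orientation for how such flags are consumed, the following is a minimal sketch assuming gflags and Ginkgo are available; the create_executor helper and its structure are illustrative and not schwarz-lib's actual code. It declares a string flag in the style of executor and maps its value to a Ginkgo executor.

```cpp
// Illustrative sketch only (not schwarz-lib's source): declare a gflags string
// flag and map its value to the corresponding Ginkgo executor.
#include <memory>
#include <string>
#include <gflags/gflags.h>
#include <ginkgo/ginkgo.hpp>

DEFINE_string(executor, "reference", "One of: reference, omp, cuda");

std::shared_ptr<gko::Executor> create_executor(const std::string &name)
{
    if (name == "omp") {
        return gko::OmpExecutor::create();
    } else if (name == "cuda") {
        // The CUDA executor needs a host (master) executor for staging.
        return gko::CudaExecutor::create(0, gko::OmpExecutor::create());
    }
    return gko::ReferenceExecutor::create();
}

int main(int argc, char **argv)
{
    gflags::ParseCommandLineFlags(&argc, &argv, true);
    auto exec = create_executor(FLAGS_executor);
    return 0;
}
```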
matrix_filename [std::string][null] : The matrix file to read the global system matrix from.
enable_debug_write [bool][false] : Enable some debugging outputs to stdout.
write_comm_data [bool][false] : Write the number of sends and receives of each subdomain to files.
write_perm_data [bool][false] : Write the permutation data from CHOLMOD to a file.
print_config [bool][true] : Print the configuration of the run.
print_matrices [bool][false] : Print the local system matrices to a file.
debug [bool][false] : Enable some possibly expensive debug checks.
enable_logging [bool][false] : Enable some possibly expensive logging from Ginkgo.
num_iters [uint32][100] : The number of outer iterations for the RAS solver.
set_tol [double][1e-6] : The outer tolerance for the RAS solver.
local_tol [double][1e-12] : The inner tolerance for the local iterative solver.
enable_onesided [bool][false] : Enable the onesided asynchronous communication.
enable_twosided [bool][true] : Enable the twosided asynchronous communication. A dummy flag.
stage_through_host [bool][false] : Enable staging transfers through the host.
enable_one_by_one [bool][false] : Enable putting/getting of each element in onesided communication.
enable_put_all_local_residual_norms [bool][false] : Enable putting of all local residual norms.
enable_comm_overlap [bool][false] : Enable overlap of communication and computation.
flush_type [std::string][flush-all] : The window flush strategy. The choices are flush-local and flush-all.
lock_type [std::string][lock-all] : The window lock strategy. The choices are lock-local and lock-all.
remote_comm_type [std::string][get] : The type of the remote communication. get uses MPI_Get and put uses MPI_Put (see the sketch after this list).
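To make the window-related options above more concrete, here is a minimal sketch using plain MPI one-sided primitives; the buffer layout, neighbor choice and variable names are illustrative and this is not schwarz-lib's actual communication code. It shows the primitives that flush_type, lock_type and remote_comm_type roughly select between.

```cpp
// Illustrative sketch only: a tiny one-sided exchange with the MPI primitives
// the flush_type / lock_type / remote_comm_type flags choose between.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> buf(16, static_cast<double>(rank));
    MPI_Win win;
    MPI_Win_create(buf.data(),
                   static_cast<MPI_Aint>(buf.size() * sizeof(double)),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // lock-all: open a passive-target epoch to every rank at once.
    // A lock-local style would instead lock each target individually with
    // MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win).
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

    int neighbor = (rank + 1) % size;
    double recv = 0.0;
    // remote_comm_type=get corresponds to MPI_Get; put would use MPI_Put.
    MPI_Get(&recv, 1, MPI_DOUBLE, neighbor, 0, 1, MPI_DOUBLE, win);

    // flush-local only guarantees the local buffers can be reused;
    // flush-all also completes the pending operations at all targets.
    MPI_Win_flush_local(neighbor, win);  // flush_type=flush-local
    MPI_Win_flush_all(win);              // flush_type=flush-all

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```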
enable_global_check [bool][false] : Use the global convergence check for twosided.
global_convergence_type [std::string][centralized-tree] : Choose the convergence detection algorithm for onesided.
enable_decentralized_accumulate [bool][false] : Use the accumulate strategy for the decentralized convergence check.
enable_global_check_iter_offset [bool][false] : Enable the global convergence check only after a certain number of iterations.
local_solver [std::string][iterative-ginkgo] : The local solver used in the local domains. The current choices are direct-cholmod, direct-ginkgo or iterative-ginkgo (see the sketch after this list).
local_factorization [std::string][cholmod] : The factorization for the local direct solver, "cholmod" or "umfpack".
local_reordering [std::string][none] : The reordering for the local direct solver, "none", "metis_reordering" or "rcm_reordering".
factor_ordering_natural [bool][false] : If true, uses natural ordering instead of the default optimized ordering. This is needed for CUDA runs as the factorization ordering needs to be given to the solver.
enable_local_precond [bool][false] : If true, uses block Jacobi preconditioning for the local iterative solver.
precond_max_block_size [uint32][16] : Maximum size of the blocks for the block Jacobi preconditioner.
shifted_iter [uint32][1] : The number of iterations to communicate for the local subdomains.
local_max_iters [int32][-1] : The maximum number of iterations for the local iterative solver.
restart_iter [uint32][1] : The restart iteration count for the GMRES solver.
reset_local_crit_iter [int32][-1] : The RAS iteration at which to reset the local iteration count.
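For orientation only, the following sketch shows what an iterative-ginkgo local solve with block Jacobi preconditioning roughly corresponds to in Ginkgo, assuming a recent Ginkgo release; the small example system and the exact builder calls are illustrative and not schwarz-lib's actual setup. The numeric values mirror the flag defaults listed above.

```cpp
// Illustrative sketch only: a Ginkgo GMRES solver with a block Jacobi
// preconditioner, roughly what iterative-ginkgo + enable_local_precond select.
#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();

    // Tiny example system and vectors (placeholders for a local subdomain system).
    auto A = gko::share(gko::initialize<gko::matrix::Dense<double>>(
        {{2.0, -1.0}, {-1.0, 2.0}}, exec));
    auto b = gko::initialize<gko::matrix::Dense<double>>({1.0, 1.0}, exec);
    auto x = gko::initialize<gko::matrix::Dense<double>>({0.0, 0.0}, exec);

    auto solver_factory =
        gko::solver::Gmres<double>::build()
            .with_krylov_dim(30u)  // cf. --restart_iter
            .with_preconditioner(
                gko::preconditioner::Jacobi<double>::build()
                    .with_max_block_size(16u)  // cf. --precond_max_block_size
                    .on(exec))
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(100u).on(exec),  // cf. --local_max_iters
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-12)  // cf. --local_tol
                    .on(exec))
            .on(exec);

    // Generate the solver for the system matrix and solve A x = b.
    solver_factory->generate(A)->apply(b.get(), x.get());
    return 0;
}
```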
This example is written in the benchmarking/bench_ras.cpp file and demonstrates the basic capabilities of schwarz-lib. You can use it to solve the 2D Poisson equation discretized with a 5-point stencil, or to solve a generic matrix by providing a matrix file.
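For reference, the 5-point stencil is the standard textbook discretization of the 2D Poisson equation $-\Delta u = f$ on a uniform grid with spacing $h$ (not specific to schwarz-lib):

$$4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1} = h^2 f_{i,j}$$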
These examples use deal.ii's capabilities to generate the matrix, and the solution is computed with the RAS method.
Possible settings are:
num_refine_cycles [uint32][1][disabled] : The number of refinement cycles when used with deal.ii.
init_refine_level [uint32][4] : The initial refinement level of the problem. This sets the initial number of dofs.
dealii_orig [bool][false] : Solve with the deal.ii iterative CG instead of the RAS solver.
vis_sol [bool][false] : Print the solution for visualization.
The benchmarking/dealii_ex_6.cpp example demonstrates the solution of the Poisson equation with adaptive refinement, as explained on the corresponding deal.ii example documentation page.
The benchmarking/dealii_ex_9.cpp example demonstrates the solution of the advection equation with adaptive refinement, as explained on the corresponding deal.ii example documentation page.