GitHub - arm-hpc/miniAMR: Mantevo miniAMR reference proxy application

miniAMR mini-application

--------------------------------------
Contents of this README file:
1. miniAMR overview
2. miniAMR versions
3. building miniAMR
4. running miniAMR
5. notes about the code
--------------------------------------

--------------------------------------
1. miniAMR overview

miniAMR applies a stencil calculation on a unit cube computational domain,
which is divided into blocks. The blocks all have the same number of cells
in each direction and communicate ghost values with neighboring blocks. With
adaptive mesh refinement, the blocks can represent different levels of
refinement in the larger mesh. Neighboring blocks can be at the same level
or one level different, which means that the length of cells in neighboring
blocks can differ by only a factor of two in each direction. The calculation
on the variables in each cell is an average of the values in the chosen
stencil. The refinement and coarsening of the blocks are driven by objects
that are pushed through the mesh. If a block intersects with the surface
or the volume of an object, then that block can be refined. There is also
an option to uniformly refine the mesh. Each cell contains a number of
variables, each of which is evaluated independently.
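
As an illustration of the averaging described above, here is a minimal
sketch of a 7-point stencil applied to one variable of one cubic block.
The function name, the IDX macro, and the storage layout (n interior cells
per side surrounded by a single layer of ghost cells) are assumptions made
for this example and are not taken from stencil.c.

```c
#include <stddef.h>

/* Hypothetical layout: (n + 2)^3 doubles, one ghost layer on every face. */
#define IDX(i, j, k, n) (((size_t)(i) * ((n) + 2) + (j)) * ((n) + 2) + (k))

/* Write to `out` the mean of each interior cell of `in` and its six face
 * neighbors. Ghost cells of `in` must already hold the neighbor values;
 * ghost cells of `out` are left untouched. */
void stencil7_average(const double *in, double *out, int n)
{
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            for (int k = 1; k <= n; k++)
                out[IDX(i, j, k, n)] =
                    (in[IDX(i - 1, j, k, n)] + in[IDX(i + 1, j, k, n)] +
                     in[IDX(i, j - 1, k, n)] + in[IDX(i, j + 1, k, n)] +
                     in[IDX(i, j, k - 1, n)] + in[IDX(i, j, k + 1, n)] +
                     in[IDX(i, j, k, n)]) / 7.0;
}
```

Writing to a separate output array keeps every average based on values from
the previous stage, which is one simple way to get a well-defined result.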

--------------------------------------
2. miniAMR versions:

- miniAMR_ref:

reference version: self-contained MPI-parallel.

- miniAMR_serial

serial version of reference version

-------------------
3. Building miniAMR:

To make the code, type 'make' in the directory containing the source.
The enclosed Makefile.mpi is configured for a general MPI installation.
Other compilers or machines will need changes in the CFLAGS
variable to correspond with the flags available for the compiler being used.

-------------------
4. Running miniAMR:

miniAMR can be run like this:

% <mpi-run-command> ./miniAMR.x

where <mpi-run-command> varies from system to system but usually looks something like 'mpirun -np 4 ' or similar.

Execution is then driven entirely by the default settings, as configured in default-settings.h. Options may be listed using

% ./miniAMR.x --help

The program accepts several arguments on the command line.
The list of arguments and their defaults is as follows:

--nx - block size in x
--ny - block size in y
--nz - block size in z
These control the size of the blocks in the mesh. All of these need to
be even and greater than zero. The default is 10 for each variable.
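
The constraint on the block sizes can be expressed as a small check. This
is only a sketch; the function name is hypothetical and the real argument
checking in main.c may be organized differently.

```c
#include <stdbool.h>

/* Each block dimension must be positive and even. */
bool block_size_is_valid(int nx, int ny, int nz)
{
    return nx > 0 && ny > 0 && nz > 0 &&
           nx % 2 == 0 && ny % 2 == 0 && nz % 2 == 0;
}
```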

--init_x - initial blocks in x
--init_y - initial blocks in y
--init_z - initial blocks in z
These control the number of blocks on each processor in the
initial mesh. These need to be greater than zero. The default
is 1 block in each direction per processor. The initial mesh
is a unit cube regardless of the number of blocks.

--reorder - ordering of blocks
This controls whether the blocks are ordered by the RCB algorithm
or by a natural ordering of the processors. A value of 1 selects
the RCB ordering and 0 selects the natural ordering. The default is 1.

--npx - number of processors in the x direction
--npy - number of processors in the y direction
--npz - number of processors in the z direction
These control the number of processors in each direction. The product
of these numbers has to equal the number of processors being used. The
default is 1 processor in each direction.
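
The requirement on the processor grid can be sketched as follows. The
function name is hypothetical, and rejecting non-positive values is an
assumption of this example; num_procs stands for the size reported by the
MPI runtime.

```c
#include <stdbool.h>

/* The product npx * npy * npz must equal the number of MPI ranks. */
bool proc_grid_is_valid(int npx, int npy, int npz, int num_procs)
{
    if (npx <= 0 || npy <= 0 || npz <= 0)   /* assumption: must be positive */
        return false;
    return (long)npx * npy * npz == (long)num_procs;
}
```

For example, a run on 27 processors accepts --npx 3 --npy 3 --npz 3 but not
--npx 4 --npy 2 --npz 2.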

--max_blocks - maximum number of blocks per processor
The maximum number of blocks used per processor. This is the number of
blocks that will be allocated at the start of the run and the code will
fail if this number is exceeded. The default is 500 blocks.

--num_refine - number of levels of refinement
This is the number of levels of refinement that blocks which are refined
will be refined to. If it is zero then the mesh will not be refined.
The default is 5 levels of refinement.
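
Because the initial mesh is a unit cube and each level of refinement halves
the cell length, the cell length in one direction follows from the options
above. The sketch below assumes that the initial mesh has np * init blocks
of n cells each in that direction; the function name is hypothetical.

```c
/* Cell length in one direction at a given refinement level (level >= 0).
 * np:   processors in that direction (--npx, --npy or --npz)
 * init: initial blocks per processor in that direction (--init_x, ...)
 * n:    cells per block in that direction (--nx, ...) */
double cell_length(int np, int init, int n, int level)
{
    long cells = (long)np * init * n;      /* cells across the cube, level 0 */
    return 1.0 / (double)(cells << level); /* each level doubles the count */
}
```

With the defaults on one processor (np = 1, init = 1, n = 10), an unrefined
cell has length 1/10 and a cell at level 5 has length 1/320.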

--block_change - number of levels a block can change during refinement
This parameter controls the number of levels that a block can change
(either refining or coarsening) during a refinement step. The default
is the number of levels of refinement.

--uniform_refine - if 1, then grid is uniformly refined
This controls whether the mesh is uniformly refined. If it is 1 then the
mesh will be uniformly refined, while if it is zero, the refinement will
be controlled by objects in the mesh. The default is 1.

--refine_freq - frequency (in timesteps) of checking for refinement
This determines the frequency (in timesteps) between checking if
refinement is needed. The default is every 5 timesteps.

--target_active - target number of blocks per processor
--target_max - max number of blocks per processor
--target_min - min number of blocks per processor
These allow the user to control the number of blocks per processor.
If these are zero, then no adjustment is made. If target_active is
greater than zero, then the code will adjust the number of blocks to
that target after the refinement step. If target_max is greater than
zero, then the number of blocks will be reduced if it exceeds this
number. Likewise, if target_min is greater than zero, then the number
of blocks will be raised if there are fewer than that number after the
refinement step. The default for all of these is zero.
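
The effect of the three target options can be sketched as a function that
returns the block count to aim for after a refinement step. The function
name is hypothetical, and giving target_active precedence over the other
two options is an assumption of this example rather than documented
behavior of target.c.

```c
/* Desired number of blocks per processor after a refinement step.
 * A value of zero disables the corresponding option. */
int target_block_count(int num_blocks, int target_active,
                       int target_max, int target_min)
{
    if (target_active > 0)                          /* assumed precedence */
        return target_active;
    if (target_max > 0 && num_blocks > target_max)
        return target_max;                          /* too many: reduce */
    if (target_min > 0 && num_blocks < target_min)
        return target_min;                          /* too few: raise */
    return num_blocks;                              /* no adjustment */
}
```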

--inbalance - percentage imbalance that triggers load balancing
This parameter allows the user to set a percentage threshold above
which the load will be balanced among the processors. The value
that this is checked against is the maximum number of blocks on a
processor minus the minimum number of blocks on a processor divided
by the average. The default is zero, which means to always load
balance at each refinement step.
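
The imbalance measure described above can be sketched as follows. The
function name is hypothetical. Expressing the measure as a percentage and
using ">=" (so that a threshold of zero always triggers load balancing)
are assumptions of this example.

```c
#include <stdbool.h>

/* Compare (max - min) / average, as a percentage, against the threshold. */
bool needs_load_balance(int max_blocks, int min_blocks,
                        double avg_blocks, double threshold_percent)
{
    if (avg_blocks <= 0.0)
        return false;                    /* no blocks, nothing to balance */
    double imbalance_percent =
        100.0 * (max_blocks - min_blocks) / avg_blocks;
    return imbalance_percent >= threshold_percent;
}
```

For example, with a maximum of 12 blocks, a minimum of 8, and an average of
10, the measure is 40 percent, so a threshold of 25 triggers load balancing
and a threshold of 50 does not.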

--lb_opt - (0, 1, 2) determine load balance strategy
If set to 0, then load balancing is not performed. The default is
1, which load balances at each refinement step. Setting the
parameter to 2 results in load balancing at each stage of the
refinement step. If a processor has a large number of blocks which
are refined several steps, this allows the work (and space needed)
to be shared among more processors.

--num_vars - number of variables (> 0)
The number of variables that will be calculated on and communicated.
The default is 40 variables.

--comm_vars - number of vars to communicate together
The number of variables that will be communicated together. This will
allow shorter but more numerous messages if it is set to something less
than the total number of variables. The default is zero which will
communicate all of the variables at once.
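
The trade-off can be made concrete by counting how many groups of variables
are communicated per exchange. This is a sketch under the assumption that
the variables are split into consecutive groups of comm_vars, with a
possibly smaller last group; the function name is hypothetical.

```c
/* Number of variable groups sent per exchange.
 * comm_vars == 0 (or >= num_vars) means all variables are sent at once. */
int comm_groups(int num_vars, int comm_vars)
{
    if (comm_vars <= 0 || comm_vars >= num_vars)
        return 1;
    return (num_vars + comm_vars - 1) / comm_vars;  /* round up */
}
```

With the default of 40 variables, --comm_vars 2 would give 20 groups under
this assumption, each carrying a message one twentieth of the full size.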

--num_tsteps - number of timesteps (> 0)
The number of timesteps for which the simulation will be run. The
default is 20.

--stages_per_ts - number of comm/calc stages per timestep
The number of calculate/communicate stages per timestep. The default
is 20.

--permute - (no argument) permute communication directions
If this is set, then the order of the communication directions will
be permuted through the six options available. The default is
to send messages in the x direction first, then y, and then z.

--blocking_send - (no argument) Use blocking sends in the communication
routine instead of the default nonblocking sends.

--code - change the way communication is done
The default is 0 which communicates only the ghost values that are
needed. Setting this to 1 sends all of the ghost values, and setting
this to 2 also causes all of the message processing (refinement or
unrefinement) to be done on the sending side. This allows us to
more closely mimic the communication behaviour of other codes.

--checksum_freq - number of stages between checksums
The number of stages between calculating checksums on the variables.
The default is 5. If it is zero, no checks are performed.

--stencil - 7 or 27 point 3D stencil
The 3D stencil used for the calculations. It can be either 7 or 27
and the default is 7 since the 27 point calculation will not conserve
the sum of the variables except for the case of uniform refinement.

--error_tol - (10^{-error_tol} ; >= 0)
This determines the error tolerance for the checksums for the variables.
The tolerance is 10 to the negative power of error_tol. The default
is 8, so the default tolerance is 10^(-8).
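
A minimal sketch of how the tolerance follows from error_tol and how a
checksum could be compared against it. The function names are hypothetical,
and the use of a relative (rather than absolute) difference is an
assumption of this example, not a statement about check_sum.c.

```c
#include <stdbool.h>

/* Tolerance 10^(-error_tol) for error_tol >= 0. */
double checksum_tolerance(int error_tol)
{
    double tol = 1.0;
    for (int i = 0; i < error_tol; i++)
        tol /= 10.0;
    return tol;
}

/* True if `current` matches `reference` to within the relative tolerance. */
bool checksum_ok(double current, double reference, double tol)
{
    double diff = current - reference;
    if (diff < 0.0)
        diff = -diff;
    double scale = reference < 0.0 ? -reference : reference;
    return diff <= tol * scale;          /* assumed relative comparison */
}
```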

--report_diffusion - (>= 0) none if 0
This determines if the checksums are printed when they are calculated.
The default is 0, which is no printing.

--report_perf - (0 .. 15)
This determines how the performance output is displayed. The default
is YAML output (value of 1). There are four output modes and each is
controlled by a bit in the value. The YAML output (to a file called
results.yaml) is controlled by the first bit (report_perf & 1), the
text output file (results.txt) is controlled by the second bit
(report_perf & 2), the output to standard out is controlled by the
third bit (report_perf & 4), and the output of block decomposition
at each refine step is controlled by the fourth bit (report_perf & 8).
These options can be combined in any way desired and zero to four
of these options can be used in any run. Setting report_perf to 0
will result in no output.
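
The bit tests listed above can be written as follows. The enum and function
names are hypothetical; only the bit values (1, 2, 4, 8) come from the
description above.

```c
/* One bit per output mode of --report_perf. */
enum report_mode {
    REPORT_YAML   = 1,  /* results.yaml */
    REPORT_TEXT   = 2,  /* results.txt */
    REPORT_STDOUT = 4,  /* standard out */
    REPORT_BLOCKS = 8   /* block decomposition at each refine step */
};

/* Nonzero if the given output mode is selected by report_perf. */
int report_enabled(int report_perf, enum report_mode mode)
{
    return (report_perf & mode) != 0;
}
```

For example, --report_perf 5 selects the YAML file and standard out, and
--report_perf 15 selects all four outputs.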

--refine_freq - frequency (timesteps) of refinement (0 for none)
This determines how frequently (in timesteps) the mesh is checked
and refinement is done. The default is every 5 timesteps. If
uniform refinement is turned on, the setting of refine_freq does
not matter and the mesh will be refined before the first timestep.

--refine_ghosts - (no argument)
The default is to not use the ghost cells of a block to determine if
that block will be refined. Specifying this flag will allow those
ghost cells to be used.

--num_objects - (>= 0) number of objects to cause refinement
The number of objects on which refinement is based. Default is zero.

--object - type, position, movement, size, size rate of change
The object keyword has 14 arguments. The first two are integers
and the rest are floating point numbers. They are:
type - The type of object, given as an integer code. The types include
the surface of a rectangle (0), a solid rectangle (1),
the surface of a spheroid (2), a solid spheroid (3),
the surface of a hemispheroid (+/- with 3 cutting planes)
(4, 6, 8, 10, 12, 14),
a solid hemispheroid (+/- with 3 cutting planes)(5, 7, 9, 11, 13, 15),
the surface of a cylinder (20, 22, 24),
and the volume of a cylinder (21, 23, 25).
bounce - If this is 1 then an object will bounce off of the walls
when the center hits an edge of the unit cube. If it is
zero, then the object can leave the mesh.
center - Three doubles that determine the center of the object in the
x, y, and z directions.
move - Three doubles that determine the rate of movement of the center
of the object in the x, y, and z directions. The object moves
this far at each timestep.
size - The initial size of the object in the x, y, and z directions.
If any of these become negative, the object will not be used
in the calculations to determine refinement. These sizes are
from the center to the edge in the specified direction.
inc - The change in size of the object in the x, y, and z directions.
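
The 14 arguments map naturally onto a small structure, and the movement
rules above can be sketched as a per-timestep update. The struct and
function names are hypothetical. Two details are assumptions of this
example: that inc is applied once per timestep, and that a bounce reverses
the movement in the direction in which the center has left the unit cube.

```c
/* Hypothetical container for the 14 arguments of --object. */
typedef struct {
    int type;        /* object type code */
    int bounce;      /* 1: bounce off the walls, 0: may leave the mesh */
    double cen[3];   /* center in x, y, z */
    double move[3];  /* movement of the center per timestep */
    double size[3];  /* distance from center to edge in x, y, z */
    double inc[3];   /* change in size (assumed to be per timestep) */
} object_t;

/* Advance the object by one timestep. */
void move_object(object_t *o)
{
    for (int d = 0; d < 3; d++) {
        o->cen[d] += o->move[d];
        o->size[d] += o->inc[d];
        if (o->bounce && (o->cen[d] < 0.0 || o->cen[d] > 1.0))
            o->move[d] = -o->move[d];   /* assumed bounce rule */
    }
}

/* An object with any negative size is not used to determine refinement. */
int object_is_active(const object_t *o)
{
    return o->size[0] >= 0.0 && o->size[1] >= 0.0 && o->size[2] >= 0.0;
}
```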

Examples of run scripts for a Cray XE6 that illustrate several of the options:

One sphere moving diagonally on 27 processors:

mpirun -np 27 -N 7 miniAMR.x --num_refine 4 --max_blocks 9000 --npx 3 --npy 3 --npz 3 --nx 8 --ny 8 --nz 8 --num_objects 1 --object 2 0 -1.71 -1.71 -1.71 0.04 0.04 0.04 1.7 1.7 1.7 0.0 0.0 0.0 --num_tsteps 100 --checksum_freq 1

An expanding sphere on 64 processors:

mpirun -np 64 miniAMR.x --num_refine 4 --max_blocks 6000 --init_x 1 --init_y 1 --init_z 1 --npx 4 --npy 4 --npz 4 --nx 8 --ny 8 --nz 8 --num_objects 1 --object 2 0 -0.01 -0.01 -0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0009 0.0009 0.0009 --num_tsteps 200 --comm_vars 2

Two moving spheres on 16 processors:

mpirun -np 16 miniAMR.x --num_refine 4 --max_blocks 4000 --init_x 1 --init_y 1 --init_z 1 --npx 4 --npy 2 --npz 2 --nx 8 --ny 8 --nz 8 --num_objects 2 --object 2 0 -1.10 -1.10 -1.10 0.030 0.030 0.030 1.5 1.5 1.5 0.0 0.0 0.0 --object 2 0 0.5 0.5 1.76 0.0 0.0 -0.025 0.75 0.75 0.75 0.0 0.0 0.0 --num_tsteps 100 --checksum_freq 4 --stages_per_ts 16

-------------------
5. The code:

block.c         Routines to split and recombine blocks
check_sum.c     Calculates check_sum for the arrays
comm_block.c    Communicate new location for block during refine
comm.c          General routine to do interblock communication
comm_parent.c   Communicate refine/unrefine information to parents/children
comm_refine.c   Communicate block refine/unrefine to neighbors during refine
comm_util.c     Utilities to manage communication lists
driver.c        Main driver
init.c          Initialization routine
main.c          Main routine that reads command line and launches program
move.c          Routines that check overlap of objects and blocks
pack.c          Pack and unpack blocks to move
plot.c          Write out block information for plotting
profile.c       Write out performance data
rcb.c           Load balancing routines
refine.c        Routines to direct refinement step
stencil.c       Perform stencil calculations
target.c        Add/subtract blocks to reach a target number
util.c          Utility routines for timing and allocation

-- End README file.

Courtenay T. Vaughan
([email protected])