************************
miniVite (/mini/ˈviːte/)
************************
*******
-------
ABOUT
-------
*******
miniVite[*] is a proxy app that implements a single phase of the
Louvain method for graph community detection in distributed memory.
Please refer to the following paper for a detailed discussion of the
distributed-memory Louvain method implementation:
https://ieeexplore.ieee.org/abstract/document/8425242/
Apart from real-world graphs, users can use specific options
to generate a Random Geometric Graph (RGG) in parallel.
RGGs are known to have good community structure:
https://arxiv.org/pdf/1604.03993.pdf
In our parallel RGG generator, the vertices owned by a process
only have cross edges with the processes that are its logical
neighbors (each process owns a 1 x 1/p chunk of the 1 x 1 unit
square). If the MPI process mapping is such that consecutive
ranks (e.g., i and i+1) are physically close to each other,
then there is not much communication stress in the application.
Therefore, we provide an option to add extra edges between randomly
chosen vertices, whose owners may be physically far apart.
When the parallel RGG generation options are used, we require the
total number of processes to be a power of 2 and the total number
of vertices to be perfectly divisible by the number of processes.
This constraint does not apply to real-world graphs passed to miniVite.
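As an illustration of this ownership model, here is a minimal sketch
(assuming an even block distribution of vertices; the function and
variable names below are hypothetical and not part of miniVite) of
how a global vertex id maps to its owning rank:

    // Sketch: n vertices split evenly across p ranks (n % p == 0);
    // rank r owns the contiguous block [r*n/p, (r+1)*n/p) and the
    // corresponding 1 x 1/p strip of the unit square.
    #include <cassert>

    long owner_of_vertex(long v, long n, long p)
    {
        assert(n % p == 0 && v >= 0 && v < n);
        const long chunk = n / p;  // vertices per rank
        return v / chunk;          // owning rank
    }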
We also allow users to pass any real-world graph as input. However,
we expect the input graph to be in a specific binary format, which
we have observed to be more efficient to read than ASCII format
files. The code for binary conversion (from a variety of common
graph formats) is packaged separately with Vite, our full
implementation of the Louvain method in distributed memory.
Please follow the instructions in the Vite README for binary file
conversion.
Vite can be downloaded from the following link (please do not use
the old PNNL/PNL link to download Vite; the GitHub link below is
the correct one):
https://github.com/ECP-ExaGraph/vite
Unlike Vite, we do not implement any heuristics to improve the
performance of the Louvain method. miniVite is a baseline parallel
version, implementing only the first phase of the Louvain method.
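For background (this is the standard definition used by the Louvain
method in general, not something specific to miniVite), the quantity
being maximized is the modularity

    Q = (1/(2m)) * sum_{i,j} [ A_ij - (k_i * k_j)/(2m) ] * delta(c_i, c_j)

where A_ij is the weight of the edge between vertices i and j, k_i is
the weighted degree of vertex i, m is the total edge weight in the
graph, c_i is the community of vertex i, and delta(., .) is 1 when its
arguments are equal and 0 otherwise. A Louvain iteration moves each
vertex to the neighboring community that yields the largest modularity
gain.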
This code requires an MPI library (preferably MPI-3 compatible)
and a C++11-compliant compiler for building.
Please contact the following for any queries or support:
Sayan Ghosh, PNNL (sg0 at pnnl dot gov)
Mahantesh Halappanavar, PNNL (hala at pnnl dot gov)
[*] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman
A, Gebremedhin AH. miniVite: A graph analytics benchmarking tool
for massively parallel systems. In 2018 IEEE/ACM Performance Modeling,
Benchmarking and Simulation of High Performance Computer Systems
(PMBS) 2018 Nov 12 (pp. 51-56). IEEE.
Please '*' this repository on GitHub if the code is useful to you.
*************
-------------
COMPILATION
-------------
*************
Please update the Makefile with compiler flags and use a C++11-compliant
compiler of your choice. Invoke `make clean; make` after setting the
paths to MPI to generate the binary. Use `mpirun`, `mpiexec`, or `srun`
to execute the code with the runtime arguments described in the
next section.
Pass -DPRINT_DIST_STATS for printing distributed graph
characteristics.
Pass -DDEBUG_PRINTF if detailed diagnostics are required during
the program run. This program requires OpenMP and C++11 support,
so pass -fopenmp (for g++) or -qopenmp (for icpc) and -std=c++11
or -std=c++0x.
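For instance, a compile line for an MPI + OpenMP C++11 code could look
like the following (shown only to illustrate where the flags go; the
source file list and the actual Makefile variables may differ):

    mpicxx -std=c++11 -fopenmp -O3 -DPRINT_DIST_STATS -o bin/minivite <source files>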
Pass -DUSE_32_BIT_GRAPH if the number of vertices in the graph is
within the 32-bit range (about 2 x 10^9); otherwise, a 64-bit range
is assumed.
Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE
for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME
is passed and OMP_SCHEDULE is not set, then the implementation's
default schedule will be chosen (typically "static" or "guided"
for most of the OpenMP regions).
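For example, to request a dynamic schedule with a chunk size of 16 at
run time (this only has an effect if the binary was built with
-DOMP_SCHEDULE_RUNTIME):

    export OMP_SCHEDULE="dynamic,16"
    mpiexec -n 2 bin/./minivite -n 100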
Communicating vertex-community information (per iteration)
is the most expensive step of our distributed Louvain
implementation. We use one of the following MPI communication
primitives for communicating vertex-community information during
a Louvain iteration, which can be selected by passing predefined
macros at compile time:
1. MPI Collectives: -DUSE_MPI_COLLECTIVES
2. MPI Send-Receive: -DUSE_MPI_SENDRECV
3. MPI RMA: -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE
additionally ensures atomic put)
4. Default: Uses the MPI point-to-point nonblocking API in
   communication-intensive parts.
Apart from these, we use MPI (blocking) collectives, mostly
MPI_Alltoall.
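To illustrate the default (no macro) communication path, the toy
program below shows the nonblocking point-to-point pattern
(MPI_Isend/MPI_Irecv/MPI_Waitall). It is only a sketch with
hypothetical buffers of community ids exchanged around a ring; it is
not the miniVite code itself:

    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        // Hypothetical payload: community ids of locally owned vertices.
        const int count = 4;
        std::vector<long long> send(count, rank), recv(count, -1);

        // Exchange with the "next" rank in a ring, just to show the pattern.
        const int to = (rank + 1) % nprocs;
        const int from = (rank + nprocs - 1) % nprocs;
        MPI_Request reqs[2];
        MPI_Irecv(recv.data(), count, MPI_LONG_LONG, from, 0,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send.data(), count, MPI_LONG_LONG, to, 0,
                  MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        std::printf("rank %d received community ids from rank %d\n", rank, from);
        MPI_Finalize();
        return 0;
    }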
There are other predefined macros in the code as well, for printing
intermediate results, checking correctness, or selecting a particular
C++ data structure.
***********************
-----------------------
EXECUTING THE PROGRAM
-----------------------
***********************
E.g.:
mpiexec -n 2 bin/./minivite -f karate.bin
mpiexec -n 2 bin/./minivite -l -n 100
mpiexec -n 2 bin/./minivite -n 100
mpiexec -n 2 bin/./minivite -p 2 -n 100
[On Cray systems, set the environment variable
MPICH_MAX_THREAD_SAFETY=multiple or pass
-DDISABLE_THREAD_MULTIPLE_CHECK while building miniVite.]
Possible options (can be combined):
1. -f <bin-file> : Specify input binary file after this argument.
2. -b : Only valid for real-world inputs. Attempts to distribute an approximately
   equal number of edges among processes, so the number of vertices owned
   by a particular process may be irregular. This increases the distributed
   graph creation time due to serial overheads, but may improve the overall
   execution time.
3. -n <vertices> : Only valid for synthetically generated inputs. Pass the total
   number of vertices of the generated graph.
4. -l : Use a distributed LCG for randomly choosing edges. If this option
   is not used, the C++ random number generator (std::default_random_engine)
   is used.
5. -p <percent> : Only valid for synthetically generated inputs. Specify the
   percentage of overall edges to be randomly generated between processes.
6. -t <threshold> : Specify the threshold quantity (default: 1.0E-06) used to
   determine the exit criterion in an iteration of the Louvain method.
7. -w : Only valid for synthetically generated inputs. Use Euclidean distance as
   the edge weight. If this option is not used, edge weights are set to 1.0.
   If the Euclidean distance is not available, edge weights are generated
   uniformly between (0,1).
8. -r <nranks> : This option controls the number of aggregators used in MPI I/O
   and is meaningful when an input binary graph file is passed with option "-f"
   (a worked example follows this list). The number of aggregators is:
   naggr := (nranks > 1) ? (nprocs/nranks) : nranks;
9. -s : Print graph data (edge list along with weights).
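As a worked example of the naggr formula above: with nprocs = 16 MPI
processes, passing "-r 4" yields naggr = nprocs/nranks = 16/4 = 4 I/O
aggregators, whereas "-r 1" yields naggr = nranks = 1 aggregator.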