/* Copyright (C) 2013, Siddharth Gopal (gcdart AT gmail)
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2.1 of the GNU Lesser General Public License
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA, 02111-1307, USA,
* or visit the GNU web site, www.gnu.org.
*/
This folder contains a simple matrix-multiplication routine for testing the
performance of five BLAS implementations: ACML, ATLAS, EIGEN, MKL and OpenBLAS.
All the libraries are assumed to be already installed on the system.
To compile:
1. Go into each folder.
2. Open compile.sh and fill in the appropriate paths.
3. Run sh compile.sh
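Once the paths in each compile.sh have been filled in, the steps above can be scripted. The following is a minimal sketch and is not part of this folder: the helper name compile_all is hypothetical, and the folder names in the usage line are assumptions to be replaced with the actual folder names.

```shell
#!/bin/sh
# Hypothetical helper: run compile.sh inside each folder given as an argument.
# Returns non-zero if any folder lacks compile.sh or any build fails.
compile_all() {
    status=0
    for dir in "$@"; do
        if [ ! -f "$dir/compile.sh" ]; then
            echo "skipping $dir: no compile.sh found" >&2
            status=1
            continue
        fi
        # Use a subshell so the cd does not affect the caller.
        if ( cd "$dir" && sh compile.sh ); then
            echo "built $dir"
        else
            echo "build failed in $dir" >&2
            status=1
        fi
    done
    return $status
}

# Example (folder names are assumptions):
# compile_all ACML ATLAS EIGEN MKL OpenBLAS
```

A folder whose compile.sh still has unset paths will most likely fail to build; the sketch reports that folder and continues with the rest.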
Compiling produces two executables in each folder:
1. a.out
This is the single-threaded version of matrix-multiplication.
2. ap.out
This is the multithreaded version of matrix-multiplication. The number of
threads can be controlled by setting the OMP_NUM_THREADS environment variable.
Note that the number of threads for the ATLAS library is determined at
compile time and cannot be changed at run time, so OMP_NUM_THREADS
has no effect on ATLAS.
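As a sketch of how the thread count might be set for a single run: the helper name with_threads is hypothetical, and any command-line arguments that ap.out expects are not documented here, so none are shown.

```shell
#!/bin/sh
# Hypothetical helper: run a command with OMP_NUM_THREADS set for that run only.
# Usage: with_threads <count> <command> [args...]
with_threads() {
    count=$1
    shift
    # env sets the variable only in the environment of the child process.
    env OMP_NUM_THREADS="$count" "$@"
}

# Example: run the multithreaded binary with 4 threads.
# with_threads 4 ./ap.out
```

Setting the variable per invocation, rather than exporting it, keeps one run's thread count from affecting the next. For the ATLAS build the setting is ignored, as noted above.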
There are three shell scripts:
1. run_single_cpu.sh
Computes the average matrix-multiplication time (over 5 runs) for each
implementation on progressively larger matrices.
2. run_multiple_cpu.sh
Computes the average matrix-multiplication time (over 5 runs) for each
implementation on progressively larger matrices and an
increasing number of threads.
3. run_scaling.sh
Computes the average matrix-multiplication time (over 5 runs) for each
implementation on a 5000x5000 matrix with an increasing number of threads.
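The averaging that these scripts perform can be illustrated roughly as follows. This is a sketch under stated assumptions, not the contents of the scripts themselves: the helper name average_runs is hypothetical, and it assumes that each run of the program prints a single number (its elapsed time in seconds) on standard output, which may not match how a.out and ap.out actually report their timings.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and print the mean of the
# numbers it writes to stdout, to three decimal places.
# ASSUMPTION: each run prints exactly one line holding one number
# (for example, its elapsed time in seconds).
# Usage: average_runs <runs> <command> [args...]
average_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@"
        i=$((i + 1))
    done | awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Example (5 runs, matching the run count used by the scripts above):
# average_runs 5 ./a.out
```

Averaging over several runs smooths out run-to-run noise from caching and scheduling, which is presumably why the scripts use 5 runs per configuration.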