moses v2, rises
andylamp committed Dec 30, 2018
1 parent 05e8972 commit ed53c83
Showing 20 changed files with 671 additions and 182 deletions.
31 changes: 19 additions & 12 deletions README.md
@@ -19,9 +19,9 @@ ambient dimension (`n`) is large (>1000).
# Requirements

The code is generally self contained and all datasets are included or
generated thus, in theory, just having `Matlab` installed should be more than
enough. It has to be noted though that due to the recent `Matlab` changes on
how it handles character and string arrays you should use a recent
generated thus, in theory, just having `Matlab` installed should be more
than enough. It has to be noted though that due to the recent `Matlab` changes
on how it handles character and string arrays you should use a recent
version of it -- the code was developed and tested in `Matlab` `2017b` and
tested also on versions `2017a`, `2018a`; moreover, to address different
OSes, care has been taken so that this code runs without any problems both
@@ -30,18 +30,22 @@ on Windows-based machines as well as Unix-based ones.
# Streaming, memory limited, r-truncated SVD Method Comparison

In this instance we perform a comparison using both synthetic and real
data against three methods which compute an approximate *memory-limited,
streaming r-truncated SVD*. These methods are the following:
data against a few similar methods which compute in part or fully an
approximate *memory-limited, streaming r-truncated SVD*.
These methods are the following:

* MOSES (https://arxiv.org/pdf/1806.01304.pdf)
* Power Method (https://arxiv.org/pdf/1307.0032.pdf)
* Frequent Directions (https://arxiv.org/abs/1501.01711)
* Robust Frequent Directions (https://arxiv.org/pdf/1705.05067.pdf)
* GROUSE (https://arxiv.org/pdf/1702.01005.pdf)

# Running the comparison

Running the comparison is simple -- just `cd` to the cloned `moses` directory
within `Matlab` and run `comparison.m`. Running might take a while, if you want
to speed things up just try altering the setup parameters shown below:
Running the comparison is simple -- just `cd` to the cloned `moses`
directory within `Matlab` and run `comparison.m`. Running might take a
while, if you want to speed things up just try altering the setup
parameters shown below:

```Matlab
% experiments to run
@@ -61,7 +65,7 @@ fig_print = 1; % print resulting figures as .fig
use_fast_moses_only = 1;% speed up by using fast moses <-- USE IT :)
use_offline_svds = 1; % drastically speed up execution by disabling
% offline svds calculation WARNING THIS OPTION IS
% PAINFULLY SLOW. <- DEf. DISABLE IT :)
% PAINFULLY SLOW. <- DEF. DISABLE IT :)
use_blk_err = 0; % calc. errors per block not per column
% provides a DRASTIC improvement in speed but less
% granular error reporting. For GROUSE it is 100
@@ -252,6 +256,9 @@ plots are the following:
The code is organised mainly in the following files:

* `comparison.m`: The main starting point of the experiments, initial parameters are defined there.
* `fd.m`: Implementation of Frequent Directions
* `fd_rotate_sketch.m`: helper method for both Frequent Directions methods
* `fdr.m`: Implementation of Robust Frequent Directions
* `grouse.m`: Original `GROUSE` algorithm code as provided from the paper
* `mitliag_pm.m`: Implementation of Mitliagkas Power Method for Streaming PCA
* `moses_fast.m`: A more efficient implementation of MOSES
@@ -276,8 +283,8 @@ the paper authors retain their respective copyrights.

# Acknowledgement

If you find our paper useful or use this code, please consider citing our work
as such:
If you find our paper useful or use this code, please consider citing our
work as such:

```
@misc{1806.01304,
@@ -290,7 +297,7 @@ Eprint = {arXiv:1806.01304},

# Disclaimer

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS AS IS
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
43 changes: 25 additions & 18 deletions comparison.m
@@ -1,4 +1,4 @@
%% Comparison script for Streaming r-truncated SVD (Moses, PM, GROUSE)
%% Comparison script for Streaming r-truncated SVD (Moses, PM, FD, & GROUSE)
%
% Description:
% This code is supplied as additional material alongside our paper:
@@ -11,30 +11,31 @@
%
% The script is segmented into four main categories:
%
% -- Synthetic data evaluation: bench PM, MOSES, & GROUSE using
% -- Synthetic data evaluation: bench PM, MOSES, FD, RFD, & GROUSE using
% synthetic datasets
% -- Real data evaluation: bench PM, MOSES, & GROUSE using real datasets
% -- Real data evaluation: bench PM, MOSES, FD, RFD, & GROUSE using real
% datasets
% -- Speed tests: compare the execution speed of MOSES when compared
% with PM & GROUSE
% with PM, FD, RFD, & GROUSE
% -- MOSES scaling tests: compare the performance of MOSES, in terms of
% error across different parameters of
% block size (b), rank (r), and ambient dim. (n)
%
% Author: Andreas Grammenos ([email protected])
%
% Last touched date 06/06/2018
% Last touched date: 30/12/2018
%
% License:
% code: GPLv3
% paper: A. Eftekhari, R. A. Hauser, and A. Grammenos retain their
% respective copyrights (link: https://arxiv.org/abs/1806.01304)
% code: GPLv3, author: A. Grammenos
% paper: A. Eftekhari, R. Hauser, and A. Grammenos retain their respective
% copyrights (pre-print link: https://arxiv.org/abs/1806.01304)
%
%

%% Initialisation

% clear/close everything
clc; clear all; close all;
clc; clear; close all;

% enable for reproducibility, comment for (slightly) different
% (~random) results
@@ -45,6 +46,7 @@
global datasetPath
global use_fast_moses_only
global use_offline_svds
global use_fdr
global use_blk_err
global pdf_print
global fig_print
@@ -60,8 +62,8 @@
global run_exp3

% experiments to run
run_synthetic = 0; % run synthetic evaluation (set 0 to skip)
run_real = 1; % run real data evaluation (set 0 to skip)
run_synthetic = 1; % run synthetic evaluation (set 0 to skip)
run_real = 0; % run real data evaluation (set 0 to skip)
run_speed_test = 0; % run the calc. speed tests (set 0 to skip)
run_moses_scaling = 0; % run the scaling moses tests (set 0 to skip)

@@ -76,10 +78,13 @@
use_fast_moses_only = 1;% speed up by using fast moses <-- USE IT :)
use_offline_svds = 0; % drastically speed up execution by disabling
% offline svds calculation WARNING THIS OPTION IS
% PAINFULLY SLOW. <- DEf. DISABLE IT :)
% PAINFULLY SLOW. <- DEF. DISABLE IT :)
use_fdr = 0; % use robust fd -- same as fd but on the recon.
% we normalise using a*Id; using the shifted
% subspace by a*Id does not work well in our case.
use_blk_err = 0; % calc. errors per block not per column
% provides a DRASTIC improvement in speed but less
% granular error reporting. For GROUSE it is 100
% granular error reporting. For GROUSE & FD it is 100
% for PM and MOSES it is equal to their respective
% block sizes for each run. <- Prob. use it

@@ -156,7 +161,6 @@
else
fprintf("\n ** Running algorithm speed evaluation **\n");


% power law distribution params
alpha = 1;
% no. of trials
@@ -178,6 +182,11 @@
fprintf("\n !! Testing fat-r recovery n > r, with r=%d !!\n", r);
speed_test(n_arr, r, alpha, trials)

r = 100; % target rank
fprintf("\n !! Testing super fat-r recovery n > r, with r=%d !!\n", r);
speed_test(n_arr, r, alpha, trials)


fprintf("\n ** Finished algorithm speed evaluation **\n");
end

@@ -192,14 +201,12 @@

n_arr = 200:200:1200; % ambient dimension array
r_arr = 5:5:25; % r-rank
m_blk_mul = 1:1:15; % block multiplier (we are bound by 2*r)
m_blk_mul = 1:1:15; % block multiplier (we are bound by r)

% Execute the scaling test
moses_scaling(n_arr, r_arr, m_blk_mul);

fprintf("\n ** Finished MOSES scaling evaluation **\n");
end

%% Comparison script end.


%% Comparison script end.
86 changes: 86 additions & 0 deletions fd.m
@@ -0,0 +1,86 @@
function [Bout, ErrFro, T, Yr, t] = fd(Y, ell, no_err)
%FD Find the frequent directions of a given matrix Y in a
%streaming fashion
%
% FD is based on Liberty et al.: https://arxiv.org/abs/1501.01711
%
% Author: Andreas Grammenos ([email protected])
%
% Last touched date: 30/12/2018
%
% License: GPLv3
%
  fprintf('\n ** Running regular FD...\n');

  % scope in global variables
  global use_blk_err

  % initialisations
  m = 2 * ell;
  [~, cols] = size(Y);
  Br = zeros(m, cols);
  nz_row = 1;

  % default block size
  blk_size = 100;
  cnt = 1;

  % no error by default
  if nargin < 3
    no_err = 1;
  end

  % the number of rows
  numr = size(Y, 1);

  % initialise error metrics
  if use_blk_err == 1
    ErrFro = nan(1, floor(numr/blk_size));
    T = nan(1, floor(numr/blk_size));
  else
    ErrFro = nan(1, numr);
    T = 1:numr;
  end

  % start timing
  ts = tic;

  % loop through the rows of the matrix
  for k = 1:numr
    % check if we need to squeeze
    if (nz_row >= m)
      % squeeze
      [Br, nz_row, ~] = fd_rotate_sketch(Br, ell);
    end
    % append the current row
    Br(nz_row, :) = Y(k, :);
    % increment the next zero row counter
    nz_row = nz_row + 1;

    % calculate the error, if needed
    if no_err == 0
      if use_blk_err == 1
        % per-block error: only evaluate once every blk_size rows
        if mod(k, blk_size) == 0
          y_c = Y(1:k, :);
          YrHat_c = y_c*(Br(1:ell, :)'*Br(1:ell, :));
          temp = sum(sum((y_c-YrHat_c).^2, 1));
          ErrFro(cnt) = temp/k;
          T(cnt) = k; cnt = cnt + 1;
        end
      else
        % calculate the reconstruction error for every row
        y_c = Y(1:k, :);
        YrHat_c = y_c*(Br(1:ell, :)'*Br(1:ell, :));
        temp = sum(sum((y_c-YrHat_c).^2, 1));
        ErrFro(k) = temp/k;
      end
    end

  end
  % also set the final estimate of Yr
  Yr = Y*(Br(1:ell, :)'*Br(1:ell, :));
  % only return the subset of the sketch that is of value
  Bout = Br(1:ell, :);
  % calculate the current trial execution delta
  t = my_toc(ts);
end
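As a rough sanity check of the streaming loop in `fd.m`, the sketch below re-implements a bare-bones Frequent Directions pass using only built-in functions and compares the covariance error of the sketch against the accumulated shrinkage. It is an illustrative sketch under assumptions, not the code from this commit: the names `A`, `B`, `delta_sum` and `cov_err` are hypothetical, the data is random, and the buffer is squeezed only once all `2*ell` rows are filled (the committed loop squeezes one row earlier).

```Matlab
% Minimal, self-contained Frequent Directions pass (illustrative only;
% all variable names here are assumptions, not taken from the commit).
rng(0);                          % fixed seed for reproducibility
A   = randn(500, 20);            % 500 streamed rows, 20 features
ell = 5;                         % sketch size
m   = 2 * ell;                   % buffer rows, as in fd.m
B   = zeros(m, size(A, 2));      % sketch buffer
nz_row    = 1;                   % next free row in the buffer
delta_sum = 0;                   % accumulated shrinkage

for k = 1:size(A, 1)
    if nz_row > m
        % buffer is full: rotate and shrink
        [~, S, V] = svd(B, 'econ');          % B = U * S * V'
        s = diag(S);
        delta = s(ell)^2;                    % squared ell-th singular value
        s_shrunk = sqrt(s(1:ell).^2 - delta);
        B(:) = 0;                            % clear the whole buffer
        B(1:ell, :) = diag(s_shrunk) * V(:, 1:ell)';
        nz_row = ell + 1;                    % rows ell+1..m are free again
        delta_sum = delta_sum + delta;
    end
    % append the current row and advance the free-row counter
    B(nz_row, :) = A(k, :);
    nz_row = nz_row + 1;
end

% spectral-norm covariance error of the sketch; each squeeze removes at
% most delta from any direction, so the error is bounded by delta_sum
cov_err = norm(A' * A - B' * B, 2);
fprintf('covariance error: %.4f, accumulated shrinkage: %.4f\n', ...
        cov_err, delta_sum);
```

The comparison of `cov_err` against `delta_sum` is a cheaper check than the per-row reconstruction error computed in `fd.m`, since it needs a single norm evaluation at the end of the stream.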
38 changes: 38 additions & 0 deletions fd_rotate_sketch.m
@@ -0,0 +1,38 @@
function [B_out, nz_row, alpha] = fd_rotate_sketch(B_in, ell, alpha_prev)
%FD_ROTATE_SKETCH the main shrink and rotation that's performed when
%receiving a new row from our stream while our buffer is full.
%
% Author: Andreas Grammenos ([email protected])
%
% Last touched date: 30/12/2018
%
% License: GPLv3
%
  % initialise output
  B_out = B_in;
  % default alpha value is zero, if we don't use it
  if nargin < 3
    alpha_prev = 0;
  end
  % calculate the svd of B; note that svd returns V (not V'), so that
  % B_in = U * S * V'
  [~, S, V] = svd(B_in, 'econ');
  Sd = diag(S);
  [sd_rows, ~] = size(Sd);
  if sd_rows >= ell
    % the squared ell-th singular value is the amount we shrink by
    delta = Sd(ell)^2;
    % shrink the top ell squared singular values by delta
    shrunk_sketch = sqrt(Sd(1:ell).^2 - delta);
    % update the sketch
    B_out(1:ell, :) = diag(shrunk_sketch) * V(:, 1:ell)';
    % zero out the remaining rows of the sketch
    B_out(ell + 1:end, :) = 0;
    nz_row = ell + 1;
  else
    % fewer than ell singular values: keep them all, no shrinkage
    delta = 0;
    % update the portion of the sketch
    B_out(1:sd_rows, :) = diag(Sd) * V(:, 1:sd_rows)';
    % zero out the remaining rows of the sketch
    B_out(sd_rows + 1:end, :) = 0;
    nz_row = sd_rows + 1;
  end
  % calculate the new regulariser value
  alpha = alpha_prev + delta/2;
end
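For reference, the shrink step in `fd_rotate_sketch.m` appears to follow the Frequent Directions update of Liberty et al. Written out, with notation that is assumed here for illustration and does not appear in the commit ($\Sigma_\ell$ holds the top $\ell$ singular values of the buffer $B$, $V_\ell$ the matching right singular vectors, and $\delta$ is the shrinkage amount):

```latex
\begin{align}
  B &= U \Sigma V^{\top}, \\
  \delta &= \sigma_{\ell}^{2}, \\
  B' &= \sqrt{\Sigma_{\ell}^{2} - \delta I_{\ell}} \; V_{\ell}^{\top}, \\
  \alpha' &= \alpha + \delta / 2 .
\end{align}
```

Since $\sigma_\ell^2 - \delta = 0$, the last row of $B'$ is zero, which is why the function can report row $\ell + 1$ as the next free row. The running value $\alpha$ is not used by `fd.m` itself; it is presumably consumed by the robust variant in `fdr.m`, which is not shown in this part of the diff.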