Pairwise summation #577

LukeMathWalker · 2019-01-04T22:35:58Z

Motivation

The naive summation algorithm can cause significant loss in precision when summing several floating point numbers - pairwise summation mitigates the issue without adding significant computational overhead.

Ideally, this should only be used when dealing with floats, but unfortunately we can't specialize on the argument type. Mitigating precision loss in floating point sums, on the other hand, is of paramount importance for a lot of algorithms relying on it as a building block - given that the overhead is minimal, I'd argue that the precision benefit is sufficient to adopt it as standard summation method.

Public API

Nothing changes.

Technicalities

I haven't modified the overall structure in sum and sum_axis: I have just provided some helper functions that isolate the pairwise summation algorithm.

The function that sums slices leverages the available unrolled_fold to vectorise the sum once the vector length is below the size threhold.

The function that sums iterators falls back on the slices summation function as soon as the first pass is over, to recover vectorisation given that the array of partial sums is contiguous in memory (only relevant for array with more than 512^2 elements).

The function that sums an iterator of arrays is self-recursive.

It would probably be possible to share the pagination logic (to sum in blocks of 512) between the two functions operating on iterators, but I haven't been able to do it so far.

…th array items

… across implementations

bluss · 2019-01-05T11:02:01Z

Nice. It's not obvious to me that we should use the current implementation, this, or other compensated summing algorithm in the default sums.

We can specialize on the argument type - we only need to take the step to requiring 'static on the scalar type to be able to use type id / "static" / "simple and stupid" specialization which is what for example matrix multiplication does. I think we should take this step. (In future Rust we hope that type id based specialization is usable without this bound.)
This is a scalable solution. Unrolled fold is already using a split into 8 subfactors which should improve precision, but it's only used when we have contiguous input of course.
Not complete without benchmarks of contiguous and other cases

LukeMathWalker · 2019-01-05T11:21:06Z

The only other compensated summation algorithm that comes to my mind is Kahan's summation algorithm, but its performance overhead is not negligible, hence I wouldn't provide it as a default implementation.
Alternatively we could provide different sum methods and let the user choose what they need: this is what Julia does, using pairwise summation in their default sum method (here) while providing a variant of Kahan's summation as an additional method - here. NumPy instead opted to provide pairwise summation as default implementation without an additional method for Kahan's summation - here.

Re 1. - Wouldn't using TypeId + 'static enable this implementation only for float primitives, f32 and f64? There are structs that implement the Float trait while encapsulating a float primitive to provide additional invariants (e.g. N64/N32 and R64/R32 in noisy_floats - docs) that would not be specialized unless we can specialize to types that implement the Float trait, which as far as I understand is not currently possible.
Alternatively, we could have a list of "common" float wrappers and specialize for all of them, but it doesn't sound very scalable.

Re. 3 - Ok, I'll work on it and add them to the PR.

This reverts commit 3085194.

LukeMathWalker · 2019-01-05T13:33:39Z

Running these benchmarks I get:

 name                    master ns/iter  branch ns/iter  diff ns/iter  diff %  speedup 
 contiguous_sum_1e2      16               16                         0   0.00%   x 1.00 
 contiguous_sum_1e4      1,223            1,415                    192  15.70%   x 0.86 
 contiguous_sum_1e7      3,956,575        4,089,350            132,775   3.36%   x 0.97 
 contiguous_sum_int_1e4  1,568            1,938                    370  23.60%   x 0.81
 sum_by_col_1e4          297,202          290,586               -6,616  -2.23%   x 1.02 
 sum_by_row_1e4          431,325          425,512               -5,813  -1.35%   x 1.01

Never done serious benchmarking before, so take these results with a grain of salt.

jturner314 · 2019-01-08T20:34:28Z

@LukeMathWalker Thanks for working on this! Pairwise summation is better than the current implementation.

I would hope we'd be able to use pairwise summation without significant performance cost. I'm surprised that the differences in some of those benchmarks are as large as they are.

A few thoughts:

I think we should use pairwise summation as the implementation of .sum(). It's the default for NumPy and Julia, and it seems like a clear win if the performance cost is small.
We can specialize on integer types to avoid pairwise summation for them if that turns out to provide a significant performance benefit. However, the general case should be pairwise, because as @LukeMathWalker pointed out, there are many types other than f32 and f64 that need the pairwise behavior for accuracy.

It's unsatisfying to add an array_pairwise_sum function just for sum_axis. How about something like the example below instead? I would hope the performance would be similar.

pub fn sum_axis(&self, axis: Axis) -> Array<A, D::Smaller>
where
    A: Clone + Zero + Add<Output = A>,
    D: RemoveAxis,
{
    let mut out = Array::zeros(self.dim.remove_axis(axis));
    Zip::from(&mut out)
        .and(self.lanes(axis))
        .apply(|out, lane| *out = lane.sum());
    out
}

We should make sure that pairwise_sum and iterator_pairwise_sum add up the same number of values in the base case. In the current version of this PR, they don't, since pairwise_sum calls unrolled_fold which performs an eightfold unrolled sum. To make sure pairwise_sum and iterator_pairwise_sum add up the same number of values in the base case, iterator_pairwise_sum needs to divide NAIVE_SUM_THRESHOLD by 8.
It would be nice for .product() to be pairwise too. We could add a .pairwise_reduce() method for ArrayBase that would assume that the operation f is associative and operate in a pairwise way, and then use that method to implement both .sum() and .product(). Assuming the compiler inlines sufficiently aggressively, this shouldn't have a performance cost.
```
impl<A, S, D> ArrayBase<S, D>
where
    S: RawData<Elem = A>,
    D: Dimension,
{
    fn pairwise_reduce(&self, base: usize, id: A, f: F) -> A
    where
        F: Fn(A, A) -> A,
        A: Clone,
    {}
}
```

LukeMathWalker · 2019-01-09T08:54:39Z

We can specialize on integer types to avoid pairwise summation for them if that turns out to provide a significant performance benefit. However, the general case should be pairwise, because as @LukeMathWalker pointed out, there are many types other than f32 and f64 that need the pairwise behavior for accuracy.

We will still fall short on new types wrapping integer values, but this seems like a better solution. If we plan on specializing, we might also take advantage of the fact that, depending on the byte size, we might get more integers packed together using SIMD compared to floats. But I guess this falls under the broader discussion on how to leverage SIMD instructions in ndarray.

It's unsatisfying to add an array_pairwise_sum function just for sum_axis. How about something like the example below instead? I would hope the performance would be similar.

I tried it out and this is the performance I get:

name            master ns/iter   branch ns/iter  diff ns/iter   diff %  speedup 
sum_by_col_1e4  287,381          303,410               16,029    5.58%   x 0.95 
sum_by_row_1e4  420,978          1,019,645            598,667  142.21%   x 0.41

We should make sure that pairwise_sum and iterator_pairwise_sum add up the same number of values in the base case. In the current version of this PR, they don't, since pairwise_sum calls unrolled_fold which performs an eightfold unrolled sum. To make sure pairwise_sum and iterator_pairwise_sum add up the same number of values in the base case, iterator_pairwise_sum needs to divide NAIVE_SUM_THRESHOLD by 8.

Good catch - I'll fix it.

It would be nice for .product() to be pairwise too.

Would it really be beneficial? When doing long products what I usually fear is overflowing/underflowing. To get around that, I use the exp-ln trick, as in rust-ndarray/ndarray-stats#20 for geometric_mean.

This reverts commit 8f95705.

jturner314 · 2019-01-11T01:15:46Z

I took another look at .sum_axis() and decided to benchmark this time instead of just guessing. :) This has slightly better performance on my machine than the PR for the sum_by_col_1e4 and sum_by_row_1e4 benchmarks and is more concise:

    pub fn sum_axis(&self, axis: Axis) -> Array<A, D::Smaller>
        where A: Clone + Zero + Add<Output=A>,
              D: RemoveAxis,
    {
        let n = self.len_of(axis);
        let stride = self.strides()[axis.index()];
        if self.ndim() == 2 && stride == 1 {
            // contiguous along the axis we are summing
            let mut res = Array::zeros(self.raw_dim().remove_axis(axis));
            let ax = axis.index();
            for (i, elt) in enumerate(&mut res) {
                *elt = self.index_axis(Axis(1 - ax), i).sum();
            }
            res
        } else if self.len_of(axis) <= numeric_util::NO_SIMD_NAIVE_SUM_THRESHOLD {
            self.fold_axis(axis, A::zero(), |acc, x| acc.clone() + x.clone())
        } else {
            let (v1, v2) = self.view().split_at(axis, n / 2);
            v1.sum_axis(axis) + v2.sum_axis(axis)
        }
    }

To make sure pairwise_sum and iterator_pairwise_sum add up the same number of values in the base case, iterator_pairwise_sum needs to divide NAIVE_SUM_THRESHOLD by 8.

Actually, there's another issue I didn't catch earlier. In the base case, pairwise_sum adds 64 values, and then for higher levels, adds pairs of sums. In contrast, in the base case, iterator_pairwise_sum adds 64 values (computing partial_sums), then in the next higher level adds groups of 64 sums (in the pairwise_sum call), and then for the higher levels adds pairs of sums. In other words, iterator_pairwise_sum has two levels of summing groups of 64 values, not just the base level, so it needs to be changed to match pairwise_sum.

I also noticed that we technically need to implement unrolled_fold pairwise with something like this:

pub fn unrolled_fold<A, I, F>(mut xs: &[A], init: I, f: F) -> A
    where A: Clone,
    I: Fn() -> A,
    F: Fn(A, A) -> A,
{
    // eightfold unrolled so that floating point can be vectorized
    // (even with strict floating point accuracy semantics)
    let (mut p0, mut p1, mut p2, mut p3,
         mut p4, mut p5, mut p6, mut p7) =
        (init(), init(), init(), init(),
         init(), init(), init(), init());
    while xs.len() >= 8 {
        p0 = f(p0, xs[0].clone());
        p1 = f(p1, xs[1].clone());
        p2 = f(p2, xs[2].clone());
        p3 = f(p3, xs[3].clone());
        p4 = f(p4, xs[4].clone());
        p5 = f(p5, xs[5].clone());
        p6 = f(p6, xs[6].clone());
        p7 = f(p7, xs[7].clone());

        xs = &xs[8..];
    }
    let (q0, q1, q2, q3) = (f(p0, p4), f(p1, p5), f(p2, p6), f(p3, p7));
    let (r0, r1) = (f(q0, q2), f(q1, q3));
    let unrolled = f(r0, r1);

    // make it clear to the optimizer that this loop is short
    // and can not be autovectorized.
    let mut partial = init();
    for i in 0..xs.len() {
        if i >= 7 { break; }
        partial = f(partial.clone(), xs[i].clone())
    }

    f(unrolled, partial)
}

The performance difference is within the +/- bounds of the benchmarks.

By the way, after more thought, I'd prefer to rename NO_SIMD_NAIVE_SUM_THRESHOLD to NAIVE_SUM_THRESHOLD because that's really the base case, and add a constant UNROLL_ACCUMULATORS or UNROLL_DEGREE of 8. Then, the condition for pairwise_sum would be n <= NAIVE_SUM_THRESHOLD * UNROLL_ACCUMULATORS.

It would be nice for .product() to be pairwise too.

Would it really be beneficial?

I assumed that floating-point multiplication would have the same issue with error accumulation as addition, but I don't really know. We can keep .product() as-is for now.

This is slightly faster and is easier to understand.

jturner314 · 2019-02-03T06:01:36Z

I noticed one more issue -- in the non-contiguous case, the current .sum() implementation in this PR operates pairwise only over the last dimension (since it's naively adding up the pairwise sums of the lanes). I've created LukeMathWalker#3 to fix this and make some additional simplifications and performance improvements.

The only other things I'd like to see are:

A few more tests of .sum()/.sum_axis(). In particular, I'd like to see a test with more than 2 dimensions, a test with axis lengths that are larger than NAIVE_SUM_THRESHOLD * UNROLL_SIZE but not evenly divisible by NAIVE_SUM_THRESHOLD or UNROLL_SIZE, and the same for a discontiguous array.
Some benchmarks with integer elements to see whether we need to specialize the implementation for integers.

Edit: I noticed that LukeMathWalker#3 causes a performance regression in the middle_discontiguous_sum_ix3_1e2 benchmark since sum no longer takes advantage of pairwise_sum when the last axis is contiguous. I have a fix for this, but it needs some cleanup before I push it.

Improve pairwise summation

LukeMathWalker · 2019-02-03T14:56:24Z

I looked at LukeMathWalker#3 and it seems good to me - I'll merge it and in the meanwhile you can polish the edits required to revert the performance regression you have observed. I'll work on getting more tests in there and some significant benchmarks with integers.

The only thing about LukeMathWalker#3 that I was slightly confused about is:

let cap = len.saturating_sub(1) / NAIVE_SUM_THRESHOLD + 1; // ceiling of division

Why are we subtracting 1 from len? 🤔

LukeMathWalker · 2019-02-03T17:33:02Z

Re-running benchmarks:

Desktop (AMD):

name                                  master ns/iter  branch ns/iter  diff ns/iter   diff %  speedup 
 clip                                  1,730           1,658                    -72   -4.16%   x 1.04 
 contiguous_sum_1e2                    16              14                        -2  -12.50%   x 1.14 
 contiguous_sum_1e4                    1,277           1,479                    202   15.82%   x 0.86 
 contiguous_sum_1e7                    4,057,990       4,187,331            129,341    3.19%   x 0.97 
 contiguous_sum_int_1e2                17              17                         0    0.00%   x 1.00 
 contiguous_sum_int_1e4                1,619           2,013                    394   24.34%   x 0.80 
 contiguous_sum_int_1e7                4,993,451       5,037,513             44,062    0.88%   x 0.99 
 contiguous_sum_int_ix3_1e2            326,134         333,641                7,507    2.30%   x 0.98 
 contiguous_sum_ix3_1e2                283,887         289,471                5,584    1.97%   x 0.98 
 inner_discontiguous_sum_int_ix3_1e2   809,916         1,164,653            354,737   43.80%   x 0.70 
 inner_discontiguous_sum_ix3_1e2       809,432         1,014,721            205,289   25.36%   x 0.80 
 middle_discontiguous_sum_int_ix3_1e2  1,344,886       2,116,789            771,903   57.40%   x 0.64 
 middle_discontiguous_sum_ix3_1e2      1,072,775       1,987,436            914,661   85.26%   x 0.54 
 sum_by_col_1e4                        39,772,711      40,890,980         1,118,269    2.81%   x 0.97 
 sum_by_col_int_1e4                    51,297,923      51,720,879           422,956    0.82%   x 0.99 
 sum_by_middle_1e2                     659,063         682,702               23,639    3.59%   x 0.97 
 sum_by_middle_int_1e2                 645,698         663,506               17,808    2.76%   x 0.97 
 sum_by_row_1e4                        54,742,936      56,152,978         1,410,042    2.58%   x 0.97 
 sum_by_row_int_1e4                    42,806,355      44,406,731         1,600,376    3.74%   x 0.96

Laptop (Intel):

name                                  master ns/iter  branch ns/iter  diff ns/iter   diff %  speedup
 clip                                  1,212           1,215                      3    0.25%   x 1.00
 contiguous_sum_1e2                    15              12                        -3  -20.00%   x 1.25
 contiguous_sum_1e4                    1,236           1,325                     89    7.20%   x 0.93
 contiguous_sum_1e7                    3,170,928       3,425,280            254,352    8.02%   x 0.93
 contiguous_sum_int_1e2                13              12                        -1   -7.69%   x 1.08
 contiguous_sum_int_1e4                2,227           2,196                    -31   -1.39%   x 1.01
 contiguous_sum_int_1e7                4,156,062       4,232,262             76,200    1.83%   x 0.98
 contiguous_sum_int_ix3_1e2            252,058         266,392               14,334    5.69%   x 0.95
 contiguous_sum_ix3_1e2                166,770         183,237               16,467    9.87%   x 0.91
 inner_discontiguous_sum_int_ix3_1e2   543,743         883,764              340,021   62.53%   x 0.62
 inner_discontiguous_sum_ix3_1e2       672,002         880,811              208,809   31.07%   x 0.76
 middle_discontiguous_sum_int_ix3_1e2  549,559         903,988              354,429   64.49%   x 0.61
 middle_discontiguous_sum_ix3_1e2      441,214         989,728              548,514  124.32%   x 0.45
 sum_by_col_1e4                        31,920,821      34,300,217         2,379,396    7.45%   x 0.93
 sum_by_col_int_1e4                    42,802,478      43,464,495           662,017    1.55%   x 0.98
 sum_by_middle_1e2                     385,995         399,979               13,984    3.62%   x 0.97
 sum_by_middle_int_1e2                 398,241         402,326                4,085    1.03%   x 0.99
 sum_by_row_1e4                        43,487,940      45,690,264         2,202,324    5.06%   x 0.95
 sum_by_row_int_1e4                    42,770,556      44,198,493         1,427,937    3.34%   x 0.97

This causes no change in performance according to the relevant benchmarks in `bench1`.

This improves the performance of `sum` when an axis is contiguous but the array as a whole is not contiguous.

jturner314 · 2019-02-04T02:14:39Z

Okay, I've added my changes in LukeMathWalker#4. They should improve performance on the middle_discontiguous benchmarks.

The only thing about LukeMathWalker#3 that I was slightly confused about is:
let cap = len.saturating_sub(1) / NAIVE_SUM_THRESHOLD + 1; // ceiling of division
Why are we subtracting 1 from len?

The goal of this line is to determine the ceiling of len / NAIVE_SUM_THRESHOLD (see Stack Overflow). (We need the ceiling to account for the last few elements in the iterator if len is not evenly divisible by NAIVE_SUM_THRESHOLD.) I've rewritten this line in LukeMathWalker#4 to be more clear (and to give zero when len == 0):

let cap = len / NAIVE_SUM_THRESHOLD + if len % NAIVE_SUM_THRESHOLD != 0 { 1 } else { 0 };

The benchmarks are interesting, especially the difference between the two machines. All the results look favorable except for the discontiguous cases. I'm not really sure why they're so much worse. I've tried running perf for profiling but haven't had any success (perf report keeps giving an error message). I've tried a few ideas to improve iterator_pairwise_sum and pure_pairwise_sum but haven't had any success.

Improve pairwise summation

LukeMathWalker · 2019-02-04T08:49:47Z

The new formulation for cap is much clearer, thanks!

I have reran the benchmarks with your latest changes - the inner discontinuous case is the only one that is clearly suffering right now.

Desktop (AMD):

name                                  master ns/iter  branch ns/iter  diff ns/iter   diff %  speedup 
 clip                                  1,651           1,650                     -1   -0.06%   x 1.00 
 contiguous_sum_1e2                    16              13                        -3  -18.75%   x 1.23 
 contiguous_sum_1e4                    1,224           1,385                    161   13.15%   x 0.88 
 contiguous_sum_1e7                    3,869,448       4,103,742            234,294    6.05%   x 0.94 
 contiguous_sum_int_1e2                17              15                        -2  -11.76%   x 1.13 
 contiguous_sum_int_1e4                1,568           1,931                    363   23.15%   x 0.81 
 contiguous_sum_int_1e7                4,876,993       5,036,355            159,362    3.27%   x 0.97 
 contiguous_sum_int_ix3_1e2            332,453         341,591                9,138    2.75%   x 0.97 
 contiguous_sum_ix3_1e2                274,877         293,840               18,963    6.90%   x 0.94 
 inner_discontiguous_sum_int_ix3_1e2   772,709         1,116,399            343,690   44.48%   x 0.69 
 inner_discontiguous_sum_ix3_1e2       762,707         1,013,607            250,900   32.90%   x 0.75 
 middle_discontiguous_sum_int_ix3_1e2  1,162,295       1,425,353            263,058   22.63%   x 0.82 
 middle_discontiguous_sum_ix3_1e2      934,716         1,136,989            202,273   21.64%   x 0.82 
 sum_by_col_1e4                        38,968,247      40,410,507         1,442,260    3.70%   x 0.96 
 sum_by_col_int_1e4                    50,080,514      51,510,736         1,430,222    2.86%   x 0.97 
 sum_by_middle_1e2                     703,151         662,579              -40,572   -5.77%   x 1.06 
 sum_by_middle_int_1e2                 637,496         644,015                6,519    1.02%   x 0.99 
 sum_by_row_1e4                        54,319,725      55,886,431         1,566,706    2.88%   x 0.97 
 sum_by_row_int_1e4                    42,122,728      43,521,191         1,398,463    3.32%   x 0.97

(Intel benchmarks coming later)

tanriol · 2020-06-28T20:52:06Z

What's the status of this PR?

LukeMathWalker added 10 commits January 4, 2019 19:26

Add function declaration for pairwise_sum

483b29a

Base case: array with 512 elements

75860b6

Base case: use unrolled_fold

b8304c0

Implemented algorithm for not-base case branch

ec3722a

Implemented pairwise summation algorithm for an iterator parameter

70d32b7

Implemented pairwise summation algorithm for an iterator parameter wi…

c66dc98

…th array items

Refactored: use fold to sum

175f9a2

Refactored: use a constant to reuse size value of recursion base case…

8e403af

… across implementations

Added documentation

465063f

Minor edits to the docs

6427d45

LukeMathWalker changed the title ~~Pairwise summation~~ [WIP] Pairwise summation Jan 4, 2019

Don't forget to add the sum of the last elements (<512 ending block).

4414450

LukeMathWalker changed the title ~~[WIP] Pairwise summation~~ Pairwise summation Jan 4, 2019

LukeMathWalker added 6 commits January 5, 2019 12:24

Add a benchmark for summing a contiguous array

aeaad0e

Benchmarks for arrays of different length

75109b1

Don't split midpoint, saving one operation

3085194

Revert "Don't split midpoint, saving one operation"

797e212

This reverts commit 3085194.

Benches for sum_axis

d2b636b

Bench for contiguous sum with integer values

b3d2b42

LukeMathWalker mentioned this pull request Jan 9, 2019

Means rust-ndarray/ndarray-stats#20

Merged

Alternative implementation for sum_axis

8f95705

LukeMathWalker added 2 commits January 9, 2019 08:55

Revert "Alternative implementation for sum_axis"

74a74ae

This reverts commit 8f95705.

Ensure equal block size independently of underlying implementation

a592a7d

jturner314 added the enhancement label Jan 11, 2019

jturner314 added 5 commits February 3, 2019 00:55

Add more benchmarks of sum/sum_axis

b98e30b

Improve performance of iterator_pairwise_sum

ed88e2e

Make sum pairwise over all dimensions

e7835ee

Implement contiguous sum_axis in terms of Zip

8301c25

This is slightly faster and is easier to understand.

Remove redundant len_of call

82453df

Merge pull request #3 from jturner314/pairwise-summation

1d51f70

Improve pairwise summation

LukeMathWalker added 4 commits February 3, 2019 16:30

Added test for axis independence

978f45a

Make sure we actually exercise the pairwise technique

fa0ba30

Test discontinuous arrays

b4136d7

Add more integer benchmark equivalents

4a63cb3

jturner314 added 7 commits February 3, 2019 18:35

Fix min_stride_axis to prefer axes with length > 1

f306b5f

Specialize min_stride_axis for Ix3

b7951df

Enable min_stride_axis as pub(crate) method

3326de4

Simplify fold to use min_stride_axis

65b6046

This causes no change in performance according to the relevant benchmarks in `bench1`.

Improve performance of sum in certain cases

b0b391a

This improves the performance of `sum` when an axis is contiguous but the array as a whole is not contiguous.

Update quickcheck and use quickcheck_macros

7f04e6f

Clarify capacity calculation in iterator_pairwise_sum

1ed1a63

Merge pull request #4 from jturner314/pairwise-summation

1e88385

Improve pairwise summation

This was referenced Mar 26, 2019

Mean #580

Merged

Work on performance issues in summary statistics due to using ArrayBase::sum rust-ndarray/ndarray-stats#35

Open

LukeMathWalker mentioned this pull request Oct 6, 2019

Weighted var rust-ndarray/ndarray-stats#55

Merged

xd009642 mentioned this pull request Oct 8, 2020

Status of project? #843

Closed

bluss self-requested a review January 11, 2021 20:47

bluss removed their request for review November 22, 2021 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pairwise summation #577

Pairwise summation #577

LukeMathWalker commented Jan 4, 2019 •

edited

Loading

bluss commented Jan 5, 2019 •

edited

Loading

LukeMathWalker commented Jan 5, 2019 •

edited

Loading

LukeMathWalker commented Jan 5, 2019

jturner314 commented Jan 8, 2019

LukeMathWalker commented Jan 9, 2019 •

edited

Loading

jturner314 commented Jan 11, 2019

jturner314 commented Feb 3, 2019 •

edited

Loading

LukeMathWalker commented Feb 3, 2019 •

edited

Loading

LukeMathWalker commented Feb 3, 2019 •

edited

Loading

jturner314 commented Feb 4, 2019

LukeMathWalker commented Feb 4, 2019

tanriol commented Jun 28, 2020

Pairwise summation #577

Are you sure you want to change the base?

Pairwise summation #577

Conversation

LukeMathWalker commented Jan 4, 2019 • edited Loading

Motivation

Public API

Technicalities

bluss commented Jan 5, 2019 • edited Loading

LukeMathWalker commented Jan 5, 2019 • edited Loading

LukeMathWalker commented Jan 5, 2019

jturner314 commented Jan 8, 2019

LukeMathWalker commented Jan 9, 2019 • edited Loading

jturner314 commented Jan 11, 2019

jturner314 commented Feb 3, 2019 • edited Loading

LukeMathWalker commented Feb 3, 2019 • edited Loading

LukeMathWalker commented Feb 3, 2019 • edited Loading

jturner314 commented Feb 4, 2019

LukeMathWalker commented Feb 4, 2019

tanriol commented Jun 28, 2020

LukeMathWalker commented Jan 4, 2019 •

edited

Loading

bluss commented Jan 5, 2019 •

edited

Loading

LukeMathWalker commented Jan 5, 2019 •

edited

Loading

LukeMathWalker commented Jan 9, 2019 •

edited

Loading

jturner314 commented Feb 3, 2019 •

edited

Loading

LukeMathWalker commented Feb 3, 2019 •

edited

Loading

LukeMathWalker commented Feb 3, 2019 •

edited

Loading