Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Signaling Accumulators" #12

Open
BenBrock opened this issue Oct 11, 2022 · 8 comments
Open

"Signaling Accumulators" #12

BenBrock opened this issue Oct 11, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@BenBrock
Copy link
Collaborator

BenBrock commented Oct 11, 2022

A "signaling accumulator" would be a useful concept for many graph algorithms where you want to keep track of whether an update has occurred in each iteration.

For example, in SSSP, we can test whether the algorithm has converged by checking whether any new updates were made to the distance vector during this iteration. Traditionally, you'd check this by performing an additional subtraction, or else otherwise checking whether the new distance vector has any new, shorter distances. Alternately, you could implement a new min operator that sets a flag if there's an update. If no update occurs, the algorithm has converged, and the algorithm can end.

    bool updated_value = false;


    auto signaling_min = [&](auto&& a, auto&& b) {
                           auto min = grb::min{}(a, b);
                           if (min != a) {
                             updated_value = true;
                           return min;
                         };

    auto update = grb::multiply(grb::transpose(a), dist,
                           grb::min{}, grb::plus{},
                           grb::full_vector_mask{});

    auto dist2 = grb::ewise_union(dist, update, signaling_min);


    if (!updated_value) {
      break;
    }

For GraphBLAS C++ 2.0, I would propose implementing a "signaling op" wrapper that indicates whether an update occurs when accumulating.

    // Proposal for signal API
    bool update
    grb::signal min(grb::min{}, update);

    template <typename Fn, typename Flag>
    struct signal {

      operator()(auto&& a, auto&& b) {
        auto c = fn_(a, b);
        if (c == a) {
          flag_ = true;
        }
        return c;
      }

      Flag& flag_;
      Fn fn_;
    };
@BenBrock BenBrock added the enhancement New feature or request label Oct 11, 2022
@DrTimothyAldenDavis
Copy link
Member

It would be difficult to implement this as the monoid of a semiring, inside a matrix multiply. It's a barrier for getting good performance in the parallel case. It's not clear from the example, but it looks like 'update' is being computed by the matrix-vector multiply. But your example only uses signaling_min in the ewise operation. How is it being computed by the matrix-vector multiply?

@BenBrock
Copy link
Collaborator Author

BenBrock commented Oct 11, 2022

In a matrix-vector multiply, you'd likely only ever want to use a signaling accumulator for the accumulator, not in the monoid. The purpose is to figure out whether c changed in c += a*b.

Take SSSP. Suppose you're doing the following matrix-vector multiply (C API syntax now), where dist is a vector holding the current shortest distance to each vertex, and graph holds your graph.

mxv(dist, GrB_NULL, GrB_MIN_INT32, GrB_PLUS_MIN_SEMIRING_INT32, graph, dist, GrB_TRAN);

dist now holds the new updated distances, if there are any that have been updated in dist by the GrB_MIN_INT32 accumulator. But how do we know whether there were updates? Typically, you'd have to store the result in a new vector, dist2, and compare.

What I'm suggesting (for C++) is you instead could use a wrapper around GrB_MIN_INT32 that writes to a flag if a value of dist is updated. This would be an idempotent write from false to true, so in terms of thread safety, you'd only need a memory fence/barrier after the mxv, which you already need anyway. One of these "signaling" operators is fairly straightforward to implement on your own in C++ using lambdas, but a grb::signal, etc., could make this very easy.

    bool update = false;

    // Wrap the `grb::min` operator.
    // Write the value `true` to `update` if a value is updated
    auto signaling_min = [&](auto&& a, auto&& b) {
                           auto min = grb::min{}(a, b);
                           if (min != a) {
                             update = true;
                           return min;
                         };

@BenBrock
Copy link
Collaborator Author

BenBrock commented Oct 11, 2022

As I look back at my original example, I used update twice, which is probably a point of confusion. The "signal" accumulation operator should only write to a single scalar flag, from true to false, if one of the values in the output is going to be changed. It shouldn't write to a matrix or vector.

@DrTimothyAldenDavis
Copy link
Member

Yes, that's where I was confused and thought you were using the update flag from signaling_min in the matrix-vector multiply, which then returned it as a scalar. So the return value of "update = grb::multiply (...)" is a vector, called 'update', right?

In that case, a binary operator 'signaling_min' would be well-defined for an ewise operation, and not be hard to implement in parallel. It could also appear as the accumulator op in the matrix multiply, outside of the semiring. It would be very difficult inside a semiring, where it becomes ill-defined and gums up the implemenation particularly in the parallel case.

It would be best not to assume a global value for the boolean 'update' flag, however. That would mess up the case where multiple user-threads are each doing their own calls to an ewise operation using signaling_min.

@BenBrock
Copy link
Collaborator Author

Yes update is a vector returning from multiply. (I updated my example to use updated_value for the boolean.)

My idea was that the signaling accumulator would be a user-defined (or user-assisted op), and that users could pass in a reference to the location they want updated. (I suppose if you wanted to do this in C, you could pass in a pointer.)

@BenBrock
Copy link
Collaborator Author

Although, unfortunately, I just realized that what I outlined above is not actually sufficient. We care not only about cases in which c is accumulated into, but also cases where c had no stored value, and a new value is inserted with no accumulation.

So, unless c already has a stored value in every spot, I think we'd need some actual modification to mxv or ewise to do what's needed (at least for SSSP where the dist vector is allowed to be sparse).

@DrTimothyAldenDavis
Copy link
Member

Yes, I thought about that too.

perhaps a better solution would be some kind of way to tell GraphBLAS: "compute C += (stuff) and tell me if C changes at all" (where "+=" is any accumulator op). That would be more general than a signaling_min. I think it would require the accum operator to be present, at least to implement it efficiently. Then "stuff" could be anything at all, any GrB operation.

The blocking/non-blocking case would be delicate for this output flag. What if the operation was non-blocking and the computation postponed? The "update" flag would have to force a block, if it were a non-opaque scalar.

Gets gnarly...

@BenBrock
Copy link
Collaborator Author

Yeah, I think this gets potentially very complicated in the mxm case. Not so much that it's a difficult thing to do, but that it's difficult to expose it in an elegant and thread-safe way.

In the ewise case, you can use the behavior we want by also checking if the number of stored values has changed in a new output vector. (e.g. in SSSP in this C++ draft impl).

However, element-wise creates a new object, and ideally you'd be able to avoid that by just using mxv.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants