Add @assign, add support for Broadcasted objects #64

Draft
wants to merge 4 commits into
base: main

Member

@Sbozzolo Sbozzolo commented Aug 13, 2024

This PR does two things:

  1. Adds the @assign macro to help reduce code verbosity for packages that use ClimaDiagnostics.
  2. Adds support for lazy compute functions that return Base.Broadcast.Broadcasted objects.

Closes #5

@charleskawczynski
Member

Could we instead use LazyBroadcast's @lazy to reduce code duplication? That way we can also easily fuse expressions.

@Sbozzolo
Member Author

Yes, my goal is to eventually allow for kernel fusion (which is what I meant by "pathway for future improvement"). However, I don't know how to use LazyBroadcast for that, so I was going for something that would allow us to use it internally in the future, while starting to reduce code verbosity.

Is it just a matter of adding @lazy_broadcasted in front of $(esc(out)) .= $(esc(rhs2)) and explicitly calling materialize! when needed?

Maybe this is a good opportunity to add some user documentation to LazyBroadcast.jl.

@charleskawczynski
Member

> Yes, my goal is to eventually allow for kernel fusion (which is what I meant by "pathway for future improvement"). However, I don't know how to use LazyBroadcast for that, so I was going for something that would allow us to use it internally in the future, while starting to reduce code verbosity.
>
> Is it just a matter of adding @lazy_broadcasted in front of $(esc(out)) .= $(esc(rhs2)) and explicitly calling materialize! when needed?
>
> Maybe this is a good opportunity to add some user documentation to LazyBroadcast.jl.

I'm happy to add more docs to LazyBroadcast, what do you think is missing?

Using LazyBroadcast, compute_ta (from the docs) could be re-written as:

using LazyBroadcast: @lazy
compute_ta(state, cache, time) = @lazy @. state.ta

and the result could later be used as Base.Broadcast.materialize!(out, bc) where bc is the result of compute_ta.

@Sbozzolo
Member Author

Sbozzolo commented Aug 14, 2024

Thank you! I got something to work and I managed to cluster all the materialize! together in a for loop so that it should be easy to fuse them. Two questions:

  • Should the broadcasted expressions be re-computed every time, or is it enough to re-materialize them?
  • With eager evaluation, we need to copy the first time we call compute! because ClimaDiagnostics modifies the output (to do accumulation/reduction). Do we need to do this for lazy evaluation too?

As for documentation: I am looking at LazyBroadcast.jl, and the only documentation I see is the README, which does not mention @lazy or compute_ta.

This is what I think could be improved in the readme:

  • the problem that the package solves is not clearly identified
  • it assumes that the reader has specialized knowledge of how Broadcasted objects work and how to use them
  • the only example provided is a test case which comes with no explanation and components that are not pedagogically useful (all the Test related stuff)
  • it is not explained how one would use bce or bco
  • the difference between bce and bco is left to the reader to understand
  • the fact that the package has restrictions is not mentioned
  • it does not guide the reader to further documentation, so I find myself trying to figure things out by browsing the test cases

Here are a few things I tried to get the code to work:

julia> import LazyBroadcast: @lazy
julia> compute_ta(state, cache, time) = @lazy @. state.ta
ERROR: LoadError: Invalid expression given to materialize_args
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] materialize_args(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/utils.jl:8
 [3] transform(e::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:30
 [4] _lazy_broadcasted(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:43
 [5] var"@lazy"(__source__::LineNumberNode, __module__::Module, expr::Any)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:54
in expression starting at REPL[17]:1

julia> compute_ta(state, cache, time) = @lazy @. state
ERROR: LoadError: type Symbol has no field args
Stacktrace:
 [1] getproperty(x::Symbol, f::Symbol)
   @ Base ./Base.jl:37
 [2] code_info(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/code_lowered_single_expression.jl:10
 [3] code_lowered_single_expression(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/code_lowered_single_expression.jl:12
 [4] transform(e::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:29
 [5] _lazy_broadcasted(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:43
 [6] var"@lazy"(__source__::LineNumberNode, __module__::Module, expr::Any)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:54
in expression starting at REPL[18]:1

I also found that these do not work:

julia> a, b = [1], [2]

julia> @lazy a .= b
Expr
  head: Symbol .=
  args: Array{Any}((2,))
    1: Symbol a
    2: Symbol b
dump(expr) = nothing
Expr
  head: Symbol .=
  args: Array{Any}((2,))
    1: Symbol a
    2: Symbol b
dump(expr) = nothing
ERROR: LoadError: Uncaught edge case
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] check_restrictions(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/utils.jl:48
 [3] _lazy_broadcasted(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:42
 [4] var"@lazy"(__source__::LineNumberNode, __module__::Module, expr::Any)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:54
in expression starting at REPL[12]:1

julia> @lazy @dot a = b
Expr
  head: Symbol macrocall
  args: Array{Any}((3,))
    1: Symbol @dot
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[19]
    3: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol a
        2: Symbol b
dump(expr) = nothing
Expr
  head: Symbol macrocall
  args: Array{Any}((3,))
    1: Symbol @dot
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[19]
    3: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol a
        2: Symbol b
dump(expr) = nothing
ERROR: LoadError: Uncaught edge case
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] check_restrictions(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/utils.jl:48
 [3] _lazy_broadcasted(expr::Expr)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:42
 [4] var"@lazy"(__source__::LineNumberNode, __module__::Module, expr::Any)
   @ LazyBroadcast ~/.julia/packages/LazyBroadcast/SVKU8/src/LazyBroadcast.jl:54
in expression starting at REPL[19]:1

Eventually I found that it works with @. (it returns Base.Broadcast.Broadcasted(identity, ([2],)))

From here, I think I understand that the correct function to write is

compute_ta(out, state, cache, time) = @lazy @. out = state.ta

This seems to work in my integration test case.

@charleskawczynski
Member

> Thank you! I got something to work and I managed to cluster all the materialize! together in a for loop so that it should be easy to fuse them. Two questions:
>
>   • Should the broadcasted expressions be re-computed every time, or is it enough to re-materialize them?

Great question. Re-materializing is safe so long as the data passed into the broadcasted objects still points to the same memory. That said, reconstructing the object is very cheap, because all it does is make an instance of a struct with a bunch of pointers, so I recommend we reconstruct the object every time to be safe.
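
To illustrate the point about memory, here is a minimal sketch using only Base.Broadcast. Plain arrays stand in for ClimaCore fields, which is an assumption made for illustration; the variable names are hypothetical.

```julia
# Plain arrays stand in for fields (an assumption for illustration).
a = [1.0, 2.0]
b = [10.0, 20.0]

# Build a lazy Broadcasted object: nothing is computed yet.
bc = Base.broadcasted(+, a, b)

out = similar(a)
Base.Broadcast.materialize!(out, bc)
first_result = copy(out)

# In-place update: `bc` still points at the same memory, so
# re-materializing picks up the new values.
a .= [100.0, 200.0]
Base.Broadcast.materialize!(out, bc)
after_inplace = copy(out)

# Rebinding the name `a` to a new array does not affect `bc`,
# which still references the old array.
a = [0.0, 0.0]
Base.Broadcast.materialize!(out, bc)
after_rebind = copy(out)

# Reconstructing the object is cheap and always sees the current binding.
bc = Base.broadcasted(+, a, b)
Base.Broadcast.materialize!(out, bc)
after_rebuild = copy(out)
```

Here `after_rebind` equals `after_inplace`, because the stale object never saw the new array, which is the situation that reconstructing every time avoids.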

>   • With eager evaluation, we need to copy the first time we call compute! because ClimaDiagnostics modifies the output (to do accumulation/reduction). Do we need to do this for lazy evaluation too?

We may be able to use Base.Broadcast.materialize(bc) if we want an instance (for the first time), and then use Base.Broadcast.materialize!(out, bc) thereafter. I don't think we need to explicitly use copy, since I think Base.Broadcast.materialize will do that for us.
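
A small sketch of that two-step pattern, again with a plain array standing in for `state.ta` (an assumption), suggests that `materialize` does return a fresh array even for an `identity` broadcast:

```julia
ta = [280.0, 285.0, 290.0]             # stand-in for state.ta (assumption)
bc = Base.broadcasted(identity, ta)    # lazy "return ta" expression

# First call: no output exists yet, so let materialize allocate one.
out = Base.Broadcast.materialize(bc)

# Check whether the output aliases the input.
aliased = out === ta

# Accumulating into `out` therefore leaves `ta` untouched.
out .+= 1.0
ta_unchanged = ta == [280.0, 285.0, 290.0]

# Later calls: reuse the existing output buffer.
ta .= [281.0, 286.0, 291.0]
Base.Broadcast.materialize!(out, Base.broadcasted(identity, ta))
```

Whether the same holds for every field type in ClimaCore is not checked here.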

> As for documentation: I am looking at LazyBroadcast.jl, and the only documentation I see is the README, which does not mention @lazy or compute_ta.

Yeah, compute_ta is implementation-specific. I think we can provide some simple example use cases, though.

> This is what I think could be improved in the readme:
>
>   • the problem that the package solves is not clearly identified
>   • it assumes that the reader has specialized knowledge of how Broadcasted objects work and how to use them
>   • the only example provided is a test case which comes with no explanation and components that are not pedagogically useful (all the Test related stuff)
>   • it is not explained how one would use bce or bco
>   • the difference between bce and bco is left to the reader to understand
>   • the fact that the package has restrictions is not mentioned
>   • it does not guide the reader to further documentation, so I find myself trying to figure things out by browsing the test cases

Yeah, I agree on all of these points. LazyBroadcast.jl was born because it was originally a bunch of utilities in MultiBroadcastFusion.jl, and I realized that they were more generally useful, so I split those utilities off into a separate package. I hadn't fully thought out the use-cases, and if/what pieces could be made more flexible (like the broken examples you showed below), so I wasn't sure how to best document things. I'll open an issue with these points.

> Here are a few things I tried to get the code to work:...
>
> julia> import LazyBroadcast: @lazy
> julia> compute_ta(state, cache, time) = @lazy @. state.ta
> ERROR: LoadError: Invalid expression given to materialize_args
>
> julia> compute_ta(state, cache, time) = @lazy @. state
> ERROR: LoadError: type Symbol has no field args
> ...
>
> I also found that these do not work:
>
> julia> a, b = [1], [2]
>
> julia> @lazy a .= b
> Expr
>   head: Symbol .=
> ...
> ERROR: LoadError: Uncaught edge case

Yeah, this is a bit technical I think, and it does seem that there are some sharp edges (I didn't expect that last example to fail). It might be easier to discuss some of this over Zoom. One tricky aspect of this is that, for example, foo(x) could return a vector, in which case we wouldn't want to assume that the user meant @lazy foo.(x). So, it allows for more flexibility (there are some such examples in the test suite, but not much discussion in the docs about this).
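
The `foo(x)` versus `foo.(x)` distinction can be sketched with Base.Broadcast alone. `foo` below is a hypothetical function chosen because it accepts both a vector and a scalar but gives different answers; this is not how LazyBroadcast decides between the two readings, only an illustration of why they differ.

```julia
# Hypothetical helper: normalizes its argument by its sum.
# It works on a whole vector and also on a single number.
foo(x) = x ./ sum(x)

x = [1.0, 3.0]
y = [10.0, 20.0]

# Reading 1: `foo(x)` is an ordinary call on the whole vector,
# and only the `+` is part of the lazy broadcast.
bc_call = Base.broadcasted(+, foo(x), y)

# Reading 2: `foo` is itself broadcast over the elements of `x`,
# which is what `foo.(x) .+ y` would mean.
bc_dot = Base.broadcasted(+, Base.broadcasted(foo, x), y)

r_call = Base.Broadcast.materialize(bc_call)
r_dot = Base.Broadcast.materialize(bc_dot)
```

The two results differ (`r_call` uses the normalized vector, `r_dot` normalizes each element by itself), so a macro cannot silently pick one reading for the user.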

> compute_ta(out, state, cache, time) = @lazy @. out = state.ta

Yeah, this will work, and that's how I'd expect it to be used. There might be something we can do to remove the need for out, but that should be fine for now.

Comment on lines +107 to +108
storage[diag] =
Base.Broadcast.materialize(broadcasted_expressions[diag])
Member

Suggested change
- storage[diag] =
-     Base.Broadcast.materialize(broadcasted_expressions[diag])
+ Base.Broadcast.materialize!(storage[diag], out_or_broadcasted_expr)

This is where we can use the mutating version of materialize

Member Author

I think this would fail because storage[diag] does not exist at this point.

Member

Ah, sorry, I wrote a message earlier but must not have hit comment. Can we then check if isnothing(storage[diag]) and conditionally call materialize!? Otherwise, materialize will always allocate a new field.
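
One way to sketch the conditional call is below. The `storage` dictionary, the `store!` helper, and the use of plain arrays are all hypothetical stand-ins, not the ClimaDiagnostics implementation. Because indexing a `Dict` with a missing key throws a `KeyError`, this sketch checks `haskey` instead of `isnothing`; the real container may behave differently.

```julia
# Hypothetical stand-in: maps a diagnostic name to its output array.
storage = Dict{Symbol, Vector{Float64}}()
ta = [280.0, 285.0]

# Hypothetical helper: allocate on the first call, write in place afterwards.
function store!(storage, diag, bc)
    if haskey(storage, diag)
        # Output already exists: write into it without allocating a new array.
        Base.Broadcast.materialize!(storage[diag], bc)
    else
        # First call: let materialize allocate the output.
        storage[diag] = Base.Broadcast.materialize(bc)
    end
    return storage[diag]
end

first_out = store!(storage, :ta, Base.broadcasted(identity, ta))
ta .+= 1.0
second_out = store!(storage, :ta, Base.broadcasted(identity, ta))
```

Both calls return the same array object, so only the first call allocates the output.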

@Sbozzolo Sbozzolo force-pushed the gb/assign branch 2 times, most recently from f86e803 to ba077ff Compare August 14, 2024 17:49
The `@assign` macro helps reduce code verbosity for packages that use
`ClimaDiagnostics`. It also opens a potential pathway to future
improvements internal to `ClimaDiagnostics`.
@Sbozzolo Sbozzolo changed the title Add @assign Add @assign, add support for Broadcasted objects Aug 14, 2024
`LazyBroadcast.jl` provides a way to return an unevaluated function.
This is useful in two cases:
1. reduce code verbosity to handle the `isnothing(out)` case
2. allow clustering all the broadcasted expressions in a single place

In turn, 2. is useful because it is the first step in fusing different
broadcasted calls.

This commit adds support for such functions.
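
As a rough illustration of the verbosity in question, the sketch below contrasts an eager compute function that handles the `isnothing(out)` case itself with a lazy one that returns an unevaluated `Broadcasted` object. The function names and the NamedTuple `state` are hypothetical, and `Base.broadcasted` is used directly in place of `@lazy`.

```julia
# Hypothetical state: a NamedTuple with a plain array in place of a field.
state = (; ta = [280.0, 285.0])

# Eager style: the compute function handles the `isnothing(out)` case itself.
function compute_ta_eager!(out, state, cache, time)
    if isnothing(out)
        return copy(state.ta)   # first call: allocate
    else
        out .= state.ta         # later calls: write in place
        return out
    end
end

# Lazy style: return an unevaluated Broadcasted object and let the caller
# decide when and where to materialize it.
compute_ta_lazy(state, cache, time) = Base.broadcasted(identity, state.ta)

eager = compute_ta_eager!(nothing, state, nothing, 0.0)
lazy = Base.Broadcast.materialize(compute_ta_lazy(state, nothing, 0.0))
```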
@Sbozzolo
Member Author

Passing ClimaAtmos build: https://buildkite.com/clima/climaatmos-ci/builds/20044

I did not change the EDMF and radiation diagnostics because they are much more complex.

The allocation flame graphs are failing because I added more for loops.

@charleskawczynski
Member

> Passing ClimaAtmos build: https://buildkite.com/clima/climaatmos-ci/builds/20044
>
> I did not change the EDMF and radiation diagnostics because they are much more complex.
>
> The allocation flame graphs are failing because I added more for loops.

Nice! Do we still need @assign? I thought that it wouldn't be necessary since @lazy changes things to be functional

@Sbozzolo
Member Author

> > Passing ClimaAtmos build: https://buildkite.com/clima/climaatmos-ci/builds/20044
> > I did not change the EDMF and radiation diagnostics because they are much more complex.
> > The allocation flame graphs are failing because I added more for loops.
>
> Nice! Do we still need @assign? I thought that it wouldn't be necessary since @lazy changes things to be functional

I am not sure. Does @lazy support diagnostics like this

Fields.array2field(
            cache.radiation.rrtmgp_model.face_sw_flux_dn,
            axes(state.f),
        )

?

@charleskawczynski
Member

> Fields.array2field(
>             cache.radiation.rrtmgp_model.face_sw_flux_dn,
>             axes(state.f),
>         )

For something like that, we don't even need @lazy, since there's no broadcasting. That said, we can still make a broadcasted object so that it's similar to what @lazy returns:

x = Fields.array2field(
            cache.radiation.rrtmgp_model.face_sw_flux_dn,
            axes(state.f),
        )
bc = Base.broadcasted(identity, x)

@charleskawczynski
Member

In fact, any diagnostic that simply returns a field could just wrap it in Base.broadcasted(identity, ::Fields.Field). That way, everything could be a broadcasted object that works with Base.Broadcast.materialize!/Base.Broadcast.materialize.
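
A sketch of what that uniformity could look like, with plain arrays in place of `Fields.Field` and hypothetical compute functions: one diagnostic wraps existing data with `identity`, the other builds a real broadcast expression, and both go through the same `materialize!` loop.

```julia
# Hypothetical state with plain arrays standing in for fields.
state = (; ta = [280.0, 285.0], ua = [3.0, -4.0], va = [4.0, 3.0])

# A diagnostic that simply returns existing data, wrapped with `identity`.
compute_ta(state) = Base.broadcasted(identity, state.ta)

# A diagnostic with an actual broadcast expression: sqrt(ua^2 + va^2).
compute_speed(state) = Base.broadcasted(
    sqrt,
    Base.broadcasted(
        +,
        Base.broadcasted(abs2, state.ua),
        Base.broadcasted(abs2, state.va),
    ),
)

computes = (compute_ta, compute_speed)
outputs = [zeros(2) for _ in computes]

# All materialize! calls sit in one loop, which is where fusion
# across diagnostics could later be applied.
for (out, compute) in zip(outputs, computes)
    Base.Broadcast.materialize!(out, compute(state))
end
```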

@Sbozzolo
Member Author

Waiting for
CliMA/LazyBroadcast.jl#8
CliMA/LazyBroadcast.jl#7

@Sbozzolo Sbozzolo marked this pull request as draft September 9, 2024 15:32

Successfully merging this pull request may close these issues.

Use lazy broadcasted expressions
2 participants