Fix threadlocal type-inferrability and add reduction functionality #134

Merged
merged 8 commits into from
Feb 26, 2024
21 changes: 19 additions & 2 deletions README.md
@@ -197,8 +197,6 @@ BenchmarkTools.Trial: 10000 samples with 10 evaluations.

### Local per-thread storage (`threadlocal`)

**Warning: this feature is likely broken!**

You can also define local storage for each thread; once the loop finishes, a vector containing every thread's local storage is available.

```julia
@@ -234,6 +232,25 @@ julia> let
Float16[83.0, 90.0, 27.0, 65.0]
```

### `reduction`
The `reduction` keyword reduces an already initialized `isbits` variable using one of the supported associative operations (see the [docs](https://JuliaSIMD.github.io/Polyester.jl/stable)), so that moving from serial code is as simple as adding the `@batch` macro with a `reduction` argument. Unlike `threadlocal`, this does not incur any additional allocations.

```julia
julia> function batch_reduction()
           y1 = 0
           y2 = 1
           @batch reduction=((+, y1), (*, y2)) for i in 1:9
               y1 += i
               y2 *= i
           end
           y1, y2
       end
julia> batch_reduction()
(45, 362880)
julia> @allocated batch_reduction()
0
```
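To make the data flow behind such a reduction concrete, here is a minimal serial sketch in plain Julia that does not use Polyester at all. It assumes the usual scheme in which each chunk of the iteration range starts from the neutral element of the operation and the partial results are then folded back into the already initialized variable. The names `neutral` and `chunked_reduce` are hypothetical and only loosely mirror the `initializer` methods added in `src/Polyester.jl`; this is not the package's actual implementation.

```julia
# Hypothetical helper: the neutral element of each operation for a value of type T.
neutral(::typeof(+), ::T) where {T} = zero(T)
neutral(::typeof(*), ::T) where {T} = one(T)
neutral(::typeof(min), ::T) where {T} = typemax(T)
neutral(::typeof(max), ::T) where {T} = typemin(T)

# Reduce f(i) over `range` with `op`, split into at most `nchunks` independent pieces.
# The chunks are processed serially here; in a threaded setting each chunk
# could be handled by a different thread.
function chunked_reduce(op, f, init, range; nchunks = 4)
    chunksize = max(1, cld(length(range), nchunks))
    chunks = Iterators.partition(range, chunksize)
    # Every chunk starts from the neutral element, so it never depends on `init`.
    partials = map(chunks) do chunk
        acc = neutral(op, init)
        for i in chunk
            acc = op(acc, f(i))
        end
        acc
    end
    # Fold the partial results back into the already initialized value.
    return foldl(op, partials; init = init)
end

sum_1_to_9 = chunked_reduce(+, identity, 0, 1:9)
prod_1_to_9 = chunked_reduce(*, identity, 1, 1:9)
```

Because the operations are associative, the result does not depend on how the range is split, which is why only associative operations can be supported this way.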

## Disabling Polyester threads

When running many repetitions of a Polyester-multithreaded function (e.g. in an embarrassingly parallel problem that repeatedly executes a small already Polyester-multithreaded function), it can be beneficial to disable Polyester (the inner multithreaded loop) and multithread only at the outer level (e.g. with `Base.Threads`). This can be done with the `disable_polyester_threads` context manager. In the expandable section below you can see examples with benchmarks.
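As a compact illustration of this pattern, the sketch below wraps an outer `Base.Threads` loop in `disable_polyester_threads`, so that the inner `@batch` loop is not multithreaded by Polyester. The function names `inner!` and `outer!` and the array sizes are hypothetical and chosen only for this example. It assumes Polyester is installed and Julia was started with more than one thread.

```julia
using Polyester
using Base.Threads: @threads

# Inner kernel (hypothetical): small and already multithreaded with Polyester.
function inner!(y, x, j)
    @batch for i in axes(x, 1)
        y[i, j] = sin(x[i, j])
    end
    return y
end

# Outer loop (hypothetical): parallelized with Base.Threads instead.
function outer!(y, x)
    @threads for j in axes(x, 2)
        inner!(y, x, j)
    end
    return y
end

x = rand(100, 16)
y = similar(x)

# Disable Polyester once, around the whole computation, rather than
# inside each iteration of the outer loop.
disable_polyester_threads() do
    outer!(y, x)
end
```

Calling `disable_polyester_threads` once in the outermost scope keeps its own overhead out of the hot loop.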
16 changes: 10 additions & 6 deletions src/Polyester.jl
@@ -6,6 +6,7 @@ end
using ThreadingUtilities
import StaticArrayInterface
const ArrayInterface = StaticArrayInterface
using Base.Cartesian: @nexprs
using StaticArrayInterface: static_length, static_step, static_first, static_size
using StrideArraysCore: object_and_preserve
using ManualMemory: Reference
@@ -23,6 +24,15 @@ using CPUSummary: num_cores

export batch, @batch, disable_polyester_threads

const SUPPORTED_REDUCE_OPS = (:+, :*, :min, :max, :&, :|)
initializer(::typeof(+), ::T) where {T} = zero(T)
initializer(::typeof(+), ::Bool) = zero(Int)
initializer(::typeof(*), ::T) where {T} = one(T)
initializer(::typeof(min), ::T) where {T} = typemax(T)
initializer(::typeof(max), ::T) where {T} = typemin(T)
initializer(::typeof(&), ::Bool) = true
initializer(::typeof(|), ::Bool) = false

include("batch.jl")
include("closure.jl")

@@ -38,10 +48,4 @@ function reset_threads!()
    foreach(ThreadingUtilities.checktask, eachindex(ThreadingUtilities.TASKS))
    return nothing
end

# y = rand(1)
# x = rand(1)
# @batch for i ∈ eachindex(y,x)
# y[i] = sin(x[i])
# end
end