Fix the docs (#34)
luraess authored Aug 15, 2024
2 parents ccb46ef + b9542d1 commit 8ba91ea
Showing 6 changed files with 36 additions and 19 deletions.
2 changes: 2 additions & 0 deletions docs/src/concepts/architectures.md
@@ -11,11 +11,13 @@ arch = Arch(CPU())

```julia
using CUDA

arch = Arch(CUDABackend())
```

```julia
using AMDGPU

arch = Arch(ROCBackend())
```

3 changes: 3 additions & 0 deletions docs/src/concepts/grids.md
@@ -95,7 +95,10 @@ Using the method `connectivity(::SG{N,T,C}, ::Dim{D}, ::Side{S})`, one can obtain

```julia-repl
julia> @assert connectivity(grid, Dim(1), Side(1)) isa Bounded "Left boundary is bounded"
julia> @assert connectivity(grid, Dim(1), Side(2)) isa Bounded "Right boundary is bounded"
julia> @assert connectivity(grid, Dim(2), Side(1)) isa Bounded "Lower boundary is bounded"
julia> @assert connectivity(grid, Dim(2), Side(2)) isa Bounded "Upper boundary is bounded"
```
32 changes: 15 additions & 17 deletions docs/src/concepts/kernels.md
@@ -1,35 +1,35 @@
# Kernels

The [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) package provides a macro-based dialect that hides the intricacies of vendor-specific GPU programming. It allows one to write hardware-agnostic kernels that can be instantiated and launched for different device backends without modifying the high-level code or sacrificing performance.

In the following, we show how to write and launch kernels on various backends. We also explain the concept of a `Launcher` in [Chmy.jl](https://github.com/PTsolvers/Chmy.jl), which complements the default kernel launching, allowing us to hide the latency of boundary-condition handling and MPI communication behind the bulk of the computations.

## Writing Kernels

This section highlights some important features of [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) that are essential for understanding the high-level abstraction of the kernel concept used throughout our package. As this section serves mainly illustrative purposes, please refer to their [documentation](https://juliagpu.github.io/KernelAbstractions.jl/stable/) for more specific examples.

```julia
using KernelAbstractions

# Define a kernel that performs element-wise operations on A
@kernel function mul2!(A)
    # use the @index macro to obtain the global Cartesian index of the current work item
    I = @index(Global, Cartesian)
    A[I] *= 2
end
```

With the kernel `mul2!` defined using the `@kernel` macro, we can launch it on the desired backend to perform the element-wise operations on the host.

```julia
# Define array
A = ones(1024, 1024)
backend = get_backend(A) # CPU

# Launch kernel and explicitly synchronize
kernel = mul2!(backend)
kernel(A, ndrange=size(A))
KernelAbstractions.synchronize(backend)

# Result assertion
@assert all(A .== 2.0)
```
@@ -41,7 +41,7 @@ To launch the kernel on GPU devices, one could simply define `A` as `CuArray`, `

Thread indexing is essential for efficient memory access on GPU devices; however, it can quickly become cumbersome to figure out the thread index, especially when working with multi-dimensional grids of multi-dimensional blocks of threads. The performance of kernels can also depend significantly on access patterns.

In the example above, we saw the usage of `I = @index(Global, Cartesian)`, which retrieves the global index of threads for the two-dimensional array `A`. Such powerful macros are provided by [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) for conveniently retrieving the desired index of threads.

The following table is non-exhaustive and provides a reference of commonly used terminology. Here, [`KernelAbstractions.@index`](https://juliagpu.github.io/KernelAbstractions.jl/stable/api/#KernelAbstractions.@index) is used for index retrieval, and [`KernelAbstractions.@groupsize`](https://juliagpu.github.io/KernelAbstractions.jl/stable/api/#KernelAbstractions.@groupsize) is used for obtaining the dimensions of blocks of threads.

@@ -52,12 +52,10 @@ The following table is non-exhaustive and provides a reference of commonly used
| `@index(Local, Cartesian)[3]` | | `threadIdx().z` | `workitemIdx().z` |
| `@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` | `workgroupIdx().x` |
| `@index(Group, Cartesian)[2]` | | `blockIdx().y` | `workgroupIdx().y` |
| `prod(@groupsize())` | `g` | `.x * .y * .z` | `.x * .y * .z` |
| `@groupsize()[3]` | | `blockDim().z` | `workgroupDim().z` |
| `@index(Global, Linear)` | `i` | global index computation needed | global index computation needed |
| `@index(Global, Cartesian)[2]` | | global index computation needed | global index computation needed |
| `@index(Global, NTuple)` | | global index computation needed | global index computation needed |

The `@index(Global, NTuple)` macro returns an `NTuple` object, allowing more fine-grained memory control over the allocated arrays.
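As a small plain-Julia sketch of why tuple indices are convenient (the array and index values here are ours, chosen for illustration), an `NTuple` can be splatted directly into an array access and offset element-wise via broadcasting:

```julia
A = collect(reshape(1:12, 3, 4)) # a 3×4 matrix
I = (2, 3)                       # a tuple index, as @index(Global, NTuple) would return
@assert A[I...] == A[2, 3]       # splatting expands the tuple into subscripts
J = I .+ (1, 1)                  # element-wise offset via broadcasting
@assert J == (3, 4)
```

This is the mechanism that lets kernels stay dimension-agnostic: the same `A[I...]` works for 1D, 2D, or 3D arrays.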

@@ -85,8 +83,8 @@ Next, we define a kernel launcher associated with a `UniformGrid` object

```julia
# Define backend and geometry
arch   = Arch(CUDABackend())
grid   = UniformGrid(arch; origin=(-1, -1), extent=(2, 2), dims=(126, 126))

# Define launcher
launch = Launcher(arch, grid)
3 changes: 3 additions & 0 deletions docs/src/developer_documentation/running_tests.md
@@ -5,6 +5,7 @@
To run the Chmy test suite on the CPU, simply run `test` from within the package mode or using `Pkg`:
```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy")
```

@@ -16,12 +17,14 @@ To run the Chmy test suite on CUDA or ROC backend (Nvidia or AMD GPUs), respectively

```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy"; test_args=["--backend=CUDA"])
```

### For ROC backend (AMD GPUs):

```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy"; test_args=["--backend=AMDGPU"])
```
12 changes: 11 additions & 1 deletion docs/src/getting_started.md
@@ -79,14 +79,22 @@ end
```

!!! note "Non-Cartesian indices"
    Besides using `Cartesian` indices, more standard "subscript" indexing works as well, using `NTuple`. For example, `update_C!` will become:
    ```julia
    @kernel inbounds = true function update_C!(C, q, Δt, g::StructuredGrid, O)
        ix, iy = @index(Global, NTuple)
        (ix, iy) = (ix, iy) .+ O
        C[ix, iy] -= Δt * divg(q, g, ix, iy)
    end
    ```
    where the dimensions could be abstracted by splatting the returned index (`I...`):
    ```julia
    @kernel inbounds = true function update_C!(C, q, Δt, g::StructuredGrid, O)
        I = @index(Global, NTuple)
        I = I .+ O
        C[I...] -= Δt * divg(q, g, I...)
    end
    ```
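To exercise the splatted-index update pattern outside a kernel (everything here is a hedged stand-in: `divg_stub` is a made-up placeholder, not Chmy's `divg`, and the numbers are arbitrary), the same statement can be run on a plain array:

```julia
# Hypothetical stand-in for the divergence operator, used only to
# demonstrate the splatted-index update pattern.
divg_stub(q, ix, iy) = q[ix, iy]

C  = ones(4, 4)
q  = fill(2.0, 4, 4)
Δt = 0.5
O  = (1, 1)               # offset, as a launcher might supply
I  = (1, 2) .+ O          # shift the tuple index element-wise: (2, 3)
C[I...] -= Δt * divg_stub(q, I...)
@assert C[2, 3] == 0.0    # 1.0 - 0.5 * 2.0
```

Only the offset and splatting mechanics are real here; in Chmy the offset `O` and the divergence operator come from the launcher and the grid.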

## Model Setup

@@ -96,8 +104,10 @@ The diffusion model that we solve should contain the following model setup
# geometry
grid = UniformGrid(arch; origin=(-1, -1), extent=(2, 2), dims=(126, 126))
launch = Launcher(arch, grid)

# physics
χ = 1.0

# numerics
Δt = minimum(spacing(grid))^2 / χ / ndims(grid) / 2.1
```
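For intuition, the stability bound on `Δt` can be reproduced with plain numbers, assuming (as a sketch, not Chmy's actual API) that `spacing(grid)` of a uniform grid equals `extent ./ dims`:

```julia
# Hypothetical standalone computation of the diffusive time step bound
extent = (2.0, 2.0)
dims   = (126, 126)
χ      = 1.0
h      = extent ./ dims                        # uniform cell sizes per dimension
Δt = minimum(h)^2 / χ / length(dims) / 2.1     # explicit diffusion stability limit
@assert Δt ≈ (2.0 / 126)^2 / (2 * 2.1)
```

The `2.1` is a safety factor slightly above the theoretical limit of `2 * ndims`; halving the grid spacing quarters the admissible time step.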
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -8,6 +8,7 @@ To install Chmy.jl, one can simply add it using the Julia package manager:

```julia-repl
julia> using Pkg
julia> Pkg.add("Chmy")
```

@@ -34,7 +35,7 @@ A general list of the features is:
- Multi-dimensional, parameterizable discrete and continuous fields on structured grids
- High-level interface for specifying boundary conditions with automatic batching for performance
- Finite difference and interpolation operators on discrete fields
- Extensibility: the package is written in pure Julia, so adding new functions, simplification rules, and model transformations has no barrier

## Funding

Expand Down
