Fix the docs (#34)
luraess authored Aug 15, 2024
2 parents ccb46ef + b9542d1 commit 8ba91ea
Showing 6 changed files with 36 additions and 19 deletions.
2 changes: 2 additions & 0 deletions docs/src/concepts/architectures.md
@@ -11,11 +11,13 @@ arch = Arch(CPU())

```julia
using CUDA

arch = Arch(CUDABackend())
```

```julia
using AMDGPU

arch = Arch(ROCBackend())
```

3 changes: 3 additions & 0 deletions docs/src/concepts/grids.md
@@ -95,7 +95,10 @@ Using the method `connectivity(::SG{N,T,C}, ::Dim{D}, ::Side{S})`, one can obtain

```julia-repl
julia> @assert connectivity(grid, Dim(1), Side(1)) isa Bounded "Left boundary is bounded"
julia> @assert connectivity(grid, Dim(1), Side(2)) isa Bounded "Right boundary is bounded"
julia> @assert connectivity(grid, Dim(2), Side(1)) isa Bounded "Lower boundary is bounded"
julia> @assert connectivity(grid, Dim(2), Side(2)) isa Bounded "Upper boundary is bounded"
```
32 changes: 15 additions & 17 deletions docs/src/concepts/kernels.md
@@ -1,35 +1,35 @@
# Kernels

The [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) package provides a macro-based dialect that hides the intricacies of vendor-specific GPU programming. It allows one to write hardware-agnostic kernels that can be instantiated and launched for different device backends without modifying the high-level code or sacrificing performance.

In the following, we show how to write and launch kernels on various backends. We also explain the concept of a `Launcher` in [Chmy.jl](https://github.com/PTsolvers/Chmy.jl), which complements the default kernel launching, allowing us to hide the latency of boundary-condition handling and MPI communication behind the bulk of the computations.

## Writing Kernels

This section highlights some important features of [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) that are essential for understanding the high-level abstraction of the kernel concept used throughout our package. As this section serves mainly illustrative purposes, please refer to their [documentation](https://juliagpu.github.io/KernelAbstractions.jl/stable/) for more specific examples.

```julia
using KernelAbstractions

# Define a kernel that performs element-wise operations on A
@kernel function mul2!(A)
    # use the @index macro to obtain the global Cartesian index of the current work item
    I = @index(Global, Cartesian)
    A[I] *= 2
end
```

With the kernel `mul2!` defined using the `@kernel` macro, we can launch it on the desired backend to perform the element-wise operations on the host.

```julia
# Define array
A = ones(1024, 1024)
backend = get_backend(A) # CPU

# Launch kernel and explicitly synchronize
kernel = mul2!(backend)
kernel(A, ndrange=size(A))
KernelAbstractions.synchronize(backend)

# Result assertion
@assert all(A .== 2.0)
```
@@ -41,7 +41,7 @@ To launch the kernel on GPU devices, one could simply define `A` as `CuArray`, `

Thread indexing is essential for efficient memory access on GPU devices; however, it can quickly become cumbersome to figure out the thread index, especially when working with multi-dimensional grids of multi-dimensional blocks of threads. The performance of kernels can also depend significantly on access patterns.

In the example above, we saw the usage of `I = @index(Global, Cartesian)`, which retrieves the global index of threads for the two-dimensional array `A`. Such powerful macros are provided by [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl) for conveniently retrieving the desired index of threads.

The following table is non-exhaustive and provides a reference of commonly used terminology. Here, [`KernelAbstractions.@index`](https://juliagpu.github.io/KernelAbstractions.jl/stable/api/#KernelAbstractions.@index) is used for index retrieval, and [`KernelAbstractions.@groupsize`](https://juliagpu.github.io/KernelAbstractions.jl/stable/api/#KernelAbstractions.@groupsize) is used for obtaining the dimensions of blocks of threads.

@@ -52,12 +52,10 @@ The following table is non-exhaustive and provides a reference of commonly used
| `@index(Local, Cartesian)[3]` | | `threadIdx().z` | `workitemIdx().z` |
| `@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` | `workgroupIdx().x` |
| `@index(Group, Cartesian)[2]` | | `blockIdx().y` | `workgroupIdx().y` |
| `prod(@groupsize())` | `g` | `.x * .y * .z` | `.x * .y * .z` |
| `@groupsize()[3]` | | `blockDim().z` | `workgroupDim().z` |
| `@index(Global, Linear)` | `i` | global index computation needed | global index computation needed |
| `@index(Global, Cartesian)[2]` | | global index computation needed | global index computation needed |
| `@index(Global, NTuple)` | | global index computation needed | global index computation needed |

The `@index(Global, NTuple)` macro returns an `NTuple` object, allowing more fine-grained memory control over the allocated arrays.
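As a small plain-Julia sketch of why tuple indices are convenient (the array and index values here are ours, chosen for illustration), an `NTuple` can be splatted directly into an array access and offset element-wise via broadcasting:

```julia
A = collect(reshape(1:12, 3, 4)) # a 3×4 matrix
I = (2, 3)                       # a tuple index, as @index(Global, NTuple) would return
@assert A[I...] == A[2, 3]       # splatting expands the tuple into subscripts
J = I .+ (1, 1)                  # element-wise offset via broadcasting
@assert J == (3, 4)
```

This is the mechanism that lets kernels stay dimension-agnostic: the same `A[I...]` works for 1D, 2D, or 3D arrays.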

@@ -85,8 +83,8 @@ Next, we define a kernel launcher associated with a `UniformGrid` object

```julia
# Define backend and geometry
arch   = Arch(CUDABackend())
grid   = UniformGrid(arch; origin=(-1, -1), extent=(2, 2), dims=(126, 126))

# Define launcher
launch = Launcher(arch, grid)
3 changes: 3 additions & 0 deletions docs/src/developer_documentation/running_tests.md
@@ -5,6 +5,7 @@
To run the Chmy test suite on the CPU, simply run `test` from within the package mode or using `Pkg`:
```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy")
```

@@ -16,12 +17,14 @@ To run the Chmy test suite on CUDA or ROC backend (Nvidia or AMD GPUs), respectively

```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy"; test_args=["--backend=CUDA"])
```

### For ROC backend (AMD GPUs):

```julia-repl
julia> using Pkg
julia> Pkg.test("Chmy"; test_args=["--backend=AMDGPU"])
```
12 changes: 11 additions & 1 deletion docs/src/getting_started.md
@@ -79,14 +79,22 @@ end
```

!!! note "Non-Cartesian indices"
    Besides using `Cartesian` indices, more standard "subscript" indexing works as well, using `NTuple`. For example, `update_C!` will become:
    ```julia
    @kernel inbounds = true function update_C!(C, q, Δt, g::StructuredGrid, O)
        ix, iy = @index(Global, NTuple)
        (ix, iy) = (ix, iy) .+ O
        C[ix, iy] -= Δt * divg(q, g, ix, iy)
    end
    ```
    where the dimensions could be abstracted by splatting the returned index (`I...`):
    ```julia
    @kernel inbounds = true function update_C!(C, q, Δt, g::StructuredGrid, O)
        I = @index(Global, NTuple)
        I = I .+ O
        C[I...] -= Δt * divg(q, g, I...)
    end
    ```
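To exercise the splatted-index update pattern outside a kernel (everything here is a hedged stand-in: `divg_stub` is a made-up placeholder, not Chmy's `divg`, and the numbers are arbitrary), the same statement can be run on a plain array:

```julia
# Hypothetical stand-in for the divergence operator, used only to
# demonstrate the splatted-index update pattern.
divg_stub(q, ix, iy) = q[ix, iy]

C  = ones(4, 4)
q  = fill(2.0, 4, 4)
Δt = 0.5
O  = (1, 1)               # offset, as a launcher might supply
I  = (1, 2) .+ O          # shift the tuple index element-wise: (2, 3)
C[I...] -= Δt * divg_stub(q, I...)
@assert C[2, 3] == 0.0    # 1.0 - 0.5 * 2.0
```

Only the offset and splatting mechanics are real here; in Chmy the offset `O` and the divergence operator come from the launcher and the grid.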

## Model Setup

@@ -96,8 +104,10 @@ The diffusion model that we solve should contain the following model setup
# geometry
grid = UniformGrid(arch; origin=(-1, -1), extent=(2, 2), dims=(126, 126))
launch = Launcher(arch, grid)

# physics
χ = 1.0

# numerics
Δt = minimum(spacing(grid))^2 / χ / ndims(grid) / 2.1
```
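For intuition, the stability bound on `Δt` can be reproduced with plain numbers, assuming (as a sketch, not Chmy's actual API) that `spacing(grid)` of a uniform grid equals `extent ./ dims`:

```julia
# Hypothetical standalone computation of the diffusive time step bound
extent = (2.0, 2.0)
dims   = (126, 126)
χ      = 1.0
h      = extent ./ dims                        # uniform cell sizes per dimension
Δt = minimum(h)^2 / χ / length(dims) / 2.1     # explicit diffusion stability limit
@assert Δt ≈ (2.0 / 126)^2 / (2 * 2.1)
```

The `2.1` is a safety factor slightly above the theoretical limit of `2 * ndims`; halving the grid spacing quarters the admissible time step.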
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -8,6 +8,7 @@ To install Chmy.jl, one can simply add it using the Julia package manager:

```julia-repl
julia> using Pkg
julia> Pkg.add("Chmy")
```

@@ -34,7 +35,7 @@ A general list of the features is:
- Multi-dimensional, parameterizable discrete and continuous fields on structured grids
- High-level interface for specifying boundary conditions with automatic batching for performance
- Finite difference and interpolation operators on discrete fields
- Extensibility: the package is written in pure Julia, so adding new functions, simplification rules, and model transformations has no barrier

## Funding

Expand Down
