update README
MasonProtter committed Jan 28, 2024 · 1 parent 9b30e55 · commit be84771
…multithreaded calculations via higher-order functions, with a focus on data parallelism.
It re-exports the very useful function `chunks` from [ChunkSplitters.jl](https://github.com/m3g/ChunkSplitters.jl), and
provides the following functions:

___________________
<details><summary>tmapreduce</summary>
<p>

```julia
tmapreduce(f, op, A::AbstractArray;
           [init],
           nchunks::Int = 2 * nthreads(),
           split::Symbol = :batch,
           schedule::Symbol = :dynamic,
           outputtype::Type = Any)
```

This data is divided into chunks to be worked on in parallel using [ChunkSplitters.jl](https://github.com/m3g/ChunkSplitters.jl).
- `outputtype::Type` (default `Any`) will work as the asserted output type of parallel calculations. This is typically only
needed if you are using a `:static` schedule, since the `:dynamic` schedule uses [StableTasks.jl](https://github.com/MasonProtter/StableTasks.jl). If you experience problems with type stability, you may be able to recover it with the `outputtype` keyword argument.
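For instance, a minimal usage sketch (the module name `OhMyThreads` in the `using` line is an assumption; load the package however it is installed):

```julia
using OhMyThreads  # module name assumed

# Map x -> x^2 over the data and combine the results with `+` across
# parallel tasks; `+` is associative, as tmapreduce requires.
parallel = tmapreduce(x -> x^2, +, 1:1000)

parallel == sum(x -> x^2, 1:1000)  # matches the serial reduction
```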

___________________
</p>
</details>

<details><summary>treducemap</summary>
<p>

```julia
treducemap(op, f, A::AbstractArray;
           [init],
           nchunks::Int = 2 * nthreads(),
           split::Symbol = :batch,
           schedule::Symbol = :dynamic,
           outputtype::Type = Any)
```

This data is divided into chunks to be worked on in parallel using [ChunkSplitters.jl](https://github.com/m3g/ChunkSplitters.jl).
- `outputtype::Type` (default `Any`) will work as the asserted output type of parallel calculations. This is typically only
needed if you are using a `:static` schedule, since the `:dynamic` schedule uses [StableTasks.jl](https://github.com/MasonProtter/StableTasks.jl). If you experience problems with type stability, you may be able to recover it with the `outputtype` keyword argument.
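As a small sketch, `treducemap` takes the same arguments as `tmapreduce` with the operator and the mapping function swapped (module name `OhMyThreads` assumed):

```julia
using OhMyThreads  # module name assumed

# `+` combines the results of applying x -> 2x across parallel tasks.
parallel = treducemap(+, x -> 2x, 1:100)

parallel == sum(x -> 2x, 1:100)  # matches the serial result
```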

_____________________________________
</p>
</details>

<details><summary>treduce</summary>
<p>


```julia
treduce(op, A::AbstractArray; [init],
        nchunks::Int = 2 * nthreads(),
        split::Symbol = :batch,
        schedule::Symbol = :dynamic,
        outputtype::Type = Any)
```

This data is divided into chunks to be worked on in parallel using [ChunkSplitters.jl](https://github.com/m3g/ChunkSplitters.jl).
- `outputtype::Type` (default `Any`) will work as the asserted output type of parallel calculations. This is typically only
needed if you are using a `:static` schedule, since the `:dynamic` schedule uses [StableTasks.jl](https://github.com/MasonProtter/StableTasks.jl). If you experience problems with type stability, you may be able to recover it with the `outputtype` keyword argument.
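A minimal sketch of a plain parallel reduction (module name `OhMyThreads` assumed):

```julia
using OhMyThreads  # module name assumed

# No mapping step here; `+` must be associative for a correct parallel result.
total = treduce(+, 1:1000)

total == 500500  # same as sum(1:1000)
```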

_______________________________________________
</p>
</details>

<details><summary>tforeach</summary>
<p>

```julia
tforeach(f, A::AbstractArray;
         nchunks::Int = 2 * nthreads(),
         split::Symbol = :batch,
         schedule::Symbol = :dynamic)
```

Apply `f` to each element of `A` on multiple parallel tasks, and return `nothing`.
- `schedule::Symbol` either `:dynamic` or `:static` (default `:dynamic`), determines how the parallel portions of the calculation are scheduled. `:dynamic` scheduling is generally preferred since it is more flexible and better at load balancing, but `:static` scheduling can sometimes be more performant when the time it takes to complete a step of the calculation is highly uniform, and no other parallel functions are running at the same time.
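A usage sketch: `tforeach` is for side effects rather than return values (module name `OhMyThreads` assumed). Writing to distinct indices of a preallocated array from different tasks is safe:

```julia
using OhMyThreads  # module name assumed

out = zeros(Int, 100)
tforeach(eachindex(out)) do i
    out[i] = i^2  # each task writes only to its own indices
end

out[10] == 100  # slots were filled in parallel
```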


__________________________
</p>
</details>

<details><summary>tmap</summary>
<p>


```julia
tmap(f, ::Type{OutputType}, A::AbstractArray;
     nchunks::Int = 2 * nthreads(),
     split::Symbol = :batch,
     schedule::Symbol = :dynamic)
```

A multithreaded function like `Base.map`. Create a new container `similar` to `A` with element type `OutputType`, whose `i`th element is equal to `f(A[i])`.
- `split::Symbol` (default `:batch`) is passed to `ChunkSplitters.chunks` to inform it if the data chunks to be worked on should be contiguous (`:batch`) or shuffled (`:scatter`). If `:scatter` is chosen, then your reducing operator `op` **must** be [commutative](https://en.wikipedia.org/wiki/Commutative_property) in addition to being associative, or you could get incorrect results!
- `schedule::Symbol` either `:dynamic` or `:static` (default `:dynamic`), determines how the parallel portions of the calculation are scheduled. `:dynamic` scheduling is generally preferred since it is more flexible and better at load balancing, but `:static` scheduling can sometimes be more performant when the time it takes to complete a step of the calculation is highly uniform, and no other parallel functions are running at the same time.
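For instance, a minimal sketch (module name `OhMyThreads` assumed):

```julia
using OhMyThreads  # module name assumed

# The OutputType argument fixes the element type of the returned container.
doubled = tmap(x -> 2x, Int, 1:5)

doubled == [2, 4, 6, 8, 10]
```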

______________________________

</p>
</details>

<details><summary>tmap!</summary>
<p>

```julia
tmap!(f, out, A::AbstractArray;
      nchunks::Int = 2 * nthreads(),
      split::Symbol = :batch,
      schedule::Symbol = :dynamic)
```

A multithreaded function like `Base.map!`. In parallel on multiple tasks, this function assigns `out[i] = f(A[i])` for each index `i` of `A` and `out`.

## Keyword arguments:

- `nchunks::Int` (default 2 * nthreads()) is passed to `ChunkSplitters.chunks` to inform it how many pieces of data should be worked on in parallel. Greater `nchunks` typically helps with [load balancing](https://en.wikipedia.org/wiki/Load_balancing_(computing)), but at the expense of creating more overhead.
- `split::Symbol` (default `:batch`) is passed to `ChunkSplitters.chunks` to inform it if the data chunks to be worked on should be contiguous (`:batch`) or shuffled (`:scatter`). If `:scatter` is chosen, then your reducing operator `op` **must** be [commutative](https://en.wikipedia.org/wiki/Commutative_property) in addition to being associative, or you could get incorrect results!
- `schedule::Symbol` either `:dynamic` or `:static` (default `:dynamic`), determines how the parallel portions of the calculation are scheduled. `:dynamic` scheduling is generally preferred since it is more flexible and better at load balancing, but `:static` scheduling can sometimes be more performant when the time it takes to complete a step of the calculation is highly uniform, and no other parallel functions are running at the same time.
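A minimal in-place sketch (module name `OhMyThreads` assumed):

```julia
using OhMyThreads  # module name assumed

# Results are written into the preallocated `out` rather than a new array.
A = collect(1:10)
out = similar(A)
tmap!(x -> x + 1, out, A)

out == collect(2:11)  # each out[i] == A[i] + 1
```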

</p>
</details>
