An experimental goroutine pool implemented using a lock-free stack
By limiting concurrency to a fixed pool size and recycling goroutines through a stack, itogami uses far less memory than spawning unlimited goroutines while remaining nearly as fast.
Benchmarks supporting these claims are given below.
Note: This work is experimental and should not be used in production.
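The pool's internals are not shown in this README, so the following is only a minimal sketch of the general technique named above, a lock-free (Treiber-style) stack, and not itogami's actual implementation. The names `Stack`, `node`, `Push` and `Pop` are hypothetical, and the sketch assumes `atomic.Pointer` from Go 1.19.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is one entry in the stack; next points to the entry below it.
type node[T any] struct {
	value T
	next  *node[T]
}

// Stack is a hypothetical, minimal Treiber-style lock-free stack.
type Stack[T any] struct {
	head atomic.Pointer[node[T]]
}

// Push places v on top, retrying the compare-and-swap until it succeeds.
func (s *Stack[T]) Push(v T) {
	n := &node[T]{value: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// Pop removes the top value; ok is false when the stack is empty.
func (s *Stack[T]) Pop() (v T, ok bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return v, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.value, true
		}
	}
}

func main() {
	var s Stack[int]
	var wg sync.WaitGroup
	// Push 0..99 from 100 goroutines without taking any lock.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			s.Push(v)
		}(i)
	}
	wg.Wait()

	count, sum := 0, 0
	for {
		v, ok := s.Pop()
		if !ok {
			break
		}
		count++
		sum += v
	}
	fmt.Println(count, sum) // prints "100 4950"
}
```

Every Push allocates a fresh node and the garbage collector keeps a node alive while any goroutine still holds a pointer to it, which is what lets this simple version avoid node reuse problems. A pool built on this idea might store idle workers in the stack instead of integers.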
You need Go 1.19.x or above
$ go get github.com/alphadose/itogami
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	"github.com/alphadose/itogami"
)

const runTimes uint32 = 1000

var sum uint32

func myFunc(i uint32) {
	atomic.AddUint32(&sum, i)
	fmt.Printf("run with %d\n", i)
}

func demoFunc() {
	time.Sleep(10 * time.Millisecond)
	println("Hello World")
}

func examplePool() {
	var wg sync.WaitGroup
	// Use the common pool
	pool := itogami.NewPool(10)

	syncCalculateSum := func() {
		demoFunc()
		wg.Done()
	}
	for i := uint32(0); i < runTimes; i++ {
		wg.Add(1)
		// Submit task to the pool
		pool.Submit(syncCalculateSum)
	}
	wg.Wait()
	println("finished all tasks")
}

func examplePoolWithFunc() {
	var wg sync.WaitGroup
	// Use the pool with a pre-defined function
	pool := itogami.NewPoolWithFunc(10, func(i uint32) {
		myFunc(i)
		wg.Done()
	})
	for i := uint32(0); i < runTimes; i++ {
		wg.Add(1)
		// Invoke the function with a value
		pool.Invoke(i)
	}
	wg.Wait()
	fmt.Printf("finish all tasks, result is %d\n", sum)
}

func main() {
	examplePool()
	examplePoolWithFunc()
}
Benchmarking was performed against:
- Unlimited goroutines
- Ants
- Gamma-Zero-Worker-Pool
- golang.org/x/sync/errgroup
- Bytedance GoPool
Pool size -> 50k
CPU -> M1, arm64, 8 cores, 3.2 GHz
OS -> darwin
Results were computed with benchstat over 30 cases
name                   time/op
UnlimitedGoroutines-8  331ms ± 4%
ErrGroup-8             515ms ± 9%
AntsPool-8             582ms ± 9%
GammaZeroPool-8        740ms ±13%
BytedanceGoPool-8      572ms ±18%
ItogamiPool-8          337ms ± 1%

name                   alloc/op
UnlimitedGoroutines-8  96.3MB ± 0%
ErrGroup-8             120MB ± 0%
AntsPool-8             22.4MB ± 6%
GammaZeroPool-8        18.8MB ± 1%
BytedanceGoPool-8      82.2MB ± 2%
ItogamiPool-8          25.6MB ± 2%

name                   allocs/op
UnlimitedGoroutines-8  2.00M ± 0%
ErrGroup-8             3.00M ± 0%
AntsPool-8             1.10M ± 2%
GammaZeroPool-8        1.08M ± 0%
BytedanceGoPool-8      2.59M ± 1%
ItogamiPool-8          1.08M ± 0%
The following conclusions can be drawn from the above results:
- Itogami is the fastest of the goroutine pool implementations tested and only slightly slower than unlimited goroutines (337ms vs 331ms per op)
- Itogami has the lowest allocs/op (1.08M, level with GammaZeroPool), and hence its memory usage scales well under high load
- Its memory used per operation (25.6MB) is in the same range as the other pools and drastically lower than unlimited goroutines (96.3MB)
- The tolerance (± %) for Itogami is low on all three metrics, indicating that the algorithm is quite stable overall
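To put numbers on these conclusions, the relative differences can be worked out directly from the tables. The snippet below is a small arithmetic sketch; `percentChange` is a hypothetical helper, and the figures are copied from the benchstat output above.

```go
package main

import "fmt"

// percentChange reports how much larger (positive) or smaller (negative)
// value is than baseline, as a percentage of baseline.
func percentChange(value, baseline float64) float64 {
	return (value - baseline) / baseline * 100
}

func main() {
	// ItogamiPool compared with UnlimitedGoroutines.
	fmt.Printf("time/op:   %+.1f%%\n", percentChange(337, 331))   // +1.8%
	fmt.Printf("alloc/op:  %+.1f%%\n", percentChange(25.6, 96.3)) // -73.4%
	fmt.Printf("allocs/op: %+.1f%%\n", percentChange(1.08, 2.00)) // -46.0%

	// ItogamiPool compared with the next fastest entry, ErrGroup.
	fmt.Printf("time/op vs ErrGroup: %+.1f%%\n", percentChange(337, 515)) // -34.6%
}
```

On these figures, Itogami is about 1.8% slower than unlimited goroutines while allocating roughly 73% fewer bytes and 46% fewer objects per operation.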
Benchmarking code available here
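The benchmark source is not reproduced in this README. As a rough illustration of how an "unlimited goroutines" baseline could be measured for the three metrics above, here is a minimal sketch using only the standard library. The task count `tasksPerOp`, the trivial workload and the function names are all assumptions made for the example, not the settings used for the published results.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// tasksPerOp is a hypothetical number of tasks per benchmark iteration.
const tasksPerOp = 1000

// runUnlimited starts one goroutine per task with no concurrency limit
// and returns how many tasks completed.
func runUnlimited(n int) int64 {
	var wg sync.WaitGroup
	var done atomic.Int64
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			done.Add(1) // stand-in for real work
		}()
	}
	wg.Wait()
	return done.Load()
}

// benchmarkUnlimited has the shape of a normal Go benchmark function.
func benchmarkUnlimited(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		runUnlimited(tasksPerOp)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside "go test".
	res := testing.Benchmark(benchmarkUnlimited)
	fmt.Printf("%d ns/op, %d B/op, %d allocs/op\n",
		res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
}
```

In a real comparison, each pool would get its own benchmark function of the same shape inside a `_test.go` file, and the output of repeated `go test -bench` runs would be summarized with benchstat.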