Identifying Goroutine Performance Bottlenecks – Getting Help

I parse very large JSON files (20-30GB) line by line with the bufio package, extract the values and do some math on them. Profiling with pprof showed great results and I'm happy with it; there is nothing complicated about it in the end.

I have an 8-core (16-thread) workstation, so the next logical step was to process 8 files in parallel with goroutines, which is what I did. At first glance, I'd expect around an 8x speedup, but it's barely 4x. I asked myself what could be the reason for this. Perhaps the SSD's disk I/O is the limit in combination with the bufio package (?!). So I did a dummy test: I copied a large file over the network (about 120MB/s of load) and reran the parsing at the same time. That should have made the speed worse, but it didn't; I got roughly the same 4x speedup. So how can I identify what the actual bottleneck is? Why is the speedup only 4x when it should be closer to 8x? Any suggestions are appreciated.
