I’m running an insert test (described in depth in https://github.com/cockroachdb/cockroach/issues/38778) on a three-node system (three physical machines, two CRDB instances per machine) with hybrid storage (HDD for general storage, SSD as read cache).
What I observe is that performance deteriorates quickly as the number of committed transactions grows:
and while 90th percentile latency remains fairly consistent, 99th percentile latency degrades significantly:
Disk read bytes increase massively after a certain point:
The reads are also quite imbalanced across nodes:
You could say that I’m doing something wrong (stressing one hot range exclusively), but what’s strange is that the outsized amount of reads is concentrated on the same physical node, so I’m not sure.
/_status/hotranges doesn’t show this type of imbalance.
Maybe this starts once the RocksDB cache runs out, because for roughly the first day there were hardly any disk reads (while writes happened, of course):
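To illustrate why running out of cache would fit the latency pattern above, here is a minimal sketch in Python. It is only a toy model, not a measurement: the function names, the 2 ms cache-hit latency, the 12 ms HDD-miss latency and the 3% miss ratio are all made-up assumptions. It shows that a small fraction of requests going to disk can leave the 90th percentile unchanged while moving the 99th percentile a lot.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def simulate(miss_ratio, n=1000, hit_ms=2.0, miss_ms=12.0):
    """Return (p90, p99) latency for n requests.

    hit_ms and miss_ms are hypothetical latencies for a request served
    from cache and a request that has to read from HDD, respectively.
    """
    misses = round(n * miss_ratio)
    latencies = sorted([hit_ms] * (n - misses) + [miss_ms] * misses)
    return percentile(latencies, 90), percentile(latencies, 99)


if __name__ == "__main__":
    for ratio in (0.0, 0.03):
        p90, p99 = simulate(ratio)
        print(f"miss ratio {ratio:.0%}: p90={p90} ms, p99={p99} ms")
```

In this model, as long as fewer than 10% of requests miss the cache, p90 stays at the hit latency, while p99 jumps to the miss latency as soon as more than 1% miss. Whether that is what actually happens on my cluster is exactly what I’m unsure about.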
So I would like to know what happens here: how can I dig deeper to understand why reads increase as the number of inserted rows and the database size grow, and how can I handle this more effectively?