Databases with Automatic Rebalance Benchmark (TiDB vs YugabyteDB vs CockroachDB)
Automatic rebalance/repair/self-healing means we can remove or add a node and the database will redistribute the data and rebalance itself, with data replicated to more than one node. My previous benchmark didn't really cover this awesome feature (no more cutover downtime to kill the master instance, promote a slave to master, then switch every client to connect to the new master, if not using a proxy).
Some databases that I found that support this feature:
Reproducibility
The repository is here: https://github.com/kokizzu/hugedbbench, in the 2021 folder. We're going to test local single-server (if possible) and multi-server deployments using Docker. Why Docker? Because I don't want to ruin my computer/server with trash files they create in system directories (if any). Some databases are not included because they don't support SQL or require a license key to start. Why benchmark only 2 columns? Because it fits my projects' most common use case, where there is 1 PK (bigint or string) and 1 unique key (mostly string), and the rest are mostly indexed or non-indexed columns. Why even do this? I just want to select the best thing for my next side project's tech stack (and because the companies I've worked with seem to love moving database servers around a lot).
The specs of the server used in this benchmark: 32-core CPU, 128GB RAM, 500GB NVMe disk.
CockroachDB
CockroachDB is part of the NewSQL movement and supports PostgreSQL syntax. To deploy a single node we can use docker compose. The cluster monitoring UI on port 8080 is quite OK :3 better than nothing.
Here’s the result for 100 inserts x 1000 goroutines:
CockroachDB InsertOne 10.034616078s
CockroachDB Count 42.326487ms
CockroachDB UpdateOne 12.804722812s
CockroachDB Count 78.221432ms
CockroachDB SelectOne 2.281355728s
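The "100 inserts x 1000 goroutines" pattern behind these timings can be sketched roughly as below. This is a minimal sketch, not the code from the repository: `runConcurrent` and the stand-in `insertOne` are hypothetical names, and the real benchmark would run an SQL statement against the database where the stand-in only counts calls.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runConcurrent starts `workers` goroutines; each one calls op `perWorker`
// times with an id that is unique across all workers. It returns the total
// wall-clock time and the number of calls that returned an error.
// (Hypothetical helper, not taken from the hugedbbench repository.)
func runConcurrent(workers, perWorker int, op func(id int64) error) (time.Duration, int64) {
	var wg sync.WaitGroup
	var failed int64
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				id := int64(w*perWorker + i + 1) // unique per record
				if err := op(id); err != nil {
					atomic.AddInt64(&failed, 1)
				}
			}
		}(w)
	}
	wg.Wait()
	return time.Since(start), atomic.LoadInt64(&failed)
}

func main() {
	// Stand-in for a real INSERT; it only counts how often it was called.
	var calls int64
	insertOne := func(id int64) error {
		atomic.AddInt64(&calls, 1)
		return nil
	}
	elapsed, failed := runConcurrent(1000, 100, insertOne)
	fmt.Println("InsertOne", elapsed, "calls:", calls, "failed:", failed)
}
```

Swapping `insertOne` for an update or select function would give the other phases; the duration printed is the total for all 100,000 records, which is how the figures above should be read.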
TiDB
TiDB is part of the NewSQL movement and supports MySQL syntax. The recommended way to deploy it is the tiup command, but we're going to use Docker so it's fair to the other database products. The official Docker setup uses 3 placement drivers and 3 KV servers, so I tried that first. The cluster monitor is on port 10080, but that port is blocked by Chrome, so I moved it to 10081. It's very plaintexty compared to the other products.
# reducing to single server mode (1 pd, 1 kv, 1 db), first run:
TiDB InsertOne 3.216365486s
TiDB UpdateOne 3.913131711s
TiDB SelectOne 1.991229179s
YugabyteDB
YugabyteDB is part of the NewSQL movement and supports PostgreSQL syntax. To deploy a single node we can use docker compose too. The cluster monitor on port 7000 is quite OK. The tmp directory is mounted because otherwise it gets stuck on the second start unless the temporary files are deleted manually. limits.conf was applied.
YugaByteDB Count 159.357304ms
YugaByteDB Count 214.389496ms
YugaByteDB SelectOne 2.778803557s
YugaByteDB Total 33.834838111s
YugaByteDB InsertOne 38.614091068s
YugaByteDB Count 76.615212ms
YugaByteDB UpdateOne 56.796680169s
YugaByteDB Count 84.35411ms
YugaByteDB SelectOne 3.14747611s
Here’s the recap of the 100 records x 1000 goroutines insert/update/select durations, single instance only:
So, at best, it takes on average roughly 29 μs to insert, 39 μs to update, and 19 μs to select one record.
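The per-record averages are simply the total phase duration divided by 100 x 1000 = 100,000 records. As a small worked example, the sketch below applies that division to the single-server TiDB first-run totals quoted earlier; `perRecord` is a hypothetical helper name. The update and select results line up with the 39 μs and 19 μs figures, while the insert figure from this particular run comes out a little above 29 μs, so that number presumably comes from a different run in the recap.

```go
package main

import (
	"fmt"
	"time"
)

// totalRecords is 100 records per goroutine times 1000 goroutines.
const totalRecords = 100 * 1000

// perRecord divides a total phase duration by the number of records
// (integer division on nanoseconds). Hypothetical helper for illustration.
func perRecord(total time.Duration, records int) time.Duration {
	return total / time.Duration(records)
}

func main() {
	// Totals copied from the single-server TiDB first run above.
	totals := []struct {
		phase string
		raw   string
	}{
		{"InsertOne", "3.216365486s"},
		{"UpdateOne", "3.913131711s"},
		{"SelectOne", "1.991229179s"},
	}
	for _, t := range totals {
		d, err := time.ParseDuration(t.raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(t.phase, "per record:", perRecord(d, totalRecords))
	}
}
```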
Comparing only multi (RF=2+):
So, at best, it takes on average roughly 31 μs to insert, 41 μs to update, and 21 μs to select one record.
Comparing only multi-instance setups with a replication factor that gives true HA:
It seems TiDB has the most balanced performance, at the expense of needing pre-allocated disk space, while CockroachDB has the worst performance on the multi-instance update task and YugabyteDB has the worst performance on the multi-instance insert task.
What happens if we run the benchmark once more, remove one storage node (docker stop), then redo the benchmark (only for RF=2+)?
The YugabyteDB test didn't even enter the insert stage after 5 minutes ‘__’), maybe because truncate is slow? So I changed the benchmark scenario, only for YugabyteDB, so that 1 node is killed 2 seconds into the insertion phase, but YugabyteDB still gave an error, “ERROR: Timed out: Write RPC (request call id 3873) to 172.21.0.5:9100 timed out after 60.000s (SQLSTATE XX000)”, and could not complete. EDIT: YugabyteDB staff on Slack suggested using RF=3 so that it would still survive when one node dies.
TiDB seems to be the winner also for the case when a node dies, at the expense of needing 7 initial nodes (1 tidb [should be at least 2 for HA], 3 tipd, 3 tikv, but it can probably be squeezed to 1 tidb, 1 tipd, 2 tikv, since apparently the default replication factor is 3), whereas CockroachDB only needs 3 and YugabyteDB needs 4 (1 ybmaster, 3 ybserver). Not sure though what would happen if 1 tidb/ybmaster instance died. The recap spreadsheet is here.
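The RF=3 suggestion follows from majority-quorum arithmetic. Assuming writes must be acknowledged by a majority of replicas (the usual rule for Raft-style replication, and an assumption here, not something measured in this benchmark), the sketch below shows that RF=2 tolerates no lost replica while RF=3 tolerates one. The function names are made up for illustration.

```go
package main

import "fmt"

// quorum returns how many replicas form a majority for a given
// replication factor (assumes majority-based replication).
func quorum(rf int) int {
	return rf/2 + 1
}

// tolerated returns how many replicas can be lost while a majority
// of the original replicas is still available.
func tolerated(rf int) int {
	return rf - quorum(rf)
}

func main() {
	for _, rf := range []int{1, 2, 3, 5} {
		fmt.Printf("RF=%d quorum=%d tolerated failures=%d\n",
			rf, quorum(rf), tolerated(rf))
	}
}
```

Under that assumption, a 2-replica setup needs both replicas for every write, which would be consistent with the write timeout seen after stopping one node.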
So this is probably the reason lots of companies are moving to TiDB :3 myahaha! Next time we’re going to test how simple it is to add and remove nodes (and do so securely, ideally so that only a limited set of servers can join without having to set up a firewall/DMZ to keep out unprivileged servers), then re-benchmark with more complex common use cases (like UPSERT, range queries, WHERE-IN, JOIN, and secondary indexes). If automatic rebalance is not a requirement, I would still use Tarantool (since 2020.09) and Clickhouse (since 2021.04), but now I have found one more favorite automatic-rebalance database besides Aerospike (since 2016.11): TiDB.
Btw, do not comment on this blog (there are too many spammy comments and no notification when a new comment is added); just use a GitHub issue or Reddit instead.
UPDATE: I redid the benchmark for all databases after updating limits.conf. TiDB improved by a lot, while CockroachDB remained the same except for the update benchmark.
Originally published at http://kokizzu.blogspot.com.