Huge List of Database Benchmarks (2019)

K Prayogo
Jan 5, 2022

Today we will benchmark single-node versions of several distributed databases (plus some non-distributed databases for comparison), with the clients all written in Go (using whatever driver is available). The judgement will be about performance (mostly writes, with some infrequent reads), not about distribution performance (I will take a look at that some other time). I searched DbEngines for databases that could suit the needs of my next project. For the session kv-store the obvious first choice is Aerospike, but since it cannot run inside the server that I rent (which uses OpenVZ), I'll go for the second choice, which is Redis. Here's the list of today's contenders:

  • CrateDB, highly optimized for huge amounts of data (they say), probably the best for updatable time series, and with a built-in search engine, so it quite fits my use case, probably replacing [Riot (small scale) or Manticore (large scale)] and [InfluxDB or TimescaleDB]; does not support auto increment
  • CockroachDB, a self-healing database with a PostgreSQL-compatible connector; the community edition does not support table partitioning
  • MemSQL, which can also replace a kv-store; the free version is limited to 128GB RAM. Row-store tables can only have one PRIMARY key or one UNIQUE key or one AUTO increment column, which must be the SHARD key and cannot be updated or altered. Column-store tables do not support UNIQUE/PRIMARY keys, only SHARD keys. The client/connector is MySQL-compatible
  • MariaDB (MySQL), one of the most popular open source RDBMSes, for the sake of comparison
  • PostgreSQL, my favorite RDBMS, for the sake of comparison
  • NuoDB, on another benchmark even faster than Google Spanner or CockroachDB; the community edition only supports 3 transaction engines (TE) and 1 storage manager (SM)
  • YugaByteDB, distributed KV+SQL with Cassandra- and PostgreSQL-compatible protocols. Some SQL syntax is not yet supported (ALTER USER, UNIQUE on CREATE TABLE).
  • ScyllaDB, a C++ version of Cassandra. All Cassandra-like databases have a lot of restrictions/annoyances by design compared to a traditional RDBMS (cannot CREATE INDEX on a composite PRIMARY KEY, no AUTO INCREMENT, no support for UNION ALL or the OR operator, must use the COUNTER type if you want to UPDATE x=x+n, cannot mix COUNTER and non-counter types in the same table, etc; see the counter sketch right after this list); does not support ORDER BY other than on the clustering key, and does not support OFFSET on LIMIT.
  • Clickhouse, claimed to be the fastest and one of the most storage-space-efficient OLAP databases, but it doesn't support UPDATE/DELETE syntax (it requires ALTER TABLE to UPDATE/DELETE), only supports batch inserts, and does not support UNIQUE or AUTO INCREMENT. Since it is not designed to be an OLTP database, this benchmark is obviously totally unfair to Clickhouse.
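To illustrate that counter restriction (my own sketch, not code from the benchmark; the keyspace and table name are hypothetical), an "amount = amount + n" update on a Cassandra-like database has to go through a dedicated counter table, here via the gocql driver:

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "b1" // hypothetical keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// counter columns must live in their own table,
	// mixed with nothing but primary key columns
	if err := session.Query(`CREATE TABLE IF NOT EXISTS item_amounts
		(id bigint PRIMARY KEY, amount counter)`).Exec(); err != nil {
		log.Println(err)
	}

	// UPDATE x = x + n is only allowed on the counter type
	if err := session.Query(`UPDATE item_amounts SET amount = amount + ? WHERE id = ?`,
		5, int64(42)).Exec(); err != nil {
		log.Println(err)
	}
}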

What's the extra motivation for this post?

I almost never use distributed databases, since none of my projects have more than 200 concurrent users/sec. I've encountered a bottleneck before, and the culprit was multiple slow complex queries; it could be solved by pushing them to a message queue and processing them one by one, instead of bombarding the database's process at the same time and hogging its memory.
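To illustrate that queueing approach (my own sketch, not code from any project here): a buffered channel with a single worker goroutine is enough to serialize the slow queries; runSlowQuery is a stand-in for the real database call:

package main

import (
	"fmt"
	"time"
)

// slowQuery represents one expensive query we don't want to run concurrently.
type slowQuery struct{ sql string }

func runSlowQuery(q slowQuery) {
	fmt.Println("executing:", q.sql) // stand-in for the real database call
}

func main() {
	queue := make(chan slowQuery, 1024) // buffered queue instead of a message broker

	// single worker: queries run one by one instead of hammering the database
	go func() {
		for q := range queue {
			runSlowQuery(q)
		}
	}()

	for i := 0; i < 3; i++ {
		queue <- slowQuery{sql: fmt.Sprintf("SELECT /* report %d */ ...", i)}
	}
	time.Sleep(time.Second) // give the worker time to drain (demo only)
}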

The benchmark scenario would be like this:
1. 50k inserts of a single-column string value, 200k inserts of 2-column unique values, and 900k inserts of unique relations
INSERT INTO users(id, uniq_str) -- x50k
INSERT INTO items(fk_id, typ, amount) -- x50k x4
INSERT INTO rels(fk_low, fk_high, bond) -- x900k

2. while inserting at 5%+, there would be at least 100k random search queries by unique value and 300k random search queries by foreign key; for every search query there would be 3 random updates of amount
SELECT * FROM users WHERE uniq_str = ? -- x100k
SELECT * FROM items WHERE fk_id = ? AND typ IN (?) -- x100k x3
UPDATE items SET amount = amount + xxx WHERE id = ? -- x100k x3

3. while inserting at 5%+, there would also be at least 100k random search queries
SELECT * FROM items WHERE fk_id = ?
4. while inserting at 5%+, there would also be at least 200k queries of relations, with a 50% chance to update the bond
SELECT * FROM rels WHERE fk_low = ? OR fk_high = ? -- x200k
UPDATE rels SET bond = bond + xxx WHERE id = ? -- x200k / 2
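A rough sketch of how one of these scenario steps could be issued from Go (my paraphrase, not the actual lib.go; the driver, DSN, and random helpers are assumptions, here PostgreSQL via lib/pq):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"

	_ "github.com/lib/pq" // assumed driver; MySQL-compatible targets would use go-sql-driver/mysql
)

func randomUniqStr() string { return fmt.Sprintf("u%d", rand.Intn(50000)) } // hypothetical helper
func randomItemID() int64   { return int64(rand.Intn(200000)) + 1 }         // hypothetical helper

func main() {
	db, err := sql.Open("postgres", "host=127.0.0.1 user=b1 password=b1 dbname=b1 sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// step 2: one random search by unique value...
	var id int64
	var uniq string
	if err := db.QueryRow(`SELECT id, uniq_str FROM users WHERE uniq_str = $1`,
		randomUniqStr()).Scan(&id, &uniq); err != nil {
		log.Println(err)
	}
	// ...followed by 3 random updates of amount
	for i := 0; i < 3; i++ {
		if _, err := db.Exec(`UPDATE items SET amount = amount + $1 WHERE id = $2`,
			rand.Intn(100), randomItemID()); err != nil {
			log.Println(err)
		}
	}
}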

This benchmark represents a simplified real use case of the game I'm currently developing. Let's start with PostgreSQL 10.7 (the current one on Ubuntu 18.04.1 LTS), with the configuration generated by the pgtune website:

For slow databases, all counts are divided by 20 (the SLOW FACTOR below), except the query-only ones.

[Pg] RandomSearchItems (100000, 100%) took 24.62s (246.21 µs/op)

Next is MariaDB (MySQL), with this setup:

innodb_buffer_pool_size=4G

CREATE USER 'b1'@'localhost' IDENTIFIED BY 'b1';

GRANT ALL PRIVILEGES ON b1.* TO 'b1'@'localhost';

sudo mysqltuner # not sure if this is useful
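On the Go side, connecting to that user is the usual MySQL driver dance (a sketch; the DSN just mirrors the b1 user created above, assuming go-sql-driver/mysql, which also works for MemSQL since it speaks the MySQL protocol):

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// user b1 / password b1 / database b1, as created above
	db, err := sql.Open("mysql", "b1:b1@tcp(127.0.0.1:3306)/b1")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := db.Ping(); err != nil { // sql.Open is lazy; Ping forces a real connection
		log.Fatal(err)
	}
	log.Println("connected to MariaDB")
}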

[My] RandomSearchItems (100000, 100%) took 16.62s (166.20 µs/op)
[My] SearchRelsAddBonds (10000, 100%) took 86.32s (8631.74 µs/op)
[My] UpdateItemsAmounts (5000, 100%) took 172.35s (34470.72 µs/op)
[My] InsertUsersItems (2500, 100%) took 228.52s (91408.86 µs/op)
USERS CR : 2500 / 4994
ITEMS CRU : 17500 / 14982 + 696542 / 13485
RELS CRU : 2375 / 12871 / 6435
SLOW FACTOR : 20
CRU µs/rec : 10213.28 / 23.86 / 13097.44

Next we'll try MemSQL 6.7.16-55671ba478; while the insert and update performance is amazing, the query/read performance is 3-4x slower than a traditional RDBMS:

$ memsql-admin start-node --all

$ go run memsql.go lib.go # 4 sec before start RU

$ go run memsql.go lib.go # SLOW FACTOR 5

Next is CrateDB, the result:

[Crate] SearchRelsAddBonds (10000, 100%) took 49.11s (4911.38 µs/op)
[Crate] RandomSearchItems (100000, 100%) took 101.40s (1013.95 µs/op)
[Crate] UpdateItemsAmounts (5000, 100%) took 246.42s (49283.84 µs/op)
[Crate] InsertUsersItems (2500, 100%) took 306.12s (122449.00 µs/op)
USERS CR : 2500 / 4965
ITEMS CRU : 17500 / 14894 + 690161 / 14895
RELS CRU : 2375 / 4336 / 2168
SLOW FACTOR : 20
CRU µs/rec : 13681.45 / 146.92 / 19598.85

Next is CockroachDB 19.1, the result:

$ go run cockroach.go lib.go
[Cockroach] SearchRelsAddBonds (10000, 100%) took 59.25s (5925.42 µs/op)
[Cockroach] RandomSearchItems (100000, 100%) took 85.84s (858.45 µs/op)
[Cockroach] UpdateItemsAmounts (5000, 100%) took 261.43s (52285.65 µs/op)
[Cockroach] InsertUsersItems (2500, 100%) took 424.66s (169864.55 µs/op)
USERS CR : 2500 / 4988
ITEMS CRU : 17500 / 14964 + 699331 / 14964
RELS CRU : 2375 / 5761 / 2880
SLOW FACTOR : 20
CRU µs/rec : 18979.28 / 122.75 / 19022.43

Next is NuoDB 3.4.1, the storage manager and transaction engine config and the benchmark result:

$ chown nuodb:nuodb /media/nuodb
$ nuodbmgr --broker localhost --password nuodb1pass
start process sm archive /media/nuodb host localhost database b1 initialize true
start process te host localhost database b1
  --dba-user b2 --dba-password b3
$ nuosql b1 --user b2 --password b3

$ go run nuodb.go lib.go
[Nuo] RandomSearchItems (100000, 100%) took 33.79s (337.90 µs/op)
[Nuo] SearchRelsAddBonds (10000, 100%) took 72.18s (7218.04 µs/op)
[Nuo] UpdateItemsAmounts (5000, 100%) took 117.22s (23443.65 µs/op)
[Nuo] InsertUsersItems (2500, 100%) took 144.51s (57804.21 µs/op)
USERS CR : 2500 / 4995
ITEMS CRU : 17500 / 14985 + 698313 / 14985
RELS CRU : 2375 / 15822 / 7911
SLOW FACTOR : 20
CRU µs/rec : 6458.57 / 48.39 / 8473.22

Next is YugaByte 1.2.5.0, the setup:

export YB_PG_FALLBACK_SYSTEM_USER_NAME=user1
./bin/yb-ctl --data_dir=/media/yuga create
# edit yb-ctl set use_cassandra_authentication = True
./bin/yb-ctl --data_dir=/media/yuga start

./bin/cqlsh -u cassandra -p cassandra
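Since YugaByte speaks the Cassandra protocol, the Go client side can go through gocql (a sketch; the keyspace is an assumption, the credentials mirror the cqlsh login above):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1") // YCQL listens on 9042 by default
	cluster.Keyspace = "b1"                  // assumed keyspace for this benchmark
	cluster.Authenticator = gocql.PasswordAuthenticator{
		Username: "cassandra", // matches the cqlsh login above
		Password: "cassandra",
	}
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	var cnt int
	if err := session.Query(`SELECT COUNT(*) FROM users`).Scan(&cnt); err != nil {
		log.Println(err)
	}
	log.Println("users:", cnt)
}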

When INSERT is not batched on Clickhouse 19.7.3.9:

$ go run clickhouse-1insertPreTransaction.go lib.go
[Click] InsertUsersItems (2500, 100%) took 110.78s (44312.56 µs/op)
[Click] RandomSearchItems (100000, 100%) took 306.10s (3060.95 µs/op)
[Click] SearchRelsAddBonds (10000, 100%) took 534.91s (53491.35 µs/op)
[Click] UpdateItemsAmounts (5000, 100%) took 710.39s (142078.55 µs/op)
USERS CR : 2500 / 4999
ITEMS CRU : 17500 / 14997 + 699615 / 15000
RELS CRU : 2375 / 18811 / 9405
SLOW FACTOR : 20
CRU µs/rec : 4951.12 / 437.52 / 52117.48
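For contrast, the batched INSERT that Clickhouse actually wants looks roughly like this (a sketch with the clickhouse-go driver; the import path is today's, back then it lived at github.com/kshvakov/clickhouse): one prepared statement inside a transaction, sent as a single block on commit.

package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go"
)

func main() {
	db, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000") // native protocol port
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Clickhouse wants many rows per INSERT: prepare once,
	// Exec per row, and the whole batch is shipped on Commit.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	stmt, err := tx.Prepare(`INSERT INTO items (fk_id, typ, amount) VALUES (?, ?, ?)`)
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 10000; i++ {
		if _, err := stmt.Exec(int64(i%2500), i%4, 100); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil { // the batch is sent here
		log.Fatal(err)
	}
}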

These benchmarks were performed on an i7-4720HQ with 32GB RAM and an SSD. At first there's a lot that I want to add to this benchmark (maybe someday) to make it truly huge '__'), such as:

  • DGraph, a graph database written in Go; backups are local only (same as MemSQL, so you cannot do something like ssh foo@bar "pg_dump | xz -c" | pv -r -b > /tmp/backup_`date +%Y%m%d_%H%M%S`.sql.xz)
  • Cayley, a graph layer written in Go that supports many backend storages
  • ArangoDB, a multi-model database with the built-in Foxx Framework for creating REST APIs; has an unfamiliar AQL syntax
  • MongoDB, one of the most popular open source document databases, for the sake of comparison; I don't prefer this one because of its memory usage.
  • InfluxDB or TimeScaleDB or SiriDB or GridDB, for comparison with Clickhouse
  • Redis or SSDB or LedisDB or Codis or Olric or SummitDB, obviously for the sake of comparison. Also Cete, a distributed key-value store, but instead of the memcache protocol this one uses gRPC and REST
  • Tarantool, a Redis competitor with ArangoDB-like features but with Lua instead of JS; I want to see whether it is simpler to use but with near-equal performance to Aerospike
  • Aerospike, the fastest distributed kv-store I have ever tested, just for the sake of comparison; the free version is limited to 2 namespaces with 4 billion objects. Too bad it cannot be started on an OpenVZ-based VM.
  • Couchbase, a document-oriented database that supports SQL-like syntax (N1QL); the free-for-production version is a few months behind the enterprise edition. The community edition cannot create indexes (always error 5000?).
  • GridDB, an in-memory database from Toshiba, benchmarked to be superior to Cassandra
  • ClustrixDB (new name: MariaDB Xpand), a distributed columnstore version of MariaDB; the community version does not support automatic failover and non-blocking backup
  • Altibase, an open source in-memory database promoted as Oracle-compatible; not sure what the limitations of the open source version are.
  • RedisGraph, the fastest in-memory graph database; a community edition is available.
  • RethinkDB, a document-oriented database; the latest Ubuntu package cannot be installed, probably because the project is no longer maintained
  • OrientDB, a multi-model (document and graph) database; their screenshots look interesting, but too bad both Golang drivers are unmaintained and probably unusable with the latest version (3.x)
  • TiDB, a work-in-progress take on the CockroachDB approach but with a MySQL-compatible connector; as seen from the benchmark above, a lot of errors happen
  • RQLite, a distributed SQLite; the Go driver is not threadsafe by default
  • VoltDB, seems not to be free, since the website shows only a "free evaluation"
  • HyperDex, has good benchmarks on paper, but is no longer maintained
  • LMDB-memcachedb, a faster version of memcachedb, a distributed kv-store, but no longer maintained
  • FoundationDB, a multi-model database built from a kv-database with additional layers for other models; seems to have complicated APIs
  • TigerGraph, the fastest distributed graph database; the developer edition is free but may not be used in production

The chart (lower is better) is shown below:

Another benchmark from 2018 is here (tl;dr: CockroachDB mostly has higher throughput, YugabyteDB the lowest latency, and TiDB the lowest performance among those 3).

Originally published at http://kokizzu.blogspot.com.
