I have set up a 3-node cluster to evaluate the use of CockroachDB for storing a large table. In my testing, I found a way to reproducibly crash the cockroach daemon on every node in the cluster.
I created a table called “bucket”, and generated 165 million rows of artificial data. When I execute a “DELETE FROM bucket” SQL command via the psycopg2 Python driver, the memory usage of the cockroach process increases rapidly until the machine runs out of memory and the process crashes with a std::bad_alloc error. Here are a few lines of context from the error log around the crash:
terminate called after throwing an instance of 'std::bad_alloc'
terminate called recursively
terminate called recursively
terminate called recursively
  what():  std::bad_alloc
SIGABRT: abort
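For reference, the psycopg2 call that triggers the crash is essentially the following. This is a sketch rather than my exact script: the connection parameters are placeholders for my test cluster, and only the table name ("bucket") and the statement itself come from the scenario above.

```python
# Sketch of the repro: a single unqualified DELETE over the whole table.
# Connection parameters below are placeholders, not my actual cluster config.

DELETE_SQL = "DELETE FROM bucket"  # full-table delete that triggers the OOM


def run_delete(host="localhost", port=26257, user="root", dbname="defaultdb"):
    # CockroachDB speaks the PostgreSQL wire protocol, so psycopg2 connects
    # to it the same way it would to a PostgreSQL server.
    import psycopg2

    conn = psycopg2.connect(host=host, port=port, user=user, dbname=dbname)
    conn.autocommit = True  # run the DELETE as its own implicit transaction
    try:
        with conn.cursor() as cur:
            # Memory usage of the cockroach process climbs rapidly while this
            # executes, until the nodes die with std::bad_alloc.
            cur.execute(DELETE_SQL)
    finally:
        conn.close()
```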
The cluster comprises 3 virtual machines running CentOS 7.4. Each machine has 4 vCPUs, 8GB RAM, 10GB of swap space, and 100GB of disk space. At the time of the crash, the “bucket” table is 8GB across 462 ranges. Memory overcommitting is disabled at the kernel level (sysctl vm.overcommit_memory=2).
I also discovered that when I run the same DELETE statement via the cockroach CLI, it is rejected as unsafe.
Here’s the error message:
$ cockroach sql --insecure
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
#
# Server version: CockroachDB CCL v19.1.3 (x86_64-unknown-linux-gnu, built 2019/07/08 18:24:39, go1.11.6) (same version as client)
# Cluster ID: d1a47f5c-a202-484d-845f-799e94f9e4df
#
# Enter \? for a brief introduction.
#
root@:26257/defaultdb> delete from bucket;
pq: rejected: DELETE without WHERE clause (sql_safe_updates = true)
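For completeness: the guard named in that error message is the sql_safe_updates session setting, which CockroachDB's CLI enables by default (the psycopg2 driver does not, which is why the crash is reachable from Python). It can presumably be lifted per session as below; I have not re-run the full-table DELETE this way.

```sql
-- Disable the CLI's safe-updates guard for this session only.
SET sql_safe_updates = false;
-- The unqualified DELETE should now be accepted without a WHERE clause.
DELETE FROM bucket;
```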