Deleting a large table crashes the cluster

I have set up a 3-node cluster to evaluate the use of CockroachDB for storing a large table. In my testing, I found a way to reproducibly crash the cockroach daemon on every node in the cluster.

I created a table called “bucket”, and generated 165 million rows of artificial data. When I execute a “DELETE FROM bucket” SQL command via the psycopg2 Python driver, the memory usage of the cockroach process increases rapidly until the machine runs out of memory and the process crashes with a std::bad_alloc error. Here are a few lines of context from the error log around the crash:

terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
terminate called recursively
std::bad_alloc'
  what():  std::bad_alloc
SIGABRT: abort

The cluster consists of 3 virtual machines running CentOS 7.4. Each machine has 4 vCPUs, 8GB RAM, 10GB of swap space, and 100GB of disk space. The size of the “bucket” table at the time of the crash is 8GB with 462 ranges. Memory overcommitting is disabled at the kernel level (sysctl vm.overcommit_memory=2).

I also discovered that when I try running the same DELETE query via the cockroach CLI, it rejects the query as being unsafe.

Here’s the error message:

$ cockroach sql --insecure
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
#
# Server version: CockroachDB CCL v19.1.3 (x86_64-unknown-linux-gnu, built 2019/07/08 18:24:39, go1.11.6) (same version as client)
# Cluster ID: d1a47f5c-a202-484d-845f-799e94f9e4df
#
# Enter \? for a brief introduction.
#
root@:26257/defaultdb> delete from bucket;
pq: rejected: DELETE without WHERE clause (sql_safe_updates = true)

Hey @heily,

Thank you for bringing this up.

I’ve created a GitHub issue for this error.

For large deletes like this (where you are deleting all rows), TRUNCATE is the better option.

As for the CLI error, this is just a warning for performing a DELETE without a WHERE clause (it’s complaining about a potentially dangerous SQL statement).

You can disable this error by changing the following session setting: set sql_safe_updates = false;

However, please note that you will then no longer be protected from accidental queries like that one, which will delete the whole table. You can read more about that here; see the section about sql_safe_updates.
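If you do need to delete most of a table without TRUNCATE, one way to keep each transaction small is to delete in fixed-size batches. Here’s a rough sketch using a psycopg2-style cursor; the “id” primary-key column and the batch size are assumptions, not taken from your schema:

```python
def batch_delete(cur, batch_size=10000):
    """Delete every row from bucket in fixed-size batches.

    With autocommit enabled on the connection, each execute() runs as
    its own small transaction, so memory use stays bounded instead of
    growing with the table size. Assumes a primary-key column "id".
    """
    total = 0
    while True:
        cur.execute(
            "DELETE FROM bucket WHERE id IN "
            "(SELECT id FROM bucket LIMIT %s)",
            (batch_size,),
        )
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total
```

A nice side effect of batching is that a failure partway through only loses the current batch; earlier batches stay committed.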

Thanks,
Matt

Hi Matt,

Thanks for looking into this. I will switch to TRUNCATE for my personal testing in the future.

However, in a typical production database deployment, we give users the ability to run arbitrary queries, and they expect that big queries will be slow but eventually complete. We can’t control how many rows are affected by each query, or the size of the transaction.

Are there other situations where a badly designed query can cause CockroachDB to consume all available memory?

Hey Mark,

Not that I am aware of; the issue you raised is a bug.

However, if you come across another poorly designed query that consumes all available memory, please let us know.

Thanks,
Matt

I was trying to make a copy of the big table before truncating it, so I ran the following query:

create table bigtest as select * from bucket;

This caused one of the cockroach nodes to crash. I was running it in the foreground, and it didn’t show any error messages on stdout, but the process exited with return code 2.

Here’s the backtrace from the logs on the server node that crashed:

fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x32ec9f0, 0x16)
        /usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc2a8000000, 0x90000000, 0x5bba258)
        /usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x5b74ec0, 0x90000000, 0x40014b86ba, 0x7fe0f6ffe9a0)
        /usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x5b74ec0, 0x47d2b, 0x0)
        /usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x5b74ec0, 0x47d2b, 0x5bba268, 0x2030a900000000)
        /usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x5b74ec0, 0x47d2b, 0x720100, 0x7fe0481ba000)
        /usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x5b74ec0, 0x47d2b, 0x7fe048010100, 0x10a)
        /usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x8fa56000, 0xc000060001, 0xaa)
        /usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0x7fe105600c10)
        /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1229

Here are the logs from the client side:

root@:26257/defaultdb> create table bigtest as select * from bucket;
driver: bad connection
warning: connection lost!
opening new connection: all session settings will be lost
warning: error retrieving the transaction status: dial tcp 127.0.0.1:26257: connect: connection refused
warning: connection lost!
opening new connection: all session settings will be lost
warning: error retrieving the database name: dial tcp 127.0.0.1:26257: connect: connection refused
root@:26257/? ?>

Thanks for letting me know!

We have some significant improvements for create table <table2> as select * from <table1> in our upcoming 19.2 release.

We are aware of this performance issue, you can view it here.
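Until then, one workaround is to create the destination table up front (for example, from the output of SHOW CREATE TABLE) and copy the rows over in keyset-paginated batches instead of one giant statement. A sketch, assuming an integer primary key named “id” and a pre-created bigtest table with the same schema:

```python
def copy_in_batches(cur, batch_size=10000):
    """Copy all rows from bucket into a pre-created bigtest table.

    Pages through bucket on its primary key so that each
    INSERT ... SELECT touches at most batch_size rows. Assumes an
    integer "id" primary key and that bigtest already exists.
    """
    last_id = 0  # assumes all ids are positive
    total = 0
    while True:
        cur.execute(
            "INSERT INTO bigtest "
            "SELECT * FROM bucket WHERE id > %s ORDER BY id LIMIT %s "
            "RETURNING id",
            (last_id, batch_size),
        )
        rows = cur.fetchall()
        if not rows:  # no rows left to copy
            break
        last_id = max(r[0] for r in rows)  # highest id copied so far
        total += len(rows)
    return total
```

Each batch is its own statement, so no single transaction has to materialize the whole 165-million-row table.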

Best,
Matt

After the crash reported above, I was unable to restart the cockroach daemon on the crashed node. Here’s the error it gave me:

* ERROR: [n1,s1] a panic has occurred!
*
tocommit(29897) is out of range [lastIndex(28814)]. Was the raft log corrupted, truncated, or lost?

I pressed Control-C to stop the other two nodes, and they both failed to stop cleanly. The error message was something like:

^CNote: a second interrupt will skip graceful shutdown and terminate forcefully
initiating graceful shutdown of server
*
* ERROR: time limit reached, initiating hard shutdown - node may take longer to restart & clients may need to wait for leases to expire
*
Failed running "start"

After all three nodes were shut down, I was able to start them up using:

/usr/local/bin/cockroach start --insecure --advertise-addr=$(hostname -I | awk '{print $1}') --join=$node1_ip,$node2_ip,$node3_ip --cache=.25 --max-sql-memory=.25 --store=/data/cockroach-data

What you’re reporting is unexpected behavior, and I would like to get to the bottom of this.

Is there any other information you can provide me regarding your deployment? Are you using a container?

Can you provide me with a debug.zip?

You can email it to support@cockroachlabs.com. If the file is too large, I can send you a private link to upload it.

Can you confirm that node 1 is the crashed node?

Thanks,
Matt

Thanks for the offer of help. I’ve emailed a debug.zip file to the support@ address.

The node with NodeID=1 is the one that crashes. This is the node that my psql client is communicating with when I issue the problematic query.

I can’t think of anything unusual about the deployment. I’m not using containers, these are three generic VMs that were built specifically for testing new databases, and CockroachDB was the first database I tested on them. For the filesystem underneath the database, I’m using ext4 with the mount options “noatime,barrier=0”.

Hey Mark,

Regarding this error when trying to restart the cockroach daemon on the crashed node:

* ERROR: [n1,s1] a panic has occurred!
*
tocommit(29897) is out of range [lastIndex(28814)]. Was the raft log corrupted, truncated, or lost?

What flags were you using on the start command? Are you using --logtostderr?

Thanks,
Matt

Just wanted to give you an update: the fix for your original issue should be in the next 19.1 release.