Vague error doing a full backup with AS OF SYSTEM TIME '-10s'

bug

(Alex Narayan) #1
BACKUP DATABASE backup_test TO "s3://... AS OF SYSTEM TIME '-10s';
2019/03/04 16:51:23 Failure to execute command on db pq: exporting 4 ranges: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])
--- FAIL: TestBackupFull (15.42s)
panic: Failure to execute command on db pq: exporting 4 ranges: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])
 [recovered]
	panic: Failure to execute command on db pq: exporting 4 ranges: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])


goroutine 19 [running]:
testing.tRunner.func1(0xc00016a100)
	/usr/local/Cellar/go/1.11.5/libexec/src/testing/testing.go:792 +0x387
panic(0x12cf1a0, 0xc000087b80)
	/usr/local/Cellar/go/1.11.5/libexec/src/runtime/panic.go:513 +0x1b9
log.Panicln(0xc0000b5c70, 0x2, 0x2)

I was connected to cockroach through a kubectl port-forward and issued a full backup with AS OF SYSTEM TIME, which produced the abbreviated error above. Re-issuing the command caused the backup to succeed, but I am curious why it failed the first time.


(Roland Crosby) #2

Hi Alex, thanks for the report. I’m not sure exactly what may have caused this; it looks like an intermittent connectivity problem internal to the database. I’ve surfaced this to the team that works on BACKUP and RESTORE to determine if this is something they’ve seen before; will keep you posted.

What is your cluster topology? A set of similar nodes inside a single Kubernetes cluster?


(Roland Crosby) #3

Also, if the logs from your nodes are still around, could you send us a debug zip so we can look at the logs from around this time? You can send it to my email (roland@cockroachlabs.com). Thanks!


(Alex Narayan) #4

I’ll get that to you in the morning.

The cluster is a three-node setup on a larger five-node setup with Ceph storage. I am almost certain I was forwarding to my local workstation via port-forward through kubectl.


(Alex Narayan) #5

My apologies: I am running the cockroach master branch in my testing cluster. I need to downgrade to 2.1.5 by rebuilding the deployment. Since this was against a branch that is not really supported, I think this can be closed. If I see a similar issue again on a stable, production-level release (i.e. 2.1.5), I’ll report back.


(Raphael 'kena' Poss) #6

Thank you for clarifying. However, if there’s an issue with the master branch, we may also want to investigate it. If you get a chance, we would probably still like to look at these log files.


(Alex Narayan) #8

I have the zip. I will try to get it pulled out of k8s and sent to you, Roland, tomorrow. Thanks guys!


(Alex Narayan) #9

@rolandcrosby @knz incoming email to Roland with the debug zip


(Alex Narayan) #10

Did you guys get the zip file?