Node often throw panic and stop when I force to terminate my application

,

Hi
When I kill the process of my application which is connected Cockroach DB, a node dies.
Not always but often.
Only one thread of one computer is accessing the DB.
DB logs of the node are:

gossip client (1/3 cur/max conns)
3: 192.168.1.128:26253 (12m0s: infos 589/163 sent/received, bytes 117800B/74317B sent/received)
gossip server (0/3 cur/max conns, infos 589/163 sent/received, bytes 117800B/74317B sent/received)
W171113 08:55:03.267332 54 storage/store.go:3581 [n1,s1] handle raft ready: 0.7s [processed=0]
I171113 08:55:04.310781 91 server/status/runtime.go:223 [n1] runtime stats: 135 MiB RSS, 150 goroutines, 16 MiB/23 MiB/52 MiB GO alloc/idle/total, 138 KiB/740 KiB CGO alloc/total, 743.30cgo/sec, 0.02/0.00 %(u/s)time, 0.00 %gc (2x)
W171113 08:55:13.881648 165 storage/node_liveness.go:399 [n1,hb] slow heartbeat took 1.6s
I171113 08:55:14.310738 91 server/status/runtime.go:223 [n1] runtime stats: 139 MiB RSS, 154 goroutines, 35 MiB/5.4 MiB/52 MiB GO alloc/idle/total, 138 KiB/740 KiB CGO alloc/total, 3861.02cgo/sec, 0.06/0.00 %(u/s)time, 0.00 %gc (5x)
E171113 08:55:14.378715 37826 sql/pgwire/v3.go:271 [n1,client=192.168.1.128:21708] write tcp 192.168.1.128:26251->192.168.1.128:21708: wsasend: An existing connection was forcibly closed by the remote host.
E171113 08:55:14.378784 37826 server/server.go:753 [n1,client=192.168.1.128:21708] WireFailureError: write tcp 192.168.1.128:26251->192.168.1.128:21708: wsasend: An existing connection was forcibly closed by the remote host.
W171113 08:55:14.378888 37872 kv/txn_coord_sender.go:786 [client=192.168.1.128:21708,user=root,n1] heartbeat to “sql txn” id=145a29b7 key=/Table/79/1/2017-11-13T17:55:08.195796Z/296497585593679873/0 rw=true pri=0.02264804 iso=SERIALIZABLE stat=PENDING epo=0 ts=1510563310.932934200,0 orig=1510563310.932934200,0 max=1510563311.432934200,0 wto=false rop=false seq=2 failed: heartbeat for transaction “sql txn” id=145a29b7 key=/Table/79/1/2017-11-13T17:55:08.195796Z/296497585593679873/0 rw=true pri=0.02264804 iso=SERIALIZABLE stat=PENDING epo=0 ts=1510563310.932934200,0 orig=1510563310.932934200,0 max=1510563310.932934200,0 wto=false rop=false seq=3 failed; record not present
W171113 08:55:22.005603 74 storage/store.go:3581 [n1,s1] handle raft ready: 0.7s [processed=0]
W171113 08:55:22.005654 77 storage/store.go:3581 [n1,s1] handle raft ready: 0.5s [processed=0]
W171113 08:55:23.130771 59 storage/store.go:3581 [n1,s1] handle raft ready: 0.7s [processed=0]
W171113 08:55:23.570454 38083 storage/replica.go:2696 [n1,s1,r3/1:/System/NodeLiveness{-Max}] context cancellation after 2.3s of attempting command [txn: 467b91f2], BeginTransaction [/System/NodeLiveness/1,/Min), ConditionalPut [/System/NodeLiveness/1,/Min), EndTransaction [/System/NodeLiveness/1,/Min)
W171113 08:55:23.570598 165 storage/node_liveness.go:399 [n1,hb] slow heartbeat took 2.3s
W171113 08:55:23.570604 165 storage/node_liveness.go:342 [n1,hb] failed node liveness heartbeat: context deadline exceeded
W171113 08:55:23.897921 69 storage/store.go:3581 [n1,s1] handle raft ready: 0.6s [processed=0]
I171113 08:55:24.312275 91 server/status/runtime.go:223 [n1] runtime stats: 144 MiB RSS, 151 goroutines, 20 MiB/25 MiB/58 MiB GO alloc/idle/total, 138 KiB/740 KiB CGO alloc/total, 668.50cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (1x)
W171113 08:55:31.541185 165 storage/node_liveness.go:399 [n1,hb] slow heartbeat took 1.2s
I171113 08:55:34.311149 91 server/status/runtime.go:223 [n1] runtime stats: 145 MiB RSS, 150 goroutines, 21 MiB/25 MiB/58 MiB GO alloc/idle/total, 138 KiB/740 KiB CGO alloc/total, 782.69cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
E171113 08:55:40.490313 38186 sql/session.go:1370 [client=192.168.1.128:21870,user=root,n1] a SQL panic has occurred!
E171113 08:55:40.490373 38186 sql/pgwire/v3.go:271 [n1,client=192.168.1.128:21870] write tcp 192.168.1.128:26251->192.168.1.128:21870: wsasend: An existing connection was forcibly closed by the remote host.
W171113 08:55:40.490414 38215 kv/txn_coord_sender.go:786 [client=192.168.1.128:21870,user=root,n1] heartbeat to “sql txn” id=c89e1362 key=/Table/79/1/2017-11-13T17:55:36.190667Z/296497585593679873/0 rw=true pri=0.02589156 iso=SERIALIZABLE stat=PENDING epo=0 ts=1510563337.971842400,0 orig=1510563337.971842400,0 max=1510563338.471842400,0 wto=false rop=false seq=2 failed: heartbeat for transaction “sql txn” id=c89e1362 key=/Table/79/1/2017-11-13T17:55:36.190667Z/296497585593679873/0 rw=true pri=0.02589156 iso=SERIALIZABLE stat=PENDING epo=0 ts=1510563337.971842400,0 orig=1510563337.971842400,0 max=1510563337.971842400,0 wto=false rop=false seq=3 failed; record not present
E171113 08:55:40.490415 38186 util/log/crash_reporting.go:101 [n1] a panic has occurred!
E171113 08:55:41.614758 38186 util/log/crash_reporting.go:344 [n1] Reported as error 3d905b96624e4541afdad818ff6764bc
panic: updateStateAndCleanupOnErr called in state with no KV txn. State: NoTxn [recovered]
** panic: panic while executing “COMMIT”: updateStateAndCleanupOnErr called in state with no KV txn. State: NoTxn [recovered]**
** panic: panic while executing “COMMIT”: updateStateAndCleanupOnErr called in state with no KV txn. State: NoTxn [recovered]**
** panic: panic while executing “COMMIT”: updateStateAndCleanupOnErr called in state with no KV txn. State: NoTxn**

goroutine 38186 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc042092000, 0x4eb8b88, 0xc0423eddd0)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0x9f
panic(0x1820f00, 0xc0422ee460)
/usr/local/go/src/runtime/panic.go:489 +0x2dd
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).serve.func1(0xc042753500, 0x4eb8b88, 0xc042fd8420)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:423 +0xa8
panic(0x1820f00, 0xc0422ee460)
/usr/local/go/src/runtime/panic.go:489 +0x2dd
github.com/cockroachdb/cockroach/pkg/sql.(*Session).maybeRecover(0xc042ca2000, 0x1a0ca86, 0x9, 0xc0425d4000, 0x6)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/session.go:1381 +0x294
panic(0x17446e0, 0xc042fe95a0)
/usr/local/go/src/runtime/panic.go:489 +0x2dd
github.com/cockroachdb/cockroach/pkg/sql.(*txnState).updateStateAndCleanupOnErr(0xc042ca2088, 0x29afb80, 0xc042fe9590, 0xc0423ebb00, 0xc042fe9590, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/session.go:1210 +0x53b
github.com/cockroachdb/cockroach/pkg/sql.runTxnAttempt(0xc0423ebb00, 0xc042ca2000, 0xc042fed300, 0x1, 0x1, 0x0, 0x2, 0xc0438f0000, 0x0, 0x29c97a0, …)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:1207 +0x56a
github.com/cockroachdb/cockroach/pkg/sql.runWithAutoRetry(0xc0423ebb00, 0xc042ca2000, 0xc042fed300, 0x1, 0x1, 0x0, 0x0, 0x0, 0x44b200, 0x2fbe2e0, …)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:1016 +0x1f7
github.com/cockroachdb/cockroach/pkg/sql.(*Executor).execParsed(0xc0423ebb00, 0xc042ca2000, 0xc042fed300, 0x1, 0x1, 0x0, 0x0, 0xc042017300, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:834 +0x271
github.com/cockroachdb/cockroach/pkg/sql.(*Executor).execRequest(0xc0423ebb00, 0xc042ca2000, 0xc0425d4000, 0x6, 0x0, 0x0, 0x6, 0x1000)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:757 +0x49a
github.com/cockroachdb/cockroach/pkg/sql.(*Executor).ExecuteStatements(0xc0423ebb00, 0xc042ca2000, 0xc0425d4000, 0x6, 0x0, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:638 +0x130
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).handleSimpleQuery(0xc042753500, 0xc042753528, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:612 +0x14f
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).serve(0xc042753500, 0x4eb8b88, 0xc042fd8420, 0xc0436b5de0, 0x5400, 0xc04254e368, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:503 +0xbd7
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn(0xc04254e200, 0x4eb8dd0, 0xc043983140, 0x29cff80, 0xc043cf7540, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:421 +0x973
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func9.1(0x29cff80, 0xc043cf7540)
/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:749 +0x150
github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1(0xc042092000, 0x4eb8b88, 0xc0423eddd0, 0xc0421c0048, 0x29cff80, 0xc043cf7540, 0xc04353bf50)
/go/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:140 +0xb0
created by github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith
/go/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:142 +0x26e

My application is written in python, and using psycopg2 v.2.7.3.2.
I’m using psycopg2.pool.SimpleConnectionPool from single thread and
doing some simple works such as just select and insert with ISOLATION_LEVEL_DEFAULT isolation level.
All transactions are wrapped by

When I change ‘isolation level’ to ISOLATION_LEVEL_AUTOCOMMIT and remove commit and rollback, so that not to use any transaction, then none of a node dies while application terminates. So I assume this is related to transaction.

Am I using incorrectly? Or is this a bug?

Thank you :slight_smile:

binary version:
Build Tag: v1.1.2
Build Time: 2017/11/02 19:36:20
Distribution: CCL
Platform: windows amd64
Go Version: go1.8.3
C Compiler: gcc 6.3.0
Build SHA-1: c726df22152a771cdd1882679c772d4e7799440d
Build Type: release

Hello!
thank you for reporting this!
This is likely a bug. Please file it on Github, and cc @andreimatei and myself. Thank!

1 Like

Hi, I’ve gone ahead and created a GitHub issue to track this. You can follow the status here: https://github.com/cockroachdb/cockroach/issues/20007. Thanks for reporting!

1 Like