Hi, I’m not sure if this issue is same as https://github.com/cockroachdb/cockroach/issues/9540
stability: run insert&delete-heavy workload with intent/GC queue pressure #9540
If so, it may help to provide some information, and myself may want to know some of of why ( the cause ) ( if there’s any )
And why that table is special
I do the following
To have that table ( bag_index ) have many rows ( about 123612 rows )
Then doing delete will cause no response ( after Control + C, the select doesn’t work too ) that table can’t do any operation ( long time no response, say one hour, I don’t wait that long ) ( the problem exists that long, say one day )
For now it only happen for that table ( bag_index )
beta-20161027 ( done restart that node once, I tried delete again, restart won’t fix)
beta-20161103 ( doing restart that node, make it can select again )
Select have no response
root@192.168.100.241:8000> SELECT count(*) FROM bag_index;
Other tables are fine
root@192.168.100.241:8000> SELECT count(*) FROM bag_index_compute;
+----------+
| count(*) |
+----------+
| 71314 |
+----------+
(1 row)
After restart the table is okay now
root@192.168.100.241:8000> select count(*) from bag_index;
+----------+
| count(*) |
+----------+
| 123612 |
+----------+
(1 row)
Here are some logs
[wen@hadoop241 logs]$ less cockroach.hadoop241.wen.log.INFO.2016-11-15T09_45_50+08_00.3327
I161115 09:45:50.648868 1 util/log/clog.go:988 [config] file created at: 2016/11/15 09:45:50
I161115 09:45:50.648868 1 util/log/clog.go:988 [config] running on machine: hadoop241
I161115 09:45:50.648868 1 util/log/clog.go:988 [config] binary: CockroachDB beta-20161103 (linux amd64, built 2016/11/03 15:03:54, go1.7.3)
I161115 09:45:50.648868 1 util/log/clog.go:988 [config] arguments: [/home/wen/clusterthree/cockroach start --insecure --host= --store=node1 --port=8000 --http-port=7000]
I161115 09:45:50.648868 1 util/log/clog.go:988 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg
I161115 09:45:50.648867 1 cli/start.go:299 CockroachDB beta-20161103 (linux amd64, built 2016/11/03 15:03:54, go1.7.3)
I161115 09:45:50.756303 1 cli/backtrace.go:40 backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
I161115 09:45:50.756366 1 cli/start.go:315 starting cockroach node
W161115 09:45:50.756421 1 server/server.go:154 [n?] running in insecure mode, this is strongly discouraged. See --insecure.
I161115 09:45:50.756657 1 gossip/gossip.go:237 [n?] initial resolvers: []
W161115 09:45:50.756678 1 gossip/gossip.go:1055 [n?] no resolvers found; use --join to specify a connected node
I161115 09:45:50.766629 1 storage/engine/rocksdb.go:326 opening rocksdb instance at "node1"
I161115 09:45:51.360771 1 server/config.go:443 1 storage engine initialized
I161115 09:45:51.364954 1 base/node_id.go:62 [n1] NodeID set to 1
W161115 09:45:51.630882 1 storage/store.go:766 [n1] storeMu: mutex held by github.com/cockroachdb/cockroach/pkg/storage.(*Store).Start for 265.797643ms (>100ms):
goroutine 1 [running]:
runtime/debug.Stack(0x1d25710, 0x1d25750, 0x3b)
/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/pkg/util/syncutil.ThresholdLogger.func1(0xfd7c00b)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:65 +0xe1
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*TimedMutex).Unlock(0xc4203977b8)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:92 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Start(0xc420397500, 0x7f6a4b6b9dc0, 0xc420213c80, 0xc420465cb0, 0xc4205ecc70, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:1052 +0x520
github.com/cockroachdb/cockroach/pkg/server.(*Node).initStores(0xc4201a0780, 0x7f6a4b6b9dc0, 0xc420213c80, 0xc4201461f0, 0x1, 0x1, 0xc420465cb0, 0x0, 0x0, 0x7f6a4b6b9dc0)
/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:418 +0x213
github.com/cockroachdb/cockroach/pkg/server.(*Node).start(0xc4201a0780, 0x7f6a4b6b9dc0, 0xc420213c80, 0x232c420, 0xc4203c5f60, 0xc4201461f0, 0x1, 0x1, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:340 +0x11d
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start(0xc420202000, 0x7f6a4b6b9dc0, 0xc420213c80, 0xc42046b7c0, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:575 +0xccc
github.com/cockroachdb/cockroach/pkg/cli.runStart(0x2316b40, 0xc42044f680, 0x0, 0x5, 0x7fffb067c83f, 0x4)
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:324 +0x920
github.com/cockroachdb/cockroach/pkg/cli.maybeDecorateGRPCError.func1(0x2316b40, 0xc42044f680, 0x0, 0x5, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/cli/error.go:30 +0x5a
github.com/spf13/cobra.(*Command).execute(0x2316b40, 0xc42044f630, 0x5, 0x5, 0x2316b40, 0xc42044f630)
/go/src/github.com/spf13/cobra/command.go:632 +0x23e
github.com/spf13/cobra.(*Command).ExecuteC(0x2312740, 0xc42000e220, 0xebcdac937f56b809, 0xc42000e258)
/go/src/github.com/spf13/cobra/command.go:722 +0x367
github.com/spf13/cobra.(*Command).Execute(0x2312740, 0xc42000e240, 0xebcdac937f56b809)
/go/src/github.com/spf13/cobra/command.go:681 +0x2b
github.com/cockroachdb/cockroach/pkg/cli.Run(0xc42000c0f0, 0x6, 0x6, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:95 +0x6d
main.main()
/go/src/github.com/cockroachdb/cockroach/main.go:37 +0xe1
I161115 09:45:51.631284 1 storage/store.go:1173 [n1] [n1,s1]: failed initial metrics computation: [n1,s1]: system config not yet available
I161115 09:45:51.631321 1 server/node.go:433 [n1] initialized store [n1,s1]: {Capacity:105689415680 Available:57766645760 RangeCount:0}
I161115 09:45:51.631455 1 server/node.go:318 [n1] node ID 1 initialized
I161115 09:45:51.631566 1 gossip/gossip.go:280 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"hadoop241:8000" > attrs:<> locality:<>
I161115 09:45:51.631713 1 storage/stores.go:296 [n1] read 2 node addresses from persistent storage
I161115 09:45:51.632068 1 server/node.go:563 [n1] connecting to gossip network to verify cluster ID...
I161115 09:45:51.632098 1 server/node.go:584 [n1] node connected via gossip and verified as part of cluster {"104a7b1e-070a-4472-94fd-6c8b26850864"}
I161115 09:45:51.632131 1 server/node.go:368 [n1] node=1: started with [[]=node1] engine(s) and attributes []
I161115 09:45:51.632172 1 server/server.go:605 [n1] starting http server at hadoop241:7000
I161115 09:45:51.632188 1 server/server.go:606 [n1] starting grpc/postgres server at hadoop241:8000
I161115 09:45:51.632382 1 server/server.go:607 [n1] advertising CockroachDB node at hadoop241:8000
I161115 09:45:51.774141 168 gossip/client.go:126 [n1] node 1: started gossip client to hadoop243:8001
I161115 09:45:54.579276 201 storage/raft_transport.go:423 [n1] raft transport stream to node 2 established
I161115 09:45:54.789133 161 storage/raft_transport.go:423 [n1] raft transport stream to node 3 established
I161115 09:45:54.802167 145 sql/event_log.go:95 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:{NetworkField:tcp AddressField:hadoop241:8000} Attrs: Locality:} ClusterID:{UUID:104a7b1e-070a-4472-94fd-6c8b26850864} StartedAt:1479174351632104074}
I161115 09:45:56.082623 231 storage/replica_raftstorage.go:574 [n1,s1,r116/1:/System/tsd/cr.store.{mu…-ra…}] [n1,s1,r116/1:/System/tsd/cr.store.{mu…-ra…}]: with replicaID 1, applying Raft snapshot at index 24327 (id=989003f9, encoded size=16, 25 rocksdb batches, 4048 log entries)
I161115 09:45:57.265915 326 storage/replica_raftstorage.go:574 [n1,s1,r1/1:/{Min-System/tsd/c…}] [n1,s1,r1/1:/{Min-System/tsd/c…}]: with replicaID 1, applying Raft snapshot at index 159280 (id=9d6b22b8, encoded size=16, 41 rocksdb batches, 3124 log entries)
I161115 09:45:57.550411 231 storage/replica_raftstorage.go:577 [n1,s1,r116/1:/System/tsd/cr.store.{mu…-ra…}] [n1,s1,r116/1:/System/tsd/cr.store.{mu…-ra…}]: with replicaID 1, applied Raft snapshot in 1.468s
W161115 09:45:57.551112 231 storage/replica.go:528 [n1,s1,r116/1:/System/tsd/cr.store.{mu…-ra…}] raftMu: mutex held by github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest for 1.468760541s (>500ms):
goroutine 231 [running]:
runtime/debug.Stack(0x1d28bf8, 0x1d28c38, 0x48)
/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/pkg/util/syncutil.ThresholdLogger.func1(0x578b81dd)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:65 +0xe1
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*TimedMutex).Unlock(0xc42026d1f8)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:92 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest(0xc420397500, 0x7f6a4b6b9dc0, 0xc420212db0, 0xc4203b4d78, 0x3f459234f9039098, 0x3e23945eaf8aba90, 0x74, 0xc420212a20, 0x2e, 0x30, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2923 +0xc88
github.com/cockroachdb/cockroach/pkg/storage.(*Store).HandleSnapshot(0xc420397500, 0xc4203b4d20, 0x233e340, 0xc4207621f0, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2567 +0x6ac
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1.1(0x233e340, 0xc4207621f0, 0xc420079500, 0xc4203caf38, 0xc4203caf40)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:352 +0x14d
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1(0x7f6a4b6b9dc0, 0xc42026ff50)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:353 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc420465cb0, 0x1ffde26, 0x19, 0x162, 0x0, 0x0, 0xc420212930, 0x7f6a4b6b9dc0, 0xc42026ff50)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:264 +0xdf
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:265 +0x234
I161115 09:45:58.131184 326 storage/replica_raftstorage.go:577 [n1,s1,r1/1:/{Min-System/tsd/c…}] [n1,s1,r1/1:/{Min-System/tsd/c…}]: with replicaID 1, applied Raft snapshot in 0.865s
W161115 09:45:58.131592 326 storage/replica.go:528 [n1,s1,r1/1:/{Min-System/tsd/c…}] raftMu: mutex held by github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest for 865.770425ms (>500ms):
goroutine 326 [running]:
runtime/debug.Stack(0x1d28bf8, 0x1d28c38, 0x48)
/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/cockroachdb/cockroach/pkg/util/syncutil.ThresholdLogger.func1(0x339a9bb9)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:65 +0xe1
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*TimedMutex).Unlock(0xc42026c778)
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/timedmutex.go:92 +0x80
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest(0xc420397500, 0x7f6a4b6b9dc0, 0xc4207343c0, 0xc420e5c058, 0xb0416caeb8226b9d, 0xccb6d6981cf7f5bd, 0x1, 0x26d30f8, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2923 +0xc88
github.com/cockroachdb/cockroach/pkg/storage.(*Store).HandleSnapshot(0xc420397500, 0xc420e5c000, 0x233e340, 0xc420e961d0, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2567 +0x6ac
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1.1(0x233e340, 0xc420e961d0, 0xc420079500, 0xc4207d7f38, 0xc4207d7f40)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:352 +0x14d
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1(0x7f6a4b6b9dc0, 0xc420785c50)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:353 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc420465cb0, 0x1ffde26, 0x19, 0x162, 0x0, 0x0, 0xc420785e90, 0x7f6a4b6b9dc0, 0xc420785c50)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:264 +0xdf
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:265 +0x234
I161115 09:45:58.660700 346 storage/replica_raftstorage.go:574 [n1,s1,r132/1:{-}] [n1,s1,r132/1:{-}]: with replicaID 1, applying Raft snapshot at index 7329 (id=d2ed84e5, encoded size=16, 13 rocksdb batches, 2494 log entries)
I161115 09:45:58.848294 346 storage/replica_raftstorage.go:577 [n1,s1,r132/1:/System/tsd/cr.node.{sq…-tx…}] [n1,s1,r132/1:/System/tsd/cr.node.{sq…-tx…}]: with replicaID 1, applied Raft snapshot in 0.188s
I161115 09:45:59.559487 297 storage/replica_raftstorage.go:574 [n1,s1,r131/1:{-}] [n1,s1,r131/1:{-}]: with replicaID 1, applying Raft snapshot at index 16085 (id=5d7ff75c, encoded size=16, 18 rocksdb batches, 4282 log entries)
I161115 09:45:59.823922 297 storage/replica_raftstorage.go:577 [n1,s1,r131/1:/System/tsd/cr.{node.…-store…}] [n1,s1,r131/1:/System/tsd/cr.{node.…-store…}]: with replicaID 1, applied Raft snapshot in 0.264s
I161115 09:46:00.526882 333 http2_server.go:236 transport: http2Server.HandleStreams received bogus greeting from client: "GET / HTTP/1.1\r\nHost: 19"
I161115 09:46:01.633268 163 server/status/runtime.go:228 runtime stats: 611 MiB RSS, 130 goroutines, 103 MiB/237 MiB/362 MiB GO alloc/idle/total, 77 MiB/111 MiB CGO alloc/total, 0.00cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (25x)
I161115 09:46:01.635334 143 storage/store.go:3732 [n1,s1] sstables (read amplification = 4):
bag_index
CREATE TABLE "crebusiness.bag_index" (
sharding_code STRING(4) NULL,
ticket_all_no STRING(32) NOT NULL,
ticket_sign STRING(1) NULL,
ticket_no STRING(12) NULL,
form_code STRING(3) NULL,
ticket_type STRING(1) NULL,
trans_type STRING(2) NULL,
print_date STRING(8) NULL,
print_time STRING(6) NULL,
send_man_name STRING(64) NULL,
send_province_code STRING(2) NULL,
send_city_code STRING(4) NULL,
send_area_code STRING(6) NULL,
send_man_address STRING(128) NULL,
send_man_phone STRING(20) NULL,
send_man_fax STRING(20) NULL,
send_man_mobile STRING(20) NULL,
send_man_id_card STRING(18) NULL,
send_man_email STRING(64) NULL,
receive_man_name STRING(64) NULL,
receive_province_code STRING(2) NULL,
receive_city_code STRING(4) NULL,
receive_area_code STRING(6) NULL,
receive_man_address STRING(128) NULL,
receive_man_phone STRING(20) NULL,
receive_man_fax STRING(20) NULL,
receive_man_mobile STRING(20) NULL,
receive_man_id_card STRING(18) NULL,
receive_man_email STRING(64) NULL,
goods_name STRING(20) NULL,
num STRING(11) NULL,
weight STRING(11) NULL,
declare_price STRING(11) NULL,
passenger_num STRING(4) NULL,
train_code STRING(8) NULL,
train_date STRING(8) NULL,
train_ticket_no STRING(20) NULL,
ticket_station_code STRING(5) NULL,
create_datetime STRING(14) NULL,
update_datetime STRING(14) NULL,
cre_sheet_no STRING(32) NULL,
row_id INT NULL,
CONSTRAINT "primary" PRIMARY KEY (ticket_all_no),
UNIQUE INDEX idx_row_id (row_id),
INDEX idx_update_datetime (update_datetime),
INDEX idx_create_datetime (create_datetime),
INDEX idx_print_date (print_date, print_time),
FAMILY "primary" (sharding_code, ticket_all_no, ticket_sign, ticket_no, form_code, ticket_type, trans_type, print_date, print_time, send_man_name, send_province_code, send_city_code, send_area_code, receive_province_code),
FAMILY fam_1_send_man_address_send_man_phone_send_man_fax_send_man_mobile_receive_city_code (send_man_address, send_man_phone, send_man_fax, send_man_mobile, receive_city_code),
FAMILY fam_2_send_man_id_card_send_man_email_receive_man_name_receive_area_code_receive_man_phone_receive_man_id_card (send_man_id_card, send_man_email, receive_man_name, receive_area_code, receive_man_phone, receive_man_id_card),
FAMILY fam_3_receive_man_address_receive_man_fax_receive_man_mobile_goods_name_num (receive_man_address, receive_man_fax, receive_man_mobile, goods_name, num),
FAMILY fam_4_receive_man_email_weight_declare_price_passenger_num_train_code_train_date_train_ticket_no_ticket_station_code_create_datetime_row_id (receive_man_email, weight, declare_price, passenger_num, train_code, train_date, train_ticket_no, ticket_station_code, create_datetime, row_id),
FAMILY fam_5_update_datetime_cre_sheet_no (update_datetime, cre_sheet_no)
)
Yesterday it has this issue ( the select count for bag_index isn’t response at this time 17:34 )
E161114 17:34:05.899731 214 storage/queue.go:568 [n1,raftlog] on [n1,s1,r115/1:/Table/59/1/"102120{1K0…-9K3…}]: failed to send RPC: sending to all 3 replicas failed; last error: rpc error: code = 4 desc = context deadline exceeded
E161114 17:37:40.921581 214 storage/queue.go:568 [n1,raftlog] on [n1,s1,r115/1:/Table/59/1/"102120{1K0…-9K3…}]: failed to send RPC: sending to all 3 replicas failed; last error: rpc error: code = 4 desc = context deadline exceeded
E161114 17:43:53.169373 629354 server/admin.go:278 namespace information_schema with ParentID 0 not found
E161114 17:43:53.170946 629669 server/admin.go:278 namespace pg_catalog with ParentID 0 not found
E161114 17:43:54.184960 621246 server/admin.go:278 namespace information_schema with ParentID 0 not found
E161114 17:43:54.192395 621247 server/admin.go:278 namespace pg_catalog with ParentID 0 not found
E161114 17:50:20.995330 214 storage/queue.go:568 [n1,raftlog] on [n1,s1,r115/1:/Table/59/1/"102120{1K0…-9K3…}]: failed to send RPC: sending to all 3 replicas failed; last error: rpc error: code = 4 desc = context deadline exceeded
E161114 17:55:11.022894 214 storage/queue.go:568 [n1,raftlog] on [n1,s1,r115/1:/Table/59/1/"102120{1K0…-9K3…}]: failed to send RPC: sending to all 3 replicas failed; last error: rpc error: code = 4 desc = context deadline exceeded
Yesterday’s graph (for version beta-20161103, till this morning select count for bag_index still no response, restart that node makes it okay now )
Provided sql dump may help to re-produce the problem
Table dump and all logs files
http://www.files.com/shared/582a7fe44d75b/2016-11-15_dump_and_logs.zip