Context canceled error after a few mins waiting for query result

bug

#1

Hey guys,

So I’ve been playing around with CockroachDB and followed your excellent guide on setting up a 3-node Kubernetes cluster on Google Kubernetes Engine. (As an aside, please add some steps to the guide on how to set up an ingress / load balancer to expose the database. There are a lot of options for exposing it, so it would be great to get an ‘official’ way from you guys.)

Anyway…

Issue:

I ran this query in the built-in SQL client with 'kubectl exec -it cockroachdb-client-secure -- ./cockroach sql...' on a table containing 10 million rows:

select tenant_id,event_id from (select tenant_id,event_id,count(1) as total from public.events2 group by 1,2) a where total>1;

After a few mins, I get this error:
pq: communication error: rpc error: code = Canceled desc = context canceled

I’m able to run other queries, so I suspect the query is either crashing a node or being killed for some other reason. My guess is that this happens because the query is highly unoptimized: none of the fields are in an index, so the database has to scan through every row.

I’m migrating event log data from a partitioned table in Postgres. Since PG doesn’t do primary key integrity checks across partitions, there are a few records that are duplicates. So here’s what I tried:

  1. Import the CSV data into a temp table with no primary key defined
  2. Run the above query to identify the duplicates and delete them
  3. Recreate the table with a primary key and do batch inserts (since a large insert into… doesn’t work on crdb and I can only declare primary keys on table creation)
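
For anyone following along, the three steps above can be sketched roughly as below. This is only an illustration of the SQL shape: it uses Python's built-in sqlite3 as a stand-in rather than CockroachDB, and the table names events_staging and events, the payload column, and the composite key on (tenant_id, event_id) are all assumptions, not the real schema. The duplicate check is written with HAVING, which should be equivalent to the subquery form in the post.

```python
import sqlite3


def find_duplicates(conn):
    """Return (tenant_id, event_id) pairs that occur more than once in staging."""
    return conn.execute(
        "SELECT tenant_id, event_id FROM events_staging "
        "GROUP BY tenant_id, event_id HAVING count(*) > 1 "
        "ORDER BY tenant_id, event_id"
    ).fetchall()


def copy_deduplicated(conn, batch_size=2):
    """Copy one row per key into the keyed table, one transaction per batch."""
    # Reading everything up front keeps the sketch short; a real migration of
    # millions of rows would page through the staging table instead.
    rows = conn.execute(
        "SELECT tenant_id, event_id, min(payload) FROM events_staging "
        "GROUP BY tenant_id, event_id ORDER BY tenant_id, event_id"
    ).fetchall()
    for start in range(0, len(rows), batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO events (tenant_id, event_id, payload) VALUES (?, ?, ?)",
                rows[start:start + batch_size],
            )
    return len(rows)


conn = sqlite3.connect(":memory:")
# Step 1: staging table with no primary key, so duplicates load without errors.
conn.execute(
    "CREATE TABLE events_staging (tenant_id INTEGER, event_id INTEGER, payload TEXT)"
)
# Step 3 target: same columns, with an assumed composite primary key.
conn.execute(
    "CREATE TABLE events (tenant_id INTEGER, event_id INTEGER, payload TEXT, "
    "PRIMARY KEY (tenant_id, event_id))"
)
conn.executemany(
    "INSERT INTO events_staging VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 2, "b"), (2, 1, "c"), (2, 1, "c"), (2, 2, "d")],
)
conn.commit()

duplicates = find_duplicates(conn)  # step 2: identify the duplicate keys
copied = copy_deduplicated(conn)    # step 3: batched inserts into the keyed table
```

Copying one row per key into the new table avoids a separate delete pass over the staging table, at the cost of picking an arbitrary survivor (here min(payload)) when duplicate rows differ.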

Step 2 failed, so I instead spun up a temporary Postgres instance, did the above there, exported a CSV without the duplicates, and imported that into CockroachDB.

The above sounds like a bug to me. A smaller Postgres instance didn’t need long to compute it. Please let me know if I can furnish more details / logs or if I made a mistake somewhere.

Since it said rpc error, I spun up a new Google Compute Engine VM and tried the same thing there to rule out any network connectivity issues from my local computer. No luck.

Shahram


(Ron Arévalo) #2

Hey @shrumm,

That context canceled error isn’t super clear on our end. Can you send over your DDL so we can test this out?

You can send the files over to ron@cockroachlabs.com

Thanks,

Ron