Import into 4 node cluster fails - v1.2-alpha.20171026

A few smaller tables succeeded but then:

drop table if exists history;
DROP TABLE

Time: 127.991635ms

import TABLE history (
itemid bigint NOT NULL,
clock integer NOT NULL,
value numeric(20,4) NOT NULL,
ns integer NOT NULL
)
csv data ('nodelocal:///home/cockroach/csv/crdb/data/history.csv')
with temp='nodelocal:///home/cockroach/csv/crdb/tmp', nullif = '\N';
pq: result is ambiguous (job lease expired)
Error: pq: result is ambiguous (job lease expired)
Failed running "sql"

nodelocal:///home/cockroach/csv is on an NFS share that is mounted on all nodes.
All nodes are marked dead in the GUI.

The input is about 500M and contains 15730985 rows.

The good thing is that the nodes do rejoin the cluster after about 10 minutes, but the import still fails on me.

Any tips on this?

@ik_zelf The “result is ambiguous” error during IMPORT is a fairly poor
error message that we intend to improve. It means that the node you’re
talking to died, but that the IMPORT itself will be continued by the
cluster. You should be able to see it on the Jobs page of the admin UI, and
it should still be running.

Your nodes certainly should not have died. Is there anything in the logs
that suggests why that might have happened? It’s also worth checking
whether any were killed by the OS for running out of memory. On Linux you
can use “dmesg -T” to check this.
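
As a rough sketch of that check, the kernel log can be narrowed down to OOM-killer activity with grep. The helper name `filter_oom` is made up for illustration, and the patterns are an assumption based on typical Linux OOM-killer messages, so adjust them if your kernel words things differently.

```shell
# filter_oom: hypothetical helper, not part of any tool.
# Reads kernel log text on stdin and keeps only lines that look like
# OOM-killer activity (case-insensitive match).
filter_oom() {
  grep -Ei 'invoked oom-killer|out of memory|killed process'
}

# Typical use on a node (reading the kernel log may require root):
#   dmesg -T | filter_oom
```

If this prints nothing, the OOM killer probably was not involved and the cockroach logs are the next place to look.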

The issue where we’re tracking the error message improvement is
https://github.com/cockroachdb/cockroach/issues/19252 in case you’d like to
chime in or follow along.

Hi Dan,

dmesg -T|grep cockroach shows:

[Tue Nov  7 17:27:30 2017] cockroach invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Tue Nov  7 17:27:30 2017] cockroach cpuset=/ mems_allowed=0
[Tue Nov  7 17:27:30 2017] CPU: 0 PID: 1056 Comm: cockroach Not tainted 3.10.0-693.el7.x86_64 #1
[Tue Nov  7 17:27:31 2017] [27035]  1001 27035   300003   191612     500        0             0 cockroach
[Tue Nov  7 17:27:31 2017] Out of memory: Kill process 27035 (cockroach) score 757 or sacrifice child
[Tue Nov  7 17:27:31 2017] Killed process 27035 (cockroach) total-vm:1200012kB, anon-rss:766448kB, file-rss:0kB, shmem-rss:0kB
[Tue Nov  7 17:50:51 2017] cockroach invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Tue Nov  7 17:50:51 2017] cockroach cpuset=/ mems_allowed=0
[Tue Nov  7 17:50:51 2017] CPU: 0 PID: 16324 Comm: cockroach Not tainted 3.10.0-693.el7.x86_64 #1
[Tue Nov  7 17:50:52 2017] [16321]  1001 16321   295356   190957     534        0             0 cockroach
[Tue Nov  7 17:50:52 2017] Out of memory: Kill process 16321 (cockroach) score 754 or sacrifice child
[Tue Nov  7 17:50:52 2017] Killed process 16321 (cockroach) total-vm:1181424kB, anon-rss:763696kB, file-rss:132kB, shmem-rss:0kB

This keeps repeating every 10 minutes.
The second node has the same errors; the other 2 nodes are inaccessible over ssh.

Also: the import just failed and stopped.

These VMs have 1GB of memory.
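
To put the kernel's numbers next to the 1GB VM size, here is a small sketch that pulls the anon-rss value out of each "Killed process" line and converts it from kB to whole MiB. The helper name `rss_mib` is hypothetical, and the pattern assumes the line layout shown in the dmesg output above.

```shell
# rss_mib: hypothetical helper, not part of any tool.
# Reads dmesg lines on stdin and, for each "Killed process" line, prints
# the process name, pid, and anonymous RSS rounded down to whole MiB.
rss_mib() {
  sed -n 's/.*Killed process \([0-9]*\) (\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/\1 \2 \3/p' |
  while read -r pid name kb; do
    echo "$name pid=$pid anon-rss=$((kb / 1024))MiB"
  done
}

# Typical use on a node:
#   dmesg -T | rss_mib
```

For the two kills above this gives 748MiB and 745MiB, so each cockroach process was holding roughly three quarters of the VM's memory when it was killed.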

So that indicates that your process is indeed running out of memory. Our
production docs (
https://www.cockroachlabs.com/docs/stable/recommended-production-settings.html)
recommend at least 2GB of RAM. Try upping the VM size you’re using.

If the import failed, there should be an error message on the admin UI Jobs
page, and also in the output of “SHOW JOBS” in SQL.
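
A minimal sketch of running that statement from a shell on one of the nodes, without opening an interactive session. The wrapper name `show_jobs` is made up, and `--insecure` is an assumption about how this test cluster was started; a secured cluster would need its certificate flags instead.

```shell
# show_jobs: hypothetical wrapper around the cockroach CLI.
# Runs SHOW JOBS as a one-off statement via `cockroach sql -e`.
# Assumes an insecure cluster reachable on the default host and port.
show_jobs() {
  cockroach sql --insecure -e "SHOW JOBS"
}

# Typical use:
#   show_jobs
```

The row for the IMPORT job should show its status and, if it failed, the error text.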