Import from csv fails on NULL data for int types

(Ronald) #1

I am trying to get migrate a real application from postgres to crdb. There are columns that are defined as integer and NULL, that allow NULL values to appear. I created a dump file using the postgres COPY to csv file:
“COPY $SCHEMA.$TBL TO STDOUT WITH CSV” $DB > $TBL.csv

during import I get:
pq: nodelocal:///Users/ronr/Documents/werk/zabbix/data/hosts.csv: row 4: parse “proxy_hostid” as INT: could not parse “” as type int: strconv.ParseInt: parsing “”: invalid syntax

drop table if exists hosts;

import TABLE hosts (
hostid bigint NOT NULL,
proxy_hostid bigint NULL,
host varchar(128) NOT NULL,
status integer NOT NULL,
disable_until integer NOT NULL,
error varchar(2048) NOT NULL,
available integer NOT NULL,
errors_from integer NOT NULL,
lastaccess integer NOT NULL,
ipmi_authtype integer NOT NULL,
ipmi_privilege integer NOT NULL,
ipmi_username varchar(16) NOT NULL,
ipmi_password varchar(20) NOT NULL,
ipmi_disable_until integer NOT NULL,
ipmi_available integer NOT NULL,
snmp_disable_until integer NOT NULL,
snmp_available integer NOT NULL,
maintenanceid bigint NULL,
maintenance_status integer NOT NULL,
maintenance_type integer NOT NULL,
maintenance_from integer NOT NULL,
ipmi_errors_from integer NOT NULL,
snmp_errors_from integer NOT NULL,
ipmi_error varchar(2048) NOT NULL,
snmp_error varchar(2048) NOT NULL,
jmx_disable_until integer NOT NULL,
jmx_available integer NOT NULL,
jmx_errors_from integer NOT NULL,
jmx_error varchar(2048) NOT NULL,
name varchar(128) NOT NULL,
flags integer NOT NULL,
templateid bigint NULL,
description text NOT NULL,
tls_connect integer NOT NULL,
tls_accept integer NOT NULL,
tls_issuer varchar(1024) NOT NULL,
tls_subject varchar(1024) NOT NULL,
tls_psk_identity varchar(128) NOT NULL,
tls_psk varchar(512) NOT NULL,
PRIMARY KEY (hostid)
)
csv data (‘nodelocal:///Users/ronr/Documents/werk/zabbix/data/hosts.csv’)
with temp=‘nodelocal:///tmp/tmp.txt’;

BTW: using crdb 1.1.1 on osx
Can this be fixed? This is just one of the many tables.

0 Likes

(Tobias Schottdorf) #2

Hi Ronald, thanks for your report! I’m taking a look at this. Will check back in shortly.

0 Likes

(Tobias Schottdorf) #3

Could you try using the nullif option? That looks like it does what you’d like.

https://www.cockroachlabs.com/docs/stable/import.html#nullif

0 Likes

(Ronald) #4

Thanks Tobias,

I also already tried using the nullif clause but that maken things even worse. In postgres an empty string is not null. Some tables have empty strings and NOT NULL for some columns,

0 Likes

(Ronald) #5

here is a row from the input csv:
10144,,Template SSL Cert Check External,3,0,"",0,0,0,0,2,"","",0,0,0,0,,0,0,0,0,0,"","",0,0,0,"",Template SSL Cert Check External,0,,"",1,1,"","","",""

The second column is a number (empty)
The sixth column is an empty string (quoted)

I think they should be handled differently by the import. NULLIF would work OK for this number but not for the empty strings.

0 Likes

(Tobias Schottdorf) #6

That’s a good point. I condensed this schema a little,

create table foo2 (id int primary key not null, proxy_hostid bigint NULL, nullstr varchar null, notnullstr varchar not null);
insert into foo2 values(1, NULL, NULL, '');
COPY foo2 TO STDOUT WITH CSV;
1,,,""

and it seems that what should happen is that nullif distinguishes the empty string from the empty quoted string and recognizes the former as empty but not the latter.

@mjibson, could you take a look?

0 Likes

(Tobias Schottdorf) #7

To complete this:

import table foo.foo3(id int primary key not null, proxy_hostid bigint NULL, nullstr varchar null, notnullstr varchar not null) csv data('nodelocal:///Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/import.csv') with temp='nodelocal:///tmp/foo', into_db='foo', nullif='';
pq: generate insert row: nodelocal:///Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/import.csv: row 1: null value in column "notnullstr" violates not-null constraint
0 Likes

(Matt Jibson) #8

I think the CRDB behavior is correct and probably shouldn’t change. It would be fairly difficult to change, anyway, since the CSV format is well defined, and a quoted empty string is the same as an empty string. Also, other users could reasonably expect that an empty string is empty either way, so the nullif option should work equivalently.

However, I do have a workaround for you: use the NULL option in your COPY command that produces the data. It is documented here: https://www.postgresql.org/docs/current/static/sql-copy.html. That should export the null ints as something specific, but allow keeping empty strings as strings.

0 Likes

(Ronald) #9

And even more reasons to prefer something like fdw to get data from other databases. Messing with files is pretty awkward.

0 Likes

#10

@mjibson
I also encounter this problem when using benchmarksql generate tpc-c csv data file, and use import grammer to import the csv file.
maybe crdb should compare with postgresql for how to handle the NULL value when importing csv file

0 Likes

(Ronald) #11

Well, I got this running using nullif and the copy NULL as '\N’
Now keep fingers crossed and hope that \N is not in the data … more and more reasons to prefer FDW as a means to access foreign data. Files are just so …

0 Likes

(Matt Jibson) #12

Thanks for bringing this up more. I now agree this is a bug and not just an oddity.

I was under the incorrect assumption that postgres would parse non-quoted and quoted empty strings identically, but that was wrong. During copy in only the non-quoted string becomes a NULL, while a quoted empty string becomes an empty string. I’ve opened https://github.com/cockroachdb/cockroach/issues/19743 to track this.

0 Likes