IMPORT csv file problem with id field


(Dave A.) #1

Greetings CRDBers!

Getting an error in IMPORT from csv file.

Lets say I have a csv that is 2 fields:
name, age
name, age
name, age

Following the instructions from the site:

IMPORT TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT,
age INT,
INDEX name_idx (name)
CSV DATA (–my local http file server–)
nullif = ‘’

I get a SQL error: Expected 3 fields, got 2.

The id field, which I am asking CDRB to create as a UUID plus the 2 fields from the file give this error.

Help would be most welcome!

Thank you.
Dave A.

(Rich Loveland) #2

Hi @DaveA,

I just ran into this issue myself when using IMPORT to pull in some CSV files.

AFAICT the issue is from the CSV-parsing side. According to the CSV parser, the file is invalid CSV, possibly due to a comma inside a field, or some other issue.

I was able to resolve the issue with my CSV files by loading them into LibreOffice Calc and then exporting them back out to disk as CSV.

I thought I had “fixed” the CSV files manually, but whatever was causing the CSV parser to barf was not visually apparent, and the IMPORT kept failing, so I had to use the spreadsheet program.

I am planning to file an issue against IMPORT to improve the CSV parser error messages - I’ll add a link to that issue here once I’ve done that.

Hope that helps!

(Raphael 'kena' Poss) #3

Hi Dave,
thanks for your interest in CockroachDB.
I have filed your feature request here:

At this point this feature is not (yet) supported by CockroachDB. There are two workarounds:

  • modify your CSV file to add a UUID column manually, or
  • import the data into a temporary table customers_tmp with only the name and age columns, then do: INSERT INTO customers (name, age) TABLE customers_tmp

The second solution should work although it will be slower if there are many rows (hundreds of thousands) in the CSV file.

(Dave A.) #4

Rich and Raphael: thank you for your responses and support, with the issue on GH.

I can tell you that I’ve done the insert into a temp table letting CRDB assign rowids and even on a laptop with 3 nodes hitting (crushing) the same disk, I am able to IMPORT more than 5 million rows of 7 column csv data in less than 1m45s. Pretty darn solid for a completely un-optimized pseudo cluster on an old laptop. This thing is going to scream in prod.

Very much looking forward to progress on the IMPORT and assign UUID issue. Thanks!

(Rich Loveland) #5

Glad to hear the IMPORT perf is meeting your needs, Dave.

As promised, here’s the issue I filed to improve the CSV parser error messages: