IMPORT CSV data only into an existing table

#1

Hi,

I can’t seem to find a way to import CSV data into an existing table. There is a document for importing CSV data along with table creation, but there’s none for a data-only import.

Is this feature not available yet, or did I miss something?

Here’s what I’ve tried so far, to no avail:

database> IMPORT TABLE orders CSV DATA ('https://storage.googleapis.com/cockroach-fixtures/tpch-csv/sf-1/orders.tbl.2') WITH DELIMITER = '|';                                      
invalid syntax: statement ignored: syntax error at or near "csv"
DETAIL: source SQL:
IMPORT TABLE orders CSV DATA ('https://storage.googleapis.com/cockroach-fixtures/tpch-csv/sf-1/orders.tbl.2') WITH DELIMITER = '|';
                    ^
HINT: try \h IMPORT

database> IMPORT TABLE orders (o_orderkey,o_custkey,o_orderstatus,o_total_price,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment) CSV DATA ('https://storage.googleapis.com/cockroach-fixtures/tpch-csv/sf-1/orders.tbl.2') WITH DELIMITER = '|';
invalid syntax: statement ignored: syntax error at or near ","
DETAIL: source SQL:
IMPORT TABLE orders (o_orderkey,o_custkey,o_orderstatus,o_total_price,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment) CSV DATA ('https://storage.googleapis.com/cockroach-fixtures/tpch-csv/sf-1/orders.tbl.2') WITH DELIMITER = '|';
                               ^
HINT: try \h IMPORT

Thanks


(Jesse) #2

Hi @masai,

As you’ve seen, IMPORT only works for new tables. It’s not supported for existing tables. That needs to be clearer in the docs.
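
For reference, the new-table form expects full column definitions (name plus type) inside the parentheses, which is why your second attempt errored at the first comma. A rough sketch, with column types guessed from the TPC-H schema (exact option names can vary by version):

IMPORT TABLE orders (
    o_orderkey      INT PRIMARY KEY,
    o_custkey       INT,
    o_orderstatus   STRING,
    o_total_price   DECIMAL,
    o_orderdate     DATE,
    o_orderpriority STRING,
    o_clerk         STRING,
    o_shippriority  INT,
    o_comment       STRING
) CSV DATA ('https://storage.googleapis.com/cockroach-fixtures/tpch-csv/sf-1/orders.tbl.2') WITH delimiter = '|';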

IMPORTing into existing tables is on the roadmap for a future release. In the meantime, for bulk inserting into existing tables, we recommend using multi-row INSERT statements for best performance.
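
For example, a minimal sketch of the multi-row form, with made-up values for a few of the orders columns:

INSERT INTO orders (o_orderkey, o_custkey, o_orderstatus) VALUES
    (1, 101, 'O'),
    (2, 102, 'F'),
    (3, 103, 'P');

Sending a few hundred to a few thousand rows per INSERT like this usually gives far better throughput than single-row statements.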

Hope that helps.

Best,
Jesse


#3

Hi @jesse,

Thanks for the update. I sure hope the feature will be available in the near future, because it would help us (me) load (huge) exported data into existing tables.


#4

Just thinking outside the square here, but… import into a new table and then do a
SELECT INTO [ExistingTable]
FROM [TempImportTable]

DROP [TempImportTable]


#5

SELECT INTO [ExistingTable] FROM [TemporaryTable] syntax does not exist.
INSERT INTO [ExistingTable] SELECT * FROM [TemporaryTable] is terrible for performance. It looks like CockroachDB actually constructs the INSERT statement row by row in memory before inserting, so you get hit with “command is too large” if there are too many rows, as illustrated below. Even worse, the error is not reported until the full SELECT completes (why keep selecting if the final INSERT command is already over the limit?).

insert into product select * from product_partial limit 10000;
INSERT 0 10000
Time: 884.680 ms

insert into product select * from product_partial limit 200000;
ERROR: command is too large: 83074012 bytes (max: 67108864)
Time: 13214.683 ms (00:13.215)

insert into product select * from product_partial limit 1000000;
ERROR: command is too large: 324523279 bytes (max: 67108864)
Time: 60946.455 ms (01:00.946)

Unless there is a way to import into an existing table, or to optimize INSERT INTO … SELECT, CockroachDB is not really suitable for our daily batch load.

Another needed enhancement: IMPORT should spit out bad records into a reject log or table instead of failing the whole job; that would help identify and reload bad records more efficiently. I hope all these enhancements can be made, as I do like the product.


(Raphael 'kena' Poss) #6

The correct, supported way is to use INSERT INTO … SELECT … like you did, with suitable LIMIT clauses, but also WHERE clauses to select ranges that contain fewer than (e.g.) 10,000 rows each.

Then you’d run multiple INSERTs of 10,000 rows each until you have imported the entire batch. For example:

-- each statement inserts one batch and returns the ids just inserted;
-- the ORDER BY makes the LIMIT deterministic so no rows are skipped
insert into product select * from product_partial order by product_id limit 10000 returning product_id;
insert into product select * from product_partial where product_id > <last_product_id> order by product_id limit 10000 returning product_id;
insert into product select * from product_partial where product_id > <last_product_id2> order by product_id limit 10000 returning product_id;
insert into product select * from product_partial where product_id > <last_product_id3> order by product_id limit 10000 returning product_id;
-- etc., until 0 rows are returned
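
(Each <last_product_id…> placeholder above stands for the largest product_id returned by the previous statement, i.e. the maximum of its RETURNING output. A small client-side script can read that value and feed it into the next statement.)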