I have tried to perform a batch insert into CRDB using…
INSERT INTO [table_name] (cols) VALUES (v1), (v2), … ;
The data is uploaded at https://ufile.io/w5c6m (a 1.4 MB text file containing 10k rows in a single batch).
CREATE TABLE product_order_table (
    country STRING,
    region STRING,
    order_id INTEGER,
    id UUID DEFAULT gen_random_uuid(),
    order_date DATE,
    ship_date DATE,
    sales_channel STRING,
    order_priority STRING,
    item_type STRING,
    units_sold INTEGER,
    unit_price FLOAT4,
    unit_cost FLOAT4,
    total_revenue FLOAT8,
    total_cost FLOAT8,
    total_profit FLOAT8,
    PRIMARY KEY (country, order_id, id)
);
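To make the shape concrete, each batch is a single multi-row statement like the one below. The values here are placeholders I made up, not rows from the actual file; id is omitted so the DEFAULT fills it in.

INSERT INTO product_order_table
    (country, region, order_id, order_date, ship_date, sales_channel,
     order_priority, item_type, units_sold, unit_price, unit_cost,
     total_revenue, total_cost, total_profit)
VALUES
    ('Iceland', 'Europe', 100001, '2017-01-15', '2017-01-20', 'Online',
     'H', 'Fruits', 500, 9.33, 6.92, 4665.0, 3460.0, 1205.0),
    ('France', 'Europe', 100002, '2017-02-01', '2017-02-05', 'Offline',
     'M', 'Cereal', 200, 205.7, 117.11, 41140.0, 23422.0, 17718.0);
    -- the real file has 10k such tuples in one statement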
Time: 879.121506ms    <--- time from CRDB

real    8m55.326s     <--- time measured from shell
user    12m9.521s
sys     0m8.550s

(input file: /tmp/product_order/part-00001)
The following is what I did to write to CRDB.
time ./cockroach sql --insecure --database=sales < $file
- There seems to be quite a bit of overhead before the insert operation itself. In particular, I am not sure where the 12 minutes of user time goes; I understand that reading the file takes some time, but this seems excessive (see the timing sketch after this list).
- I also noticed that writing the same data to an interleaved table is at least 10x slower.
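For reference, this is the kind of check I have in mind for isolating the overhead: regenerate the same 10k rows at a few different batch sizes and time each load (the batch_*.sql file names are hypothetical):

for f in batch_100.sql batch_1000.sql batch_10000.sql; do
    # each file holds the same total rows, split into INSERTs of 100/1000/10000 tuples
    echo "== $f =="
    time ./cockroach sql --insecure --database=sales < "$f"
done

My assumption is that if the user time stays near 12 minutes regardless of batch size, the bottleneck is client-side statement parsing rather than server-side work.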
To improve insert performance for interleaved tables, should the INSERT statements be batched per parent table key?
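To make the question concrete, here is a made-up miniature of what I mean; the parent/child tables and column set are hypothetical, not my actual schema:

-- hypothetical interleaved pair, for illustration only
CREATE TABLE country_parent (
    country STRING PRIMARY KEY
);

CREATE TABLE order_child (
    country STRING,
    order_id INTEGER,
    id UUID DEFAULT gen_random_uuid(),
    units_sold INTEGER,
    PRIMARY KEY (country, order_id, id)
) INTERLEAVE IN PARENT country_parent (country);

-- "batched per parent key": every tuple in a batch shares one country,
-- instead of mixing countries within a single INSERT
INSERT INTO order_child (country, order_id, units_sold)
VALUES ('Iceland', 100001, 500),
       ('Iceland', 100002, 900);

INSERT INTO order_child (country, order_id, units_sold)
VALUES ('France', 200001, 120),
       ('France', 200002, 340);

Would grouping the batches like this be expected to recover most of the 10x difference I am seeing?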