Error message "Command is too large" when performing "insert"

Hi all

Problem:
I’m trying to transfer all data from a source table to a target table (which is empty).
When I execute an SQL statement similar to “insert into X select * from Y” I consistently get the following error message:

ERROR: command is too large: 69082049 bytes (max: 67108864)

My source table has ~130 million rows; the total size when exported as uncompressed CSV is ~40GB.

Question:
I used the SQL statement “IMPORT TABLE…” to initially import all the values into a (for me temporary) table, and that worked fine.
How can I copy all that data internally within CRDB from table “A” (all data) to table “B” (empty) without running into the above-mentioned error?
(the source & target tables would not be identical → I have to perform some transformations on some columns)

Thank you

I think this old issue may be relevant.

Can you try CREATE TABLE AS instead of INSERT INTO?
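
Roughly something like this (just a sketch; the column names are placeholders, and any transformations you need go into the SELECT list):

CREATE TABLE b AS
    SELECT id, col_a, lower(col_b) AS col_b
    FROM a;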

If that doesn’t work, I would recommend batching the inserts.
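
For example, something along these lines (just a sketch, assuming a hypothetical integer primary key id on the source table; pick ranges small enough that each individual INSERT stays well under the limit):

INSERT INTO b (id, col_a, col_b)
    SELECT id, col_a, lower(col_b)
    FROM a
    WHERE id > 0 AND id <= 1000000;

-- then repeat with the next id range (1000000 < id <= 2000000, and so on) until all rows are copied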

Thank you! :+1:
I’ll try to execute that tomorrow
(currently my DB seems to be very busy rolling back the previously failed insert, which seems to need at least twice the time the failed insert itself took, and I’m too tired to dump the whole DB once more, reload all the data, and retry the transfer)

So, I tried twice with “CREATE TABLE AS”, but it didn’t work. I start CRDB with the parameters…

--cache=4GiB \
--max-sql-memory=2GiB \

…but when I execute “CREATE TABLE AS”, the CRDB process very quickly grabs all 16GiB of RAM available on the VM, then the VM starts swapping a few GiB, and in the end the CRDB process is killed.

As a workaround I ended up using the “IMPORT TABLE” statement (re-importing from a MariaDB export).
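
For reference, the statement had roughly this shape (the column list and file location here are just placeholders, not what I actually used):

IMPORT TABLE b (
    id INT PRIMARY KEY,
    col_a STRING,
    col_b STRING
)
CSV DATA ('nodelocal://1/mariadb_export.csv');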

Question 1:
After the “IMPORT TABLE” statement finishes, I end up with ~6700 files in the filesystem using a total of 16GiB, with all “*.sst” files being at most 5MiB → is it normal for the files to be that small?
I think I read somewhere during the previous days that the files should be ~60MiB (maybe even bigger) → is the SST file size something that I have to tune/set somewhere, or is the small size perhaps a consequence of using “IMPORT TABLE”?

Question 2:
In your online documentation, in the menu list on the left, if I scroll to “CockroachDB → Reference → SQL → SQL Statements” I see that the entry “EXPORT” has “(Enterprise)” next to it.
Does that mean that the “EXPORT” statement is available only if an Enterprise license is active? I don’t see this mentioned in the Enterprise Features section.

Thank you :slight_smile:

Hi Stefano! Great questions.

Question 2:
In your online documentation, in the menu list on the left, if I scroll to “CockroachDB → Reference → SQL → SQL Statements” I see that the entry “EXPORT” has “(Enterprise)” next to it.
Does that mean that the “EXPORT” statement is available only if an Enterprise license is active? I don’t see this mentioned in the Enterprise Features section.

As of 20.2, EXPORT no longer requires an enterprise license. We will remove the “(Enterprise)” label next to EXPORT in the sidebar, since it’s no longer an enterprise feature, and clarify some of the text on that page. Thanks for pointing this out!
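
For reference, a basic export looks something like this (the destination path is just an example):

EXPORT INTO CSV 'nodelocal://1/export/'
    FROM TABLE a;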

I’ll follow up with an answer for question one.

Question 1:
After the “IMPORT TABLE” statement finishes, I end up with ~6700 files in the filesystem using a total of 16GiB, with all “*.sst” files being at most 5MiB → is it normal for the files to be that small?
I think I read somewhere during the previous days that the files should be ~60MiB (maybe even bigger) → is the SST file size something that I have to tune/set somewhere, or is the small size perhaps a consequence of using “IMPORT TABLE”?

Pebble will compact the files into larger files over time, but if you want to try to reduce the file count, you could adjust kv.bulk_ingest.batch_size upwards. kv.bulk_ingest.batch_size defaults to 16MiB before compression, so we often end up seeing files that are more like 2MiB post-compression. If you’re looking at the files Pebble writes on your filesystem, those sizes are after compression.

If you want to experiment with different settings, we’d suggest bumping kv.bulk_ingest.batch_size up to 64MiB to see if you like that any better.
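
For example (64MiB here is just the value suggested above, not a general recommendation):

SET CLUSTER SETTING kv.bulk_ingest.batch_size = '64MiB';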

Thanks a lot to all of you for being always very helpful and for the quick (and relevant) answers :+1:

Concerning the file count vs. file size:
thanks a lot for that information.

I honestly don’t know what would be “right” in the context of CRDB - I was just wondering about it, as usually having so many small files would mean that something isn’t working correctly → I was wondering whether that was going to be a permanent situation.

I’m absolutely fine if, as you wrote…

Pebble will compact the files into larger files over time

… → it would be good to mention that behaviour in the docs (maybe mentioning those settings coupled with some recommendations).

I didn’t have problems during the load, so I won’t dare to touch those settings :stuck_out_tongue:, but I might change them in the future if I end up having a lot more data to be loaded (to avoid causing problems on the OS/filesystem-level).

Concerning the documentation in relation to “EXPORT”:
Thanks a lot - that kind of limitation would have been a NO-GO in my case (I would definitely get an Enterprise license, at least to get priority investigation & fixes for bugs, if my project ended up being successful and generating money, but not being able to export data would have excluded this DB as a candidate a priori).

Cheers :grinning:

Thanks Stefano! I’ve passed your feedback along to the docs team :slight_smile: