Error when importing data from cockroach dump output

(Jeremy Doupe) #1

When trying to import records that contain “large” data, I’m experiencing the following errors:
(1) Using IMPORT PGDUMP syntax:

Failed running "sql"```
(2) Using `cockroach sql < dump.sql`:
```driver: bad connection Error: driver: bad connection
Failed running "sql"```

It appears if any `INSERT` statement is over 16MB in length, it will fail with the above error.

It would seem a valid fix would be to allow specifying the number of values to batch per INSERT when performing the dump, or even to have `dump` itself detect the statement length and batch appropriately - even if that is extremely wasteful, e.g. something like `floor(100,16MB/maxRowLength)`. Of course, 100 may be too small for some datasets, and being able to set it larger would also help those datasets get loaded faster.
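As a rough illustration of the batching idea, here is a minimal SQL sketch; the table name `t`, its columns, and the batch size of 100 are hypothetical and not anything `cockroach dump` emits today.

```sql
-- Current behaviour (schematic): one INSERT carrying every row, which can
-- exceed the ~16MB statement limit described above.
-- INSERT INTO t (id, payload) VALUES (1, '...'), (2, '...'), /* ... */ (500000, '...');

-- Batched form: the same rows split across INSERTs of at most 100 rows each,
-- so no single statement approaches the limit.
INSERT INTO t (id, payload) VALUES (1, '...'), (2, '...') /* ... up to row 100 */;
INSERT INTO t (id, payload) VALUES (101, '...'), (102, '...') /* ... up to row 200 */;
-- ...and so on until all rows are written.
```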

(Ron Arévalo) #2

Hey @jdoupe,

Could you send over the dump file that you’re trying to import? You can email it directly to me here.

Thanks,

Ron


(Ron Arévalo) #3

Hey @jdoupe,

Just wanted to continue this back on the forum so we have a record of it for future reference.

For the second part of your issue, importing from the command line, it appears that you are running into issue #25524, which you seem to have already commented on. I’ve followed up on that issue to see if we can get someone to look into it now that we’ve been able to reproduce it on our end.

Regarding IMPORT PGDUMP, we created a new GitHub issue that you can follow here. As a workaround, I would suggest splitting the INSERTs into multiple files. I was able to run IMPORT PGDUMP successfully with only one INSERT per file; it may be able to handle more than one, but that is what I know worked.
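For example, here is a hedged sketch of the split-file workaround, assuming each split file is self-contained (the CREATE TABLE plus a single INSERT) and that the files have been placed where a nodelocal URL can reach them; the file names are hypothetical:

```sql
-- First split file: CREATE TABLE plus one INSERT, loaded via IMPORT PGDUMP.
IMPORT PGDUMP 'nodelocal:///dump_part1.sql';

-- Remaining split files only contain INSERTs into the now-existing table, so
-- they can be fed through the SQL client instead (each INSERT kept well under
-- the size limit discussed above):
--   cockroach sql --insecure < dump_part2.sql
--   cockroach sql --insecure < dump_part3.sql
```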

Let me know if you have any other questions.

Thanks,

Ron


(Ron Arévalo) #4

@jdoupe,

Just wanted to follow up: we actually allow you to control the max_row_size for imports.

You can read the documentation on how to change that here.

Setting it to 25MB worked for me.
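For reference, a minimal sketch of what that looks like, assuming the option takes a byte-size string as shown in the linked docs; the dump file name/URL is hypothetical:

```sql
-- Raise the per-row/statement size limit for this import to 25MB.
IMPORT PGDUMP 'nodelocal:///dump.sql' WITH max_row_size = '25MB';
```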

Just wanted to update this response: it seems this still will not work even when changing the above setting. We’ve filed the issue and are looking into it.

Thanks,

Ron


(Ron Arévalo) #5

Hey @jdoupe,

After some digging into this by our engineers, setting the max_row_size to 25MB should be fine.

One caveat is that there are some duplicate values that cause this error:

```
pq: SST creation error at /Table/55/1/"\x04\xdei0\b\x83L\b\x8e\xf4\xaa\xba\xa9\xbaR\xd7"/"1"/0; this can happen when a primary or unique index has duplicate keys: Invalid argument: Keys must be added in order
```

This means there are multiple rows with the same key in the primary key (id, version).

The two duplicated values are:

- (04de6930-0883-4c08-8ef4-aabaa9ba52d7, 1) shows up twice
- (096b1ee8-c6a7-4856-aef4-5f04fddb1387, 1) shows up 6 times
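If it helps, a query along these lines (table and column names are hypothetical, matching the (id, version) primary key above) can be run against the source data to list every duplicated key before re-running the import:

```sql
-- List each (id, version) pair that appears more than once in the source data.
SELECT id, version, count(*) AS copies
FROM source_table
GROUP BY id, version
HAVING count(*) > 1
ORDER BY copies DESC;
```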

Thanks,

Ron


(Jeremy Doupe) #6

Update: The max_row_size setting does indeed fix this particular issue.

Thanks!
