Updating JSON document with value from a column

We are trying to perform a DB migration where we have some data stored in an integer column and would like to move it to a key inside a jsonb column.

CREATE TABLE test(id int, score int, metadata jsonb);
INSERT INTO test(1, 10, '{"some-key": "some-value"}'::jsonb);

SET sql_safe_updates = false;

UPDATE test SET metadata = metadata::jsonb || jsonb_build_object('score', jsonb_build_array(score, 0, 0, 0, 0, 0))::jsonb;

ALTER TABLE test DROP COLUMN score;

We have about 500k rows in the table and its taking too long to perform the above update and we were wondering if we were hitting a performance bottleneck somewhere and if yes how to circumvent that.

(P.S.: I’m not sure if its an isolated incident but we have actually ran the above query and left it running for some time - SHOW QUERIES would show it but after a random amount of time, the query just disappeared without even affecting the rows).

Hi @yashsriv,

Sorry for the delay on this. Based on what you’ve shared, the update would take a long time, and probably not even successfully commit. If we take a look at the explain statement for the update, you’ll see that it spans every row, which I assume is what you’re trying to do since you’re looking to replace every score with a jsonb value:

explain UPDATE test SET metadata = metadata::jsonb || jsonb_build_object('score', jsonb_build_array(score, 0, 0, 0, 0, 0))::jsonb;
         tree         |  field   | description
+---------------------+----------+--------------+
  count               |          |
   └── update         |          |
        │             | table    | test
        │             | set      | metadata
        │             | strategy | updater
        └── render    |          |
             └── scan |          |
                      | table    | test@primary
                      | spans    | ALL

What you could do is something along the lines of:

UPDATE test SET metadata = metadata::jsonb || jsonb_build_object('score', jsonb_build_array(score, 0, 0, 0, 0, 0))::jsonb where WHERE pk > [last_pk] ORDER BY pk LIMIT 1000

where PK stands for Primary Key, which based on your schema that you provided, hasn’t been explicitly defined. Please note, if you create a table without defining a primary key, CockroachDB uses a unique identifier for each row, which it then uses for the primary index. Because you cannot meaningfully use this unique row identifier column to filter table data, it does not offer any performance optimization. This means you will always have improved performance by defining a primary key for a table.

As for the second ALTER TABLE statement, this is rewrites the whole table it has to write one record per row, either rewriting the col-family that column was to remove it, or tombstone-ing the whole col family if it was by itself.

As for why SHOW QUERIES wouldn’t’ show the query, it’s most likely that it failed to commit, so the transaction was aborted.

Let me know if you have any other questions.

Thanks,

Hi,

I haven’t mentioned the pk or the actual table which is actually more complex than this and has a complex primary key comprising of multiple columns which aren’t ordered (think uuids).

Would it still be possible to use the order method. Also how would I obtain the last_pk after each run?

Hey @yashsriv,

It would still be possible, here’s sample table with UUID as the pk ordered by id asc:

                   id                  | first_name | last_name |            address
+--------------------------------------+------------+-----------+-------------------------------+
  00007f08-2bda-45ef-b7aa-b801f09ef619 | Jasmine    | Guerrero  | 5283 Price Path Suite 960
                                       |            |           | New Jamesside, NY 30777
  00031dc5-296c-432a-bbb3-c5e5a7c8647f | Kathy      | King      | 86612 Smith Vista Apt. 406
                                       |            |           | Jamieside, NE 66623
  0003a120-6707-457e-9946-8d7aaccbbb1b | Michael    | Gates     | 410 Hansen Parks
                                       |            |           | Susanside, FL 58931
  0003edd8-7731-4711-89fa-53ce89a9037b | Lori       | Hickman   | USNS Green
                                       |            |           | FPO AP 42287
  0004e1b5-d611-42c7-8fa2-7528063e0aed | Jessica    | Parker    | 952 Victoria Manors Suite 180
                                       |            |           | Lake Christopher, NM 05286
  000517dd-92ac-45bd-943d-845012d397f0 | Cheryl     | Lawson    | 799 Jeremy Lake Suite 457
                                       |            |           | Francesside, NM 67307
  00066618-9aba-42e0-bb0e-030fcc7816ef | Alexander  | Edwards   | 39666 Donna Rapids Suite 291
                                       |            |           | Melissachester, KS 37073
  00070590-f071-4a1a-aeef-d0368582b5c1 | Craig      | Harris    | USNS Lin
                                       |            |           | FPO AP 31807
  00072933-e219-45a5-9f30-59ecddfa0d31 | Nicholas   | Harris    | 3889 Hanna Loop
                                       |            |           | Marshallmouth, NC 69337
  00080453-b7b1-4142-9d78-4246c3b3e3c1 | Amber      | Barber    | 6683 Smith Mission
                                       |            |           | Bullockfurt, IN 74877

As for obtaining the last_pk, you could use RETURNING and pass the result to the next update statement.

You can read more about RETURNING here.

Thanks,

Ron