I’m completely new to CockroachDB (I read the docs today for the first time) and to MariaDB+MyRocks (I’ve been trying to use it for a couple of weeks).
I read CockroachDB’s doc about Column Families (“CF”) here, but I’m not sure whether what I understood is correct.
On one hand, in the case of MyRocks/MariaDB, what I understood is that…
- …a primary index always stores the data of all table columns, whether or not they’re part of the PK (mentioned here) => therefore, any DML will always end up touching/modifying that primary index, involving the data stored in all of the table’s columns, right?
Drawing a parallel with Oracle DB, the MyRocks/MariaDB primary index seems similar to an “Index Organized Table”?
- …a CF can be defined only per index (mentioned here).
On the other hand, in the case of CockroachDB, I understand from what is written in the doc that the CF of the primary index does not (have to) include all non-primary-key columns (only the ones explicitly mentioned). Am I wrong?
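If I read the doc correctly, the families are declared directly in the CREATE TABLE statement. A minimal sketch of what I mean (table/column/family names are just examples):

CREATE TABLE example_tbl (
    a INT PRIMARY KEY,
    b INT,
    c INT,
    FAMILY f1 (a, b),  -- family containing the PK col plus "b"...
    FAMILY f2 (c)      -- ...while "c" lives in its own separate family
);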
I have a table that stores temporary data.
The table has multiple columns: the first col uniquely identifies a specific URL, and the other cols store the result of the work done at each processing step.
Separate programs (that might or might not run concurrently) are responsible for processing the data of their individual step.
Once all steps for a URL/document have been performed, the row is deleted.
Having separate cols for each processing step simplifies debugging if anything goes wrong (“aha - data looked like this during processing step #1, then it looked like that during procstep #2, then it became like that during procstep #3”).
The table definition in my MariaDB/MyRocks currently basically looks like this:
create table temp_data_tbl (
    url varbinary(1024) not null,
    step1 mediumblob not null,
    step2 mediumblob default null,
    step3 mediumblob default null,
    PRIMARY KEY (url)
) ENGINE=RocksDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
- When program1 runs it will create the row and add data to the cols “url” (max 1 KB) and “step1” (max 4 MB) - the DML of all four programs is sketched after this list.
- When program2 runs it will update the table by adding data to the col “step2” (max 4 MB).
- When program3 runs it will update the table by adding data to the col “step3” (max 1.5 MB).
- When program4 runs it will delete the row (if it has processed it successfully).
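For clarity, this is roughly the DML each program issues (just a sketch, with “?” as a placeholder for the actual values):

-- program1: creates the row
INSERT INTO temp_data_tbl (url, step1) VALUES (?, ?);
-- program2: adds the result of step 2
UPDATE temp_data_tbl SET step2 = ? WHERE url = ?;
-- program3: adds the result of step 3
UPDATE temp_data_tbl SET step3 = ? WHERE url = ?;
-- program4: removes the row once everything has succeeded
DELETE FROM temp_data_tbl WHERE url = ?;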
If I understood CockroachDB’s documentation correctly, then if I were to use…
- a CF for “url”+“step1”
- another different CF for “step2”
- and as well another different CF for “step3”
…then when e.g. “program2” updates the contents of the col “step2”, only the CF of “step2” would be updated (involving a tombstone for the old version of the row and the new version of the row being added - which would then have to go through the compaction phase sometime later), while all other CFs would be left untouched? (And the same would be true for “program3” & the col “step3”.)
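If that’s right, then the CockroachDB version of my table would presumably look something like this (a sketch - assuming BYTES is the right replacement for varbinary/mediumblob and that I’m using the FAMILY clause correctly; the family names are made up):

CREATE TABLE temp_data_tbl (
    url   BYTES NOT NULL,
    step1 BYTES NOT NULL,
    step2 BYTES NULL,
    step3 BYTES NULL,
    PRIMARY KEY (url),
    FAMILY f_url_step1 (url, step1),  -- written by program1
    FAMILY f_step2 (step2),           -- updated by program2
    FAMILY f_step3 (step3)            -- updated by program3
);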
I’m trying to write an app and until a few months ago I was happy with MariaDB+TokuDB.
Unfortunately TokuDB will be dropped by MariaDB & PerconaDB => I’m trying to find a replacement:
- excluded MariaDB’s InnoDB & Columnstore (multiple reasons).
- already using ClickHouse for another part of the workload, but it’s currently not too ideal for dealing with this kind of table (feels like “too many hacks” for frequent DML ops; this might change in the future, as they might implement kind-of-atomic update/delete ops).
- struggling with this workload on MyRocks/MariaDB, as compactions take a long time (a lot of I/O), especially when dealing with “step3” of the above example - my wild guess is that during those compaction phases the DB is dealing with the data stored in all of the other columns as well, and not only with the data stored in “step3”.