Questions regarding some usages

#1

Hi,

  1. Is it possible to get sub-ms latency for in-memory, finely distributed query-by-primary-key tables? Maybe even with follower reads?
  2. I’ve seen some issues on GitHub with concurrent updates/increments. What rate of updates per second does the db safely support as of now for one particular row? (using the same table as in 1; the updated column is in its own column family and not indexed)
  3. On 2, is the concurrency on the full row, a single column, or the node? E.g. if I split the counter into multiple columns+families (still a single row), would I get better concurrency?
    3.1. Can I increase concurrency with some config, like group commit (or something else)?
    3.2. Can this be better with interleaving, since each increment is also an insert into another table (so insert into child table, increment on parent row)?
  4. Is it better for performance to work with autocommit from the drivers?
  5. Is it possible to get better select performance by reducing consistency or accepting stale results?
  6. Does the db terminate early when doing SELECT ... FROM t WHERE article_id=x AND id IN (2000 values) LIMIT 4? (primary key = (article_id, id))
  7. Can it select 10 articles by ID and, for each article, array_agg() the comment ids with a subquery like the one above?
    7.1. Can this be better with interleaving? (some child collections may grow very large, but most will have < 10K values)
  8. Is the cache a normal RocksDB block cache or some other form (serialized full rows, etc.)?
    Nodes will have 128GB RAM, 2x SSD drives, 24 cores.

Thank you,
ddorian43


(Raphael 'kena' Poss) #2

Hi ddorian!

These are very good questions, and they look like a great start for a technical conversation. I am making an attempt to answer your questions below, feel free to comment further.

Sub-ms? No. Not even with follower reads. Not even with in-memory storage.

Updates to the same table cell go to the same range and are resolved on a single node (the raft leader for that range). The QPS throughput in that case is mainly limited by the processing capacity of 1 cluster node. If replication is disabled, then it is only limited by that processing capacity; if replication is enabled, then network throughput and the processing capacity of the replication nodes also kick in.
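
To make that concrete, here is a minimal sketch (table and column names are hypothetical) of the kind of single-cell update we are discussing; every one of these statements has to go through the raft leader of the one range holding that row:

CREATE TABLE counters (id INT PRIMARY KEY, n INT);
INSERT INTO counters VALUES (1, 0);
-- every concurrent client contends on the same cell, i.e. the same
-- range and the same raft leader:
UPDATE counters SET n = n + 1 WHERE id = 1;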

In practice with commonly available GCE hardware, you’d be able to get several hundred updates per second on a single cell, if the node is not otherwise loaded with other work.

The unit of concurrency is the range. A range is a sequence of consecutive keys, so SQL rows with primary keys close to each other will be on the same ranges. All operations going to these rows will land on the node that is the raft leader for that range.
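
As a consequence, splitting a counter into multiple columns or column families of the same row does not help: the row stays in one range. What can help (a sketch of my own, with hypothetical names) is splitting the logical counter across several rows and forcing those onto separate ranges:

CREATE TABLE counter_shards (shard INT PRIMARY KEY, n INT);
INSERT INTO counter_shards VALUES (0, 0), (1, 0), (2, 0), (3, 0);
-- small tables live in a single range by default; force splits so
-- each shard gets its own range (and thus its own raft leader):
ALTER TABLE counter_shards SPLIT AT VALUES (1), (2), (3);
-- each writer picks a shard at random, spreading the contention:
UPDATE counter_shards SET n = n + 1 WHERE shard = floor(random() * 4)::INT;
-- the logical counter value is the sum over all shards:
SELECT sum(n) FROM counter_shards;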

On 3.1: I don’t understand, can you clarify?

On 3.2: again I don’t understand, can you clarify your thinking?

On 4 (autocommit): yes, because this way you avoid some client/server roundtrips.
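
Reusing the hypothetical counters table from above: an implicit, auto-committed transaction needs a single roundtrip, whereas the explicit equivalent needs three:

-- one client/server roundtrip: the statement is its own transaction
UPDATE counters SET n = n + 1 WHERE id = 1;

-- three roundtrips for the same work
BEGIN;
UPDATE counters SET n = n + 1 WHERE id = 1;
COMMIT;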

That may be true in other databases, but not with CockroachDB; I am not aware of any plans to move in that direction any time soon.

Yes. You can use EXPLAIN to verify this.
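
For example, with the table from your question 6 (primary key (article_id, id), column names assumed), you can inspect the plan with:

EXPLAIN SELECT id
  FROM t
 WHERE article_id = 42 AND id IN (2, 3, 5, 7, 11)
 LIMIT 4;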

I think so. Do you mind pasting the full valid SQL query so we can check it indeed works like you intend?

I don’t understand – what can be better? The query above only uses one table. Where do you see interleaving being useful?

The main cache is indeed the RocksDB block cache.


(Andrew Dona-Couch) #3

@knz has much deeper expertise than I, but I noticed one thing he didn’t mention.

It’s possible to improve performance under contention by using historical queries with AS OF SYSTEM TIME, see https://www.cockroachlabs.com/docs/stable/performance-best-practices-overview.html#use-as-of-system-time-to-decrease-conflicts-with-long-running-queries. If a long-running select conflicts with inserts, this can help avoid the contention by explicitly opting in to historical data for that particular query. Its impact on performance is quite limited, though: as far as I know it won’t improve select performance in the general case.
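
For example (table and column names here are just an illustration), this reads a snapshot from roughly ten seconds in the past and therefore cannot conflict with in-flight writes:

SELECT id, score
  FROM article AS OF SYSTEM TIME '-10s'
 WHERE id = 1;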


(Raphael 'kena' Poss) #4

Wow indeed I had completely blanked out on AS OF SYSTEM TIME. Very nice, thanks Andrew. Indeed it is a good answer to point 5.


#5

Hi @knz,

3.1. I was asking whether the bottleneck for concurrent updates to a single key would be the disk, and whether group commit or some other config would make it more concurrent?

3.2. Each increment is also an insert into another table. Interleaving both tables would put them mostly into one range, giving better transaction performance (in most cases a single leader instead of multiple).

  7. Something like select id, array_agg(select id from comment where article_id=article.id limit 10) from article limit 10

7.1. Can the above query be better if I put comments as a child table of article?


(Raphael 'kena' Poss) #6

I still do not understand your question, but I think I am starting to guess what this is about, see my other answer below.

I am guessing you are trying to use an increment in a table as a way to generate record IDs. If that is the case, I would recommend that you review this upcoming doc change: https://github.com/cockroachdb/docs/pull/3104

In particular, I recommend you check out the FAQ entries in that change. Can you review them and reply here whether they apply to your situation?
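
To give the flavor of it (my sketch, not the doc’s exact wording): instead of incrementing a counter row to produce IDs, use a default that generates unique values without write contention, for example:

CREATE TABLE article (
  id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
  title STRING
);
-- the generated id comes back without touching a shared counter:
INSERT INTO article (title) VALUES ('hello') RETURNING id;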

So first of all I think the proper syntax would be select id, array(select id from comment where article_id=article.id limit 10) from article limit 10.

This is a correlated subquery and is thus not yet supported by CockroachDB.

I have checked and for your use case (maximum 10 comments per article for 10 articles) even PostgreSQL would have a hard time optimizing this query.

For now I would recommend the following instead:

select a.id, array_agg(c.id)
  from (select * from article limit 10) a
  join comment c
    on a.id = c.article_id
 group by a.id

and then restrict the size of each array on the client.

Yes once you transform the query to a join like I show above, using an interleaved table will improve the performance.
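
A sketch of what the interleaved schema could look like (column definitions assumed):

CREATE TABLE article (
  id INT PRIMARY KEY,
  title STRING
);
CREATE TABLE comment (
  article_id INT,
  id INT,
  body STRING,
  PRIMARY KEY (article_id, id)
) INTERLEAVE IN PARENT article (article_id);
-- comment rows are stored next to their parent article row, so the
-- join above mostly reads from a single range per article.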


#7

I meant that doing the increment on the parent + the insert on the child should be faster, since they’ll be in one range most of the time.

I can’t restrict the array on the client since it may be too big, but I can query it differently.