Is it (or will it be) possible to set a cluster to read-only?


(Pól Ua Laoínecháin) #1

Hi all,

My use case is kind of special - it involves genomic data which doesn’t normally change frequently. Many genomic systems (InterMine, Flybase) use PostgreSQL and many are very happy with it - also SDB (Saccharomyces DB) and MGB (Mouse Genome dB) recently moved from Oracle to PostgreSQL (a trend that I would like to see accelerating - and from there ultimately to CockroachDB :grinning:).

Now, I’m interested in CockroachDB for its scaling but am concerned about performance. Our system is updated approx. every 2 months or so which leads me to my question.

Is it possible to set tables and/or the entire db to be read-only to avoid/reduce inter-node communication traffic during “normal service” - we could then turn this off for updates during a maintenance window?

Pól…


(Tim O'Brien) #2

Hey @Dragam,

We don’t have a “read only” feature for the database, but there might be other ways to accomplish what you’re looking for. The databases should really only be sending a fair amount of data between nodes if a node goes down. It’d be helpful to know what issues you see with leaving the database at rest as is.

Thanks,

Tim


(Pól Ua Laoínecháin) #3

Blockquote: It’d be helpful to know what issues you see with leaving the database at rest as is.

Hi Tim, and thanks for getting back to me.

I think I see now that my question was based on a misunderstanding (at least as I understand things now after having read a little more - you can’t beat the old RTFM… :slight_smile: )!

I was thinking that if I have 3 nodes in my cluster and a load of data (not being updated, just queried) that there might still be a lot of traffic between nodes, but now, AIUI, if there’s no write traffic, the system “at rest” (in the writing sense) will have very little work to do in terms of inter-node traffic?

I suppose, I’m still wondering what sort of overhead there is associated with inter-node traffic, but I think it should be minimal? Any clear references/URLs/whatever showing this would be appreciated.

Rgs,

Pól…


(Tim O'Brien) #4

Hey @Dragam,

In 2.0, reads are executed by the leaseholder for a given range. Say you have a three node cluster with three ranges, each node is a leaseholder for one range:

n1: leaseholder r1
n2: leaseholder r2
n3: leaseholder r3

If a read for r1 was made to n3, n3 would send a request to n1 for the read, and n1 would send it back to n3. Given that the leaseholder is also the executor of any writes, this ensures that we don’t accidentally read stale data.

If you want to avoid unnecessary network traffic, there are a couple options. In 2.1, we’ll allow follower reads, which will allow followers to perform historical reads (i.e.: queries using AS OF SYSTEM TIME). If you set your replication factor equal to the number of nodes and read as of a second ago, you’d ensure a) every node has a copy of the data, and b) any node can execute the read without making a call to another node. There will still be some chatter between nodes, especially if a node becomes unavailable for a time, but I’d certainly expect less chatter.

If you want to test this out, the feature’s available in the most recent 2.1 beta.

Best,

Tim


(Tim O'Brien) #5

Hey @Dragam, just following up here since I found out follower reads have not been fully implemented in 2.1 - I misunderstood the scope of the RFC and PR.

Given that follower reads won’t be available, you still have some options - partitioning the table may help if there’s a defined access path. I’d recommend running through our performance tuning tutorial to get familiar with some common methods that might help.

Best,

Tim