Doubts about data read availability

Hello.

I can’t find clear documentation about this.
What happens if a node gets isolated? Could a local connection still read the data the node hosts even if the node cannot connect to the rest of the cluster? Is there a way to achieve this?

Thanks.

Hi @kedare,

Thanks for the question. CockroachDB is a CP system (see here), so a lot of the answers to your questions fall out of that.

What happens if a node gets isolated?

When a node is isolated, it won’t be able to write because it won’t be able to establish a quorum. This is necessary to maintain consistency in the presence of partitions.

Could a local connection still read the data the node hosts even if the node cannot connect to the rest of the cluster?

CockroachDB uses read leases to allow part of a replication quorum to read at the present time without coordinating with others. In order to maintain a read lease, a node must periodically make contact with other nodes. When a node is partitioned, it may or may not have an existing lease on a certain part of the data (if that data has been replicated to it at all). If the node does not have a lease on the data, then it won’t be able to read it. This is required for correctness because the node may have missed some writes. If the node does have a lease on the data, it will temporarily be able to read the data, but once the lease expires (O(seconds)) it will stop being able to read. Again, this is necessary to provide strong consistency.
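
The lease rule described above can be sketched as a toy model. This is only an illustration of the reasoning, not CockroachDB's actual implementation; the `Lease` class, the `can_serve_read` function, and the timestamps are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical stand-in for a read lease on one piece of data."""
    expires_at: float  # time (in seconds) after which the lease is no longer valid


def can_serve_read(lease: Optional[Lease], now: float) -> bool:
    """Return True if a node may serve a present-time read locally.

    A node with no lease must refuse, because it may have missed writes.
    A node with a lease may read only until that lease expires.
    """
    if lease is None:
        return False
    return now < lease.expires_at


# Made-up scenario: the node is partitioned while holding a lease that expires at t=109.
lease = Lease(expires_at=109.0)
print(can_serve_read(lease, now=105.0))  # lease still valid
print(can_serve_read(lease, now=110.0))  # lease expired; it cannot be renewed while isolated
print(can_serve_read(None, now=105.0))   # the node never held a lease on this data
```

The point of the sketch is that an isolated node's ability to read is bounded by the lease it already holds, since it cannot contact other nodes to renew it.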

However, CockroachDB does support “time travel queries”, which allow a query to read at a historical timestamp. We are currently working on a new feature called “follower reads”, which will allow time travel queries to read data even on nodes that don’t have a lease to read at the present time. One effect of this is that nodes that are partitioned off from other nodes will be able to continue serving reads, as long as those reads are sufficiently far in the past (O(seconds)) that it would be impossible for the majority partition to have affected their result.
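
As a rough illustration of what a time travel query looks like from the application side, here is a minimal sketch that builds a `SELECT ... AS OF SYSTEM TIME` statement reading a fixed number of seconds in the past. The helper name `time_travel_select` and the `accounts` table with its columns are hypothetical, and the sketch assumes a CockroachDB version that accepts a relative interval such as `'-10s'` in the clause; an explicit timestamp could be substituted otherwise.

```python
from typing import Sequence


def time_travel_select(table: str, columns: Sequence[str], seconds_ago: int) -> str:
    """Build a SELECT that reads `table` as it was `seconds_ago` seconds in the past.

    The table and column names are assumed to be trusted identifiers;
    this sketch does no quoting or escaping.
    """
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive for a historical read")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} AS OF SYSTEM TIME '-{seconds_ago}s'"


# Hypothetical table and columns, used only for illustration.
print(time_travel_select("accounts", ["id", "balance"], 10))
# prints: SELECT id, balance FROM accounts AS OF SYSTEM TIME '-10s'
```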

You can read more about this in our FAQ: https://www.cockroachlabs.com/docs/v2.0/frequently-asked-questions.html and in our architecture documentation: https://www.cockroachlabs.com/docs/v2.0/architecture/replication-layer.html.

Please let me know if you have any other questions.

Hello Nathan.

Thank you for your answer.

Do you know if there is any plan to have configurable per connection/query consistency level?
I think in some cases that would be very interesting, and most distributed systems allow something equivalent (Cassandra/Scylla, MongoDB, etc.).

In my case I would like to be able to do a best-effort local read if the cluster is not healthy. Would there be a way to achieve this, or is there any plan to add one? (Assuming there is a local copy of the data, of course.) My workload is one where not being able to serve read queries is the worst possible scenario, and reading stale data is better.

I am not 100% sure the “follower reads” feature would allow that, as it looks like it would explicitly require reading with a “SELECT AS OF”? Or would the example of “AS OF SYSTEM TIME STALE” do it?

PS: Thank you for your amazing product, and sorry, I just saw that my first message was full of typos.

Thank you.

Do you know if there is any plan to have configurable per connection/query consistency level?

No, we do not. We firmly believe that weaker consistency levels prioritize performance over the safety of your data. This is especially true of any database that maintains referential integrity across multiple indices, tables, and constraints, as a SQL database does. We actually wrote a blog post about this that you might be interested in: https://www.cockroachlabs.com/blog/acid-rain/.

In my case I would like to be able to do a best-effort local read if the cluster is not healthy. Would there be a way to achieve this, or is there any plan to add one? (Assuming there is a local copy of the data, of course.) My workload is one where not being able to serve read queries is the worst possible scenario, and reading stale data is better.

I am not 100% sure the “follower reads” feature would allow that, as it looks like it would explicitly require reading with a “SELECT AS OF”? Or would the example of “AS OF SYSTEM TIME STALE” do it?

You’re absolutely on the right track! An AS OF SYSTEM TIME STALE clause (or something of the sort) is how you would accomplish this in CockroachDB, and it is on our roadmap. The important thing to note about this approach to a stale read is that it is still consistent: it is simply served at the latest time at which a consistent read is possible.
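
For the "best-effort" workload described in the question, one possible application-side pattern is to attempt a present-time read first and fall back to a historical read if it fails. The sketch below is an assumption about how a client might structure this, not a CockroachDB feature or API: `read_best_effort` and the two callables are hypothetical, and whether the historical read actually succeeds on an isolated node depends on the follower reads work mentioned earlier in this thread.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def read_best_effort(fresh_read: Callable[[], T], stale_read: Callable[[], T]) -> T:
    """Try a present-time read; if it fails, fall back to a historical read.

    Both callables are supplied by the application (hypothetical here).
    The fallback result is older, but still a consistent snapshot.
    """
    try:
        return fresh_read()
    except Exception:
        # Assumed failure mode: the present-time read errored or timed out,
        # for example because the node could not reach the rest of the cluster.
        return stale_read()


# Stand-in callables that simulate an unhealthy cluster.
def failing_fresh_read() -> str:
    raise TimeoutError("present-time read could not complete")


def historical_read() -> str:
    return "snapshot from a few seconds ago"


print(read_best_effort(failing_fresh_read, historical_read))
```

In a real client, the first callable would run the normal query with a timeout and the second would run the same query with an AS OF SYSTEM TIME clause.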