Setup for low multi-region latency

deployment

#1

Hi, I’m trying to understand how to achieve the best DB latency possible for a web service deployed to 3 distant geographic regions. Most workloads are regional in nature. It’s mainly organizations operating on their own isolated data.

Alternatives:

  1. The control: AWS RDS Postgres with a single master and two regional asynchronous read replicas. Reads go to regional endpoints and are quick, but writes need to reach the master. That trip plus the replication lag back to the regional replica makes for roughly 2x cross-region latency on an update.

  2. CRDB free. As I understand it, the follow-the-workload mechanism would apply. But a range might contain data from customers in different regions and see contention for regional placement. Even as customers come online with the sun, this strikes me as a fair amount of network shuffling. Depending on where ranges are, reads and writes might take multiple cross-region latencies, at least for a while.

  3. CRDB enterprise. Table partitioning would allow row pinning to a region, but the region-aware partition key needs to be an explicit column in every table. As long as that’s satisfied and there are enough regional nodes for the replication factor, all reads and writes would be local.
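
To make the comparison between alternatives 1 and 3 concrete, here is a rough back-of-envelope sketch. The latency figures and function names are made up for illustration, not measurements, and the model counts only network hops as described above (one hop to reach the master, one more for the change to replicate back to the regional replica).

```python
# Back-of-envelope latency model. All numbers are assumptions, not measurements.
CROSS_REGION_MS = 150.0  # assumed latency between two distant regions
LOCAL_MS = 2.0           # assumed latency inside a single region


def rds_update_ms(client_in_master_region: bool) -> float:
    """Alternative 1: write to the single master, then wait until the
    change is visible on the client's regional read replica."""
    if client_in_master_region:
        return LOCAL_MS
    reach_master = CROSS_REGION_MS    # the write travels to the master
    replicate_back = CROSS_REGION_MS  # async replication to the local replica
    return reach_master + replicate_back


def pinned_update_ms() -> float:
    """Alternative 3: rows pinned to the client's region, assuming enough
    regional nodes to satisfy the replication factor locally."""
    return LOCAL_MS


print(rds_update_ms(client_in_master_region=False))
print(pinned_update_ms())
```

Alternative 2 is left out on purpose: its cost depends on where the lease and replicas happen to be at the time of the request.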

Is this roughly accurate? Are there other ways to optimize latency for 2) CRDB free?

Thank you


(Rebecca Taft) #2

Hi @fabpopa,

Yes, your assessment is accurate except for your concern that follow-the-workload would cause a lot of network shuffling. Moving leases around is actually fairly cheap, so it’s not as much of a concern as you might think.

Hope this helps! Don’t hesitate to ask if you have other questions.
– Becca


#3

Hi @becca, thank you for your reply. That’s great to know.

Any plans to reconfigure ranges on the fly to better serve contention for leases? Achieving something close to row pinning without the extra location-aware column seems like a win. Though maybe I’m imagining more lease contention than actually occurs.


(Alex Robinson) #4

Can you elaborate on what you mean? It sounds like you want the system to automatically split ranges based on per-row access patterns, then place the leases for the resulting ranges intelligently?


#5

Hi @a-robinson, it’s just a thought, but yes. It seems that per-row access patterns could inform how data is placed into ranges and perhaps keep range leases in a place longer. I’m not aware of a mechanism to move data to another range on the fly at the moment, other than range splitting for scale, and was wondering if something like this was on the horizon or even a sensible thing to do.


(Alex Robinson) #6

Theoretically, yes, but in practice getting this right requires a good deal of added overhead to track the necessary information (request frequency to each row, source of requests to each row, whether the requests are point lookups for a single row or range reads) and could lead to incredibly small ranges of just a single row if taken to the extreme. It would also lead to difficult-to-predict performance behavior.

So while it would be nice, it’s not on our near-term roadmap, at least not without a location-correlated column being at the start of the primary key.
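
As a toy illustration of why the leading column matters (this is not CockroachDB code; the org IDs and region names are invented): ranges cover contiguous spans of the sorted key space, so the sketch below sorts the same rows under two hypothetical primary keys and collapses each ordering into contiguous runs of the same region.

```python
from itertools import groupby

# Hypothetical organizations and their home regions (made-up data).
ORG_REGION = {101: "eu", 102: "us", 103: "ap", 104: "eu", 105: "us", 106: "ap"}


def region_runs(regions_in_key_order):
    """Collapse a key-ordered list of regions into contiguous runs."""
    return [region for region, _ in groupby(regions_in_key_order)]


# Primary key (org_id): neighbouring keys belong to unrelated regions.
by_org_id = [ORG_REGION[org] for org in sorted(ORG_REGION)]

# Primary key (region, org_id): each region's rows sort next to each other.
by_region_then_org = [
    region for region, _ in sorted((r, org) for org, r in ORG_REGION.items())
]

print(region_runs(by_org_id))           # ['eu', 'us', 'ap', 'eu', 'us', 'ap']
print(region_runs(by_region_then_org))  # ['ap', 'eu', 'us']
```

With the region first, each region's rows form a single contiguous span that can be split off and placed as a unit; without it, rows from different regions interleave across the key space.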


#7

All right, thanks for the information!