Updating locality for Follow the Workload

Let’s say you want to take advantage of Follow the Workload.

You have a datacenter with some number of nodes in New York City and a second datacenter with some number of nodes in Sydney.

CREATE TABLE IF NOT EXISTS customer (
region STRING,
username STRING NOT NULL,
password STRING NOT NULL,
PRIMARY KEY (region_id, username)
);

Assuming you create the nodes specifying the appropriate locality, and set region appropriately when creating new customer records to “New York City” or “Sydney” (or any two distinct values that correspond to the two datacenter locations), “Follow the Workload” will get automatically utilized?

Two questions:

  1. Let’s say the leaseholder range is approximately 10,000 rows. Let’s say you have 2,000 customers, half from New York City, half from Sydney. My assumption is Follow the Workload wouldn’t work here as all customers are in a single leaseholder range. Is there a way to split customers in such a way that each leaseholder range has only a single “region” field (in this case, you’d want 1 leaseholder range with the New York City customers and a second leaseholder range with the Sydney customers).

  2. If a customer in this example moves from New York City to Syndey, what is the easiest way to preserve their Follow the Workload optimization? One idea is to create a new customer record with the same username but the Sydney region, and migrate their data (to the extent other tables are referring to the customer table as a foreign key) and then delete the original row. Any issues with this approach or is there something simpler that reduces the need to migrate data?

Thank you.

Hi,

  1. You can introduce range splits using ALTER TABLE customer SPLIT AT VALUES (region_id, username) or just VALUES (region_id).

  2. Assuming we have range splits between the regions, the customer row will indeed have to be moved from one region to the other. Unfortunately because the region is part of the primary key (which it has to be if we are to split ranges between regions), you are correct that any foreign references need to be updated.

It sounds like the feature you really need is table partitioning. This is a work in progress, but the design is in this document: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170921_sql_partitioning.md

Thanks for the quick reply.

This is really helpful. Table partitioning looks like it would take care of case 2 as you say, but I’ll get going with this in the meantime. The Follow the Workload capability will save a fair amount of time as otherwise I was considering having different databases at each locality with one central database for some aspects. This simplifies that significantly.

Thanks again.