How are schema changes handled when a k8s deployment is rolled back?


I have read the docs about transparent schema changes, etc. However, I am interested to know what happens when a k8s deployment is rolled back. Naturally the schema rolls back, and any data that was there is then "lost".

What I want to know is whether there is planned support for dealing more robustly/elegantly with backing out of rolling updates for nodes that have already been updated. At this point CockroachDB deals really well with deployments ultimately succeeding, but not with rollbacks.

See also:

You should be able to successfully roll back an upgrade, provided you haven't finalized the upgrade by setting the `version` cluster setting to `crdb_internal.node_executable_version()`. Schema changes and the CRDB version are independent - you can upgrade, make a schema change, and roll back, provided the schema change does not rely on features released in the new version.
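For reference, the finalization step referenced above is a cluster setting; a sketch of what it looks like (check your version's docs for the exact form):

```sql
-- Finalize the upgrade. Until this runs (or auto-finalization kicks in),
-- rolling back to the previous binaries is still possible.
SET CLUSTER SETTING version = crdb_internal.node_executable_version();
```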

Are you running into a specific issue when trying to upgrade and revert?

The thing I am trying to get my head around is this: say I have a v1 app with a v1 database containing a single non-nullable column. The v1 app is responsible for setting the initial "default" value for that column on each row.

I then add a default constraint to that column as part of v2 of the DB, and roll out a v2 app that expects that behavior. How do I ensure the v2 app does not get to talk to a v1 database, and vice versa?


a) How does the "interleave" between app rollout and database rollout work?
b) And what happens if (a) is rolled back halfway through?
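To make the scenario concrete, here is a minimal sketch using a hypothetical `widgets` table (the table and column names are illustrative, not from the original question):

```sql
-- v1 schema: the app supplies the value of status on every insert.
CREATE TABLE widgets (
    id INT PRIMARY KEY,
    status STRING NOT NULL
);

-- v2 schema change: the database now fills in a default when the app omits it.
ALTER TABLE widgets ALTER COLUMN status SET DEFAULT 'new';
```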

Gotcha. All statements in CRDB are ACID-compliant, including schema changes. So in terms of behavior for any apps: assuming you start altering the table at T1 to add the default, and the alter job succeeds at T2, there will be no default value on any node until T2, and from T2 onward all nodes will have a default value on the column. If you cancel the alter job halfway between T1 and T2, no node will ever have applied a default value to the column, and from the app's perspective nothing will have changed.
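As a sketch, cancelling an in-flight schema change looks something like this (the job id below is a placeholder; take the real one from the `SHOW JOBS` output):

```sql
-- List running jobs, including in-flight schema changes, to find the job id.
SHOW JOBS;

-- Cancel the in-flight ALTER; the schema change is backed out atomically.
CANCEL JOB 12345;  -- placeholder id
```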

Does that help?

Hi Tim,

That part does make sense. I am still, however, trying to grapple with the applications during this process. So say we have app A across three nodes a, b, c, viz. Aa, Ab, Ac, and the DB on node Da.

If Aa is upgraded and starts the ALTER for the default, how am I best to control Ab and Ac, which are still expecting Da to be without the default? In other words, how do I make the deployment itself atomic between the application and the database changing during a Kubernetes deployment?

Forgive me if I’m missing something, but in the case of adding a default value, you wouldn’t need to manage the application, correct? I.e.:

  • T0: Aa, Ab, and Ac are all filling in the value of Column1 app-side.
  • T1: ALTER TABLE to add a default value to Column1 initiated.
  • T2: ALTER TABLE completes.

At T2, your application is still (presumably) filling in the value of Column1 for all rows. You've just made the application more resilient if a value happened to be absent due to a bug; continuing to insert explicit values into a column with a default will not throw an exception.
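In other words, assuming a hypothetical `widgets` table whose `status` column is `STRING NOT NULL DEFAULT 'new'`, both styles of insert work after T2:

```sql
-- The app can keep supplying the value, exactly as before...
INSERT INTO widgets (id, status) VALUES (1, 'active');

-- ...or omit it, in which case the database fills in the default 'new'.
INSERT INTO widgets (id) VALUES (2);
```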

As far as the general question is concerned, it would depend a whole lot on how your app is actually deployed; there's no one-size-fits-all solution. Broadly speaking, if you needed functionality to only change after an upgrade is complete across three nodes, then each node would need to:

  • know what version it’s running,
  • know what version other nodes are running,
  • if it’s on v2, and any node is on v1, then continue acting as if it’s on v1, and
  • if it’s on v2, and all other nodes are on v2, then act as if it’s on v2.
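One way to sketch that coordination in the database itself (a hypothetical pattern, not a built-in feature; all names here are made up): each app instance records its version in a table, and v2-only behavior is gated on every instance having upgraded:

```sql
-- Hypothetical coordination table; each app instance upserts its own row
-- on startup, recording the version it is running.
CREATE TABLE IF NOT EXISTS app_versions (
    node_id STRING PRIMARY KEY,
    version INT NOT NULL
);

-- Instance Aa registers itself as running v2.
UPSERT INTO app_versions (node_id, version) VALUES ('Aa', 2);

-- Before enabling v2-only behavior, check that no instance is still on v1.
SELECT min(version) >= 2 AS all_on_v2 FROM app_versions;
```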

The specifics matter a lot, though, and as I mentioned above in the case of adding a default value, there’s no need to manage versions like this; in the presence of a default value on a column, the app could provide a value for the column, or not, and it would be fine.

Hope that helps!

Thanks Tim. As I get into the specifics I will come back with more.