What's our story on compatibility between versions?

Heya, got a policy question for y’all.

  1. Say I want to replace a field in an ErrorDetail inside a pErr with a more general message. This would break mixed-version clusters, as old nodes wouldn’t understand errors from new ones and vice versa. I can populate both fields for the time being to work around this, but the question is how long we are supposed to maintain this. Just for one beta cycle? Is it OK to support upgrades from version B to B+1, but not from B to B+2?

  2. The specific field in question is NotLeaseHolderError.LeaseHolder (the replica to which a request should be redirected), which I want to upgrade to a full Lease field (so the error carries full information about the current lease, as opposed to just the lease holder). The field is already optional in the proto and the code can deal with it being missing from errors, but with some performance penalty when a lease holder moves (the DistSender wouldn’t update the LeaseHolderCache on these errors). With this in mind, one option is to just screw old clients in order to save a follow-up code cleanup.
    So the question here is, should we say it’s OK for clusters undergoing an upgrade to suffer some degree of degraded performance while the upgrade is ongoing? In other words, saying that one is supposed to finish a beta upgrade in a timely fashion after she’s started doing it, as opposed to being fine with mixed clusters indefinitely.
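
To make the "populate both fields" workaround from point 1 concrete, here is a rough Go sketch. The types are simplified stand-ins I made up for illustration, not the actual proto definitions, and the `Sequence` field and both helper functions are hypothetical. The idea is that a new node fills in both the legacy and the new field, while a reader prefers the new field and falls back to the legacy one.

```go
package main

import "fmt"

// ReplicaDescriptor and Lease are simplified stand-ins for the real protos.
type ReplicaDescriptor struct {
	NodeID, StoreID int
}

type Lease struct {
	Replica  ReplicaDescriptor
	Sequence int // hypothetical: stands for the extra lease information
}

// NotLeaseHolderError carries both fields during the transition period.
// Both are pointers because both are optional on the wire.
type NotLeaseHolderError struct {
	LeaseHolder *ReplicaDescriptor // legacy field, understood by old nodes
	Lease       *Lease             // new field, understood by new nodes
}

// newNotLeaseHolderError (hypothetical helper) populates both fields so that
// old and new readers each find the hint they know about.
func newNotLeaseHolderError(l Lease) *NotLeaseHolderError {
	holder := l.Replica
	return &NotLeaseHolderError{LeaseHolder: &holder, Lease: &l}
}

// leaseHolderHint (hypothetical helper) returns the replica to redirect to.
// It prefers the new field, falls back to the legacy one, and reports false
// when the error carries no hint at all.
func leaseHolderHint(e *NotLeaseHolderError) (ReplicaDescriptor, bool) {
	switch {
	case e.Lease != nil:
		return e.Lease.Replica, true
	case e.LeaseHolder != nil:
		return *e.LeaseHolder, true
	default:
		return ReplicaDescriptor{}, false
	}
}

func main() {
	fromNew := newNotLeaseHolderError(Lease{
		Replica:  ReplicaDescriptor{NodeID: 2, StoreID: 2},
		Sequence: 7,
	})
	fromOld := &NotLeaseHolderError{
		LeaseHolder: &ReplicaDescriptor{NodeID: 3, StoreID: 3},
	}
	for _, e := range []*NotLeaseHolderError{fromNew, fromOld, {}} {
		hint, ok := leaseHolderHint(e)
		fmt.Println(hint, ok)
	}
}
```

The follow-up cleanup I’d like to avoid is deleting the legacy field and the fallback branch once no supported version depends on them.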

@bdarnell @tobias

Our current policy is to strongly encourage but not strictly require
network-level compatibility for clusters that are temporarily running a mix
of two versions. (The encouragement mostly serves as practice; we can’t be
strict about it until proposer-evaluated KV lands.) Instead of removing
compatibility code after one beta cycle, we’ll probably leave things in
until we can remove a bunch en masse when we increase our minimum baseline.

Some performance degradation during upgrades is acceptable at this point,
and since older versions already gracefully handle the absence of a hint in
NotLeaseHolderError, just making the move without a backwards-compatible
field is OK with me.
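
For reference, the degraded path being accepted here looks roughly like the Go sketch below. It is only an illustration: `leaseHolderCache` and `handleNotLeaseHolder` are hypothetical names, not the real DistSender code, and ranges and nodes are reduced to plain integers. An old client that gets an error without the legacy hint leaves its cache untouched and has to find the lease holder some other way.

```go
package main

import "fmt"

// leaseHolderCache is a hypothetical stand-in mapping range ID to the node ID
// believed to hold the lease.
type leaseHolderCache map[int]int

// handleNotLeaseHolder mimics an older client that only knows the legacy
// hint. legacyHint is nil when a newer node no longer populates that field.
// It reports whether the cache was updated.
func (c leaseHolderCache) handleNotLeaseHolder(rangeID int, legacyHint *int) bool {
	if legacyHint == nil {
		// No hint: the cache keeps its (possibly stale) entry, so later
		// requests may hit the wrong replica again. This is the temporary
		// performance penalty during a mixed-version upgrade.
		return false
	}
	c[rangeID] = *legacyHint
	return true
}

func main() {
	cache := leaseHolderCache{1: 5}

	// Error from a new node: the legacy field is absent.
	fmt.Println(cache.handleNotLeaseHolder(1, nil), cache[1])

	// Error from an old node: the legacy field is present.
	holder := 7
	fmt.Println(cache.handleNotLeaseHolder(1, &holder), cache[1])
}
```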