Decommission problem

Hello,
I decommissioned 3 nodes this way…
cockroach node decommission 10 --certs-dir=certs --host=

I watched them drain, etc. and they now show decommissioned. All looked ok until I tried to run my Java benchmark program, and it started throwing tons of these exceptions…

org.postgresql.util.PSQLException: ERROR: result is ambiguous (error=unable to dial n11: breaker open [exhausted])

The nodes in the exceptions are the three decommissioned nodes. The cockroach logs on the remaining 9 nodes also show repeated messages about trying to reach the decommissioned nodes, but the main problem is the SQL exceptions, which unexpectedly mention these nodes.
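
For anyone hitting the same error from a JDBC client, here is a minimal sketch of how a program could recognize this class of error so it can be logged or handled separately from ordinary failures. It assumes the server reports SQLSTATE 40003 (statement_completion_unknown) for "result is ambiguous", and falls back to matching the message text shown above. The class and method names are hypothetical and are not taken from my benchmark program.

```java
import java.sql.SQLException;

// Hypothetical helper: classifies a SQLException as an "ambiguous result" error.
class AmbiguousResultCheck {
    // Assumption: SQLSTATE used for "result is ambiguous" (statement_completion_unknown).
    static final String STATEMENT_COMPLETION_UNKNOWN = "40003";

    static boolean isAmbiguous(SQLException e) {
        // Prefer the SQLSTATE when the driver provides it.
        if (STATEMENT_COMPLETION_UNKNOWN.equals(e.getSQLState())) {
            return true;
        }
        // Fallback: match the message text seen in the exception above.
        String msg = e.getMessage();
        return msg != null && msg.contains("result is ambiguous");
    }

    public static void main(String[] args) {
        SQLException sample = new SQLException(
            "ERROR: result is ambiguous (error=unable to dial n11: breaker open [exhausted])",
            STATEMENT_COMPLETION_UNKNOWN);
        System.out.println(isAmbiguous(sample));
        System.out.println(isAmbiguous(new SQLException("ERROR: syntax error", "42601")));
    }
}
```

Since org.postgresql.util.PSQLException extends java.sql.SQLException, it can be passed to isAmbiguous directly from a catch block. An ambiguous result means the statement may or may not have been applied, so blindly retrying a non-idempotent write is not safe.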

I was perhaps too ambitious; this is the 19.2 beta.
Thanks
Allen

Hey @aherndon

Are you using some kind of load balancer in front of your Cockroach cluster, or is the program set to connect to a specific gateway node? Also, have you tried this same scenario on a stable version of CRDB, since you mentioned you ran this on the beta? I would be interested in hearing more about how the program is set up to connect to the cluster. Any details you could provide could shed some light on this issue.

Let me know if it's possible to share that info here, or you can reach out to me privately by email at ricardo@cockroachlabs.com.

Cheers,
Ricardo

If I connect with a list of hosts or to my haproxy host (which is not configured to use the decommissioned hosts anymore) I get the same exceptions back to my client app. I plan to revert to 19.1.5 and will retry the decommission test.

Thanks

Allen

Hello @aherndon

Before reverting to 19.1.5, could you share a copy of the logs from the cluster where the nodes were decommissioned? You can open a ticket with us at support.cockroachlabs.com; the option to attach a file is available when creating the ticket. Also, please clarify where you were watching the nodes drain when you said:

I watched them drain, etc. and they now show decommissioned. All looked ok until I tried...

Where did the nodes appear decommissioned? Are you referring to some place in the Admin UI?

Feel free to open the ticket, and let me know when the logs have been uploaded. If you have any questions, feel free to let us know.