Case Insensitive Collations

Hello everybody,
I’ve migrated a webapp from Mysql to CockroachDB and the migration was fairly painless but I’m stuck on one issue:

With Mysql I used the utf8_unicode_ci collation, where is the option to specify case-insensitive collation on Cockroach?

It’s a huge effort to rewrite my code to always use ‘ILIKE’ and I would rather avoid that.

Cheers,
Joe

It’s not yet documented, but we support case-insensitivity using the unicode-standard (but cryptic) locale names. You can append _u_ks_level2 to most locale names, such as en_u_ks_level2 for case-insensitive english (I’m not sure which locale mysql’s utf8_unicode_ci is using as its base).

Hi Ben,
Thanks for the info, that’s exactly what I needed.

Am I correct in assuming if I use for example the en_u_k2_level2 collation that my columns will still accept non-english characters but their sorting in an index will be less than ideal?

Although I know that most of the strings I will be storing in Cockroach are in English I can’t be sure that no-one will try to insert non-english characters.

Correct. Any characters may be used; the collation only affects how comparisons work.

So I’m trying this out and I can’t seem to get it to work:

I’m creating my table like this:

create table ci_table (
    value TEXT COLLATE en_u_ks_level2
)

As I only one one column case-insensitive

But I can’t insert into it. I get the following error from the Go PG driver:

ERROR: pq: error in argument for $2: unsupported OID 705 with format code formatText

And the following error when trying to insert from Data Grip:

ERROR: value type string doesn't match type COLLATEDSTRING of column "value"

The data_type returned from

select data_type
from information_schema.columns
where table_name = 'ci_table'

Is

STRING COLLATE en_u_ks_level2

Does anyone have an example of Case Insensitive columns working with cockroach db?

What is the exact insert statement that results in that error?

We have various tests around collated strings, all these are working: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/logictest/testdata/logic_test/collatedstring_index1

insert into ci_table (value) values ('Test');

I’ve tried with parameters and with hardcoded values like above

 pq: value type string doesn't match type COLLATEDSTRING of column "value"

I get the same thing if I use the da collation as well.

Also I’m running coackroachdb in a container tag cockroachdb/cockroach:latest

You can only insert a string value that has the same collation.

insert into ci_table (value) values ('Test' collate ...);

Ah interesting, so it doesn’t implicitly convert the collation.

Hi, I see weird behaviour when using collaction(maybe bug)

code bellow return error: “pq: TODO(eisen): cannot decode collation key:” Index aa_idx is problem from what I found. Everything work fine if I don’t create index aa_idx.

create table aaa (
id UUID PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
aa STRING COLLATE en_u_ks_level2,
bb STRING(80)
);

create index aa_idx on aaa(aa);

insert into aaa (aa) values (‘aBcD’ COLLATE en_u_ks_level2);
select * from aaa where aa=(‘abcd’ COLLATE en_u_ks_level2);

And everthing work fine if I create table only with 2 columns(with index) like bellow:

create table bbb (
id UUID PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
aa STRING COLLATE en_u_ks_level2
);

create index aa_idx on bbb(aa);

insert into bbb (aa) values (‘aBcD’ COLLATE en_u_ks_level2);
select * from bbb where aa=(‘abcd’ COLLATE en_u_ks_level2);

@martinfridrich This is definitely a bug. I’ve filed https://github.com/cockroachdb/cockroach/issues/24449.

@peter Thank you. Do you have any plans/estimation about 2.0 release date?