German umlaut characters saved as escaped UTF-8

I am converting my PHP and PostgreSQL based business simulation to CockroachDB. I observed something interesting.
When I insert a row into a table with the following command

insert into tlizeinst values (0,  'LOCDEV', 'Local development',   array['Hölderlinstrasse 13','na','na','na'], 'na', 'na', 0, 'na', 'na');

The German character “ö” is stored in the database as \u00F6. This is visible when I use the command

SELECT * FROM tlizeinst;
| isn |  iden  |     instname      |                    adre                     | plz | ort | tlizelaen_isn | emaildomadoze | emaildomastud |
+-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+
|   0 | LOCDEV | Local development | {"H\u00F6lderlinstrasse 13","na","na","na"} | na  | na  |             0 | na            | na            |
+-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+

This happens regardless of whether I enter the row as an SQL command within the CochroachDB SQL environment or use PHP psql.
I see nothing wrong in storing the German umlaut characters as escaped UTF-8.
My question is: Why doesn’t CockroachDB return the German umlaut character when I read from the database?

What you see here is a limitation of the textual representation of an array in the cockroach sql output in version 1.0 and 1.1.

However, the umlaut is properly stored and handled by CockroachDB, as demonstrated by the following query:

SELECT adre[1] FROM tlizeinst;
+---------------------+
|       adre[1]       |
+---------------------+
| Hölderlinstrasse 13 |
+---------------------+
(1 row)

This limitation (conversion of array to string to print out in the client app) has been lifted in CockroachDB 2.0, with the same data in CockroachDB 2.0 you would see:

SELECT * FROM tlizeinst;
| isn |  iden  |     instname      |                    adre                     | plz | ort | tlizelaen_isn | emaildomadoze | emaildomastud |
+-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+
|   0 | LOCDEV | Local development | {"Hölderlinstrasse 13","na","na","na"}      | na  | na  |             0 | na            | na            |
+-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+

Does this help?

Thanks, Raphael!

It is good to know that this is a small limitation in the current version and that the limitation would be removed in the upcoming V.2.

The problem is that V.2 is now only in the Alpha stage and may be available in a production version some months down the line.

In the meantime I have found a workaround in PHP. I am able to convert the escaped UTF-8 code back to the German umlaut characters with the following code snippet:

$columnValue = preg_replace_callback(’/\\u([0-9a-fA-F]{4})/’, function ($match) {

return mb_convert_encoding(pack(‘H*’, $match[1]), ‘UTF-8’, ‘UCS-2BE’);

}, $columnValue);

This also works for Romanian characters and hence should work for all non-English language characters.

Ashok

knz
Raphael ‘kena’ Poss
Roacher

    February 27

What you see here is a limitation of the textual representation of an array in the cockroach sql output in version 1.0 and 1.1.

However, the umlaut is properly stored and handled by CockroachDB, as demonstrated by the following query:

> SELECT adre[1] FROM tlizeinst;
> +---------------------+
> > adre[1] |
> +---------------------+
> > Hölderlinstrasse 13 |
> +---------------------+
> (1 row)

This limitation (conversion of array to string to print out in the client app) has been lifted in CockroachDB 2.0, with the same data in CockroachDB 2.0 you would see:

> SELECT * FROM tlizeinst;
> > isn | iden | instname | adre | plz | ort | tlizelaen_isn | emaildomadoze | emaildomastud |
> +-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+
> > 0 | LOCDEV | Local development | {"Hölderlinstrasse 13","na","na","na"} | na | na | 0 | na | na |
> +-----+--------+-------------------+---------------------------------------------+-----+-----+---------------+---------------+---------------+