Is it possible to perform a partial conversion to UTF8 of a wrongly encoded sequence of bytes?
I’m connected in DBeaver to both MariaDB and CockroachDB (CRDB).
To simulate the situation, assume a column holds the valid byte sequence (in hex) “68 c3 bc”. I can convert it and display it in DBeaver as UTF8 as follows:
- in MariaDB with:
select CONVERT(unhex('68c3bc') USING utf8)
- in CRDB with:
select convert_from( decode('68c3bc', 'hex'), 'utf8')
In both cases the result is shown in DBeaver as “hü”, which is perfect.
Unfortunately my data is messy: the byte sequences are often truncated at the wrong position or simply contain encoding errors. I’ll simulate this with the invalid byte sequence (in hex) “68 c3 bc c3” and try again to convert it to UTF8:
- in MariaDB with:
select CONVERT(unhex('68c3bcc3') USING utf8)
…the result still shows in DBeaver: the valid part is converted, and the 4th byte, which is invalid, is shown with the placeholder “?” (other invalid bytes are shown with a small glyph showing their hex value).
- in CRDB with:
select convert_from( decode('68c3bcc3', 'hex'), 'utf8')
…the query just returns an error message:
ERROR: convert_from(): invalid byte sequence for encoding "UTF8"
Can I somehow make the conversion from bytes to UTF8 in CRDB more permissive, so that it behaves the same way the conversion in MariaDB does?
I would like to be able to see at least the partially converted UTF8 string.
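To make clear what I mean by “permissive”, here is a minimal sketch in Python (not SQL, and not something CRDB or MariaDB offers as far as I know). It only illustrates the semantics I’m after, using Python’s built-in decoder; the helper names decode_permissive and decode_strict are made up for this example.

```python
# Illustration only: Python's built-in UTF-8 decoder, not a CRDB or MariaDB function.
# decode_permissive / decode_strict are hypothetical helper names for this sketch.

VALID = bytes.fromhex("68c3bc")      # 'h' followed by the two-byte encoding of 'ü'
BROKEN = bytes.fromhex("68c3bcc3")   # same bytes plus a dangling lead byte 0xC3


def decode_permissive(raw: bytes) -> str:
    """Decode as UTF-8, replacing each undecodable part with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def decode_strict(raw: bytes) -> str:
    """Decode as UTF-8, raising UnicodeDecodeError on any invalid input."""
    return raw.decode("utf-8")


# Permissive: the valid prefix survives, the dangling byte becomes a placeholder.
partial = decode_permissive(BROKEN)

# Strict: comparable to what convert_from() does for me in CRDB, it just fails.
try:
    decode_strict(BROKEN)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Here partial keeps the readable prefix “hü” followed by one replacement character, which is roughly the behaviour I see from MariaDB, while the strict variant fails on the whole value like convert_from() does.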