Implementing the HBase data model on CockroachDB

(Philippe Laflamme) #1

Hi,

We’re investigating replacing some of our smaller HBase workloads with CockroachDB.
The HBase data model is very similar to CockroachDB’s underlying data model, so we’re trying to implement the HBase model on top of it, hoping for minimal performance impact plus the obvious added benefit of things like consistent secondary indexes.

At this point, we’re able to replicate the HBase model using a table structure like the following:

CREATE TABLE kv(
  key BYTES,
  qualifier BYTES,
  timestamp TIMESTAMP,
  value BYTES,
  PRIMARY KEY (key, qualifier, timestamp DESC)
);

This would map to a single HBase column family.
Different families would map to different tables (this creates some challenges, but would probably be fine for our current use-cases).
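For example, an HBase cell at row `r1`, qualifier `q1` with two versions simply becomes two rows (the values here are illustrative):

```sql
-- Two versions of the same HBase "cell": same key and qualifier,
-- different timestamps. The DESC primary key column keeps the newest first.
INSERT INTO kv (key, qualifier, timestamp, value) VALUES
  (b'r1', b'q1', '2019-01-02 00:00:00', b'v2'),  -- newest version
  (b'r1', b'q1', '2019-01-01 00:00:00', b'v1');  -- older version
```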

HBase allows multiple versions of the same value through the use of different timestamps.
When reading data out of HBase, the client may specify how many versions it wants to read back (usually this is 1, the latest value).
We’ve worked out that the following SQL statement provides those semantics (the window function result has to be filtered in an outer query, since WHERE is evaluated before window functions):

SELECT key, qualifier, timestamp, value
FROM (
  SELECT
    row_number() OVER w AS version,
    key,
    qualifier,
    timestamp,
    value
  FROM kv
  WHERE key = xxx
  WINDOW w AS (PARTITION BY key, qualifier ORDER BY timestamp DESC)
)
WHERE version <= n

The WINDOW is partitioning over the table’s primary key, so, in theory, this should be a simple scan + filter.
This seems to be confirmed by EXPLAIN (though the window node is a bit opaque):

       tree      | field |  description   
+----------------+-------+---------------+
  window         |       |                
   └── render    |       |                
        └── scan |       |                
                 | table | kv@primary  
                 | spans | ALL

This seems fine, but for the scanning use case (scanning over a key range), we also need things to be returned in primary key order.
This can be fixed by adding an ORDER BY clause:

SELECT key, qualifier, timestamp, value
FROM (
  SELECT
    row_number() OVER w AS version,
    key,
    qualifier,
    timestamp,
    value
  FROM kv
  WHERE key >= xxx AND key < yyy
  WINDOW w AS (PARTITION BY key, qualifier ORDER BY timestamp DESC)
)
WHERE version <= n
ORDER BY PRIMARY KEY kv

Unfortunately, this introduces a global sort in the plan:

         tree         | field |         description           
+---------------------+-------+------------------------------+
  sort                |       |                               
   │                  | order | +key,+qualifier,-"timestamp"  
   └── window         |       |                               
        └── render    |       |                               
             └── scan |       |                               
                      | table | kv@primary                    
                      | spans | ALL                           

This global sort shouldn’t be necessary, since all we’re doing is scanning the primary key.
Even in distributed mode, it should be possible to return rows in primary key order without resorting to a global sort, since every node should simply be scanning its portion of the primary key.

Perhaps the WINDOW clause is preventing the planner from figuring this out?
It seems like it should be possible to implement this as a simple scan of the primary key, but the SQL statement is getting in the way somewhat.
Perhaps I’m missing something that is making this hard / impossible?
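For what it’s worth, for the common single-version case (n = 1) we could also express this with DISTINCT ON and no window function at all; I haven’t yet verified what plan it produces:

```sql
-- Latest version only (n = 1): DISTINCT ON keeps the first row per
-- (key, qualifier) group under the given ordering, i.e. the newest.
SELECT DISTINCT ON (key, qualifier)
  key, qualifier, timestamp, value
FROM kv
WHERE key >= xxx AND key < yyy
ORDER BY key, qualifier, timestamp DESC;
```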


(Tim O'Brien) #2

Hey @Phil - it looks like the query planner is not currently optimized for this use case. Our window processor doesn’t guarantee that it preserves the input ordering, which necessitates the final sort. FWIW, Postgres’s doesn’t either.

It’d be helpful to know what the impact is here.


(Philippe Laflamme) #3

@tim-o Thanks for pointing this out.

We’re not at the point where we’ve actually moved any workloads, but at a high level, the impact of introducing a global sort will be large for access patterns that cover large portions of the key space.

For example, a common access pattern in HBase is scanning. This is a cheap operation even on large key ranges since it is implemented as a scan + filtering (no sorting involved).

To minimize the cost of migrating, we’d like to guarantee a similar performance profile (the absolute performance will be different, but its distribution should be similar). This global sort makes the profile wildly different, which makes it hard for us to reason about and eventually deal with.

Is an order-preserving window processor a feature on the horizon? Or, alternatively, is there another way to structure this query that avoids the global sort? For example, providing hints or otherwise “forcing” the physical plan to use a primary key scan.


(Ron Arévalo) #4

Hey @Phil

When you say:

> providing hints or otherwise “forcing” the physical plan to use a primary key scan

Do you mean you want to do something like what is described here?

If so, the syntax would be SELECT * FROM <TABLE>@<INDEX>.
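For example, to pin the scan to the primary index on your kv table, it would look something like:

```sql
-- Force the scan to use kv's primary index explicitly.
SELECT * FROM kv@primary WHERE key >= xxx AND key < yyy;
```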

Let me know if this helps!

Thanks,

Ron


(Philippe Laflamme) #5

Yes, something like that, but for the WINDOW operation. The issue isn’t really that it’s not using the index scan, but rather that the planner cannot leverage this to optimize out the global sort.

Something like this:

WINDOW w AS (PARTITION BY key,qualifier ORDER BY timestamp DESC)@primary

This would force the windowing to operate over the primary index and thus perhaps guarantee the ordering of its results (which could then enable further optimizations).


(Ron Arévalo) #6

Hey @Phil,

I’ve created a feature request to allow window functions to maintain ordering, you can follow along here.

Thanks,

Ron


(Philippe Laflamme) #7

Thanks very much! I’ll follow along, but am also willing and able to help move this forward.


(Tim O'Brien) #8

@Phil That’s great - we’re definitely open to contributions from the community. I’ll let Yahor comment on how to move forward.
